# Production Health Checks

This runbook lists the routine checks used to confirm that the production system is healthy, grouped by how often they should be run.
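## Daily checks

### Cronjob status

```bash
# List the most recently modified log files; critical cronjobs should show recent timestamps
ls -lt /var/log/logidav/ | head -20

# Show the last 50 lines mentioning an error, exception or fatal error
# (grep prefixes each match with its file name; files are read in alphabetical order)
grep -iE "error|exception|fatal" /var/log/logidav/*.log | tail -50
```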
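The listing above still leaves it to the reader to judge what "recently" means. The following is a minimal sketch that automates this judgement. It assumes that each critical cronjob writes to its own file in `/var/log/logidav/`, and that a log not written for longer than a chosen number of minutes means the job did not run. The function name is hypothetical.

```bash
# Sketch only: assumes one log file per critical cronjob in LOG_DIR.

# check_cron_logs LOG_DIR MAX_AGE_MINUTES FILE...
# Prints OK, STALE or MISSING for each file and returns 1 if any file is not OK.
check_cron_logs() {
  local dir="$1" max_age="$2" status=0 f
  shift 2
  for f in "$@"; do
    if [[ ! -f "$dir/$f" ]]; then
      echo "MISSING $f"
      status=1
    # find prints the file only if it was last modified more than MAX_AGE minutes ago
    elif [[ -n "$(find "$dir/$f" -mmin +"$max_age")" ]]; then
      echo "STALE $f"
      status=1
    else
      echo "OK $f"
    fi
  done
  return "$status"
}
```

For example, `check_cron_logs /var/log/logidav 60 import.log export.log` (hypothetical file names) exits with status 1 if either log is missing or older than an hour. This makes it usable from a monitoring script.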
### Queue status

```bash
# Count queue entries per status
mysql -e "SELECT status, COUNT(*) FROM queue_table GROUP BY status;"

# Verify that the queue processor is running (unlike "ps aux | grep", pgrep does not match itself)
pgrep -af "meduse:queue:processor"
```
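The checklist expects exactly one queue processor. The following small sketch turns this into a pass/fail check. It relies on `pgrep -c` from Linux procps-ng; the BSD/macOS `pgrep` may lack `-c`. The function name is hypothetical.

```bash
# check_single_process PATTERN
# Succeeds only if exactly one process has a full command line matching PATTERN
# (an extended regular expression). pgrep never counts itself.
check_single_process() {
  local pattern="$1" count
  # With -c, pgrep prints the count even when it is 0 (it then exits with status 1)
  count=$(pgrep -fc -- "$pattern") || true
  if [ "$count" -eq 1 ]; then
    echo "OK: 1 process matches '$pattern'"
    return 0
  fi
  echo "ALERT: $count processes match '$pattern' (expected 1)"
  return 1
}
```

For example, run `check_single_process "meduse:queue:processor"`. Because `pgrep -f` matches full command lines, any process whose arguments contain the pattern is counted. This includes the calling shell if the pattern was passed to it on its own command line.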
### Disk space

```bash
# Check available space on all mounted filesystems
df -h

# Check the total size of the application logs
du -sh /var/log/logidav/
```
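The checklist threshold is more than 20% free. To compare the output against it automatically, the following sketch reads the POSIX output of `df -P`, whose fifth column is the percentage of space used. The free percentage is derived from that value, so blocks reserved for root are not counted as available. The function name is hypothetical.

```bash
# check_disk_free PATH MIN_FREE_PERCENT
# Succeeds if the filesystem holding PATH has more than MIN_FREE_PERCENT free.
check_disk_free() {
  local path="$1" min_free="$2" used free
  # -P forces one line per filesystem; line 2, column 5 is the use percentage, e.g. "73%"
  used=$(df -P "$path" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }')
  free=$((100 - used))
  if [ "$free" -gt "$min_free" ]; then
    echo "OK: $free% free on $path"
    return 0
  fi
  echo "ALERT: only $free% free on $path (threshold: more than $min_free%)"
  return 1
}
```

For example, `check_disk_free /var/log 20` applies the checklist threshold to the filesystem that holds the logs.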
## Weekly checks

### Cronjob coverage

```bash
# Check that the cronjob documentation is up to date:
# files in docs/cronjobs/ should have been updated recently (newest listed first)
ls -lt docs/cronjobs/
```
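To turn "updated recently" into a concrete check, the following sketch lists documentation files that have not been modified within a given number of days. The function name is hypothetical. The 90 days in the usage line below is an arbitrary example, not a documented policy.

```bash
# list_stale_docs DIR MAX_AGE_DAYS
# Prints, in sorted order, every file under DIR last modified more than MAX_AGE_DAYS days ago.
list_stale_docs() {
  find "$1" -type f -mtime +"$2" -print | sort
}
```

For example, run `list_stale_docs docs/cronjobs 90`. A fresh checkout resets file modification times. If the documentation lives in Git, `git log -1 --format=%ci -- <file>` gives the date of the last committed change instead.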
### Performance

Check the execution times of critical cronjobs and look for abnormally long runs in the logs.
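The log format that records durations is not described here. As an alternative to reading the logs, a critical job can be wrapped so that its duration is measured directly. The following is a minimal sketch of that swapped-in technique; the wrapper name and the limit are hypothetical.

```bash
# run_timed MAX_SECONDS COMMAND [ARGS...]
# Runs COMMAND, prints its duration, and writes a warning to stderr when it
# took longer than MAX_SECONDS. Returns the command's own exit status.
run_timed() {
  local max="$1" start end elapsed rc=0
  shift
  start=$(date +%s)
  "$@" || rc=$?
  end=$(date +%s)
  elapsed=$((end - start))
  echo "duration: ${elapsed}s ($*)"
  if [ "$elapsed" -gt "$max" ]; then
    echo "WARNING: $* took ${elapsed}s (limit: ${max}s)" >&2
  fi
  return "$rc"
}
```

For example, a cronjob script that defines this function could run `run_timed 600 bin/console <command>`. Any run longer than ten minutes would then leave a warning wherever the crontab sends stderr. The resolution is one second, which is enough for cronjobs.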
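### Cleanup

```bash
# Rotate logs if needed

# List label PDFs older than 30 days (dry run);
# after reviewing the list, replace -ls with -delete to remove them
find /path/to/web/pdf_dpd/ -type f -name "*.pdf" -mtime +30 -ls
```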
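The following sketch keeps the dry run as the default and deletes files only when asked to explicitly. The function name is hypothetical. The 30-day limit is taken from the command above.

```bash
# cleanup_old_pdfs DIR DAYS [--delete]
# Without --delete, only prints *.pdf files older than DAYS days (dry run).
# With --delete, prints each matching file and then removes it.
cleanup_old_pdfs() {
  local dir="$1" days="$2" mode="${3:-}"
  if [ "$mode" = "--delete" ]; then
    find "$dir" -type f -name '*.pdf' -mtime +"$days" -print -delete
  else
    find "$dir" -type f -name '*.pdf' -mtime +"$days" -print
  fi
}
```

`cleanup_old_pdfs /path/to/web/pdf_dpd 30` lists the candidates. Adding `--delete` removes them and prints each deleted path.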
## Monthly checks

- Review the documentation for critical cronjobs
- Check that certificates and credentials for external APIs are still valid and not about to expire
- Monitor database size
- Update the system crontab if needed
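For certificates, `openssl x509 -checkend` gives a scriptable answer. The following sketch handles a certificate stored as a PEM file. The function name and the 30-day margin are hypothetical.

```bash
# check_cert_file CERT_FILE DAYS
# Succeeds if the PEM certificate is still valid DAYS days from now.
# Returns 1 if it expires sooner, 2 if the file cannot be read as a certificate.
check_cert_file() {
  local cert="$1" days="$2" enddate
  enddate=$(openssl x509 -in "$cert" -noout -enddate) || return 2
  # -checkend N exits 0 if the certificate will still be valid N seconds from now
  if openssl x509 -in "$cert" -noout -checkend $((days * 86400)) >/dev/null; then
    echo "OK: $cert ($enddate)"
    return 0
  fi
  echo "ALERT: $cert expires within $days days ($enddate)"
  return 1
}
```

For a certificate served by a remote API, the handshake output can be piped into the same tool. For example: `echo | openssl s_client -connect api.example.com:443 -servername api.example.com 2>/dev/null | openssl x509 -noout -enddate`. Here `api.example.com` stands in for the real host.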
## Quick checklist

| Check | Command | Acceptable threshold |
|---|---|---|
| Active cron processes | `pgrep -af bin/console` | All critical cronjobs visible |
| Queue errors | `SELECT COUNT(*) FROM queue WHERE status='error'` | 0 errors |
| Queue processor | `pgrep -fc queue:processor` | 1 active process |
| Disk space | `df -h` | > 20% free |
| Error logs | `grep -c error *.log` | No sudden spike |
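The checklist leaves "no sudden spike" to judgement. One hypothetical way to make it concrete is to compare the current error count with a baseline, such as the previous day's count, and to alert above a factor. The default factor of 2 is an arbitrary choice. Unlike `grep -c`, which prints one count per file, this sketch totals all files. Like the daily check, it matches case-insensitively. Both function names are hypothetical.

```bash
# count_errors LOG_DIR
# Total number of lines containing "error" (case-insensitive) across LOG_DIR/*.log.
count_errors() {
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the 0
  cat "$1"/*.log 2>/dev/null | grep -ci 'error' || true
}

# is_error_spike CURRENT BASELINE [FACTOR]
# Succeeds when CURRENT exceeds FACTOR x BASELINE; a BASELINE of 0 counts as 1
# so that a jump from zero errors can still be detected.
is_error_spike() {
  local current="$1" baseline="$2" factor="${3:-2}"
  (( baseline < 1 )) && baseline=1
  (( current > factor * baseline ))
}
```

For example: `is_error_spike "$(count_errors /var/log/logidav)" "$yesterday" && echo "ALERT: error spike"`. Here `$yesterday` is a hypothetical count saved by the previous run.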