Production Health Checks

This runbook lists routine checks to confirm the system is healthy.

Daily checks

Cronjob status

# Verify that critical cronjobs have run recently
ls -lt /var/log/logidav/ | head -20

# Check for recent errors (-E gives portable alternation; \| is a GNU extension)
grep -iE "error|exception|fatal" /var/log/logidav/*.log | tail -50
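
For a quick view of which log files are noisy, the error search can be summarized per file. The following is a minimal sketch, not an established part of this runbook: the function name error_counts is hypothetical, and it assumes logs end in .log as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: print "<count> <file>" for every log file that
# contains at least one error-like line, highest count first.
# Usage: error_counts /var/log/logidav
error_counts() {
  log_dir="${1:-/var/log/logidav}"
  for f in "$log_dir"/*.log; do
    [ -f "$f" ] || continue   # skip when the glob matched nothing
    # grep -c prints 0 and exits non-zero when nothing matches
    n=$(grep -ciE 'error|exception|fatal' "$f" || true)
    if [ "$n" -gt 0 ]; then
      printf '%s %s\n' "$n" "$f"
    fi
  done | sort -rn
}
```

Counts are matching lines, not distinct incidents, so a single stack trace can inflate the number.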

Queue status

# Check queue state
mysql -e "SELECT COUNT(*), status FROM queue_table GROUP BY status;"

# Verify the processor is running
# (the [m] bracket keeps grep from matching its own process)
ps aux | grep "[m]eduse:queue:processor"
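
The quick checklist expects exactly one queue processor. A rough sketch of that check as a script follows. The function names are hypothetical, and it assumes the processor's command line contains the string meduse:queue:processor.

```shell
#!/bin/sh
# Count lines on stdin (one command line per line) that contain the pattern.
count_matching() {
  # grep -c prints 0 and exits non-zero when nothing matches
  grep -cF -- "$1" || true
}

# Hypothetical check: warn unless exactly one processor is running.
check_single_processor() {
  pattern="${1:-meduse:queue:processor}"
  # Capture the process list first so grep is not yet running when ps is read.
  procs=$(ps -A -o args=)
  n=$(printf '%s\n' "$procs" | count_matching "$pattern")
  if [ "$n" -eq 1 ]; then
    echo "OK: 1 processor running"
  else
    echo "WARN: expected 1 processor, found $n"
    return 1
  fi
}
```

Calling check_single_processor with no argument uses the default pattern. The non-zero return code lets a wrapper script or monitoring hook react to the warning.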

Disk space

# Check available space
df -h

# Check log size
du -sh /var/log/logidav/
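
To turn the disk check into a pass/fail signal, the df output can be filtered against the checklist threshold of more than 20% free (that is, at most 80% used). This is a sketch under that assumption. The function name over_threshold is hypothetical, and it relies on the POSIX output format of df -P.

```shell
#!/bin/sh
# Hypothetical helper: print mount points whose usage exceeds a percentage.
# Reads `df -P` output on stdin; default limit is 80% used (20% free).
# Usage: df -P | over_threshold 80
over_threshold() {
  awk -v max="${1:-80}" '
    NR > 1 {                      # skip the header line
      use = $5
      sub(/%/, "", use)           # "85%" -> "85"
      if (use + 0 > max) print $6, $5
    }'
}
```

Empty output means every filesystem is within the limit. Device names or mount points containing spaces would shift the columns and are not handled here.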

Weekly checks

Cronjob coverage

# Check freshness of cronjob documentation
# Files in docs/cronjobs/ should have been updated recently
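
One possible way to check freshness is to list documentation files that have not been modified for a given number of days. The sketch below assumes a 30-day cutoff, which this runbook does not define, and the function name stale_docs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list files under a docs directory that have not
# been modified in the last N days.
# Usage: stale_docs docs/cronjobs 30
stale_docs() {
  docs_dir="${1:-docs/cronjobs}"
  max_age_days="${2:-30}"   # assumed cutoff; adjust to your own policy
  find "$docs_dir" -type f -mtime +"$max_age_days" -print
}
```

If the docs live in a Git checkout, file modification times reflect when the files were checked out, so git log -1 on a given file is a more reliable signal there.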

Performance

# Check execution times of critical cronjobs
# Look for abnormally long executions in the logs

Cleanup

# Rotate logs if needed
# List label PDFs older than 30 days as cleanup candidates
# (-ls only lists the files; it does not delete them)
find /path/to/web/pdf_dpd/ -name "*.pdf" -mtime +30 -ls
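
Once the listing has been reviewed, the same find expression can drive the deletion. The following is a cautious sketch rather than an established procedure: the function name purge_old_pdfs and its --delete flag are hypothetical, and it defaults to a dry run.

```shell
#!/bin/sh
# Hypothetical helper: remove PDFs older than N days from a directory.
# Dry run by default; pass --delete as the third argument to remove files.
# Usage: purge_old_pdfs /path/to/web/pdf_dpd 30 --delete
purge_old_pdfs() {
  pdf_dir="$1"
  max_age_days="${2:-30}"
  mode="${3:-}"
  if [ -z "$pdf_dir" ]; then
    echo "usage: purge_old_pdfs DIR [DAYS] [--delete]" >&2
    return 2
  fi
  if [ "$mode" = "--delete" ]; then
    find "$pdf_dir" -type f -name '*.pdf' -mtime +"$max_age_days" -exec rm -f {} +
  else
    # Dry run: print what would be removed
    find "$pdf_dir" -type f -name '*.pdf' -mtime +"$max_age_days" -print
  fi
}
```

Requiring an explicit flag for deletion keeps an accidental invocation from removing labels that are still needed.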

Monthly checks

  • Review documentation for critical cronjobs
  • Check certificates and credentials for external APIs
  • Monitor database size
  • Update the system crontab if needed

Quick checklist

Check                  Command                                          Acceptable threshold
Active cron processes  ps aux | grep "[b]in/console"                    All critical cronjobs visible
Queue                  SELECT COUNT(*) FROM queue WHERE status='error'  0 errors
Queue processor        ps aux | grep "[q]ueue:processor"                1 active process
Disk space             df -h                                            > 20% free
Error logs             grep -c error *.log                              No sudden spike