Cron Incident Triage
When to use this runbook
- Production alert on a failing cronjob
- Cronjob no longer producing expected results
- Abnormal behavior detected (missing data, processing delays)
Step 1: Identify the affected cronjob
# Check recent logs for the affected cronjob
tail -100 /var/log/logidav/<command-name>.log
# Check if the process is still running
ps aux | grep "<command-name>" | grep -v grep
# Check locks (if LockableTrait is used)
# Adapt to your project's lock mechanism
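To get a quick signal from the log before reading it in full, the sketch below counts error-level lines in the tail of the file. It assumes error lines contain the words ERROR or CRITICAL, which this runbook does not confirm; adjust the pattern to your actual log format. The function name `count_recent_errors` is illustrative.

```shell
# Count error-level lines in the last N lines of a log file (default: 100).
# Assumption: error lines contain "ERROR" or "CRITICAL"; adapt the pattern.
count_recent_errors() {
  local log_file="$1"
  local lines="${2:-100}"
  # grep -c prints the match count but exits 1 when the count is 0,
  # so "|| true" keeps the function from reporting a failure in that case.
  tail -n "$lines" "$log_file" | grep -cE 'ERROR|CRITICAL' || true
}

# Usage (path taken from the step above):
# count_recent_errors /var/log/logidav/<command-name>.log 100
```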
Step 2: Classify severity
Classify the incident by the type of command affected:
| Level | Affected commands | Action |
|---|---|---|
| Critical | Sales, payments, stock, refunds | Immediate intervention |
| High | Products, sync, import/export | Intervene within the hour |
| Medium | Logs, stats, alerts, notifications | Intervene within the day |
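The table above can be encoded as a small lookup so that alerting scripts print the expected response level. This is a minimal sketch: the domain keywords passed to `classify_severity` are hypothetical labels derived from the table, not existing command names.

```shell
# Map a command domain to the severity level from the classification table.
# The keywords are illustrative; map them to your real command names.
classify_severity() {
  case "$1" in
    sales|payments|stock|refunds)    echo "Critical" ;;
    products|sync|import|export)     echo "High" ;;
    logs|stats|alerts|notifications) echo "Medium" ;;
    *)                               echo "Unclassified" ;;
  esac
}

# Usage:
# classify_severity payments
```

Anything that falls through to "Unclassified" is worth adding to the table rather than guessing a level during an incident.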
Step 3: Diagnose the cause
Quick checks:
- Is the cronjob still running? — zombie process, stuck lock
- Is the external API responding? — Magento timeout, carrier unavailable
- Is the database accessible? — lost connection, deadlock
- Is the input data valid? — corrupted Magento payload, missing data
- Was there a recent deployment? — code regression, changed configuration
Step 4: Corrective actions
Zombie process / stuck lock
# Identify the PID
ps aux | grep "<command-name>" | grep -v grep
# Stop the process if necessary, after confirming the PID
# (kill sends SIGTERM by default; use kill -9 only if the process does not exit)
kill <PID>
# Manually restart the cronjob
php bin/console <command-name>
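The stop-then-restart sequence above can be wrapped so that the process gets a chance to exit cleanly before being forced. The sketch below sends SIGTERM, waits for a grace period, and escalates to SIGKILL only if the process is still alive. `stop_process` and its parameters are illustrative names, and the 10-second default is an assumption.

```shell
# Stop a process gracefully: SIGTERM first, SIGKILL only if it is still
# alive once the grace period (in seconds, default 10) has elapsed.
stop_process() {
  local pid="$1"
  local grace="${2:-10}"
  kill -TERM "$pid" 2>/dev/null || return 0   # process already gone
  while [ "$grace" -gt 0 ]; do
    kill -0 "$pid" 2>/dev/null || return 0    # process has exited
    sleep 1
    grace=$((grace - 1))
  done
  kill -KILL "$pid" 2>/dev/null               # last resort
  return 0
}

# Usage:
# stop_process <PID> 30 && php bin/console <command-name>
```

A forced kill can leave a lock behind, so check the lock state before restarting if the escalation path was taken.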
External API unavailable
- Check connectivity:
  curl -I <api-url>
- Check the external service's status page
- Wait for recovery or contact the provider's support
- Restart the cronjob after recovery
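Waiting for recovery can be automated with a polling loop. The sketch below re-runs a probe command until it succeeds or the attempts run out. `wait_until_ok` is a hypothetical helper, and the interval and attempt count in the usage line are assumptions to adapt to the provider's typical recovery time.

```shell
# Re-run a probe command until it succeeds or the attempts are exhausted.
# Arguments: <attempts> <delay-seconds> <command> [args...]
wait_until_ok() {
  local attempts="$1"
  local delay="$2"
  shift 2
  while [ "$attempts" -gt 0 ]; do
    "$@" && return 0
    attempts=$((attempts - 1))
    # Sleep only if another attempt is coming.
    if [ "$attempts" -gt 0 ]; then sleep "$delay"; fi
  done
  return 1
}

# Usage: probe every 60 s for up to 30 attempts, then restart the cronjob.
# wait_until_ok 30 60 curl -fsS -o /dev/null --max-time 10 "<api-url>" \
#   && php bin/console <command-name>
```

The `-f` flag makes curl return a non-zero exit code on HTTP 4xx/5xx responses, so an API that answers with errors is still treated as unavailable.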
Data error
- Identify the problematic record in the logs
- Fix the data if possible
- Restart the cronjob
- If the cursor has advanced, see Sales Import for the replay strategy
Step 5: Post-incident verification
- Confirm that the cronjob has resumed its normal cycle
- Verify produced data (completeness, consistency)
- Monitor subsequent executions
- Document the incident and the resolution
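One way to confirm that the cronjob has resumed its cycle is to check that its log file has been written to recently. This sketch assumes the command writes to its log on every run and that `find` supports `-mmin` (GNU and BSD find do; it is not part of POSIX). `log_updated_within` is an illustrative name.

```shell
# Succeed if the file was modified within the last N minutes.
# Assumption: the cronjob appends to its log on every execution.
log_updated_within() {
  local log_file="$1"
  local minutes="$2"
  [ -n "$(find "$log_file" -mmin "-$minutes" 2>/dev/null)" ]
}

# Usage: for a job scheduled every 15 minutes, allow some slack.
# log_updated_within /var/log/logidav/<command-name>.log 20 || echo "no recent run"
```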