
Cron Incident Triage

When to use this runbook

  • Production alert on a failing cronjob
  • Cronjob no longer producing expected results
  • Abnormal behavior detected (missing data, processing delays)

Step 1: Identify the affected cronjob

# Check the last 100 log lines for the affected cronjob
tail -n 100 /var/log/logidav/<command-name>.log

# Check if the process is still running (exclude the grep process itself)
ps aux | grep "<command-name>" | grep -v grep

# Check locks (if LockableTrait is used)
# Adapt to your project's lock mechanism
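
For commands that rely on Symfony's LockableTrait with its default stores, the lock is held either as a System V semaphore or as a file that FlockStore creates in the system temp directory. The following is a minimal sketch under that assumption; `list_symfony_locks` is a hypothetical helper name, and a project with a custom lock store (Redis, database, and so on) needs a different check.

```shell
# Hypothetical helper: list FlockStore lock files in a directory.
# Assumes the default FlockStore location (system temp dir) and its
# sf.<key>.<hash>.lock naming; pass another directory if yours differs.
list_symfony_locks() {
    local lock_dir="${1:-${TMPDIR:-/tmp}}"
    find "$lock_dir" -maxdepth 1 -type f -name 'sf.*.lock' 2>/dev/null
}

# Usage:
#   list_symfony_locks            # look in the default temp directory
#   list_symfony_locks /var/tmp   # look somewhere else
# If the semaphore store is in use instead, `ipcs -s` lists semaphores.
```

A lock file can outlive the lock itself, so its presence alone does not prove that a lock is held. `lsof <file>` or `fuser <file>` shows whether a process still has it open.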

Step 2: Classify severity

Use the risk classification:

Level    | Affected commands                  | Action
Critical | Sales, payments, stock, refunds    | Immediate intervention
High     | Products, sync, import/export      | Intervene within the hour
Medium   | Logs, stats, alerts, notifications | Intervene within the day
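
The classification can also be encoded as a small lookup, for example to annotate alerts. This is only a sketch: `classify_severity` is a hypothetical helper, it expects a domain keyword rather than a real command name, and the "Unclassified" fallback is an assumption that is not part of the table.

```shell
# Hypothetical helper: map a command domain to the severity table above.
classify_severity() {
    local domain
    # Lowercase the input so "Sales" and "sales" are treated the same.
    domain=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$domain" in
        sales|payments|stock|refunds)    echo "Critical: immediate intervention" ;;
        products|sync|import|export)     echo "High: intervene within the hour" ;;
        logs|stats|alerts|notifications) echo "Medium: intervene within the day" ;;
        # Assumption: anything outside the table is flagged for manual review.
        *) echo "Unclassified: assess manually"; return 1 ;;
    esac
}

# Usage:
#   classify_severity refunds
```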

Step 3: Diagnose the cause

Quick checks:

  1. Is the cronjob still running? — zombie process, stuck lock
  2. Is the external API responding? — Magento timeout, carrier unavailable
  3. Is the database accessible? — lost connection, deadlock
  4. Is the input data valid? — corrupted Magento payload, missing data
  5. Was there a recent deployment? — code regression, changed configuration
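
Checks 2 to 4 can often be narrowed down from the command's log. The sketch below counts recent log lines that hint at each cause. The keyword patterns are assumptions, not the project's actual log messages, and `count_hits` and `suggest_cause` are hypothetical helper names; adapt both to your logging format.

```shell
# count_hits <log-file> <lines> <extended-regex>
# Counts case-insensitive matches in the last <lines> lines of the log.
count_hits() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep the 0.
    tail -n "$2" "$1" | grep -Eic "$3" || true
}

# suggest_cause <log-file> [lines]
# Prints one hit count per suspected cause (patterns are illustrative).
suggest_cause() {
    local log_file="$1" lines="${2:-100}"
    printf 'external API:  %s\n' "$(count_hits "$log_file" "$lines" 'timeout|timed out|unavailable')"
    printf 'database:      %s\n' "$(count_hits "$log_file" "$lines" 'deadlock|connection (lost|refused)|gone away')"
    printf 'input data:    %s\n' "$(count_hits "$log_file" "$lines" 'invalid|malformed|corrupt|missing')"
}

# Usage:
#   suggest_cause /var/log/logidav/<command-name>.log 200
```

A non-zero count is only a hint about where to look first. It does not replace reading the matching log lines.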

Step 4: Corrective actions

Zombie process / stuck lock

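# Identify the PID (exclude the grep process itself)
ps aux | grep "<command-name>" | grep -v grep

# Terminate the process if necessary, after confirming the PID belongs to
# the stuck cronjob (kill sends SIGTERM by default)
kill <PID>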

# Manually restart the cronjob
php bin/console <command-name>
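
Before restarting, it helps to confirm that the old process has actually exited. The sketch below sends SIGTERM and waits for the process to disappear. It reports a process that stays alive and leaves any escalation to the operator. `stop_gracefully` is a hypothetical helper name and the 10-second default is an arbitrary assumption.

```shell
# stop_gracefully <pid> [timeout-seconds]
# Sends SIGTERM, then polls until the process is gone or the timeout expires.
stop_gracefully() {
    local pid="$1" timeout="${2:-10}" waited=0
    if ! kill -TERM "$pid" 2>/dev/null; then
        echo "Cannot signal PID $pid (already gone or not permitted)" >&2
        return 2
    fi
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $pid still alive after ${timeout}s; escalate manually" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "PID $pid exited"
}

# Usage (placeholders as elsewhere in this runbook):
#   stop_gracefully <PID> 30 && php bin/console <command-name>
```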

External API unavailable

  1. Check connectivity: curl -I <api-url>
  2. Check the external service's status page
  3. Wait for recovery or contact the provider's support
  4. Restart the cronjob after recovery
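
The wait-then-restart sequence above can be sketched as a polling loop around the same `curl -I` check. `wait_for_api` is a hypothetical helper, and the attempt count, delay, and 10-second request timeout are assumptions to tune. Some APIs reject HEAD requests, in which case the `-I` flag should be dropped.

```shell
# wait_for_api <url> [attempts] [delay-seconds]
# Polls the URL until it answers with a non-error HTTP status.
wait_for_api() {
    local url="$1" attempts="${2:-10}" delay="${3:-30}" i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors (>= 400) as failure; -s: quiet; -I: HEAD request
        if curl -fs -I --max-time 10 -o /dev/null "$url" 2>/dev/null; then
            echo "API reachable after $i attempt(s)"
            return 0
        fi
        # Pause between attempts, but not after the last one.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    echo "API still unreachable after $attempts attempt(s)" >&2
    return 1
}

# Usage (placeholders as elsewhere in this runbook):
#   wait_for_api "<api-url>" 20 60 && php bin/console <command-name>
```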

Data error

  1. Identify the problematic record in the logs
  2. Fix the data if possible
  3. Restart the cronjob
  4. If the cursor has advanced, see Sales Import for the replay strategy

Step 5: Post-incident verification

  • Confirm that the cronjob has resumed its normal cycle
  • Verify produced data (completeness, consistency)
  • Monitor subsequent executions
  • Document the incident and the resolution
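
One lightweight way to confirm that the cronjob has resumed its cycle is to check that its log was written recently. This sketch assumes the command writes to its log on every run and that `find` supports `-mmin` (GNU and BSD do). `log_is_fresh` is a hypothetical helper, and the 60-minute default should match the cron schedule.

```shell
# log_is_fresh <log-file> [max-age-minutes]
# Succeeds if the file was modified less than max-age-minutes ago.
log_is_fresh() {
    local log_file="$1" max_age_min="${2:-60}"
    # find prints the path only when the modification time is recent enough.
    [ -n "$(find "$log_file" -mmin "-$max_age_min" 2>/dev/null)" ]
}

# Usage:
#   log_is_fresh /var/log/logidav/<command-name>.log 15 \
#       && echo "log updated recently" \
#       || echo "no recent log activity"
```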