Cron Incident Triage

When to use this runbook

Production alert on a failing cronjob
Cronjob no longer producing expected results
Abnormal behavior detected (missing data, processing delays)

Step 1: Identify the affected cronjob

# Check recent logs for the affected cronjob
tail -n 100 /var/log/logidav/<command-name>.log

# Check if the process is still running
# (pgrep -f matches against the full command line and never lists itself, unlike ps | grep)
pgrep -af "<command-name>"

# Check locks (if LockableTrait is used)
# Adapt to your project's lock mechanism
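The lock check depends entirely on the lock store the project uses. As a minimal sketch, assuming a file-based store that writes lock files into a known directory, the helper below lists those files and reports whether a process still has each one open. The function name `list_locks`, the default directory `/tmp`, and the `*.lock` pattern are all assumptions for illustration, and the check relies on `fuser` being installed. It does not apply to semaphore-based or remote stores.

```shell
# Hypothetical helper: list lock files and show whether a process holds each one.
# The directory and the file pattern are assumptions; adapt them to your project.
list_locks() (
    lock_dir="${1:-/tmp}"
    pattern="${2:-*.lock}"
    find "$lock_dir" -maxdepth 1 -type f -name "$pattern" 2>/dev/null |
    while IFS= read -r lock_file; do
        # fuser exits 0 when at least one process has the file open
        if fuser "$lock_file" >/dev/null 2>&1; then
            echo "HELD  $lock_file"
        else
            echo "FREE  $lock_file"
        fi
    done
)
```

A file reported as FREE while the cronjob refuses to start suggests a stale lock rather than a running process.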

Step 2: Classify severity

Classify the incident by the business domain of the affected command:

Level	Affected commands	Action
Critical	Sales, payments, stock, refunds	Immediate intervention
High	Products, sync, import/export	Intervene within the hour
Medium	Logs, stats, alerts, notifications	Intervene within the day
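The table can be turned into a quick lookup when many commands are involved. The sketch below assumes that command names contain a keyword hinting at their domain (for example `sales:import`); the function name `classify_severity` and the keyword lists are hypothetical and should be replaced with your real command names.

```shell
# Hypothetical helper: map a command name to a severity level by keyword.
# The keywords are assumptions derived from the table above.
classify_severity() {
    case "$1" in
        *sale*|*payment*|*stock*|*refund*)   echo "Critical" ;;
        *product*|*sync*|*import*|*export*)  echo "High" ;;
        *log*|*stat*|*alert*|*notif*)        echo "Medium" ;;
        *)                                   echo "Unclassified" ;;
    esac
}
```

The patterns are tested in order, so a name matching several levels (such as `sales:import`) resolves to the most severe one.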

Step 3: Diagnose the cause

Quick checks:

Is the cronjob still running? — zombie process, stuck lock
Is the external API responding? — Magento timeout, carrier unavailable
Is the database accessible? — lost connection, deadlock
Is the input data valid? — corrupted Magento payload, missing data
Was there a recent deployment? — code regression, changed configuration
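Several of these checks can be run back to back with a small wrapper that prints one pass/fail line per check. This is only a sketch: `run_check` is a hypothetical helper, and the commands in the usage comments use the same placeholders as the rest of this runbook. The database check is left open because it depends on your database client.

```shell
# Hypothetical helper: run one check command and print a pass/fail line.
run_check() {
    label="$1"
    shift
    # Run the remaining arguments as a command, discarding its output
    if "$@" >/dev/null 2>&1; then
        echo "OK    $label"
    else
        echo "FAIL  $label"
    fi
}

# Example usage (replace the placeholders first):
# run_check "process running" pgrep -f "<command-name>"
# run_check "API responding"  curl -fsS -o /dev/null --max-time 10 "<api-url>"
# run_check "database reachable" <your database client's ping command>
```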

Step 4: Corrective actions

Zombie process / stuck lock

# Identify the PID
pgrep -af "<command-name>"

# Terminate the process if necessary, after confirming the PID
# (kill sends SIGTERM by default, which lets the process clean up)
kill <PID>

# Manually restart the cronjob
php bin/console <command-name>
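If the process does not react to a plain `kill`, a common pattern is to wait a bounded time and only then escalate to SIGKILL. The sketch below shows that pattern; `stop_gracefully` and its default 30-second timeout are assumptions, not part of the project.

```shell
# Hypothetical helper: send SIGTERM, wait up to <timeout> seconds, then SIGKILL.
stop_gracefully() (
    pid="$1"
    timeout="${2:-30}"
    if ! kill -TERM "$pid" 2>/dev/null; then
        echo "Cannot signal PID $pid (not running or not permitted)"
        exit 1
    fi
    waited=0
    # kill -0 only tests whether the PID can still be signalled
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $pid still present after ${timeout}s, sending SIGKILL"
            kill -KILL "$pid" 2>/dev/null
            exit 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "PID $pid exited after SIGTERM"
)
```

Note that a process shown in state Z by `ps` is already dead and cannot be signalled away; it disappears once its parent reaps it. SIGKILL also prevents the command from releasing its lock cleanly, so check the lock again before restarting.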

External API unavailable

Check connectivity: curl -I <api-url>
Check the external service's status page
Wait for recovery or contact the provider's support
Restart the cronjob after recovery
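Waiting for recovery can be automated with a bounded polling loop around the same `curl -I` check. This is a sketch under assumptions: `wait_for_api`, the attempt count, and the delay are hypothetical defaults, and the endpoint is assumed to answer HEAD requests with a non-error status when healthy.

```shell
# Hypothetical helper: poll a URL until it responds or the attempts run out.
wait_for_api() (
    url="$1"
    attempts="${2:-10}"
    delay="${3:-30}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f turns HTTP errors (status >= 400) into a non-zero exit code
        if curl -fsS -I --max-time 10 -o /dev/null "$url" 2>/dev/null; then
            echo "API responded on attempt $i"
            exit 0
        fi
        echo "Attempt $i/$attempts failed, retrying in ${delay}s"
        sleep "$delay"
        i=$((i + 1))
    done
    exit 1
)

# Example usage (replace the placeholders first):
# wait_for_api "<api-url>" && php bin/console <command-name>
```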

Data error

Identify the problematic record in the logs
Fix the data if possible
Restart the cronjob
If the cursor has advanced, see Sales Import for the replay strategy
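To locate the problematic record, it usually helps to narrow the log down to its most recent error lines. The sketch below assumes the log marks failures with words such as "error", "critical", or "exception"; the helper name `recent_errors` and the keyword list are assumptions to adapt to your log format.

```shell
# Hypothetical helper: print the last N log lines that look like errors,
# prefixed with their line number in the file.
recent_errors() {
    log_file="$1"
    count="${2:-20}"
    grep -n -i -E 'error|critical|exception' "$log_file" | tail -n "$count"
}

# Example usage:
# recent_errors /var/log/logidav/<command-name>.log 50
```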

Step 5: Post-incident verification

Confirm that the cronjob has resumed its normal cycle
Verify produced data (completeness, consistency)
Monitor subsequent executions
Document the incident and the resolution
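One simple way to confirm that the cronjob has resumed is to check that its log file was written to recently. The sketch below assumes the command writes to its log on every run; `log_is_fresh` and the 60-minute default are hypothetical and should match the cronjob's schedule.

```shell
# Hypothetical helper: succeed if the file was modified within the last N minutes.
log_is_fresh() {
    log_file="$1"
    max_age_min="${2:-60}"
    # find prints the path only when its modification time is recent enough
    [ -n "$(find "$log_file" -mmin "-$max_age_min" 2>/dev/null)" ]
}

# Example usage:
# log_is_fresh /var/log/logidav/<command-name>.log 15 || echo "No recent run logged"
```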
