# Container Crash Debugging Guide ## Overview This guide helps diagnose and fix crashes in the `usda-vision-api` container. ## Quick Diagnostic Run the diagnostic script: ```bash ./scripts/diagnose_container_crashes.sh ``` ## Common Causes of Crashes ### 1. MQTT Connection Failure **Symptoms:** Container exits immediately after startup **Check:** ```bash docker logs usda-vision-api | grep -i mqtt ``` **Fix:** Ensure MQTT broker is accessible at `192.168.1.110:1883` ### 2. Camera SDK Initialization Failure **Symptoms:** Container crashes during camera initialization **Check:** ```bash docker logs usda-vision-api | grep -i "camera\|sdk" ``` **Fix:** Check camera hardware connection and SDK installation ### 3. Storage Path Issues **Symptoms:** Container fails to start storage manager **Check:** ```bash docker exec usda-vision-api ls -la /mnt/nfs_share ``` **Fix:** Ensure `/mnt/nfs_share` is mounted and writable ### 4. Out of Memory (OOM) **Symptoms:** Container killed by system, exit code 137 **Check:** ```bash dmesg | grep -i "killed process" docker stats usda-vision-api ``` **Fix:** Add memory limits or increase available memory ### 5. Missing Configuration File **Symptoms:** Container starts but exits quickly **Check:** ```bash docker exec usda-vision-api cat /app/config.compose.json ``` **Fix:** Ensure `config.compose.json` exists in the container ### 6. Python Exception Not Caught **Symptoms:** Container exits with Python traceback **Check:** ```bash docker logs usda-vision-api --tail 100 ``` **Fix:** Check logs for unhandled exceptions ## Recent Improvements ### Enhanced Error Handling - System now continues running even if some non-critical components fail - Added consecutive error tracking to prevent infinite crash loops - Better logging with full exception traces ### Improved Startup Command - Only installs packages if missing (faster startup) - Better error messages during startup - Graceful exit with delay for log flushing ### Health Checks - Added Docker healthcheck to monitor container health - Health endpoint: `http://localhost:8000/health` ## Debugging Steps 1. **Check container status:** ```bash docker ps -a | grep usda-vision-api ``` 2. **View recent logs:** ```bash docker logs usda-vision-api --tail 100 -f ``` 3. **Check exit code:** ```bash docker inspect usda-vision-api --format='{{.State.ExitCode}}' ``` - `0` = Normal exit - `1` = Application error - `137` = Killed (usually OOM) 4. **Check restart count:** ```bash docker inspect usda-vision-api --format='{{.RestartCount}}' ``` 5. **Run with debug logging:** Edit `docker-compose.yml` and change the command to: ```yaml python main.py --config config.compose.json --debug --verbose ``` 6. **Check resource usage:** ```bash docker stats usda-vision-api ``` ## Manual Testing To test the container manually: ```bash docker exec -it usda-vision-api bash python main.py --config config.compose.json --debug ``` ## Prevention The container now has: - ✅ Automatic restart policy (`restart: unless-stopped`) - ✅ Health checks - ✅ Better error handling - ✅ Graceful shutdown on signals - ✅ Partial operation if some components fail ## Getting Help If crashes persist: 1. Run the diagnostic script 2. Collect logs: `docker logs usda-vision-api > crash_logs.txt` 3. Check system resources: `docker stats usda-vision-api` 4. Review recent changes to configuration or code