Container Crash Debugging Guide
Overview
This guide helps diagnose and fix crashes in the usda-vision-api container.
Quick Diagnostic
Run the diagnostic script:
./scripts/diagnose_container_crashes.sh
Common Causes of Crashes
1. MQTT Connection Failure
Symptoms: Container exits immediately after startup
Check:
docker logs usda-vision-api 2>&1 | grep -i mqtt
Fix: Ensure MQTT broker is accessible at 192.168.1.110:1883
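A quick way to confirm that the broker address is reachable from wherever the application runs is to open a plain TCP connection to it. The sketch below assumes only the Python standard library; `broker_reachable` is a hypothetical helper, not part of the application, and it tests TCP reachability only, not MQTT authentication or protocol behaviour.

```python
import socket


def broker_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and applies the timeout to the connect call.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS failures and timeouts.
        return False
```

For the broker named in this guide, the call would be `broker_reachable("192.168.1.110", 1883)`. A `False` result points at networking or the broker itself rather than at the application.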
2. Camera SDK Initialization Failure
Symptoms: Container crashes during camera initialization
Check:
docker logs usda-vision-api 2>&1 | grep -i "camera\|sdk"
Fix: Check camera hardware connection and SDK installation
3. Storage Path Issues
Symptoms: Container fails to start storage manager
Check:
docker exec usda-vision-api ls -la /mnt/nfs_share
Fix: Ensure /mnt/nfs_share is mounted and writable
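Listing the directory shows that it exists but not that it is writable. A minimal sketch of a write test follows, assuming it is run inside the container; `is_writable_dir` is a hypothetical helper name. It does not check whether the path is actually a mount point (`os.path.ismount` could be added for that).

```python
import os
import tempfile


def is_writable_dir(path: str) -> bool:
    """Return True if path is a directory in which a file can be created."""
    if not os.path.isdir(path):
        return False
    try:
        # Creating a real file catches read-only mounts that permission bits can hide.
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False
```

For the storage path in this guide, the call would be `is_writable_dir("/mnt/nfs_share")`.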
4. Out of Memory (OOM)
Symptoms: Container killed by system, exit code 137
Check:
dmesg | grep -i "killed process"
docker stats usda-vision-api
Fix: Set an explicit memory limit sized for the workload, or increase the memory available on the host
5. Missing Configuration File
Symptoms: Container starts but exits quickly
Check:
docker exec usda-vision-api cat /app/config.compose.json
Fix: Ensure config.compose.json exists in the container
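A file that exists can still be unreadable as JSON. The sketch below separates the two cases; it assumes the configuration is plain JSON, as the file extension suggests, and `check_config` is a hypothetical helper that validates syntax only, not the keys the application expects.

```python
import json
from typing import Tuple


def check_config(path: str) -> Tuple[bool, str]:
    """Return (ok, message) describing whether path holds parseable JSON."""
    try:
        with open(path, encoding="utf-8") as fh:
            json.load(fh)
    except FileNotFoundError:
        return False, f"missing: {path}"
    except json.JSONDecodeError as exc:
        # lineno and msg locate the syntax error inside the file.
        return False, f"invalid JSON at line {exc.lineno}: {exc.msg}"
    except OSError as exc:
        return False, f"unreadable: {exc}"
    return True, "ok"
```

Inside the container, the call would be `check_config("/app/config.compose.json")`.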
6. Python Exception Not Caught
Symptoms: Container exits with Python traceback
Check:
docker logs usda-vision-api --tail 100
Fix: Find the last traceback in the logs and fix or handle the exception it reports
Recent Improvements
Enhanced Error Handling
- System now continues running even if some non-critical components fail
- Added consecutive error tracking to prevent infinite crash loops
- Better logging with full exception traces
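The consecutive error tracking described above can be pictured roughly as follows. This is an illustrative sketch only, not the code in the USDAVisionSystem class: `run_loop`, `step`, and `max_consecutive_errors` are hypothetical names, and the threshold of 5 is an assumed value.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("main_loop")


def run_loop(step: Callable[[], bool],
             max_consecutive_errors: int = 5,
             delay: float = 1.0) -> int:
    """Call step() until it returns False.

    Returns 0 on a clean stop and 1 once max_consecutive_errors
    failures have occurred in a row.
    """
    consecutive_errors = 0
    while True:
        try:
            if not step():
                return 0
            consecutive_errors = 0  # any successful iteration resets the counter
        except Exception:
            consecutive_errors += 1
            # logger.exception records the full traceback alongside the message.
            logger.exception("main loop error %d/%d",
                             consecutive_errors, max_consecutive_errors)
            if consecutive_errors >= max_consecutive_errors:
                logger.critical("too many consecutive errors, exiting")
                return 1
            time.sleep(delay)  # short pause so a persistent fault does not spin the CPU
```

Resetting the counter on success means isolated failures are tolerated, while a fault that repeats on every iteration ends the process with a non-zero code instead of looping forever.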
Improved Startup Command
- Only installs packages if missing (faster startup)
- Better error messages during startup
- Graceful exit with delay for log flushing
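The "install only if missing" behaviour lives in the container's startup command, which is not shown in this guide. As a rough illustration of the same idea in Python, the sketch below reports which modules cannot be imported; `missing_modules` is a hypothetical helper, and it takes top-level import names, which can differ from the package names used at install time.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the import path."""
    # find_spec returns None for a top-level module that is not installed,
    # without actually importing the module.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty result means the installation step can be skipped.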
Health Checks
- Added Docker healthcheck to monitor container health
- Health endpoint: http://localhost:8000/health
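The health endpoint can also be probed by hand. The sketch below treats any 2xx response as healthy, which is an assumption; this guide does not describe the endpoint's response format. `is_healthy` is a hypothetical helper.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError) for 4xx/5xx responses and
        # URLError or OSError for refused connections and timeouts.
        return False
```

For the endpoint listed above, the call would be `is_healthy("http://localhost:8000/health")`.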
Debugging Steps
- Check container status:
docker ps -a | grep usda-vision-api
- View recent logs:
docker logs usda-vision-api --tail 100 -f
- Check exit code:
docker inspect usda-vision-api --format='{{.State.ExitCode}}'
  - 0 = Normal exit
  - 1 = Application error
  - 137 = Killed (usually OOM)
- Check restart count:
docker inspect usda-vision-api --format='{{.RestartCount}}'
- Run with debug logging: Edit docker-compose.yml and change the command to:
python main.py --config config.compose.json --debug --verbose
- Check resource usage:
docker stats usda-vision-api
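Exit codes above 128 conventionally mean the process was terminated by a signal, with the signal number equal to the code minus 128; 137 is therefore 128 + 9, that is SIGKILL, which is what the kernel OOM killer sends. A small sketch of that decoding follows; `describe_exit_code` is a hypothetical helper, and treating every other non-zero code as an application error is a simplification.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short description."""
    if code == 0:
        return "normal exit"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return "application error"
```

The input is the value printed by the `docker inspect` exit-code command above.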
Manual Testing
To test the container manually:
docker exec -it usda-vision-api bash
python main.py --config config.compose.json --debug
Prevention
The container now has:
- ✅ Automatic restart policy (restart: unless-stopped)
- ✅ Health checks
- ✅ Better error handling
- ✅ Graceful shutdown on signals
- ✅ Partial operation if some components fail
Getting Help
If crashes persist:
- Run the diagnostic script
- Collect logs (including stderr):
docker logs usda-vision-api > crash_logs.txt 2>&1
- Check system resources:
docker stats usda-vision-api
- Review recent changes to configuration or code