Update Docker configuration, enhance error handling, and improve logging
- Added health check to the camera management API service in docker-compose.yml for better container reliability.
- Updated installation scripts in Dockerfile to check for existing dependencies before installation, improving efficiency.
- Enhanced error handling in the USDAVisionSystem class to allow partial operation if some components fail to start, preventing immediate shutdown.
- Improved logging throughout the application, including more detailed error messages and critical error handling in the main loop.
- Refactored WebSocketManager and CameraMonitor classes to use debug logging for connection events, reducing log noise.
141
docs/CONTAINER_CRASH_DEBUGGING.md
Normal file
@@ -0,0 +1,141 @@
# Container Crash Debugging Guide

## Overview

This guide helps diagnose and fix crashes in the `usda-vision-api` container.

## Quick Diagnostic

Run the diagnostic script:

```bash
./scripts/diagnose_container_crashes.sh
```

## Common Causes of Crashes

### 1. MQTT Connection Failure

**Symptoms:** Container exits immediately after startup

**Check:**

```bash
# 2>&1 also passes anything the container wrote to stderr through grep
docker logs usda-vision-api 2>&1 | grep -i mqtt
```

**Fix:** Ensure the MQTT broker is reachable from the container at `192.168.1.110:1883`

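A quick way to test the fix above is a plain TCP probe against the broker address. The sketch below is only an illustration: `check_tcp` is a hypothetical helper name, it assumes `bash` and the coreutils `timeout` command are available where you run it, and it confirms only that the port accepts connections, not that the broker speaks MQTT correctly.

```bash
# Hypothetical helper (not part of the project): succeeds if a TCP
# connection to host:port can be opened within 3 seconds.
# Assumes bash (for /dev/tcp) and coreutils `timeout` are installed.
check_tcp() {
  local host="$1"
  local port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example call, using the broker address from this guide:
#   check_tcp 192.168.1.110 1883 && echo "broker port reachable" || echo "broker port NOT reachable"
```

Running the probe from the Docker host and again from a shell inside the container can help tell a broker outage apart from a container networking problem.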
### 2. Camera SDK Initialization Failure

**Symptoms:** Container crashes during camera initialization

**Check:**

```bash
# 2>&1 also passes anything the container wrote to stderr through grep
docker logs usda-vision-api 2>&1 | grep -i "camera\|sdk"
```

**Fix:** Check the camera hardware connection and the SDK installation

### 3. Storage Path Issues

**Symptoms:** Container fails to start storage manager

**Check:**

```bash
docker exec usda-vision-api ls -la /mnt/nfs_share
```

**Fix:** Ensure `/mnt/nfs_share` is mounted and writable

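Listing the directory shows that it exists but not that the process can write to it. The following sketch probes writability by creating and removing a temporary file; `check_writable_dir` is a hypothetical helper name, and it does not verify that the path is an actual NFS mount.

```bash
# Hypothetical helper (not part of the project): verifies that a
# directory exists and accepts writes by creating a short-lived probe file.
check_writable_dir() {
  local dir="$1"
  local probe="$dir/.write_probe_$$"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 1
  fi
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"
    echo "ok: $dir is writable"
  else
    echo "not writable: $dir" >&2
    return 1
  fi
}

# Example call, from a shell inside the container:
#   check_writable_dir /mnt/nfs_share
```

The probe runs as whichever user invokes it, so run it as the same user the application uses to get a meaningful answer.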
### 4. Out of Memory (OOM)

**Symptoms:** Container killed by the system, exit code 137

**Check:**

```bash
dmesg | grep -i "killed process"
docker stats usda-vision-api
```

**Fix:** Increase the memory available on the host, or raise the container's memory limit if one is set

### 5. Missing Configuration File

**Symptoms:** Container starts but exits quickly

**Check:**

```bash
docker exec usda-vision-api cat /app/config.compose.json
```

**Fix:** Ensure `config.compose.json` exists in the container

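A file that exists but contains malformed JSON can cause the same quick exit. As a rough sketch, the helper below checks that the file is present, non-empty, and parseable; `check_json_config` is a hypothetical name, and the use of `python3 -m json.tool` assumes a `python3` executable is on the `PATH` where the check runs.

```bash
# Hypothetical helper (not part of the project): checks that a config
# file exists, is non-empty, and parses as JSON.
# Assumes `python3` is available; json.tool is part of the standard library.
check_json_config() {
  local path="$1"
  if [ ! -s "$path" ]; then
    echo "missing or empty: $path" >&2
    return 1
  fi
  if ! python3 -m json.tool "$path" >/dev/null 2>&1; then
    echo "invalid JSON: $path" >&2
    return 1
  fi
  echo "ok: $path"
}

# Example call, from a shell inside the container:
#   check_json_config /app/config.compose.json
```

This validates syntax only; it says nothing about whether the keys and values are the ones the application expects.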
### 6. Python Exception Not Caught

**Symptoms:** Container exits with Python traceback

**Check:**

```bash
docker logs usda-vision-api --tail 100
```

**Fix:** Find the traceback in the logs and fix or handle the exception it reports

## Recent Improvements

### Enhanced Error Handling

- System now continues running even if some non-critical components fail
- Added consecutive error tracking to prevent infinite crash loops
- Better logging with full exception traces

### Improved Startup Command

- Only installs packages if missing (faster startup)
- Better error messages during startup
- Graceful exit with delay for log flushing

### Health Checks

- Added Docker healthcheck to monitor container health
- Health endpoint: `http://localhost:8000/health`

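The health endpoint can also be polled by hand, for example right after a restart while the service is still coming up. The sketch below is a generic retry loop, not the project's healthcheck definition: `retry` is a hypothetical helper, and the example call assumes `curl` is installed and that port 8000 is reachable from where you run it.

```bash
# Hypothetical helper (not part of the project): runs a command up to
# N times, sleeping DELAY seconds between failed attempts.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
  local attempts="$1"
  local delay="$2"
  local i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example call: poll the health endpoint up to 10 times, 3 seconds apart.
# curl -f makes HTTP error responses count as failures.
#   retry 10 3 curl -fsS http://localhost:8000/health
```

If the loop gives up, the container logs from the same time window are usually the next place to look.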
## Debugging Steps

1. **Check container status:**

   ```bash
   docker ps -a | grep usda-vision-api
   ```

2. **View recent logs:**

   ```bash
   docker logs usda-vision-api --tail 100 -f
   ```

3. **Check exit code:**

   ```bash
   docker inspect usda-vision-api --format='{{.State.ExitCode}}'
   ```

   - `0` = Normal exit
   - `1` = Application error
   - `137` = Killed by SIGKILL (128 + 9), usually OOM

4. **Check restart count:**

   ```bash
   docker inspect usda-vision-api --format='{{.RestartCount}}'
   ```

5. **Run with debug logging:**

   Edit `docker-compose.yml` and change the command to:

   ```yaml
   python main.py --config config.compose.json --debug --verbose
   ```

6. **Check resource usage:**

   ```bash
   docker stats usda-vision-api
   ```

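The exit codes in step 3 follow the usual shell convention that values above 128 mean the process was terminated by signal `code - 128`. As an illustration of that arithmetic, here is a small sketch; `describe_exit_code` is a hypothetical helper, and its wording of the categories is a simplification.

```bash
# Hypothetical helper (not part of the project): turns a container exit
# code into a short description, using the 128 + signal convention.
describe_exit_code() {
  local code="$1"
  if [ "$code" -eq 0 ]; then
    echo "normal exit"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "application error (exit code $code)"
  fi
}

# Example call, feeding in the exit code reported by Docker:
#   describe_exit_code "$(docker inspect usda-vision-api --format='{{.State.ExitCode}}')"
```

For `137` this reports signal 9 (SIGKILL), which matches the OOM case described earlier; `143` would indicate signal 15 (SIGTERM), as sent by a normal `docker stop`.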
## Manual Testing

To test manually, open a shell in the running container and start the application by hand:

```bash
docker exec -it usda-vision-api bash
# then, inside the container shell:
python main.py --config config.compose.json --debug
```

## Prevention

The container now has:

- ✅ Automatic restart policy (`restart: unless-stopped`)
- ✅ Health checks
- ✅ Better error handling
- ✅ Graceful shutdown on signals
- ✅ Partial operation if some components fail

## Getting Help

If crashes persist:

1. Run the diagnostic script
2. Collect logs, including stderr: `docker logs usda-vision-api > crash_logs.txt 2>&1`
3. Check system resources: `docker stats usda-vision-api`
4. Review recent changes to configuration or code
