- Added health check to the camera management API service in docker-compose.yml for better container reliability.
- Updated installation scripts in Dockerfile to check for existing dependencies before installation, improving efficiency.
- Enhanced error handling in the USDAVisionSystem class to allow partial operation if some components fail to start, preventing immediate shutdown.
- Improved logging throughout the application, including more detailed error messages and critical error handling in the main loop.
- Refactored WebSocketManager and CameraMonitor classes to use debug logging for connection events, reducing log noise.

# Container Crash Debugging Guide

## Overview

This guide helps diagnose and fix crashes in the `usda-vision-api` container.

## Quick Diagnostic

Run the diagnostic script:
```bash
./scripts/diagnose_container_crashes.sh
```

## Common Causes of Crashes

### 1. MQTT Connection Failure
**Symptoms:** Container exits immediately after startup
**Check:**
```bash
# docker logs forwards the container's stderr to stderr, so merge it before grep
docker logs usda-vision-api 2>&1 | grep -i mqtt
```
**Fix:** Ensure the MQTT broker is reachable from the container at `192.168.1.110:1883`

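
To separate a network problem from an application problem, a plain TCP check against the broker address can help. The following is a minimal sketch, not part of the project: `broker_reachable` is a hypothetical helper, and it only confirms that the port accepts connections, not that the MQTT handshake succeeds.

```python
import socket


def broker_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call, using the broker address from this guide:
#   broker_reachable("192.168.1.110", 1883)
```

Running this from inside the container (for example through `docker exec`) tests the same network path the application uses.
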
### 2. Camera SDK Initialization Failure
**Symptoms:** Container crashes during camera initialization
**Check:**
```bash
# docker logs forwards the container's stderr to stderr, so merge it before grep
docker logs usda-vision-api 2>&1 | grep -i "camera\|sdk"
```
**Fix:** Check the camera hardware connection and the SDK installation

### 3. Storage Path Issues
**Symptoms:** Container fails to start the storage manager
**Check:**
```bash
docker exec usda-vision-api ls -la /mnt/nfs_share
```
**Fix:** Ensure `/mnt/nfs_share` is mounted and writable

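
A directory listing shows that the path exists, but not that it is writable. The sketch below, which assumes nothing about the project's own storage manager, performs a real write test; `check_storage` is a hypothetical helper. The mount-point test is a heuristic, since `os.path.ismount` can miss bind mounts that share a filesystem with the parent directory.

```python
import os
import tempfile


def check_storage(path: str) -> list[str]:
    """Return a list of problems found with the storage path (empty if none)."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    if not os.path.ismount(path):
        problems.append(f"{path} is not a mount point (the share may be unmounted)")
    try:
        # Create and remove a real file: permission bits alone can mislead on NFS.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"{path} is not writable: {exc}")
    return problems


# Example call, using the path from this guide:
#   check_storage("/mnt/nfs_share")
```
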
### 4. Out of Memory (OOM)
**Symptoms:** Container is killed by the system and exits with code 137
**Check:**
```bash
dmesg | grep -i "killed process"
docker stats usda-vision-api
```
**Fix:** Set an appropriate memory limit for the container or increase the available memory

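
Exit code 137 follows the common convention of 128 plus the signal number: signal 9 is `SIGKILL`, which is what the OOM killer sends. A small sketch of that arithmetic, with `describe_exit_code` as a hypothetical helper name:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal convention."""
    if code == 0:
        return "normal exit"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with error status {code}"
```

Note that `SIGKILL` is not proof of an OOM kill on its own, since `docker kill` sends the same signal. The `dmesg` output above is what confirms it.
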
### 5. Missing Configuration File
**Symptoms:** Container starts but exits quickly
**Check:**
```bash
docker exec usda-vision-api cat /app/config.compose.json
```
**Fix:** Ensure `config.compose.json` exists in the container

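
A file that exists but contains invalid JSON produces the same quick exit. The following sketch shows one way to fail early with a clear message; `load_config` is a hypothetical function and does not reflect how the application actually reads its configuration.

```python
import json
from pathlib import Path


def load_config(path: str) -> dict:
    """Load a JSON config file, raising a descriptive error if it is unusable."""
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"Config file not found: {config_path}")
    try:
        data = json.loads(config_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"Config file {config_path} is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError(f"Config file {config_path} must contain a JSON object")
    return data


# Example call, using the path from this guide:
#   load_config("/app/config.compose.json")
```
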
### 6. Python Exception Not Caught
**Symptoms:** Container exits with a Python traceback
**Check:**
```bash
docker logs usda-vision-api --tail 100
```
**Fix:** Find the last traceback in the logs and fix or handle the exception it reports

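
In long logs the final traceback is usually the one that ended the process. As a rough sketch, the hypothetical helper below pulls that block out of saved log text. It assumes standard Python traceback formatting and stops at the first unindented line, so multi-line exception messages are cut short.

```python
def last_traceback(log_text: str) -> str:
    """Return the final Python traceback found in the log text, or '' if none."""
    marker = "Traceback (most recent call last):"
    start = log_text.rfind(marker)
    if start == -1:
        return ""
    lines = log_text[start:].splitlines()
    block = [lines[0]]
    for line in lines[1:]:
        block.append(line)
        # Frame lines are indented; the first unindented line names the exception.
        if line and not line[0].isspace():
            break
    return "\n".join(block)
```
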
## Recent Improvements

### Enhanced Error Handling
- System now continues running even if some non-critical components fail
- Added consecutive error tracking to prevent infinite crash loops
- Better logging with full exception traces

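
Consecutive error tracking can be sketched as a counter that resets on every successful iteration and stops the loop once a threshold is reached. This is an illustration only: `run_main_loop`, the threshold of 5, and the retry delay are assumptions, not the project's actual values.

```python
import logging
import time

logger = logging.getLogger(__name__)

MAX_CONSECUTIVE_ERRORS = 5  # hypothetical threshold


def run_main_loop(step, max_errors=MAX_CONSECUTIVE_ERRORS, delay=1.0, sleep=time.sleep):
    """Call step() repeatedly until it returns False or errors pile up.

    Returns True on a clean stop, False if the error limit was reached.
    """
    consecutive_errors = 0
    while True:
        try:
            if step() is False:
                return True
            consecutive_errors = 0  # any success resets the counter
        except Exception:
            consecutive_errors += 1
            # logger.exception records the full traceback with the message.
            logger.exception("Main loop error %d/%d", consecutive_errors, max_errors)
            if consecutive_errors >= max_errors:
                logger.critical("Too many consecutive errors, shutting down")
                return False
            sleep(delay)
```

Counting only consecutive failures lets the loop ride out an occasional transient error while still exiting when the same failure repeats on every pass.
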
### Improved Startup Command
- Only installs packages if missing (faster startup)
- Better error messages during startup
- Graceful exit with delay for log flushing

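
The "install only if missing" idea can be illustrated in Python, although the real startup command may be written differently (for example as a shell snippet). `missing_modules` is a hypothetical helper; it expects top-level import names, which can differ from the package names used by `pip`.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]
```

An installer step would then run only when the returned list is non-empty, which avoids calling the package manager on every container start.
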
### Health Checks
- Added Docker healthcheck to monitor container health
- Health endpoint: `http://localhost:8000/health`

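
The health endpoint can also be probed by hand. The sketch below treats HTTP 200 as healthy, which is an assumption: the healthcheck command configured in `docker-compose.yml` is not shown in this guide and may use different criteria. `is_healthy` is a hypothetical helper.

```python
import http.client
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx), and socket timeouts derive from OSError.
        return False


# Example call, using the endpoint from this guide:
#   is_healthy("http://localhost:8000/health")
```
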
## Debugging Steps

1. **Check container status:**
   ```bash
   docker ps -a | grep usda-vision-api
   ```

2. **View recent logs:**
   ```bash
   docker logs usda-vision-api --tail 100 -f
   ```

3. **Check exit code:**
   ```bash
   docker inspect usda-vision-api --format='{{.State.ExitCode}}'
   ```
   - `0` = Normal exit
   - `1` = Application error
   - `137` = Killed by `SIGKILL` (128 + 9), usually OOM

4. **Check restart count:**
   ```bash
   docker inspect usda-vision-api --format='{{.RestartCount}}'
   ```

5. **Run with debug logging:**
   Edit `docker-compose.yml` and change the command to:
   ```yaml
   python main.py --config config.compose.json --debug --verbose
   ```

6. **Check resource usage:**
   ```bash
   docker stats usda-vision-api
   ```

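
The exit code and restart count from the steps above can be collected in one pass by parsing the JSON that `docker inspect usda-vision-api` prints without a `--format` argument. A minimal sketch, assuming that output has been captured as text; `summarize_inspect` is a hypothetical helper, and `State.OOMKilled` is an additional Docker field that helps confirm an OOM kill.

```python
import json


def summarize_inspect(inspect_output: str) -> dict:
    """Extract crash-related fields from `docker inspect <container>` JSON output."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    info = containers[0]
    state = info["State"]
    return {
        "status": state.get("Status"),
        "exit_code": state.get("ExitCode"),
        "oom_killed": state.get("OOMKilled"),
        "restart_count": info.get("RestartCount"),
    }


# One way to obtain the input (requires the docker CLI on the host):
#   import subprocess
#   text = subprocess.run(["docker", "inspect", "usda-vision-api"],
#                         capture_output=True, text=True, check=True).stdout
```
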
## Manual Testing

To run the application manually, open a shell in the running container:
```bash
docker exec -it usda-vision-api bash
# Then, inside the container shell:
python main.py --config config.compose.json --debug
```

## Prevention

The container now has:
- ✅ Automatic restart policy (`restart: unless-stopped`)
- ✅ Health checks
- ✅ Better error handling
- ✅ Graceful shutdown on signals
- ✅ Partial operation if some components fail

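
Partial operation can be pictured as a startup routine that logs and skips failed non-critical components instead of aborting. The sketch below illustrates the pattern and is not the `USDAVisionSystem` implementation; `start_components`, the component names in the usage comment, and the choice of which components count as critical are all assumptions.

```python
import logging

logger = logging.getLogger(__name__)


def start_components(components, critical=()):
    """Start each component, tolerating failures of non-critical ones.

    `components` maps a name to a zero-argument start function.
    Returns the names that failed to start. Raises RuntimeError if a
    component listed in `critical` fails.
    """
    failed = []
    for name, start in components.items():
        try:
            start()
            logger.info("Started %s", name)
        except Exception:
            logger.exception("Failed to start %s", name)
            if name in critical:
                raise RuntimeError(f"Critical component failed: {name}")
            failed.append(name)
    return failed


# Example call with hypothetical component names:
#   start_components({"storage": start_storage, "camera": start_camera},
#                    critical=("storage",))
```
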
## Getting Help

If crashes persist:
1. Run the diagnostic script: `./scripts/diagnose_container_crashes.sh`
2. Collect logs, including stderr: `docker logs usda-vision-api > crash_logs.txt 2>&1`
3. Check system resources: `docker stats usda-vision-api`
4. Review recent changes to configuration or code