usda-vision/docs/CONTAINER_CRASH_DEBUGGING.md
salirezav 933d4417a5 Update Docker configuration, enhance error handling, and improve logging
- Added health check to the camera management API service in docker-compose.yml for better container reliability.
- Updated installation scripts in Dockerfile to check for existing dependencies before installation, improving efficiency.
- Enhanced error handling in the USDAVisionSystem class to allow partial operation if some components fail to start, preventing immediate shutdown.
- Improved logging throughout the application, including more detailed error messages and critical error handling in the main loop.
- Refactored WebSocketManager and CameraMonitor classes to use debug logging for connection events, reducing log noise.
2025-12-03 17:23:31 -05:00


# Container Crash Debugging Guide
## Overview
This guide helps diagnose and fix crashes in the `usda-vision-api` container.
## Quick Diagnostic
Run the diagnostic script:
```bash
./scripts/diagnose_container_crashes.sh
```
## Common Causes of Crashes
### 1. MQTT Connection Failure
**Symptoms:** Container exits immediately after startup
**Check:**
```bash
docker logs usda-vision-api | grep -i mqtt
```
**Fix:** Ensure the MQTT broker is running and reachable from the container at `192.168.1.110:1883`
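
If the logs are inconclusive, a plain TCP probe can help separate a network problem from an application problem. The following is a minimal sketch, not part of the project: `broker_reachable` is a hypothetical helper, and it only confirms that the port accepts connections, not that the MQTT handshake or authentication succeeds.

```python
import socket


def broker_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it checks TCP reachability only and does not
    speak the MQTT protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `broker_reachable("192.168.1.110", 1883)` from inside the container (for example via `docker exec`) tests the same network path the application uses.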
### 2. Camera SDK Initialization Failure
**Symptoms:** Container crashes during camera initialization
**Check:**
```bash
docker logs usda-vision-api | grep -i "camera\|sdk"
```
**Fix:** Check camera hardware connection and SDK installation
### 3. Storage Path Issues
**Symptoms:** Container fails to start storage manager
**Check:**
```bash
docker exec usda-vision-api ls -la /mnt/nfs_share
```
**Fix:** Ensure `/mnt/nfs_share` is mounted and writable
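
A directory listing shows that the mount exists but not that the process can write to it. As a rough sketch (the helper name `storage_writable` is an assumption, not project code), creating and deleting a temporary file proves write access directly, which tends to be more telling on network mounts than reading permission bits.

```python
import os
import tempfile


def storage_writable(path: str) -> bool:
    """Return True if `path` is a directory in which a file can be created."""
    if not os.path.isdir(path):
        return False
    try:
        # Create a temporary file and let the context manager delete it.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False
```

For this guide the call would be `storage_writable("/mnt/nfs_share")`, run inside the container.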
### 4. Out of Memory (OOM)
**Symptoms:** Container killed by system, exit code 137
**Check:**
```bash
dmesg | grep -i "killed process"
docker stats usda-vision-api
```
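**Fix:** Increase the memory available to the container: raise any configured memory limit or free up memory on the host. A memory limit only caps usage; it does not prevent OOM kills by itself.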
### 5. Missing Configuration File
**Symptoms:** Container starts but exits quickly
**Check:**
```bash
docker exec usda-vision-api cat /app/config.compose.json
```
**Fix:** Ensure `config.compose.json` exists in the container
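
A file that exists but contains malformed JSON can cause the same quick exit. The sketch below assumes only that the configuration is a JSON file, as the name suggests; `check_config` is a hypothetical helper and does not validate any application-specific keys.

```python
import json
from typing import Tuple


def check_config(path: str) -> Tuple[bool, str]:
    """Return (ok, message) describing whether `path` holds parseable JSON."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            json.load(fh)
    except FileNotFoundError:
        return False, f"missing: {path}"
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON at line {exc.lineno}: {exc.msg}"
    except OSError as exc:
        return False, f"unreadable: {exc}"
    return True, "ok"
```

Inside the container this would be called as `check_config("/app/config.compose.json")`.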
### 6. Python Exception Not Caught
**Symptoms:** Container exits with Python traceback
**Check:**
```bash
docker logs usda-vision-api --tail 100
```
**Fix:** Locate the traceback in the logs, then fix or handle the exception it reports
## Recent Improvements
### Enhanced Error Handling
- System now continues running even if some non-critical components fail
- Added consecutive error tracking to prevent infinite crash loops
- Better logging with full exception traces
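
To illustrate what consecutive error tracking can look like, here is a minimal sketch. It is not the code in `USDAVisionSystem`; the function name, the threshold of 5, and the retry delay are all assumptions made for the example.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("main_loop_sketch")


def run_loop(step: Callable[[], bool],
             max_consecutive_errors: int = 5,
             retry_delay: float = 1.0) -> int:
    """Call step() until it returns False or fails too often in a row.

    Returns 0 on a clean stop and 1 when the error threshold is reached.
    """
    consecutive_errors = 0
    while True:
        try:
            if not step():
                return 0
            consecutive_errors = 0  # any success resets the streak
        except Exception:
            consecutive_errors += 1
            # logger.exception records the full traceback.
            logger.exception("main loop error %d/%d",
                             consecutive_errors, max_consecutive_errors)
            if consecutive_errors >= max_consecutive_errors:
                logger.critical("too many consecutive errors; giving up")
                return 1
            time.sleep(retry_delay)
```

Resetting the counter on success means isolated failures are tolerated, while a persistent fault ends the loop with a non-zero status instead of spinning forever.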
### Improved Startup Command
- Only installs packages if missing (faster startup)
- Better error messages during startup
- Graceful exit with delay for log flushing
### Health Checks
- Added Docker healthcheck to monitor container health
- Health endpoint: `http://localhost:8000/health`
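
The health endpoint can also be probed by hand. The sketch below assumes the endpoint answers with a 2xx status when the service is healthy; the response body format is not documented here, so it is ignored. `is_healthy` is a hypothetical helper name.

```python
import http.client
import urllib.error
import urllib.request


def is_healthy(url: str = "http://localhost:8000/health",
               timeout: float = 3.0) -> bool:
    """Return True if `url` answers with a 2xx status within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # 4xx/5xx responses, refused connections, and timeouts all end up here.
        return False
```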
## Debugging Steps
1. **Check container status:**
```bash
docker ps -a | grep usda-vision-api
```
2. **View recent logs:**
```bash
docker logs usda-vision-api --tail 100 -f
```
3. **Check exit code:**
```bash
docker inspect usda-vision-api --format='{{.State.ExitCode}}'
```
- `0` = Normal exit
- `1` = Application error
- `137` = Killed by `SIGKILL` (128 + 9), usually the OOM killer
4. **Check restart count:**
```bash
docker inspect usda-vision-api --format='{{.RestartCount}}'
```
5. **Run with debug logging:**
Edit `docker-compose.yml` and change the service's `command` to:
```yaml
command: python main.py --config config.compose.json --debug --verbose
```
6. **Check resource usage:**
```bash
docker stats usda-vision-api
```
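
The exit codes in step 3 follow the usual convention that a value above 128 means the process was terminated by signal number `code - 128`. A small sketch of that decoding follows; `describe_exit_code` is a hypothetical helper, and signal names are resolved on the machine running it.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a short description."""
    if code == 0:
        return "normal exit"
    if code > 128:
        try:
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            pass  # not a known signal number; fall through
    return f"application error (exit code {code})"
```

For example, `137` decodes to `SIGKILL` (signal 9) and `143` to `SIGTERM` (signal 15), the signal `docker stop` sends first.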
## Manual Testing
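To test the container manually (the container must be running for `docker exec` to work):
```bash
# Open an interactive shell inside the running container
docker exec -it usda-vision-api bash
# Then, at the container's shell prompt, start the application in debug mode
python main.py --config config.compose.json --debug
```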
## Prevention
The container now has:
- ✅ Automatic restart policy (`restart: unless-stopped`)
- ✅ Health checks
- ✅ Better error handling
- ✅ Graceful shutdown on signals
- ✅ Partial operation if some components fail
## Getting Help
If crashes persist:
1. Run the diagnostic script
2. Collect logs: `docker logs usda-vision-api > crash_logs.txt`
3. Check system resources: `docker stats usda-vision-api`
4. Review recent changes to configuration or code