Files
usda-vision/docs/database_schema.md
salirezav f6a37ca1ba Remove deprecated files and scripts to streamline the codebase
- Deleted unused API test files, RTSP diagnostic scripts, and development utility scripts to reduce clutter.
- Removed outdated database schema and modularization proposal documents to maintain focus on current architecture.
- Cleaned up configuration files and logging scripts that are no longer in use, enhancing project maintainability.
2025-11-02 10:07:59 -05:00

340 lines
15 KiB
Markdown

# USDA Vision Database Schema
This document provides a comprehensive overview of the database schema for the USDA Vision pecan processing experiment management system.
## Database Overview
- **Database Type**: PostgreSQL (via Supabase)
- **Version**: PostgreSQL 17
- **Authentication**: Supabase Auth with Row Level Security (RLS)
- **Extensions**: uuid-ossp for UUID generation
## Core Tables
### 1. Authentication & User Management
#### `auth.users` (Supabase managed)
- Standard Supabase authentication table
- Contains user authentication data
#### `public.roles`
```sql
CREATE TABLE public.roles (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
name TEXT UNIQUE NOT NULL CHECK (name IN ('admin', 'conductor', 'analyst', 'data recorder')),
description TEXT NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```
**Roles:**
- `admin`: Full system access with user management capabilities
- `conductor`: Operational access for conducting experiments and managing data
- `analyst`: Read-only access for data analysis and reporting
- `data recorder`: Data entry and recording capabilities
#### `public.user_profiles`
```sql
CREATE TABLE public.user_profiles (
id UUID PRIMARY KEY REFERENCES auth.users(id) ON DELETE CASCADE,
email TEXT NOT NULL,
role_id UUID REFERENCES public.roles(id), -- Nullable (legacy)
status TEXT DEFAULT 'active' CHECK (status IN ('active', 'disabled')),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```
#### `public.user_roles` (Junction Table)
```sql
CREATE TABLE public.user_roles (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
user_id UUID NOT NULL REFERENCES public.user_profiles(id) ON DELETE CASCADE,
role_id UUID NOT NULL REFERENCES public.roles(id) ON DELETE CASCADE,
assigned_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
assigned_by UUID REFERENCES public.user_profiles(id),
UNIQUE(user_id, role_id)
);
```
### 2. Experiment Management
#### `public.experiment_phases`
```sql
CREATE TABLE public.experiment_phases (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
name TEXT NOT NULL UNIQUE,
description TEXT, -- Optional description for the phase
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
created_by UUID NOT NULL REFERENCES public.user_profiles(id)
);
```
**Purpose**: Groups experiments into logical phases for better organization and navigation.
#### `public.experiments`
```sql
CREATE TABLE public.experiments (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
experiment_number INTEGER UNIQUE NOT NULL,
reps_required INTEGER NOT NULL CHECK (reps_required > 0),
soaking_duration_hr FLOAT NOT NULL CHECK (soaking_duration_hr >= 0),
air_drying_time_min INTEGER NOT NULL CHECK (air_drying_time_min >= 0),
plate_contact_frequency_hz FLOAT NOT NULL CHECK (plate_contact_frequency_hz > 0),
throughput_rate_pecans_sec FLOAT NOT NULL CHECK (throughput_rate_pecans_sec > 0),
crush_amount_in FLOAT NOT NULL CHECK (crush_amount_in >= 0),
entry_exit_height_diff_in FLOAT NOT NULL,
results_status TEXT NOT NULL DEFAULT 'valid' CHECK (results_status IN ('valid', 'invalid')),
completion_status BOOLEAN NOT NULL DEFAULT false,
phase_id UUID REFERENCES public.experiment_phases(id) ON DELETE SET NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
created_by UUID NOT NULL REFERENCES public.user_profiles(id)
);
```
**Purpose**: Experiment blueprints/templates that define the parameters for pecan processing experiments.
#### `public.experiment_repetitions`
```sql
CREATE TABLE public.experiment_repetitions (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
experiment_id UUID NOT NULL REFERENCES public.experiments(id) ON DELETE CASCADE,
repetition_number INTEGER NOT NULL CHECK (repetition_number > 0),
scheduled_date TIMESTAMP WITH TIME ZONE,
schedule_status TEXT NOT NULL DEFAULT 'pending schedule'
CHECK (schedule_status IN ('pending schedule', 'scheduled', 'canceled', 'aborted')),
completion_status BOOLEAN NOT NULL DEFAULT false,
is_locked BOOLEAN NOT NULL DEFAULT false,
locked_at TIMESTAMP WITH TIME ZONE,
locked_by UUID REFERENCES public.user_profiles(id),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
created_by UUID NOT NULL REFERENCES public.user_profiles(id),
CONSTRAINT unique_repetition_per_experiment UNIQUE (experiment_id, repetition_number)
);
```
**Purpose**: Individual repetitions of experiment blueprints that can be scheduled and executed.
### 3. Data Entry System
#### `public.experiment_phase_drafts`
```sql
CREATE TABLE public.experiment_phase_drafts (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
experiment_id UUID NOT NULL REFERENCES public.experiments(id) ON DELETE CASCADE,
repetition_id UUID NOT NULL REFERENCES public.experiment_repetitions(id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES public.user_profiles(id),
phase_name TEXT NOT NULL CHECK (phase_name IN ('pre-soaking', 'air-drying', 'cracking', 'shelling')),
status TEXT NOT NULL DEFAULT 'draft' CHECK (status IN ('draft', 'submitted', 'withdrawn')),
draft_name TEXT, -- Optional name for the draft
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
submitted_at TIMESTAMP WITH TIME ZONE,
withdrawn_at TIMESTAMP WITH TIME ZONE
);
```
**Purpose**: Phase-specific draft records for experiment data entry with status tracking.
#### `public.experiment_phase_data`
```sql
CREATE TABLE public.experiment_phase_data (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
phase_draft_id UUID NOT NULL REFERENCES public.experiment_phase_drafts(id) ON DELETE CASCADE,
phase_name TEXT NOT NULL CHECK (phase_name IN ('pre-soaking', 'air-drying', 'cracking', 'shelling')),
-- Pre-soaking phase data
batch_initial_weight_lbs FLOAT CHECK (batch_initial_weight_lbs >= 0),
initial_shell_moisture_pct FLOAT CHECK (initial_shell_moisture_pct >= 0 AND initial_shell_moisture_pct <= 100),
initial_kernel_moisture_pct FLOAT CHECK (initial_kernel_moisture_pct >= 0 AND initial_kernel_moisture_pct <= 100),
soaking_start_time TIMESTAMP WITH TIME ZONE,
-- Air-drying phase data
airdrying_start_time TIMESTAMP WITH TIME ZONE,
post_soak_weight_lbs FLOAT CHECK (post_soak_weight_lbs >= 0),
post_soak_kernel_moisture_pct FLOAT CHECK (post_soak_kernel_moisture_pct >= 0 AND post_soak_kernel_moisture_pct <= 100),
post_soak_shell_moisture_pct FLOAT CHECK (post_soak_shell_moisture_pct >= 0 AND post_soak_shell_moisture_pct <= 100),
avg_pecan_diameter_in FLOAT CHECK (avg_pecan_diameter_in >= 0),
-- Cracking phase data
cracking_start_time TIMESTAMP WITH TIME ZONE,
-- Shelling phase data
shelling_start_time TIMESTAMP WITH TIME ZONE,
bin_1_weight_lbs FLOAT CHECK (bin_1_weight_lbs >= 0),
bin_2_weight_lbs FLOAT CHECK (bin_2_weight_lbs >= 0),
bin_3_weight_lbs FLOAT CHECK (bin_3_weight_lbs >= 0),
discharge_bin_weight_lbs FLOAT CHECK (discharge_bin_weight_lbs >= 0),
bin_1_full_yield_oz FLOAT CHECK (bin_1_full_yield_oz >= 0),
bin_2_full_yield_oz FLOAT CHECK (bin_2_full_yield_oz >= 0),
bin_3_full_yield_oz FLOAT CHECK (bin_3_full_yield_oz >= 0),
bin_1_half_yield_oz FLOAT CHECK (bin_1_half_yield_oz >= 0),
bin_2_half_yield_oz FLOAT CHECK (bin_2_half_yield_oz >= 0),
bin_3_half_yield_oz FLOAT CHECK (bin_3_half_yield_oz >= 0),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
CONSTRAINT unique_phase_per_draft UNIQUE (phase_draft_id, phase_name)
);
```
**Purpose**: Phase-specific measurement data for experiments.
#### `public.pecan_diameter_measurements`
```sql
CREATE TABLE public.pecan_diameter_measurements (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
phase_data_id UUID NOT NULL REFERENCES public.experiment_phase_data(id) ON DELETE CASCADE,
measurement_number INTEGER NOT NULL CHECK (measurement_number >= 1 AND measurement_number <= 10),
diameter_in FLOAT NOT NULL CHECK (diameter_in >= 0),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
CONSTRAINT unique_measurement_per_phase UNIQUE (phase_data_id, measurement_number)
);
```
**Purpose**: Individual pecan diameter measurements (up to 10 per phase).
### 3. Conductor Availability Management
#### `public.conductor_availability`
```sql
CREATE TABLE public.conductor_availability (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
user_id UUID NOT NULL REFERENCES public.user_profiles(id) ON DELETE CASCADE,
available_from TIMESTAMP WITH TIME ZONE NOT NULL,
available_to TIMESTAMP WITH TIME ZONE NOT NULL,
notes TEXT, -- Optional notes about the availability
status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'cancelled')),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
created_by UUID NOT NULL REFERENCES public.user_profiles(id),
-- Ensure available_to is after available_from
CONSTRAINT valid_time_range CHECK (available_to > available_from),
-- Ensure availability is in the future (can be modified if needed for past records)
CONSTRAINT future_availability CHECK (available_from >= NOW() - INTERVAL '1 day')
);
```
**Purpose**: Stores conductor availability windows for experiment scheduling with overlap prevention.
#### `public.experiment_phase_assignments`
```sql
CREATE TABLE public.experiment_phase_assignments (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
experiment_id UUID NOT NULL REFERENCES public.experiments(id) ON DELETE CASCADE,
repetition_id UUID NOT NULL REFERENCES public.experiment_repetitions(id) ON DELETE CASCADE,
conductor_id UUID NOT NULL REFERENCES public.user_profiles(id) ON DELETE CASCADE,
phase_name TEXT NOT NULL CHECK (phase_name IN ('pre-soaking', 'air-drying', 'cracking', 'shelling')),
scheduled_start_time TIMESTAMP WITH TIME ZONE NOT NULL,
scheduled_end_time TIMESTAMP WITH TIME ZONE NOT NULL,
status TEXT NOT NULL DEFAULT 'scheduled' CHECK (status IN ('scheduled', 'in-progress', 'completed', 'cancelled')),
notes TEXT, -- Optional notes about the assignment
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
created_by UUID NOT NULL REFERENCES public.user_profiles(id),
-- Ensure scheduled_end_time is after scheduled_start_time
CONSTRAINT valid_scheduled_time_range CHECK (scheduled_end_time > scheduled_start_time),
-- Ensure unique assignment per conductor per phase per repetition
CONSTRAINT unique_conductor_phase_assignment UNIQUE (repetition_id, conductor_id, phase_name)
);
```
**Purpose**: Assigns conductors to specific experiment repetition phases with scheduled times for future scheduling functionality.
## Key Functions
### Authentication & Authorization
- `public.get_user_role()`: Returns the first role for a user (backward compatibility)
- `public.get_user_roles()`: Returns all roles for a user as an array
- `public.is_admin()`: Checks if user has admin role
- `public.has_role(role_name TEXT)`: Checks if user has specific role
- `public.has_any_role(role_names TEXT[])`: Checks if user has any of the specified roles
- `public.can_manage_experiments()`: Checks if user can manage experiments (admin or conductor)
### User Management
- `public.generate_temp_password()`: Generates secure temporary passwords
- `public.create_user_with_roles(user_email, role_names, temp_password)`: Creates users with multiple roles
### Data Validation
- `validate_repetition_number()`: Ensures repetition numbers don't exceed experiment requirements
- `public.check_repetition_lock_before_withdrawal()`: Prevents withdrawal of locked repetitions
- `public.check_availability_overlap()`: Prevents overlapping availabilities for the same conductor
- `public.adjust_overlapping_availability()`: Automatically adjusts overlapping availabilities (alternative approach)
### Availability Management
- `public.get_available_conductors(start_time, end_time)`: Returns conductors available for a specific time range
- `public.is_conductor_available(conductor_id, start_time, end_time)`: Checks if a conductor is available for a specific time range
### Timestamp Management
- `public.handle_updated_at()`: Updates the updated_at timestamp
- `public.handle_phase_draft_status_change()`: Manages submitted_at and withdrawn_at timestamps
## Row Level Security (RLS) Policies
### Access Control Summary
- **Admin**: Full access to all tables and operations
- **Conductor**: Can manage experiments, experiment phases, and repetitions, view all data, manage their own availability
- **Analyst**: Read-only access to all data
- **Data Recorder**: Can create and manage their own phase drafts and data
### Key RLS Features
- All authenticated users can view experiments, experiment phases, and repetitions
- Admin and conductor roles can manage experiment phases
- Users can only modify their own phase drafts (unless admin)
- Users can only manage their own availability (unless admin)
- Conductors can view their own experiment phase assignments
- Locked repetitions prevent draft modifications
- Submitted drafts cannot be withdrawn if repetition is locked
- Role-based access control for user management
## Indexes
The database includes comprehensive indexing for performance:
- Primary key indexes on all tables
- Foreign key indexes for joins
- Status and date indexes for filtering
- Composite indexes for common query patterns
## Constraints
### Data Integrity
- Check constraints on numeric values (weights, percentages, etc.)
- Enum constraints on status fields
- Unique constraints to prevent duplicates
- Foreign key constraints for referential integrity
### Business Rules
- Repetition numbers cannot exceed experiment requirements
- Only one submitted draft per user per phase per repetition
- Moisture percentages must be between 0-100
- Weights and measurements must be non-negative
- Conductor availabilities cannot overlap for the same user
- Availability windows must have valid time ranges (end > start)
- Only one conductor assignment per phase per repetition
- Scheduled times must have valid ranges (end > start)
## Migration History
The schema has evolved through several migrations:
1. **RBAC Schema**: Initial role-based access control
2. **Multiple Roles Support**: Enhanced user role management
3. **Experiments Table**: Core experiment management
4. **Repetitions System**: Separated experiments from repetitions
5. **Data Entry System**: Phase-specific draft and data management
6. **Constraint Fixes**: Refined business rules and constraints
7. **Experiment Phases**: Added experiment grouping for better organization
8. **Conductor Availability**: Added availability management and experiment phase assignments for scheduling
This schema supports a comprehensive pecan processing experiment management system with robust security, data integrity, and flexible role-based access control.