Files
usda-vision/docs/database_schema.md
salirezav f6a37ca1ba Remove deprecated files and scripts to streamline the codebase
- Deleted unused API test files, RTSP diagnostic scripts, and development utility scripts to reduce clutter.
- Removed outdated database schema and modularization proposal documents to maintain focus on current architecture.
- Cleaned up configuration files and logging scripts that are no longer in use, enhancing project maintainability.
2025-11-02 10:07:59 -05:00

15 KiB

USDA Vision Database Schema

This document provides a comprehensive overview of the database schema for the USDA Vision pecan processing experiment management system.

Database Overview

  • Database Type: PostgreSQL (via Supabase)
  • Version: PostgreSQL 17
  • Authentication: Supabase Auth with Row Level Security (RLS)
  • Extensions: uuid-ossp for UUID generation

Core Tables

1. Authentication & User Management

auth.users (Supabase managed)

  • Standard Supabase authentication table
  • Contains user authentication data

public.roles

CREATE TABLE public.roles (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    name TEXT UNIQUE NOT NULL CHECK (name IN ('admin', 'conductor', 'analyst', 'data recorder')),
    description TEXT NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Roles:

  • admin: Full system access with user management capabilities
  • conductor: Operational access for conducting experiments and managing data
  • analyst: Read-only access for data analysis and reporting
  • data recorder: Data entry and recording capabilities

public.user_profiles

CREATE TABLE public.user_profiles (
    id UUID PRIMARY KEY REFERENCES auth.users(id) ON DELETE CASCADE,
    email TEXT NOT NULL,
    role_id UUID REFERENCES public.roles(id), -- Nullable (legacy)
    status TEXT DEFAULT 'active' CHECK (status IN ('active', 'disabled')),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

public.user_roles (Junction Table)

CREATE TABLE public.user_roles (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    user_id UUID NOT NULL REFERENCES public.user_profiles(id) ON DELETE CASCADE,
    role_id UUID NOT NULL REFERENCES public.roles(id) ON DELETE CASCADE,
    assigned_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    assigned_by UUID REFERENCES public.user_profiles(id),
    UNIQUE(user_id, role_id)
);

2. Experiment Management

public.experiment_phases

CREATE TABLE public.experiment_phases (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    name TEXT NOT NULL UNIQUE,
    description TEXT, -- Optional description for the phase
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_by UUID NOT NULL REFERENCES public.user_profiles(id)
);

Purpose: Groups experiments into logical phases for better organization and navigation.

public.experiments

CREATE TABLE public.experiments (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    experiment_number INTEGER UNIQUE NOT NULL,
    reps_required INTEGER NOT NULL CHECK (reps_required > 0),
    soaking_duration_hr FLOAT NOT NULL CHECK (soaking_duration_hr >= 0),
    air_drying_time_min INTEGER NOT NULL CHECK (air_drying_time_min >= 0),
    plate_contact_frequency_hz FLOAT NOT NULL CHECK (plate_contact_frequency_hz > 0),
    throughput_rate_pecans_sec FLOAT NOT NULL CHECK (throughput_rate_pecans_sec > 0),
    crush_amount_in FLOAT NOT NULL CHECK (crush_amount_in >= 0),
    entry_exit_height_diff_in FLOAT NOT NULL,
    results_status TEXT NOT NULL DEFAULT 'valid' CHECK (results_status IN ('valid', 'invalid')),
    completion_status BOOLEAN NOT NULL DEFAULT false,
    phase_id UUID REFERENCES public.experiment_phases(id) ON DELETE SET NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_by UUID NOT NULL REFERENCES public.user_profiles(id)
);

Purpose: Experiment blueprints/templates that define the parameters for pecan processing experiments.

public.experiment_repetitions

CREATE TABLE public.experiment_repetitions (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    experiment_id UUID NOT NULL REFERENCES public.experiments(id) ON DELETE CASCADE,
    repetition_number INTEGER NOT NULL CHECK (repetition_number > 0),
    scheduled_date TIMESTAMP WITH TIME ZONE,
    schedule_status TEXT NOT NULL DEFAULT 'pending schedule' 
        CHECK (schedule_status IN ('pending schedule', 'scheduled', 'canceled', 'aborted')),
    completion_status BOOLEAN NOT NULL DEFAULT false,
    is_locked BOOLEAN NOT NULL DEFAULT false,
    locked_at TIMESTAMP WITH TIME ZONE,
    locked_by UUID REFERENCES public.user_profiles(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_by UUID NOT NULL REFERENCES public.user_profiles(id),
    
    CONSTRAINT unique_repetition_per_experiment UNIQUE (experiment_id, repetition_number)
);

Purpose: Individual repetitions of experiment blueprints that can be scheduled and executed.

3. Data Entry System

public.experiment_phase_drafts

CREATE TABLE public.experiment_phase_drafts (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    experiment_id UUID NOT NULL REFERENCES public.experiments(id) ON DELETE CASCADE,
    repetition_id UUID NOT NULL REFERENCES public.experiment_repetitions(id) ON DELETE CASCADE,
    user_id UUID NOT NULL REFERENCES public.user_profiles(id),
    phase_name TEXT NOT NULL CHECK (phase_name IN ('pre-soaking', 'air-drying', 'cracking', 'shelling')),
    status TEXT NOT NULL DEFAULT 'draft' CHECK (status IN ('draft', 'submitted', 'withdrawn')),
    draft_name TEXT, -- Optional name for the draft
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    submitted_at TIMESTAMP WITH TIME ZONE,
    withdrawn_at TIMESTAMP WITH TIME ZONE
);

Purpose: Phase-specific draft records for experiment data entry with status tracking.

public.experiment_phase_data

CREATE TABLE public.experiment_phase_data (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    phase_draft_id UUID NOT NULL REFERENCES public.experiment_phase_drafts(id) ON DELETE CASCADE,
    phase_name TEXT NOT NULL CHECK (phase_name IN ('pre-soaking', 'air-drying', 'cracking', 'shelling')),
    
    -- Pre-soaking phase data
    batch_initial_weight_lbs FLOAT CHECK (batch_initial_weight_lbs >= 0),
    initial_shell_moisture_pct FLOAT CHECK (initial_shell_moisture_pct >= 0 AND initial_shell_moisture_pct <= 100),
    initial_kernel_moisture_pct FLOAT CHECK (initial_kernel_moisture_pct >= 0 AND initial_kernel_moisture_pct <= 100),
    soaking_start_time TIMESTAMP WITH TIME ZONE,
    
    -- Air-drying phase data
    airdrying_start_time TIMESTAMP WITH TIME ZONE,
    post_soak_weight_lbs FLOAT CHECK (post_soak_weight_lbs >= 0),
    post_soak_kernel_moisture_pct FLOAT CHECK (post_soak_kernel_moisture_pct >= 0 AND post_soak_kernel_moisture_pct <= 100),
    post_soak_shell_moisture_pct FLOAT CHECK (post_soak_shell_moisture_pct >= 0 AND post_soak_shell_moisture_pct <= 100),
    avg_pecan_diameter_in FLOAT CHECK (avg_pecan_diameter_in >= 0),
    
    -- Cracking phase data
    cracking_start_time TIMESTAMP WITH TIME ZONE,
    
    -- Shelling phase data
    shelling_start_time TIMESTAMP WITH TIME ZONE,
    bin_1_weight_lbs FLOAT CHECK (bin_1_weight_lbs >= 0),
    bin_2_weight_lbs FLOAT CHECK (bin_2_weight_lbs >= 0),
    bin_3_weight_lbs FLOAT CHECK (bin_3_weight_lbs >= 0),
    discharge_bin_weight_lbs FLOAT CHECK (discharge_bin_weight_lbs >= 0),
    bin_1_full_yield_oz FLOAT CHECK (bin_1_full_yield_oz >= 0),
    bin_2_full_yield_oz FLOAT CHECK (bin_2_full_yield_oz >= 0),
    bin_3_full_yield_oz FLOAT CHECK (bin_3_full_yield_oz >= 0),
    bin_1_half_yield_oz FLOAT CHECK (bin_1_half_yield_oz >= 0),
    bin_2_half_yield_oz FLOAT CHECK (bin_2_half_yield_oz >= 0),
    bin_3_half_yield_oz FLOAT CHECK (bin_3_half_yield_oz >= 0),
    
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    
    CONSTRAINT unique_phase_per_draft UNIQUE (phase_draft_id, phase_name)
);

Purpose: Phase-specific measurement data for experiments.

public.pecan_diameter_measurements

CREATE TABLE public.pecan_diameter_measurements (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    phase_data_id UUID NOT NULL REFERENCES public.experiment_phase_data(id) ON DELETE CASCADE,
    measurement_number INTEGER NOT NULL CHECK (measurement_number >= 1 AND measurement_number <= 10),
    diameter_in FLOAT NOT NULL CHECK (diameter_in >= 0),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    
    CONSTRAINT unique_measurement_per_phase UNIQUE (phase_data_id, measurement_number)
);

Purpose: Individual pecan diameter measurements (up to 10 per phase).

3. Conductor Availability Management

public.conductor_availability

CREATE TABLE public.conductor_availability (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    user_id UUID NOT NULL REFERENCES public.user_profiles(id) ON DELETE CASCADE,
    available_from TIMESTAMP WITH TIME ZONE NOT NULL,
    available_to TIMESTAMP WITH TIME ZONE NOT NULL,
    notes TEXT, -- Optional notes about the availability
    status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'cancelled')),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_by UUID NOT NULL REFERENCES public.user_profiles(id),
    
    -- Ensure available_to is after available_from
    CONSTRAINT valid_time_range CHECK (available_to > available_from),
    
    -- Ensure availability is in the future (can be modified if needed for past records)
    CONSTRAINT future_availability CHECK (available_from >= NOW() - INTERVAL '1 day')
);

Purpose: Stores conductor availability windows for experiment scheduling with overlap prevention.

public.experiment_phase_assignments

CREATE TABLE public.experiment_phase_assignments (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    experiment_id UUID NOT NULL REFERENCES public.experiments(id) ON DELETE CASCADE,
    repetition_id UUID NOT NULL REFERENCES public.experiment_repetitions(id) ON DELETE CASCADE,
    conductor_id UUID NOT NULL REFERENCES public.user_profiles(id) ON DELETE CASCADE,
    phase_name TEXT NOT NULL CHECK (phase_name IN ('pre-soaking', 'air-drying', 'cracking', 'shelling')),
    scheduled_start_time TIMESTAMP WITH TIME ZONE NOT NULL,
    scheduled_end_time TIMESTAMP WITH TIME ZONE NOT NULL,
    status TEXT NOT NULL DEFAULT 'scheduled' CHECK (status IN ('scheduled', 'in-progress', 'completed', 'cancelled')),
    notes TEXT, -- Optional notes about the assignment
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_by UUID NOT NULL REFERENCES public.user_profiles(id),
    
    -- Ensure scheduled_end_time is after scheduled_start_time
    CONSTRAINT valid_scheduled_time_range CHECK (scheduled_end_time > scheduled_start_time),
    
    -- Ensure unique assignment per conductor per phase per repetition
    CONSTRAINT unique_conductor_phase_assignment UNIQUE (repetition_id, conductor_id, phase_name)
);

Purpose: Assigns conductors to specific experiment repetition phases with scheduled times for future scheduling functionality.

Key Functions

Authentication & Authorization

  • public.get_user_role(): Returns the first role for a user (backward compatibility)
  • public.get_user_roles(): Returns all roles for a user as an array
  • public.is_admin(): Checks if user has admin role
  • public.has_role(role_name TEXT): Checks if user has specific role
  • public.has_any_role(role_names TEXT[]): Checks if user has any of the specified roles
  • public.can_manage_experiments(): Checks if user can manage experiments (admin or conductor)

User Management

  • public.generate_temp_password(): Generates secure temporary passwords
  • public.create_user_with_roles(user_email, role_names, temp_password): Creates users with multiple roles

Data Validation

  • validate_repetition_number(): Ensures repetition numbers don't exceed experiment requirements
  • public.check_repetition_lock_before_withdrawal(): Prevents withdrawal of locked repetitions
  • public.check_availability_overlap(): Prevents overlapping availabilities for the same conductor
  • public.adjust_overlapping_availability(): Automatically adjusts overlapping availabilities (alternative approach)

Availability Management

  • public.get_available_conductors(start_time, end_time): Returns conductors available for a specific time range
  • public.is_conductor_available(conductor_id, start_time, end_time): Checks if a conductor is available for a specific time range

Timestamp Management

  • public.handle_updated_at(): Updates the updated_at timestamp
  • public.handle_phase_draft_status_change(): Manages submitted_at and withdrawn_at timestamps

Row Level Security (RLS) Policies

Access Control Summary

  • Admin: Full access to all tables and operations
  • Conductor: Can manage experiments, experiment phases, and repetitions, view all data, manage their own availability
  • Analyst: Read-only access to all data
  • Data Recorder: Can create and manage their own phase drafts and data

Key RLS Features

  • All authenticated users can view experiments, experiment phases, and repetitions
  • Admin and conductor roles can manage experiment phases
  • Users can only modify their own phase drafts (unless admin)
  • Users can only manage their own availability (unless admin)
  • Conductors can view their own experiment phase assignments
  • Locked repetitions prevent draft modifications
  • Submitted drafts cannot be withdrawn if repetition is locked
  • Role-based access control for user management

Indexes

The database includes comprehensive indexing for performance:

  • Primary key indexes on all tables
  • Foreign key indexes for joins
  • Status and date indexes for filtering
  • Composite indexes for common query patterns

Constraints

Data Integrity

  • Check constraints on numeric values (weights, percentages, etc.)
  • Enum constraints on status fields
  • Unique constraints to prevent duplicates
  • Foreign key constraints for referential integrity

Business Rules

  • Repetition numbers cannot exceed experiment requirements
  • Only one submitted draft per user per phase per repetition
  • Moisture percentages must be between 0-100
  • Weights and measurements must be non-negative
  • Conductor availabilities cannot overlap for the same user
  • Availability windows must have valid time ranges (end > start)
  • Only one conductor assignment per phase per repetition
  • Scheduled times must have valid ranges (end > start)

Migration History

The schema has evolved through several migrations:

  1. RBAC Schema: Initial role-based access control
  2. Multiple Roles Support: Enhanced user role management
  3. Experiments Table: Core experiment management
  4. Repetitions System: Separated experiments from repetitions
  5. Data Entry System: Phase-specific draft and data management
  6. Constraint Fixes: Refined business rules and constraints
  7. Experiment Phases: Added experiment grouping for better organization
  8. Conductor Availability: Added availability management and experiment phase assignments for scheduling

This schema supports a comprehensive pecan processing experiment management system with robust security, data integrity, and flexible role-based access control.