Files
usda-vision/docs/database_design_analysis.md
salirezav 73849b40a8 Add MQTT publish request and response models, and implement publish route
- Introduced MQTTPublishRequest and MQTTPublishResponse models for handling MQTT message publishing.
- Implemented a new POST route for publishing MQTT messages, including error handling and logging.
- Enhanced the StandaloneAutoRecorder with improved logging during manual recording start.
- Updated the frontend to include an MQTT Debug Panel for better monitoring and debugging capabilities.
2025-12-01 13:07:36 -05:00

12 KiB

Database Design Analysis: Experiment Phases & Repetitions

Current Design Issues

1. Critical UNIQUE Constraint Problem

The phase tables (soaking, airdrying, cracking, shelling) have a fundamental flaw:

-- Current problematic constraints
CONSTRAINT unique_soaking_per_experiment UNIQUE (experiment_id),
CONSTRAINT unique_soaking_per_repetition UNIQUE (repetition_id)

Problem: The UNIQUE (experiment_id) constraint prevents having multiple phase records for different repetitions of the same experiment. If an experiment has 3 repetitions, you can only store phase data for ONE repetition!

Impact: This completely breaks the repetition system - you cannot have separate phase executions for each repetition.

2. Ambiguous Data Model

The phase tables reference both experiment_id AND repetition_id, creating confusion:

  • If data belongs to the experiment, why have repetition_id?
  • If data belongs to the repetition, why have UNIQUE (experiment_id)?

3. Missing Automatic Phase Creation

When repetitions are created, corresponding phase entries are NOT automatically created. This leads to:

  • Incomplete data model
  • Manual work required
  • Potential data inconsistencies

4. Sequential Timing Calculation Issues

The triggers that calculate sequential start times look up by experiment_id:

SELECT s.scheduled_end_time INTO NEW.scheduled_start_time
FROM public.soaking s
WHERE s.experiment_id = NEW.experiment_id  -- WRONG! Should be repetition_id

This will fail when you have multiple repetitions because:

  • It can't determine which repetition's soaking to use
  • It might pick the wrong repetition's data

Design Assessment: Separate Tables vs Unified Table

Pros:

  • Clear separation of concerns
  • Type-specific columns can be different
  • Easier to understand at a glance

Cons:

  • Schema duplication (same structure repeated 4 times)
  • Harder to query across phases
  • More complex to maintain
  • Difficult to enforce sequential relationships
  • More tables to manage
  • Harder to get "all phases for a repetition" queries
  • Current UNIQUE constraints break the repetition model

Pros:

  • Single source of truth for phase executions
  • Easy to query all phases for a repetition
  • Simple sequential phase calculations
  • Easier to enforce business rules (one phase per repetition)
  • Less schema duplication
  • Better for reporting and analytics
  • Easier to add new phases in the future
  • Can use database views for phase-specific access

Cons:

  • Requires phase-specific columns to be nullable (or use JSONB)
  • Slightly more complex queries for phase-specific data
CREATE TABLE public.experiment_phase_executions (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    repetition_id UUID NOT NULL REFERENCES public.experiment_repetitions(id) ON DELETE CASCADE,
    phase_type TEXT NOT NULL CHECK (phase_type IN ('soaking', 'airdrying', 'cracking', 'shelling')),
    
    -- Scheduling fields (common to all phases)
    scheduled_start_time TIMESTAMP WITH TIME ZONE NOT NULL,
    scheduled_end_time TIMESTAMP WITH TIME ZONE,
    actual_start_time TIMESTAMP WITH TIME ZONE,
    actual_end_time TIMESTAMP WITH TIME ZONE,
    
    -- Phase-specific parameters (nullable, only relevant for specific phases)
    -- Soaking
    soaking_duration_minutes INTEGER CHECK (soaking_duration_minutes > 0),
    
    -- Airdrying  
    duration_minutes INTEGER CHECK (duration_minutes > 0),
    
    -- Cracking
    machine_type_id UUID REFERENCES public.machine_types(id),
    
    -- Status tracking
    status TEXT NOT NULL DEFAULT 'pending' 
        CHECK (status IN ('pending', 'scheduled', 'in_progress', 'completed', 'cancelled')),
    
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_by UUID NOT NULL REFERENCES public.user_profiles(id),
    
    -- Ensure one execution per phase type per repetition
    CONSTRAINT unique_phase_per_repetition UNIQUE (repetition_id, phase_type)
);

-- Indexes
CREATE INDEX idx_phase_executions_repetition_id ON public.experiment_phase_executions(repetition_id);
CREATE INDEX idx_phase_executions_phase_type ON public.experiment_phase_executions(phase_type);
CREATE INDEX idx_phase_executions_status ON public.experiment_phase_executions(status);

-- Function to automatically calculate sequential start times
CREATE OR REPLACE FUNCTION calculate_sequential_phase_start_time()
RETURNS TRIGGER AS $$
DECLARE
    prev_phase_end_time TIMESTAMP WITH TIME ZONE;
    experiment_phase_config UUID;
    phase_order TEXT[] := ARRAY['soaking', 'airdrying', 'cracking', 'shelling'];
    current_phase_index INT;
    prev_phase_name TEXT;
BEGIN
    -- Get the experiment phase configuration
    SELECT e.phase_id INTO experiment_phase_config
    FROM public.experiments e
    JOIN public.experiment_repetitions er ON e.id = er.experiment_id
    WHERE er.id = NEW.repetition_id;
    
    -- Find current phase index
    SELECT array_position(phase_order, NEW.phase_type) INTO current_phase_index;
    
    -- If not the first phase, get previous phase's end time
    IF current_phase_index > 1 THEN
        prev_phase_name := phase_order[current_phase_index - 1];
        
        SELECT scheduled_end_time INTO prev_phase_end_time
        FROM public.experiment_phase_executions
        WHERE repetition_id = NEW.repetition_id
          AND phase_type = prev_phase_name
        ORDER BY created_at DESC
        LIMIT 1;
        
        -- If previous phase exists, use its end time as start time
        IF prev_phase_end_time IS NOT NULL THEN
            NEW.scheduled_start_time := prev_phase_end_time;
        END IF;
    END IF;
    
    -- Calculate end time based on duration
    IF NEW.phase_type = 'soaking' AND NEW.soaking_duration_minutes IS NOT NULL THEN
        NEW.scheduled_end_time := NEW.scheduled_start_time + (NEW.soaking_duration_minutes || ' minutes')::INTERVAL;
    ELSIF NEW.phase_type = 'airdrying' AND NEW.duration_minutes IS NOT NULL THEN
        NEW.scheduled_end_time := NEW.scheduled_start_time + (NEW.duration_minutes || ' minutes')::INTERVAL;
    END IF;
    
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Trigger to automatically calculate sequential times
CREATE TRIGGER trigger_calculate_sequential_phase_times
    BEFORE INSERT OR UPDATE ON public.experiment_phase_executions
    FOR EACH ROW
    EXECUTE FUNCTION calculate_sequential_phase_start_time();

-- Function to automatically create phase executions when repetition is created
CREATE OR REPLACE FUNCTION create_phase_executions_for_repetition()
RETURNS TRIGGER AS $$
DECLARE
    exp_phase_config RECORD;
    phase_type_list TEXT[] := ARRAY[]::TEXT[];
    phase_name TEXT;
    soaking_start_time TIMESTAMP WITH TIME ZONE;
BEGIN
    -- Get experiment phase configuration
    SELECT ep.*, e.soaking_duration_hr, e.air_drying_time_min
    INTO exp_phase_config
    FROM public.experiments e
    JOIN public.experiment_phases ep ON e.phase_id = ep.id
    JOIN public.experiment_repetitions er ON e.id = er.experiment_id
    WHERE er.id = NEW.id;
    
    -- Build list of phases to create based on experiment configuration
    IF exp_phase_config.has_soaking THEN
        phase_type_list := array_append(phase_type_list, 'soaking');
    END IF;
    IF exp_phase_config.has_airdrying THEN
        phase_type_list := array_append(phase_type_list, 'airdrying');
    END IF;
    IF exp_phase_config.has_cracking THEN
        phase_type_list := array_append(phase_type_list, 'cracking');
    END IF;
    IF exp_phase_config.has_shelling THEN
        phase_type_list := array_append(phase_type_list, 'shelling');
    END IF;
    
    -- Create phase executions for each required phase
    FOREACH phase_name IN ARRAY phase_type_list
    LOOP
        INSERT INTO public.experiment_phase_executions (
            repetition_id,
            phase_type,
            scheduled_start_time,
            scheduled_end_time,
            status,
            created_by,
            -- Phase-specific parameters
            CASE 
                WHEN phase_name = 'soaking' THEN 
                    soaking_duration_minutes := (exp_phase_config.soaking_duration_hr * 60)::INTEGER
                WHEN phase_name = 'airdrying' THEN 
                    duration_minutes := exp_phase_config.air_drying_time_min
            END
        )
        VALUES (
            NEW.id,
            phase_name,
            NOW(), -- Default start time, will be updated when scheduled
            NULL,  -- Will be calculated by trigger
            'pending',
            NEW.created_by
        );
    END LOOP;
    
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Trigger to auto-create phases when repetition is created
CREATE TRIGGER trigger_create_phase_executions
    AFTER INSERT ON public.experiment_repetitions
    FOR EACH ROW
    EXECUTE FUNCTION create_phase_executions_for_repetition();

Option 2: Keep Separate Tables but Fix Constraints (Alternative)

If you prefer to keep separate tables, you MUST fix the constraints:

-- Remove experiment_id UNIQUE constraints
ALTER TABLE public.soaking DROP CONSTRAINT IF EXISTS unique_soaking_per_experiment;
ALTER TABLE public.airdrying DROP CONSTRAINT IF EXISTS unique_airdrying_per_experiment;
ALTER TABLE public.cracking DROP CONSTRAINT IF EXISTS unique_cracking_per_experiment;
ALTER TABLE public.shelling DROP CONSTRAINT IF EXISTS unique_shelling_per_experiment;

-- Keep only repetition_id UNIQUE constraints (one phase per repetition)
-- These already exist, which is good

-- Optionally, remove experiment_id if not needed
-- OR keep it for easier querying but without UNIQUE constraint

Then add triggers to auto-create phase entries when repetitions are created (similar to Option 1's trigger).

Comparison Matrix

Aspect Separate Tables (Current) Unified Table (Recommended)
Repetition Support Broken (UNIQUE constraint) Works correctly
Query Complexity Requires UNION or multiple queries Simple single query
Sequential Calculations Complex, error-prone Simple, reliable
Schema Maintenance 4x duplication Single source
Auto-creation Manual or missing Automatic via trigger
Adding New Phases Requires new table Just add enum value
Reporting Complex joins/unions Straightforward
Data Integrity Can have inconsistencies Easier to enforce

Recommendation

Use Option 1 (Unified Table) because:

  1. Fixes the repetition constraint issue
  2. Simplifies the data model
  3. Makes sequential phase calculations straightforward
  4. Automatically creates phases when repetitions are created
  5. Better for future scalability
  6. Easier to maintain and query

You can still create database views for phase-specific access if needed:

CREATE VIEW soaking_executions AS
SELECT * FROM experiment_phase_executions WHERE phase_type = 'soaking';

CREATE VIEW airdrying_executions AS
SELECT * FROM experiment_phase_executions WHERE phase_type = 'airdrying';
-- etc.

This gives you the benefits of a unified table while maintaining phase-specific interfaces for your application code.

Migration Path

  1. Create new experiment_phase_executions table
  2. Migrate existing data from separate tables
  3. Create views for backward compatibility (if needed)
  4. Update application code to use new table
  5. Drop old tables after verification