# Database Design Analysis: Experiment Phases & Repetitions ## Current Design Issues ### 1. **Critical UNIQUE Constraint Problem** The phase tables (`soaking`, `airdrying`, `cracking`, `shelling`) have a fundamental flaw: ```sql -- Current problematic constraints CONSTRAINT unique_soaking_per_experiment UNIQUE (experiment_id), CONSTRAINT unique_soaking_per_repetition UNIQUE (repetition_id) ``` **Problem**: The `UNIQUE (experiment_id)` constraint prevents having multiple phase records for different repetitions of the same experiment. If an experiment has 3 repetitions, you can only store phase data for ONE repetition! **Impact**: This completely breaks the repetition system - you cannot have separate phase executions for each repetition. ### 2. **Ambiguous Data Model** The phase tables reference both `experiment_id` AND `repetition_id`, creating confusion: - If data belongs to the experiment, why have `repetition_id`? - If data belongs to the repetition, why have `UNIQUE (experiment_id)`? ### 3. **Missing Automatic Phase Creation** When repetitions are created, corresponding phase entries are NOT automatically created. This leads to: - Incomplete data model - Manual work required - Potential data inconsistencies ### 4. **Sequential Timing Calculation Issues** The triggers that calculate sequential start times look up by `experiment_id`: ```sql SELECT s.scheduled_end_time INTO NEW.scheduled_start_time FROM public.soaking s WHERE s.experiment_id = NEW.experiment_id -- WRONG! Should be repetition_id ``` This will fail when you have multiple repetitions because: - It can't determine which repetition's soaking to use - It might pick the wrong repetition's data ## Design Assessment: Separate Tables vs Unified Table ### Current Approach: Separate Tables (❌ Not Recommended) **Pros:** - Clear separation of concerns - Type-specific columns can be different - Easier to understand at a glance **Cons:** - ❌ Schema duplication (same structure repeated 4 times) - ❌ Harder to query across phases - ❌ More complex to maintain - ❌ Difficult to enforce sequential relationships - ❌ More tables to manage - ❌ Harder to get "all phases for a repetition" queries - ❌ Current UNIQUE constraints break the repetition model ### Recommended Approach: Unified Phase Execution Table (✅ Best Practice) **Pros:** - ✅ Single source of truth for phase executions - ✅ Easy to query all phases for a repetition - ✅ Simple sequential phase calculations - ✅ Easier to enforce business rules (one phase per repetition) - ✅ Less schema duplication - ✅ Better for reporting and analytics - ✅ Easier to add new phases in the future - ✅ Can use database views for phase-specific access **Cons:** - Requires phase-specific columns to be nullable (or use JSONB) - Slightly more complex queries for phase-specific data ## Recommended Solution: Unified Phase Execution Table ### Option 1: Single Table with Phase Type Enum (Recommended) ```sql CREATE TABLE public.experiment_phase_executions ( id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), repetition_id UUID NOT NULL REFERENCES public.experiment_repetitions(id) ON DELETE CASCADE, phase_type TEXT NOT NULL CHECK (phase_type IN ('soaking', 'airdrying', 'cracking', 'shelling')), -- Scheduling fields (common to all phases) scheduled_start_time TIMESTAMP WITH TIME ZONE NOT NULL, scheduled_end_time TIMESTAMP WITH TIME ZONE, actual_start_time TIMESTAMP WITH TIME ZONE, actual_end_time TIMESTAMP WITH TIME ZONE, -- Phase-specific parameters (nullable, only relevant for specific phases) -- Soaking soaking_duration_minutes INTEGER CHECK (soaking_duration_minutes > 0), -- Airdrying duration_minutes INTEGER CHECK (duration_minutes > 0), -- Cracking machine_type_id UUID REFERENCES public.machine_types(id), -- Status tracking status TEXT NOT NULL DEFAULT 'pending' CHECK (status IN ('pending', 'scheduled', 'in_progress', 'completed', 'cancelled')), created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(), created_by UUID NOT NULL REFERENCES public.user_profiles(id), -- Ensure one execution per phase type per repetition CONSTRAINT unique_phase_per_repetition UNIQUE (repetition_id, phase_type) ); -- Indexes CREATE INDEX idx_phase_executions_repetition_id ON public.experiment_phase_executions(repetition_id); CREATE INDEX idx_phase_executions_phase_type ON public.experiment_phase_executions(phase_type); CREATE INDEX idx_phase_executions_status ON public.experiment_phase_executions(status); -- Function to automatically calculate sequential start times CREATE OR REPLACE FUNCTION calculate_sequential_phase_start_time() RETURNS TRIGGER AS $$ DECLARE prev_phase_end_time TIMESTAMP WITH TIME ZONE; experiment_phase_config UUID; phase_order TEXT[] := ARRAY['soaking', 'airdrying', 'cracking', 'shelling']; current_phase_index INT; prev_phase_name TEXT; BEGIN -- Get the experiment phase configuration SELECT e.phase_id INTO experiment_phase_config FROM public.experiments e JOIN public.experiment_repetitions er ON e.id = er.experiment_id WHERE er.id = NEW.repetition_id; -- Find current phase index SELECT array_position(phase_order, NEW.phase_type) INTO current_phase_index; -- If not the first phase, get previous phase's end time IF current_phase_index > 1 THEN prev_phase_name := phase_order[current_phase_index - 1]; SELECT scheduled_end_time INTO prev_phase_end_time FROM public.experiment_phase_executions WHERE repetition_id = NEW.repetition_id AND phase_type = prev_phase_name ORDER BY created_at DESC LIMIT 1; -- If previous phase exists, use its end time as start time IF prev_phase_end_time IS NOT NULL THEN NEW.scheduled_start_time := prev_phase_end_time; END IF; END IF; -- Calculate end time based on duration IF NEW.phase_type = 'soaking' AND NEW.soaking_duration_minutes IS NOT NULL THEN NEW.scheduled_end_time := NEW.scheduled_start_time + (NEW.soaking_duration_minutes || ' minutes')::INTERVAL; ELSIF NEW.phase_type = 'airdrying' AND NEW.duration_minutes IS NOT NULL THEN NEW.scheduled_end_time := NEW.scheduled_start_time + (NEW.duration_minutes || ' minutes')::INTERVAL; END IF; RETURN NEW; END; $$ LANGUAGE plpgsql; -- Trigger to automatically calculate sequential times CREATE TRIGGER trigger_calculate_sequential_phase_times BEFORE INSERT OR UPDATE ON public.experiment_phase_executions FOR EACH ROW EXECUTE FUNCTION calculate_sequential_phase_start_time(); -- Function to automatically create phase executions when repetition is created CREATE OR REPLACE FUNCTION create_phase_executions_for_repetition() RETURNS TRIGGER AS $$ DECLARE exp_phase_config RECORD; phase_type_list TEXT[] := ARRAY[]::TEXT[]; phase_name TEXT; soaking_start_time TIMESTAMP WITH TIME ZONE; BEGIN -- Get experiment phase configuration SELECT ep.*, e.soaking_duration_hr, e.air_drying_time_min INTO exp_phase_config FROM public.experiments e JOIN public.experiment_phases ep ON e.phase_id = ep.id JOIN public.experiment_repetitions er ON e.id = er.experiment_id WHERE er.id = NEW.id; -- Build list of phases to create based on experiment configuration IF exp_phase_config.has_soaking THEN phase_type_list := array_append(phase_type_list, 'soaking'); END IF; IF exp_phase_config.has_airdrying THEN phase_type_list := array_append(phase_type_list, 'airdrying'); END IF; IF exp_phase_config.has_cracking THEN phase_type_list := array_append(phase_type_list, 'cracking'); END IF; IF exp_phase_config.has_shelling THEN phase_type_list := array_append(phase_type_list, 'shelling'); END IF; -- Create phase executions for each required phase FOREACH phase_name IN ARRAY phase_type_list LOOP INSERT INTO public.experiment_phase_executions ( repetition_id, phase_type, scheduled_start_time, scheduled_end_time, status, created_by, -- Phase-specific parameters CASE WHEN phase_name = 'soaking' THEN soaking_duration_minutes := (exp_phase_config.soaking_duration_hr * 60)::INTEGER WHEN phase_name = 'airdrying' THEN duration_minutes := exp_phase_config.air_drying_time_min END ) VALUES ( NEW.id, phase_name, NOW(), -- Default start time, will be updated when scheduled NULL, -- Will be calculated by trigger 'pending', NEW.created_by ); END LOOP; RETURN NEW; END; $$ LANGUAGE plpgsql; -- Trigger to auto-create phases when repetition is created CREATE TRIGGER trigger_create_phase_executions AFTER INSERT ON public.experiment_repetitions FOR EACH ROW EXECUTE FUNCTION create_phase_executions_for_repetition(); ``` ### Option 2: Keep Separate Tables but Fix Constraints (Alternative) If you prefer to keep separate tables, you MUST fix the constraints: ```sql -- Remove experiment_id UNIQUE constraints ALTER TABLE public.soaking DROP CONSTRAINT IF EXISTS unique_soaking_per_experiment; ALTER TABLE public.airdrying DROP CONSTRAINT IF EXISTS unique_airdrying_per_experiment; ALTER TABLE public.cracking DROP CONSTRAINT IF EXISTS unique_cracking_per_experiment; ALTER TABLE public.shelling DROP CONSTRAINT IF EXISTS unique_shelling_per_experiment; -- Keep only repetition_id UNIQUE constraints (one phase per repetition) -- These already exist, which is good -- Optionally, remove experiment_id if not needed -- OR keep it for easier querying but without UNIQUE constraint ``` Then add triggers to auto-create phase entries when repetitions are created (similar to Option 1's trigger). ## Comparison Matrix | Aspect | Separate Tables (Current) | Unified Table (Recommended) | |--------|---------------------------|------------------------------| | **Repetition Support** | ❌ Broken (UNIQUE constraint) | ✅ Works correctly | | **Query Complexity** | ❌ Requires UNION or multiple queries | ✅ Simple single query | | **Sequential Calculations** | ❌ Complex, error-prone | ✅ Simple, reliable | | **Schema Maintenance** | ❌ 4x duplication | ✅ Single source | | **Auto-creation** | ❌ Manual or missing | ✅ Automatic via trigger | | **Adding New Phases** | ❌ Requires new table | ✅ Just add enum value | | **Reporting** | ❌ Complex joins/unions | ✅ Straightforward | | **Data Integrity** | ❌ Can have inconsistencies | ✅ Easier to enforce | ## Recommendation **Use Option 1 (Unified Table)** because: 1. ✅ Fixes the repetition constraint issue 2. ✅ Simplifies the data model 3. ✅ Makes sequential phase calculations straightforward 4. ✅ Automatically creates phases when repetitions are created 5. ✅ Better for future scalability 6. ✅ Easier to maintain and query You can still create database **views** for phase-specific access if needed: ```sql CREATE VIEW soaking_executions AS SELECT * FROM experiment_phase_executions WHERE phase_type = 'soaking'; CREATE VIEW airdrying_executions AS SELECT * FROM experiment_phase_executions WHERE phase_type = 'airdrying'; -- etc. ``` This gives you the benefits of a unified table while maintaining phase-specific interfaces for your application code. ## Migration Path 1. Create new `experiment_phase_executions` table 2. Migrate existing data from separate tables 3. Create views for backward compatibility (if needed) 4. Update application code to use new table 5. Drop old tables after verification