# Database Design Analysis: Experiment Phases & Repetitions

## Current Design Issues

### 1. **Critical UNIQUE Constraint Problem**

The phase tables (`soaking`, `airdrying`, `cracking`, `shelling`) have a fundamental flaw:

```sql
-- Current problematic constraints
CONSTRAINT unique_soaking_per_experiment UNIQUE (experiment_id),
CONSTRAINT unique_soaking_per_repetition UNIQUE (repetition_id)
```

**Problem**: The `UNIQUE (experiment_id)` constraint prevents having multiple phase records for different repetitions of the same experiment. If an experiment has 3 repetitions, you can only store phase data for ONE repetition!

**Impact**: This completely breaks the repetition system - you cannot have separate phase executions for each repetition.

### 2. **Ambiguous Data Model**

The phase tables reference both `experiment_id` AND `repetition_id`, creating confusion:
- If data belongs to the experiment, why have `repetition_id`?
- If data belongs to the repetition, why have `UNIQUE (experiment_id)`?

### 3. **Missing Automatic Phase Creation**

When repetitions are created, corresponding phase entries are NOT automatically created. This leads to:
- Incomplete data model
- Manual work required
- Potential data inconsistencies

### 4. **Sequential Timing Calculation Issues**

The triggers that calculate sequential start times look up by `experiment_id`:
```sql
SELECT s.scheduled_end_time INTO NEW.scheduled_start_time
FROM public.soaking s
WHERE s.experiment_id = NEW.experiment_id  -- WRONG! Should be repetition_id
```

This will fail when you have multiple repetitions because:
- It can't determine which repetition's soaking to use
- It might pick the wrong repetition's data

## Design Assessment: Separate Tables vs Unified Table

### Current Approach: Separate Tables (❌ Not Recommended)

**Pros:**
- Clear separation of concerns
- Type-specific columns can be different
- Easier to understand at a glance

**Cons:**
- ❌ Schema duplication (same structure repeated 4 times)
- ❌ Harder to query across phases
- ❌ More complex to maintain
- ❌ Difficult to enforce sequential relationships
- ❌ More tables to manage
- ❌ Harder to get "all phases for a repetition" queries
- ❌ Current UNIQUE constraints break the repetition model

### Recommended Approach: Unified Phase Execution Table (✅ Best Practice)

**Pros:**
- ✅ Single source of truth for phase executions
- ✅ Easy to query all phases for a repetition
- ✅ Simple sequential phase calculations
- ✅ Easier to enforce business rules (one phase per repetition)
- ✅ Less schema duplication
- ✅ Better for reporting and analytics
- ✅ Easier to add new phases in the future
- ✅ Can use database views for phase-specific access

**Cons:**
- Requires phase-specific columns to be nullable (or use JSONB)
- Slightly more complex queries for phase-specific data

## Recommended Solution: Unified Phase Execution Table

### Option 1: Single Table with Phase Type Enum (Recommended)

```sql
CREATE TABLE public.experiment_phase_executions (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    repetition_id UUID NOT NULL REFERENCES public.experiment_repetitions(id) ON DELETE CASCADE,
    phase_type TEXT NOT NULL CHECK (phase_type IN ('soaking', 'airdrying', 'cracking', 'shelling')),
    
    -- Scheduling fields (common to all phases)
    scheduled_start_time TIMESTAMP WITH TIME ZONE NOT NULL,
    scheduled_end_time TIMESTAMP WITH TIME ZONE,
    actual_start_time TIMESTAMP WITH TIME ZONE,
    actual_end_time TIMESTAMP WITH TIME ZONE,
    
    -- Phase-specific parameters (nullable, only relevant for specific phases)
    -- Soaking
    soaking_duration_minutes INTEGER CHECK (soaking_duration_minutes > 0),
    
    -- Airdrying  
    duration_minutes INTEGER CHECK (duration_minutes > 0),
    
    -- Cracking
    machine_type_id UUID REFERENCES public.machine_types(id),
    
    -- Status tracking
    status TEXT NOT NULL DEFAULT 'pending' 
        CHECK (status IN ('pending', 'scheduled', 'in_progress', 'completed', 'cancelled')),
    
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_by UUID NOT NULL REFERENCES public.user_profiles(id),
    
    -- Ensure one execution per phase type per repetition
    CONSTRAINT unique_phase_per_repetition UNIQUE (repetition_id, phase_type)
);

-- Indexes
CREATE INDEX idx_phase_executions_repetition_id ON public.experiment_phase_executions(repetition_id);
CREATE INDEX idx_phase_executions_phase_type ON public.experiment_phase_executions(phase_type);
CREATE INDEX idx_phase_executions_status ON public.experiment_phase_executions(status);

-- Function to automatically calculate sequential start times
CREATE OR REPLACE FUNCTION calculate_sequential_phase_start_time()
RETURNS TRIGGER AS $$
DECLARE
    prev_phase_end_time TIMESTAMP WITH TIME ZONE;
    experiment_phase_config UUID;
    phase_order TEXT[] := ARRAY['soaking', 'airdrying', 'cracking', 'shelling'];
    current_phase_index INT;
    prev_phase_name TEXT;
BEGIN
    -- Get the experiment phase configuration
    SELECT e.phase_id INTO experiment_phase_config
    FROM public.experiments e
    JOIN public.experiment_repetitions er ON e.id = er.experiment_id
    WHERE er.id = NEW.repetition_id;
    
    -- Find current phase index
    SELECT array_position(phase_order, NEW.phase_type) INTO current_phase_index;
    
    -- If not the first phase, get previous phase's end time
    IF current_phase_index > 1 THEN
        prev_phase_name := phase_order[current_phase_index - 1];
        
        SELECT scheduled_end_time INTO prev_phase_end_time
        FROM public.experiment_phase_executions
        WHERE repetition_id = NEW.repetition_id
          AND phase_type = prev_phase_name
        ORDER BY created_at DESC
        LIMIT 1;
        
        -- If previous phase exists, use its end time as start time
        IF prev_phase_end_time IS NOT NULL THEN
            NEW.scheduled_start_time := prev_phase_end_time;
        END IF;
    END IF;
    
    -- Calculate end time based on duration
    IF NEW.phase_type = 'soaking' AND NEW.soaking_duration_minutes IS NOT NULL THEN
        NEW.scheduled_end_time := NEW.scheduled_start_time + (NEW.soaking_duration_minutes || ' minutes')::INTERVAL;
    ELSIF NEW.phase_type = 'airdrying' AND NEW.duration_minutes IS NOT NULL THEN
        NEW.scheduled_end_time := NEW.scheduled_start_time + (NEW.duration_minutes || ' minutes')::INTERVAL;
    END IF;
    
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Trigger to automatically calculate sequential times
CREATE TRIGGER trigger_calculate_sequential_phase_times
    BEFORE INSERT OR UPDATE ON public.experiment_phase_executions
    FOR EACH ROW
    EXECUTE FUNCTION calculate_sequential_phase_start_time();

-- Function to automatically create phase executions when repetition is created
CREATE OR REPLACE FUNCTION create_phase_executions_for_repetition()
RETURNS TRIGGER AS $$
DECLARE
    exp_phase_config RECORD;
    phase_type_list TEXT[] := ARRAY[]::TEXT[];
    phase_name TEXT;
    soaking_start_time TIMESTAMP WITH TIME ZONE;
BEGIN
    -- Get experiment phase configuration
    SELECT ep.*, e.soaking_duration_hr, e.air_drying_time_min
    INTO exp_phase_config
    FROM public.experiments e
    JOIN public.experiment_phases ep ON e.phase_id = ep.id
    JOIN public.experiment_repetitions er ON e.id = er.experiment_id
    WHERE er.id = NEW.id;
    
    -- Build list of phases to create based on experiment configuration
    IF exp_phase_config.has_soaking THEN
        phase_type_list := array_append(phase_type_list, 'soaking');
    END IF;
    IF exp_phase_config.has_airdrying THEN
        phase_type_list := array_append(phase_type_list, 'airdrying');
    END IF;
    IF exp_phase_config.has_cracking THEN
        phase_type_list := array_append(phase_type_list, 'cracking');
    END IF;
    IF exp_phase_config.has_shelling THEN
        phase_type_list := array_append(phase_type_list, 'shelling');
    END IF;
    
    -- Create phase executions for each required phase
    FOREACH phase_name IN ARRAY phase_type_list
    LOOP
        INSERT INTO public.experiment_phase_executions (
            repetition_id,
            phase_type,
            scheduled_start_time,
            scheduled_end_time,
            status,
            created_by,
            -- Phase-specific parameters
            CASE 
                WHEN phase_name = 'soaking' THEN 
                    soaking_duration_minutes := (exp_phase_config.soaking_duration_hr * 60)::INTEGER
                WHEN phase_name = 'airdrying' THEN 
                    duration_minutes := exp_phase_config.air_drying_time_min
            END
        )
        VALUES (
            NEW.id,
            phase_name,
            NOW(), -- Default start time, will be updated when scheduled
            NULL,  -- Will be calculated by trigger
            'pending',
            NEW.created_by
        );
    END LOOP;
    
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Trigger to auto-create phases when repetition is created
CREATE TRIGGER trigger_create_phase_executions
    AFTER INSERT ON public.experiment_repetitions
    FOR EACH ROW
    EXECUTE FUNCTION create_phase_executions_for_repetition();
```

### Option 2: Keep Separate Tables but Fix Constraints (Alternative)

If you prefer to keep separate tables, you MUST fix the constraints:

```sql
-- Remove experiment_id UNIQUE constraints
ALTER TABLE public.soaking DROP CONSTRAINT IF EXISTS unique_soaking_per_experiment;
ALTER TABLE public.airdrying DROP CONSTRAINT IF EXISTS unique_airdrying_per_experiment;
ALTER TABLE public.cracking DROP CONSTRAINT IF EXISTS unique_cracking_per_experiment;
ALTER TABLE public.shelling DROP CONSTRAINT IF EXISTS unique_shelling_per_experiment;

-- Keep only repetition_id UNIQUE constraints (one phase per repetition)
-- These already exist, which is good

-- Optionally, remove experiment_id if not needed
-- OR keep it for easier querying but without UNIQUE constraint
```

Then add triggers to auto-create phase entries when repetitions are created (similar to Option 1's trigger).

## Comparison Matrix

| Aspect | Separate Tables (Current) | Unified Table (Recommended) |
|--------|---------------------------|------------------------------|
| **Repetition Support** | ❌ Broken (UNIQUE constraint) | ✅ Works correctly |
| **Query Complexity** | ❌ Requires UNION or multiple queries | ✅ Simple single query |
| **Sequential Calculations** | ❌ Complex, error-prone | ✅ Simple, reliable |
| **Schema Maintenance** | ❌ 4x duplication | ✅ Single source |
| **Auto-creation** | ❌ Manual or missing | ✅ Automatic via trigger |
| **Adding New Phases** | ❌ Requires new table | ✅ Just add enum value |
| **Reporting** | ❌ Complex joins/unions | ✅ Straightforward |
| **Data Integrity** | ❌ Can have inconsistencies | ✅ Easier to enforce |

## Recommendation

**Use Option 1 (Unified Table)** because:
1. ✅ Fixes the repetition constraint issue
2. ✅ Simplifies the data model
3. ✅ Makes sequential phase calculations straightforward
4. ✅ Automatically creates phases when repetitions are created
5. ✅ Better for future scalability
6. ✅ Easier to maintain and query

You can still create database **views** for phase-specific access if needed:
```sql
CREATE VIEW soaking_executions AS
SELECT * FROM experiment_phase_executions WHERE phase_type = 'soaking';

CREATE VIEW airdrying_executions AS
SELECT * FROM experiment_phase_executions WHERE phase_type = 'airdrying';
-- etc.
```

This gives you the benefits of a unified table while maintaining phase-specific interfaces for your application code.

## Migration Path

1. Create new `experiment_phase_executions` table
2. Migrate existing data from separate tables
3. Create views for backward compatibility (if needed)
4. Update application code to use new table
5. Drop old tables after verification