Experimental Analysis System
=============================

OpenHCS provides a comprehensive experimental analysis system for processing high-content screening data from ThermoFisher CX5 and MetaXpress systems, with support for complex experimental designs, replicate management, and statistical analysis.

## System Overview

The experimental analysis system handles the complete workflow from experimental design configuration to statistical analysis and visualization:

- **Configuration parsing**: Excel-based experimental design definition
- **Data ingestion**: Support for CX5 and MetaXpress result formats
- **Replicate management**: Biological and technical replicate handling
- **Statistical analysis**: Control-based normalization and dose-response analysis
- **Result export**: Compiled results and heatmap visualization

## Architecture Components

### Modern Registry-Based Architecture

The experimental analysis system uses a registry pattern to eliminate code duplication and provide a unified interface for multiple microscope formats.

**openhcs.processing.backends.experimental_analysis.unified_analysis_engine**
  Main entry point - ``ExperimentalAnalysisEngine`` class provides unified analysis with automatic format detection

**openhcs.processing.backends.experimental_analysis.format_registry_service**
  ``FormatRegistryService`` - automatic discovery and management of format handlers

**openhcs.processing.backends.experimental_analysis.format_registry**
  ``MicroscopeFormatRegistryBase`` - abstract base class defining the registry interface

**openhcs.processing.backends.experimental_analysis.cx5_registry**
  ``CX5FormatRegistry`` - ThermoFisher CX5 format implementation

**openhcs.processing.backends.experimental_analysis.metaxpress_registry**
  ``MetaXpressFormatRegistry`` - Molecular Devices MetaXpress format implementation

### Legacy Modules (Deprecated)

**openhcs.formats.experimental_analysis**
  Legacy analysis functions - use ``ExperimentalAnalysisEngine`` for new code

**openhcs.formats.metaxpress**
  Legacy MetaXpress support - integrated into registry system

### Registry System Architecture

The experimental analysis system uses a registry pattern to handle multiple microscope formats through a unified interface.

#### ExperimentalAnalysisEngine

The main entry point for experimental analysis:

.. code-block:: python

   from openhcs.processing.backends.experimental_analysis import ExperimentalAnalysisEngine
   from openhcs.core.config import ExperimentalAnalysisConfig

   # Create configuration
   config = ExperimentalAnalysisConfig(
       normalization_method=NormalizationMethod.FOLD_CHANGE,
       export_heatmaps=True,
       auto_detect_format=True
   )

   # Initialize engine
   engine = ExperimentalAnalysisEngine(config)

   # Run analysis with automatic format detection
   results = engine.run_analysis(
       results_path="microscope_results.xlsx",
       config_file="config.xlsx",
       compiled_results_path="compiled_results.xlsx",
       heatmap_path="heatmaps.xlsx"
   )

#### FormatRegistryService

Automatic discovery and management of format handlers:

.. code-block:: python

   from openhcs.processing.backends.experimental_analysis import FormatRegistryService

   # Get all available formats
   registries = FormatRegistryService.get_all_format_registries()
   # Returns: {'EDDU_CX5': CX5FormatRegistry, 'EDDU_metaxpress': MetaXpressFormatRegistry}

   # Get specific format handler
   cx5_registry = FormatRegistryService.get_registry_instance_for_format('EDDU_CX5')

   # Automatic format detection from file
   format_name = FormatRegistryService.detect_format_from_file('results.xlsx')

**Discovery Mechanism**:
- Uses ``pkgutil.walk_packages`` to find all registry implementations
- No hardcoded imports required
- Automatically registers new format handlers
- Caches registry instances for performance

#### MicroscopeFormatRegistryBase

Abstract base class defining the registry interface:

.. code-block:: python

   class MicroscopeFormatRegistryBase(ABC):
       FORMAT_NAME: str
       SHEET_NAME: Optional[str]
       SUPPORTED_EXTENSIONS: Tuple[str, ...]

       @abstractmethod
       def extract_features(self, raw_df: pd.DataFrame) -> List[str]:
           """Extract feature column names from raw data."""

       @abstractmethod
       def extract_plate_names(self, raw_df: pd.DataFrame) -> List[str]:
           """Extract plate identifiers from raw data."""

       @abstractmethod
       def create_plates_dict(self, raw_df: pd.DataFrame) -> Dict:
           """Create nested dictionary structure for plate data."""

       def process_data(self, results_path: str) -> Dict:
           """Complete data processing pipeline."""

**Registry Pattern Benefits**:
- Single interface for all formats
- Format-specific logic isolated in subclasses
- Easy to add new formats
- Testable and maintainable

### Excel Configuration Files

The system uses Excel-based configuration files with a structured format:

.. code-block:: python

   def read_plate_layout(config_path, design_sheet_name='drug_curve_map'):
       """Parse experimental configuration from Excel file."""
       xls = pd.ExcelFile(config_path)
       # Sheet name is configurable via ExperimentalAnalysisConfig.design_sheet_name
       df = pd.read_excel(xls, design_sheet_name, index_col=0, header=None)

       # Parse global parameters
       N = None          # Number of biological replicates
       scope = None      # Microscope format (EDDU_CX5, EDDU_metaxpress)

       # Parse experimental conditions
       conditions = []   # List of experimental conditions
       layout = {}       # Condition-to-wells mapping

       # Parse control definitions
       ctrl_positions = None  # Control well positions for normalization

**Configuration Structure**:
- **Global parameters**: N (replicates), Scope (microscope format)
- **Control definitions**: Control wells for normalization
- **Condition blocks**: Experimental conditions with dose-response mapping
- **Plate groups**: Biological replicate to physical plate mapping (configurable via ``plate_groups_sheet_name``)

### Data Processing Pipeline

#### Phase 1: Configuration Parsing

.. code-block:: python

   # Parse experimental design
   scope, plate_layout, conditions, ctrl_positions = read_plate_layout(config_file)
   plate_groups = load_plate_groups(config_file)
   
   # Create experiment location mapping
   experiment_dict_locations = make_experiment_dict_locations(
       plate_groups, plate_layout, conditions
   )

**Output**: Structured mapping of conditions → replicates → doses → wells

#### Phase 2: Data Ingestion

.. code-block:: python

   def read_results(results_path, scope=None):
       """Read results from microscope-specific Excel format."""
       xls = pd.ExcelFile(results_path)
       if scope == "EDDU_CX5":
           raw_df = pd.read_excel(xls, 'Rawdata')
       elif scope == "EDDU_metaxpress":
           raw_df = pd.read_excel(xls, xls.sheet_names[0])
       return raw_df

**Format Support**:
- **CX5 format**: ThermoFisher CX5 'Rawdata' sheet structure
- **MetaXpress format**: Molecular Devices MetaXpress export format

#### Phase 3: Data Structure Creation

.. code-block:: python

   # Create well-based data structures
   well_dict = create_well_dict(df, scope=scope)
   plates_dict = create_plates_dict(df, scope=scope)
   plates_dict = fill_plates_dict(df, plates_dict, scope=scope)

**Data Structures**:
- **well_dict**: ``{well: {feature: value}}`` - Well-centric feature mapping
- **plates_dict**: ``{plate: {well: {feature: value}}}`` - Plate-centric organization

#### Phase 4: Experimental Data Mapping

.. code-block:: python

   # Map experimental design to measured values
   experiment_dict_values = make_experiment_dict_values(
       plates_dict, experiment_dict_locations, features
   )

**Output**: ``experiment_dict[condition][replicate][dose] = {feature: [values]}``

#### Phase 5: Statistical Analysis

.. code-block:: python

   # Control-based normalization
   if ctrl_positions is not None:
       experiment_dict_values = normalize_experiment(
           experiment_dict_values, ctrl_positions, features, plates_dict
       )
   
   # Generate feature tables
   feature_tables = create_all_feature_tables(experiment_dict_values, features)

### Replicate Management System

#### Biological Replicates

The system handles multiple biological replicates (N1, N2, N3, etc.) with automatic aggregation:

.. code-block:: python

   def make_experiment_dict_locations(plate_groups, plate_layout, conditions):
       """Create mapping of experimental conditions to well locations."""
       experiment_dict = {}
       
       for condition in conditions:
           experiment_dict[condition] = {}
           for replicate in range(1, N+1):  # N biological replicates
               replicate_key = f"N{replicate}"
               experiment_dict[condition][replicate_key] = {}
               
               # Map doses to wells for this replicate
               for dose_idx, dose in enumerate(doses):
                   wells = get_wells_for_replicate_dose(condition, replicate, dose_idx)
                   experiment_dict[condition][replicate_key][dose] = wells

#### Technical Replicates

Technical replicates (multiple wells per condition/dose) are automatically detected and averaged:

.. code-block:: python

   def process_technical_replicates(experiment_dict_values):
       """Average technical replicates within each condition/dose."""
       for condition in experiment_dict_values:
           for replicate in experiment_dict_values[condition]:
               for dose in experiment_dict_values[condition][replicate]:
                   # Multiple wells = technical replicates
                   well_values = experiment_dict_values[condition][replicate][dose]
                   if len(well_values) > 1:
                       # Average technical replicates
                       averaged_values = np.mean(well_values, axis=0)
                       experiment_dict_values[condition][replicate][dose] = averaged_values

### Normalization System

#### Control-Based Normalization

The system supports control-based normalization for plate-to-plate variation correction:

.. code-block:: python

   def normalize_experiment(experiment_dict_values, ctrl_positions, features, plates_dict):
       """Normalize experimental values using control wells."""
       
       # Calculate control statistics
       control_stats = calculate_control_statistics(ctrl_positions, plates_dict, features)
       
       # Normalize each experimental condition
       for condition in experiment_dict_values:
           for replicate in experiment_dict_values[condition]:
               for dose in experiment_dict_values[condition][replicate]:
                   normalized_values = normalize_to_controls(
                       experiment_dict_values[condition][replicate][dose],
                       control_stats,
                       features
                   )
                   experiment_dict_values[condition][replicate][dose] = normalized_values

**Normalization Methods** (configured via :class:`~openhcs.core.config.NormalizationMethod`):
- **FOLD_CHANGE**: ``value / control_mean`` (default)
- **Z_SCORE**: ``(value - control_mean) / control_std``
- **PERCENT_CONTROL**: ``(value / control_mean) * 100``

### Feature Extraction System

#### Microscope-Specific Feature Extraction

.. code-block:: python

   def get_features(raw_df, scope=None):
       """Extract feature columns based on microscope format."""
       if scope == "EDDU_CX5":
           return get_features_EDDU_CX5(raw_df)
       elif scope == "EDDU_metaxpress":
           return get_features_EDDU_metaxpress(raw_df)

   def get_features_EDDU_CX5(raw_df):
       """Extract features from CX5 format."""
       return raw_df.iloc[:, raw_df.columns.str.find("Replicate").argmax()+1:-1].columns

   def get_features_EDDU_metaxpress(raw_df):
       """Extract features from MetaXpress format."""
       feature_rows = raw_df[pd.isnull(raw_df.iloc[:,0])].iloc[0].tolist()[2:]
       return feature_rows

**Feature Types**:
- **Cell count metrics**: Total cells, viable cells, dead cells
- **Morphological features**: Cell area, perimeter, circularity, eccentricity
- **Intensity measurements**: Mean, median, standard deviation per channel
- **Texture features**: Contrast, correlation, energy, homogeneity

### Export System

#### Result Compilation

.. code-block:: python

   def create_all_feature_tables(experiment_dict_values, features):
       """Create feature-specific tables for export."""
       feature_tables = {}
       
       for feature in features:
           feature_table = create_feature_table(experiment_dict_values, feature)
           feature_tables[feature] = feature_table
       
       return feature_tables

#### Excel Export with Heatmaps

.. code-block:: python

   def export_results_with_heatmaps(feature_tables, output_path):
       """Export results with integrated heatmap visualization."""
       with pd.ExcelWriter(output_path, engine='xlsxwriter') as writer:
           for feature_name, feature_table in feature_tables.items():
               # Write data table
               feature_table.to_excel(writer, sheet_name=feature_name)
               
               # Generate heatmap
               create_heatmap_visualization(feature_table, writer, feature_name)

## Integration Points

### Pipeline Integration

The experimental analysis system integrates with OpenHCS pipelines through the analysis consolidation system:

.. code-block:: python

   # Integration with analysis consolidation
   from openhcs.processing.backends.analysis.consolidate_analysis_results import (
       consolidate_analysis_results_pipeline
   )
   
   # Experimental analysis can feed into consolidation
   consolidated_results = consolidate_analysis_results_pipeline(
       image_stack=processed_images,
       results_directory=experimental_results_dir,
       consolidation_config=AnalysisConsolidationConfig(),
       plate_metadata_config=PlateMetadataConfig()
   )

### Configuration System Integration

The experimental analysis system can be configured through the global configuration system:

.. code-block:: python

   from enum import Enum
   from dataclasses import dataclass
   from typing import Optional

   class NormalizationMethod(Enum):
       """Normalization methods for experimental analysis."""
       FOLD_CHANGE = "fold_change"      # value / control_mean
       Z_SCORE = "z_score"              # (value - control_mean) / control_std
       PERCENT_CONTROL = "percent_control"  # (value / control_mean) * 100

   class MicroscopeFormat(Enum):
       """Supported microscope formats for experimental analysis."""
       EDDU_CX5 = "EDDU_CX5"                # ThermoFisher CX5 format
       EDDU_METAXPRESS = "EDDU_metaxpress"  # Molecular Devices MetaXpress format

   @dataclass(frozen=True)
   class ExperimentalAnalysisConfig:
       """Configuration for experimental analysis system."""
       config_file_name: str = "config.xlsx"
       """Name of the experimental configuration Excel file."""

       design_sheet_name: str = "drug_curve_map"
       """Name of the sheet containing experimental design."""

       plate_groups_sheet_name: str = "plate_groups"
       """Name of the sheet containing plate group mappings."""

       normalization_method: NormalizationMethod = NormalizationMethod.FOLD_CHANGE
       """Normalization method for control-based normalization."""

       export_raw_results: bool = True
       """Whether to export raw (non-normalized) results."""

       export_heatmaps: bool = True
       """Whether to generate heatmap visualizations."""

       auto_detect_format: bool = True
       """Whether to automatically detect microscope format."""

       default_format: Optional[MicroscopeFormat] = None
       """Default format to use if auto-detection fails."""

**Configuration Features**:
- **Enum-based type safety**: Normalization methods and formats use enums to prevent invalid values
- **Configurable sheet names**: Excel sheet names can be customized for different workflows
- **Automatic format detection**: System can detect CX5 vs MetaXpress automatically
- **Flexible export options**: Control which outputs are generated

## Performance Characteristics

### Memory Efficiency

- **Lazy loading**: Results loaded on-demand to minimize memory usage
- **Chunked processing**: Large datasets processed in chunks
- **Efficient data structures**: Optimized pandas DataFrames for statistical operations

### Scalability

- **Multi-plate support**: Handles experiments across multiple physical plates
- **Variable replicate numbers**: Supports any number of biological replicates
- **Flexible condition numbers**: No limit on experimental conditions per plate

### Statistical Robustness

- **Outlier detection**: Automatic identification of statistical outliers
- **Missing data handling**: Robust handling of missing wells or failed measurements
- **Quality control metrics**: Automatic calculation of assay quality metrics (Z-factor, etc.)

The experimental analysis system provides comprehensive support for high-content screening experimental workflows, from initial experimental design through final statistical analysis and visualization, ensuring robust and reproducible analysis of complex multi-condition, multi-replicate experiments.