# Experimental Analysis System

OpenHCS provides an experimental analysis system for processing high-content screening data exported from ThermoFisher CX5 and Molecular Devices MetaXpress systems. It supports complex experimental designs, replicate management, and statistical analysis.

## System Overview

The experimental analysis system handles the complete workflow from experimental design configuration to statistical analysis and visualization:

- Configuration parsing: Excel-based experimental design definition
- Data ingestion: Support for CX5 and MetaXpress result formats
- Replicate management: Biological and technical replicate handling
- Statistical analysis: Control-based normalization and dose-response analysis
- Result export: Compiled results and heatmap visualization

## Architecture Components

### Modern Registry-Based Architecture

The experimental analysis system uses a registry pattern to eliminate code duplication and provide a unified interface for multiple microscope formats.

- openhcs.processing.backends.experimental_analysis.unified_analysis_engine: ExperimentalAnalysisEngine, the main entry point; provides unified analysis with automatic format detection
- openhcs.processing.backends.experimental_analysis.format_registry_service: FormatRegistryService, automatic discovery and management of format handlers
- openhcs.processing.backends.experimental_analysis.format_registry: MicroscopeFormatRegistryBase, the abstract base class defining the registry interface
- openhcs.processing.backends.experimental_analysis.cx5_registry: CX5FormatRegistry, the ThermoFisher CX5 format implementation
- openhcs.processing.backends.experimental_analysis.metaxpress_registry: MetaXpressFormatRegistry, the Molecular Devices MetaXpress format implementation

### Legacy Modules (Deprecated)

- openhcs.formats.experimental_analysis: Legacy analysis functions; use ExperimentalAnalysisEngine for new code
- openhcs.formats.metaxpress: Legacy MetaXpress support, now integrated into the registry system

### Registry System Architecture

Each supported microscope format is implemented as a registry subclass, and the components below expose those subclasses through a single interface.

#### ExperimentalAnalysisEngine

The main entry point for experimental analysis:

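from openhcs.processing.backends.experimental_analysis import ExperimentalAnalysisEngine
# NormalizationMethod is assumed to be importable alongside ExperimentalAnalysisConfig
from openhcs.core.config import ExperimentalAnalysisConfig, NormalizationMethod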

# Create configuration
config = ExperimentalAnalysisConfig(
    normalization_method=NormalizationMethod.FOLD_CHANGE,
    export_heatmaps=True,
    auto_detect_format=True
)

# Initialize engine
engine = ExperimentalAnalysisEngine(config)

# Run analysis with automatic format detection
results = engine.run_analysis(
    results_path="microscope_results.xlsx",
    config_file="config.xlsx",
    compiled_results_path="compiled_results.xlsx",
    heatmap_path="heatmaps.xlsx"
)

#### FormatRegistryService

Automatic discovery and management of format handlers:

from openhcs.processing.backends.experimental_analysis import FormatRegistryService

# Get all available formats
registries = FormatRegistryService.get_all_format_registries()
# Returns: {'EDDU_CX5': CX5FormatRegistry, 'EDDU_metaxpress': MetaXpressFormatRegistry}

# Get specific format handler
cx5_registry = FormatRegistryService.get_registry_instance_for_format('EDDU_CX5')

# Automatic format detection from file
format_name = FormatRegistryService.detect_format_from_file('results.xlsx')

Discovery Mechanism:

- Uses pkgutil.walk_packages to find all registry implementations
- No hardcoded imports required
- Automatically registers new format handlers
- Caches registry instances for performance
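
The discovery steps above can be sketched with the standard library alone. This is a minimal illustration, not the FormatRegistryService implementation: FormatHandlerBase, import_submodules, collect_handlers, and get_instance are hypothetical names, and the json package is used only as a convenient importable target for the package walk.

```python
import importlib
import pkgutil
from typing import Dict, List, Type


class FormatHandlerBase:
    """Hypothetical stand-in for MicroscopeFormatRegistryBase."""

    FORMAT_NAME: str = ""


def import_submodules(package_name: str) -> List[str]:
    """Import every module under a package so its handler classes get defined."""
    package = importlib.import_module(package_name)
    imported = []
    for module_info in pkgutil.walk_packages(package.__path__, prefix=package_name + "."):
        importlib.import_module(module_info.name)
        imported.append(module_info.name)
    return imported


def collect_handlers(base: Type[FormatHandlerBase]) -> Dict[str, Type[FormatHandlerBase]]:
    """Map FORMAT_NAME to class for every direct or indirect subclass of base."""
    found: Dict[str, Type[FormatHandlerBase]] = {}
    pending = list(base.__subclasses__())
    while pending:
        cls = pending.pop()
        pending.extend(cls.__subclasses__())
        if cls.FORMAT_NAME:
            found[cls.FORMAT_NAME] = cls
    return found


_instances: Dict[str, FormatHandlerBase] = {}


def get_instance(format_name: str) -> FormatHandlerBase:
    """Create each handler once and reuse it on later lookups."""
    if format_name not in _instances:
        _instances[format_name] = collect_handlers(FormatHandlerBase)[format_name]()
    return _instances[format_name]


class DemoHandler(FormatHandlerBase):
    FORMAT_NAME = "demo"


import_submodules("json")  # any importable package works; json is only a stand-in
handler = get_instance("demo")
```

In a real registry the handler subclasses would live inside the walked package, so importing its modules is what makes them visible to the subclass scan.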

#### MicroscopeFormatRegistryBase

Abstract base class defining the registry interface:

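from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

import pandas as pd

class MicroscopeFormatRegistryBase(ABC):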
    FORMAT_NAME: str
    SHEET_NAME: Optional[str]
    SUPPORTED_EXTENSIONS: Tuple[str, ...]

    @abstractmethod
    def extract_features(self, raw_df: pd.DataFrame) -> List[str]:
        """Extract feature column names from raw data."""

    @abstractmethod
    def extract_plate_names(self, raw_df: pd.DataFrame) -> List[str]:
        """Extract plate identifiers from raw data."""

    @abstractmethod
    def create_plates_dict(self, raw_df: pd.DataFrame) -> Dict:
        """Create nested dictionary structure for plate data."""

    def process_data(self, results_path: str) -> Dict:
        """Complete data processing pipeline."""

Registry Pattern Benefits:

- Single interface for all formats
- Format-specific logic isolated in subclasses
- Easy to add new formats
- Testable and maintainable
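
To illustrate how format-specific logic stays in a subclass, here is a dependency-free sketch. It assumes a simplified interface: FormatBase and ToyFormat are hypothetical, rows are plain dicts rather than pandas DataFrame rows, and the "plate" and "well" keys are invented for the example.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Row = Dict[str, object]  # simplification: one dict per well instead of a DataFrame row


class FormatBase(ABC):
    """Hypothetical, reduced stand-in for MicroscopeFormatRegistryBase."""

    FORMAT_NAME: str

    @abstractmethod
    def extract_features(self, rows: List[Row]) -> List[str]:
        """Return the names of the measured feature columns."""

    @abstractmethod
    def extract_plate_names(self, rows: List[Row]) -> List[str]:
        """Return the plate identifiers present in the data."""

    def process(self, rows: List[Row]) -> Dict[str, object]:
        """Shared pipeline: only the two hooks above differ per format."""
        return {
            "format": self.FORMAT_NAME,
            "features": self.extract_features(rows),
            "plates": self.extract_plate_names(rows),
        }


class ToyFormat(FormatBase):
    """A new format only has to implement the abstract hooks."""

    FORMAT_NAME = "toy"
    METADATA_KEYS = ("plate", "well")

    def extract_features(self, rows: List[Row]) -> List[str]:
        return [key for key in rows[0] if key not in self.METADATA_KEYS]

    def extract_plate_names(self, rows: List[Row]) -> List[str]:
        return sorted({str(row["plate"]) for row in rows})


rows = [
    {"plate": "P1", "well": "A01", "cell_count": 120},
    {"plate": "P2", "well": "A01", "cell_count": 95},
]
summary = ToyFormat().process(rows)
```

Because FormatBase is abstract, instantiating it directly raises TypeError, which catches incomplete format implementations early.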

### Excel Configuration Files

The system uses Excel-based configuration files with a structured format:

import pandas as pd

def read_plate_layout(config_path, design_sheet_name='drug_curve_map'):
    """Parse experimental configuration from Excel file."""
    xls = pd.ExcelFile(config_path)
    # Sheet name is configurable via ExperimentalAnalysisConfig.design_sheet_name
    df = pd.read_excel(xls, design_sheet_name, index_col=0, header=None)

    # Parse global parameters
    N = None          # Number of biological replicates
    scope = None      # Microscope format (EDDU_CX5, EDDU_metaxpress)

    # Parse experimental conditions
    conditions = []   # List of experimental conditions
    layout = {}       # Condition-to-wells mapping

    # Parse control definitions
    ctrl_positions = None  # Control well positions for normalization

    # (Row-by-row parsing of df that fills these values is omitted from this outline)
    return scope, layout, conditions, ctrl_positions

Configuration Structure:

- Global parameters: N (replicates), Scope (microscope format)
- Control definitions: Control wells for normalization
- Condition blocks: Experimental conditions with dose-response mapping
- Plate groups: Biological replicate to physical plate mapping (configurable via plate_groups_sheet_name)

### Data Processing Pipeline

#### Phase 1: Configuration Parsing

# Parse experimental design
scope, plate_layout, conditions, ctrl_positions = read_plate_layout(config_file)
plate_groups = load_plate_groups(config_file)

# Create experiment location mapping
experiment_dict_locations = make_experiment_dict_locations(
    plate_groups, plate_layout, conditions
)

Output: Structured mapping of conditions → replicates → doses → wells
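
As a concrete picture of that nesting, the sketch below builds a small mapping by hand and walks it. The condition name, doses, and well IDs are invented for illustration, and iter_wells is a hypothetical helper; the real structure may carry additional information such as plate identifiers.

```python
from typing import Dict, Iterator, List, Tuple

# condition -> replicate -> dose -> wells (all values are made up)
Locations = Dict[str, Dict[str, Dict[float, List[str]]]]

experiment_dict_locations: Locations = {
    "drug_A": {
        "N1": {0.1: ["B02", "B03"], 1.0: ["C02", "C03"]},
        "N2": {0.1: ["B02", "B03"], 1.0: ["C02", "C03"]},
    },
}


def iter_wells(locations: Locations) -> Iterator[Tuple[str, str, float, str]]:
    """Flatten the nested mapping into (condition, replicate, dose, well) tuples."""
    for condition, replicates in locations.items():
        for replicate, doses in replicates.items():
            for dose, wells in doses.items():
                for well in wells:
                    yield condition, replicate, dose, well


flat = list(iter_wells(experiment_dict_locations))
```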

#### Phase 2: Data Ingestion

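import pandas as pd

def read_results(results_path, scope=None):
    """Read results from microscope-specific Excel format."""
    xls = pd.ExcelFile(results_path)
    if scope == "EDDU_CX5":
        # CX5 exports keep the measurements in the 'Rawdata' sheet
        raw_df = pd.read_excel(xls, 'Rawdata')
    elif scope == "EDDU_metaxpress":
        # MetaXpress exports are read from the first sheet
        raw_df = pd.read_excel(xls, xls.sheet_names[0])
    else:
        # Fail loudly instead of hitting an unbound raw_df below
        raise ValueError(f"Unsupported microscope format: {scope!r}")
    return raw_df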

Format Support:

- CX5 format: ThermoFisher CX5 'Rawdata' sheet structure
- MetaXpress format: Molecular Devices MetaXpress export format
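
Since the two readers differ in which sheet they open, one plausible detection heuristic is to look at the workbook's sheet names. The sketch below is an assumption about how such a check could work, not the logic of FormatRegistryService.detect_format_from_file; guess_format and its default parameter are hypothetical.

```python
from typing import Optional, Sequence

CX5_SHEET = "Rawdata"  # the sheet the CX5 reader above opens


def guess_format(sheet_names: Sequence[str], default: Optional[str] = None) -> str:
    """Guess the microscope format from a workbook's sheet names."""
    if CX5_SHEET in sheet_names:
        return "EDDU_CX5"
    if default is not None:
        # Mirrors the idea of a configured fallback when detection is inconclusive
        return default
    raise ValueError("Could not detect format and no default was given")


cx5_guess = guess_format(["Summary", "Rawdata"])
fallback_guess = guess_format(["Sheet1"], default="EDDU_metaxpress")
```

With pandas, the sheet names of a workbook are available as pd.ExcelFile(path).sheet_names.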

#### Phase 3: Data Structure Creation

# Create well-based data structures
well_dict = create_well_dict(df, scope=scope)
plates_dict = create_plates_dict(df, scope=scope)
plates_dict = fill_plates_dict(df, plates_dict, scope=scope)

Data Structures:

- well_dict: {well: {feature: value}}, a well-centric feature mapping
- plates_dict: {plate: {well: {feature: value}}}, a plate-centric organization
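
The two shapes can be illustrated with plain dictionaries. This sketch assumes the raw data has already been reduced to one record per well; build_plates_dict and the "plate" and "well" keys are hypothetical and stand in for the format-specific parsing.

```python
from typing import Dict, List

WellDict = Dict[str, Dict[str, float]]   # {well: {feature: value}}
PlatesDict = Dict[str, WellDict]         # {plate: {well: {feature: value}}}


def build_plates_dict(rows: List[dict], features: List[str]) -> PlatesDict:
    """Group per-well records by plate, keeping only the feature columns."""
    plates: PlatesDict = {}
    for row in rows:
        well_features = {feature: row[feature] for feature in features}
        plates.setdefault(row["plate"], {})[row["well"]] = well_features
    return plates


rows = [
    {"plate": "P1", "well": "A01", "cell_count": 120.0, "mean_area": 310.5},
    {"plate": "P1", "well": "A02", "cell_count": 98.0, "mean_area": 295.0},
    {"plate": "P2", "well": "A01", "cell_count": 101.0, "mean_area": 300.2},
]
plates_dict = build_plates_dict(rows, ["cell_count", "mean_area"])

# Each plate entry has the well_dict shape
well_dict_p1: WellDict = plates_dict["P1"]
```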

#### Phase 4: Experimental Data Mapping

# Map experimental design to measured values
experiment_dict_values = make_experiment_dict_values(
    plates_dict, experiment_dict_locations, features
)

Output: experiment_dict[condition][replicate][dose] = {feature: [values]}
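
A minimal sketch of this mapping step follows. It is not the make_experiment_dict_values implementation: map_values is a hypothetical helper, and it takes plate_groups as an explicit argument (replicate to plate) to keep the lookup visible.

```python
from typing import Dict, List


def map_values(plates_dict, locations, plate_groups, features: List[str]) -> Dict:
    """Look up measured values for every condition/replicate/dose in the design."""
    values: Dict = {}
    for condition, replicates in locations.items():
        values[condition] = {}
        for replicate, doses in replicates.items():
            plate = plate_groups[replicate]  # biological replicate -> physical plate
            values[condition][replicate] = {}
            for dose, wells in doses.items():
                values[condition][replicate][dose] = {
                    feature: [plates_dict[plate][well][feature] for well in wells]
                    for feature in features
                }
    return values


# Made-up inputs for illustration
plates_dict = {
    "P1": {"B02": {"cell_count": 100}, "B03": {"cell_count": 120}},
    "P2": {"B02": {"cell_count": 90}, "B03": {"cell_count": 110}},
}
locations = {"drug_A": {"N1": {0.1: ["B02", "B03"]}, "N2": {0.1: ["B02", "B03"]}}}
plate_groups = {"N1": "P1", "N2": "P2"}

experiment_dict_values = map_values(plates_dict, locations, plate_groups, ["cell_count"])
```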

#### Phase 5: Statistical Analysis

# Control-based normalization
if ctrl_positions is not None:
    experiment_dict_values = normalize_experiment(
        experiment_dict_values, ctrl_positions, features, plates_dict
    )

# Generate feature tables
feature_tables = create_all_feature_tables(experiment_dict_values, features)

### Replicate Management System

#### Biological Replicates

The system handles multiple biological replicates (N1, N2, N3, etc.) with automatic aggregation:

def make_experiment_dict_locations(plate_groups, plate_layout, conditions):
    """Create mapping of experimental conditions to well locations."""
    experiment_dict = {}

    # N (number of biological replicates) and doses come from the parsed
    # configuration; their lookup is omitted from this outline.
    for condition in conditions:
        experiment_dict[condition] = {}
        for replicate in range(1, N+1):  # N biological replicates
            replicate_key = f"N{replicate}"
            experiment_dict[condition][replicate_key] = {}

            # Map doses to wells for this replicate
            for dose_idx, dose in enumerate(doses):
                wells = get_wells_for_replicate_dose(condition, replicate, dose_idx)
                experiment_dict[condition][replicate_key][dose] = wells

    return experiment_dict

#### Technical Replicates

Technical replicates (multiple wells per condition/dose) are automatically detected and averaged:

import numpy as np

def process_technical_replicates(experiment_dict_values):
    """Average technical replicates within each condition/dose."""
    for condition in experiment_dict_values:
        for replicate in experiment_dict_values[condition]:
            for dose in experiment_dict_values[condition][replicate]:
                # Multiple wells = technical replicates
                well_values = experiment_dict_values[condition][replicate][dose]
                if len(well_values) > 1:
                    # Average technical replicates
                    averaged_values = np.mean(well_values, axis=0)
                    experiment_dict_values[condition][replicate][dose] = averaged_values

    return experiment_dict_values
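
For the {feature: [values]} layout described in Phase 4, averaging has to happen per feature. The sketch below shows one way to do that with the standard library; average_technical_replicates is a hypothetical helper that returns a new structure instead of modifying its input.

```python
from statistics import mean
from typing import Dict


def average_technical_replicates(values: Dict) -> Dict:
    """Collapse each feature's list of well values to its mean."""
    return {
        condition: {
            replicate: {
                dose: {feature: mean(wells) for feature, wells in per_feature.items()}
                for dose, per_feature in doses.items()
            }
            for replicate, doses in replicates.items()
        }
        for condition, replicates in values.items()
    }


# Made-up values: two technical replicate wells per dose
raw_values = {
    "drug_A": {
        "N1": {
            0.1: {"cell_count": [100, 120]},
            1.0: {"cell_count": [60, 70]},
        }
    }
}
averaged = average_technical_replicates(raw_values)
```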

### Normalization System

#### Control-Based Normalization

The system supports control-based normalization for plate-to-plate variation correction:

def normalize_experiment(experiment_dict_values, ctrl_positions, features, plates_dict):
    """Normalize experimental values using control wells."""

    # Calculate control statistics
    control_stats = calculate_control_statistics(ctrl_positions, plates_dict, features)

    # Normalize each experimental condition
    for condition in experiment_dict_values:
        for replicate in experiment_dict_values[condition]:
            for dose in experiment_dict_values[condition][replicate]:
                normalized_values = normalize_to_controls(
                    experiment_dict_values[condition][replicate][dose],
                    control_stats,
                    features
                )
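                experiment_dict_values[condition][replicate][dose] = normalized_values

    # Return the result so callers can rebind it, as the Phase 5 snippet does
    return experiment_dict_values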

Normalization Methods (configured via NormalizationMethod):

- FOLD_CHANGE: value / control_mean (default)
- Z_SCORE: (value - control_mean) / control_std
- PERCENT_CONTROL: (value / control_mean) * 100
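
The three formulas can be written out directly. This is a minimal sketch for a single scalar value: the normalize function is hypothetical, the enum is re-declared locally so the example is self-contained, and the sample standard deviation is an assumption since the text does not say which estimator control_std uses.

```python
from enum import Enum
from statistics import mean, stdev
from typing import Sequence


class NormalizationMethod(Enum):
    FOLD_CHANGE = "fold_change"
    Z_SCORE = "z_score"
    PERCENT_CONTROL = "percent_control"


def normalize(value: float, controls: Sequence[float], method: NormalizationMethod) -> float:
    """Normalize one measurement against the control wells of its plate."""
    control_mean = mean(controls)
    if method is NormalizationMethod.FOLD_CHANGE:
        return value / control_mean
    if method is NormalizationMethod.PERCENT_CONTROL:
        return value / control_mean * 100
    if method is NormalizationMethod.Z_SCORE:
        # Assumption: sample standard deviation (needs at least two control wells)
        return (value - control_mean) / stdev(controls)
    raise ValueError(f"Unknown normalization method: {method!r}")


controls = [90, 100, 110]  # made-up control well values
fold = normalize(150, controls, NormalizationMethod.FOLD_CHANGE)
percent = normalize(150, controls, NormalizationMethod.PERCENT_CONTROL)
z = normalize(150, controls, NormalizationMethod.Z_SCORE)
```

A control mean of zero, or identical control values for Z_SCORE, would raise a division error here, so production code would need an explicit guard.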

### Feature Extraction System

#### Microscope-Specific Feature Extraction

def get_features(raw_df, scope=None):
    """Extract feature columns based on microscope format."""
    if scope == "EDDU_CX5":
        return get_features_EDDU_CX5(raw_df)
    elif scope == "EDDU_metaxpress":
        return get_features_EDDU_metaxpress(raw_df)
    # Avoid silently returning None for an unknown format
    raise ValueError(f"Unsupported microscope format: {scope!r}")

def get_features_EDDU_CX5(raw_df):
    """Extract features from CX5 format."""
    return raw_df.iloc[:, raw_df.columns.str.find("Replicate").argmax()+1:-1].columns

def get_features_EDDU_metaxpress(raw_df):
    """Extract features from MetaXpress format."""
    feature_rows = raw_df[pd.isnull(raw_df.iloc[:,0])].iloc[0].tolist()[2:]
    return feature_rows

Feature Types:

- Cell count metrics: Total cells, viable cells, dead cells
- Morphological features: Cell area, perimeter, circularity, eccentricity
- Intensity measurements: Mean, median, standard deviation per channel
- Texture features: Contrast, correlation, energy, homogeneity

### Export System

#### Result Compilation

def create_all_feature_tables(experiment_dict_values, features):
    """Create feature-specific tables for export."""
    feature_tables = {}

    for feature in features:
        feature_table = create_feature_table(experiment_dict_values, feature)
        feature_tables[feature] = feature_table

    return feature_tables
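
The text does not show create_feature_table, so the following is only a guess at a reasonable layout: one row per dose and one column per condition/replicate pair. feature_table_rows is a hypothetical helper that assumes technical replicates were already averaged to scalars, and it returns plain row dicts rather than a DataFrame.

```python
from typing import Dict, List


def feature_table_rows(values: Dict, feature: str) -> List[dict]:
    """Build one row per dose with a column for each condition/replicate pair."""
    columns = [(cond, rep) for cond, reps in values.items() for rep in reps]
    doses = sorted({dose for reps in values.values() for d in reps.values() for dose in d})
    table = []
    for dose in doses:
        row = {"dose": dose}
        for condition, replicate in columns:
            cell = values[condition][replicate].get(dose)
            # Missing condition/dose combinations are left empty
            row[f"{condition}_{replicate}"] = None if cell is None else cell[feature]
        table.append(row)
    return table


# Made-up, already averaged values
values = {
    "drug_A": {
        "N1": {0.1: {"cell_count": 110}, 1.0: {"cell_count": 65}},
        "N2": {0.1: {"cell_count": 100}},
    }
}
table = feature_table_rows(values, "cell_count")
```

A list of row dicts like this can be passed to pandas.DataFrame to obtain a table suitable for Excel export.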

#### Excel Export with Heatmaps

import pandas as pd

def export_results_with_heatmaps(feature_tables, output_path):
    """Export results with integrated heatmap visualization."""
    with pd.ExcelWriter(output_path, engine='xlsxwriter') as writer:
        for feature_name, feature_table in feature_tables.items():
            # Excel limits sheet names to 31 characters
            sheet_name = feature_name[:31]

            # Write data table
            feature_table.to_excel(writer, sheet_name=sheet_name)

            # Generate heatmap on the same sheet
            create_heatmap_visualization(feature_table, writer, sheet_name)

## Integration Points

### Pipeline Integration

The experimental analysis system integrates with OpenHCS pipelines through the analysis consolidation system:

# Integration with analysis consolidation
from openhcs.processing.backends.analysis.consolidate_analysis_results import (
    consolidate_analysis_results_pipeline
)

# Experimental analysis can feed into consolidation
consolidated_results = consolidate_analysis_results_pipeline(
    image_stack=processed_images,
    results_directory=experimental_results_dir,
    consolidation_config=AnalysisConsolidationConfig(),
    plate_metadata_config=PlateMetadataConfig()
)

### Configuration System Integration

The experimental analysis system can be configured through the global configuration system:

from enum import Enum
from dataclasses import dataclass
from typing import Optional

class NormalizationMethod(Enum):
    """Normalization methods for experimental analysis."""
    FOLD_CHANGE = "fold_change"      # value / control_mean
    Z_SCORE = "z_score"              # (value - control_mean) / control_std
    PERCENT_CONTROL = "percent_control"  # (value / control_mean) * 100

class MicroscopeFormat(Enum):
    """Supported microscope formats for experimental analysis."""
    EDDU_CX5 = "EDDU_CX5"                # ThermoFisher CX5 format
    EDDU_METAXPRESS = "EDDU_metaxpress"  # Molecular Devices MetaXpress format

@dataclass(frozen=True)
class ExperimentalAnalysisConfig:
    """Configuration for experimental analysis system."""
    config_file_name: str = "config.xlsx"
    """Name of the experimental configuration Excel file."""

    design_sheet_name: str = "drug_curve_map"
    """Name of the sheet containing experimental design."""

    plate_groups_sheet_name: str = "plate_groups"
    """Name of the sheet containing plate group mappings."""

    normalization_method: NormalizationMethod = NormalizationMethod.FOLD_CHANGE
    """Normalization method for control-based normalization."""

    export_raw_results: bool = True
    """Whether to export raw (non-normalized) results."""

    export_heatmaps: bool = True
    """Whether to generate heatmap visualizations."""

    auto_detect_format: bool = True
    """Whether to automatically detect microscope format."""

    default_format: Optional[MicroscopeFormat] = None
    """Default format to use if auto-detection fails."""

Configuration Features:

- Enum-based type safety: Normalization methods and formats use enums to prevent invalid values
- Configurable sheet names: Excel sheet names can be customized for different workflows
- Automatic format detection: System can detect CX5 vs MetaXpress automatically
- Flexible export options: Control which outputs are generated
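
The practical effect of a frozen dataclass with enum fields can be shown with a reduced copy of the configuration. AnalysisConfigSketch below is a hypothetical three-field stand-in, not the real ExperimentalAnalysisConfig; the behavior shown comes from the standard dataclasses and enum modules.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from enum import Enum


class NormalizationMethod(Enum):
    FOLD_CHANGE = "fold_change"
    Z_SCORE = "z_score"
    PERCENT_CONTROL = "percent_control"


@dataclass(frozen=True)
class AnalysisConfigSketch:
    """Reduced, hypothetical version of the configuration shown above."""

    design_sheet_name: str = "drug_curve_map"
    normalization_method: NormalizationMethod = NormalizationMethod.FOLD_CHANGE
    export_heatmaps: bool = True


base = AnalysisConfigSketch()

# Frozen instances are never edited in place; replace() derives a modified copy
z_config = replace(
    base,
    normalization_method=NormalizationMethod("z_score"),  # lookup by value
    export_heatmaps=False,
)

try:
    base.export_heatmaps = False  # type: ignore[misc]
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

try:
    NormalizationMethod("log2")  # not a defined method
except ValueError:
    invalid_rejected = True
else:
    invalid_rejected = False
```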

## Performance Characteristics

### Memory Efficiency

- Lazy loading: Results loaded on-demand to minimize memory usage
- Chunked processing: Large datasets processed in chunks
- Efficient data structures: Optimized pandas DataFrames for statistical operations

### Scalability

- Multi-plate support: Handles experiments across multiple physical plates
- Variable replicate numbers: Supports any number of biological replicates
- Flexible condition numbers: No limit on experimental conditions per plate

### Statistical Robustness

- Outlier detection: Automatic identification of statistical outliers
- Missing data handling: Robust handling of missing wells or failed measurements
- Quality control metrics: Automatic calculation of assay quality metrics (Z-factor, etc.)
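
For reference, the Z-factor mentioned above is commonly computed from positive and negative control wells as 1 - 3(sd_pos + sd_neg) / |mean_pos - mean_neg|. The sketch below implements that standard definition; the z_factor function and the choice of sample standard deviation are assumptions, and which wells OpenHCS treats as controls is not specified here.

```python
from statistics import mean, stdev
from typing import Sequence


def z_factor(positive: Sequence[float], negative: Sequence[float]) -> float:
    """Assay quality metric: values closer to 1 mean better control separation."""
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("Control means are identical; Z-factor is undefined")
    # Assumption: sample standard deviation of each control group
    return 1 - 3 * (stdev(positive) + stdev(negative)) / separation


# Made-up control readings
positive_controls = [90, 100, 110]   # mean 100, sd 10
negative_controls = [8, 10, 12]      # mean 10, sd 2
quality = z_factor(positive_controls, negative_controls)
```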

The experimental analysis system supports high-content screening workflows from experimental design through statistical analysis and visualization, with the aim of reproducible analysis for multi-condition, multi-replicate experiments.