# Experimental Analysis System
OpenHCS provides a comprehensive experimental analysis system for processing high-content screening data from ThermoFisher CX5 and MetaXpress systems, with support for complex experimental designs, replicate management, and statistical analysis.
## System Overview
The experimental analysis system handles the complete workflow from experimental design configuration to statistical analysis and visualization:
- Configuration parsing: Excel-based experimental design definition
- Data ingestion: Support for CX5 and MetaXpress result formats
- Replicate management: Biological and technical replicate handling
- Statistical analysis: Control-based normalization and dose-response analysis
- Result export: Compiled results and heatmap visualization
## Architecture Components
### Modern Registry-Based Architecture
The experimental analysis system uses a registry pattern to eliminate code duplication and provide a unified interface for multiple microscope formats.
- openhcs.processing.backends.experimental_analysis.unified_analysis_engine: Main entry point. The ExperimentalAnalysisEngine class provides unified analysis with automatic format detection.
- openhcs.processing.backends.experimental_analysis.format_registry_service: FormatRegistryService, automatic discovery and management of format handlers.
- openhcs.processing.backends.experimental_analysis.format_registry: MicroscopeFormatRegistryBase, the abstract base class defining the registry interface.
- openhcs.processing.backends.experimental_analysis.cx5_registry: CX5FormatRegistry, the ThermoFisher CX5 format implementation.
- openhcs.processing.backends.experimental_analysis.metaxpress_registry: MetaXpressFormatRegistry, the Molecular Devices MetaXpress format implementation.
### Legacy Modules (Deprecated)
- openhcs.formats.experimental_analysis: Legacy analysis functions. Use ExperimentalAnalysisEngine for new code.
- openhcs.formats.metaxpress: Legacy MetaXpress support, now integrated into the registry system.
### Registry System Architecture
The registry system has three parts: the analysis engine, the registry service that discovers format handlers, and the abstract registry base class that each format implements.
#### ExperimentalAnalysisEngine
The main entry point for experimental analysis:

    from openhcs.processing.backends.experimental_analysis import ExperimentalAnalysisEngine
    # NormalizationMethod is assumed to be importable alongside ExperimentalAnalysisConfig
    from openhcs.core.config import ExperimentalAnalysisConfig, NormalizationMethod

    # Create configuration
    config = ExperimentalAnalysisConfig(
        normalization_method=NormalizationMethod.FOLD_CHANGE,
        export_heatmaps=True,
        auto_detect_format=True
    )

    # Initialize engine
    engine = ExperimentalAnalysisEngine(config)

    # Run analysis with automatic format detection
    results = engine.run_analysis(
        results_path="microscope_results.xlsx",
        config_file="config.xlsx",
        compiled_results_path="compiled_results.xlsx",
        heatmap_path="heatmaps.xlsx"
    )
#### FormatRegistryService
Automatic discovery and management of format handlers:

    from openhcs.processing.backends.experimental_analysis import FormatRegistryService

    # Get all available formats
    registries = FormatRegistryService.get_all_format_registries()
    # Returns: {'EDDU_CX5': CX5FormatRegistry, 'EDDU_metaxpress': MetaXpressFormatRegistry}

    # Get specific format handler
    cx5_registry = FormatRegistryService.get_registry_instance_for_format('EDDU_CX5')

    # Automatic format detection from file
    format_name = FormatRegistryService.detect_format_from_file('results.xlsx')
Discovery Mechanism:
- Uses pkgutil.walk_packages to find all registry implementations
- No hardcoded imports required
- Automatically registers new format handlers
- Caches registry instances for performance
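The discovery step can be illustrated with a minimal, self-contained sketch. It is not the OpenHCS implementation: `discover_registries`, the `demo_formats` package, and the `RegistryBase` class are hypothetical names created only for this example, and instance caching is left out.

```python
import importlib
import inspect
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_registries(package, base_cls):
    """Import every module under `package` and collect subclasses of `base_cls`."""
    found = {}
    prefix = package.__name__ + "."
    for module_info in pkgutil.walk_packages(package.__path__, prefix):
        module = importlib.import_module(module_info.name)
        for _, obj in inspect.getmembers(module, inspect.isclass):
            # Skip the base class itself; keep every subclass keyed by format name.
            if issubclass(obj, base_cls) and obj is not base_cls:
                found[obj.FORMAT_NAME] = obj
    return found


# Build a throwaway package on disk so the sketch runs without OpenHCS installed.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_formats"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "base.py").write_text(
    "class RegistryBase:\n"
    "    FORMAT_NAME = ''\n"
)
(pkg_dir / "cx5.py").write_text(
    "from .base import RegistryBase\n"
    "class CX5Registry(RegistryBase):\n"
    "    FORMAT_NAME = 'EDDU_CX5'\n"
)
(pkg_dir / "metaxpress.py").write_text(
    "from .base import RegistryBase\n"
    "class MetaXpressRegistry(RegistryBase):\n"
    "    FORMAT_NAME = 'EDDU_metaxpress'\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_formats = importlib.import_module("demo_formats")
base_cls = importlib.import_module("demo_formats.base").RegistryBase

registries = discover_registries(demo_formats, base_cls)
print(sorted(registries))
```

Dropping a new module with a `RegistryBase` subclass into the package is enough for it to appear in the result, which is the property the list above describes as requiring no hardcoded imports.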
#### MicroscopeFormatRegistryBase
Abstract base class defining the registry interface:

    from abc import ABC, abstractmethod
    from typing import Dict, List, Optional, Tuple

    import pandas as pd


    class MicroscopeFormatRegistryBase(ABC):
        FORMAT_NAME: str
        SHEET_NAME: Optional[str]
        SUPPORTED_EXTENSIONS: Tuple[str, ...]

        @abstractmethod
        def extract_features(self, raw_df: pd.DataFrame) -> List[str]:
            """Extract feature column names from raw data."""

        @abstractmethod
        def extract_plate_names(self, raw_df: pd.DataFrame) -> List[str]:
            """Extract plate identifiers from raw data."""

        @abstractmethod
        def create_plates_dict(self, raw_df: pd.DataFrame) -> Dict:
            """Create nested dictionary structure for plate data."""

        def process_data(self, results_path: str) -> Dict:
            """Complete data processing pipeline."""
Registry Pattern Benefits:
- Single interface for all formats
- Format-specific logic isolated in subclasses
- Easy to add new formats
- Testable and maintainable
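To make the "easy to add new formats" point concrete, here is a minimal sketch of the pattern. `FormatRegistryBase` is a simplified stand-in for MicroscopeFormatRegistryBase that works on plain row dictionaries instead of DataFrames, and `DemoCsvRegistry`, `summarize`, and the `demo_csv` format are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import List


class FormatRegistryBase(ABC):
    """Simplified stand-in for the registry base class (rows instead of DataFrames)."""

    FORMAT_NAME: str

    @abstractmethod
    def extract_features(self, rows: List[dict]) -> List[str]:
        """Return the feature column names."""

    @abstractmethod
    def extract_plate_names(self, rows: List[dict]) -> List[str]:
        """Return the plate identifiers."""

    def summarize(self, rows: List[dict]) -> dict:
        """Shared logic lives in the base class and calls the format-specific hooks."""
        return {
            "format": self.FORMAT_NAME,
            "features": self.extract_features(rows),
            "plates": self.extract_plate_names(rows),
        }


class DemoCsvRegistry(FormatRegistryBase):
    """Hypothetical new format: flat rows keyed by 'plate' and 'well'."""

    FORMAT_NAME = "demo_csv"

    def extract_features(self, rows):
        # Every column that is not an identifier is treated as a feature.
        return [key for key in rows[0] if key not in ("plate", "well")]

    def extract_plate_names(self, rows):
        # Preserve first-seen order while removing duplicates.
        return list(dict.fromkeys(row["plate"] for row in rows))


rows = [
    {"plate": "P1", "well": "A01", "cell_count": 120, "mean_area": 35.2},
    {"plate": "P1", "well": "A02", "cell_count": 98, "mean_area": 33.9},
    {"plate": "P2", "well": "A01", "cell_count": 110, "mean_area": 36.1},
]
summary = DemoCsvRegistry().summarize(rows)
print(summary)
```

Only the two hook methods differ per format; the shared `summarize` step never needs to know which microscope produced the data.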
### Excel Configuration Files
The system uses Excel-based configuration files with a structured format. The outline below shows the values the parser produces; the row-by-row parsing logic is omitted:

    import pandas as pd


    def read_plate_layout(config_path, design_sheet_name='drug_curve_map'):
        """Parse experimental configuration from Excel file."""
        xls = pd.ExcelFile(config_path)
        # Sheet name is configurable via ExperimentalAnalysisConfig.design_sheet_name
        df = pd.read_excel(xls, design_sheet_name, index_col=0, header=None)

        # Global parameters
        N = None  # Number of biological replicates
        scope = None  # Microscope format (EDDU_CX5, EDDU_metaxpress)

        # Experimental conditions
        conditions = []  # List of experimental conditions
        layout = {}  # Condition-to-wells mapping

        # Control definitions
        ctrl_positions = None  # Control well positions for normalization

        # (Parsing of df that fills the variables above is omitted.)
        return scope, layout, conditions, ctrl_positions
Configuration Structure:
- Global parameters: N (replicates), Scope (microscope format)
- Control definitions: Control wells for normalization
- Condition blocks: Experimental conditions with dose-response mapping
- Plate groups: Biological replicate to physical plate mapping (configurable via plate_groups_sheet_name)
### Data Processing Pipeline
#### Phase 1: Configuration Parsing
    # Parse experimental design
    scope, plate_layout, conditions, ctrl_positions = read_plate_layout(config_file)
    plate_groups = load_plate_groups(config_file)

    # Create experiment location mapping
    experiment_dict_locations = make_experiment_dict_locations(
        plate_groups, plate_layout, conditions
    )
Output: Structured mapping of conditions → replicates → doses → wells
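The shape of that mapping can be pictured with a small hand-built example. The input layouts, the `build_locations` helper, and the choice to store each location as a `(plate, well)` pair are assumptions made for illustration, not the formats OpenHCS parses.

```python
# Hypothetical inputs: which physical plate holds each biological replicate,
# and which wells hold each dose of each condition.
plate_groups = {"N1": "plate_A", "N2": "plate_B"}
plate_layout = {
    "drug_X": {0.1: ["B02", "B03"], 1.0: ["C02", "C03"]},
    "drug_Y": {0.1: ["D02", "D03"], 1.0: ["E02", "E03"]},
}


def build_locations(plate_groups, plate_layout):
    """Return {condition: {replicate: {dose: [(plate, well), ...]}}}."""
    locations = {}
    for condition, dose_wells in plate_layout.items():
        locations[condition] = {}
        for replicate, plate in plate_groups.items():
            # Each replicate reuses the same well layout on its own plate.
            locations[condition][replicate] = {
                dose: [(plate, well) for well in wells]
                for dose, wells in dose_wells.items()
            }
    return locations


locations = build_locations(plate_groups, plate_layout)
print(locations["drug_X"]["N2"][1.0])
```

Two wells per dose in this example correspond to technical replicates, while the `N1` and `N2` keys correspond to biological replicates.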
#### Phase 2: Data Ingestion
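    import pandas as pd


    def read_results(results_path, scope=None):
        """Read results from microscope-specific Excel format."""
        xls = pd.ExcelFile(results_path)
        if scope == "EDDU_CX5":
            raw_df = pd.read_excel(xls, 'Rawdata')
        elif scope == "EDDU_metaxpress":
            raw_df = pd.read_excel(xls, xls.sheet_names[0])
        else:
            # Without this branch, raw_df would be unbound for unknown formats
            raise ValueError(f"Unsupported microscope format: {scope!r}")
        return raw_df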
Format Support:
- CX5 format: ThermoFisher CX5 'Rawdata' sheet structure
- MetaXpress format: Molecular Devices MetaXpress export format
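One way automatic detection could distinguish the two formats is by looking at sheet names. The heuristic below is an assumption inferred from the sheet handling described above, not the documented behavior of FormatRegistryService.detect_format_from_file, and `detect_format` is a hypothetical helper.

```python
from typing import Optional, Sequence


def detect_format(sheet_names: Sequence[str]) -> Optional[str]:
    """Guess the microscope format from workbook sheet names (assumed heuristic)."""
    if not sheet_names:
        return None  # Empty workbook: nothing to detect
    if "Rawdata" in sheet_names:
        return "EDDU_CX5"  # CX5 results are read from a 'Rawdata' sheet
    # MetaXpress results are read from the first sheet, whatever its name,
    # so this sketch treats it as the fallback.
    return "EDDU_metaxpress"


print(detect_format(["Summary", "Rawdata"]))
print(detect_format(["Sheet1"]))
```

With pandas, the sheet names of a workbook are available as `pd.ExcelFile(path).sheet_names`, so the helper can be fed directly from a results file.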
#### Phase 3: Data Structure Creation
    # Create well-based data structures
    well_dict = create_well_dict(df, scope=scope)
    plates_dict = create_plates_dict(df, scope=scope)
    plates_dict = fill_plates_dict(df, plates_dict, scope=scope)
Data Structures:
- well_dict: {well: {feature: value}} - Well-centric feature mapping
- plates_dict: {plate: {well: {feature: value}}} - Plate-centric organization
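A small sketch of how these two shapes relate, using hypothetical rows in place of a parsed results sheet (the column names and values are made up for illustration):

```python
from collections import defaultdict

# Hypothetical flattened measurements: one row per plate/well.
rows = [
    {"plate": "plate_A", "well": "B02", "cell_count": 120, "mean_area": 35.2},
    {"plate": "plate_A", "well": "B03", "cell_count": 98, "mean_area": 33.9},
    {"plate": "plate_B", "well": "B02", "cell_count": 110, "mean_area": 36.1},
]
features = ["cell_count", "mean_area"]

plates_dict = defaultdict(dict)
for row in rows:
    # plates_dict[plate][well] = {feature: value}
    plates_dict[row["plate"]][row["well"]] = {name: row[name] for name in features}

# A single plate's entry has the well_dict shape: {well: {feature: value}}
well_dict = plates_dict["plate_A"]
print(well_dict["B03"]["cell_count"])
```

Keying by plate first keeps identical well names on different plates (here `B02`) from overwriting each other.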
#### Phase 4: Experimental Data Mapping
    # Map experimental design to measured values
    experiment_dict_values = make_experiment_dict_values(
        plates_dict, experiment_dict_locations, features
    )
Output: experiment_dict[condition][replicate][dose] = {feature: [values]}
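The mapping step amounts to a nested lookup. The sketch below assumes each location is stored as a `(plate, well)` pair; `collect_values` is a hypothetical helper, not the signature of make_experiment_dict_values.

```python
def collect_values(plates_dict, locations, features):
    """Return {condition: {replicate: {dose: {feature: [values]}}}}."""
    values = {}
    for condition, replicates in locations.items():
        values[condition] = {}
        for replicate, doses in replicates.items():
            values[condition][replicate] = {}
            for dose, plate_wells in doses.items():
                # One list entry per well, so technical replicates stay separate.
                values[condition][replicate][dose] = {
                    feature: [plates_dict[plate][well][feature] for plate, well in plate_wells]
                    for feature in features
                }
    return values


plates_dict = {
    "plate_A": {"B02": {"cell_count": 120}, "B03": {"cell_count": 98}},
}
locations = {"drug_X": {"N1": {0.1: [("plate_A", "B02"), ("plate_A", "B03")]}}}

experiment_values = collect_values(plates_dict, locations, ["cell_count"])
print(experiment_values["drug_X"]["N1"][0.1])
```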
#### Phase 5: Statistical Analysis
    # Control-based normalization
    if ctrl_positions is not None:
        experiment_dict_values = normalize_experiment(
            experiment_dict_values, ctrl_positions, features, plates_dict
        )

    # Generate feature tables
    feature_tables = create_all_feature_tables(experiment_dict_values, features)
### Replicate Management System
#### Biological Replicates
The system handles multiple biological replicates (N1, N2, N3, etc.) with automatic aggregation:

    def make_experiment_dict_locations(plate_groups, plate_layout, conditions):
        """Create mapping of experimental conditions to well locations."""
        # N (replicate count), doses, and get_wells_for_replicate_dose are assumed
        # to come from the parsed configuration; their lookup is omitted here.
        experiment_dict = {}
        for condition in conditions:
            experiment_dict[condition] = {}
            for replicate in range(1, N + 1):  # N biological replicates
                replicate_key = f"N{replicate}"
                experiment_dict[condition][replicate_key] = {}
                # Map doses to wells for this replicate
                for dose_idx, dose in enumerate(doses):
                    wells = get_wells_for_replicate_dose(condition, replicate, dose_idx)
                    experiment_dict[condition][replicate_key][dose] = wells
        return experiment_dict
#### Technical Replicates
Technical replicates (multiple wells per condition/dose) are automatically detected and averaged:

    import numpy as np


    def process_technical_replicates(experiment_dict_values):
        """Average technical replicates within each condition/dose."""
        for condition in experiment_dict_values:
            for replicate in experiment_dict_values[condition]:
                for dose in experiment_dict_values[condition][replicate]:
                    # Multiple wells = technical replicates
                    well_values = experiment_dict_values[condition][replicate][dose]
                    if len(well_values) > 1:
                        # Average technical replicates
                        averaged_values = np.mean(well_values, axis=0)
                        experiment_dict_values[condition][replicate][dose] = averaged_values
        return experiment_dict_values
### Normalization System
#### Control-Based Normalization
The system supports control-based normalization to correct for plate-to-plate variation:

    def normalize_experiment(experiment_dict_values, ctrl_positions, features, plates_dict):
        """Normalize experimental values using control wells."""
        # Calculate control statistics
        control_stats = calculate_control_statistics(ctrl_positions, plates_dict, features)

        # Normalize each experimental condition
        for condition in experiment_dict_values:
            for replicate in experiment_dict_values[condition]:
                for dose in experiment_dict_values[condition][replicate]:
                    normalized_values = normalize_to_controls(
                        experiment_dict_values[condition][replicate][dose],
                        control_stats,
                        features
                    )
                    experiment_dict_values[condition][replicate][dose] = normalized_values
        # Return the normalized mapping so callers can reassign it
        return experiment_dict_values
Normalization Methods (configured via NormalizationMethod):
- FOLD_CHANGE: value / control_mean (default)
- Z_SCORE: (value - control_mean) / control_std
- PERCENT_CONTROL: (value / control_mean) * 100
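The three formulas can be checked with a short worked example. The `normalize` helper and the control values are hypothetical, and the use of the sample standard deviation (n - 1) for the z-score is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, stdev


def normalize(value, control_values, method="fold_change"):
    """Apply one of the three control-based normalization formulas."""
    control_mean = mean(control_values)
    if method == "fold_change":
        return value / control_mean
    if method == "z_score":
        # Sample standard deviation (n - 1); this choice is an assumption.
        return (value - control_mean) / stdev(control_values)
    if method == "percent_control":
        return value / control_mean * 100
    raise ValueError(f"Unknown normalization method: {method}")


controls = [90.0, 100.0, 110.0]  # control mean 100, sample std 10
fold = normalize(150.0, controls, "fold_change")
z = normalize(150.0, controls, "z_score")
pct = normalize(150.0, controls, "percent_control")
print(fold, z, pct)
```

For a well reading 150 against these controls, the fold change is 1.5, the z-score is 5, and the percent of control is 150.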
### Feature Extraction System
#### Microscope-Specific Feature Extraction
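    import pandas as pd


    def get_features(raw_df, scope=None):
        """Extract feature columns based on microscope format."""
        if scope == "EDDU_CX5":
            return get_features_EDDU_CX5(raw_df)
        elif scope == "EDDU_metaxpress":
            return get_features_EDDU_metaxpress(raw_df)
        # Fail loudly instead of silently returning None for unknown formats
        raise ValueError(f"Unsupported microscope format: {scope!r}")


    def get_features_EDDU_CX5(raw_df):
        """Extract features from CX5 format."""
        # Features are the columns after the "Replicate" column, excluding the last column
        return raw_df.iloc[:, raw_df.columns.str.find("Replicate").argmax()+1:-1].columns


    def get_features_EDDU_metaxpress(raw_df):
        """Extract features from MetaXpress format."""
        # The first row with an empty first cell holds the feature names (from column 3 on)
        feature_rows = raw_df[pd.isnull(raw_df.iloc[:,0])].iloc[0].tolist()[2:]
        return feature_rows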
Feature Types:
- Cell count metrics: Total cells, viable cells, dead cells
- Morphological features: Cell area, perimeter, circularity, eccentricity
- Intensity measurements: Mean, median, standard deviation per channel
- Texture features: Contrast, correlation, energy, homogeneity
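The CX5 column slice is easier to follow on a plain list of column names. This sketch assumes exactly one column name contains "Replicate"; the column names and the `cx5_feature_columns` helper are hypothetical.

```python
from typing import List


def cx5_feature_columns(columns: List[str]) -> List[str]:
    """Return the columns between the 'Replicate' column and the final column."""
    # Assumes exactly one column name contains "Replicate".
    replicate_idx = next(i for i, name in enumerate(columns) if "Replicate" in name)
    # Features start right after it; the last column is excluded.
    return columns[replicate_idx + 1:-1]


columns = ["Plate", "Well", "Replicate", "CellCount", "MeanArea", "Notes"]
features = cx5_feature_columns(columns)
print(features)
```

Everything left of and including the "Replicate" column is treated as metadata, and the trailing column is dropped as well.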
### Export System
#### Result Compilation
    def create_all_feature_tables(experiment_dict_values, features):
        """Create feature-specific tables for export."""
        feature_tables = {}
        for feature in features:
            feature_table = create_feature_table(experiment_dict_values, feature)
            feature_tables[feature] = feature_table
        return feature_tables
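What a single feature table might contain can be sketched as follows. The layout (one row per dose, one column per condition/replicate pair, technical replicates averaged) is an assumption for illustration, and `build_feature_table` is a hypothetical stand-in for create_feature_table.

```python
from statistics import mean


def build_feature_table(experiment_values, feature):
    """Return (header, rows): one row per dose, one column per condition/replicate."""
    columns = [
        (condition, replicate)
        for condition, replicates in experiment_values.items()
        for replicate in replicates
    ]
    doses = sorted({
        dose
        for replicates in experiment_values.values()
        for dose_map in replicates.values()
        for dose in dose_map
    })
    header = ["dose"] + [f"{condition}_{replicate}" for condition, replicate in columns]
    rows = []
    for dose in doses:
        row = [dose]
        for condition, replicate in columns:
            cell = experiment_values[condition][replicate].get(dose)
            # Average technical replicates; leave a gap where the dose is absent.
            row.append(mean(cell[feature]) if cell else None)
        rows.append(row)
    return header, rows


experiment_values = {
    "drug_X": {
        "N1": {0.1: {"cell_count": [120, 100]}, 1.0: {"cell_count": [60, 80]}},
        "N2": {0.1: {"cell_count": [110, 90]}, 1.0: {"cell_count": [50, 70]}},
    },
}
header, rows = build_feature_table(experiment_values, "cell_count")
print(header)
print(rows)
```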
#### Excel Export with Heatmaps
    import pandas as pd


    def export_results_with_heatmaps(feature_tables, output_path):
        """Export results with integrated heatmap visualization."""
        with pd.ExcelWriter(output_path, engine='xlsxwriter') as writer:
            for feature_name, feature_table in feature_tables.items():
                # Write data table (one sheet per feature)
                feature_table.to_excel(writer, sheet_name=feature_name)
                # Generate heatmap
                create_heatmap_visualization(feature_table, writer, feature_name)
## Integration Points
### Pipeline Integration
The experimental analysis system integrates with OpenHCS pipelines through the analysis consolidation system:

    # Integration with analysis consolidation
    from openhcs.processing.backends.analysis.consolidate_analysis_results import (
        consolidate_analysis_results_pipeline
    )

    # Experimental analysis can feed into consolidation
    consolidated_results = consolidate_analysis_results_pipeline(
        image_stack=processed_images,
        results_directory=experimental_results_dir,
        consolidation_config=AnalysisConsolidationConfig(),
        plate_metadata_config=PlateMetadataConfig()
    )
### Configuration System Integration
The experimental analysis system can be configured through the global configuration system:

    from enum import Enum
    from dataclasses import dataclass
    from typing import Optional


    class NormalizationMethod(Enum):
        """Normalization methods for experimental analysis."""
        FOLD_CHANGE = "fold_change"  # value / control_mean
        Z_SCORE = "z_score"  # (value - control_mean) / control_std
        PERCENT_CONTROL = "percent_control"  # (value / control_mean) * 100


    class MicroscopeFormat(Enum):
        """Supported microscope formats for experimental analysis."""
        EDDU_CX5 = "EDDU_CX5"  # ThermoFisher CX5 format
        EDDU_METAXPRESS = "EDDU_metaxpress"  # Molecular Devices MetaXpress format


    @dataclass(frozen=True)
    class ExperimentalAnalysisConfig:
        """Configuration for experimental analysis system."""

        config_file_name: str = "config.xlsx"
        """Name of the experimental configuration Excel file."""

        design_sheet_name: str = "drug_curve_map"
        """Name of the sheet containing experimental design."""

        plate_groups_sheet_name: str = "plate_groups"
        """Name of the sheet containing plate group mappings."""

        normalization_method: NormalizationMethod = NormalizationMethod.FOLD_CHANGE
        """Normalization method for control-based normalization."""

        export_raw_results: bool = True
        """Whether to export raw (non-normalized) results."""

        export_heatmaps: bool = True
        """Whether to generate heatmap visualizations."""

        auto_detect_format: bool = True
        """Whether to automatically detect microscope format."""

        default_format: Optional[MicroscopeFormat] = None
        """Default format to use if auto-detection fails."""
Configuration Features:
- Enum-based type safety: Normalization methods and formats use enums to prevent invalid values
- Configurable sheet names: Excel sheet names can be customized for different workflows
- Automatic format detection: System can detect CX5 vs MetaXpress automatically
- Flexible export options: Control which outputs are generated
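The practical effect of the enum and frozen-dataclass choices can be seen in a short sketch. `DemoConfig` is a trimmed, hypothetical stand-in for ExperimentalAnalysisConfig so the example runs without OpenHCS installed.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from enum import Enum


class NormalizationMethod(Enum):
    FOLD_CHANGE = "fold_change"
    Z_SCORE = "z_score"
    PERCENT_CONTROL = "percent_control"


@dataclass(frozen=True)
class DemoConfig:
    """Trimmed stand-in for ExperimentalAnalysisConfig."""
    design_sheet_name: str = "drug_curve_map"
    normalization_method: NormalizationMethod = NormalizationMethod.FOLD_CHANGE
    export_heatmaps: bool = True


base = DemoConfig()

# Frozen dataclasses are immutable, so variations are derived with replace().
z_config = replace(base, normalization_method=NormalizationMethod.Z_SCORE, export_heatmaps=False)

# Enum lookup by value rejects strings that are not valid methods.
parsed = NormalizationMethod("percent_control")
try:
    NormalizationMethod("log2")
    rejected = False
except ValueError:
    rejected = True

# Direct assignment on a frozen instance raises instead of silently changing state.
try:
    base.export_heatmaps = False
    mutated = True
except FrozenInstanceError:
    mutated = False

print(z_config, parsed, rejected, mutated)
```

Deriving variants with `replace` leaves the original configuration untouched, which makes it safe to share one base configuration across several analysis runs.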
## Performance Characteristics
### Memory Efficiency
- Lazy loading: Results loaded on-demand to minimize memory usage
- Chunked processing: Large datasets processed in chunks
- Efficient data structures: Optimized pandas DataFrames for statistical operations
### Scalability
- Multi-plate support: Handles experiments across multiple physical plates
- Variable replicate numbers: Supports any number of biological replicates
- Flexible condition numbers: No limit on experimental conditions per plate
### Statistical Robustness
- Outlier detection: Automatic identification of statistical outliers
- Missing data handling: Robust handling of missing wells or failed measurements
- Quality control metrics: Automatic calculation of assay quality metrics (Z-factor, etc.)
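As an illustration of the quality metric named above, the sketch below computes the control-based form of the Z-factor (often written Z'), defined as 1 - 3(σ_pos + σ_neg) / |μ_pos - μ_neg|. The control readings and the `z_prime_factor` helper are hypothetical, and the text does not state which variant or which standard deviation estimator the system uses; the sample standard deviation is assumed here.

```python
from statistics import mean, stdev


def z_prime_factor(positive_controls, negative_controls):
    """Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(positive_controls) - mean(negative_controls))
    if separation == 0:
        raise ValueError("Control means are identical; the Z'-factor is undefined")
    # Sample standard deviations (n - 1); this choice is an assumption.
    spread = 3 * (stdev(positive_controls) + stdev(negative_controls))
    return 1 - spread / separation


positive = [95.0, 100.0, 105.0]  # mean 100, sample std 5
negative = [8.0, 10.0, 12.0]     # mean 10, sample std 2
z_prime = z_prime_factor(positive, negative)
print(round(z_prime, 3))
```

Values above 0.5 are conventionally read as a wide separation between the control populations; here the result is about 0.767.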
Together, these components cover the high-content screening workflow from experimental design through statistical analysis and visualization, giving multi-condition, multi-replicate experiments a single reproducible analysis path.