Data Dimensions

High-content screening (HCS) data has multiple dimensions: wells, sites, channels, Z-planes, and timepoints. OpenHCS organizes processing across these dimensions through two step parameters, variable_components and group_by.

Understanding Microscopy Data Dimensions

Typical HCS data structure:

Plate/
├── A01_s1_w1.tif    # Well A01, Site 1, Channel 1
├── A01_s1_w2.tif    # Well A01, Site 1, Channel 2
├── A01_s2_w1.tif    # Well A01, Site 2, Channel 1
├── A01_s2_w2.tif    # Well A01, Site 2, Channel 2
├── A02_s1_w1.tif    # Well A02, Site 1, Channel 1
└── ...

Dimensions:

  • Well: Sample position (A01, A02, B01, etc.) - represents experimental conditions

  • Site: Imaging position within a well (1, 2, 3, etc.) - multiple fields of view

  • Channel: Fluorescence channel (1, 2, 3, etc.) - different markers or wavelengths

  • Z-Index: Z-plane depth (1, 2, 3, etc.) - for 3D imaging

  • Timepoint: Time series point (1, 2, 3, etc.) - for live imaging
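To make the mapping from file names to dimensions concrete, here is a minimal sketch that parses names of the form shown in the listing above. The regular expression and the parse_filename helper are hypothetical and written only for this illustration; they are not the parser OpenHCS uses, and the optional _t suffix assumes the ImageXpress-style timepoint token.

```python
import re

# Hypothetical pattern for names such as "A01_s1_w1.tif", with an optional
# "_t001"-style timepoint suffix. An assumption for illustration only,
# not the OpenHCS filename parser.
FILENAME_RE = re.compile(
    r"^(?P<well>[A-Z]\d{2})"
    r"_s(?P<site>\d+)"
    r"_w(?P<channel>\d+)"
    r"(?:_t(?P<timepoint>\d+))?"
    r"\.tif$"
)


def parse_filename(name):
    """Return the dimension values encoded in a file name, or None if it does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    # Keep only the dimensions that are present in the name.
    return {dim: value for dim, value in match.groupdict().items() if value is not None}


print(parse_filename("A01_s1_w2.tif"))
# {'well': 'A01', 'site': '1', 'channel': '2'}
```

Values are kept as strings, so a zero-padded token such as 001 is preserved as written in the file name.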

Variable Components

Variable components tell OpenHCS how to group files for processing. They define which dimensions vary within each processing group.

from openhcs.constants.constants import VariableComponents

# Available variable components
VariableComponents.SITE       # Process each site separately
VariableComponents.CHANNEL    # Process each channel separately
VariableComponents.Z_INDEX    # Process each Z-plane separately
VariableComponents.TIMEPOINT  # Process each timepoint separately
VariableComponents.WELL       # Process each well separately

Most Common: Process Each Site Separately

from openhcs.constants.constants import VariableComponents
from openhcs.core.steps.function_step import FunctionStep

# Process each site independently (most common pattern)
step = FunctionStep(
    func=(normalize_images, {}),
    variable_components=[VariableComponents.SITE],
    name="normalize"
)

What this does: Groups files by (well, channel, z_index) and processes each site separately.

Example grouping:

  • Group 1: [A01_s1_w1.tif, A01_s1_w2.tif] → Process site 1 of well A01

  • Group 2: [A01_s2_w1.tif, A01_s2_w2.tif] → Process site 2 of well A01

  • Group 3: [A02_s1_w1.tif, A02_s1_w2.tif] → Process site 1 of well A02
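The example grouping above can be reproduced in plain Python by keying each file on its (well, site) prefix. This sketch only illustrates the listed groups; the group_by_well_and_site helper is hypothetical, and how OpenHCS builds its groups internally is not shown here.

```python
from collections import defaultdict


def group_by_well_and_site(filenames):
    """Group names like 'A01_s1_w1.tif' by their (well, site) tokens."""
    groups = defaultdict(list)
    for name in filenames:
        stem = name.rsplit(".", 1)[0]          # "A01_s1_w1"
        well, site, _channel = stem.split("_")  # "A01", "s1", "w1"
        groups[(well, site)].append(name)
    return dict(groups)


files = [
    "A01_s1_w1.tif", "A01_s1_w2.tif",
    "A01_s2_w1.tif", "A01_s2_w2.tif",
    "A02_s1_w1.tif", "A02_s1_w2.tif",
]
site_groups = group_by_well_and_site(files)
for key, members in site_groups.items():
    print(key, members)
```

Each printed group holds every channel file for one imaging position, matching Groups 1 to 3 in the example.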

When to use: Most image processing operations (filtering, segmentation, analysis) that work on complete images from one imaging position.

Process Each Channel Separately

# Process each channel independently
step = FunctionStep(
    func=(create_composite, {}),
    variable_components=[VariableComponents.CHANNEL],
    name="composite"
)

What this does: Groups files by (well, site, z_index) and processes each channel separately.

Example grouping:

  • Group 1: [A01_s1_w1.tif, A01_s2_w1.tif] → Process channel 1 across all sites

  • Group 2: [A01_s1_w2.tif, A01_s2_w2.tif] → Process channel 2 across all sites

When to use: Operations that combine data across sites for each channel (creating channel composites, channel-specific normalization).

Process Each Z-Plane Separately

# Process each Z-plane independently for 3D analysis
step = FunctionStep(
    func=(max_projection_across_sites, {}),
    variable_components=[VariableComponents.Z_INDEX],
    name="z_projection"
)

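What this does: Groups files by (well, site, channel) and processes each Z-plane separately.

When to use: 3D imaging where you need to combine or analyze across Z-planes, such as creating maximum projections.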

Process Each Timepoint Separately

# Process each timepoint independently for time series analysis
step = FunctionStep(
    func=(track_cells_over_time, {}),
    variable_components=[VariableComponents.TIMEPOINT],
    name="cell_tracking"
)

What this does: Groups files by (well, site, channel, z_index) and processes each timepoint separately.

Example grouping (ImageXpress format):

  • Group 1: [A01_s1_w1_t001.tif] → Process timepoint 1

  • Group 2: [A01_s1_w1_t002.tif] → Process timepoint 2

  • Group 3: [A01_s1_w1_t003.tif] → Process timepoint 3

When to use: Time series analysis, cell tracking, temporal dynamics studies.

Microscope Format Support:

  • ImageXpress: _t001, _t002, etc. (e.g., A01_s1_w1_t003.tif)

  • Opera Phenix: sk1, sk2, etc. (e.g., r01c01f01p01-ch1sk5.tif)

  • OMERO: Timepoint dimension from OMERO metadata

  • OpenHCS: _t001, _t002, etc. (native format)
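As a rough illustration of the naming conventions listed above, the following sketch pulls the timepoint token out of the two example file names. The patterns and the extract_timepoint helper are assumptions inferred from those examples; they are not the parsers used by the OpenHCS microscope handlers.

```python
import re

# Patterns inferred from the example names in this section (assumptions,
# not the actual handler implementations).
TIMEPOINT_PATTERNS = {
    "imagexpress": re.compile(r"_t(\d+)"),
    "opera_phenix": re.compile(r"sk(\d+)"),
}


def extract_timepoint(filename, microscope):
    """Return the timepoint as an int, or None if the name has no timepoint token."""
    match = TIMEPOINT_PATTERNS[microscope].search(filename)
    return int(match.group(1)) if match else None


print(extract_timepoint("A01_s1_w1_t003.tif", "imagexpress"))        # 3
print(extract_timepoint("r01c01f01p01-ch1sk5.tif", "opera_phenix"))  # 5
print(extract_timepoint("A01_s1_w1.tif", "imagexpress"))             # None
```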

Accessing Timepoint Values:

All microscope handlers provide get_timepoint_values() to retrieve available timepoints:

from openhcs.microscopes import get_microscope_handler

handler = get_microscope_handler('/path/to/plate', microscope_type='imagexpress')

# Get all timepoints for a specific well
timepoints = handler.get_timepoint_values(well_id='A01')
# Returns: ['001', '002', '003', ...]

# Timepoint is now a first-class component dimension
# alongside channel, z-index, site, and well

Multiple Variable Components

# Process each site and channel combination separately
step = FunctionStep(
    func=(single_image_analysis, {}),
    variable_components=[VariableComponents.SITE, VariableComponents.CHANNEL],
    name="single_image"
)

What this does: Creates separate groups for each unique combination of site and channel.

When to use: Operations that work on individual images rather than image stacks.

Group By Parameter

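The group_by parameter works with dictionary function patterns to route different data to different functions. Each dictionary key is a value of the group_by component (for example, the channel number as a string), and each dictionary value is the function applied to the matching data.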

from openhcs.constants.constants import GroupBy

# Route different channels to different functions
step = FunctionStep(
    func={
        '1': (analyze_nuclei, {}),    # Channel 1 → nuclei analysis
        '2': (analyze_neurites, {})   # Channel 2 → neurite analysis
    },
    group_by=GroupBy.CHANNEL,
    variable_components=[VariableComponents.SITE]
)

How Group By Works

  1. Data Grouping: Files are first grouped by variable_components

  2. Function Routing: Within each group, data is routed to functions based on group_by

  3. Execution: Each function processes its assigned data

Example with channel routing:

Files: A01_s1_w1.tif, A01_s1_w2.tif

Step 1 - Group by variable_components=[SITE]:
Group: [A01_s1_w1.tif, A01_s1_w2.tif]  # Same site

Step 2 - Route by group_by=CHANNEL:
Channel 1: A01_s1_w1.tif → analyze_nuclei()
Channel 2: A01_s1_w2.tif → analyze_neurites()
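The routing step in this example can be sketched in plain Python. The functions below are stand-ins that only return labels, and channel_of and route_group are hypothetical helpers written for this illustration; none of this is OpenHCS's dispatch code.

```python
def analyze_nuclei(filename):
    # Stand-in for a real nuclei analysis function.
    return f"nuclei:{filename}"


def analyze_neurites(filename):
    # Stand-in for a real neurite analysis function.
    return f"neurites:{filename}"


# Same shape as the dictionary function pattern: channel key -> function.
routes = {"1": analyze_nuclei, "2": analyze_neurites}


def channel_of(filename):
    """Extract the channel key from a name like 'A01_s1_w2.tif' (returns '2')."""
    stem = filename.rsplit(".", 1)[0]
    return stem.split("_w")[1]


def route_group(group, routes):
    """Send each file in a group to the function registered for its channel."""
    return [routes[channel_of(name)](name) for name in group]


results = route_group(["A01_s1_w1.tif", "A01_s1_w2.tif"], routes)
print(results)
# ['nuclei:A01_s1_w1.tif', 'neurites:A01_s1_w2.tif']
```

A file whose channel has no entry in the dictionary raises KeyError in this sketch; how OpenHCS treats unrouted channels is not covered here.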

Available Group By Options

GroupBy.CHANNEL   # Route by channel number (primary use case)

Common Data Organization Patterns

Site-by-Site Processing

Most common pattern for standard image processing:

# Process each imaging site independently
step = FunctionStep(
    func=(segment_cells, {}),
    variable_components=[VariableComponents.SITE],
    name="segmentation"
)

Use cases: Filtering, segmentation, feature extraction, most analysis operations.

Channel-Specific Analysis

Different analysis for different fluorescent markers:

# Different analysis for each channel
step = FunctionStep(
    func={
        '1': (count_nuclei, {}),        # DAPI channel
        '2': (measure_intensity, {}),   # GFP channel
        '3': (detect_structures, {})    # RFP channel
    },
    group_by=GroupBy.CHANNEL,
    variable_components=[VariableComponents.SITE],
    name="channel_analysis"
)

Use cases: Multi-marker experiments where each channel represents different biological features.

Multi-Channel Processing

Different preprocessing chains for different fluorescent markers, with each channel mapped to a list of functions:

# Different preprocessing for different channels
step = FunctionStep(
    func={
        # DAPI preprocessing
        '1': [
            (gaussian_filter, {'sigma': 1.0}),
            (tophat, {'selem_radius': 15}),
        ],
        # GFP preprocessing
        '2': [
            (gaussian_filter, {'sigma': 2.0}),
            (enhance_contrast, {'percentile_range': (1, 99)}),
        ],
    },
    group_by=GroupBy.CHANNEL,
    variable_components=[VariableComponents.SITE],
    name="channel_preprocessing"
)

Use cases: Multi-marker experiments where each channel requires different preprocessing approaches.
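Each dictionary value in the step above is a list of (function, kwargs) pairs. Assuming such a list is applied in order, with each function receiving the output of the previous one (an assumption, since the chain semantics are not spelled out here), the idea can be sketched with stand-in functions. The scale, offset, and apply_chain names are hypothetical.

```python
from functools import reduce


def scale(values, factor=1.0):
    # Stand-in for a real image filter.
    return [v * factor for v in values]


def offset(values, amount=0.0):
    # Stand-in for a second processing stage.
    return [v + amount for v in values]


def apply_chain(data, chain):
    """Apply (function, kwargs) pairs left to right, feeding each result forward."""
    return reduce(lambda acc, step: step[0](acc, **step[1]), chain, data)


chain = [(scale, {"factor": 2.0}), (offset, {"amount": 1.0})]
print(apply_chain([1.0, 2.0, 3.0], chain))
# [3.0, 5.0, 7.0]
```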

Z-Stack Processing

Processing 3D image stacks:

# Combine Z-planes into maximum projection
step = FunctionStep(
    func=(max_projection, {}),
    variable_components=[VariableComponents.Z_INDEX],
    name="z_projection"
)

Use cases: 3D imaging where you need to combine or analyze across Z-planes.
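For readers unfamiliar with the operation, a maximum projection keeps the brightest value at each pixel position across all Z-planes. The sketch below uses nested lists so that it runs without dependencies; max_project_planes is a hypothetical stand-in, not the max_projection function referenced in the step above.

```python
def max_project_planes(stack):
    """Collapse a Z-stack (list of 2D planes) into one 2D image by per-pixel maximum."""
    # zip(*stack) pairs the same row from every plane;
    # zip(*rows) then pairs the same pixel from every plane.
    return [[max(pixels) for pixels in zip(*rows)] for rows in zip(*stack)]


z_stack = [
    [[1, 5], [0, 2]],  # Z-plane 1
    [[4, 3], [7, 1]],  # Z-plane 2
    [[2, 6], [3, 9]],  # Z-plane 3
]
print(max_project_planes(z_stack))
# [[4, 6], [7, 9]]

# With a NumPy array of shape (Z, Y, X), the equivalent is stack.max(axis=0).
```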

Choosing the Right Organization

Consider Your Analysis Goal:

  • Single image operations: Use [SITE, CHANNEL] to process individual images

  • Multi-channel analysis: Use [SITE] with channel-specific functions

  • Cross-site analysis: Use [CHANNEL] to combine data across sites

  • Well-level summaries: Use [WELL] to analyze entire wells

Consider Your Data Structure:

  • 2D images: Typically use [SITE]

  • 3D stacks: May need [Z_INDEX] for projection operations

  • Time series: May need [TIMEPOINT] for temporal analysis

  • Multi-condition: Use group_by=WELL for condition-specific processing

Performance Considerations:

  • Parallel processing: More variable components = more parallel groups

  • Memory usage: Fewer variable components = larger data groups = more memory per group

  • I/O efficiency: Group organization affects how data is loaded and cached
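The trade-off between group count and group size can be estimated with simple arithmetic. This sketch assumes a fully populated plate and one group per well and per unique combination of variable component values, as in the example groupings earlier; the plate dimensions and the group_stats helper are hypothetical.

```python
from math import prod


def group_stats(dim_sizes, group_key_dims):
    """Return (number of groups, files per group) for a fully populated plate."""
    n_groups = prod(dim_sizes[d] for d in group_key_dims)
    n_files = prod(dim_sizes.values())
    return n_groups, n_files // n_groups


# Hypothetical plate: 96 wells, 4 sites, 3 channels, 5 Z-planes.
plate = {"well": 96, "site": 4, "channel": 3, "z_index": 5}

print(group_stats(plate, ["well", "site"]))             # (384, 15)
print(group_stats(plate, ["well", "site", "channel"]))  # (1152, 5)
```

Adding CHANNEL triples the number of groups that can run in parallel and cuts the files held by each group to a third.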

The data dimensions system provides systematic control over how your analysis processes the multiple dimensions of HCS data, enabling both simple single-image operations and complex multi-dimensional workflows.