Pattern Grouping and Special Output Path Resolution
Overview
This document explains the fundamental principles of how OpenHCS handles pattern grouping, special output path resolution, and the interaction between group_by, variable_components, and the special I/O system. These principles are essential for debugging path collisions and for reasoning about compilation behavior.
Critical Insight: The group_by parameter serves dual purposes:
1. For dict patterns: Specifies what dimension the dictionary keys represent
2. For list patterns: Controls pattern grouping and special output namespacing
This dual purpose is intentional and enables compile-time path planning with deterministic, semantic paths.
First Principles
Pattern Types and Internal Representation
OpenHCS internally normalizes all function patterns to a dictionary format for uniform processing:
Dict Patterns (explicit):
# User writes:
FunctionStep(
    func={'1': analyze_nuclei, '2': analyze_gfp},
    group_by=GroupBy.CHANNEL
)
# Internal representation: (unchanged)
{'1': analyze_nuclei, '2': analyze_gfp}
List Patterns (normalized to dict):
# User writes:
FunctionStep(func=[normalize, tophat])
# Internal representation:
{'default': [normalize, tophat]}
Single Patterns (normalized to dict):
# User writes:
FunctionStep(func=(normalize, {}))
# Internal representation:
{'default': (normalize, {})}
The “default” Key Convention:
- String "default" is used as the internal dict key for non-dict patterns
- Converted to None when used as a special output group key
- See openhcs/formats/func_arg_prep.py::iter_pattern_items() and openhcs/core/pipeline/path_planner.py::extract_attributes()
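The normalization rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the OpenHCS implementation: normalize_func_pattern and special_output_group_key are hypothetical names, and normalize and tophat are placeholder functions. The real logic lives in the files referenced above.

```python
from typing import Any, Dict, Optional

DEFAULT_KEY = "default"  # internal dict key for non-dict patterns


def normalize_func_pattern(func: Any) -> Dict[str, Any]:
    """Hypothetical helper: wrap non-dict patterns under the "default" key."""
    if isinstance(func, dict):
        return func  # dict patterns are kept unchanged
    return {DEFAULT_KEY: func}  # list patterns and single patterns


def special_output_group_key(dict_key: str) -> Optional[str]:
    """Hypothetical helper: "default" becomes None as a special output group key."""
    return None if dict_key == DEFAULT_KEY else dict_key


def normalize(image):  # placeholder processing function
    return image


def tophat(image):  # placeholder processing function
    return image


list_pattern = normalize_func_pattern([normalize, tophat])
single_pattern = normalize_func_pattern((normalize, {}))
dict_pattern = normalize_func_pattern({"1": normalize, "2": tophat})
```

In this sketch only the "default" key is rewritten to None; component keys such as "1" pass through unchanged.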
The Role of group_by
For Dict Patterns: Semantic Meaning of Keys
The group_by parameter tells OpenHCS what dimension the dictionary keys represent:
# Keys are channel numbers
FunctionStep(
    func={'1': analyze_nuclei, '2': analyze_gfp},
    group_by=GroupBy.CHANNEL  # Keys '1' and '2' are channel numbers
)
# Keys are well identifiers
FunctionStep(
    func={'control': process_control, 'treatment': process_treatment},
    group_by=GroupBy.WELL  # Keys are well names
)
For List Patterns: Pattern Grouping and Output Namespacing
For list patterns, group_by controls how patterns are organized during discovery and how special outputs are namespaced:
# WITHOUT group_by: All patterns processed together
FunctionStep(
    func=(normalize, {}),
    group_by=None,  # or GroupBy.NONE
    variable_components=[VariableComponents.SITE, VariableComponents.CHANNEL]
)
# Pattern discovery returns:
# {"default": ["A01_s{iii}_w1_z001.tif", "A01_s{iii}_w2_z001.tif"]}
# Special outputs: All patterns write to SAME path → COLLISION!
# WITH group_by: Patterns grouped by component
FunctionStep(
    func=(normalize, {}),
    group_by=GroupBy.CHANNEL,
    variable_components=[VariableComponents.SITE, VariableComponents.CHANNEL]
)
# Pattern discovery returns:
# {"1": ["A01_s{iii}_w1_z001.tif"], "2": ["A01_s{iii}_w2_z001.tif"]}
# Special outputs: Each channel gets its own path → NO COLLISION!
The Dual Purpose of group_by
Why does group_by affect list patterns?
The system uses group_by for list patterns to enable:
1. Semantic Grouping: Patterns are organized by meaningful component values (channel 1, channel 2) rather than arbitrary indices
2. Deterministic Paths: Special output paths are known at compile time, not runtime
3. Cross-Step Communication: Later steps can reference special outputs by component (e.g., “get cell_counts for channel 1”)
4. Compile-Time Validation: Path collisions are detected during compilation, not execution
Alternative Considered: Runtime collision detection with auto-generated suffixes
# Runtime collision detection (NOT IMPLEMENTED):
if vfs_path in already_saved_paths:
    vfs_path = f"{vfs_path.stem}_pattern{index}{vfs_path.suffix}"
# Problems:
# 1. Non-deterministic paths (unknown until runtime)
# 2. Cross-step communication breaks (can't reference by name)
# 3. Loss of semantic meaning (cell_counts_pattern0.pkl vs cell_counts_w1.pkl)
Conclusion: Using group_by for list patterns provides compile-time guarantees and semantic clarity.
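The compile-time guarantee can be illustrated with a small sketch. It assumes a simplified naming scheme modeled on the example paths in this document (a "_w<channel>" infix per group). plan_special_output_paths and find_collisions are hypothetical helpers, not OpenHCS functions.

```python
from collections import Counter
from typing import Dict, List, Optional


def plan_special_output_paths(
    well: str, output_key: str, step_index: int, groups: List[Optional[str]]
) -> Dict[Optional[str], str]:
    """Hypothetical planner: one path per execution group, known before execution."""
    paths: Dict[Optional[str], str] = {}
    for group in groups:
        # Naming assumption: channel groups get a "_w<group>" infix
        infix = f"_w{group}" if group is not None else ""
        paths[group] = f"{well}{infix}_{output_key}_step{step_index}.pkl"
    return paths


def find_collisions(writers: List[str]) -> List[str]:
    """Return every path that more than one writer would target."""
    counts = Counter(writers)
    return sorted(path for path, n in counts.items() if n > 1)


# Two patterns (w1 and w2) without grouping: both target the single ungrouped path.
ungrouped = plan_special_output_paths("A01", "cell_counts", 7, [None])
ungrouped_writers = [ungrouped[None], ungrouped[None]]

# With channel grouping, each pattern targets the path of its own group.
grouped = plan_special_output_paths("A01", "cell_counts", 7, ["1", "2"])
grouped_writers = [grouped["1"], grouped["2"]]
```

Because every path is computed from the group keys alone, the duplicate in ungrouped_writers can be reported before any image is processed.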
Pattern Discovery and Grouping Flow
Understanding the complete flow from pattern discovery to execution is essential for debugging path issues.
Step 1: Pattern Discovery
The PatternDiscoveryEngine discovers patterns based on variable_components:
# Configuration:
variable_components = [VariableComponents.SITE, VariableComponents.CHANNEL]
group_by = GroupBy.CHANNEL
# Files in directory:
# A01_s001_w1_z001_t001.tif
# A01_s001_w2_z001_t001.tif
# A01_s002_w1_z001_t001.tif
# A01_s002_w2_z001_t001.tif
# Step 1: Generate patterns (based on variable_components)
patterns = [
    "A01_s{iii}_w1_z001_t001.tif",  # Site varies, channel=1 fixed
    "A01_s{iii}_w2_z001_t001.tif"   # Site varies, channel=2 fixed
]
# Step 2: Group patterns (based on group_by)
if group_by == GroupBy.CHANNEL:
    # Parse each pattern to extract channel value
    # "A01_s{iii}_w1_z001_t001.tif" → replace {iii} with 001 → parse → channel='1'
    grouped_patterns = {
        "1": ["A01_s{iii}_w1_z001_t001.tif"],
        "2": ["A01_s{iii}_w2_z001_t001.tif"]
    }
else:
    # No grouping
    grouped_patterns = ["A01_s{iii}_w1_z001_t001.tif", "A01_s{iii}_w2_z001_t001.tif"]
Key Files:
- openhcs/formats/pattern/pattern_discovery.py::auto_detect_patterns() (line 233-277)
- openhcs/formats/pattern/pattern_discovery.py::group_patterns_by_component() (line 157-202)
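The grouping step described above (substitute the placeholder, parse the filename, bucket by component value) can be sketched as follows. The regular expression is an assumption derived from the example filenames in this section, and group_patterns is a hypothetical stand-in for group_patterns_by_component(), whose real signature is not shown here.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed filename layout, taken from the example files in this section.
FILENAME_RE = re.compile(
    r"(?P<well>[A-Z]\d{2})_s(?P<site>\d+)_w(?P<channel>\d+)"
    r"_z(?P<z_index>\d+)_t(?P<timepoint>\d+)\.tif$"
)
PLACEHOLDER = "{iii}"


def group_patterns(patterns: List[str], component: str) -> Dict[str, List[str]]:
    """Group patterns by the value of one parsed filename component."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for pattern in patterns:
        # Make the pattern parseable by substituting a concrete index
        concrete = pattern.replace(PLACEHOLDER, "001")
        match = FILENAME_RE.match(concrete)
        if match is None:
            raise ValueError(f"Cannot parse pattern: {pattern}")
        grouped[match.group(component)].append(pattern)
    return dict(grouped)


patterns = [
    "A01_s{iii}_w1_z001_t001.tif",
    "A01_s{iii}_w2_z001_t001.tif",
]
by_channel = group_patterns(patterns, "channel")
```

The original pattern strings, not the substituted filenames, are stored in each group, so the placeholder survives for later expansion.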
Step 2: Path Planning (Compilation)
The PathPlanner determines execution groups and creates special output paths:
# For list patterns with group_by:
def _get_execution_groups(step, step_index):
    # Resolve group_by via ObjectState (handles lazy dataclasses)
    group_by = step_state.get_saved_resolved_value("processing_config.group_by")
    if group_by and group_by != GroupBy.NONE:
        # Get component keys from orchestrator
        return ["1", "2"]  # For CHANNEL with 2 channels
    else:
        return [None]  # No grouping
# Create special output paths for each group:
execution_groups = ["1", "2"]
for output_key in special_outputs:
    paths_by_group = {
        "1": "A01_w1_cell_counts_step7.pkl",
        "2": "A01_w2_cell_counts_step7.pkl"
    }
Critical: The path planner must use ObjectState.get_saved_resolved_value() to resolve group_by from lazy dataclasses, NOT direct getattr() access.
Key Files:
- openhcs/core/pipeline/path_planner.py::_get_execution_groups() (line 105-146)
- openhcs/core/pipeline/path_planner.py::_build_paths_by_group() (line 145-157)
- openhcs/core/pipeline/compiler.py::initialize_step_plans_for_context() (line 495-505)
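As a self-contained illustration of the group resolution rule, the sketch below returns the component keys when grouping is active and [None] otherwise. The GroupBy enum here is a local stand-in with assumed values, and component_keys is a plain dict standing in for the orchestrator lookup. Neither reflects the actual OpenHCS types.

```python
from enum import Enum
from typing import Dict, List, Optional


class GroupBy(Enum):
    """Stand-in for the OpenHCS GroupBy enum; member values are assumptions."""
    NONE = "none"
    CHANNEL = "channel"
    WELL = "well"


def get_execution_groups(
    group_by: Optional[GroupBy],
    component_keys: Dict[GroupBy, List[str]],
) -> List[Optional[str]]:
    """Return one execution group per component key, or [None] when ungrouped."""
    if group_by is None or group_by is GroupBy.NONE:
        return [None]  # a single ungrouped execution group
    return list(component_keys[group_by])


# Stand-in for the component keys the orchestrator would provide
component_keys = {GroupBy.CHANNEL: ["1", "2"], GroupBy.WELL: ["A01"]}

channel_groups = get_execution_groups(GroupBy.CHANNEL, component_keys)
no_groups = get_execution_groups(None, component_keys)
```

Returning [None] rather than an empty list keeps the downstream loop uniform: there is always at least one group to plan paths for.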
Step 3: Pattern Preparation (Execution)
The prepare_patterns_and_functions() function normalizes patterns to dict format:
# Input from pattern discovery:
patterns = {"1": ["w1_pattern"], "2": ["w2_pattern"]} # Already grouped
func = (normalize, {}) # Single (non-dict) pattern, handled like a list pattern
# Normalization:
grouped_patterns = patterns # Already a dict, use as-is
component_to_funcs = {
    "1": [(normalize, {})],  # Same function for channel 1
    "2": [(normalize, {})]   # Same function for channel 2
}
Key Files:
- openhcs/formats/func_arg_prep.py::prepare_patterns_and_functions() (line 96-273)
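The mapping from grouped patterns to functions can be sketched as below. map_functions_to_components is a hypothetical simplification of what prepare_patterns_and_functions() is described as doing. The error handling for missing dict keys is an assumption, and the three processing functions are placeholders.

```python
from typing import Any, Dict, List


def map_functions_to_components(
    grouped_patterns: Dict[str, List[str]], func: Any
) -> Dict[str, Any]:
    """Assign a function (or function list) to every component key."""
    if isinstance(func, dict):
        # Dict pattern: keys route components to functions
        missing = sorted(set(grouped_patterns) - set(func))
        if missing:
            raise KeyError(f"No function for component keys: {missing}")
        return {key: func[key] for key in grouped_patterns}
    # List or single pattern: the same function list applies to every component
    funcs = func if isinstance(func, list) else [func]
    return {key: list(funcs) for key in grouped_patterns}


def normalize(image):  # placeholder processing functions
    return image


def analyze_nuclei(image):
    return image


def analyze_gfp(image):
    return image


grouped_patterns = {"1": ["w1_pattern"], "2": ["w2_pattern"]}
shared = map_functions_to_components(grouped_patterns, (normalize, {}))
routed = map_functions_to_components(
    grouped_patterns, {"1": analyze_nuclei, "2": analyze_gfp}
)
```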
Step 4: Execution Loop
The execution loop processes each component group separately:
# For each component value:
for comp_val, pattern_list in grouped_patterns.items():
    # comp_val = "1" or "2"
    exec_func = component_to_funcs[comp_val]
    # For each pattern in this component group:
    for pattern in pattern_list:
        _process_single_pattern_group(
            ...,
            component_value=comp_val,  # "1" or "2"
            ...
        )
# Inside _process_single_pattern_group:
component_key = str(component_value)  # "1" or "2"
# Select special outputs for this component:
special_outputs_for_component = _select_special_plan_for_component(
    special_outputs_by_group,  # {"1": {...}, "2": {...}}
    component_key,  # "1" or "2"
    default_plan
)
# Returns: {"cell_counts": {"path": "A01_w1_cell_counts_step7.pkl"}}
Key Files:
- openhcs/core/steps/function_step.py::process() (line 1316-1356)
- openhcs/core/steps/function_step.py::_process_single_pattern_group() (line 701-900)
- openhcs/core/steps/function_step.py::_select_special_plan_for_component() (line 78-90)
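The selection of a per-component special output plan can be pictured with the sketch below. The fallback order (exact component key, then the ungrouped None entry, then the default plan) is an assumption made for illustration. The actual behavior of _select_special_plan_for_component() should be checked in function_step.py.

```python
from typing import Any, Dict, Optional

Plan = Dict[str, Dict[str, Any]]


def select_special_plan(
    special_outputs_by_group: Dict[Optional[str], Plan],
    component_key: Optional[str],
    default_plan: Plan,
) -> Plan:
    """Hypothetical selection: exact group, then ungrouped entry, then default."""
    if component_key in special_outputs_by_group:
        return special_outputs_by_group[component_key]
    if None in special_outputs_by_group:
        return special_outputs_by_group[None]  # assumed fallback to the ungrouped plan
    return default_plan


special_outputs_by_group = {
    "1": {"cell_counts": {"path": "A01_w1_cell_counts_step7.pkl"}},
    "2": {"cell_counts": {"path": "A01_w2_cell_counts_step7.pkl"}},
}
default_plan = {"cell_counts": {"path": "A01_cell_counts_step7.pkl"}}

plan_for_channel_1 = select_special_plan(special_outputs_by_group, "1", default_plan)
plan_for_unknown = select_special_plan(special_outputs_by_group, "3", default_plan)
```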
Common Issues and Debugging
Issue 1: Special Output Path Collision with List Patterns
Symptom:
FileExistsError: Path already exists: /path/to/results/A01_cell_counts_step7.pkl
Root Cause: Multiple patterns trying to write to the same special output path.
Diagnosis:
1. Check if group_by is being resolved correctly during path planning:
# Add debug logging in path_planner.py::_get_execution_groups()
logger.info(f"🔍 PATH_PLANNER: group_by={group_by} (via ObjectState)")
logger.info(f"🔍 PATH_PLANNER: Resolved groups: {result}")
2. Check if special_outputs_by_group has the expected groups:
# Add debug logging in function_step.py::_process_single_pattern_group()
logger.info(f"🔍 AVAILABLE_GROUPS: {list(special_outputs_by_group.keys())}")
logger.info(f"🔍 COMPONENT_KEY: {component_key}")
Common Causes:
1. group_by is None during path planning: The path planner is reading group_by before it has been resolved from the lazy dataclass.
   Fix: Ensure step_state_map is passed to PathPlanner and use ObjectState.get_saved_resolved_value().
2. Pattern discovery not grouping patterns: auto_detect_patterns() is not receiving the group_by parameter.
   Fix: Ensure group_by is passed from FunctionStep.process() to microscope_handler.auto_detect_patterns().
3. Orchestrator not initialized: Component keys for group_by cannot be resolved.
   Fix: Ensure the orchestrator is initialized before compilation.
Issue 2: group_by Resolves to None During Compilation
Symptom: Path planner logs show group_by=None even though step configuration has group_by=GroupBy.CHANNEL
Root Cause: Lazy dataclass not resolved via ObjectState during path planning.
Diagnosis:
# Check if using direct getattr (WRONG):
group_by = getattr(step.processing_config, "group_by", None) # Returns unresolved lazy value
# Should use ObjectState (CORRECT):
group_by = step_state.get_saved_resolved_value("processing_config.group_by")
Fix: Update PathPlanner._get_execution_groups() to accept step_index and use step_state_map for resolution.
Key Commit: The compiler was refactored to resolve step attributes via ObjectState instead of getattr with fallback.
Issue 3: Understanding “default” vs None in Group Keys
Confusion: Why do some logs show "default" and others show None?
Explanation:
- Internal dict keys: Use string "default" for non-dict patterns (see func_arg_prep.py::iter_pattern_items())
- Special output group keys: Convert "default" to None (see path_planner.py::extract_attributes() line 59)
# Internal representation:
grouped_patterns = {"default": [pattern1, pattern2]}
# Special output groups:
special_outputs_by_group = {None: {"cell_counts": {"path": "..."}}}
# Conversion happens here:
normalized_key = None if group_key == "default" else group_key
When you see:
- dict_key_for_funcplan = "default": List/single pattern execution
- special_outputs_by_group = {None: ...}: Ungrouped special outputs
- component_key = None: No component grouping
Variable Components vs Group By
Understanding the Difference
variable_components: Controls which components vary during pattern discovery
variable_components = [VariableComponents.SITE, VariableComponents.CHANNEL]
# Discovers patterns where site and channel vary:
# "A01_s{iii}_w1_z001.tif" ← site varies, channel=1 fixed
# "A01_s{iii}_w2_z001.tif" ← site varies, channel=2 fixed
group_by: Controls how discovered patterns are organized and how special outputs are namespaced
group_by = GroupBy.CHANNEL
# Groups patterns by channel value:
# {"1": ["A01_s{iii}_w1_z001.tif"], "2": ["A01_s{iii}_w2_z001.tif"]}
# Creates channel-specific special output paths
Key Distinction:
- variable_components: “What varies in the pattern?” (pattern discovery)
- group_by: “How should we organize the patterns?” (grouping and namespacing)
Example Combinations
Example 1: Site varies, no grouping
FunctionStep(
    func=(normalize, {}),
    variable_components=[VariableComponents.SITE],
    group_by=None
)
# Discovers: ["A01_s{iii}_w1_z001.tif"]
# Groups: {"default": ["A01_s{iii}_w1_z001.tif"]}
# Special outputs: All sites write to same path
Example 2: Site and channel vary, group by channel
FunctionStep(
    func=(normalize, {}),
    variable_components=[VariableComponents.SITE, VariableComponents.CHANNEL],
    group_by=GroupBy.CHANNEL
)
# Discovers: ["A01_s{iii}_w1_z001.tif", "A01_s{iii}_w2_z001.tif"]
# Groups: {"1": ["A01_s{iii}_w1_z001.tif"], "2": ["A01_s{iii}_w2_z001.tif"]}
# Special outputs: Each channel gets its own path
Example 3: Dict pattern (group_by specifies key meaning)
FunctionStep(
    func={'1': analyze_nuclei, '2': analyze_gfp},
    variable_components=[VariableComponents.SITE],
    group_by=GroupBy.CHANNEL  # Keys '1' and '2' are channel numbers
)
# Discovers: ["A01_s{iii}_w1_z001.tif", "A01_s{iii}_w2_z001.tif"]
# Groups: {"1": ["A01_s{iii}_w1_z001.tif"], "2": ["A01_s{iii}_w2_z001.tif"]}
# Routes: channel 1 → analyze_nuclei, channel 2 → analyze_gfp