Dict Pattern Special Outputs: Architectural Case Study

Problem Statement

OpenHCS needed to support special outputs (cross-step communication and materialization) from dict patterns, but the original special I/O system was designed around single functions per step. This created a fundamental architectural tension between component-specific processing and step-to-step communication.

Background Context

Original Special I/O Design

  • Purpose: Cross-step communication (positions generation → assembly) and analysis materialization

  • Assumption: Single function per step with simple key matching

  • Architecture: Declarative compilation with runtime execution filtering

Dict Pattern Requirements

  • Use Case: Component-specific processing ({'DAPI': analyze_nuclei, 'GFP': analyze_proteins})

  • Benefit: Eliminates need for separate channel isolation steps

  • Challenge: Multiple functions per step, each potentially producing special outputs

The Architectural Tension

Cross-Step Communication Problem

# Position generation (dict pattern)
{'DAPI': ashlar_compute_positions}  # Produces: DAPI_positions

# Assembly step (single pattern)
assemble_images  # Expects: positions

# PROBLEM: DAPI_positions ≠ positions (linking fails)

Execution Filtering Problem

# Step plan (compiled, namespaced)
step_special_outputs_plan = {'DAPI_cell_counts': {...}}

# Function attributes (original, not namespaced)
func_special_outputs = {'cell_counts'}

# Current filtering logic FAILS
outputs_plan_for_this_call = {
    key: value for key, value in step_special_outputs_plan.items()
    if key in func_special_outputs  # 'DAPI_cell_counts' not in {'cell_counts'}
}

Analysis Framework

Forest-Level Thinking Principles

  1. Architectural Immunity: No solutions that create technical debt

  2. Compilation Model Integrity: Compiled plans are single source of truth

  3. Fail-Loud Philosophy: Clear errors over silent failures

  4. Minimal Complexity: Simplest solution that maintains full functionality

Compiler-Inspired Approach

Drawing from compiler design patterns for symbol resolution and scoping:

  • Scope Promotion: Single-key dict patterns promote to global scope

  • Namespacing: Multi-key dict patterns maintain component-specific scope

  • Collision Detection: Compiler validates unique output keys

Solution Architecture

1. Full Namespacing System

Pattern: {dict_key}_{chain_position}_{original_key}, where chain_position is the zero-based index of the function within that dict key's chain

{
    'DAPI': [preprocess_func, count_cells_func],  # Chain
    'GFP': analyze_proteins_func                  # Single
}

# Generates namespaced keys (chain positions are zero-based):
# DAPI_1_cell_counts (from count_cells_func at position 1 in the DAPI chain)
# GFP_0_protein_data (from analyze_proteins_func at position 0 in the GFP chain)
# preprocess_func (position 0 in the DAPI chain) is assumed to declare no special outputs

Benefits:

  • Unique keys across all dict components

  • Preserves original function metadata

  • Enables cross-step communication with explicit namespacing
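The key-generation rule can be sketched in a few lines. This is only an illustration under stated assumptions, not the OpenHCS compiler: the `special_outputs` decorator, the `__special_outputs__` attribute, and `namespace_outputs` are hypothetical stand-ins for however functions actually declare their outputs.

```python
from typing import Callable, Dict, List, Union

FuncOrChain = Union[Callable, List[Callable]]


def special_outputs(*keys: str):
    """Hypothetical stand-in decorator: records declared output keys on the function."""
    def decorate(func):
        func.__special_outputs__ = set(keys)  # assumed attribute name
        return func
    return decorate


def namespace_outputs(pattern: Dict[str, FuncOrChain]) -> Dict[str, str]:
    """Map each namespaced key to the original key it was derived from."""
    namespaced = {}
    for dict_key, entry in pattern.items():
        # A bare function is treated as a chain of length one.
        chain = entry if isinstance(entry, list) else [entry]
        for position, func in enumerate(chain):
            for original_key in sorted(getattr(func, "__special_outputs__", ())):
                namespaced[f"{dict_key}_{position}_{original_key}"] = original_key
    return namespaced


def preprocess_func(image):  # declares no special outputs
    return image


@special_outputs("cell_counts")
def count_cells_func(image):
    return image


@special_outputs("protein_data")
def analyze_proteins_func(image):
    return image


keys = namespace_outputs({
    "DAPI": [preprocess_func, count_cells_func],
    "GFP": analyze_proteins_func,
})
print(keys)
# {'DAPI_1_cell_counts': 'cell_counts', 'GFP_0_protein_data': 'protein_data'}
```

Keeping the original key as the mapped value is one way to preserve the function's own metadata alongside the namespaced key.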

2. Scope Promotion for Single-Key Dicts

Rule: Single-key dict patterns auto-promote to global scope

# Single-key dict pattern
{'positions': ashlar_compute_positions}

# Auto-promotes to global scope:
# positions (not positions_0_positions)

# Consuming step can use simple key:
@special_inputs("positions")
def assemble_images(image_stack, positions):
    return stitch(image_stack, positions)

Benefits:

  • Maintains backward compatibility

  • Simplifies common single-component use cases

  • Preserves existing cross-step communication patterns
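A minimal sketch of the promotion rule, assuming the compiler knows the full set of dict keys when it names each output. The helper `output_key` and its parameters are hypothetical, chosen only to make the rule concrete.

```python
from typing import Sequence


def output_key(dict_keys: Sequence[str], dict_key: str,
               position: int, original_key: str) -> str:
    """Return the key under which a special output is registered."""
    if len(dict_keys) == 1:
        # Single-key dict: promote to global scope and keep the clean key.
        return original_key
    # Multi-key dict: keep the component-specific namespace.
    return f"{dict_key}_{position}_{original_key}"


single = output_key(["DAPI"], "DAPI", 0, "positions")
multi = output_key(["DAPI", "GFP"], "DAPI", 0, "positions")
print(single)  # positions
print(multi)   # DAPI_0_positions
```

With promotion, the single-component position-generation step from the problem statement produces `positions`, which is exactly the key the assembly step expects.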

3. Collision Detection System

Validation: Compiler ensures unique output keys across all patterns

# INVALID: Collision detected at compile time
{
    'DAPI': count_cells_func,     # Produces: cell_counts
    'GFP': count_cells_func       # Produces: cell_counts (COLLISION!)
}

# Compiler error: "Duplicate special output key 'cell_counts' detected"

Benefits:

  • Fail-loud behavior prevents runtime errors

  • Clear error messages guide developers

  • Maintains compilation model integrity
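The check might look like the sketch below. It assumes, as in the example above, that the comparison runs on the original keys each function declares. The `SpecialOutputCollisionError` type and the `check_collisions` helper are hypothetical names, and the extra detail appended to the error message is an illustrative addition.

```python
from typing import Dict, Iterable


class SpecialOutputCollisionError(ValueError):
    """Hypothetical error type raised at compile time."""


def check_collisions(declared: Dict[str, Iterable[str]]) -> None:
    """declared maps each dict key to the output keys declared under it."""
    seen: Dict[str, str] = {}  # output key -> dict key that first declared it
    for dict_key, output_keys in declared.items():
        for key in output_keys:
            if key in seen:
                raise SpecialOutputCollisionError(
                    f"Duplicate special output key '{key}' detected "
                    f"(declared under '{seen[key]}' and '{dict_key}')"
                )
            seen[key] = dict_key


try:
    check_collisions({"DAPI": ["cell_counts"], "GFP": ["cell_counts"]})
    message = None
except SpecialOutputCollisionError as exc:
    message = str(exc)

print(message)
# Duplicate special output key 'cell_counts' detected (declared under 'DAPI' and 'GFP')
```

Naming both offending dict keys in the message is a design choice: it tells the developer where to look without requiring a debugger.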

4. Execution Filtering Enhancement

Solution: Reverse-lookup mapping from original keys to namespaced keys

# Compilation generates reverse mapping
original_to_namespaced = {
    'cell_counts': 'DAPI_0_cell_counts',
    'protein_data': 'GFP_0_protein_data'
}

# Enhanced filtering logic
def filter_special_outputs_for_function(step_plan, func_special_outputs):
    filtered_plan = {}
    for original_key in func_special_outputs:
        if original_key in original_to_namespaced:
            namespaced_key = original_to_namespaced[original_key]
            if namespaced_key in step_plan:
                filtered_plan[namespaced_key] = step_plan[namespaced_key]
    return filtered_plan

Benefits:

  • Functions use original, clean metadata

  • Compilation handles namespacing complexity

  • Runtime execution remains efficient
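To see the difference in behavior, the sketch below runs the original comparison and a reverse-lookup variant on the same plan. Passing the mapping as an explicit argument is a variation chosen here to keep the example self-contained. The function names and the plan contents are hypothetical.

```python
def naive_filter(step_plan, func_special_outputs):
    # Original logic: compares namespaced plan keys against original keys.
    return {k: v for k, v in step_plan.items() if k in func_special_outputs}


def reverse_lookup_filter(step_plan, func_special_outputs, original_to_namespaced):
    # Translate each original key to its namespaced key before the lookup.
    filtered = {}
    for original_key in func_special_outputs:
        namespaced_key = original_to_namespaced.get(original_key)
        if namespaced_key in step_plan:
            filtered[namespaced_key] = step_plan[namespaced_key]
    return filtered


step_plan = {"DAPI_0_cell_counts": {"path": "hypothetical/output/path"}}
mapping = {"cell_counts": "DAPI_0_cell_counts"}

naive_result = naive_filter(step_plan, {"cell_counts"})
fixed_result = reverse_lookup_filter(step_plan, {"cell_counts"}, mapping)
print(naive_result)  # {}
print(fixed_result)  # {'DAPI_0_cell_counts': {'path': 'hypothetical/output/path'}}
```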

Implementation Strategy

Phase 1: Compilation Enhancement

Objective: Extend compiler to generate namespaced keys and reverse mappings

Changes:

  1. Modify special output discovery to generate namespaced keys

  2. Create reverse-lookup mapping during compilation

  3. Add collision detection validation

  4. Implement scope promotion rules for single-key dicts
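Changes 2 and 3 can be combined in one pass, as in the following sketch. It assumes the compiler already holds a mapping from namespaced keys to original keys. `build_reverse_mapping` is a hypothetical helper, not an existing OpenHCS function.

```python
from typing import Dict


def build_reverse_mapping(namespaced_to_original: Dict[str, str]) -> Dict[str, str]:
    """Invert namespaced -> original into original -> namespaced, failing loudly on duplicates."""
    reverse: Dict[str, str] = {}
    for namespaced_key, original_key in namespaced_to_original.items():
        if original_key in reverse:
            # Two namespaced keys share one original key: the reverse lookup
            # would be ambiguous, so stop at compile time.
            raise ValueError(
                f"Duplicate special output key '{original_key}' detected"
            )
        reverse[original_key] = namespaced_key
    return reverse


reverse = build_reverse_mapping({
    "DAPI_0_cell_counts": "cell_counts",
    "GFP_0_protein_data": "protein_data",
})
print(reverse)
# {'cell_counts': 'DAPI_0_cell_counts', 'protein_data': 'GFP_0_protein_data'}
```

Because the reverse mapping is keyed by original key, a duplicate original key is precisely the case where it cannot be built, which is why the collision check fits naturally at this point.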

Phase 2: Runtime Execution Update

Objective: Update execution filtering to use reverse mappings

Changes:

  1. Enhance filter_special_outputs_for_function with reverse lookup

  2. Update special output saving to use namespaced keys

  3. Maintain backward compatibility for existing pipelines

Phase 3: Cross-Step Communication

Objective: Enable cross-step communication with namespaced keys

Changes:

  1. Update special input resolution to handle namespaced keys

  2. Add developer guidance for cross-step communication patterns

  3. Provide clear error messages for missing dependencies
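One possible shape for the resolution step is sketched below, assuming earlier steps expose their outputs as a flat dict of promoted and namespaced keys. `resolve_special_inputs` and the suffix-based hint are hypothetical. The hint is an illustration of a "clear error message", not documented behavior.

```python
from typing import Any, Dict, Iterable


def resolve_special_inputs(required: Iterable[str],
                           available: Dict[str, Any]) -> Dict[str, Any]:
    """Match each declared special input against outputs of earlier steps."""
    resolved = {}
    for key in required:
        if key in available:
            resolved[key] = available[key]
            continue
        # Suggest namespaced outputs that end with the requested key.
        candidates = sorted(k for k in available if k.endswith(f"_{key}"))
        hint = f" Did you mean one of {candidates}?" if candidates else ""
        raise LookupError(
            f"Special input '{key}' is not produced by any earlier step.{hint}"
        )
    return resolved


ok = resolve_special_inputs(["positions"], {"positions": "path/to/positions"})
print(ok)  # {'positions': 'path/to/positions'}

try:
    resolve_special_inputs(["positions"], {"DAPI_0_positions": "path/to/positions"})
    error_message = None
except LookupError as exc:
    error_message = str(exc)
print(error_message)
```

The second call fails because the consumer asked for the clean key while the producer only registered a namespaced one, which is the linking failure described in the problem statement.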

Architectural Benefits

Maintains Compilation Model

  • Single source of truth: compiled plans contain all execution information

  • No runtime discovery or dynamic key generation

  • Predictable behavior across all execution contexts

Preserves Function Purity

  • Functions declare clean, semantic output keys

  • No function-level awareness of namespacing complexity

  • Testable functions with clear interfaces

Enables Component-Specific Processing

  • Dict patterns work seamlessly with special I/O

  • Component-specific analysis with cross-step communication

  • Eliminates need for separate channel isolation steps

Fail-Loud Error Handling

  • Collision detection at compile time

  • Clear error messages for missing dependencies

  • No silent failures or unexpected behavior

Future Extensions

Advanced Namespacing Patterns

Hierarchical Namespacing: Support nested dict patterns with multi-level namespacing

{
    'DAPI': {
        'nuclei': count_nuclei_func,
        'background': measure_background_func
    },
    'GFP': analyze_proteins_func
}

# Generates: DAPI_nuclei_0_cell_counts, DAPI_background_0_intensity, GFP_0_protein_data
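A recursive walk is one way such keys could be produced. The sketch below is speculative, like the extension itself. `flatten_pattern`, `hierarchical_keys`, and the `__special_outputs__` attribute are hypothetical names, and the declared output keys are taken from the generated names listed above.

```python
def flatten_pattern(pattern, prefix=()):
    """Yield (namespace_path, chain) pairs for a possibly nested dict pattern."""
    for key, entry in pattern.items():
        path = prefix + (key,)
        if isinstance(entry, dict):
            # Nested dict: descend and extend the namespace path.
            yield from flatten_pattern(entry, path)
        else:
            yield path, entry if isinstance(entry, list) else [entry]


def hierarchical_keys(pattern):
    keys = []
    for path, chain in flatten_pattern(pattern):
        for position, func in enumerate(chain):
            for original_key in sorted(getattr(func, "__special_outputs__", ())):
                keys.append("_".join(path + (str(position), original_key)))
    return keys


def count_nuclei_func(image):
    return image


def measure_background_func(image):
    return image


def analyze_proteins_func(image):
    return image


# Assumed declarations, attached directly for brevity.
count_nuclei_func.__special_outputs__ = {"cell_counts"}
measure_background_func.__special_outputs__ = {"intensity"}
analyze_proteins_func.__special_outputs__ = {"protein_data"}

nested_keys = hierarchical_keys({
    "DAPI": {
        "nuclei": count_nuclei_func,
        "background": measure_background_func,
    },
    "GFP": analyze_proteins_func,
})
print(nested_keys)
# ['DAPI_nuclei_0_cell_counts', 'DAPI_background_0_intensity', 'GFP_0_protein_data']
```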

Conditional Outputs: Support conditional special outputs based on component properties

Cross-Pattern Communication

Pattern-to-Pattern Dependencies: Enable communication between different dict pattern components

# Step 1: Generate component-specific data
{'DAPI': generate_nuclei_masks, 'GFP': generate_protein_masks}

# Step 2: Cross-component analysis
{'analysis': correlate_nuclei_proteins}  # Uses both DAPI and GFP outputs

This case study demonstrates how compiler design principles can solve complex architectural challenges while maintaining system integrity and developer experience.