Building Intuition
==================

Understanding when and how to use different OpenHCS features requires developing mental models for common patterns and use cases. This section provides practical guidance for building effective analysis workflows.

Mental Models for OpenHCS
-------------------------

Pipeline as Assembly Line
~~~~~~~~~~~~~~~~~~~~~~~~~

Think of a pipeline as an assembly line where data flows through processing stations:

.. code-block:: text

    Raw Images → [Normalize] → [Filter] → [Segment] → [Analyze] → Results
                      ↓            ↓          ↓           ↓
                  Station 1    Station 2  Station 3   Station 4

**Key insights**:

- Each station (step) does one specific job
- Data flows automatically between stations
- Multiple items (wells/sites) processed in parallel
- Quality control can happen at any station

Steps as Specialized Workers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each FunctionStep is like a specialized worker that knows how to process specific types of data:

.. code-block:: python

    # Worker that specializes in channel-specific analysis
    channel_specialist = FunctionStep(
        func={
            '1': analyze_nuclei,    # Knows how to handle DAPI
            '2': analyze_neurites   # Knows how to handle GFP
        },
        group_by=GroupBy.CHANNEL
    )

**Key insights**:

- Workers have specific skills (function patterns)
- Workers know what data they can handle (variable_components)
- Complex jobs can be broken down into specialized workers
- Workers can collaborate (function chains)

VFS as Smart Storage Manager
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Virtual File System acts like a smart storage manager that automatically decides where to put data:

.. code-block:: text

    Processing:     Memory (fast access)
                        ↓
    Intermediate:   Memory (temporary)
                        ↓
    Final Results:  Disk/Zarr (persistent)

**Key insights**:

- Fast storage for active work (memory)
- Persistent storage for important results (disk/zarr)
- Automatic optimization based on usage patterns
- Transparent to analysis code

Common Usage Patterns
---------------------

Site-by-Site Image Processing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Most common pattern for standard image analysis:

.. code-block:: python

    # Process each imaging site independently
    pipeline = Pipeline([
        FunctionStep(
            func=stack_percentile_normalize,
            variable_components=[VariableComponents.SITE],
            name="normalize"
        ),
        FunctionStep(
            func=gaussian_filter,
            variable_components=[VariableComponents.SITE],
            sigma=2.0,
            name="filter"
        ),
        FunctionStep(
            func=segment_cells,
            variable_components=[VariableComponents.SITE],
            name="segment"
        )
    ])

**When to use**: Standard image processing where each site is analyzed independently.

**Mental model**: Each imaging position gets the same treatment, processed in parallel.

Multi-Channel Analysis Workflows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Different analysis for different fluorescent markers:

.. code-block:: python

    # Channel-specific analysis after common preprocessing
    pipeline = Pipeline([
        # Common preprocessing for all channels
        FunctionStep(
            func=stack_percentile_normalize,
            variable_components=[VariableComponents.SITE],
            name="normalize"
        ),
        # Channel-specific analysis
        FunctionStep(
            func={
                '1': count_cells_single_channel,         # DAPI → nuclei count
                '2': skan_axon_skeletonize_and_analyze   # GFP → neurite analysis
            },
            group_by=GroupBy.CHANNEL,
            variable_components=[VariableComponents.SITE],
            name="analyze"
        )
    ])

**When to use**: Multi-marker experiments where each channel represents different biological features.

**Mental model**: Common preparation followed by specialized analysis based on what each channel shows.

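The dictionary pattern can be pictured as a lookup keyed on the channel. The sketch below is plain Python and does not use the OpenHCS API: ``run_by_channel``, ``normalize``, ``count_bright`` and ``total_signal`` are hypothetical stand-ins chosen for illustration, and the code only mimics the idea of common preprocessing followed by per-channel dispatch.

.. code-block:: python

    from collections import defaultdict

    def normalize(pixels):
        """Common preprocessing: rescale values to the 0..1 range."""
        lo, hi = min(pixels), max(pixels)
        span = (hi - lo) or 1  # avoid division by zero for flat images
        return [(p - lo) / span for p in pixels]

    def count_bright(pixels):
        """Hypothetical stand-in for a nuclei counter: values above 0.5."""
        return sum(1 for p in pixels if p > 0.5)

    def total_signal(pixels):
        """Hypothetical stand-in for a neurite measurement: summed intensity."""
        return round(sum(pixels), 3)

    def run_by_channel(images, func_by_channel):
        """Normalize every image, then pick the analysis by channel key."""
        results = defaultdict(dict)
        for (site, channel), pixels in images.items():
            analyze = func_by_channel[channel]
            results[site][channel] = analyze(normalize(pixels))
        return {site: dict(per_channel) for site, per_channel in results.items()}

    # Toy data: one site, two channels, four "pixels" each
    images = {
        ("site1", "1"): [0, 10, 20, 30],
        ("site1", "2"): [5, 5, 15, 25],
    }
    results = run_by_channel(images, {"1": count_bright, "2": total_signal})

Here ``results`` maps each site to one value per channel, computed by whichever function was registered for that channel key.
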
Multi-Channel Processing Workflows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Different processing for different fluorescent markers:

.. code-block:: python

    # Different preprocessing for different channels
    pipeline = Pipeline([
        FunctionStep(
            func={
                '1': [  # DAPI channel
                    (gaussian_filter, {'sigma': 1.0}),
                    (tophat, {'selem_radius': 25})
                ],
                '2': [  # GFP channel
                    (gaussian_filter, {'sigma': 1.5}),
                    (enhance_contrast, {'percentile_range': (2, 98)}),
                    (tophat, {'selem_radius': 30})
                ]
            },
            group_by=GroupBy.CHANNEL,
            variable_components=[VariableComponents.SITE],
            name="channel_preprocessing"
        ),
        # Channel-specific analysis
        FunctionStep(
            func={
                '1': (count_nuclei, {}),    # DAPI analysis
                '2': (trace_neurites, {})   # GFP analysis
            },
            group_by=GroupBy.CHANNEL,
            variable_components=[VariableComponents.SITE],
            name="analyze"
        )
    ])

**When to use**: Multi-marker experiments where each channel requires different processing and analysis.

**Mental model**: Channel-specific preprocessing and analysis pipelines that run in parallel.

Memory-to-Disk Materialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Keep processing fast while saving important results:

.. code-block:: python

    pipeline = Pipeline([
        # Fast processing in memory
        FunctionStep(func=preprocess, name="preprocess"),
        FunctionStep(func=filter_images, name="filter"),

        # Save important intermediate results
        FunctionStep(
            func=segment_cells,
            name="segment",
            force_disk_output=True  # Save segmentation for inspection
        ),

        # Continue processing in memory
        FunctionStep(func=measure_features, name="measure"),

        # Final results automatically saved to configured backend
        FunctionStep(func=generate_summary, name="summary")
    ])

**When to use**: Long pipelines where you want to checkpoint important intermediate results.

**Mental model**: Fast processing with strategic checkpoints for important results.

Decision Trees for Common Scenarios
-----------------------------------

Choosing Variable Components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

    Do you need to process individual images?
    ├─ Yes → variable_components=[SITE, CHANNEL]
    └─ No → Do you need channel-specific processing?
        ├─ Yes → variable_components=[SITE] + dictionary pattern
        └─ No → Do you need to combine across sites?
            ├─ Yes → variable_components=[CHANNEL]
            └─ No → variable_components=[SITE]

Choosing Function Patterns
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

    Do different data types need different processing?
    ├─ Yes → Dictionary pattern with group_by
    └─ No → Do you need multiple sequential operations?
        ├─ Yes → Function chain pattern
        └─ No → Single function pattern

Choosing Storage Strategy
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

    How large is your dataset?
    ├─ Small (<10GB) → Memory backend for speed
    ├─ Medium (10-100GB) → Mixed strategy (memory + disk checkpoints)
    └─ Large (>100GB) → Zarr backend with compression

Performance Optimization Patterns
---------------------------------

GPU Memory Management
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Efficient GPU processing pattern
    pipeline = Pipeline([
        # Group GPU operations together
        FunctionStep(
            func=[
                gaussian_filter,   # CuPy
                tophat,            # CuPy
                threshold_otsu     # CuPy
            ],
            name="gpu_preprocessing"
        ),
        # CPU analysis (automatic memory conversion)
        FunctionStep(
            func=count_cells_single_channel,  # NumPy
            name="cpu_analysis"
        )
    ])

**Pattern**: Group operations by memory type to minimize conversions.

Parallel Processing Optimization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Maximize parallelization
    step = FunctionStep(
        func=expensive_analysis,
        variable_components=[VariableComponents.SITE],  # More parallel groups
        name="parallel_analysis"
    )

**Pattern**: Use fine-grained variable components for CPU-intensive operations to maximize parallel processing.

Memory Usage Optimization
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Manage memory usage in large datasets
    pipeline = Pipeline([
        FunctionStep(func=large_preprocessing, name="preprocess"),

        # Free memory by saving to disk
        FunctionStep(
            func=memory_intensive_analysis,
            name="analysis",
            force_disk_output=True
        ),

        # Continue with freed memory
        FunctionStep(func=final_processing, name="final")
    ])

**Pattern**: Use strategic disk output to manage memory usage in long pipelines.

Troubleshooting Common Issues
-----------------------------

"Out of Memory" Errors
~~~~~~~~~~~~~~~~~~~~~~

**Symptoms**: GPU or CPU out of memory errors during processing.

**Solutions**:

- Use ``force_disk_output=True`` for large intermediate results
- Process fewer sites simultaneously (adjust variable_components)
- Switch to CPU backend for memory-intensive operations
- Use Zarr backend with compression for large datasets

Slow Processing
~~~~~~~~~~~~~~~

**Symptoms**: Processing takes much longer than expected.

**Solutions**:

- Use GPU backends (CuPy, PyTorch, pyclesperanto) for large images
- Group operations by memory type to minimize conversions
- Use appropriate variable_components for parallelization
- Check storage backend performance (SSD vs HDD)

Incorrect Results
~~~~~~~~~~~~~~~~~

**Symptoms**: Analysis produces unexpected or inconsistent results.

**Solutions**:

- Check variable_components match your analysis intent
- Verify group_by parameter for dictionary patterns
- Use ``force_disk_output=True`` to inspect intermediate results
- Test with small datasets first

Building Effective Workflows
----------------------------

Start Simple
~~~~~~~~~~~~

Begin with basic patterns and add complexity gradually:

1. **Single function steps** with site-by-site processing
2. **Add function chains** for sequential operations
3. **Introduce dictionary patterns** for multi-channel analysis
4. **Optimize storage and memory** for performance

Iterate and Refine
~~~~~~~~~~~~~~~~~~

Use OpenHCS features to iteratively improve workflows:

- **Add checkpoints** with ``force_disk_output`` for debugging
- **Optimize memory usage** by adjusting variable_components
- **Improve performance** by grouping operations by backend
- **Add condition-specific processing** as experiments become more complex

Test at Scale
~~~~~~~~~~~~~

Validate workflows with realistic datasets:

- **Test with full-size images** to identify memory issues
- **Process multiple wells** to verify parallel execution
- **Use representative data** to catch edge cases
- **Monitor resource usage** to optimize performance

These patterns and mental models provide a foundation for building effective OpenHCS workflows that scale from simple image processing to complex multi-dimensional analysis pipelines.

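One lightweight way to monitor resource usage while scaling up is Python's standard ``tracemalloc`` module. The sketch below is an assumption about how such a check might look, not an OpenHCS feature: ``measure_peak`` and ``square_all`` are hypothetical helpers, and ``tracemalloc`` only reports memory allocated by Python, so GPU memory is not included.

.. code-block:: python

    import tracemalloc

    def measure_peak(step, data):
        """Run step(data) and return (result, peak bytes traced during the call)."""
        tracemalloc.start()
        try:
            result = step(data)
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        return result, peak

    def square_all(values):
        """Hypothetical processing step standing in for real image analysis."""
        return [v * v for v in values]

    # Compare a small test input with a larger, more realistic one
    _, small_peak = measure_peak(square_all, list(range(1_000)))
    _, large_peak = measure_peak(square_all, list(range(100_000)))

    print(f"small input peak: {small_peak} bytes")
    print(f"large input peak: {large_peak} bytes")

Comparing the two peaks gives a rough sense of how memory grows with input size before committing to a full-plate run.
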