Building Intuition
==================

Understanding when and how to use different OpenHCS features requires developing mental models for common patterns and use cases. This section provides practical guidance for building effective analysis workflows.

Mental Models for OpenHCS
-------------------------

Pipeline as Assembly Line
~~~~~~~~~~~~~~~~~~~~~~~~~

Think of a pipeline as an assembly line where data flows through processing stations:

.. code-block:: text

   Raw Images → [Normalize] → [Filter] → [Segment] → [Analyze] → Results
                    ↓           ↓          ↓          ↓
                 Station 1   Station 2  Station 3  Station 4

**Key insights**:

- Each station (step) does one specific job
- Data flows automatically between stations
- Multiple items (wells/sites) processed in parallel
- Quality control can happen at any station
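
As a rough illustration of the assembly-line idea, the sketch below passes two toy "sites" through three stations in plain Python. It does not use the OpenHCS API: ``normalize``, ``threshold``, ``count_hits``, and ``run_line`` are hypothetical names invented for this example.

.. code-block:: python

   from functools import reduce

   def normalize(values):
       """Station 1: rescale values to the 0..1 range."""
       lo, hi = min(values), max(values)
       span = (hi - lo) or 1  # avoid division by zero on flat input
       return [(v - lo) / span for v in values]

   def threshold(values, cutoff=0.5):
       """Station 2: mark values at or above the cutoff."""
       return [v >= cutoff for v in values]

   def count_hits(mask):
       """Station 3: count the marked values."""
       return sum(mask)

   def run_line(item, stations):
       """Pass one item through every station, in order."""
       return reduce(lambda data, station: station(data), stations, item)

   stations = [normalize, threshold, count_hits]
   sites = {"site_1": [0, 5, 10], "site_2": [2, 2, 3, 9]}

   # Every site receives the same treatment, independently of the others.
   results = {name: run_line(pixels, stations) for name, pixels in sites.items()}
   print(results)

Because each item only ever sees its own data, the items could just as well be processed in parallel.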

Steps as Specialized Workers
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each FunctionStep is like a specialized worker that knows how to process specific types of data:

.. code-block:: python

   # Worker that specializes in channel-specific analysis
   channel_specialist = FunctionStep(
       func={
           '1': analyze_nuclei,     # Knows how to handle DAPI
           '2': analyze_neurites    # Knows how to handle GFP
       },
       group_by=GroupBy.CHANNEL
   )

**Key insights**:

- Workers have specific skills (function patterns)
- Workers know what data they can handle (``variable_components``)
- Complex jobs can be broken down into specialized workers
- Workers can collaborate (function chains)
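
The "specialized worker" idea amounts to dispatching on a key. The following is a minimal plain-Python sketch of that dispatch, not the OpenHCS implementation. The two analysis functions are trivial stand-ins, and ``dispatch`` is a hypothetical helper.

.. code-block:: python

   def analyze_nuclei(pixels):
       """Stand-in analysis: count pixels brighter than 100."""
       return sum(1 for p in pixels if p > 100)

   def analyze_neurites(pixels):
       """Stand-in analysis: total signal."""
       return sum(pixels)

   # Keys play the role of channel identifiers.
   workers = {"1": analyze_nuclei, "2": analyze_neurites}

   def dispatch(images_by_channel, workers):
       """Send each channel's data to the worker registered under its key."""
       results = {}
       for channel, pixels in images_by_channel.items():
           if channel not in workers:
               raise KeyError(f"no worker registered for channel {channel!r}")
           results[channel] = workers[channel](pixels)
       return results

   images = {"1": [50, 120, 200], "2": [10, 20, 30]}
   print(dispatch(images, workers))

Failing loudly on an unregistered key is a design choice of this sketch. It makes a mismatch between the data and the dictionary visible immediately.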

VFS as Smart Storage Manager
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Virtual File System acts like a smart storage manager that automatically decides where to put data:

.. code-block:: text

   Processing: Memory (fast access)
        ↓
   Intermediate: Memory (temporary)
        ↓  
   Final Results: Disk/Zarr (persistent)

**Key insights**:

- Fast storage for active work (memory)
- Persistent storage for important results (disk/zarr)
- Automatic optimization based on usage patterns
- Transparent to analysis code
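
To make "transparent to analysis code" concrete, here is a toy store with one ``save``/``load`` interface over two locations. It is only a sketch under simplifying assumptions: ``ToyStore`` is hypothetical, the caller chooses persistence with an explicit flag (whereas the VFS is described above as deciding automatically), and JSON files stand in for a real disk or Zarr backend.

.. code-block:: python

   import json
   import tempfile
   from pathlib import Path

   class ToyStore:
       """Routes saves to memory or disk; callers only see save/load."""

       def __init__(self, disk_root):
           self.memory = {}
           self.disk_root = Path(disk_root)

       def save(self, key, data, persistent=False):
           if persistent:
               (self.disk_root / f"{key}.json").write_text(json.dumps(data))
           else:
               self.memory[key] = data

       def load(self, key):
           # The caller does not need to know where the data lives.
           if key in self.memory:
               return self.memory[key]
           return json.loads((self.disk_root / f"{key}.json").read_text())

   with tempfile.TemporaryDirectory() as root:
       store = ToyStore(root)
       store.save("normalized", [0.1, 0.9])                     # intermediate
       store.save("cell_counts", {"A01": 42}, persistent=True)  # final result
       intermediate = store.load("normalized")
       final = store.load("cell_counts")
       on_disk = sorted(p.name for p in Path(root).iterdir())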

Common Usage Patterns
---------------------

Site-by-Site Image Processing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most common pattern for standard image analysis:

.. code-block:: python

   # Process each imaging site independently
   pipeline = Pipeline([
       FunctionStep(
           func=stack_percentile_normalize,
           variable_components=[VariableComponents.SITE],
           name="normalize"
       ),
       FunctionStep(
           # Parameters travel with the function as a (function, kwargs) tuple
           func=(gaussian_filter, {'sigma': 2.0}),
           variable_components=[VariableComponents.SITE],
           name="filter"
       ),
       FunctionStep(
           func=segment_cells,
           variable_components=[VariableComponents.SITE],
           name="segment"
       )
   ])

**When to use**: Standard image processing where each site is analyzed independently.

**Mental model**: Each imaging position gets the same treatment, processed in parallel.

Multi-Channel Analysis Workflows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Different analysis for different fluorescent markers:

.. code-block:: python

   # Channel-specific analysis after common preprocessing
   pipeline = Pipeline([
       # Common preprocessing for all channels
       FunctionStep(
           func=stack_percentile_normalize,
           variable_components=[VariableComponents.SITE],
           name="normalize"
       ),
       
       # Channel-specific analysis
       FunctionStep(
           func={
               '1': count_cells_single_channel,      # DAPI → nuclei count
               '2': skan_axon_skeletonize_and_analyze # GFP → neurite analysis
           },
           group_by=GroupBy.CHANNEL,
           variable_components=[VariableComponents.SITE],
           name="analyze"
       )
   ])

**When to use**: Multi-marker experiments where each channel represents different biological features.

**Mental model**: Common preparation followed by specialized analysis based on what each channel shows.

Multi-Channel Processing Workflows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A different preprocessing chain, followed by a different analysis, for each fluorescent marker:

.. code-block:: python

   # Different preprocessing for different channels
   pipeline = Pipeline([
       FunctionStep(
           func={
               '1': [  # DAPI channel
                   (gaussian_filter, {'sigma': 1.0}),
                   (tophat, {'selem_radius': 25})
               ],
               '2': [  # GFP channel
                   (gaussian_filter, {'sigma': 1.5}),
                   (enhance_contrast, {'percentile_range': (2, 98)}),
                   (tophat, {'selem_radius': 30})
               ]
           },
           group_by=GroupBy.CHANNEL,
           variable_components=[VariableComponents.SITE],
           name="channel_preprocessing"
       ),
       
       # Channel-specific analysis
       FunctionStep(
           func={
               '1': (count_nuclei, {}),      # DAPI analysis
               '2': (trace_neurites, {})     # GFP analysis
           },
           group_by=GroupBy.CHANNEL,
           variable_components=[VariableComponents.SITE],
           name="analyze"
       )
   ])

**When to use**: Multi-marker experiments where each channel requires different processing and analysis.

**Mental model**: Channel-specific preprocessing and analysis pipelines that run in parallel.
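
The per-channel lists of ``(function, kwargs)`` pairs in the example above can be read as small programs. The sketch below shows one way such chains could be evaluated in plain Python. It is an illustration, not how OpenHCS executes them: ``run_chain`` and the three toy operations are hypothetical.

.. code-block:: python

   def scale(values, factor=1.0):
       return [v * factor for v in values]

   def offset(values, amount=0.0):
       return [v + amount for v in values]

   def clip(values, upper=1.0):
       return [min(v, upper) for v in values]

   def run_chain(data, chain):
       """Apply each (function, kwargs) pair in order, feeding output forward."""
       for func, kwargs in chain:
           data = func(data, **kwargs)
       return data

   # One chain per channel key, each with its own parameters.
   chains = {
       "1": [(scale, {"factor": 2.0}), (clip, {"upper": 5.0})],
       "2": [(offset, {"amount": 1.0}), (scale, {"factor": 10.0})],
   }
   data = {"1": [1.0, 2.0, 3.0], "2": [0.0, 0.5]}

   processed = {ch: run_chain(values, chains[ch]) for ch, values in data.items()}
   print(processed)

Keeping the parameters next to the function they belong to is what lets two channels reuse the same function with different settings.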

Memory-to-Disk Materialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Keep processing fast while saving important results:

.. code-block:: python

   pipeline = Pipeline([
       # Fast processing in memory
       FunctionStep(func=preprocess, name="preprocess"),
       FunctionStep(func=filter_images, name="filter"),
       
       # Save important intermediate results
       FunctionStep(
           func=segment_cells,
           name="segment",
           force_disk_output=True  # Save segmentation for inspection
       ),
       
       # Continue processing in memory
       FunctionStep(func=measure_features, name="measure"),
       
       # Final results automatically saved to configured backend
       FunctionStep(func=generate_summary, name="summary")
   ])

**When to use**: Long pipelines where you want to checkpoint important intermediate results.

**Mental model**: Fast processing with strategic checkpoints for important results.
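
The checkpoint idea can be sketched without any framework: run the steps in memory and write out only the outputs that are flagged. This is a hedged illustration in plain Python. ``run_with_checkpoints`` is hypothetical, the boolean flag merely mimics the role ``force_disk_output`` plays above, and JSON files stand in for a real storage backend.

.. code-block:: python

   import json
   import tempfile
   from pathlib import Path

   def run_with_checkpoints(data, steps, checkpoint_dir):
       """Run (name, func, checkpoint) triples; persist only flagged outputs."""
       saved = []
       for name, func, checkpoint in steps:
           data = func(data)  # the result stays in memory either way
           if checkpoint:
               path = Path(checkpoint_dir) / f"{name}.json"
               path.write_text(json.dumps(data))
               saved.append(path.name)
       return data, saved

   steps = [
       ("preprocess", lambda xs: [x - min(xs) for x in xs], False),
       ("segment", lambda xs: [int(x > 2) for x in xs], True),  # checkpoint
       ("measure", lambda xs: sum(xs), False),
   ]

   with tempfile.TemporaryDirectory() as tmp:
       result, saved = run_with_checkpoints([3, 5, 9], steps, tmp)
       # The flagged intermediate can be inspected after the run.
       segment_on_disk = json.loads((Path(tmp) / "segment.json").read_text())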

Decision Trees for Common Scenarios
-----------------------------------

Choosing Variable Components
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Do you need to process individual images?
   ├─ Yes → variable_components=[SITE, CHANNEL]
   └─ No → Do you need channel-specific processing?
           ├─ Yes → variable_components=[SITE] + dictionary pattern
           └─ No → Do you need to combine across sites?
                   ├─ Yes → variable_components=[CHANNEL]
                   └─ No → variable_components=[SITE]

Choosing Function Patterns
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   Do different data types need different processing?
   ├─ Yes → Dictionary pattern with group_by
   └─ No → Do you need multiple sequential operations?
           ├─ Yes → Function chain pattern
           └─ No → Single function pattern

Choosing Storage Strategy
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: text

   How large is your dataset?
   ├─ Small (<10GB) → Memory backend for speed
   ├─ Medium (10-100GB) → Mixed strategy (memory + disk checkpoints)
   └─ Large (>100GB) → Zarr backend with compression
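
The size thresholds in this tree can be written as a small helper, for instance to pick a default in a launch script. This is only a sketch. ``choose_storage_strategy`` and its return strings are hypothetical, and treating exactly 10 GB and 100 GB as "medium" is an assumption, since the tree does not say which side the boundaries fall on.

.. code-block:: python

   def choose_storage_strategy(dataset_gb):
       """Map a dataset size in GB to the strategy named in the tree above."""
       if dataset_gb < 0:
           raise ValueError("dataset size cannot be negative")
       if dataset_gb < 10:
           return "memory"
       if dataset_gb <= 100:  # assumption: both boundaries count as medium
           return "mixed"
       return "zarr"

   for size in (2, 10, 100, 250):
       print(size, "GB ->", choose_storage_strategy(size))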

Performance Optimization Patterns
---------------------------------

GPU Memory Management
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Efficient GPU processing pattern
   pipeline = Pipeline([
       # Group GPU operations together
       FunctionStep(
           func=[
               gaussian_filter,    # CuPy
               tophat,            # CuPy  
               threshold_otsu     # CuPy
           ],
           name="gpu_preprocessing"
       ),
       
       # CPU analysis (automatic memory conversion)
       FunctionStep(
           func=count_cells_single_channel,  # NumPy
           name="cpu_analysis"
       )
   ])

**Pattern**: Group operations by memory type to minimize conversions.
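
A quick way to see why grouping helps is to count how often the memory type changes between consecutive operations. The helper below is a plain-Python sketch. ``count_conversions`` is hypothetical, and the strings are just labels rather than real array libraries.

.. code-block:: python

   def count_conversions(memory_types):
       """Count adjacent operations whose memory types differ."""
       return sum(a != b for a, b in zip(memory_types, memory_types[1:]))

   interleaved = ["cupy", "numpy", "cupy", "numpy"]
   grouped = ["cupy", "cupy", "numpy", "numpy"]

   print(count_conversions(interleaved))  # 3
   print(count_conversions(grouped))      # 1

The same four operations need three conversions when interleaved but only one when grouped, assuming their order can be rearranged without changing the result.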

Parallel Processing Optimization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Maximize parallelization
   step = FunctionStep(
       func=expensive_analysis,
       variable_components=[VariableComponents.SITE],  # More parallel groups
       name="parallel_analysis"
   )

**Pattern**: Use fine-grained variable components for CPU-intensive operations to maximize parallel processing.
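
The reason more groups help is that independent groups can be handed to separate workers. The sketch below shows this with Python's standard ``concurrent.futures`` module. It says nothing about how OpenHCS schedules work, and the group names and the toy ``expensive_analysis`` are made up. Threads are used only to keep the example short. CPU-bound pure-Python code generally needs processes to run truly in parallel.

.. code-block:: python

   from concurrent.futures import ThreadPoolExecutor

   def expensive_analysis(pixels):
       """Toy stand-in for a costly per-group computation."""
       return sum(p * p for p in pixels)

   # Three independent groups means up to three workers can be busy at once.
   groups = {"s1": [1, 2], "s2": [3], "s3": [4, 5, 6]}

   with ThreadPoolExecutor(max_workers=3) as pool:
       # Executor.map returns results in the order the inputs were given.
       outputs = pool.map(expensive_analysis, groups.values())
       results = dict(zip(groups, outputs))

   print(results)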

Memory Usage Optimization
~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   # Manage memory usage in large datasets
   pipeline = Pipeline([
       FunctionStep(func=large_preprocessing, name="preprocess"),
       
       # Free memory by saving to disk
       FunctionStep(
           func=memory_intensive_analysis,
           name="analysis", 
           force_disk_output=True
       ),
       
       # Continue with freed memory
       FunctionStep(func=final_processing, name="final")
   ])

**Pattern**: Use strategic disk output to manage memory usage in long pipelines.

Troubleshooting Common Issues
-----------------------------

"Out of Memory" Errors
~~~~~~~~~~~~~~~~~~~~~~

**Symptoms**: GPU or CPU out of memory errors during processing.

**Solutions**:

- Use ``force_disk_output=True`` for large intermediate results
- Process fewer sites simultaneously (adjust ``variable_components``)
- Switch to CPU backend for memory-intensive operations
- Use Zarr backend with compression for large datasets
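
Before changing settings, it can help to estimate how much memory a stack needs. The arithmetic is simply images × height × width × bytes per pixel. The helper name and the example dimensions below are illustrative assumptions, not values taken from OpenHCS.

.. code-block:: python

   def stack_bytes(n_images, height, width, bytes_per_pixel):
       """Size of an uncompressed image stack in bytes."""
       return n_images * height * width * bytes_per_pixel

   # 100 images of 2048 x 2048 pixels
   raw = stack_bytes(100, 2048, 2048, 2)       # 16-bit integers
   as_float = stack_bytes(100, 2048, 2048, 4)  # 32-bit floats

   raw_mib = raw / 2**20
   float_mib = as_float / 2**20
   print(f"uint16: {raw_mib:.0f} MiB, float32: {float_mib:.0f} MiB")

Converting 16-bit data to 32-bit floats doubles the footprint, and any step that holds both its input and its output needs room for both at once.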

Slow Processing
~~~~~~~~~~~~~~~

**Symptoms**: Processing takes much longer than expected.

**Solutions**:

- Use GPU backends (CuPy, PyTorch, pyclesperanto) for large images
- Group operations by memory type to minimize conversions
- Use appropriate ``variable_components`` for parallelization
- Check storage backend performance (SSD vs HDD)
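
It is usually worth measuring which step is actually slow before applying any of these fixes. The following plain-Python sketch times each stage of a toy chain with ``time.perf_counter``. ``time_steps`` and the two stages are hypothetical and independent of OpenHCS.

.. code-block:: python

   import time

   def time_steps(data, steps):
       """Run (name, func) pairs in order and record seconds spent in each."""
       timings = {}
       for name, func in steps:
           start = time.perf_counter()
           data = func(data)
           timings[name] = time.perf_counter() - start
       return data, timings

   steps = [
       ("square", lambda xs: [x * x for x in xs]),
       ("total", sum),
   ]
   result, timings = time_steps(range(1000), steps)

   slowest = max(timings, key=timings.get)
   print(f"result={result}, slowest step: {slowest}")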

Incorrect Results
~~~~~~~~~~~~~~~~~

**Symptoms**: Analysis produces unexpected or inconsistent results.

**Solutions**:

- Check that ``variable_components`` match your analysis intent
- Verify the ``group_by`` parameter for dictionary patterns
- Use ``force_disk_output=True`` to inspect intermediate results
- Test with small datasets first

Building Effective Workflows
----------------------------

Start Simple
~~~~~~~~~~~~

Begin with basic patterns and add complexity gradually:

1. **Single function steps** with site-by-site processing
2. **Add function chains** for sequential operations
3. **Introduce dictionary patterns** for multi-channel analysis
4. **Optimize storage and memory** for performance

Iterate and Refine
~~~~~~~~~~~~~~~~~~

Use OpenHCS features to iteratively improve workflows:

- **Add checkpoints** with ``force_disk_output`` for debugging
- **Optimize memory usage** by adjusting ``variable_components``
- **Improve performance** by grouping operations by backend
- **Add condition-specific processing** as experiments become more complex

Test at Scale
~~~~~~~~~~~~~

Validate workflows with realistic datasets:

- **Test with full-size images** to identify memory issues
- **Process multiple wells** to verify parallel execution
- **Use representative data** to catch edge cases
- **Monitor resource usage** to optimize performance

These patterns and mental models provide a foundation for building effective OpenHCS workflows that scale from simple image processing to complex multi-dimensional analysis pipelines.