===================================
ROI Extraction and Streaming System
===================================

**Status**: Production | **Introduced**: 2025-10 | **Stability**: Stable

The Problem: Scattered ROI Handling
===================================

Cell segmentation and counting pipelines generate labeled masks that need to be converted into regions of interest (ROIs) for further analysis. Without a unified ROI system, each visualization tool (Napari, Fiji, OMERO) requires custom code to extract and format ROIs. Additionally, ROIs need to be materialized to multiple backends simultaneously (disk for archival, OMERO for server storage, Napari for visualization), creating complex coordination logic scattered throughout the codebase.

The Solution: Unified ROI Extraction and Multi-Backend Materialization
======================================================================

The ROI (Region of Interest) system provides backend-agnostic extraction, representation, and materialization of regions of interest from segmentation masks. ROIs are automatically extracted from labeled masks generated by cell counting and segmentation functions, then materialized to multiple backends simultaneously (disk, OMERO, Napari, Fiji) through a unified interface.

Architecture
============

Core Components
---------------

ROI Representation
^^^^^^^^^^^^^^^^^^

ROIs are represented using immutable dataclasses defined in ``openhcs/core/roi.py``:

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import Any, Dict, List


    @dataclass(frozen=True)
    class ROI:
        """Region of Interest with metadata."""
        shapes: List[Any]  # List of shape objects
        metadata: Dict[str, Any] = field(default_factory=dict)
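
The snippet below is a small sketch of what ``frozen=True`` implies in practice. It redeclares a stand-in ``ROI`` class with the same two fields rather than importing the real one, and the metadata values are made up for illustration.

.. code-block:: python

    from dataclasses import FrozenInstanceError, dataclass, field
    from typing import Any, Dict, List


    @dataclass(frozen=True)
    class ROI:
        """Stand-in mirroring the two fields shown above."""
        shapes: List[Any]
        metadata: Dict[str, Any] = field(default_factory=dict)


    roi = ROI(shapes=[], metadata={"label": 1, "area": 42.0})

    # Rebinding a field on a frozen dataclass raises FrozenInstanceError.
    try:
        roi.metadata = {}
        rebind_blocked = False
    except FrozenInstanceError:
        rebind_blocked = True

    # frozen=True does not deep-freeze: the dict itself can still be mutated,
    # so callers should treat metadata as read-only by convention.
    roi.metadata["note"] = "mutated in place"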

**Shape Types**:

- ``PolygonShape``: Closed polygon defined by vertex coordinates (Nx2 array of (y, x))
- ``PolylineShape``: Open path defined by ordered (y, x) coordinates (used for skeleton branches)
- ``MaskShape``: Binary mask with bounding box
- ``PointShape``: Single point (y, x)
- ``EllipseShape``: Ellipse with center and radii

ROI Extraction
^^^^^^^^^^^^^^

ROIs are extracted from labeled segmentation masks using scikit-image:

.. code-block:: python

    from openhcs.core.roi import extract_rois_from_labeled_mask

    rois = extract_rois_from_labeled_mask(
        labeled_mask,
        min_area=10,  # Minimum area in pixels
        extract_contours=True  # Extract polygon contours vs binary masks
    )

**Extraction Process**:

1. Use ``skimage.measure.regionprops`` to get region properties
2. Filter regions by minimum area threshold
3. Extract metadata (label, area, perimeter, centroid, bbox)
4. Extract shapes (polygon contours or binary masks)
5. Create immutable ROI objects
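
The steps above can be sketched without scikit-image to show the kind of data that comes out. This is not the OpenHCS implementation: ``summarize_regions`` is a hypothetical helper that covers only the area filter and part of the metadata (label, area, centroid, bbox), and it works on plain Python lists so that it runs without dependencies. The bbox follows the ``regionprops`` convention of a half-open ``(min_row, min_col, max_row, max_col)`` tuple; whether the real area filter is inclusive is an assumption.

.. code-block:: python

    from collections import defaultdict
    from typing import Dict, List, Sequence


    def summarize_regions(labeled_mask: Sequence[Sequence[int]],
                          min_area: int = 10) -> List[Dict[str, object]]:
        """Collect per-label metadata, dropping regions smaller than min_area."""
        pixels = defaultdict(list)
        for row_idx, row in enumerate(labeled_mask):
            for col_idx, label in enumerate(row):
                if label != 0:  # 0 is background
                    pixels[label].append((row_idx, col_idx))

        regions = []
        for label in sorted(pixels):
            coords = pixels[label]
            area = len(coords)
            if area < min_area:  # assumed: regions below the threshold are dropped
                continue
            rows = [r for r, _ in coords]
            cols = [c for _, c in coords]
            regions.append({
                "label": label,
                "area": area,
                "centroid": (sum(rows) / area, sum(cols) / area),  # (y, x)
                "bbox": (min(rows), min(cols), max(rows) + 1, max(cols) + 1),
            })
        return regions


    mask = [
        [1, 1, 0, 0],
        [1, 1, 0, 2],
        [0, 0, 0, 0],
    ]
    # Label 2 covers a single pixel and is filtered out by min_area=2.
    regions = summarize_regions(mask, min_area=2)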

Skeleton ROI Extraction
^^^^^^^^^^^^^^^^^^^^^^^

Skeleton masks require specialized extraction to preserve branch topology and connectivity. The standard ``extract_rois_from_labeled_mask()`` approach fragments skeletons at junctions, losing connectivity information.

**Problem with Standard Extraction**:

Labeling a skeleton with ``scipy.ndimage.label()`` treats it as an ordinary binary mask. That approach:

- Fragments branches at junction points
- Loses connectivity and topology information
- Creates disconnected regions instead of continuous paths
- Cannot distinguish individual branches

**Solution: skan Branch Path Extraction**:

Use ``skan.Skeleton`` to extract actual branch paths:

.. code-block:: python

    from skan import Skeleton
    from openhcs.core.roi import PolylineShape, ROI

    # skeleton_mask (2D binary skeleton) and z_idx (index of the current
    # Z slice) are supplied by the surrounding analysis code.
    skeleton_obj = Skeleton(skeleton_mask)

    # Extract each branch as a separate ROI
    rois = []
    for branch_idx in range(skeleton_obj.n_paths):
        # Get pixel coordinates for this branch path
        # Returns (N, 2) array of (row, col) = (y, x) coordinates
        path_coords = skeleton_obj.path_coordinates(branch_idx)

        # Create polyline shape from path coordinates
        # PolylineShape is for open paths (not closed polygons)
        shape = PolylineShape(coordinates=path_coords)

        # Create ROI with metadata
        metadata = {
            'position': z_idx,
            'label': f'Skeleton_Z{z_idx:03d}_Branch{branch_idx:03d}',
            'branch_index': branch_idx,
            'path_length': len(path_coords)  # number of pixels in the path
        }

        rois.append(ROI(shapes=[shape], metadata=metadata))

**Benefits**:

- Preserves skeleton topology and connectivity
- One ROI per continuous branch (not per connected component)
- Compatible with skan's graph-based skeleton analysis
- Enables visual validation of skeleton analysis results
- Proper polyline ROIs that represent skeleton structure

**Implementation**: ``openhcs/processing/backends/analysis/skan_axon_analysis.py::_skeleton_mask_to_rois()``
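
Note that ``path_length`` in the example above is the number of pixels in the branch, not a geometric length. If a Euclidean length is needed, it can be derived from the same coordinates. The helper below is a hypothetical sketch (``polyline_length`` is not part of OpenHCS or skan) that sums segment lengths over an ordered list of ``(y, x)`` points, in pixel units.

.. code-block:: python

    import math
    from typing import Sequence, Tuple


    def polyline_length(coords: Sequence[Tuple[float, float]]) -> float:
        """Sum of Euclidean distances between consecutive (y, x) points."""
        return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


    # Three pixels: one horizontal step, then one diagonal step.
    branch = [(0, 0), (0, 1), (1, 2)]
    pixel_count = len(branch)
    length = polyline_length(branch)

The two measures differ as soon as a branch contains diagonal steps, so it is worth recording which one a downstream analysis expects.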

Multi-Backend Materialization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

ROIs are materialized to multiple backends simultaneously during pipeline execution:

.. code-block:: python

    from typing import List, Optional, Union

    import numpy as np


    def materialize_segmentation_masks(
        data: List[np.ndarray],
        path: str,
        filemanager,
        backends: Union[str, List[str]],
        backend_kwargs: Optional[dict] = None
    ) -> str:
        """Materialize segmentation masks as ROIs to multiple backends."""
        ...

**Backend-Specific Formats**:

- **Disk**: JSON file + ImageJ ROI format (``.roi`` files)
- **OMERO**: ``omero.model.RoiI`` objects linked to images
- **Napari**: ZMQ streaming as shapes layers
- **Fiji**: ZMQ streaming as ImageJ ROIs to ROI Manager
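
One way to picture the fan-out is a registry that maps backend names to writer callables. The sketch below is illustrative only: ``WRITERS``, ``materialize_rois``, and the two toy writers are hypothetical names rather than the OpenHCS API, the real system routes writes through ``filemanager``, and this version calls the writers one after another.

.. code-block:: python

    from typing import Callable, Dict, List, Union

    # Each writer takes (rois, path) and returns a description of what it wrote.
    Writer = Callable[[List[dict], str], str]


    def write_disk(rois: List[dict], path: str) -> str:
        return f"disk:{path} ({len(rois)} ROIs)"


    def write_napari(rois: List[dict], path: str) -> str:
        return f"napari:{path} ({len(rois)} ROIs)"


    WRITERS: Dict[str, Writer] = {
        "disk": write_disk,
        "napari_stream": write_napari,
    }


    def materialize_rois(rois: List[dict], path: str,
                         backends: Union[str, List[str]]) -> List[str]:
        """Send the same ROI list to every requested backend."""
        # Accept a single backend name as well as a list of names.
        names = [backends] if isinstance(backends, str) else list(backends)
        unknown = [name for name in names if name not in WRITERS]
        if unknown:
            raise ValueError(f"Unknown backend(s): {unknown}")
        return [WRITERS[name](rois, path) for name in names]


    rois = [{"label": 1}, {"label": 2}]
    results = materialize_rois(rois, "A01_rois.json", ["disk", "napari_stream"])
    single = materialize_rois(rois, "A01_rois.json", "disk")

Validating all backend names before writing anything avoids a partial materialization when one name is misspelled.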

Per-Channel Materialization
----------------------------

For dict patterns (e.g., processing multiple channels), ROIs are materialized separately for each channel to preserve channel identity:

**Path Structure**:

.. code-block:: text

    # Single channel
    A01_segmentation_masks_step7_rois.json

    # Multi-channel (dict pattern)
    A01_w1_segmentation_masks_step7_rois.json  # Channel 1
    A01_w2_segmentation_masks_step7_rois.json  # Channel 2

**Implementation** (``openhcs/core/steps/function_step.py``):

.. code-block:: python

    # For dict patterns, materialize each channel separately
    if is_dict_pattern and dict_keys:
        for dict_key in dict_keys:
            # Construct channel-specific path
            channel_path = f"{dir_part}/{well_id}_w{dict_key}_{rest}"
            
            # Load channel data
            channel_data = filemanager.load(channel_path, Backend.MEMORY.value)
            
            # Materialize to all backends
            result_path = mat_func(channel_data, str(analysis_path), 
                                   filemanager, backends, backend_kwargs)
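
The channel-specific path construction can be tried in isolation. The helper below is a sketch under assumptions: ``channel_paths`` is a hypothetical name, and it assumes the filename starts with the well ID followed by an underscore, as in the path examples above.

.. code-block:: python

    from pathlib import PurePosixPath
    from typing import Dict, Iterable


    def channel_paths(base_path: str, dict_keys: Iterable[str]) -> Dict[str, str]:
        """Insert a ``_w{key}`` component after the well ID for each dict key."""
        path = PurePosixPath(base_path)
        # Assumption: the filename looks like "<well_id>_<rest>".
        well_id, _, rest = path.name.partition("_")
        return {
            key: str(path.parent / f"{well_id}_w{key}_{rest}")
            for key in dict_keys
        }


    paths = channel_paths(
        "/memory/plate_001/results/A01_segmentation_masks_step7.pkl",
        ["1", "2"],
    )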

Backend-Specific Implementation
================================

Disk Backend
------------

**Location**: ``openhcs/io/disk.py``

Saves ROIs in two formats:

1. **JSON Format**: Human-readable ROI data with metadata
2. **ImageJ Format**: Binary ``.roi`` files for ImageJ/Fiji compatibility

.. code-block:: python

    def _save_rois(self, rois: List, output_path: Path, **kwargs) -> str:
        """Save ROIs to disk in JSON and ImageJ formats."""
        
        # Save JSON
        roi_data = [self._roi_to_dict(roi) for roi in rois]
        json_path = output_path.with_suffix('.json')
        json_path.write_text(json.dumps(roi_data, indent=2))
        
        # Save ImageJ ROIs
        imagej_dir = output_path.with_name(f"{output_path.stem}_rois")
        for roi in rois:
            # Convert to ImageJ format using roifile library
            ...
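
The JSON side might look like the following. ``_roi_to_dict`` is not shown in this document, so this is a guess at its general shape: ``roi_to_dict`` and the ``PolygonShape`` stand-in with a ``coordinates`` field are assumptions made for illustration, and coordinates are plain lists here rather than NumPy arrays.

.. code-block:: python

    import json
    from dataclasses import dataclass, field
    from typing import Any, Dict, List, Tuple


    @dataclass(frozen=True)
    class PolygonShape:
        """Stand-in: vertices as (y, x) pairs."""
        coordinates: List[Tuple[float, float]]


    @dataclass(frozen=True)
    class ROI:
        shapes: List[Any]
        metadata: Dict[str, Any] = field(default_factory=dict)


    def roi_to_dict(roi: ROI) -> Dict[str, Any]:
        """Convert an ROI into JSON-serializable primitives."""
        return {
            "metadata": dict(roi.metadata),
            "shapes": [
                {
                    "type": "polygon",
                    "coordinates": [list(point) for point in shape.coordinates],
                }
                for shape in roi.shapes
                if isinstance(shape, PolygonShape)
            ],
        }


    roi = ROI(
        shapes=[PolygonShape(coordinates=[(0, 0), (0, 4), (3, 4)])],
        metadata={"label": 1, "area": 6.0},
    )
    text = json.dumps([roi_to_dict(roi)], indent=2)
    restored = json.loads(text)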

OMERO Backend
-------------

**Location**: ``openhcs/io/omero_local.py``

Creates ``omero.model.RoiI`` objects and links them to images:

.. code-block:: python

    def _save_rois(self, rois: List, output_path: Path, images_dir: str = None, **kwargs) -> str:
        """Create OMERO ROIs and link to images."""
        
        # Find corresponding image
        image_id = self._find_image_id_from_path(output_path, images_dir)
        
        # Create OMERO ROI
        omero_roi = omero.model.RoiI()
        omero_roi.setImage(omero.model.ImageI(image_id, False))
        
        # Add shapes
        for roi in rois:
            for shape in roi.shapes:
                if isinstance(shape, PolygonShape):
                    polygon = omero.model.PolygonI()
                    # Convert coordinates to OMERO format
                    ...
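
OMERO polygons store their vertices as a single string of ``x,y`` pairs separated by spaces, so the (y, x) coordinates used by ``PolygonShape`` have to be swapped during conversion. The helper below sketches that step only; ``to_omero_points`` is a hypothetical name, and wrapping the result for ``PolygonI.setPoints`` (with ``omero.rtypes.rstring``) is left out so the snippet runs without an OMERO installation.

.. code-block:: python

    from typing import Iterable, Tuple


    def to_omero_points(coords_yx: Iterable[Tuple[float, float]]) -> str:
        """Format (y, x) vertices as an OMERO-style "x1,y1 x2,y2 ..." string."""
        return " ".join(f"{x},{y}" for y, x in coords_yx)


    points = to_omero_points([(10, 20), (10, 60), (40, 60)])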

Napari Streaming Backend
-------------------------

**Location**: ``openhcs/io/napari_stream.py``

Streams ROIs as shapes layers via ZMQ:

.. code-block:: python

    def _save_rois(self, rois: List, output_path: Path, napari_port: int, **kwargs) -> str:
        """Stream ROIs to Napari as shapes layer."""
        
        # Extract layer name from output_path
        layer_name = output_path.stem  # e.g., "A01_w1_segmentation_masks_step7_rois"
        
        # Convert ROIs to Napari shapes format
        shapes_data = []
        for roi in rois:
            for shape in roi.shapes:
                if isinstance(shape, PolygonShape):
                    shapes_data.append({
                        'type': 'polygon',
                        'coordinates': shape.coordinates.tolist(),
                        'metadata': roi.metadata
                    })
        
        # Send via ZMQ
        message = {
            'type': 'shapes',
            'shapes': shapes_data,
            'layer_name': layer_name
        }
        publisher.send_json(message)

**Viewer Server** (``openhcs/runtime/napari_stream_visualizer.py``):

.. code-block:: python

    def _process_shapes_message(self, data: Dict[str, Any]):
        """Process shapes/ROIs message and add to Napari viewer."""
        
        shapes_data = data.get('shapes', [])
        layer_name = data.get('layer_name', 'ROIs')
        
        # Convert to Napari format
        napari_shapes = []
        for shape_dict in shapes_data:
            if shape_dict['type'] == 'polygon':
                napari_shapes.append(np.array(shape_dict['coordinates']))
        
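        # Add or update layer
        if layer_name in self.layers:
            # Look up the existing layer before replacing its data
            layer = self.layers[layer_name]
            layer.data = napari_shapes
        else:
            layer = self.viewer.add_shapes(
                napari_shapes,
                shape_type='polygon',  # napari defaults to 'rectangle'
                name=layer_name,
                edge_color='red',
                face_color='transparent'
            )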

Fiji Streaming Backend
-----------------------

**Location**: ``openhcs/io/fiji_stream.py``

Streams ROIs as ImageJ ROIs to ROI Manager via ZMQ:

.. code-block:: python

    def _save_rois(self, rois: List, output_path: Path, fiji_port: int, **kwargs) -> str:
        """Stream ROIs to Fiji ROI Manager."""
        
        from roifile import ImagejRoi
        import base64
        
        # Extract descriptive prefix
        roi_prefix = output_path.stem  # e.g., "A01_w1_segmentation_masks_step7_rois"
        
        # Convert ROIs to ImageJ format
        roi_bytes_list = []
        for roi in rois:
            for shape in roi.shapes:
                if isinstance(shape, PolygonShape):
                    # Convert to ImageJ ROI
                    coords_xy = shape.coordinates[:, [1, 0]]  # Swap to (x, y)
                    ij_roi = ImagejRoi.frompoints(coords_xy)
                    
                    # Set descriptive name
                    ij_roi.name = f"{roi_prefix}_ROI_{roi.metadata['label']}"
                    roi_bytes_list.append(ij_roi.tobytes())
        
        # Encode and send via ZMQ
        encoded_rois = [base64.b64encode(b).decode('utf-8') for b in roi_bytes_list]
        message = {'type': 'rois', 'rois': encoded_rois}
        publisher.send_json(message)

**Viewer Server** (``openhcs/runtime/fiji_viewer_server.py``):

.. code-block:: python

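    import base64
    import os
    import tempfile
    from typing import Any, Dict

    # RoiManager and RoiDecoder are the ImageJ Java classes
    # (ij.plugin.frame.RoiManager, ij.io.RoiDecoder), accessed through the Java bridge.

    def _process_rois_message(self, data: Dict[str, Any]):
        """Process ROIs message and add to Fiji ROI Manager."""

        rois_encoded = data.get('rois', [])

        # Get ROI Manager
        rm = RoiManager.getInstance()
        if rm is None:
            rm = RoiManager()

        # Decode and add ROIs
        for roi_encoded in rois_encoded:
            roi_bytes = base64.b64decode(roi_encoded)

            # Load via temporary file
            with tempfile.NamedTemporaryFile(suffix='.roi', delete=False) as tmp:
                tmp.write(roi_bytes)
                tmp_path = tmp.name

            try:
                # Load ROI using ImageJ's RoiDecoder
                roi_decoder = RoiDecoder(tmp_path)
                java_roi = roi_decoder.getRoi()
                rm.addRoi(java_roi)
            finally:
                # delete=False leaves the file behind, so remove it explicitly
                os.unlink(tmp_path)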
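
JSON cannot carry raw bytes, which is why the sender base64-encodes each ROI and the viewer decodes it. The round trip can be checked without ZMQ or Fiji; the payloads below are arbitrary placeholder bytes, not valid ``.roi`` files.

.. code-block:: python

    import base64
    import json

    # Placeholder binary payloads standing in for ImagejRoi.tobytes() output.
    roi_bytes_list = [b"\x00\x01\xfe\xff", b"second-roi"]

    # Sender side: bytes -> base64 text -> JSON message.
    encoded = [base64.b64encode(b).decode("utf-8") for b in roi_bytes_list]
    wire = json.dumps({"type": "rois", "rois": encoded})

    # Receiver side: JSON message -> base64 text -> bytes.
    message = json.loads(wire)
    decoded = [base64.b64decode(item) for item in message["rois"]]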

Integration with Pipeline System
================================

Special Outputs Registration
-----------------------------

Functions register ROI outputs using the ``@special_outputs`` decorator with writer-based materialization specs:

.. code-block:: python

    from openhcs.processing.materialization import MaterializationSpec, CsvOptions, ROIOptions

    @special_outputs(
        ("cell_counts", MaterializationSpec(CsvOptions(filename_suffix="_details.csv"))),
        ("segmentation_masks", MaterializationSpec(ROIOptions())),
    )
    def count_cells_single_channel(..., return_segmentation_mask: bool = False):
        """Count cells and optionally return segmentation masks."""

        if return_segmentation_mask:
            return output_stack, cell_counts, labeled_masks
        else:
            return output_stack, cell_counts

Materialization Workflow
-------------------------

1. **Execution**: Function returns special outputs (e.g., labeled masks)
2. **Storage**: Special outputs saved to memory backend with channel-specific paths
3. **Materialization**: After step completion, format writers are invoked
4. **Multi-Backend**: ROIs saved to all backends (disk + streaming) simultaneously

**Path Transformation**:

.. code-block:: text

    Memory:   /memory/plate_001/results/A01_w1_segmentation_masks_step7.pkl
    Analysis: /disk/plate_001_outputs/images_results/A01_w1_segmentation_masks_step7_rois.json
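
The filename part of this transformation can be sketched as follows. ``analysis_path_for`` is a hypothetical helper; how the output directory itself is chosen is not covered here, so it is passed in as an argument.

.. code-block:: python

    from pathlib import PurePosixPath


    def analysis_path_for(memory_path: str, output_dir: str) -> str:
        """Map a memory-backend path to the ROI JSON path in output_dir."""
        stem = PurePosixPath(memory_path).stem  # drops the ".pkl" suffix
        return str(PurePosixPath(output_dir) / f"{stem}_rois.json")


    result = analysis_path_for(
        "/memory/plate_001/results/A01_w1_segmentation_masks_step7.pkl",
        "/disk/plate_001_outputs/images_results",
    )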

Usage Examples
==============

Basic Usage
-----------

.. code-block:: python

    from openhcs.processing.backends.analysis.cell_counting_cpu import count_cells_single_channel
    from openhcs.core.config import DetectionMethod
    # Step, GroupBy, and VariableComponents come from the OpenHCS pipeline API (imports omitted)

    # Enable segmentation mask return
    Step(
        func={
            '1': (count_cells_single_channel, {
                'min_cell_area': 40,
                'max_cell_area': 200,
                'detection_method': DetectionMethod.WATERSHED,
                'return_segmentation_mask': True  # Enable ROI extraction
            })
        },
        group_by=GroupBy.CHANNEL,
        variable_components=[VariableComponents.SITE]
    )

Multi-Channel with Streaming
-----------------------------

.. code-block:: python

    Step(
        func={
            '1': (count_cells_single_channel, {'return_segmentation_mask': True}),
            '2': (count_cells_single_channel, {'return_segmentation_mask': True})
        },
        group_by=GroupBy.CHANNEL,
        napari_streaming_config=LazyNapariStreamingConfig(napari_port=5559),
        fiji_streaming_config=LazyFijiStreamingConfig(fiji_port=5560)
    )

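**Result**: With two wells (for example A01 and B03), this produces 4 separate ROI layers/sets (2 wells × 2 channels). The names below are abbreviated; actual names are taken from the output file stem (for example ``A01_w1_segmentation_masks_step7_rois``):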

- Napari: ``A01_w1_rois``, ``A01_w2_rois``, ``B03_w1_rois``, ``B03_w2_rois``
- Fiji: ROIs named ``A01_w1_rois_ROI_1``, ``A01_w2_rois_ROI_1``, etc.

See Also
========

- :doc:`special_io_system` - Special outputs framework
- :doc:`napari_streaming_system` - Napari streaming architecture
- :doc:`fiji_streaming_system` - Fiji streaming architecture
- :doc:`omero_backend_system` - OMERO backend implementation
- :doc:`dict_pattern_case_study` - Dict pattern materialization