OMERO Backend System
====================

Overview
--------

The OMERO backend system provides server-side execution support for OpenHCS on OMERO servers with zero data transfer overhead. It implements a virtual backend pattern where filenames are generated on-demand from OMERO metadata rather than from a real filesystem.

**The Server-Side Execution Challenge**: Traditional image processing requires downloading data from OMERO to local machines, processing it, and uploading results back. For high-content screening datasets (100GB+ per plate), this creates massive data transfer bottlenecks and makes server-side processing impractical.

**The OpenHCS Solution**: A virtual backend architecture that generates filenames from OMERO's plate structure without requiring a real filesystem. Combined with multiprocessing-safe connection management and the ZMQ execution system, this enables true server-side processing where data never leaves the OMERO server.

**Key Innovation**: VirtualBackend ABC pattern that separates backends with real filesystems (disk, zarr) from backends that generate filenames on-demand (OMERO, cloud storage). This enables location-transparent processing where the same pipeline code works on disk, zarr, or OMERO without modification.

Architecture
------------

Virtual Backend Pattern
~~~~~~~~~~~~~~~~~~~~~~~

Unlike traditional storage backends (disk, zarr), the OMERO backend is a **VirtualBackend** that generates filenames on-demand from OMERO's plate structure:

.. code-block:: python

   # Traditional backend: Real files on disk
   /data/plate/A01/A01_s001_z000_c000.tif  # Actual file exists
   
   # OMERO backend: Virtual paths generated from metadata
   /omero/plate_123/A01/A01_s001_z000_c000.tif  # No real file, just metadata

Key Design Principles
~~~~~~~~~~~~~~~~~~~~~

**No Real Filesystem**
  All paths are virtual, generated from OMERO plate structure

**Lazy Loading**
  Plate metadata cached on first access, reused for all operations

**Location Transparency**
  Same path format regardless of backend (disk, zarr, OMERO)

**On-Demand Generation**
  Files created only when needed (e.g., derived plates)

**Automatic Backend Selection**
  OMERO plates automatically use ``omero_local`` for both read and write operations, ignoring user's materialization backend choice

Automatic Backend Selection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Critical Design Rule**: OMERO plates MUST use ``omero_local`` backend for both input (read) and output (materialization). The system automatically enforces this through the microscope handler's backend compatibility system.

**Why This Matters**:

- OMERO uses virtual paths like ``/omero/plate_123/`` that don't exist on the filesystem
- Attempting to read/write using ``disk`` or ``zarr`` backends will fail with permission errors
- OMERO output must be saved as FileAnnotations attached to OMERO objects, not as files

**Automatic Backend Selection Logic**:

The backend selection happens through the microscope handler's ``compatible_backends`` property:

.. code-block:: python

   class OMEROHandler(MicroscopeHandler):
       @property
       def compatible_backends(self) -> List[Backend]:
           """OMERO is only compatible with OMERO_LOCAL backend."""
           return [Backend.OMERO_LOCAL]

When the compiler calls ``get_primary_backend()``, it returns the first compatible backend, which for OMERO is always ``omero_local``. This applies to both read and materialization backends:

.. code-block:: python

   # Compiler backend selection (in MaterializationFlagPlanner)
   read_backend = context.microscope_handler.get_primary_backend(plate_path, filemanager)
   # Returns: 'omero_local' (first compatible backend for OMERO)

   materialization_backend = context.microscope_handler.get_primary_backend(plate_path, filemanager)
   # Returns: 'omero_local' (same logic)

**User Impact**:

- Users don't need to configure backends for OMERO plates
- System "just works" regardless of VFSConfig settings
- Prevents common errors from trying to write to ``/omero/`` paths

**Contrast with Other Microscopes**:

- **ImageXpress/Opera Phenix**: Compatible with disk backend → Read from disk → Write to OpenHCS format (disk or zarr based on ``materialization_backend``)
- **OpenHCS**: Compatible with disk/zarr/virtual_workspace → Read from auto-detected backend → Write to OpenHCS format (disk or zarr based on ``materialization_backend``)
- **OMERO**: Compatible with omero_local only → Read from omero_local → Write to omero_local (``materialization_backend`` choice ignored)

VirtualBackend ABC
~~~~~~~~~~~~~~~~~~

The system introduces a new abstract base class for backends without real filesystems:

.. code-block:: python

   class VirtualBackend(ABC):
       """Base class for backends without real filesystem (OMERO, cloud, etc.)"""
       
       @abstractmethod
       def list_files(self, directory: str) -> List[str]:
           """Generate file list from metadata."""
           pass
       
       @abstractmethod
       def generate_filename(self, metadata: Dict) -> str:
           """Generate filename from metadata."""
           pass

This enables future cloud storage backends (S3, GCS) using the same pattern.

Multiprocessing-Safe Connection Management
------------------------------------------

OMERO connections contain unpicklable ``IcePy.Communicator`` objects, requiring special handling for multiprocessing:

The Problem
~~~~~~~~~~~

.. code-block:: python

   # This fails - connection can't be pickled
   backend = OMEROLocalBackend(omero_conn=conn)
   process = multiprocessing.Process(target=worker, args=(backend,))  # ❌ Pickle error

The Solution
~~~~~~~~~~~~

.. code-block:: python

   # Connection parameters stored, not connection itself
   backend = OMEROLocalBackend(omero_conn=conn)
   # Connection recreated in worker process using stored params
   process = multiprocessing.Process(target=worker, args=(backend,))  # ✅ Works

Implementation Strategy
~~~~~~~~~~~~~~~~~~~~~~~

1. **Main Process**: Store connection parameters (host, port, username, password)
2. **Pickle**: Exclude connection object via ``__getstate__``
3. **Worker Process**: Recreate connection using stored parameters
4. **Global Registry**: Share connections across backend instances

.. code-block:: python

   class OMEROLocalBackend(VirtualBackend):
       def __getstate__(self):
           """Exclude unpicklable connection object."""
           state = self.__dict__.copy()
           # Remove unpicklable connection
           state['_initial_conn'] = None
           return state

       def __setstate__(self, state):
           """Restore state after unpickling."""
           self.__dict__.update(state)
           # Connection will be retrieved from global registry in worker process

See ``openhcs/io/omero_local.py`` lines 93-150 for complete implementation.

Metadata Caching Strategy
--------------------------

OMERO metadata is cached at the plate level to minimize API queries:

Cache Structure
~~~~~~~~~~~~~~~

.. code-block:: python

   @dataclass
   class PlateStructure:
       plate_id: int
       parser_name: str
       microscope_type: str
       wells: Dict[str, WellStructure]  # well_id → WellStructure
       all_well_ids: Set[str]
       max_sites: int
       max_z: int
       max_c: int
       max_t: int
   
   @dataclass
   class WellStructure:
       well_id: str
       row: int
       col: int
       images: Dict[int, ImageStructure]  # site → ImageStructure
   
   @dataclass
   class ImageStructure:
       image_id: int
       site: int
       size_z: int
       size_c: int
       size_t: int

Caching Pattern
~~~~~~~~~~~~~~~

.. code-block:: python

   # First access: Query OMERO once for entire plate
   metadata = handler.get_channel_values(plate_id)  # Queries OMERO
   
   # Subsequent accesses: Return cached data
   z_values = handler.get_z_index_values(plate_id)  # From cache
   t_values = handler.get_timepoint_values(plate_id)  # From cache

This reduces OMERO API calls from O(wells × sites) to O(1) per plate.

Transparent File Handling
--------------------------

Analysis results (JSON/CSV) are automatically saved as OMERO FileAnnotations:

Format Registry
~~~~~~~~~~~~~~~

.. code-block:: python

   class OMEROFileFormatRegistry:
       """Registry of text file formats that should be saved as FileAnnotations."""
       
       TEXT_FORMATS = {'.json', '.csv', '.txt', '.tsv'}
       
       @classmethod
       def is_text_format(cls, filename: str) -> bool:
           return Path(filename).suffix.lower() in cls.TEXT_FORMATS

FileAnnotation Creation
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

   def save(self, data: np.ndarray, path: str) -> None:
       """Save data to OMERO (image or FileAnnotation)."""
       
       if OMEROFileFormatRegistry.is_text_format(path):
           # Save as FileAnnotation
           file_ann = self._create_file_annotation(data, path)
           # Attach to appropriate OMERO object (plate/well/image)
           self._attach_annotation(file_ann, path)
       else:
           # Save as image plane
           self._write_image_plane(data, path)

This is completely transparent to analysis functions - they just call ``filemanager.save()`` and the backend handles OMERO-specific logic.

Automatic Instance Management
------------------------------

The OMERO instance manager provides automatic server lifecycle management:

Auto-Detection
~~~~~~~~~~~~~~

.. code-block:: python

   from openhcs.runtime.omero_instance_manager import OMEROInstanceManager
   
   manager = OMEROInstanceManager()
   
   # Auto-detect if OMERO is running
   if manager.is_running():
       print("OMERO is running")
   else:
       print("OMERO is not running")

Auto-Connection
~~~~~~~~~~~~~~~

.. code-block:: python

   # Auto-connect to existing instance
   if manager.connect(timeout=10):
       print(f"Connected to OMERO at {manager.host}:{manager.port}")
       conn = manager.get_connection()
   else:
       print("Failed to connect")

Auto-Start
~~~~~~~~~~

.. code-block:: python

   # Auto-start via docker-compose if not running
   if not manager.is_running():
       manager.start_via_docker_compose()
       manager.wait_for_ready(timeout=60)

Context Manager
~~~~~~~~~~~~~~~

.. code-block:: python

   with OMEROInstanceManager() as manager:
       conn = manager.get_connection()
       # Use connection
       # Automatic cleanup on exit

Integration with ZMQ Execution
-------------------------------

The OMERO backend combines with the ZMQ execution system for server-side processing:

.. code-block:: python

   # Client runs locally
   from openhcs.runtime.zmq_execution_client import ZMQExecutionClient
   
   client = ZMQExecutionClient(
       host='omero-server.example.com',
       port=7777
   )
   
   # Server runs on OMERO machine (near data)
   response = client.execute_pipeline(
       plate_id=123,  # OMERO plate ID
       pipeline_steps=steps,
       global_config=config
   )
   
   # Processing happens server-side
   # Results streamed back to local client
   # Zero data transfer overhead

This pattern eliminates data transfer bottlenecks by processing data where it lives.

Backend Parameter Propagation
------------------------------

All analysis materialization handlers accept a ``backend``/``backends`` parameter to enable saving to any backend:

.. code-block:: python

   def cell_counting_cpu(
       image: np.ndarray,
       filemanager: FileManager,
       metadata: Dict,
       backend: str = 'disk',  # Can be 'disk', 'zarr', or 'omero'
       **params
   ) -> Tuple[np.ndarray, Dict]:
       """Cell counting with backend-agnostic saving."""
       
       # Process image
       labeled = label_cells(image)
       
       # Save to specified backend
       filemanager.save(
           labeled,
           construct_path(metadata, 'labeled'),
           backend=backend
       )
       
       # Save analysis results (JSON)
       results = count_cells(labeled)
       filemanager.save(
           results,
           construct_path(metadata, 'results.json'),
           backend=backend  # Automatically becomes FileAnnotation on OMERO
       )
       
       return labeled, results

This is completely transparent to analysis code - no OMERO-specific logic needed.

Critical Bug Fix: Black Well Output
------------------------------------

Problem
~~~~~~~

One well in derived plates always had black (zero) output while all others were fine.

Root Cause
~~~~~~~~~~

``_create_derived_plate`` created placeholder images filled with zeros for all wells. When ``_write_planes_to_plate`` ran, it detected these as "already existing" and skipped writing actual data for the first well processed.

Solution
~~~~~~~~

Removed placeholder image creation entirely. Wells are created empty, and images are created with actual data in ``_write_planes_to_plate`` on first write.

.. code-block:: python

   # OLD: Created placeholder zero images
   for well_id, well_data in wells_structure.items():
       well = create_well(plate, well_id)
       for site in range(well_data.max_sites):
           # ❌ Created placeholder with zeros
           image = create_image_with_zeros(well, site)
   
   # NEW: Create wells without images
   for well_id, well_data in wells_structure.items():
       well = create_well(plate, well_id)
       # ✅ No placeholder images
       # Images created with real data in _write_planes_to_plate

See ``openhcs/io/omero_local.py`` lines 716-730 for implementation.

Usage Example
-------------

Complete Workflow
~~~~~~~~~~~~~~~~~

.. code-block:: python

   from polystore.omero_local import OMEROLocalBackend
   from polystore.base import storage_registry
   from openhcs.microscopes.omero import OMEROHandler
   from openhcs.runtime.omero_instance_manager import OMEROInstanceManager
   from openhcs.core.orchestrator.orchestrator import PipelineOrchestrator

   # 1. Connect to OMERO
   with OMEROInstanceManager() as manager:
       conn = manager.get_connection()

       # 2. Create and register backend (CRITICAL STEP)
       backend = OMEROLocalBackend(omero_conn=conn)
       storage_registry['omero_local'] = backend

       # 3. Create microscope handler
       handler = OMEROHandler(backend=backend)

       # 4. Run pipeline
       orchestrator = PipelineOrchestrator(
           plate_paths=[123],  # OMERO plate ID
           steps=pipeline_steps,
           global_config=global_config
       )

       orchestrator.run()

       # 5. Results saved as OMERO FileAnnotations
       # Automatically attached to plate/wells/images

**Critical Note**: The OMERO backend must be manually registered in the ``storage_registry`` because it requires a connection object that cannot be created automatically by the metaclass system. This is different from other backends (disk, memory, zarr) which are auto-registered.

See Also
--------

- :doc:`storage_and_memory_system` - Storage backend architecture
- :doc:`../guides/omero_integration` - OMERO integration guide