OMERO Backend System
Overview
The OMERO backend system provides server-side execution support for OpenHCS on OMERO servers with zero data transfer overhead. It implements a virtual backend pattern where filenames are generated on-demand from OMERO metadata rather than from a real filesystem.
The Server-Side Execution Challenge: Traditional image processing requires downloading data from OMERO to local machines, processing it, and uploading results back. For high-content screening datasets (100GB+ per plate), this creates massive data transfer bottlenecks and makes server-side processing impractical.
The OpenHCS Solution: A virtual backend architecture that generates filenames from OMERO’s plate structure without requiring a real filesystem. Combined with multiprocessing-safe connection management and the ZMQ execution system, this enables true server-side processing where data never leaves the OMERO server.
Key Innovation: VirtualBackend ABC pattern that separates backends with real filesystems (disk, zarr) from backends that generate filenames on-demand (OMERO, cloud storage). This enables location-transparent processing where the same pipeline code works on disk, zarr, or OMERO without modification.
Architecture
Virtual Backend Pattern
Unlike traditional storage backends (disk, zarr), the OMERO backend is a VirtualBackend that generates filenames on-demand from OMERO’s plate structure:
# Traditional backend: Real files on disk
/data/plate/A01/A01_s001_z000_c000.tif # Actual file exists
# OMERO backend: Virtual paths generated from metadata
/omero/plate_123/A01/A01_s001_z000_c000.tif # No real file, just metadata
Key Design Principles
- No Real Filesystem
All paths are virtual, generated from OMERO plate structure
- Lazy Loading
Plate metadata cached on first access, reused for all operations
- Location Transparency
Same path format regardless of backend (disk, zarr, OMERO)
- On-Demand Generation
Files created only when needed (e.g., derived plates)
- Automatic Backend Selection
OMERO plates automatically use
omero_localfor both read and write operations, ignoring user’s materialization backend choice
Automatic Backend Selection
Critical Design Rule: OMERO plates MUST use omero_local backend for both input (read) and output (materialization). The system automatically enforces this through the microscope handler’s backend compatibility system.
Why This Matters:
OMERO uses virtual paths like
/omero/plate_123/that don’t exist on the filesystemAttempting to read/write using
diskorzarrbackends will fail with permission errorsOMERO output must be saved as FileAnnotations attached to OMERO objects, not as files
Automatic Backend Selection Logic:
The backend selection happens through the microscope handler’s compatible_backends property:
class OMEROHandler(MicroscopeHandler):
@property
def compatible_backends(self) -> List[Backend]:
"""OMERO is only compatible with OMERO_LOCAL backend."""
return [Backend.OMERO_LOCAL]
When the compiler calls get_primary_backend(), it returns the first compatible backend, which for OMERO is always omero_local. This applies to both read and materialization backends:
# Compiler backend selection (in MaterializationFlagPlanner)
read_backend = context.microscope_handler.get_primary_backend(plate_path, filemanager)
# Returns: 'omero_local' (first compatible backend for OMERO)
materialization_backend = context.microscope_handler.get_primary_backend(plate_path, filemanager)
# Returns: 'omero_local' (same logic)
User Impact:
Users don’t need to configure backends for OMERO plates
System “just works” regardless of VFSConfig settings
Prevents common errors from trying to write to
/omero/paths
Contrast with Other Microscopes:
ImageXpress/Opera Phenix: Compatible with disk backend → Read from disk → Write to OpenHCS format (disk or zarr based on
materialization_backend)OpenHCS: Compatible with disk/zarr/virtual_workspace → Read from auto-detected backend → Write to OpenHCS format (disk or zarr based on
materialization_backend)OMERO: Compatible with omero_local only → Read from omero_local → Write to omero_local (
materialization_backendchoice ignored)
VirtualBackend ABC
The system introduces a new abstract base class for backends without real filesystems:
class VirtualBackend(ABC):
"""Base class for backends without real filesystem (OMERO, cloud, etc.)"""
@abstractmethod
def list_files(self, directory: str) -> List[str]:
"""Generate file list from metadata."""
pass
@abstractmethod
def generate_filename(self, metadata: Dict) -> str:
"""Generate filename from metadata."""
pass
This enables future cloud storage backends (S3, GCS) using the same pattern.
Multiprocessing-Safe Connection Management
OMERO connections contain unpicklable IcePy.Communicator objects, requiring special handling for multiprocessing:
The Problem
# This fails - connection can't be pickled
backend = OMEROLocalBackend(omero_conn=conn)
process = multiprocessing.Process(target=worker, args=(backend,)) # ❌ Pickle error
The Solution
# Connection parameters stored, not connection itself
backend = OMEROLocalBackend(omero_conn=conn)
# Connection recreated in worker process using stored params
process = multiprocessing.Process(target=worker, args=(backend,)) # ✅ Works
Implementation Strategy
Main Process: Store connection parameters (host, port, username, password)
Pickle: Exclude connection object via
__getstate__Worker Process: Recreate connection using stored parameters
Global Registry: Share connections across backend instances
class OMEROLocalBackend(VirtualBackend):
def __getstate__(self):
"""Exclude unpicklable connection object."""
state = self.__dict__.copy()
# Remove unpicklable connection
state['_initial_conn'] = None
return state
def __setstate__(self, state):
"""Restore state after unpickling."""
self.__dict__.update(state)
# Connection will be retrieved from global registry in worker process
See openhcs/io/omero_local.py lines 93-150 for complete implementation.
Metadata Caching Strategy
OMERO metadata is cached at the plate level to minimize API queries:
Cache Structure
@dataclass
class PlateStructure:
plate_id: int
parser_name: str
microscope_type: str
wells: Dict[str, WellStructure] # well_id → WellStructure
all_well_ids: Set[str]
max_sites: int
max_z: int
max_c: int
max_t: int
@dataclass
class WellStructure:
well_id: str
row: int
col: int
images: Dict[int, ImageStructure] # site → ImageStructure
@dataclass
class ImageStructure:
image_id: int
site: int
size_z: int
size_c: int
size_t: int
Caching Pattern
# First access: Query OMERO once for entire plate
metadata = handler.get_channel_values(plate_id) # Queries OMERO
# Subsequent accesses: Return cached data
z_values = handler.get_z_index_values(plate_id) # From cache
t_values = handler.get_timepoint_values(plate_id) # From cache
This reduces OMERO API calls from O(wells × sites) to O(1) per plate.
Transparent File Handling
Analysis results (JSON/CSV) are automatically saved as OMERO FileAnnotations:
Format Registry
class OMEROFileFormatRegistry:
"""Registry of text file formats that should be saved as FileAnnotations."""
TEXT_FORMATS = {'.json', '.csv', '.txt', '.tsv'}
@classmethod
def is_text_format(cls, filename: str) -> bool:
return Path(filename).suffix.lower() in cls.TEXT_FORMATS
FileAnnotation Creation
def save(self, data: np.ndarray, path: str) -> None:
"""Save data to OMERO (image or FileAnnotation)."""
if OMEROFileFormatRegistry.is_text_format(path):
# Save as FileAnnotation
file_ann = self._create_file_annotation(data, path)
# Attach to appropriate OMERO object (plate/well/image)
self._attach_annotation(file_ann, path)
else:
# Save as image plane
self._write_image_plane(data, path)
This is completely transparent to analysis functions - they just call filemanager.save() and the backend handles OMERO-specific logic.
Automatic Instance Management
The OMERO instance manager provides automatic server lifecycle management:
Auto-Detection
from openhcs.runtime.omero_instance_manager import OMEROInstanceManager
manager = OMEROInstanceManager()
# Auto-detect if OMERO is running
if manager.is_running():
print("OMERO is running")
else:
print("OMERO is not running")
Auto-Connection
# Auto-connect to existing instance
if manager.connect(timeout=10):
print(f"Connected to OMERO at {manager.host}:{manager.port}")
conn = manager.get_connection()
else:
print("Failed to connect")
Auto-Start
# Auto-start via docker-compose if not running
if not manager.is_running():
manager.start_via_docker_compose()
manager.wait_for_ready(timeout=60)
Context Manager
with OMEROInstanceManager() as manager:
conn = manager.get_connection()
# Use connection
# Automatic cleanup on exit
Integration with ZMQ Execution
The OMERO backend combines with the ZMQ execution system for server-side processing:
# Client runs locally
from openhcs.runtime.zmq_execution_client import ZMQExecutionClient
client = ZMQExecutionClient(
host='omero-server.example.com',
port=7777
)
# Server runs on OMERO machine (near data)
response = client.execute_pipeline(
plate_id=123, # OMERO plate ID
pipeline_steps=steps,
global_config=config
)
# Processing happens server-side
# Results streamed back to local client
# Zero data transfer overhead
This pattern eliminates data transfer bottlenecks by processing data where it lives.
Backend Parameter Propagation
All analysis materialization handlers accept a backend/backends parameter to enable saving to any backend:
def cell_counting_cpu(
image: np.ndarray,
filemanager: FileManager,
metadata: Dict,
backend: str = 'disk', # Can be 'disk', 'zarr', or 'omero'
**params
) -> Tuple[np.ndarray, Dict]:
"""Cell counting with backend-agnostic saving."""
# Process image
labeled = label_cells(image)
# Save to specified backend
filemanager.save(
labeled,
construct_path(metadata, 'labeled'),
backend=backend
)
# Save analysis results (JSON)
results = count_cells(labeled)
filemanager.save(
results,
construct_path(metadata, 'results.json'),
backend=backend # Automatically becomes FileAnnotation on OMERO
)
return labeled, results
This is completely transparent to analysis code - no OMERO-specific logic needed.
Critical Bug Fix: Black Well Output
Problem
One well in derived plates always had black (zero) output while all others were fine.
Root Cause
_create_derived_plate created placeholder images filled with zeros for all wells. When _write_planes_to_plate ran, it detected these as “already existing” and skipped writing actual data for the first well processed.
Solution
Removed placeholder image creation entirely. Wells are created empty, and images are created with actual data in _write_planes_to_plate on first write.
# OLD: Created placeholder zero images
for well_id, well_data in wells_structure.items():
well = create_well(plate, well_id)
for site in range(well_data.max_sites):
# ❌ Created placeholder with zeros
image = create_image_with_zeros(well, site)
# NEW: Create wells without images
for well_id, well_data in wells_structure.items():
well = create_well(plate, well_id)
# ✅ No placeholder images
# Images created with real data in _write_planes_to_plate
See openhcs/io/omero_local.py lines 716-730 for implementation.
Usage Example
Complete Workflow
from polystore.omero_local import OMEROLocalBackend
from polystore.base import storage_registry
from openhcs.microscopes.omero import OMEROHandler
from openhcs.runtime.omero_instance_manager import OMEROInstanceManager
from openhcs.core.orchestrator.orchestrator import PipelineOrchestrator
# 1. Connect to OMERO
with OMEROInstanceManager() as manager:
conn = manager.get_connection()
# 2. Create and register backend (CRITICAL STEP)
backend = OMEROLocalBackend(omero_conn=conn)
storage_registry['omero_local'] = backend
# 3. Create microscope handler
handler = OMEROHandler(backend=backend)
# 4. Run pipeline
orchestrator = PipelineOrchestrator(
plate_paths=[123], # OMERO plate ID
steps=pipeline_steps,
global_config=global_config
)
orchestrator.run()
# 5. Results saved as OMERO FileAnnotations
# Automatically attached to plate/wells/images
Critical Note: The OMERO backend must be manually registered in the storage_registry because it requires a connection object that cannot be created automatically by the metaclass system. This is different from other backends (disk, memory, zarr) which are auto-registered.
See Also
Storage and Memory System Architecture - Storage backend architecture
OMERO Integration - OMERO integration guide