Function Registry System

OpenHCS implements a unified library registry system that automatically discovers and integrates image-processing functions, most of them GPU-accelerated, from multiple libraries (pyclesperanto, scikit-image, CuCIM, CuPy), with type-safe contracts, JSON-based caching, and consistent memory management.

## Why Unified Function Discovery

Scientific image processing involves diverse libraries, each with different:

  • Memory types: NumPy arrays, CuPy arrays, PyTorch tensors

  • Function signatures: Inconsistent parameter naming and ordering

  • Processing contracts: 2D-only, 3D-capable, or flexible dimensionality

  • GPU support: Native GPU, CPU-only, or hybrid implementations

Without unification, pipelines would need library-specific logic throughout. The registry system provides a single interface to all functions while preserving their native performance characteristics.

## Unified Registry Architecture

The unified registry system is built around the LibraryRegistryBase abstract class, which eliminates roughly 70% of the code duplicated across library registries while enforcing consistent behavior:

from abc import ABC
from typing import Any, List

class LibraryRegistryBase(ABC):
    """Minimal ABC for all library registries."""

    # Common exclusions across all libraries
    COMMON_EXCLUSIONS = {
        'imread', 'imsave', 'load', 'save', 'read', 'write',
        'show', 'imshow', 'plot', 'display', 'view', 'visualize'
    }

    # Abstract class attributes - each implementation must define these
    MODULES_TO_SCAN: List[str]
    MEMORY_TYPE: str  # Memory type string value
    FLOAT_DTYPE: Any  # Library-specific float32 type

This design enables consistent function discovery across all supported libraries while maintaining their native performance characteristics.
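As a rough sketch of how a concrete registry might plug into this base class, the example below uses a stripped-down stand-in for LibraryRegistryBase and a hypothetical JsonRegistry that scans the standard-library json module, so it runs without any imaging library installed. The discover_names helper, the JsonRegistry class, and its attribute values are illustrative assumptions, not part of OpenHCS.

```python
import importlib
import inspect
from abc import ABC
from typing import Any, List


class LibraryRegistryBase(ABC):
    """Stripped-down stand-in for the real base class (illustration only)."""

    COMMON_EXCLUSIONS = {
        'imread', 'imsave', 'load', 'save', 'read', 'write',
        'show', 'imshow', 'plot', 'display', 'view', 'visualize'
    }

    MODULES_TO_SCAN: List[str]
    MEMORY_TYPE: str
    FLOAT_DTYPE: Any

    def discover_names(self) -> List[str]:
        """Hypothetical helper: public callables minus the common exclusions."""
        names = []
        for module_name in self.MODULES_TO_SCAN:
            module = importlib.import_module(module_name)
            for name, _obj in inspect.getmembers(module, callable):
                if name.startswith('_') or name in self.COMMON_EXCLUSIONS:
                    continue
                names.append(name)
        return sorted(names)


class JsonRegistry(LibraryRegistryBase):
    """Hypothetical registry; a real one would scan an imaging library."""
    MODULES_TO_SCAN = ['json']
    MEMORY_TYPE = 'numpy'   # placeholder value for the sketch
    FLOAT_DTYPE = float     # placeholder value for the sketch


discovered = JsonRegistry().discover_names()
```

In this sketch json.load is dropped because its name appears in COMMON_EXCLUSIONS, while json.loads is kept, which mirrors how I/O helpers would be filtered out of an imaging library.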

## Processing Contract System

The registry system classifies functions by their processing contracts using a unified enum:

from enum import Enum

class ProcessingContract(Enum):
    PURE_3D = "_execute_pure_3d"        # Processes 3D volumes directly
    PURE_2D = "_execute_pure_2d"        # Processes 2D slices only
    FLEXIBLE = "_execute_flexible"       # Handles both 2D and 3D
    VOLUMETRIC_TO_SLICE = "_execute_volumetric_to_slice"  # 3D→2D reduction

    def execute(self, registry, func, image, *args, **kwargs):
        """Execute the contract method on the registry."""
        method = getattr(registry, self.value)
        return method(func, image, *args, **kwargs)

This classification enables OpenHCS to automatically handle dimensionality conversions and choose optimal execution strategies.
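To make the dispatch concrete, here is a small sketch in which each enum value names a method on the registry and execute looks that method up with getattr. The ToyRegistry class, the flip_rows function, and the nested-list volumes are assumptions made for illustration; the real executors operate on array types.

```python
from enum import Enum


class ProcessingContract(Enum):
    PURE_3D = "_execute_pure_3d"
    PURE_2D = "_execute_pure_2d"

    def execute(self, registry, func, image, *args, **kwargs):
        """Look up the executor named by this contract and call it."""
        method = getattr(registry, self.value)
        return method(func, image, *args, **kwargs)


class ToyRegistry:
    """Hypothetical registry; volumes are nested lists shaped [z][y][x]."""

    def _execute_pure_3d(self, func, image, *args, **kwargs):
        # The function understands whole volumes, so pass the stack through.
        return func(image, *args, **kwargs)

    def _execute_pure_2d(self, func, image, *args, **kwargs):
        # The function only understands 2D slices, so apply it per z-plane.
        return [func(plane, *args, **kwargs) for plane in image]


def flip_rows(plane):
    """Toy function: reverse the order of the outermost axis."""
    return plane[::-1]


volume = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
registry = ToyRegistry()
per_slice = ProcessingContract.PURE_2D.execute(registry, flip_rows, volume)
whole_volume = ProcessingContract.PURE_3D.execute(registry, flip_rows, volume)
```

The same function produces different results under the two contracts: PURE_2D flips the rows inside each z-plane, whereas PURE_3D hands over the whole stack and therefore reverses the z order.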

## JSON-Based Cache Architecture

The unified registry system features a clean, fail-loud JSON-based cache architecture with version validation:

def _load_from_cache(self) -> Optional[Dict[str, FunctionMetadata]]:
    """Load function metadata from cache with validation."""
    # Excerpt: cache_data (the parsed JSON cache) and cache_timestamp are
    # populated earlier in the method (not shown).

    # Version validation
    cached_version = cache_data.get('library_version', 'unknown')
    current_version = self.get_library_version()
    if cached_version != current_version:
        logger.info(f"Version changed ({cached_version} → {current_version}) - cache invalid")
        return None

    # Age validation (7 day expiry)
    cache_age_days = (time.time() - cache_timestamp) / (24 * 3600)
    if cache_age_days > 7:
        return None

Cache Benefits:

  • Fast startup: Instant loading of all libraries from cache

  • Version safety: Automatic cache invalidation on library updates

  • Function reconstruction: Preserves original function names and metadata

  • Fail-loud behavior: No silent cache corruption or stale data
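The validation rules above can be sketched end to end with only the standard library. The file layout used here (the timestamp and functions keys, and the load_cache helper itself) is an assumption for illustration; only the library_version key and the 7-day expiry come from the excerpt.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import Optional

MAX_CACHE_AGE_DAYS = 7


def load_cache(path: Path, current_version: str) -> Optional[dict]:
    """Return cached functions, or None if the cache is missing or invalid."""
    if not path.exists():
        return None
    # Malformed JSON raises json.JSONDecodeError instead of being ignored.
    cache_data = json.loads(path.read_text())

    # Version validation
    if cache_data.get('library_version', 'unknown') != current_version:
        return None

    # Age validation
    age_days = (time.time() - cache_data.get('timestamp', 0)) / (24 * 3600)
    if age_days > MAX_CACHE_AGE_DAYS:
        return None

    return cache_data['functions']


def write_cache(path: Path, version: str, timestamp: float) -> None:
    """Hypothetical writer used only to set up the demonstration."""
    path.write_text(json.dumps({
        'library_version': version,
        'timestamp': timestamp,
        'functions': {'gaussian_blur': {'contract': 'pure_3d'}},
    }))


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / 'demo_cache.json'

    write_cache(cache_file, '1.0', time.time())
    fresh = load_cache(cache_file, current_version='1.0')
    after_upgrade = load_cache(cache_file, current_version='1.1')

    write_cache(cache_file, '1.0', time.time() - 8 * 24 * 3600)
    expired = load_cache(cache_file, current_version='1.0')
```

Returning None signals that the caller should fall back to full discovery, while a corrupted file surfaces as an exception rather than as silently stale data.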

## Automatic Function Discovery

The registry automatically scans and registers functions from multiple libraries:

  • 230+ pyclesperanto functions: GPU-accelerated OpenCL implementations

  • 110+ scikit-image functions: CPU implementations with GPU variants via CuCIM

  • 124+ CuCIM functions: RAPIDS GPU imaging library

  • CuPy scipy.ndimage functions: GPU-accelerated equivalents of SciPy's ndimage routines

  • Native OpenHCS functions: Custom implementations for specific workflows

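Total: More than 460 functions from pyclesperanto, scikit-image, and CuCIM alone, plus the CuPy and native OpenHCS functions, all exposed through unified contracts with automatic memory type conversion.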

## Registry Service and Automatic Discovery

The RegistryService provides unified access to all registry implementations with automatic discovery:

class RegistryService:
    """Clean service for registry discovery and function metadata access."""

    @classmethod
    def get_all_functions_with_metadata(cls) -> Dict[str, FunctionMetadata]:
        """Get unified metadata for all functions from all registries."""
        # Discover all registry classes automatically
        registry_classes = cls._discover_registries()

        # Load functions from each registry (with caching)
        all_functions: Dict[str, FunctionMetadata] = {}
        for registry_class in registry_classes:
            registry_instance = registry_class()
            functions = registry_instance._load_or_discover_functions()
            all_functions.update(functions)

        return all_functions

Automatic Discovery: The service uses pkgutil.walk_packages to find every registry implementation in openhcs.processing.backends.lib_registry, so a new registry module is picked up without any code changes elsewhere.
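A minimal sketch of this discovery pattern follows. It builds a throwaway package on disk so that it runs anywhere; the demo_registries package, the RegistryBase class, and the discover_subclasses helper are all hypothetical and stand in for the lib_registry package and its base class.

```python
import importlib
import inspect
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_subclasses(package, base):
    """Import every module under `package` and collect subclasses of `base`."""
    found = set()
    for info in pkgutil.walk_packages(package.__path__, package.__name__ + '.'):
        module = importlib.import_module(info.name)
        for _, obj in inspect.getmembers(module, inspect.isclass):
            if issubclass(obj, base) and obj is not base:
                found.add(obj)  # a set avoids duplicates from re-exports
    return sorted(found, key=lambda cls: cls.__name__)


# Build a small package with one base module and two registry modules.
tmp = tempfile.TemporaryDirectory()
pkg_dir = Path(tmp.name) / 'demo_registries'
pkg_dir.mkdir()
(pkg_dir / '__init__.py').write_text('')
(pkg_dir / 'base.py').write_text('class RegistryBase:\n    pass\n')
for name in ('alpha', 'beta'):
    (pkg_dir / f'{name}_registry.py').write_text(
        'from demo_registries.base import RegistryBase\n'
        f'class {name.title()}Registry(RegistryBase):\n    pass\n'
    )

sys.path.insert(0, tmp.name)
importlib.invalidate_caches()
package = importlib.import_module('demo_registries')
base = importlib.import_module('demo_registries.base').RegistryBase
registry_names = [cls.__name__ for cls in discover_subclasses(package, base)]
```

Dropping a third module such as gamma_registry.py into the package would be enough for it to appear in the result, which is the property the text describes.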

## Directory Structure

The registry modules moved from the analysis backend directory into a dedicated lib_registry package:

Old Structure (deprecated):

openhcs/processing/backends/analysis/
├── cupy_registry.py
├── pyclesperanto_registry.py
└── scikit_image_registry.py

New Structure:

openhcs/processing/backends/lib_registry/
├── unified_registry.py          # Base classes and common functionality
├── registry_service.py          # Automatic discovery service
├── openhcs_registry.py          # OpenHCS native functions
├── pyclesperanto_registry.py    # Pyclesperanto GPU functions
├── scikit_image_registry.py     # Scikit-image CPU functions
└── cupy_registry.py             # CuPy GPU functions

## Function Metadata System

Each registered function is wrapped in a FunctionMetadata dataclass that provides clean metadata without library-specific leakage:

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass(frozen=True)
class FunctionMetadata:
    """Clean metadata with no library-specific leakage."""
    name: str                    # Function name in registry
    func: Callable              # Wrapped function ready for execution
    contract: ProcessingContract # Processing behavior classification
    registry: LibraryRegistryBase # Reference to source registry
    module: str = ""            # Original module path
    doc: str = ""               # First line of docstring
    # A mutable default must use default_factory; a bare [] raises ValueError
    tags: List[str] = field(default_factory=list)  # Generated tags for categorization
    original_name: str = ""     # Original function name for cache reconstruction
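The sketch below shows how such a record might be built and used. It is a simplified stand-in: contract and registry are plain strings here rather than the real ProcessingContract and LibraryRegistryBase types, and the invert function is a toy.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class FunctionMetadata:
    """Simplified stand-in: contract and registry are plain strings here."""
    name: str
    func: Callable
    contract: str
    registry: str
    module: str = ""
    doc: str = ""
    tags: List[str] = field(default_factory=list)
    original_name: str = ""


def invert(image):
    """Toy function: negate every pixel of a flat list."""
    return [-pixel for pixel in image]


meta = FunctionMetadata(
    name='invert', func=invert, contract='flexible', registry='demo',
    doc=invert.__doc__.splitlines()[0], tags=['intensity'],
)

try:
    meta.name = 'renamed'  # frozen=True forbids attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

output = meta.func([1, -2, 3])
```

Freezing the dataclass keeps registry entries from being modified after discovery. Because the dataclasses module rejects a bare list as a field default with a ValueError, the tags field is declared with field(default_factory=list).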

## Memory Type Abstraction

The registry provides automatic memory type conversion between different GPU libraries:

### Automatic Conversion

  • NumPy ↔ CuPy: Zero-copy GPU transfers where possible

  • PyTorch ↔ CuPy: Shared memory GPU tensors

  • Memory type detection: Automatic input type recognition

  • Optimal routing: Functions execute on their native memory types

### Type Safety

  • Contract validation: Ensures functions receive compatible data types

  • Dimension checking: Validates 2D vs 3D requirements before execution

  • Error prevention: Catches type mismatches at registration time
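One way memory type detection and dimension checking could work is sketched below. This is not the OpenHCS implementation: the two lookup tables, both helper functions, and the FakeCupyArray stand-in (used so the example runs without a GPU) are assumptions made for illustration.

```python
from typing import Any

# Hypothetical mapping from a type's top-level module to a memory-type label.
KNOWN_MEMORY_TYPES = {'numpy': 'numpy', 'cupy': 'cupy', 'torch': 'torch'}

# Hypothetical mapping from contract name to accepted numbers of dimensions.
ALLOWED_NDIM = {'pure_2d': {2}, 'pure_3d': {3}, 'flexible': {2, 3}}


def detect_memory_type(data: Any) -> str:
    """Infer the memory type from the module that defines the object's type."""
    root_module = type(data).__module__.split('.')[0]
    if root_module not in KNOWN_MEMORY_TYPES:
        raise TypeError(f"Unsupported memory type: {type(data).__qualname__}")
    return KNOWN_MEMORY_TYPES[root_module]


def check_dimensions(contract: str, ndim: int) -> None:
    """Fail loudly when the input rank does not match the contract."""
    if ndim not in ALLOWED_NDIM[contract]:
        raise ValueError(f"{contract} contract does not accept {ndim}D input")


class FakeCupyArray:
    """Stand-in for a GPU array; only the attributes used here are mimicked."""
    ndim = 3


FakeCupyArray.__module__ = 'cupy'  # pretend the type was defined by cupy

sample = FakeCupyArray()
memory_type = detect_memory_type(sample)
check_dimensions('pure_3d', sample.ndim)  # passes silently

try:
    check_dimensions('pure_2d', sample.ndim)
    rejected = False
except ValueError:
    rejected = True
```

Checking the defining module rather than importing each library keeps detection cheap and avoids requiring every backend to be installed.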

## Integration with Pipeline System

### Function Discovery

The updated func_registry.py integrates with the unified registry system:

# Phase 1: Register all functions from RegistryService
from openhcs.processing.backends.lib_registry.registry_service import RegistryService
all_functions = RegistryService.get_all_functions_with_metadata()

# Initialize registry structure based on discovered registries
for func_name, metadata in all_functions.items():
    registry_name = metadata.registry.library_name
    if registry_name not in FUNC_REGISTRY:
        FUNC_REGISTRY[registry_name] = []

# Register all functions
for func_name, metadata in all_functions.items():
    registry_name = metadata.registry.library_name
    FUNC_REGISTRY[registry_name].append(metadata.func)
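The two loops above can also be written as a single pass. The sketch below does this with collections.defaultdict and toy stand-ins for the metadata objects; the function names and the use of built-ins such as abs as placeholder callables are assumptions for illustration.

```python
from collections import defaultdict
from types import SimpleNamespace

# Toy stand-ins for registries and FunctionMetadata entries (hypothetical).
cle = SimpleNamespace(library_name='pyclesperanto')
ski = SimpleNamespace(library_name='skimage')
all_functions = {
    'gaussian_blur': SimpleNamespace(func=abs, registry=cle),
    'top_hat': SimpleNamespace(func=min, registry=cle),
    'sobel': SimpleNamespace(func=max, registry=ski),
}

# Group callables by the library that provided them, creating lists on demand.
FUNC_REGISTRY = defaultdict(list)
for func_name, metadata in all_functions.items():
    FUNC_REGISTRY[metadata.registry.library_name].append(metadata.func)
```

The trade-off is that a defaultdict creates an empty entry on any lookup of a missing key, so code that relies on a KeyError for unknown libraries may prefer the explicit two-step form.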

### Automatic Optimization

  • GPU acceleration: Automatically uses GPU variants when available

  • Memory efficiency: Minimizes CPU↔GPU transfers

  • Contract-based execution: Chooses optimal processing strategy

  • JSON caching: Fast startup through metadata caching with version validation

## Design Benefits

### Code Reduction

  • Eliminates ~1000+ lines: Removes duplicated code across library registries

  • Consistent patterns: Enforces uniform testing and registration behavior

  • Centralized fixes: Bug fixes and improvements apply to all libraries

  • Type-safe interface: Abstract base prevents shortcuts and ensures consistency

### Developer Experience

  • Single interface: All functions work identically regardless of library

  • Automatic discovery: New registries are automatically detected

  • GPU transparency: Automatic GPU acceleration without code changes

  • Library agnostic: Switch between implementations without pipeline changes

### Performance

  • Native speed: Functions execute at library-native performance

  • Memory optimization: Minimal type conversion overhead

  • GPU utilization: Automatic GPU routing for supported functions

  • Fast startup: JSON cache enables instant loading of all libraries

### Extensibility

  • Minimal code: Adding new libraries requires only 60-120 lines vs 350-400

  • Automatic integration: New registries are discovered without configuration

  • Contract system: Automatic classification of new function behaviors

  • Version safety: Automatic cache invalidation prevents stale function metadata

### Architecture Improvements

  • Clean separation: Library-specific logic isolated in individual registries

  • Fail-loud behavior: No defensive programming or silent failures

  • Generic solution: Automatically adapts to new components without hardcoding

  • Cache architecture: JSON-based with version validation and age expiry

This unified registry architecture gives OpenHCS a single, consistent interface to hundreds of GPU-accelerated functions while preserving their native performance and handling memory type conversions transparently. It also removes most of the duplicated registry code and reduces adding a new library to writing one small registry class.