Module Structure

Overview

OpenHCS uses a carefully designed module structure that enforces clean architectural boundaries, prevents circular dependencies, and ensures doctrinal compliance. This document explains the organization of modules in OpenHCS and the principles behind this structure.

Basic Module Concepts

In OpenHCS, modules are organized according to these key principles:

Interface-Implementation Separation: Interfaces are defined separately from their implementations
Schema-First Design: Data structures are defined in schemas before they are used
Explicit Registration: Components are registered explicitly, not implicitly on import
Unidirectional Dependencies: Dependencies flow in one direction to prevent cycles

OpenHCS Module Structure

OpenHCS follows a clean, hierarchical module organization optimized for GPU-accelerated bioimage analysis:

Real Import Patterns (from TUI-generated scripts):

# Core orchestration and pipeline components
from openhcs.core.orchestrator.orchestrator import PipelineOrchestrator
from openhcs.core.steps.function_step import FunctionStep
from openhcs.core.config import GlobalPipelineConfig

# Constants and enums
from openhcs.constants.constants import VariableComponents, Backend, Microscope

# Processing functions by computational backend
from openhcs.processing.backends.processors.torch_processor import stack_percentile_normalize
from openhcs.processing.backends.processors.cupy_processor import tophat, create_composite
from openhcs.processing.backends.analysis.cell_counting_cpu import count_cells_single_channel
from openhcs.processing.backends.assemblers.assemble_stack_cupy import assemble_stack_cupy

# Configuration classes
from openhcs.core.config import PathPlanningConfig, VFSConfig, ZarrConfig
from openhcs.core.config import MaterializationBackend, ZarrCompressor, ZarrChunkStrategy

Module Organization:

openhcs/
├── core/                       # Core orchestration and pipeline management
│   ├── orchestrator/           # PipelineOrchestrator and execution engine
│   ├── steps/                  # FunctionStep and step implementations
│   ├── config/                 # Configuration classes and management
│   └── context/                # ProcessingContext and state management
├── processing/                 # Integrated Python image processing libraries
│   └── backends/               # Organized by computational backend
│       ├── processors/         # Image processing functions
│       ├── analysis/           # Analysis and measurement functions
│       ├── assemblers/         # Image assembly and stitching
│       └── pos_gen/            # Position generation algorithms
├── constants/                  # Enums and constant definitions
└── utils/                      # Utility functions and helpers

Import Patterns by Use Case:

# Basic pipeline creation (most common)
from openhcs.core.orchestrator.orchestrator import PipelineOrchestrator
from openhcs.core.steps.function_step import FunctionStep
from openhcs.core.config import GlobalPipelineConfig
from openhcs.constants.constants import VariableComponents

# GPU processing functions
from openhcs.processing.backends.processors.torch_processor import stack_percentile_normalize
from openhcs.processing.backends.processors.cupy_processor import tophat

# Analysis functions
from openhcs.processing.backends.analysis.cell_counting_cpu import count_cells_single_channel

# Configuration and enums
from openhcs.core.config import PathPlanningConfig, VFSConfig, ZarrConfig
from openhcs.constants.constants import Backend, Microscope

Key Modules and Their Purpose

core/

The core/ module contains the fundamental orchestration and pipeline management components:

orchestrator/: PipelineOrchestrator for multi-well execution
steps/: FunctionStep implementation with GPU support
config/: Configuration classes for system-wide settings
context/: ProcessingContext for execution state management

processing/backends/

The processing/backends/ module provides seamless integration with major Python image processing libraries, organized by computational backend:

processors/: Image processing functions (torch, cupy, numpy)
analysis/: Analysis and measurement functions
assemblers/: Image assembly and stitching algorithms
pos_gen/: Position generation for stitching

constants/

The constants/ module defines enums and constants used throughout OpenHCS:

VariableComponents: SITE, CHANNEL, TIME processing dimensions
Backend: Memory backend types (MEMORY, DISK, ZARR)
Microscope: Supported microscope types and detection

Doctrinal Motivation: Enforces clear separation of concerns, facilitates polymorphism, and is crucial for breaking import cycles. Implementations depend on these interfaces, not on each other directly. Supports Clause 21 (Frontloaded Validation) by making dependencies explicit.

schemas/

The schemas/ directory contains all Pydantic models or other schema definitions used for configuration, data validation, and context management. These schemas define the structure of data that flows through the system.

Doctrinal Motivation: Enforces Clause 21 (Frontloaded Validation) by providing a single source of truth for data structures. Promotes Clause 66 (Context Immunity) by clearly defining the structure of context objects.

registries/

The registries/ directory contains modules responsible for the registration and discovery of pluggable components (backends, handlers, steps). Each registry follows an initialize_foo() pattern for explicit, controlled initialization.

Doctrinal Motivation: Decouples component definition from usage. Prevents registration side-effects on module import. Ensures initialization is explicit and traceable, supporting testability and Clause 3 (Statelessness) by controlling when stateful registries are populated.

backends/

The backends/ directory contains concrete implementations of interfaces defined in interfaces/. Each backend (e.g., MIST, Ashlar) resides in its own sub-package.

Doctrinal Motivation: Clear separation of implementation from interface. __init__.py in this directory and its subdirectories are minimal to prevent accidental registration on import.

io/backends/

The io/backends/ directory could contain different storage backend implementations.

Doctrinal Motivation: Similar to backends/, separates storage interface implementations from their definition.

Initialization Discipline

OpenHCS follows a strict initialization discipline to prevent side-effects on import and ensure explicit control over component registration:

No Registration at Module Load: Processing backends, GPU schedulers, and pipeline components are not registered when their respective modules are imported.
Explicit Setup Pattern: All registries provide explicit setup functions that perform the actual registration of available implementations.
Import-Safe Initialization Points: - TUI Application: The TUI calls necessary setup functions during application startup. - Script Generation: Generated scripts include proper initialization code. - Orchestrator Usage: Applications using OpenHCS as a library must call setup functions explicitly.

Doctrinal Motivation: Ensures that the application state (which components are available) is explicitly managed and not a side-effect of imports. This improves predictability, testability, and makes the set of available components deterministic at initialization.

Public API

OpenHCS provides a stable public API through the openhcs package. This API is carefully designed to be safe to import without triggering side-effects:

# Safe to import - no side effects
import openhcs

# Initialize openhcs before using
openhcs.initialize()

# Now use the API
config = openhcs.create_config(input_dir="path/to/images")
results = openhcs.run_pipeline(config)

The public API is defined in openhcs/api.py and re-exported by openhcs/__init__.py. This ensures that import openhcs is safe and does not trigger backend registrations or other internal initializations.

Doctrinal Compliance

OpenHCS’s module structure is designed to comply with the following doctrinal clauses:

Clause 3 (Statelessness): Explicit initialization of registries and clear separation of concerns help in designing components that are individually stateless or whose state is managed explicitly.
Clause 12 (Smell Intolerance): When fetching from a registry, if an item is not found, a deterministic error is raised. No trying alternative names or default fallbacks.
Clause 17 (VFS Exclusivity): The openhcs/core/io/file_manager.py module is the primary interaction point for file system operations, using the VFS system. Other modules depend on this for I/O.
Clause 21 (Frontloaded Validation): Interfaces define explicit contracts. Configuration classes define data dependencies. Registries make component availability explicit rather than implicit through imports.
Clause 65 (Absolute Execution): Clear interfaces and explicit registration reduce the need for hasattr or try-except blocks for probing capabilities.
Clause 66 (Context Immunity): The openhcs/core/context/processing_context.py manages context explicitly. Components declare their context needs via configuration classes.
Clause 77 (Rot Intolerance): The refactor provides an opportunity to identify and prune unused modules or consolidate overly fragmented ones. Clearer directory responsibilities make rot more apparent.

Best Practices

When working with OpenHCS’s module structure, follow these best practices:

Import from Public API: Import from openhcs.core and openhcs.processing rather than directly from implementation modules.
Use Configuration Classes: Define data structures using configuration classes in openhcs.core.config before using them.
Setup Components Explicitly: Setup GPU registries and processing backends explicitly, not implicitly on import.
Initialize Before Use: Call appropriate setup functions before using GPU-accelerated components.
Respect Unidirectional Dependencies: Ensure dependencies flow in one direction to prevent cycles: - Abstract classes should not depend on implementations - Configuration should not depend on implementations - Implementations should depend on abstract classes and configuration - Registries should depend on interfaces, not implementations
Use VirtualPath for I/O: Always use VirtualPath for file system operations, not Path or str.
Declare Context Dependencies: Use StepFieldDependency to declare context field dependencies, not direct access to context attributes.
Avoid Runtime Flexibility: Don’t vary behavior based on field presence or state. Use explicit schemas and validation.
Eliminate Dead Code: Remove unused code, procedural glue, or legacy compatibility layers.
Write Structural Tests: Tests should enforce structure, not behavior. Use tests in tests/rot/ to verify doctrinal compliance.