Memory Type System and Stack Utils
Overview
OpenHCS implements a memory type system that enables conversion between different array libraries (NumPy, PyTorch, CuPy, TensorFlow, JAX, pyclesperanto) while maintaining dimensional constraints and GPU device management.
Note: All code examples reflect the actual OpenHCS implementation and are verified against the current codebase.
Core Principles
Clause 278: Mandatory 3D Output Enforcement
All functions must return a 3D array of shape [Z, Y, X], even when
operating on a single 2D slice. This prevents silent shape coercion and
enforces explicit intent throughout the pipeline.
Memory Type Discipline
Explicit Declaration: All functions must declare input/output memory types via decorators
Automatic Conversion: Stack utils handle conversion between memory types
GPU Discipline: Explicit GPU device management and validation
Strict Validation: Fail fast on invalid inputs rather than silent coercion
Stack Utils Architecture
stack_slices(): 2D → 3D Conversion
Converts a list of 2D images into a single 3D array with specified memory type:
Input Requirements:
slices: List of 2D arrays (any supported memory type)memory_type: Target memory type (numpy,cupy,torch,tensorflow,jax,pyclesperanto)gpu_id: GPU device ID (required, validated for GPU memory types)
Output Guarantees:
Always returns 3D array of shape
[Z, Y, X]All slices converted to target memory type
GPU placement enforced for GPU memory types
Validation:
All input slices must be 2D
Empty slice list raises error
GPU device ID validated for GPU memory types
unstack_slices(): 3D → 2D Conversion
Splits a 3D array into a list of 2D slices with specified memory type:
Input Requirements:
array: 3D array (any supported memory type)memory_type: Target memory type for output slicesgpu_id: GPU device ID (required)validate_slices: Optional validation that output slices are 2D
Output Guarantees:
Returns list of 2D slices in target memory type
Preserves spatial dimensions (Y, X)
GPU placement enforced for GPU memory types
Memory Conversion System
Core Conversion Function
The memory conversion system provides direct conversion between any supported memory types:
Supported Conversions:
NumPy ↔ CuPy: Direct GPU/CPU transfer
PyTorch ↔ CuPy: GPU tensor sharing
TensorFlow ↔ JAX: Cross-framework GPU arrays
pyclesperanto ↔ All: OpenCL GPU arrays
Conversion Strategy:
Direct Conversion: When libraries share memory layout
CPU Roundtrip: When direct GPU-to-GPU conversion unavailable
Fail-Loud: Clear errors for unsupported conversions
Memory Type Detection
Automatic detection of array memory types:
Detection Strategy:
Type-based: Uses
type(data).__module__patternsAttribute-based: Checks for library-specific attributes
Fail-loud: Raises error for unknown types
GPU Device Management
Device Validation
Strict GPU device validation for GPU memory types:
Validation Rules:
GPU memory types require valid
gpu_idCPU memory types ignore
gpu_idInvalid device IDs raise immediate errors
Device Movement
Moving arrays between GPU devices:
Movement Strategy:
PyTorch:
.to(device)methodCuPy: Context manager with device switching
TensorFlow:
tf.device()contextJAX:
jax.device_put()with explicit device
Integration with FunctionStep
Pipeline Integration
Stack utils integrate seamlessly with FunctionStep execution:
# FunctionStep automatically handles memory type conversion
def process_images(context, pattern_group, memory_type, gpu_id):
# 1. Load 2D slices from files
slices_2d = [load_image(path) for path in pattern_group]
# 2. Stack to 3D with target memory type
stack_3d = stack_slices(slices_2d, memory_type, gpu_id)
# 3. Process with target memory type
processed_3d = processing_function(stack_3d)
# 4. Unstack back to 2D for saving
output_slices = unstack_slices(processed_3d, "numpy", gpu_id)
return output_slices
Memory Type Flow:
Input: Files loaded as NumPy arrays
Processing: Converted to target memory type (torch, cupy, etc.)
Output: Converted back to NumPy for file saving
Performance Characteristics
Conversion Performance
Fast Conversions (shared memory):
NumPy ↔ CuPy (GPU memory sharing)
PyTorch ↔ CuPy (GPU tensor sharing)
TensorFlow ↔ JAX (similar GPU layouts)
Slow Conversions (CPU roundtrip):
PyTorch ↔ TensorFlow (different GPU memory models)
pyclesperanto ↔ JAX (OpenCL ↔ CUDA)
Optimization Strategy:
Minimize conversions within processing chains
Use consistent memory types for related operations
Leverage GPU memory sharing when possible
Memory Usage Patterns
Efficient Patterns:
# Good: Consistent memory type throughout chain
stack_3d = stack_slices(slices, "torch", gpu_id)
processed = torch_function_1(stack_3d)
processed = torch_function_2(processed)
output = unstack_slices(processed, "numpy", gpu_id)
Inefficient Patterns:
# Bad: Multiple conversions
stack_3d = stack_slices(slices, "torch", gpu_id)
cupy_result = convert_memory(stack_3d, "torch", "cupy", gpu_id)
torch_result = convert_memory(cupy_result, "cupy", "torch", gpu_id)
Error Handling and Validation
Strict Validation Philosophy
OpenHCS follows fail-loud principles for memory operations:
Dimensional Validation:
Input slices must be exactly 2D
Output arrays must be exactly 3D
No silent dimension coercion
Memory Type Validation:
Unknown memory types raise immediate errors
GPU operations require valid device IDs
Conversion failures provide detailed error messages
Example Error Messages:
MemoryConversionError: Failed to convert torch tensor to cupy array
Reason: GPU device 2 not available (only 0-1 available)
Source: torch.cuda.FloatTensor on device 0
Target: cupy array on device 2
Future Extensions
Planned Enhancements
Memory Pressure Detection:
Automatic fallback to CPU when GPU memory exhausted
Smart memory type selection based on available resources
Lazy Conversion:
Defer conversions until actually needed
Chain multiple operations before converting
Memory Pool Management:
Reuse allocated arrays to reduce allocation overhead
GPU memory pool optimization for large datasets
This memory type system ensures type safety, performance, and maintainability across OpenHCS’s multi-backend processing pipeline.