OpenHCS Refactoring Principles: Mathematical Simplification Approach

Status: CANONICAL
Version: 1.0
Last Updated: 2025-01-31

This document codifies the mathematical simplification approach used to refactor the OpenHCS codebase. It treats duplicated code like an algebraic expression that can be factored and simplified.

Core First Principles

Algebraic Common Factors Principle

Treat duplicate code patterns like mathematical expressions that can be factored out:

# Before: Duplicate expressions (like 3x + 3y = 3(x + y))
if condition_a:
    result = process_pattern(data_a, param_a)
else:
    result = process_pattern(data_b, param_b)

# After: Factor out common pattern
result = process_pattern(
    data_a if condition_a else data_b,
    param_a if condition_a else param_b
)

Rule: If you see the same logical structure repeated with only parameter variations, extract the common pattern into a parameterized function.
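
As a minimal runnable sketch of this rule: the functions below are hypothetical and not taken from OpenHCS. Two functions share one structure and differ only in a label and a filter condition, so the shared structure becomes one function and the differences become parameters.

```python
# Before: two functions with the same structure and different details.
def describe_small(values):
    kept = [v for v in values if v < 10]
    return f"small: {len(kept)} of {len(values)}"

def describe_large(values):
    kept = [v for v in values if v >= 10]
    return f"large: {len(kept)} of {len(values)}"

# After: the common structure is factored out; the variations are parameters.
def describe(values, label, predicate):
    kept = [v for v in values if predicate(v)]
    return f"{label}: {len(kept)} of {len(values)}"

data = [3, 12, 7, 25]
# The factored version reproduces both originals.
assert describe(data, "small", lambda v: v < 10) == describe_small(data)
assert describe(data, "large", lambda v: v >= 10) == describe_large(data)
```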

Single-Use Function Inlining Rule

If a method is called only once, inline it at the call site (as direct code or a lambda) rather than keeping an unnecessary abstraction.

# Before: Unnecessary abstraction
def _helper_method(self, data):
    return data.process() if data else None

def main_method(self):
    return self._helper_method(self.data)  # Only call site

# After: Inline at call site
def main_method(self):
    return self.data.process() if self.data else None

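Rule: Before inlining, run grep -rnw "method_name" --include="*.py" . and confirm it reports exactly two hits: the definition and one call site. The -w flag matches whole words only, so a longer name such as method_name_v2 is not counted.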

Mathematical Simplification Approach

Consolidate duplicate conditional logic using:

  • Ternary operators for simple conditions

  • Lookup tables for multiple discrete cases

  • Parameterized functions for complex logic

# Before: Duplicate conditional blocks
if use_recursive:
    config = ResolutionConfig(provider, fallback_a)
else:
    config = ResolutionConfig(provider, fallback_b)

# After: Unified expression
fallback = fallback_a if use_recursive else fallback_b
config = ResolutionConfig(provider, fallback)

Pattern Identification Guidelines

Duplicate Conditional Logic Patterns

Symptoms:

  • Repeated if/else blocks with identical structure

  • Same method calls with different parameters

  • Similar error handling in multiple places

Detection:

# Look for repeated conditional patterns
grep -A 5 -B 5 "if.*:" file.py | grep -A 10 -B 10 "else:"

Example from lazy_config.py:

# BEFORE: Duplicate conditional logic
if use_recursive_resolution:
    return ResolutionConfig(
        instance_provider=instance_provider,
        fallback_chain=fallback_chain or [static_fallback]
    )
else:
    return ResolutionConfig(
        instance_provider=instance_provider,
        fallback_chain=[safe_fallback, static_fallback]
    )

# AFTER: Unified expression
final_fallback_chain = (fallback_chain or [static_fallback]) if use_recursive_resolution else [safe_fallback, static_fallback]
return ResolutionConfig(instance_provider=instance_provider, fallback_chain=final_fallback_chain)
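
One way to gain confidence that such a unification preserves behavior is to run both forms over every input combination. The sketch below assumes a stand-in ResolutionConfig dataclass and placeholder fallback functions; the real definitions in lazy_config.py may differ.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List

@dataclass
class ResolutionConfig:  # stand-in for illustration only
    instance_provider: Callable
    fallback_chain: List[Callable]

# Placeholder callables (hypothetical); only their identity matters here.
def instance_provider(): return None
def static_fallback(field_name): return "static"
def safe_fallback(field_name): return "safe"
def custom_fallback(field_name): return "custom"

def build_before(use_recursive_resolution, fallback_chain):
    if use_recursive_resolution:
        return ResolutionConfig(
            instance_provider=instance_provider,
            fallback_chain=fallback_chain or [static_fallback],
        )
    else:
        return ResolutionConfig(
            instance_provider=instance_provider,
            fallback_chain=[safe_fallback, static_fallback],
        )

def build_after(use_recursive_resolution, fallback_chain):
    final_fallback_chain = (
        (fallback_chain or [static_fallback])
        if use_recursive_resolution
        else [safe_fallback, static_fallback]
    )
    return ResolutionConfig(instance_provider=instance_provider, fallback_chain=final_fallback_chain)

# Exhaustively compare both forms: flag on/off, chain missing/empty/custom.
for flag, chain in product([True, False], [None, [], [custom_fallback]]):
    assert build_before(flag, chain) == build_after(flag, chain)
```

The parentheses around fallback_chain or [static_fallback] are kept for readability; the conditional expression already binds more loosely than or.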

Repeated Field Processing Patterns

Symptoms:

  • Multiple for field in fields(dataclass) loops

  • Similar field value extraction logic

  • Repeated field validation patterns

Example from lazy_config.py:

# BEFORE: Verbose field processing
field_values = {}
for field_obj in fields(dataclass_type):
    if preserve_values:
        field_values[field_obj.name] = getattr(source_config, field_obj.name)
    else:
        field_values[field_obj.name] = None

# AFTER: Concise comprehension
field_values = {f.name: getattr(source_config, f.name) if preserve_values else None for f in fields(dataclass_type)}
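
A self-contained version of the comprehension, using a hypothetical PathConfig dataclass in place of the real configuration types, might look like this:

```python
from dataclasses import dataclass, fields

@dataclass
class PathConfig:  # hypothetical config type for illustration
    output_dir: str = "out"
    num_workers: int = 4

def extract_field_values(source_config, dataclass_type, preserve_values):
    """Copy every field value from source_config, or map every field to None."""
    return {
        f.name: getattr(source_config, f.name) if preserve_values else None
        for f in fields(dataclass_type)
    }

source = PathConfig(output_dir="/data", num_workers=8)
print(extract_field_values(source, PathConfig, preserve_values=True))
# {'output_dir': '/data', 'num_workers': 8}
print(extract_field_values(source, PathConfig, preserve_values=False))
# {'output_dir': None, 'num_workers': None}
```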

Duplicate Value Resolution Logic

Symptoms:

  • Repeated “try sources, return first non-None” patterns

  • Similar null checking and fallback logic

  • Multiple functions with identical resolution structure

Solution Pattern:

# Extract into reusable resolver
def _resolve_value_from_sources(field_name: str, *source_funcs):
    """Try multiple source functions, return first non-None value."""
    for source_func in source_funcs:
        try:
            value = source_func(field_name)
            if value is not None:
                return value
        except Exception:  # AttributeError is already a subclass of Exception
            continue
    return None

Method Proliferation Detection

Symptoms:

  • Many private methods with single call sites

  • Helper methods that are just wrappers

  • Excessive abstraction layers

Detection:

# Find single-use methods
for method in $(grep -oE "def _[A-Za-z0-9_]+\(" file.py | sed 's/def //; s/($//'); do
    # Count whole-word occurrences rather than matching lines
    count=$(grep -ow "$method" file.py | wc -l | tr -d ' ')
    if [ "$count" -eq 2 ]; then  # Definition + single call
        echo "Single-use method: $method"
    fi
done
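
A grep count can be fooled by names that appear in comments or strings. As an alternative, here is a rough sketch built on Python's standard ast module. The helper find_single_use_private_functions is hypothetical and not part of OpenHCS. It inspects a single file and cannot see dynamic references (for example via getattr) or uses in other modules, so treat its output as candidates to review.

```python
import ast
from collections import Counter
from typing import List

def find_single_use_private_functions(source: str) -> List[str]:
    """Return private function names that are referenced exactly once in source."""
    tree = ast.parse(source)
    # Collect single-underscore function and method names (skip dunder names).
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name.startswith("_")
        and not node.name.startswith("__")
    }
    # Count references: bare names (helper()) and attribute access (self.helper).
    references = Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in defined:
            references[node.id] += 1
        elif isinstance(node, ast.Attribute) and node.attr in defined:
            references[node.attr] += 1
    return sorted(name for name in defined if references[name] == 1)

sample = '''
class Factory:
    def _helper(self, data):
        return data

    def _shared(self, data):
        return data

    def run(self, data):
        a = self._helper(data)
        b = self._shared(a)
        return self._shared(b)
'''
print(find_single_use_private_functions(sample))  # ['_helper']
```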

Solution Strategies

Extract Truly Reusable Patterns

Criteria for extraction:

  • Used in 3+ places

  • Represents a core algorithmic pattern

  • Provides meaningful abstraction

Example:

# Reusable pattern used throughout module
def _resolve_value_from_sources(field_name: str, *source_funcs):
    """Core resolution pattern used in multiple contexts."""
    for source_func in source_funcs:
        try:
            value = source_func(field_name)
            if value is not None:
                return value
        except Exception:  # AttributeError is already a subclass of Exception
            continue
    return None

Inline Single-Use Helpers

When to inline:

  • Method has only one call site

  • Method is just a simple wrapper

  • Inlining improves readability

Example from lazy_config.py:

# BEFORE: Unnecessary helper method
def _bind_methods_to_class(lazy_class, base_class, resolution_config):
    method_bindings = {...}
    for method_name, method_impl in method_bindings.items():
        setattr(lazy_class, method_name, method_impl)

# Called only once
LazyDataclassFactory._bind_methods_to_class(lazy_class, base_class, resolution_config)

# AFTER: Inlined at call site
method_bindings = {...}
for method_name, method_impl in method_bindings.items():
    setattr(lazy_class, method_name, method_impl)

Replace Repeated Conditional Blocks

Strategy:

  1. Identify the varying parameters

  2. Extract the common logic

  3. Use ternary operators or lookup tables

Example:

# BEFORE: Repeated structure
if mode == 'edit':
    processor = EditProcessor(config)
    result = processor.process(data)
elif mode == 'view':
    processor = ViewProcessor(config)
    result = processor.process(data)
else:
    processor = DefaultProcessor(config)
    result = processor.process(data)

# AFTER: Lookup table approach
processor_map = {
    'edit': EditProcessor,
    'view': ViewProcessor,
}
processor_class = processor_map.get(mode, DefaultProcessor)
result = processor_class(config).process(data)
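
To make the lookup-table form runnable, the sketch below supplies hypothetical processor classes; only the dispatch pattern is the point.

```python
class DefaultProcessor:  # hypothetical classes for illustration
    def __init__(self, config):
        self.config = config

    def process(self, data):
        return f"default({data})"

class EditProcessor(DefaultProcessor):
    def process(self, data):
        return f"edit({data})"

class ViewProcessor(DefaultProcessor):
    def process(self, data):
        return f"view({data})"

# The table holds only what varies between branches: the class to instantiate.
PROCESSOR_MAP = {
    "edit": EditProcessor,
    "view": ViewProcessor,
}

def run(mode, config, data):
    processor_class = PROCESSOR_MAP.get(mode, DefaultProcessor)
    return processor_class(config).process(data)

assert run("edit", {}, "x") == "edit(x)"
assert run("view", {}, "x") == "view(x)"
assert run("unknown", {}, "x") == "default(x)"
```

Using .get with a default mirrors the original else branch. If an unknown mode should be treated as an error instead, indexing with PROCESSOR_MAP[mode] raises KeyError, which fits a fail-loud style.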

Consolidate Field Processing

Pattern:

# BEFORE: Verbose loops
result = {}
for field in fields(obj):
    if condition:
        result[field.name] = transform_a(field)
    else:
        result[field.name] = transform_b(field)

# AFTER: Concise comprehension
result = {f.name: transform_a(f) if condition else transform_b(f) for f in fields(obj)}

Move Inline Imports to Top-Level

Before:

def method():
    from some.module import function  # Inline import
    return function(data)

After:

from some.module import function  # Top-level import

def method():
    return function(data)

Before/After Examples from lazy_config.py Refactoring

Scary Method Simplification

BEFORE (26 lines):

@staticmethod
def create_getattribute() -> Callable[[Any, str], Any]:
    """Create lazy __getattribute__ method."""
    def __getattribute__(self: Any, name: str) -> Any:
        value = object.__getattribute__(self, name)
        if value is None and name in {f.name for f in fields(self.__class__)}:
            # Check if this field has a lazy dataclass type
            field_obj = next((f for f in fields(self.__class__) if f.name == name), None)
            if field_obj:
                field_type = field_obj.type
                # Handle Optional[LazyType] by unwrapping
                if hasattr(field_type, '__origin__') and field_type.__origin__ is Union:
                    args = getattr(field_type, '__args__', ())
                    if len(args) == 2 and type(None) in args:
                        field_type = args[0] if args[1] is type(None) else args[1]

                # Check if field type is a lazy dataclass
                if hasattr(field_type, '_resolve_field_value') or (
                    hasattr(field_type, '__name__') and field_type.__name__.startswith('Lazy')
                ):
                    # Create instance of lazy nested class
                    return field_type()

            # Fall back to standard resolution for non-lazy fields
            return self._resolve_field_value(name)
        else:
            return value
    return __getattribute__

AFTER (6 lines):

@staticmethod
def create_getattribute() -> Callable[[Any, str], Any]:
    """Create lazy __getattribute__ method."""
    def __getattribute__(self: Any, name: str) -> Any:
        value = object.__getattribute__(self, name)
        if value is None and name in {f.name for f in fields(self.__class__)}:
            return self._resolve_field_value(name)
        return value
    return __getattribute__
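
The following sketch shows how the simplified method behaves once attached to a class. LazyPathConfig, GLOBAL_DEFAULTS, and its _resolve_field_value implementation are assumptions made for illustration; the real resolution logic in lazy_config.py is not shown in this document.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Optional

def create_getattribute() -> Callable[[Any, str], Any]:
    """Create lazy __getattribute__ method."""
    def __getattribute__(self: Any, name: str) -> Any:
        value = object.__getattribute__(self, name)
        if value is None and name in {f.name for f in fields(self.__class__)}:
            return self._resolve_field_value(name)
        return value
    return __getattribute__

GLOBAL_DEFAULTS = {"num_workers": 4, "output_dir": "/tmp/out"}  # hypothetical

@dataclass
class LazyPathConfig:  # hypothetical lazy dataclass
    num_workers: Optional[int] = None
    output_dir: Optional[str] = None

    def _resolve_field_value(self, name: str) -> Any:
        # Assumed resolution strategy: fall back to a global default.
        return GLOBAL_DEFAULTS.get(name)

# Attach the generated method to the class.
LazyPathConfig.__getattribute__ = create_getattribute()

cfg = LazyPathConfig(num_workers=16)
assert cfg.num_workers == 16            # explicitly set values are returned as-is
assert cfg.output_dir == "/tmp/out"     # None fields are resolved on access
assert object.__getattribute__(cfg, "output_dir") is None  # stored value stays None
```

Reading through object.__getattribute__ bypasses the lazy lookup, which is why the stored None is still visible in the last assertion.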

Duplicate Conditional Logic Unification

BEFORE:

if use_recursive_resolution:
    return ResolutionConfig(
        instance_provider=instance_provider,
        fallback_chain=fallback_chain or [static_fallback]
    )
else:
    safe_fallback = lambda field_name: _get_raw_field_value(instance_provider(), field_name) if instance_provider() else None
    return ResolutionConfig(
        instance_provider=instance_provider,
        fallback_chain=[safe_fallback, static_fallback]
    )

AFTER:

static_fallback = lambda field_name: _get_raw_field_value(base_class(), field_name)
safe_fallback = lambda field_name: _get_raw_field_value(instance_provider(), field_name) if instance_provider() else None

final_fallback_chain = (fallback_chain or [static_fallback]) if use_recursive_resolution else [safe_fallback, static_fallback]

return ResolutionConfig(instance_provider=instance_provider, fallback_chain=final_fallback_chain)

Single-Use Method Inlining

BEFORE:

# Method definition (23 lines)
def _create_unified_lazy_class(self, base_class, global_config_type, field_name, lazy_class_name, parent_field_path=None, parent_instance_provider=None):
    full_field_path = f"{parent_field_path}.{field_name}" if parent_field_path else field_name
    return LazyDataclassFactory.make_lazy_with_field_level_auto_hierarchy(
        base_class=base_class,
        global_config_type=global_config_type,
        field_path=full_field_path,
        lazy_class_name=lazy_class_name,
        context_provider=lambda: parent_instance_provider() if parent_instance_provider else _get_current_config(global_config_type)
    )

# Single call site
lazy_nested_type = LazyDataclassFactory._create_unified_lazy_class(
    base_class=field.type,
    global_config_type=global_config_type,
    field_name=field.name,
    lazy_class_name=f"Lazy{field.type.__name__}",
    parent_field_path=parent_field_path,
    parent_instance_provider=parent_instance_provider
)

AFTER:

# Inlined at call site
full_field_path = f"{parent_field_path}.{field.name}" if parent_field_path else field.name
lazy_nested_type = LazyDataclassFactory.make_lazy_with_field_level_auto_hierarchy(
    base_class=field.type,
    global_config_type=global_config_type,
    field_path=full_field_path,
    lazy_class_name=f"Lazy{field.type.__name__}",
    context_provider=lambda: parent_instance_provider() if parent_instance_provider else _get_current_config(global_config_type)
)

Validation Criteria

Functionality Preservation

  • All tests must pass after refactoring

  • Integration tests verify end-to-end functionality

  • Unit tests confirm individual component behavior

Quantitative Improvements

  • Line count reduction (target: 15-25%)

  • Reduced cyclomatic complexity

  • Fewer public methods (smaller API surface area)
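
These metrics can be approximated with a small script. The sketch below is a rough proxy, not a standard cyclomatic complexity tool: measure is a hypothetical helper that counts non-blank lines, branching AST nodes, and function definitions.

```python
import ast

# Node types treated as decision points in this approximation.
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)

def measure(source: str) -> dict:
    """Count non-blank lines, branching nodes, and function definitions."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    return {
        "lines": sum(1 for line in source.splitlines() if line.strip()),
        "branches": sum(isinstance(n, BRANCH_NODES) for n in nodes),
        "functions": sum(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes),
    }

before = '''
def build(flag, a, b):
    if flag:
        result = make(a)
    else:
        result = make(b)
    return result
'''

after = '''
def build(flag, a, b):
    return make(a if flag else b)
'''

m_before, m_after = measure(before), measure(after)
reduction = 100 * (m_before["lines"] - m_after["lines"]) / m_before["lines"]
print(m_before)  # {'lines': 6, 'branches': 1, 'functions': 1}
print(m_after)   # {'lines': 2, 'branches': 1, 'functions': 1}
print(f"{reduction:.0f}% fewer lines")  # 67% fewer lines
```

In this example the branch count stays at 1: folding an if/else into a conditional expression shortens the code but leaves the same decision point, so line count and complexity should be tracked separately.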

Qualitative Improvements

  • Elimination of unnecessary abstraction layers

  • Cleaner, more readable code

  • Consistent patterns throughout module

  • Easier maintenance and debugging

OpenHCS Principles Compliance

  • Clean, terse, elegant code

  • Functional programming patterns where appropriate

  • Fail-loud behavior instead of defensive programming

  • Mathematical simplification over complex abstractions

Refactoring Workflow

  1. Analyze: Identify duplicate patterns using grep and manual inspection

  2. Extract: Create truly reusable helper functions for patterns used 3+ times

  3. Inline: Remove single-use helper methods by inlining at call sites

  4. Simplify: Replace duplicate conditional logic with unified expressions

  5. Consolidate: Convert verbose loops to concise comprehensions

  6. Test: Verify all functionality is preserved

  7. Validate: Confirm quantitative and qualitative improvements

Tools and Commands

# Find duplicate patterns
grep -A 5 -B 5 "pattern" file.py

# Find single-use methods (2 occurrences = definition + one call)
grep -ow "method_name" file.py | wc -l

# Check line count reduction
wc -l file.py  # Before and after

# Verify functionality
python -m pytest tests/

Case Study: lazy_config.py Refactoring Results

Quantitative Results:

  • File size reduced: 997 lines → 801 lines (196 lines removed, about 20%)

  • Eliminated code duplication: Consolidated duplicate conditional logic and field processing patterns

  • Simplified complex methods: reduced hard-to-read methods such as create_getattribute to their essential logic

  • Inlined single-use methods: Removed unnecessary abstraction layers per OpenHCS principles

Key Transformations:

  • 15+ inline imports consolidated to top-level imports

  • Duplicate conditional logic unified into single expressions

  • Single-use private methods inlined at call sites

  • Complex type checking simplified to essential logic

Validation:

  • ✅ All functionality preserved - lazy config resolution works correctly

  • ✅ Integration tests pass - pipeline execution completes successfully

  • ✅ API compatibility maintained - existing imports and usage unchanged

  • ✅ OpenHCS principles followed - clean, terse, elegant code with minimal redundancy

Summary

Remember: The goal is mathematical simplification - treat code like algebraic expressions that can be factored, simplified, and optimized while preserving their essential behavior.

This approach transforms complex, duplicated code into clean, maintainable implementations that follow OpenHCS architectural principles while preserving all functionality.