OpenHCS Refactoring Principles: Mathematical Simplification Approach
Status: CANONICAL Version: 1.0 Last Updated: 2025-01-31
This document codifies the mathematical simplification approach used to refactor OpenHCS codebase, treating code duplication like algebraic expressions that can be factored and simplified.
Core First Principles
Algebraic Common Factors Principle
Treat duplicate code patterns like mathematical expressions that can be factored out:
# Before: Duplicate expressions (like 3x + 3y = 3(x + y))
if condition_a:
result = process_pattern(data_a, param_a)
else:
result = process_pattern(data_b, param_b)
# After: Factor out common pattern
result = process_pattern(
data_a if condition_a else data_b,
param_a if condition_a else param_b
)
Rule: If you see the same logical structure repeated with only parameter variations, extract the common pattern into a parameterized function.
Single-Use Function Inlining Rule
If a method is only called once, inline it as a lambda or direct code rather than creating unnecessary abstraction.
# Before: Unnecessary abstraction
def _helper_method(self, data):
return data.process() if data else None
def main_method(self):
return self._helper_method(self.data) # Only call site
# After: Inline at call site
def main_method(self):
return self.data.process() if self.data else None
Rule: Use grep -r "method_name" --include="*.py" to verify single usage before inlining.
Mathematical Simplification Approach
Consolidate duplicate conditional logic using:
Ternary operators for simple conditions
Lookup tables for multiple discrete cases
Parameterized functions for complex logic
# Before: Duplicate conditional blocks
if use_recursive:
config = ResolutionConfig(provider, fallback_a)
else:
config = ResolutionConfig(provider, fallback_b)
# After: Unified expression
fallback = fallback_a if use_recursive else fallback_b
config = ResolutionConfig(provider, fallback)
Pattern Identification Guidelines
Duplicate Conditional Logic Patterns
Symptoms:
Repeated
if/elseblocks with identical structureSame method calls with different parameters
Similar error handling in multiple places
Detection:
# Look for repeated conditional patterns
grep -A 5 -B 5 "if.*:" file.py | grep -A 10 -B 10 "else:"
Example from lazy_config.py:
# BEFORE: Duplicate conditional logic
if use_recursive_resolution:
return ResolutionConfig(
instance_provider=instance_provider,
fallback_chain=fallback_chain or [static_fallback]
)
else:
return ResolutionConfig(
instance_provider=instance_provider,
fallback_chain=[safe_fallback, static_fallback]
)
# AFTER: Unified expression
final_fallback_chain = (fallback_chain or [static_fallback]) if use_recursive_resolution else [safe_fallback, static_fallback]
return ResolutionConfig(instance_provider=instance_provider, fallback_chain=final_fallback_chain)
Repeated Field Processing Patterns
Symptoms:
Multiple
for field in fields(dataclass)loopsSimilar field value extraction logic
Repeated field validation patterns
Example from lazy_config.py:
# BEFORE: Verbose field processing
field_values = {}
for field_obj in fields(dataclass_type):
if preserve_values:
field_values[field_obj.name] = getattr(source_config, field_obj.name)
else:
field_values[field_obj.name] = None
# AFTER: Concise comprehension
field_values = {f.name: getattr(source_config, f.name) if preserve_values else None for f in fields(dataclass_type)}
Duplicate Value Resolution Logic
Symptoms:
Repeated “try sources, return first non-None” patterns
Similar null checking and fallback logic
Multiple functions with identical resolution structure
Solution Pattern:
# Extract into reusable resolver
def _resolve_value_from_sources(field_name: str, *source_funcs):
"""Try multiple source functions, return first non-None value."""
for source_func in source_funcs:
try:
value = source_func(field_name)
if value is not None:
return value
except (AttributeError, Exception):
continue
return None
Method Proliferation Detection
Symptoms:
Many private methods with single call sites
Helper methods that are just wrappers
Excessive abstraction layers
Detection:
# Find single-use methods
for method in $(grep -o "def _[a-z_]*(" file.py | sed 's/def //; s/(//'); do
count=$(grep -c "$method" file.py)
if [ $count -eq 2 ]; then # Definition + single call
echo "Single-use method: $method"
fi
done
Solution Strategies
Extract Truly Reusable Patterns
Criteria for extraction:
Used in 3+ places
Represents a core algorithmic pattern
Provides meaningful abstraction
Example:
# Reusable pattern used throughout module
def _resolve_value_from_sources(field_name: str, *source_funcs):
"""Core resolution pattern used in multiple contexts."""
for source_func in source_funcs:
try:
value = source_func(field_name)
if value is not None:
return value
except (AttributeError, Exception):
continue
return None
Inline Single-Use Helpers
When to inline:
Method has only one call site
Method is just a simple wrapper
Inlining improves readability
Example from lazy_config.py:
# BEFORE: Unnecessary helper method
def _bind_methods_to_class(lazy_class, base_class, resolution_config):
method_bindings = {...}
for method_name, method_impl in method_bindings.items():
setattr(lazy_class, method_name, method_impl)
# Called only once
LazyDataclassFactory._bind_methods_to_class(lazy_class, base_class, resolution_config)
# AFTER: Inlined at call site
method_bindings = {...}
for method_name, method_impl in method_bindings.items():
setattr(lazy_class, method_name, method_impl)
Replace Repeated Conditional Blocks
Strategy:
Identify the varying parameters
Extract the common logic
Use ternary operators or lookup tables
Example:
# BEFORE: Repeated structure
if mode == 'edit':
processor = EditProcessor(config)
result = processor.process(data)
elif mode == 'view':
processor = ViewProcessor(config)
result = processor.process(data)
else:
processor = DefaultProcessor(config)
result = processor.process(data)
# AFTER: Lookup table approach
processor_map = {
'edit': EditProcessor,
'view': ViewProcessor,
}
processor_class = processor_map.get(mode, DefaultProcessor)
result = processor_class(config).process(data)
Consolidate Field Processing
Pattern:
# BEFORE: Verbose loops
result = {}
for field in fields(obj):
if condition:
result[field.name] = transform_a(field)
else:
result[field.name] = transform_b(field)
# AFTER: Concise comprehension
result = {f.name: transform_a(f) if condition else transform_b(f) for f in fields(obj)}
Move Inline Imports to Top-Level
Before:
def method():
from some.module import function # Inline import
return function(data)
After:
from some.module import function # Top-level import
def method():
return function(data)
Before/After Examples from lazy_config.py Refactoring
Scary Method Simplification
BEFORE (26 lines):
@staticmethod
def create_getattribute() -> Callable[[Any, str], Any]:
"""Create lazy __getattribute__ method."""
def __getattribute__(self: Any, name: str) -> Any:
value = object.__getattribute__(self, name)
if value is None and name in {f.name for f in fields(self.__class__)}:
# Check if this field has a lazy dataclass type
field_obj = next((f for f in fields(self.__class__) if f.name == name), None)
if field_obj:
field_type = field_obj.type
# Handle Optional[LazyType] by unwrapping
if hasattr(field_type, '__origin__') and field_type.__origin__ is Union:
args = getattr(field_type, '__args__', ())
if len(args) == 2 and type(None) in args:
field_type = args[0] if args[1] is type(None) else args[1]
# Check if field type is a lazy dataclass
if hasattr(field_type, '_resolve_field_value') or (
hasattr(field_type, '__name__') and field_type.__name__.startswith('Lazy')
):
# Create instance of lazy nested class
return field_type()
# Fall back to standard resolution for non-lazy fields
return self._resolve_field_value(name)
else:
return value
return __getattribute__
AFTER (6 lines):
@staticmethod
def create_getattribute() -> Callable[[Any, str], Any]:
"""Create lazy __getattribute__ method."""
def __getattribute__(self: Any, name: str) -> Any:
value = object.__getattribute__(self, name)
if value is None and name in {f.name for f in fields(self.__class__)}:
return self._resolve_field_value(name)
return value
return __getattribute__
Duplicate Conditional Logic Unification
BEFORE:
if use_recursive_resolution:
return ResolutionConfig(
instance_provider=instance_provider,
fallback_chain=fallback_chain or [static_fallback]
)
else:
safe_fallback = lambda field_name: _get_raw_field_value(instance_provider(), field_name) if instance_provider() else None
return ResolutionConfig(
instance_provider=instance_provider,
fallback_chain=[safe_fallback, static_fallback]
)
AFTER:
static_fallback = lambda field_name: _get_raw_field_value(base_class(), field_name)
safe_fallback = lambda field_name: _get_raw_field_value(instance_provider(), field_name) if instance_provider() else None
final_fallback_chain = (fallback_chain or [static_fallback]) if use_recursive_resolution else [safe_fallback, static_fallback]
return ResolutionConfig(instance_provider=instance_provider, fallback_chain=final_fallback_chain)
Single-Use Method Inlining
BEFORE:
# Method definition (23 lines)
def _create_unified_lazy_class(self, base_class, global_config_type, field_name, lazy_class_name, parent_field_path=None, parent_instance_provider=None):
full_field_path = f"{parent_field_path}.{field_name}" if parent_field_path else field_name
return LazyDataclassFactory.make_lazy_with_field_level_auto_hierarchy(
base_class=base_class,
global_config_type=global_config_type,
field_path=full_field_path,
lazy_class_name=lazy_class_name,
context_provider=lambda: parent_instance_provider() if parent_instance_provider else _get_current_config(global_config_type)
)
# Single call site
lazy_nested_type = LazyDataclassFactory._create_unified_lazy_class(
base_class=field.type,
global_config_type=global_config_type,
field_name=field.name,
lazy_class_name=f"Lazy{field.type.__name__}",
parent_field_path=parent_field_path,
parent_instance_provider=parent_instance_provider
)
AFTER:
# Inlined at call site
full_field_path = f"{parent_field_path}.{field.name}" if parent_field_path else field.name
lazy_nested_type = LazyDataclassFactory.make_lazy_with_field_level_auto_hierarchy(
base_class=field.type,
global_config_type=global_config_type,
field_path=full_field_path,
lazy_class_name=f"Lazy{field.type.__name__}",
context_provider=lambda: parent_instance_provider() if parent_instance_provider else _get_current_config(global_config_type)
)
Validation Criteria
Functionality Preservation
All tests must pass after refactoring
Integration tests verify end-to-end functionality
Unit tests confirm individual component behavior
Quantitative Improvements
Significant line count reduction (target: 15-25% reduction)
Reduced cyclomatic complexity
Fewer public methods in API surface area
Qualitative Improvements
Elimination of unnecessary abstraction layers
Cleaner, more readable code
Consistent patterns throughout module
Easier maintenance and debugging
OpenHCS Principles Compliance
Clean, terse, elegant code
Functional programming patterns where appropriate
Fail-loud behavior instead of defensive programming
Mathematical simplification over complex abstractions
Refactoring Workflow
Analyze: Identify duplicate patterns using grep and manual inspection
Extract: Create truly reusable helper functions for patterns used 3+ times
Inline: Remove single-use helper methods by inlining at call sites
Simplify: Replace duplicate conditional logic with unified expressions
Consolidate: Convert verbose loops to concise comprehensions
Test: Verify all functionality is preserved
Validate: Confirm quantitative and qualitative improvements
Tools and Commands
# Find duplicate patterns
grep -A 5 -B 5 "pattern" file.py
# Find single-use methods
grep -c "method_name" file.py
# Check line count reduction
wc -l file.py # Before and after
# Verify functionality
python -m pytest tests/
Case Study: lazy_config.py Refactoring Results
Quantitative Results:
File size reduced: 997 lines → 801 lines (20% reduction)
Eliminated code duplication: Consolidated duplicate conditional logic and field processing patterns
Simplified complex methods: Made scary, unreadable code clean and terse
Inlined single-use methods: Removed unnecessary abstraction layers per OpenHCS principles
Key Transformations:
15+ inline imports consolidated to top-level imports
Duplicate conditional logic unified into single expressions
Single-use private methods inlined at call sites
Complex type checking simplified to essential logic
Validation:
✅ All functionality preserved - lazy config resolution works correctly
✅ Integration tests pass - pipeline execution completes successfully
✅ API compatibility maintained - existing imports and usage unchanged
✅ OpenHCS principles followed - clean, terse, elegant code with minimal redundancy
Summary
Remember: The goal is mathematical simplification - treat code like algebraic expressions that can be factored, simplified, and optimized while preserving their essential behavior.
This approach transforms complex, duplicated code into clean, maintainable implementations that follow OpenHCS architectural principles while preserving all functionality.