Python Object Reference Documentation Methodology

Perpetual documentation accuracy through Python object references and implementation-priming prose.

Status: CANONICAL
Applies to: All OpenHCS documentation with code examples

Overview

Traditional documentation suffers from the “documentation drift” problem: code examples become outdated as the implementation evolves. OpenHCS counters this through systematic use of Python object references (:py:meth:, :py:func:, :py:class:) combined with clear, concise prose that primes readers with implementation context.

This methodology keeps documentation accurate over time by having Sphinx validate object references on every strict build (nitpicky mode with warnings treated as errors), while giving readers the mental framework to understand code before they see it.

Core Principles

Python Object References as Single Source of Truth

Principle: Documentation uses Python object references that Sphinx automatically validates against the actual codebase.

Implementation:

The core entry point :py:meth:`~openhcs.core.lazy_config.LazyDataclassFactory.make_lazy_with_field_level_auto_hierarchy`
works by creating a temporary lazy dataclass instance, then asking that instance to resolve the specific field value.

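Rationale: When the implementation changes, a strict Sphinx build (nitpicky mode with warnings treated as errors) fails if a referenced object no longer exists among the documented objects. Without nitpicky mode, an unresolved :py: reference is rendered as plain text and the build still succeeds. With it, no manual synchronization is required.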
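The strict behaviour depends on a few Sphinx settings. The fragment below is a minimal sketch of a conf.py, assuming the API is documented through autodoc so that :py: roles have registered objects to resolve against. The project name and the nitpick_ignore entry are illustrative assumptions, not the actual OpenHCS configuration.

```python
# conf.py (sketch): settings that make unresolved :py: references visible.

project = "OpenHCS"  # assumption: included only to make the fragment complete

# autodoc registers the documented objects that :py: roles resolve against.
extensions = ["sphinx.ext.autodoc"]

# Nitpicky mode: warn about every cross-reference whose target is unknown.
nitpicky = True

# Targets that are knowingly undocumented can be exempted one by one.
# The entry below is a hypothetical example, not a real OpenHCS exemption.
nitpick_ignore = [
    ("py:class", "typing.Any"),
]
```

Setting nitpicky in conf.py has the same effect as passing -n on the command line, which keeps local builds and CI builds consistent.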

Implementation-Priming Prose

Principle: Prose explains how methods work conceptually before readers encounter the actual code, building intuition for easy code comprehension.

Implementation Pattern:

:py:func:`~openhcs.core.lazy_placeholder._resolve_field_with_composition_awareness` works like a smart field finder.
Given a dataclass instance and a field name, it first checks if the field exists directly on the instance
(using `dataclasses.fields()`). If found, it gets the value using `getattr()`. If not found, it loops through
all fields looking for nested dataclasses, then recursively searches inside each one.

Rationale: Readers understand the implementation strategy before seeing code, making the actual implementation immediately comprehensible.

Architectural Accuracy Over Syntactic Accuracy

Principle: Documentation should reflect architectural patterns and design decisions, with Python object references automatically tracking method evolution.

Example: Documentation describing lazy structure preservation uses :py:func:`~openhcs.core.lazy_config.rebuild_lazy_config_with_new_global_reference`. If the function is renamed, a strict Sphinx build fails until the reference is updated.

Implementation Strategy:

Use Python object references so that renames and moves surface as build failures
Sphinx validates object existence during build
Preserve architectural explanations with validated references
Build failures force documentation updates during refactoring

Logical Method Decomposition with Implementation Context

Principle: Large methods that serve multiple conceptual purposes should be documented with clear prose explaining each logical phase, using Python object references for validation.

Documentation Approach:

Auto-Discovery Phase
~~~~~~~~~~~~~~~~~~~~

The simplified contextvars system uses explicit context management to determine configuration
hierarchy, eliminating the need for complex field path detection and auto-discovery mechanisms.

Context Detection Phase
~~~~~~~~~~~~~~~~~~~~~~~

:py:func:`~openhcs.core.lazy_config._get_current_config` checks if we're running in a PyQt application
context where thread-local storage should be available. If context is missing in a GUI environment,
it logs an architecture warning since this indicates a context management bug.

Hierarchy Building Phase
~~~~~~~~~~~~~~~~~~~~~~~~

The system constructs a resolution hierarchy by combining the current field path with discovered parent
relationships. :py:meth:`~openhcs.core.lazy_config.LazyDataclassFactory._create_field_level_hierarchy_provider`
builds a chain where each level can inherit from the next, creating the step → pipeline → global resolution flow.

Rationale: Prose explains the conceptual purpose of each phase, while Python object references ensure the described functionality actually exists and can be validated by Sphinx.
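The step → pipeline → global flow with explicit context management can be illustrated with the standard contextvars module. The sketch below is an assumption about the general shape of such a system, not the OpenHCS implementation: config_context, resolve, and the dictionary-based layers are hypothetical.

```python
import contextlib
import contextvars
from typing import Any, Dict, Iterator

# Stack of configuration layers; the innermost (most specific) layer is last.
_layers = contextvars.ContextVar("config_layers", default=())


@contextlib.contextmanager
def config_context(layer: Dict[str, Any]) -> Iterator[None]:
    """Push one configuration layer for the duration of the with-block."""
    token = _layers.set(_layers.get() + (layer,))
    try:
        yield
    finally:
        _layers.reset(token)  # restore the previous stack on exit


def resolve(field_name: str) -> Any:
    """Walk from the innermost layer outward; None means 'inherit'."""
    for layer in reversed(_layers.get()):
        value = layer.get(field_name)
        if value is not None:
            return value
    return None


global_config = {"num_workers": 8, "output_dir_suffix": "_global"}
pipeline_config = {"num_workers": None, "output_dir_suffix": "_pipeline"}
step_config = {"num_workers": None, "output_dir_suffix": None}

with config_context(global_config):
    with config_context(pipeline_config):
        with config_context(step_config):
            workers = resolve("num_workers")        # inherited from global
            suffix = resolve("output_dir_suffix")   # inherited from pipeline

outside = resolve("num_workers")  # no context is active here
```

Because the hierarchy is whatever the caller pushed, nothing needs to be discovered at resolution time.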

Audit Methodology

Phase 1: Python Object Reference Identification

Objective: Catalog all code references in the documentation and convert them to validated Python object references.

Process:

Scan for code references: Find all manual code examples, method names, and class references
Classify references:
- Implementation references: Method signatures, class definitions → :py:meth:, :py:class:
- Function references: Standalone functions → :py:func:
- Attribute references: Class attributes, constants → :py:attr:
- Conceptual examples: Pseudo-code illustrating patterns (keep as-is)
Create mapping document: Current reference, target Python object, validation status

Tools:

# Find all code blocks in documentation
find docs/ -name "*.rst" -exec grep -l "code-block:: python" {} \;

# Find manual method references that should be Python objects
grep -r "def [a-zA-Z_]" docs/source/architecture/ | grep -v ":py:"

# Find class references that should be Python objects
grep -r "class [A-Z]" docs/source/architecture/ | grep -v ":py:"
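Where line numbers are needed for the mapping document, the same scan can be done in Python. The following is a sketch modeled on the grep patterns above. find_unconverted is a hypothetical helper and the sample lines are invented for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Patterns modeled on the grep commands: a bare "def name" or "class Name".
BARE_DEF = re.compile(r"\bdef [a-zA-Z_]")
BARE_CLASS = re.compile(r"\bclass [A-Z]")


def find_unconverted(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for lines naming code without a :py: role."""
    hits = []
    for number, text in enumerate(lines, start=1):
        if ":py:" in text:
            continue  # already uses a validated reference
        if BARE_DEF.search(text) or BARE_CLASS.search(text):
            hits.append((number, text.strip()))
    return hits


sample = [
    "The factory is implemented in class LazyDataclassFactory.",
    ":py:class:`~openhcs.core.lazy_config.LazyDataclassFactory` builds lazy classes.",
    "See def make_lazy_simple for details.",
    "Plain prose with no code reference.",
]
unconverted = find_unconverted(sample)
```

Each hit becomes one row of the mapping document, with the target Python object filled in by hand.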

Phase 2: Python Object Validation

Objective: Verify that each Python object reference points to an existing implementation.

Process:

Object existence check: In nitpicky mode, Sphinx reports every Python object reference that does not resolve to a documented object
Import verification: Confirm referenced modules can be imported
Signature validation: Verify method signatures match expectations
Phantom reference detection: A strict Sphinx build fails for non-existent objects
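Import verification and signature validation are not performed by Sphinx reference resolution itself, so they can be scripted separately. The sketch below uses only the standard library and demonstrates the checks against dataclasses.fields. resolve_dotted and has_parameters are hypothetical helper names.

```python
import importlib
import inspect
from typing import Any, List


def resolve_dotted(path: str) -> Any:
    """Import the longest importable module prefix, then walk attributes."""
    parts = path.split(".")
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            continue  # not a module: try a shorter prefix
        for attribute in parts[split:]:
            obj = getattr(obj, attribute)  # AttributeError: phantom reference
        return obj
    raise ImportError(f"no importable module in {path!r}")


def has_parameters(path: str, expected: List[str]) -> bool:
    """Check that a callable's parameter names match the documented ones."""
    signature = inspect.signature(resolve_dotted(path))
    return list(signature.parameters) == expected


exists = resolve_dotted("dataclasses.fields")
signature_ok = has_parameters("dataclasses.fields", ["class_or_instance"])

try:
    resolve_dotted("dataclasses.no_such_function")
    phantom_detected = False
except AttributeError:
    phantom_detected = True
```

Note that this imports the target modules, so it should run in the same environment as the documentation build.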

Validation Benefits:

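Automatic validation: In nitpicky mode, Sphinx checks all :py: references during the build
Import checking: When the API is documented with autodoc, a module that cannot be imported is not documented, so references into it fail to resolve
Refactoring safety: A strict build fails when referenced objects are renamed or moved
No manual existence checks: Sphinx confirms that every target exists; whether the prose still describes the target's behavior remains a review task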

Build Integration:

# Validate Python object references during the build
sphinx-build -b html docs/source docs/build -n -W --keep-going

# -n enables nitpicky mode (warn about every reference that cannot be resolved)
# -W treats warnings as errors (including invalid references)
# --keep-going shows all invalid references at once
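For projects that drive the build from Python (for example from a task runner), the same invocation can be wrapped in a small helper. This is a sketch under the assumption that Sphinx is installed in the active interpreter. strict_build_command and run_strict_build are hypothetical names.

```python
import subprocess
import sys
from typing import List


def strict_build_command(source: str, output: str) -> List[str]:
    """Assemble a Sphinx invocation that fails on unresolved references."""
    return [
        sys.executable, "-m", "sphinx",  # module form of sphinx-build
        "-b", "html",
        "-n",            # nitpicky: warn about every unresolved reference
        "-W",            # turn warnings into errors
        "--keep-going",  # report all warnings before failing
        source,
        output,
    ]


def run_strict_build(source: str, output: str) -> bool:
    """Return True when the documentation builds without warnings."""
    completed = subprocess.run(strict_build_command(source, output))
    return completed.returncode == 0


command = strict_build_command("docs/source", "docs/build")
```

Keeping the flags in one function means local builds and CI use exactly the same strictness.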

Phase 3: Systematic Replacement with Implementation Context

Objective: Replace code examples with Python object references and implementation-priming prose.

Process:

High priority first: Core implementation methods, class definitions
Add implementation context: Write clear prose explaining how each method works
Batch replacement: Group related examples for efficient processing
Build verification: Test documentation build after each batch

Replacement Template:

{Clear prose explaining what the method does and how it works conceptually}

:py:meth:`~module.path.ClassName.method_name` {additional context about the implementation approach}.

{Brief explanation of why this approach was chosen or what problem it solves}

Example Implementation:

The service uses :py:func:`~openhcs.core.lazy_placeholder._resolve_field_with_composition_awareness`
to find field values. This function first checks if the field exists directly on the dataclass
(like `num_workers` on `PipelineConfig`). If not found, it recursively searches through nested
dataclasses (like looking for `output_dir_suffix` inside `materialization_defaults`).
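When many examples are converted in one batch, the replacement template can be filled programmatically to keep paragraphs uniform. The helper below is a hypothetical sketch, and the lead and rationale sentences passed to it are invented placeholders rather than text from the OpenHCS documentation.

```python
def render_reference(role: str, target: str, lead: str, context: str, rationale: str) -> str:
    """Fill the replacement template: prose, validated reference, rationale."""
    reference = f":py:{role}:`~{target}`"
    return f"{lead}\n\n{reference} {context}.\n\n{rationale}\n"


paragraph = render_reference(
    role="func",
    target="openhcs.core.lazy_placeholder._resolve_field_with_composition_awareness",
    lead="The service looks up field values through a single resolver.",
    context="checks the dataclass itself first, then searches nested dataclasses",
    rationale="Centralizing the lookup keeps nested configuration transparent to callers.",
)
```

The generated text is only a starting point. The implementation-priming prose still has to be written by someone who has read the code.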

Phase 4: Perpetual Maintenance with Automatic Validation

Objective: Maintain accuracy as the codebase evolves, using automatic validation.

Process:

Build integration: Documentation build fails if Python object references are invalid
Refactoring protocol: A renamed or moved method leaves its reference unresolved, which fails the strict build
Review integration: Code reviews catch documentation build failures from invalid references
Automated verification: CI enforces that all Python object references are valid

Automatic Maintenance Benefits:

Zero manual tracking: Sphinx handles all reference validation
Immediate feedback: The next documentation build fails as soon as a reference becomes invalid
Refactoring safety: A failing strict build blocks the change until the documentation is updated
Cross-reference accuracy: All internal links automatically validated

OpenHCS-Specific Implementation Guidelines

Fail-Loud Documentation with Implementation Context

Principle: Documentation build should fail immediately when Python object references become invalid, while providing clear implementation context.

Implementation: Use Sphinx’s strict build options (-n for nitpicky mode, -W to turn warnings into errors) with Python object references and implementation-priming prose.

# Fail-loud with implementation context
:py:meth:`~openhcs.core.lazy_config.LazyDataclassFactory._create_lazy_dataclass_unified`
works like a dataclass compiler. It takes a regular dataclass definition and generates a new
class with the same fields and interface, but replaces the field access behavior.
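The "dataclass compiler" idea can be made concrete with dataclasses.make_dataclass. The sketch below is one possible way to generate a class with the same fields but different field access behavior. It is not the OpenHCS implementation, and make_lazy, StepConfig, and the provider callback are hypothetical.

```python
import dataclasses
from typing import Any, Callable


def make_lazy(base: type, provider: Callable[[str], Any]) -> type:
    """Generate a dataclass with the fields of base and lazy field access."""
    field_names = {f.name for f in dataclasses.fields(base)}

    def __getattribute__(self: Any, name: str) -> Any:
        value = object.__getattribute__(self, name)
        if name in field_names and value is None:
            return provider(name)  # unset field: resolve it on access
        return value

    # Same field names as the base class, but every field defaults to None.
    lazy_fields = [
        (f.name, Any, dataclasses.field(default=None))
        for f in dataclasses.fields(base)
    ]
    return dataclasses.make_dataclass(
        f"Lazy{base.__name__}",
        lazy_fields,
        namespace={"__getattribute__": __getattribute__},
    )


@dataclasses.dataclass
class StepConfig:  # hypothetical example class
    num_workers: int = 1
    output_dir_suffix: str = "_out"


defaults = StepConfig(num_workers=8)
LazyStepConfig = make_lazy(StepConfig, lambda name: getattr(defaults, name))

inherited = LazyStepConfig().num_workers              # resolved on access
explicit = LazyStepConfig(num_workers=2).num_workers  # set explicitly
```

The generated class keeps the constructor interface of a dataclass, while a field left at None defers to the provider each time it is read.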

Mathematical Simplification Applied to Documentation

Principle: Eliminate duplicate explanations by referencing a single implementation source with consistent prose patterns.

Before:

# Multiple explanations of the same concept
Configuration Resolution (in config.rst)
Lazy Field Resolution (in lazy_config.rst)
Thread-Local Context (in context.rst)

After:

# Single implementation with multiple documentation perspectives
:py:func:`~openhcs.core.lazy_config._resolve_value_from_sources` implements the core
resolution logic by trying each source in the fallback chain until one returns a non-None value.
(Referenced consistently across config.rst, lazy_config.rst, and context.rst)
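The fallback chain described in that reference, trying each source until one returns a non-None value, is small enough to sketch directly. The function name first_non_none and the source factory are hypothetical. The real function may differ in signature and detail.

```python
from typing import Any, Callable, List, Optional, Sequence

Source = Callable[[], Optional[Any]]


def first_non_none(sources: Sequence[Source]) -> Optional[Any]:
    """Call each source in order and return the first non-None result."""
    for source in sources:
        value = source()
        if value is not None:
            return value
    return None


consulted: List[str] = []  # records which sources were actually called


def make_source(name: str, value: Optional[Any]) -> Source:
    def source() -> Optional[Any]:
        consulted.append(name)
        return value
    return source


chain = [
    make_source("step", None),
    make_source("pipeline", 4),
    make_source("global", 8),
]
resolved = first_non_none(chain)
```

Because sources are callables, later levels of the chain are never consulted once a value is found.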

Architectural Coherence with Implementation Context

Principle: Document architectural patterns and design decisions with clear prose that explains the implementation approach.

Focus Areas:

Why code is structured a certain way (architectural rationale)
How methods work conceptually (implementation approach)
When to use specific approaches (usage context)
What trade-offs were made (design decisions)

Implementation Context Pattern:

# Explain HOW the function works as soon as it is referenced
:py:func:`~openhcs.core.lazy_placeholder._resolve_field_with_composition_awareness`
works like a smart field finder. Given a dataclass instance and a field name, it first
checks if the field exists directly on the instance. If not found, it loops through all
fields looking for nested dataclasses, then recursively searches inside each one.

Avoid:

Python object references without implementation context
Architectural explanations without concrete method references
Implementation details without architectural rationale

Breadth-First Documentation Structure

Principle: Organize documentation from architectural concepts to implementation details.

Structure:

Architectural overview: Why the system exists, what problems it solves
Core patterns: Key design patterns and their rationale
Implementation examples: Literal includes showing actual code
Usage patterns: How to use the implemented functionality
Integration details: How components work together

Advanced Techniques

Phantom Method Resolution

Problem: Documentation references methods that were renamed or refactored during development.

Solution: Update Python object references to point to the actual implementation, with clear implementation context.

Process:

Sphinx validation: Build fails automatically for non-existent method references
Functionality analysis: Determine what the phantom method was supposed to do
Implementation mapping: Find actual code that performs the same functionality
Reference update: Update Python object reference to actual method

Example:

# Old documentation with phantom method reference
:py:meth:`~openhcs.core.lazy_config._preserve_lazy_structure_if_needed`

# Updated with actual implementation and context
:py:func:`~openhcs.core.lazy_config.rebuild_lazy_config_with_new_global_reference`
preserves lazy structure by creating a new lazy config instance that maintains the same
field resolution behavior but uses an updated global reference for context resolution.
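For the implementation mapping step, name similarity can narrow the candidates when a method was simply renamed. The sketch below uses difflib from the standard library. suggest_replacements is a hypothetical helper, and the candidate list reuses method names mentioned in this document. Similarity does not help when functionality was rewritten under an unrelated name, as in the example above, so functionality analysis is still required.

```python
import difflib
from typing import Iterable, List


def suggest_replacements(phantom: str, candidates: Iterable[str], limit: int = 3) -> List[str]:
    """Rank existing names by similarity to the last component of a phantom target."""
    short_name = phantom.rsplit(".", 1)[-1]
    return difflib.get_close_matches(short_name, list(candidates), n=limit, cutoff=0.6)


# Names taken from references in this document; in practice they would come
# from dir() on the imported module or class.
existing = [
    "make_lazy_simple",
    "make_lazy_with_field_level_auto_hierarchy",
    "_create_lazy_dataclass_unified",
]
suggestions = suggest_replacements(
    "openhcs.core.lazy_config._create_lazy_dataclass", existing
)
```

A suggestion is only a hint. The updated reference still needs prose confirming that the new target performs the documented functionality.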

Logical Method Decomposition with Implementation Context

Problem: Large architectural methods serve multiple conceptual purposes but cannot be split for architectural reasons.

Solution: Document logical phases with clear prose explaining each phase, using Python object references for validation.

Technique:

Complex Provider Function
~~~~~~~~~~~~~~~~~~~~~~~~~

:py:meth:`~openhcs.core.lazy_config.LazyDataclassFactory.make_lazy_simple`
creates lazy dataclasses using the simplified contextvars system:

**Simplified Approach**

The new system uses explicit context management through Python's contextvars module,
eliminating complex auto-discovery and context detection mechanisms while maintaining
full inheritance functionality.

The system constructs a resolution hierarchy by combining the current field path with
discovered parent relationships using :py:func:`~openhcs.core.lazy_config._create_hierarchy_chain`.

**Field Resolution**

.. literalinclude:: ../../../openhcs/core/lazy_config.py
   :lines: 440-461
   :caption: Field-level inheritance resolution

Architectural Pattern Documentation

Principle: Document the architectural reasoning behind implementation choices.

Template:

Pattern: {Pattern Name}
~~~~~~~~~~~~~~~~~~~~~~~

**Problem**: {What architectural problem does this solve?}

**Solution**: {How does the implementation address the problem?}

**Implementation**:

.. literalinclude:: ../../../{source_file}
   :lines: {start}-{end}
   :caption: {Pattern implementation}

**Rationale**: {Why this approach over alternatives?}

**Trade-offs**: {What are the costs and benefits?}

Cross-Reference Accuracy

Problem: Documentation cross-references become stale when methods are renamed or moved.

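Solution: Use literal includes for cross-references so that the displayed code is always read from the source file. Where possible, select the code by object name (the literalinclude :pyobject: option) rather than by line number, since line ranges shift silently as the file changes.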

Implementation:

# Instead of manual cross-reference
See the `_create_lazy_dataclass` method for implementation details.

# Use literal include reference
The lazy dataclass creation process:

.. literalinclude:: ../../../openhcs/core/lazy_config.py
   :lines: 253-299
   :caption: Core lazy dataclass creation (_create_lazy_dataclass_unified)

Integration with OpenHCS Development Workflow

Code Review Protocol

Requirement: All code changes that affect documented functionality must update corresponding literal includes.

Process:

Identify affected documentation: Which docs reference the changed code?
Verify line numbers: Do literal includes still point to correct functionality?
Update captions: Do descriptions still accurately reflect the code?
Test documentation build: Ensure all literal includes resolve correctly
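The first review step, identifying which documents reference the changed code, can be automated for :py: references by building a reverse index. The sketch below works on in-memory text. index_references, affected_documents, and the sample file contents are hypothetical, and the pattern ignores roles written with an explicit title.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches roles such as :py:func:`~package.module.name` (the "~" is optional).
ROLE = re.compile(r":py:(?:meth|func|class|attr|mod|obj|exc|data):`~?([\w.]+)`")


def index_references(documents: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each referenced target to the sorted list of files that mention it."""
    index = defaultdict(list)
    for filename, text in sorted(documents.items()):
        for target in sorted(set(ROLE.findall(text))):
            index[target].append(filename)
    return dict(index)


def affected_documents(index: Dict[str, List[str]], changed_prefix: str) -> List[str]:
    """List files referencing any target under the changed module or class."""
    return sorted({
        filename
        for target, files in index.items()
        if target.startswith(changed_prefix)
        for filename in files
    })


documents = {
    "config.rst": "Uses :py:func:`~openhcs.core.lazy_config._resolve_value_from_sources`.",
    "context.rst": (
        "See :py:func:`~openhcs.core.lazy_config._get_current_config` and "
        ":py:func:`~openhcs.core.lazy_config._resolve_value_from_sources`."
    ),
    "placeholder.rst": (
        "See :py:func:`~openhcs.core.lazy_placeholder."
        "_resolve_field_with_composition_awareness`."
    ),
}
index = index_references(documents)
affected = affected_documents(index, "openhcs.core.lazy_config.")
```

Running this against the changed module path during review gives the list of documents whose prose should be re-read.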

Refactoring Safety Net

Principle: Documentation serves as a safety net during refactoring by exposing all usage patterns.

Benefits:

Visibility: See all places where code is referenced
Impact assessment: Understand documentation implications of changes
Architectural coherence: Ensure refactoring preserves documented patterns
Regression prevention: Documentation build fails if refactoring breaks examples

Continuous Integration

Requirements:

# Documentation verification in CI
documentation_check:
  - verify_literal_includes_exist
  - verify_line_ranges_contain_expected_content
  - build_documentation_strict_mode
  - check_cross_reference_accuracy

Failure Modes:

Missing files: Literal include references non-existent file
Invalid line ranges: Line numbers exceed file length
Empty ranges: Line range contains no code
Functionality mismatch: Code at line range doesn’t match description
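Each of these failure modes can be detected with a simple check. The sketch below assumes every literal include is accompanied by an expected marker string (for example the def line it is meant to show). check_literal_include and that marker convention are hypothetical, not an existing OpenHCS tool.

```python
import os
import tempfile
from typing import Optional


def check_literal_include(path: str, start: int, end: int, expected: str) -> Optional[str]:
    """Return a failure description, or None when the include looks valid."""
    if not os.path.isfile(path):
        return "missing file"
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    if start < 1 or end < start or end > len(lines):
        return "invalid line range"
    selected = lines[start - 1:end]
    if not any(line.strip() for line in selected):
        return "empty range"
    if not any(expected in line for line in selected):
        return "functionality mismatch"
    return None


# Demonstration on a temporary file standing in for a source module.
with tempfile.TemporaryDirectory() as directory:
    source = os.path.join(directory, "example.py")
    with open(source, "w", encoding="utf-8") as handle:
        handle.write("import os\n\n\ndef resolve():\n    return 1\n")
    valid = check_literal_include(source, 4, 5, "def resolve")
    too_long = check_literal_include(source, 4, 50, "def resolve")
    blank = check_literal_include(source, 2, 3, "def resolve")
    drifted = check_literal_include(source, 1, 1, "def resolve")
    missing = check_literal_include(os.path.join(directory, "gone.py"), 1, 2, "x")
```

The last check matters most in practice: a line range that has drifted still builds cleanly, so only a content check reveals it.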

Quality Metrics

Quantitative Metrics

Documentation Accuracy Rate:

Accuracy = (Valid Literal Includes / Total Code Examples) × 100
Target: >95% for core architecture docs

Implementation Coverage:

Coverage = (Documented Public Methods / Total Public Methods) × 100
Target: >80% for core modules

Maintenance Efficiency:

Overhead = (Documentation Update Time / Development Time) × 100
Target: <5% overhead
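As a worked example, the three metrics can be computed as plain percentages. The counts and hours below are invented for illustration, and overhead is taken as documentation update time as a share of development time, which is the reading under which a target below 5% makes sense.

```python
def percentage(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return 100.0 * part / whole


# Hypothetical numbers for one documentation audit.
accuracy = percentage(48, 50)    # valid literal includes / total code examples
coverage = percentage(42, 50)    # documented / total public methods
overhead = percentage(1.5, 40)   # hours updating docs / hours developing

meets_targets = accuracy > 95 and coverage > 80 and overhead < 5
```

With these inputs the audit reports 96% accuracy, 84% coverage, and 3.75% overhead, so all three targets are met.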

Qualitative Indicators

Developer Experience:

Developers trust documentation examples to work
Code reviews catch documentation inconsistencies
Refactoring confidence increases due to documentation safety net

Architectural Clarity:

Design decisions are clearly explained and justified
Implementation patterns are consistently documented
Complex logic is broken down into understandable sections

Codebase Health:

Documentation pressure improves code quality
Architectural patterns become more consistent
Complex methods are naturally decomposed for documentability

Success Metrics

Quantitative Measures

Python object reference coverage: Percentage of code references using :py: directives vs manual examples
Build failure rate: Frequency of documentation builds failing due to invalid object references
Implementation context coverage: Percentage of Python object references with explanatory prose

Qualitative Measures

Documentation accuracy: Alignment between documented references and actual implementation
Developer comprehension: Ease of understanding code after reading implementation context
Onboarding effectiveness: New developer ability to navigate codebase using documentation

Target State

100% Python object reference coverage for all implementation references
Zero tolerance for manual code examples that duplicate implementation
Automatic validation integrated into CI/CD pipeline with Sphinx strict mode
Implementation-priming prose for all Python object references
Fail-fast feedback when implementation changes break documentation

Example Quality Standard

Excellent Documentation Pattern:

The simplified placeholder service uses the new contextvars-based resolution system
to find field values. The system uses explicit context management and cross-dataclass inheritance
through the dual-axis resolver, eliminating the need for complex composition awareness mechanisms.

Benefits: Validated reference + clear implementation context + architectural rationale + concrete examples.

Benefits

For Developers

Guaranteed accuracy: Code examples always reflect current implementation
Reduced maintenance: No manual synchronization of code and documentation
Architectural insight: Documentation explains design decisions, not just syntax
Refactoring safety: The documentation build fails when a code change breaks a reference, so stale documentation cannot go unnoticed

For Users

Reliable examples: Code examples are read from, or validated against, the current implementation
Current information: Validated references cannot silently lag behind the implementation
Architectural understanding: Learn not just how, but why
Consistent patterns: Same implementation referenced across multiple contexts

For Codebase Health

Perpetual audit: Documentation serves as continuous code review
Architectural documentation: Forces clear explanation of design decisions
Implementation visibility: Complex logic must be documentable to be maintainable
Quality pressure: Poor code becomes obvious when documented

Implementation Checklist

[ ] Catalog all code examples in documentation
[ ] Verify each example against current implementation
[ ] Create systematic mapping of examples to source code
[ ] Replace examples with literal includes (high priority first)
[ ] Integrate literal include verification into CI pipeline
[ ] Establish refactoring protocol for updating documentation
[ ] Document architectural patterns and design rationale
[ ] Set up automated accuracy metrics and monitoring

This methodology transforms documentation from a maintenance burden into a perpetual code audit system that ensures architectural coherence and implementation accuracy.