Parser Metaprogramming System
Overview
Traditional filename parsers hardcode component names and validation logic. The metaprogramming system eliminates these assumptions by generating parser interfaces dynamically from component enums.
class CustomFilenameParser(GenericFilenameParser):
# FILENAME_COMPONENTS automatically set from AllComponents
pass
# Interface automatically generated with component-specific methods:
# - get_well_keys(), get_site_keys(), get_channel_keys()
# - validate_well(), validate_site(), validate_channel()
# - construct_filename(well=..., site=..., channel=...)
This enables the same parser framework to work with any component structure without manual method definitions.
Dynamic Method Generation
The system generates component-specific methods at runtime using metaclass programming. Each component in the configuration automatically gets validation and extraction methods.
def _generate_dynamic_methods(self):
"""Generate component-specific methods at runtime."""
for component in self.FILENAME_COMPONENTS:
if component != 'extension':
# Generate get_{component}_keys method
setattr(self, f'get_{component}_keys',
lambda filenames, comp=component: self._extract_component_keys(filenames, comp))
# Generate validate_{component} method
setattr(self, f'validate_{component}',
lambda value, comp=component: self._validate_component_value(value, comp))
This eliminates the need to manually define methods for each component type.
GenericFilenameParser Implementation
The GenericFilenameParser provides the base implementation that works with any component configuration.
Centralized Component Configuration
class GenericFilenameParser(ABC):
def __init__(self, component_enum: Type[T]):
"""Initialize with component enum - FILENAME_COMPONENTS set automatically."""
self.component_enum = component_enum
self.FILENAME_COMPONENTS = [c.value for c in component_enum] + ['extension']
self._generate_dynamic_methods()
Component-specific methods are generated automatically based on the component configuration.
Concrete Parser Implementation
Concrete parsers inherit from GenericFilenameParser and implement format-specific parsing.
class ImageXpressFilenameParser(FilenameParser):
"""Parser for ImageXpress filename format."""
# FILENAME_COMPONENTS automatically set to AllComponents + ['extension']
# by the FilenameParser.__init__() → GenericFilenameParser.__init__() chain
_pattern = re.compile(r'(?:.*?_)?([A-Z]\d+)(?:_s(\d+|\{[^\}]*\}))?(?:_w(\d+|\{[^\}]*\}))?(?:_z(\d+|\{[^\}]*\}))?(\.\w+)?$')
def parse_filename(self, filename: str) -> Optional[Dict[str, Any]]:
"""Parse ImageXpress filename, handling placeholders like {iii}."""
basename = Path(str(filename)).name
match = self._pattern.match(basename)
if match:
well, site_str, channel_str, z_str, ext = match.groups()
parse_comp = lambda s: None if not s or '{' in s else int(s)
return {
'well': well,
'site': parse_comp(site_str),
'channel': parse_comp(channel_str),
'z_index': parse_comp(z_str),
'extension': ext if ext else '.tif'
}
return None
The parser automatically gets component-specific methods like get_well_keys(), validate_site(), etc.
Multiprocessing Compatibility
Dynamic methods are handled specially for multiprocessing serialization.
def __getstate__(self):
"""Remove dynamic methods for pickling."""
state = self.__dict__.copy()
# Remove unpicklable dynamic methods
for component in self.component_enum:
method_name = f"validate_{component.value}"
if method_name in state:
del state[method_name]
return state
def __setstate__(self, state):
"""Regenerate dynamic methods after unpickling."""
self.__dict__.update(state)
self._generate_dynamic_methods()
This ensures parser objects work correctly in ProcessPoolExecutor environments.
Common Gotchas:
Dynamic methods are regenerated after unpickling - don’t rely on method identity
Component enum must be importable in worker processes
Parser state should be minimal for efficient serialization