Extractor Registry
The ExtractorRegistry manages document extractors and allows custom extractor registration.
kreuzberg.ExtractorRegistry
Manages extractors for different MIME types and their configurations.
This class provides functionality to register, unregister, and retrieve extractors based on MIME types. It supports both synchronous and asynchronous operations for managing extractors. A default set of extractors is also maintained alongside user-registered extractors.
Source code in kreuzberg/_registry.py
Functions
add_extractor(extractor: type[Extractor]) -> None
classmethod
Add an extractor to the registry.
Note
Extractors are tried in the order they are added: first added, first tried.
PARAMETER | DESCRIPTION |
---|---|
extractor | The extractor class to add. TYPE: type[Extractor] |
RETURNS | DESCRIPTION |
---|---|
None | None |
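The "first added, first tried" rule in the note above can be illustrated with a small self-contained sketch. This is not kreuzberg's implementation: `SketchRegistry`, `first_match`, the `supports` helper, and both extractor classes are hypothetical stand-ins that only mimic the ordering described here.

```python
from __future__ import annotations

from typing import ClassVar


class Extractor:
    """Hypothetical stand-in for an extractor base class."""

    SUPPORTED: ClassVar[set[str]] = set()

    @classmethod
    def supports(cls, mime_type: str) -> bool:
        return mime_type in cls.SUPPORTED


class PlainTextExtractor(Extractor):
    SUPPORTED = {"text/plain"}


class FallbackTextExtractor(Extractor):
    SUPPORTED = {"text/plain", "text/markdown"}


class SketchRegistry:
    """Toy registry: extractors are tried in the order they were added."""

    _registered: ClassVar[list[type[Extractor]]] = []

    @classmethod
    def add_extractor(cls, extractor: type[Extractor]) -> None:
        cls._registered.append(extractor)

    @classmethod
    def first_match(cls, mime_type: str) -> type[Extractor] | None:
        # Walk the list in insertion order and stop at the first match.
        for extractor in cls._registered:
            if extractor.supports(mime_type):
                return extractor
        return None


SketchRegistry.add_extractor(PlainTextExtractor)
SketchRegistry.add_extractor(FallbackTextExtractor)

# Both classes support text/plain, so the one added first is chosen.
chosen_plain = SketchRegistry.first_match("text/plain")
# Only the fallback supports text/markdown.
chosen_markdown = SketchRegistry.first_match("text/markdown")
```

Under this ordering, a custom extractor only shadows another one for a shared MIME type if it was added earlier.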
get_extractor(mime_type: str | None, config: ExtractionConfig) -> Extractor | None
cached
classmethod
Get the extractor for the given MIME type.
PARAMETER | DESCRIPTION |
---|---|
mime_type | The MIME type of the content. TYPE: str \| None |
config | The extraction configuration object. TYPE: ExtractionConfig |
RETURNS | DESCRIPTION |
---|---|
Extractor \| None | The extractor for the MIME type, or None if no extractor supports it. |
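Because the method is marked as both cached and a classmethod, repeated lookups with equal arguments can return the same object. The sketch below shows one way such a lookup could work, assuming `functools.lru_cache` stacked under `classmethod` and a hashable configuration. `SketchConfig`, `SketchExtractor`, and `SketchRegistry` are hypothetical names, and the caching mechanism is an assumption rather than a statement about kreuzberg's internals.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import lru_cache
from typing import ClassVar


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical config; frozen so instances are hashable cache keys."""

    force_ocr: bool = False


class SketchExtractor:
    """Hypothetical extractor that handles plain text only."""

    SUPPORTED: ClassVar[frozenset[str]] = frozenset({"text/plain"})

    def __init__(self, mime_type: str, config: SketchConfig) -> None:
        self.mime_type = mime_type
        self.config = config


class SketchRegistry:
    _extractors: ClassVar[list[type[SketchExtractor]]] = [SketchExtractor]

    @classmethod
    @lru_cache(maxsize=None)
    def get_extractor(
        cls, mime_type: str | None, config: SketchConfig
    ) -> SketchExtractor | None:
        # No MIME type means there is nothing to match against.
        if mime_type is None:
            return None
        for extractor in cls._extractors:
            if mime_type in extractor.SUPPORTED:
                return extractor(mime_type, config)
        return None


config = SketchConfig()
first = SketchRegistry.get_extractor("text/plain", config)
# An equal (not identical) config hashes the same, so the cache is hit.
second = SketchRegistry.get_extractor("text/plain", SketchConfig())
missing = SketchRegistry.get_extractor("application/x-unknown", config)
```

One consequence of this design is that a cached lookup can go stale when the set of extractors changes, so a registry built this way would need to clear the cache whenever an extractor is added or removed.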
remove_extractor(extractor: type[Extractor]) -> None
classmethod
Remove an extractor from the registry.
PARAMETER | DESCRIPTION |
---|---|
extractor | The extractor class to remove. TYPE: type[Extractor] |
RETURNS | DESCRIPTION |
---|---|
None | None |
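A registration and removal round trip might look like the following self-contained sketch. `SketchRegistry` and `CustomExtractor` are hypothetical stand-ins, and treating the removal of an unregistered extractor as a silent no-op is an assumption made for this example, not documented behavior.

```python
from __future__ import annotations

from typing import ClassVar


class CustomExtractor:
    """Hypothetical user-defined extractor class."""


class SketchRegistry:
    _registered: ClassVar[list[type]] = []

    @classmethod
    def add_extractor(cls, extractor: type) -> None:
        cls._registered.append(extractor)

    @classmethod
    def remove_extractor(cls, extractor: type) -> None:
        # Assumption: removing an unknown extractor is a silent no-op.
        if extractor in cls._registered:
            cls._registered.remove(extractor)


SketchRegistry.add_extractor(CustomExtractor)
registered = list(SketchRegistry._registered)

SketchRegistry.remove_extractor(CustomExtractor)
SketchRegistry.remove_extractor(CustomExtractor)  # second call does nothing
remaining = list(SketchRegistry._registered)
```

Both methods take the extractor class itself, not an instance, which matches the `type[Extractor]` parameter in the signatures above.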