CLI Usage
The Kreuzberg CLI extracts text from documents on the command line.
Installation
Install Kreuzberg with CLI support:
Or install all optional dependencies:
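A minimal sketch of both installs, assuming the package exposes extras named `cli` and `all` (this page does not confirm those names). The helper name `install_kreuzberg` is illustrative.

```shell
# Install Kreuzberg with a given extras group via pip.
# Assumption: the extras are named "cli" and "all"; check the package metadata.
install_kreuzberg() {
    extras="${1:-cli}"   # default to CLI support only
    python3 -m pip install "kreuzberg[${extras}]"
}
```

Calling `install_kreuzberg` installs CLI support, while `install_kreuzberg all` would pull in every optional dependency.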
Basic Usage
Extract from a file
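A minimal sketch of extracting from files. With no `-o` option the text goes to stdout. The helper name `extract_all` and the file names in the usage note are hypothetical.

```shell
# Print the extracted text of each given document to stdout, one after another.
extract_all() {
    for doc in "$@"; do
        kreuzberg extract "$doc"
    done
}
```

For example, `extract_all document.pdf notes.docx` runs one extraction per file.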
Extract to a file
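A short sketch using the `-o` option to write the result to a file instead of stdout. The helper name `extract_to_txt` is illustrative, and deriving the output name from the input name is a design choice of this example, not CLI behavior.

```shell
# Write the extracted text next to the source document,
# replacing the last extension with .txt (report.pdf -> report.txt).
extract_to_txt() {
    kreuzberg extract "$1" -o "${1%.*}.txt"
}
```

For example, `extract_to_txt report.pdf` would write `report.txt`.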
Extract from stdin
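A minimal sketch of reading the document from stdin by passing `-` as the FILE argument. The helper name `extract_stdin` is illustrative; any extra arguments are forwarded as options.

```shell
# Read the document from stdin; '-' tells `extract` to use stdin as its input.
extract_stdin() {
    kreuzberg extract - "$@"
}
```

Typical uses would be `extract_stdin < document.pdf` or piping the output of another command into `extract_stdin`.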
Command-Line Options
General Options
-o, --output PATH
: Output file path (default: stdout)

--output-format [text|json]
: Output format

--show-metadata
: Include metadata in output

-v, --verbose
: Verbose output for debugging
Processing Options
--force-ocr
: Force OCR processing

--chunk-content
: Enable content chunking

--extract-tables
: Enable table extraction

--max-chars INTEGER
: Maximum characters per chunk (default: 2000)

--max-overlap INTEGER
: Maximum overlap between chunks (default: 100)
OCR Backend Options
--ocr-backend [tesseract|easyocr|paddleocr|none]
: OCR backend to use
Tesseract Options
--tesseract-lang TEXT
: Language(s) (e.g., 'eng+deu')

--tesseract-psm INTEGER
: Page segmentation mode (0-13)
EasyOCR Options
--easyocr-languages TEXT
: Language codes (comma-separated, e.g., 'en,de')
PaddleOCR Options
--paddleocr-languages TEXT
: Language codes (comma-separated, e.g., 'en,german')
Configuration File
Kreuzberg can load configuration from a pyproject.toml file:
Use a specific config file:
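As a hedged sketch of both steps: the first helper writes a sample configuration, and the second runs an extraction against an explicit config file. The `[tool.kreuzberg]` table and its key names are assumptions that mirror the CLI flags, and passing `--config` to `extract` is assumed by analogy with `kreuzberg config`; verify both against your installed version. Helper names are illustrative.

```shell
# Write a sample configuration to the given path.
# Assumption: settings live under [tool.kreuzberg] with keys named after the flags.
write_sample_config() {
    cat > "$1" <<'EOF'
[tool.kreuzberg]
force_ocr = true
extract_tables = true
max_chars = 4000
max_overlap = 200
ocr_backend = "tesseract"
EOF
}

# Run an extraction with an explicit config file.
# $1: config path, $2: document path.
# Assumption: `extract` accepts --config the same way `kreuzberg config` does.
extract_with_config() {
    kreuzberg extract "$2" --config "$1"
}
```

For example, `write_sample_config custom.toml` followed by `extract_with_config custom.toml document.pdf`.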
Examples
Basic text extraction
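A small sketch that extracts a document and previews the first lines of the result, which can be handy for checking that extraction works before processing a large file. The helper name `preview_text` and the default of 20 lines are choices of this example.

```shell
# Extract a document and show only the first N lines of text (default: 20).
preview_text() {
    kreuzberg extract "$1" | head -n "${2:-20}"
}
```

For example, `preview_text document.pdf 5` prints the first five extracted lines.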
OCR with specific language
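A minimal sketch combining `--force-ocr`, `--ocr-backend`, and `--tesseract-lang` from the options above. The helper name `ocr_extract` and the `eng` fallback are assumptions of this example.

```shell
# Force OCR with the Tesseract backend.
# $1: document path, $2: Tesseract language spec such as 'eng+deu' (default: eng).
ocr_extract() {
    lang="${2:-eng}"
    kreuzberg extract "$1" --force-ocr --ocr-backend tesseract --tesseract-lang "$lang"
}
```

For example, `ocr_extract scan.png eng+deu` would run OCR with English and German.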
Extract tables to JSON
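A short sketch that enables table extraction and writes JSON to a file, using `--extract-tables`, `--output-format json`, and `-o`. The helper name `tables_to_json` and the derived output file name are choices of this example.

```shell
# Extract text and tables as JSON, written next to the source document
# (report.pdf -> report.json).
tables_to_json() {
    kreuzberg extract "$1" --extract-tables --output-format json -o "${1%.*}.json"
}
```

For example, `tables_to_json report.pdf` would write `report.json`.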
Extract with metadata
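A minimal sketch of including metadata via `--show-metadata`. Defaulting to JSON output is an assumption of this example, chosen because structured output is easier to parse downstream; the helper name `extract_with_metadata` is illustrative.

```shell
# Include document metadata in the output.
# $1: document path, $2: output format, 'text' or 'json' (default: json).
extract_with_metadata() {
    kreuzberg extract "$1" --show-metadata --output-format "${2:-json}"
}
```

For example, `extract_with_metadata document.pdf text` requests plain-text output instead.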
Using EasyOCR backend
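A minimal sketch selecting the EasyOCR backend with `--ocr-backend easyocr` and `--easyocr-languages`. The helper name `easyocr_extract` and the `en` fallback are assumptions of this example.

```shell
# Run extraction with the EasyOCR backend.
# $1: document path, $2: comma-separated language codes such as 'en,de' (default: en).
easyocr_extract() {
    kreuzberg extract "$1" --ocr-backend easyocr --easyocr-languages "${2:-en}"
}
```

For example, `easyocr_extract scan.jpg en,de` would recognize English and German text.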
Extract with chunking
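A minimal sketch of chunked extraction using `--chunk-content`, `--max-chars`, and `--max-overlap`. The fallbacks repeat the documented defaults (2000 and 100); the helper name `chunked_extract` is illustrative.

```shell
# Extract with content chunking enabled.
# $1: document path, $2: max characters per chunk, $3: max overlap between chunks.
chunked_extract() {
    kreuzberg extract "$1" --chunk-content \
        --max-chars "${2:-2000}" --max-overlap "${3:-100}"
}
```

For example, `chunked_extract book.pdf 1000 50` would request smaller chunks with less overlap.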
Module Execution
You can also run Kreuzberg as a Python module:
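A minimal sketch of module execution through `python3 -m kreuzberg`, assuming the module accepts the same subcommands and options as the `kreuzberg` command. The helper name `kreuzberg_module` is illustrative.

```shell
# Invoke Kreuzberg as a Python module, forwarding all arguments unchanged.
# Assumption: subcommands and options match the `kreuzberg` executable.
kreuzberg_module() {
    python3 -m kreuzberg "$@"
}
```

For example, `kreuzberg_module extract document.pdf` is useful when the `kreuzberg` script is not on `PATH` but the package is importable.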
Command Reference
kreuzberg extract
Extract text from a document.
Usage: kreuzberg extract [OPTIONS] [FILE]
Arguments:
FILE
: Path to document or '-' for stdin (optional, defaults to stdin)
kreuzberg config
Show current configuration.
Usage: kreuzberg config [OPTIONS]
Options:
--config PATH
: Configuration file path
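A short sketch of inspecting the configuration, with or without an explicit `--config` path. The helper name `show_config` is illustrative.

```shell
# Show the configuration Kreuzberg would use.
# With an argument, pass it as --config; without one, rely on the default lookup.
show_config() {
    if [ -n "${1:-}" ]; then
        kreuzberg config --config "$1"
    else
        kreuzberg config
    fi
}
```

For example, `show_config custom.toml` would display the settings read from that file.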
kreuzberg --version
Show version information.
Error Handling
The CLI prints an error message and signals the outcome through its exit code:
- Exit code 0: Success
- Exit code 1: General error (parsing, validation, etc.)
- Exit code 2: Missing dependency error
Use --verbose for detailed error information and stack traces.
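The exit codes above can be handled in a script roughly as follows. This is a sketch: the helper name `run_extract` and the message wording are this example's own, and only codes 0, 1, and 2 come from the list above.

```shell
# Run an extraction and translate the documented exit codes into short messages.
# The original exit code is returned so callers can still branch on it.
run_extract() {
    status=0
    kreuzberg extract "$@" || status=$?
    case "$status" in
        0) ;;  # success: nothing to report
        1) echo "extraction failed: general error" >&2 ;;
        2) echo "extraction failed: missing dependency" >&2 ;;
        *) echo "extraction failed: unexpected exit code $status" >&2 ;;
    esac
    return "$status"
}
```

A caller might write `run_extract document.pdf -o out.txt || exit 1`, and rerun with `--verbose` when a failure needs more detail.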