CLI Usage
The Kreuzberg CLI provides a convenient command-line interface for text extraction from documents.
Installation
Install Kreuzberg with CLI support, or with all optional dependencies.
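A minimal sketch of both installs, assuming the package exposes pip extras named cli and all (these extra names are an assumption, as is the helper name install_kreuzberg). It is wrapped in a function so sourcing the snippet does not install anything by itself.

```shell
# Hypothetical helper: install Kreuzberg with a chosen extra.
# ASSUMPTION: the extras are named "cli" and "all".
install_kreuzberg() {
  # $1 selects the extra and defaults to "cli".
  python -m pip install "kreuzberg[${1:-cli}]"
}
```

Calling install_kreuzberg installs the CLI extra; install_kreuzberg all requests every optional dependency.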
Basic Usage
Extract from a file
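A minimal sketch of the simplest invocation, which prints the extracted text to stdout. The helper name extract_file is illustrative and not part of Kreuzberg.

```shell
# Hypothetical helper: print the text of one document to stdout.
extract_file() {
  # $1 = path to the document
  kreuzberg extract "$1"
}
```

For example, extract_file document.pdf runs kreuzberg extract document.pdf, where document.pdf is a placeholder file name.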
Extract to a file
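A short sketch that writes the result to a file with -o instead of printing it. The helper name extract_to_file and both file names are placeholders.

```shell
# Hypothetical helper: write the extracted text to a file.
extract_to_file() {
  # $1 = input document, $2 = output path passed to -o/--output
  kreuzberg extract "$1" -o "$2"
}
```

For example, extract_to_file document.pdf output.txt runs kreuzberg extract document.pdf -o output.txt.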
Extract from stdin
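A sketch of reading the document from standard input. It passes '-' explicitly as the FILE argument; per the command reference, omitting FILE should behave the same way. The helper name extract_stdin is illustrative.

```shell
# Hypothetical helper: extract text from a document piped on stdin.
extract_stdin() {
  # '-' tells the CLI to read the document from standard input.
  kreuzberg extract -
}
```

For example, extract_stdin < document.pdf, or cat document.pdf | extract_stdin.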
Command-Line Options
General Options
-o, --output PATH
: Output file path (default: stdout)
--output-format [text|json|markdown|tsv|hocr]
: Output format for extraction
--show-metadata
: Include metadata in output
-v, --verbose
: Verbose output for debugging
Processing Options
--force-ocr
: Force OCR processing
--chunk-content
: Enable content chunking
--extract-tables
: Enable table extraction
--enable-table-detection
: Enable table extraction from scanned documents (with TSV format)
--max-chars INTEGER
: Maximum characters per chunk (default: 2000)
--max-overlap INTEGER
: Maximum overlap between chunks (default: 100)
OCR Backend Options
--ocr-backend [tesseract|easyocr|paddleocr|none]
: OCR backend to use
Tesseract Options
--tesseract-lang TEXT
: Language(s) (e.g., 'eng+deu')
--tesseract-psm INTEGER
: PSM mode (0-13)
--tesseract-output-format [text|markdown|tsv|hocr]
: OCR output format (default: markdown)
EasyOCR Options
--easyocr-languages TEXT
: Language codes (comma-separated, e.g., 'en,de')
PaddleOCR Options
--paddleocr-languages TEXT
: Language codes (comma-separated, e.g., 'en,german')
Configuration File
Kreuzberg can load configuration from a pyproject.toml file.
To use a specific config file, pass its path with --config PATH.
OCR Output Format Examples
Extract tables from scanned documents
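A sketch for scanned input. Pairing --enable-table-detection with --tesseract-output-format tsv is an assumption based on the option description ("with TSV format"). The helper name is hypothetical.

```shell
# Hypothetical helper: detect tables in a scanned document.
# ASSUMPTION: table detection is combined with Tesseract's TSV output.
extract_scanned_tables() {
  # $1 = scanned document
  kreuzberg extract "$1" --enable-table-detection --tesseract-output-format tsv
}
```

For example, extract_scanned_tables scanned.pdf, where scanned.pdf is a placeholder.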
Fast text extraction
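A sketch that asks Tesseract for plain text. That the text format is the fastest choice is an assumption drawn from this heading, not a measured result. The helper name is illustrative.

```shell
# Hypothetical helper: plain-text OCR output.
# ASSUMPTION: "text" is the lightest of the listed OCR output formats.
extract_fast_text() {
  # $1 = document
  kreuzberg extract "$1" --tesseract-output-format text
}
```

For example, extract_fast_text document.pdf.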
Get structured markdown output
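A sketch that requests markdown from the OCR step. The option list gives markdown as the default, so the flag is spelled out here only to make the choice explicit. The helper name is illustrative.

```shell
# Hypothetical helper: markdown OCR output (the documented default).
extract_markdown() {
  # $1 = document
  kreuzberg extract "$1" --tesseract-output-format markdown
}
```

For example, extract_markdown document.pdf.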
Extract with position information
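A sketch that requests hOCR, a format that records bounding boxes for recognized text. Choosing hOCR for this case is an assumption based on the heading. The helper name and file names are placeholders.

```shell
# Hypothetical helper: OCR output with position data.
# ASSUMPTION: hOCR is the intended format for position information.
extract_with_positions() {
  # $1 = document, $2 = output file
  kreuzberg extract "$1" --tesseract-output-format hocr -o "$2"
}
```

For example, extract_with_positions document.pdf output.hocr.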
Examples
Basic text extraction
OCR with specific language
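A sketch that forces OCR and sets the Tesseract language, using only flags from the option list. The helper name is hypothetical; eng+deu is the example value given for --tesseract-lang.

```shell
# Hypothetical helper: force OCR with a given Tesseract language spec.
ocr_with_language() {
  # $1 = document, $2 = language spec such as eng+deu
  kreuzberg extract "$1" --force-ocr --tesseract-lang "$2"
}
```

For example, ocr_with_language scanned.pdf eng+deu.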
Extract tables to JSON
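A sketch that enables table extraction and writes JSON to a file. The helper name and file names are placeholders.

```shell
# Hypothetical helper: extract tables and save the result as JSON.
extract_tables_json() {
  # $1 = document, $2 = output JSON path
  kreuzberg extract "$1" --extract-tables --output-format json -o "$2"
}
```

For example, extract_tables_json report.pdf tables.json.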
Extract with metadata
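A sketch that includes document metadata in the output. Pairing --show-metadata with JSON output is a choice made here for readability, not a documented requirement. The helper name is illustrative.

```shell
# Hypothetical helper: include metadata alongside the extracted content.
extract_with_metadata() {
  # $1 = document
  kreuzberg extract "$1" --show-metadata --output-format json
}
```

For example, extract_with_metadata document.pdf.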
Using EasyOCR backend
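A sketch that selects the EasyOCR backend and passes comma-separated language codes, as described in the option list. The helper name is hypothetical, and the EasyOCR dependency is assumed to be installed.

```shell
# Hypothetical helper: run extraction with the EasyOCR backend.
extract_easyocr() {
  # $1 = document, $2 = comma-separated language codes such as en,de
  kreuzberg extract "$1" --ocr-backend easyocr --easyocr-languages "$2"
}
```

For example, extract_easyocr image.jpg en,de.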
Extract with chunking
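A sketch that enables chunking and sets the chunk size and overlap. The fallback values mirror the documented defaults (2000 and 100); the helper name is illustrative.

```shell
# Hypothetical helper: split extracted content into chunks.
extract_chunks() {
  # $1 = document, $2 = max characters per chunk, $3 = max overlap
  kreuzberg extract "$1" --chunk-content \
    --max-chars "${2:-2000}" --max-overlap "${3:-100}"
}
```

For example, extract_chunks document.pdf 1000 200 asks for chunks of at most 1000 characters with up to 200 characters of overlap.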
Module Execution
You can also run Kreuzberg as a Python module:
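A minimal sketch, assuming the module is named kreuzberg and the interpreter is available on PATH as python (on some systems it is python3). The helper name is hypothetical.

```shell
# Hypothetical helper: call the CLI through the Python module entry point.
extract_via_module() {
  # All arguments are forwarded to the extract subcommand.
  python -m kreuzberg extract "$@"
}
```

For example, extract_via_module document.pdf -o output.txt should behave like kreuzberg extract document.pdf -o output.txt.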
Command Reference
kreuzberg extract
Extract text from a document.
Usage: kreuzberg extract [OPTIONS] [FILE]
Arguments:
FILE
: Path to document or '-' for stdin (optional, defaults to stdin)
kreuzberg config
Show current configuration.
Usage: kreuzberg config [OPTIONS]
Options:
--config PATH
: Configuration file path
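A sketch of the config command with and without an explicit file. The helper name show_config and the file name in the usage note are placeholders.

```shell
# Hypothetical helper: print the configuration Kreuzberg would use.
show_config() {
  # $1 (optional) = path to a specific configuration file
  if [ -n "${1:-}" ]; then
    kreuzberg config --config "$1"
  else
    kreuzberg config
  fi
}
```

For example, show_config prints the current configuration, and show_config custom.toml reads it from the given file.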
kreuzberg --version
Show version information.
Error Handling
The CLI reports errors with clear messages and the following exit codes:
- Exit code 0: Success
- Exit code 1: General error (parsing, validation, etc.)
- Exit code 2: Missing dependency error
Use --verbose for detailed error information and stack traces.
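The exit codes above can be handled in a script along these lines. This is a sketch: the helper name extract_checked and the message wording are illustrative, and only the meanings of codes 0, 1, and 2 come from the list above.

```shell
# Hypothetical helper: run an extraction and explain a failing exit code.
extract_checked() {
  code=0
  # Forward all arguments and capture a non-zero exit status.
  kreuzberg extract "$@" || code=$?
  case "$code" in
    0) ;;
    1) echo "kreuzberg: general error (rerun with --verbose for details)" >&2 ;;
    2) echo "kreuzberg: missing dependency" >&2 ;;
    *) echo "kreuzberg: unexpected exit code $code" >&2 ;;
  esac
  # Preserve the original exit status for the caller.
  return "$code"
}
```

For example, extract_checked document.pdf -o output.txt || exit 1 stops a script on the first failed extraction.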