Kreuzberg Documentation
Kreuzberg is a document intelligence platform with a high-performance Rust core. It can be used directly from Rust or through native bindings for Python, TypeScript/Node.js, and Ruby. Use it as an SDK, CLI, Docker image, REST API server, or MCP tool to extract text, tables, and metadata from 50+ document families (PDF, Office, images, HTML, XML, archives, email, and more), with optional OCR and post-processing pipelines.
What You Can Do
- Single API across languages – Binding idioms follow each ecosystem, but features (extraction, OCR, chunking, embeddings, plugins) map 1:1.
- Structured extraction – Convert PDFs, Office docs, images, emails, HTML, XML, and archives into clean Markdown/JSON, preserving tables and metadata.
- Multi-engine OCR – Built-in Tesseract support everywhere, with EasyOCR and PaddleOCR extensions for Python.
- Plugin ecosystem – Register post-processors, validators, and OCR backends, then run them from any binding or via the CLI/API server.
- Deployment flexibility – Ship as a library, run the CLI, or host the API server/MCP adapter inside containers.
Documentation Map
- Getting Started – First extraction in each language.
- Installation – Dependency matrix for Rust, Python, Ruby, Node.js, CLI, and Docker users.
- Guides – How to configure extraction, OCR, advanced features, plugins, and Docker/API deployments.
- Concepts – Architecture, extraction pipeline, MIME detection, plugin runtime, and performance strategies.
- Features directory – Exhaustive capability list per format/binding plus OCR and chunking options.
- Reference – Detailed API references (Python, TypeScript, Ruby, Rust), configuration schema, supported formats, types, and errors.
- CLI – Command syntax, flags, exit codes, and automation tips.
- API Server – Running the REST service and integrating with MCP.
- Migration and Changelog – Track breaking changes and release history.
Supported Platforms
| Binding / Interface | Installation | Docs |
|---|---|---|
| Python | pip install kreuzberg | Python API Reference |
| TypeScript/Node.js | npm install @goldziher/kreuzberg | TypeScript API Reference |
| Ruby | gem install kreuzberg | Ruby API Reference |
| Rust | cargo add kreuzberg | Rust API Reference |
| CLI | brew install goldziher/tap/kreuzberg or cargo install kreuzberg-cli | CLI Usage |
| API Server / MCP | Docker image goldziher/kreuzberg:core | API Server Guide |
Getting Help
- Questions / bugs: open an issue at github.com/Goldziher/kreuzberg.
- Chat: join the community Discord (invite link in the README).
- Contributing: see Contributing for coding standards, environment setup, and testing instructions.
Happy extracting!