Kreuzberg Documentation
Kreuzberg is a document intelligence platform with a high-performance Rust core, a native Rust API, and bindings for Python, TypeScript/Node.js, Ruby, and Go. Use it as an SDK, CLI, Docker image, REST API server, or MCP tool to extract text, tables, and metadata from 56 file formats (PDF, Office, images, HTML, XML, archives, email, and more), with optional OCR and post-processing pipelines.
What You Can Do
- Single API across languages – Each binding follows its ecosystem's idioms, but features (extraction, OCR, chunking, embeddings, plugins) map one-to-one across languages.
- Structured extraction – Convert PDFs, Office docs, images, emails, HTML, XML, and archives into clean Markdown/JSON, preserving tables and metadata.
- Multi-engine OCR – Built-in Tesseract support in every binding, with EasyOCR and PaddleOCR extensions for Python.
- Plugin ecosystem – Register post-processors, validators, and OCR backends, then run them from any binding or via the CLI/API server.
- Deployment flexibility – Ship as a library, run the CLI, or host the API server/MCP adapter inside containers.
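As a rough illustration of the library use case, here is a minimal Python sketch. It assumes the Python binding is installed (`pip install kreuzberg`) and exposes an `extract_file_sync` function whose result has a `content` attribute; check the Python API Reference for the exact names and signatures. The `extract_text` helper and its `extractor` parameter are hypothetical, added only so the sketch can be exercised without the package.

```python
from typing import Any, Callable, Optional


def extract_text(path: str, extractor: Optional[Callable[[str], Any]] = None) -> str:
    """Return the text extracted from the document at `path`.

    `extractor` is a hypothetical hook for this sketch: any callable that
    takes a file path and returns an object with a `content` attribute.
    """
    if extractor is None:
        # Assumption: the Python binding provides `extract_file_sync`.
        # The import is deferred so the helper can be defined (and tested
        # with a stub) even when kreuzberg is not installed.
        from kreuzberg import extract_file_sync

        extractor = extract_file_sync

    result = extractor(path)
    # Assumption: the extraction result carries the text in `.content`.
    return result.content
```

With the package installed, `print(extract_text("report.pdf"))` would print the extracted text of a local file; `report.pdf` is a placeholder path.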
Documentation Map
- Getting Started – First extraction in each language.
- Installation – Dependency matrix for Rust, Python, Ruby, Node.js, CLI, and Docker users.
- Guides – How to configure extraction, OCR, advanced features, plugins, and Docker/API deployments.
- Concepts – Architecture, extraction pipeline, MIME detection, plugin runtime, and performance strategies.
- Features directory – Exhaustive capability list per format/binding plus OCR and chunking options.
- Reference – Detailed API references (Python, TypeScript, Ruby, Go, Rust), configuration schema, supported formats, types, and errors.
- CLI – Command syntax, flags, exit codes, and automation tips.
- API Server – Running the REST service and integrating with MCP.
- Migration and Changelog – Track breaking changes and release history.
Supported Platforms
| Binding / Interface | Install | Docs |
|---|---|---|
| Python | pip install kreuzberg | Python API Reference |
| TypeScript/Node.js | npm install @kreuzberg/node | TypeScript API Reference |
| Ruby | gem install kreuzberg | Ruby API Reference |
| Go | go get github.com/Goldziher/kreuzberg/packages/go/kreuzberg@latest | Go API Reference |
| Rust | cargo add kreuzberg | Rust API Reference |
| CLI | brew install goldziher/tap/kreuzberg or cargo install kreuzberg-cli | CLI Usage |
| API Server / MCP | Docker image goldziher/kreuzberg:core | API Server Guide |
Getting Help
- Questions / bugs: open an issue at github.com/Goldziher/kreuzberg.
- Chat: join the community Discord (the invite link is in the README).
- Contributing: see Contributing for coding standards, environment setup, and testing instructions.
Happy extracting!