Skip to content

Kreuzberg Documentation

Kreuzberg is a document intelligence platform with a high‑performance Rust core and native bindings for Python, TypeScript/Node.js, Ruby, Go, and Rust itself. Use it as an SDK, CLI, Docker image, REST API server, or MCP tool to extract text, tables, and metadata from 56 file formats (PDF, Office, images, HTML, XML, archives, email, and more) with optional OCR and post-processing pipelines.

What You Can Do

  • Single API across languages – Binding idioms follow each ecosystem, but features (extraction, OCR, chunking, embeddings, plugins) map 1:1.
  • Structured extraction – Convert PDFs, Office docs, images, emails, HTML, XML, and archives into clean Markdown/JSON, preserving tables and metadata.
  • Multi-engine OCR – Built-in Tesseract support everywhere, with EasyOCR and PaddleOCR extensions for Python.
  • Plugin ecosystem – Register post-processors, validators, OCR backends, and run them from any binding or via the CLI/API server.
  • Deployment flexibility – Ship as a library, run the CLI, or host the API server/MCP adapter inside containers.

Documentation Map

  • Getting Started – First extraction in each language.
  • Installation – Dependency matrix for Rust, Python, Ruby, Node.js, CLI, and Docker users.
  • Guides – How to configure extraction, OCR, advanced features, plugins, and Docker/API deployments.
  • Concepts – Architecture, extraction pipeline, MIME detection, plugin runtime, and performance strategies.
  • Features directory – Exhaustive capability list per format/binding plus OCR and chunking options.
  • Reference – Detailed API references (Python, TypeScript, Ruby, Rust), configuration schema, supported formats, types, and errors.
  • CLI – Command syntax, flags, exit codes, and automation tips.
  • API Server – Running the REST service and integrating with MCP.
  • Migration and Changelog – Track breaking changes and release history.

Supported Platforms

Binding / Interface Package Docs
Python pip install kreuzberg Python API Reference
TypeScript/Node.js npm install @kreuzberg/node TypeScript API Reference
Ruby gem install kreuzberg Ruby API Reference
Go go get github.com/Goldziher/kreuzberg/packages/go/kreuzberg@latest Go API Reference
Rust cargo add kreuzberg Rust API Reference
CLI brew install goldziher/tap/kreuzberg or cargo install kreuzberg-cli CLI Usage
API Server / MCP Docker image goldziher/kreuzberg:core API Server Guide

Getting Help

  • Questions / bugs: open an issue at github.com/Goldziher/kreuzberg.
  • Chat: join the community Discord (invite in README).
  • Contributing: see Contributing for coding standards, environment setup, and testing instructions.

Happy extracting!