Kreuzberg

The fastest document intelligence framework for RAG developers

One API, 56 file formats

Extract structured knowledge from documents in milliseconds, built for modern AI pipelines. Reduce
operational complexity while maintaining consistent, high-quality results.

pdf, xlsx, xlsm, xlsb, xls, xlam, xla, ods, pptx, pptm, ppsx, ppt, docx, doc, odt,
txt, md, markdown, html, htm, xml, svg, rst, org, rtf, json, yaml, toml, csv, tsv,
eml, msg, png, jpg, jpeg, webp, bmp, tiff, tif, gif, jp2, jpx, jpm, mj2, pnm, pbm, pgm, ppm,
zip, tar, tgz, 7z, gz, tex, latex, epub

Open source

Self-host

Free forever! Open source.

  • Full control over your infrastructure
  • Local development and experimentation
  • Use our Docker images, install a package for any of 9 programming languages, or install our CLI
  • MIT license forever
Join the waitlist and get 10,000 free pages in credits!

Fully managed

High-performance, scalable, and production-ready in a few clicks.

Stay updated and get early access when Kreuzberg Cloud launches.

How It Works

Simple three-step flow to document intelligence:

1. Upload

Upload documents via our API or dedicated SDKs. Supports PDFs, images, DOCX, PPTX, and many other formats.

2. Process

Kreuzberg Cloud extracts text, tables, images, and semantic structure. Results are cached for re-processing without re-extraction cost.

3. Integrate

JSON response with full document structure. Webhook delivery for async workflows. Plug directly into your embeddings pipeline or RAG framework.
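As a rough sketch of the Integrate step, the snippet below flattens one JSON response into records that an embeddings pipeline could consume. The response shape (`document_id`, `pages`, `text`, `tables`) and the helper name `to_embedding_records` are assumptions made for illustration; this page does not specify the Kreuzberg Cloud schema.

```python
import json

# Hypothetical response body. The field names are assumptions, not a documented schema.
SAMPLE_RESPONSE = json.dumps({
    "document_id": "doc-1",
    "pages": [
        {"number": 1, "text": "Quarterly report", "tables": []},
        {"number": 2, "text": "", "tables": [[["Region", "Revenue"], ["EMEA", "12"]]]},
    ],
})


def to_embedding_records(raw: str) -> list[dict]:
    """Turn one (assumed) JSON response into flat records with an id and text."""
    doc = json.loads(raw)
    records = []
    for page in doc.get("pages", []):
        prefix = f'{doc["document_id"]}-p{page["number"]}'
        text = page.get("text", "").strip()
        if text:
            records.append({"id": prefix, "text": text})
        for t_index, table in enumerate(page.get("tables", [])):
            # Serialize each table as tab-separated rows so it can be embedded as text.
            rows = "\n".join("\t".join(cells) for cells in table)
            records.append({"id": f"{prefix}-t{t_index}", "text": rows})
    return records
```

Keeping page text and tables as separate records with stable ids makes it easier to trace a retrieved passage back to its source page.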

Why Kreuzberg?

Speed That Unblocks Your Team

Process documents in milliseconds instead of seconds! Your RAG pipeline moves at the speed of API calls, not extraction bottlenecks. Index millions of documents without waiting weeks for processing to complete.

Batch Processing

Efficiently process large numbers of documents in bulk. Kreuzberg is built for batch processing, and our cloud infrastructure is designed to scale.
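For illustration, a generic bulk-processing pattern might look like the sketch below. It is not Kreuzberg's batch API: `extract_one` is a hypothetical stand-in that only reads UTF-8 text, and you would replace it with your real extraction call. The point is the pattern of running many files concurrently while collecting failures separately.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def extract_one(path: Path) -> str:
    """Hypothetical stand-in for a real extraction call; it only reads UTF-8 text."""
    return path.read_text(encoding="utf-8")


def extract_batch(
    paths: list[Path], max_workers: int = 8
) -> tuple[dict[str, str], dict[str, str]]:
    """Extract many files concurrently; return (results, errors) keyed by path."""
    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract_one, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[str(path)] = future.result()
            except Exception as exc:
                # One bad file should not abort the whole batch.
                errors[str(path)] = repr(exc)
    return results, errors
```

Returning errors alongside results lets a large job finish and report the files that need a retry.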

Built for AI Teams

Kreuzberg is a full toolbox: text extraction, metadata extraction, named entity recognition (NER), embedding, and chunking, all in a CPU-optimized binary.
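To make the chunking step concrete, here is a minimal fixed-size chunker with overlap. It is a sketch of the general technique only, not Kreuzberg's implementation; the function name `chunk_text` and its parameters are hypothetical.

```python
def chunk_text(text: str, max_chars: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of up to max_chars that share `overlap` characters."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    if not text:
        return []
    step = max_chars - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + max_chars] for i in range(0, last_start, step)]
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, which usually helps retrieval.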

Polyglot and multiplatform

Get native performance in the language of your choice. Kreuzberg is written in Rust and ships for eight other programming languages. It supports Linux, macOS, and Windows.

Read the full Technical Overview on GitHub

Join thousands of developers already building document intelligence pipelines with Kreuzberg, in their language of choice!

Rust, Python, TypeScript, PHP, JavaScript, Ruby, Elixir, Go, C#

Frequently Asked Questions

How fast is 'fast'?
Kreuzberg is built on a high-performance Rust core, so most documents are processed almost instantly, in milliseconds instead of seconds. For bulk jobs, that's thousands of pages per hour on a single API key.
What file types do you support?
PDFs (native and scanned), images (JPG, PNG), Microsoft Office (DOCX, PPTX, XLSX), web content, and plain text. We detect document type automatically and optimize extraction for each format.
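Automatic type detection is often done by inspecting a file's leading bytes rather than trusting its extension. The sketch below shows that general idea with a few well-known signatures; it is an assumption-level illustration, not Kreuzberg's actual detection logic, and `sniff_mime` is a hypothetical name.

```python
# Well-known file signatures (magic bytes) and the MIME type each one implies.
MAGIC_PREFIXES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),  # also the container for DOCX, PPTX, XLSX
]


def sniff_mime(head: bytes, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the first bytes of a file."""
    for prefix, mime in MAGIC_PREFIXES:
        if head.startswith(prefix):
            return mime
    return default
```

Because Office formats share the ZIP signature, a real detector would also need to look inside the archive to tell DOCX, PPTX, and XLSX apart.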
Do you handle scanned documents?
Yes. Built-in OCR recognizes text in images and scanned PDFs. No additional configuration is needed: just send the file and get structured output back.
What happens to my documents?
Documents are processed in memory and deleted immediately after extraction. No storage, no indexing. We don't train on your data or use it for model improvement.
Will Kreuzberg remain MIT-licensed?
Yes! There is no BSL (Business Source License) in Kreuzberg's future. The library will remain MIT-licensed forever. We're building the commercial offering around the core library, not by restricting the library itself.
I already use your open-source library with good results. Why should I try Kreuzberg Cloud?
The open-source engine is fully usable and powerful on its own. Kreuzberg Cloud removes the operational complexity, so you can run it in production without worrying about managing infrastructure.