Document intelligence for AI engineering workflows.

Extract structured, machine-readable content from any document and feed it directly into AI agents, pipelines, and applications.

One API · 97 file formats · 9 SDKs · 305 programming languages

Upload

• Limited to 1 page per document
• File size limited to under 1 MB
• Limited to 10 demo requests per IP address

Why Kreuzberg?

Built for AI engineering workflows

Speed That Unblocks Your Team

Process documents in milliseconds instead of seconds. Your RAG pipeline moves at the speed of API calls, not extraction bottlenecks. Index millions of documents without waiting weeks for processing to complete.

Batch Processing at Scale

Process large numbers of documents efficiently in bulk. Kreuzberg is built for batch processing, and our cloud infrastructure is designed to scale.

LLM-Powered Intelligence

Go beyond extraction. Use vision language models as an OCR backend, extract structured JSON from documents using a schema, and generate embeddings, all via 146 LLM providers, including local models that require no API key configuration.

Built for AI Teams

Kreuzberg is a full toolbox: text extraction, metadata extraction, named entity recognition (NER), embeddings, and chunking, all in a CPU-optimized binary.

Code Intelligence

Extract functions, classes, imports, and symbols from code files across 305 programming languages. The output is structured and ready for semantic chunking and RAG pipelines.

Polyglot and Multiplatform

Get native performance in the language of your choice. Kreuzberg is written in Rust and ships for 11 other programming languages. It runs on Linux, macOS, and Windows.

How it works

Three steps. One API.

01

Upload

Upload documents via our API or dedicated SDKs. Supports PDFs, images, DOCX, PPTX, and many other formats.

02

Process

Kreuzberg extracts text, tables, images, and metadata with high accuracy. Optionally, it can also extract structured JSON using a schema, run vision language model (VLM) OCR on complex layouts, or generate embeddings, all in one step.

03

Integrate

Receive a JSON response containing the full document structure, or use webhook delivery for asynchronous workflows. Plug the output directly into your embeddings pipeline or RAG framework.

SDKs

Native, in your language

Join thousands of developers already building document intelligence pipelines with Kreuzberg in their language of choice!

Rust · Python · TypeScript · PHP · JavaScript · Ruby · Elixir · Go · C# · Node.js · R · WASM

Pricing

Pay only for what you extract: no seats, no minimums.

Open Source

Self-host using Kreuzberg Open Source.

Free

• Full control over your infrastructure
• Local development and experimentation
• Use our Docker images, install the library in any of 12 programming languages, or install our CLI

Cloud · Pay-as-you-go

Production-ready extraction, managed by us.

$0.008/page

• First 10,000 pages free
• 92 file formats, 305 code formats
• Images and scanned PDFs supported
• OCR, layout detection, and table extraction
• No monthly minimum

Get started instantly, no card required.

Try it for free!
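As a rough illustration of the pay-as-you-go arithmetic, here is a minimal Python sketch. It assumes the 10,000 free pages are deducted once from the total page count and that every page after that is billed at the listed $0.008; how the free allowance resets is not stated on this page, and `estimate_cost` is a hypothetical helper, not part of any Kreuzberg SDK.

```python
# Hypothetical cost estimate for the pay-as-you-go plan.
# Assumptions (not confirmed by the pricing page): the 10,000 free pages are
# deducted once from the total, and each remaining page costs a flat $0.008.
from decimal import Decimal

PRICE_PER_PAGE = Decimal("0.008")  # USD per page, the listed pay-as-you-go rate
FREE_PAGES = 10_000                # "First 10,000 pages free"


def estimate_cost(pages: int) -> Decimal:
    """Return the estimated USD cost for `pages` pages under the assumptions above."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    billable = max(0, pages - FREE_PAGES)  # pages left after the free allowance
    return billable * PRICE_PER_PAGE       # Decimal avoids float rounding noise


for pages in (5_000, 50_000, 90_000):
    print(f"{pages:>7,} pages -> ${estimate_cost(pages):.2f}")
```

For example, 50,000 pages leaves 40,000 billable pages, or 40,000 × $0.008 = $320. Volumes of 100K+ pages a month may qualify for custom High volume pricing instead.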

High volume

100K+ pages a month? Let's talk pricing.

Custom per-page pricing

• Everything in the Pay-as-you-go plan
• Discounted per-page rate on Kreuzberg Cloud

Frequently Asked Questions

How fast is 'fast'?
Kreuzberg is built on a high-performance Rust core, so most documents are processed almost instantly: in milliseconds rather than seconds. For bulk jobs, that translates to thousands of pages per hour on a single API key.
What file types do you support?
PDFs (native and scanned), images (JPG, PNG), Microsoft Office (DOCX, PPTX, XLSX), web content, and plain text. We detect document type automatically and optimize extraction for each format.
Do you handle scanned documents?
Yes. Built-in OCR recognizes text in images and scanned PDFs. No additional configuration is needed: just send the file and get structured output back.
What happens to my documents?
Documents are processed in memory and deleted immediately after extraction. No storage, no indexing. We don't train on your data or use it for model improvement.
What license does Kreuzberg use?
The Kreuzberg open-source library is licensed under the Elastic License v2 (ELv2) from v4.8.0 onward. You can use it freely for personal projects, internal tools, and commercial applications. The one restriction: you cannot offer Kreuzberg as a managed service to third parties. Versions v4.7.x and below remain MIT-licensed. Kreuzberg Cloud is a separate commercial product with its own terms. If you need a different licensing arrangement, contact us.
I already use your open-source library with good results. Why should I try Kreuzberg Cloud?
The open-source engine is fully usable and powerful on its own. Kreuzberg Cloud removes the operational complexity, so you can run it in production without worrying about managing infrastructure.

Start Building Today
