Document intelligence for AI engineering workflows.
Extract structured, machine-readable content from any document and feed it directly into AI agents, pipelines, and applications.
One API
97 file formats
9 SDKs
305 programming languages
Demo limits:
• 1 page per document
• File size under 1 MB
• 10 demo requests per IP
Built for AI engineering workflows
Speed That Unblocks Your Team
Process documents in milliseconds instead of seconds! Your RAG pipeline moves at the speed of API calls, not extraction bottlenecks. Index millions of documents without waiting weeks for processing to complete.
Batch Processing at Scale
Process large numbers of documents in bulk. Kreuzberg is built for batch processing, and our cloud infrastructure is designed to scale.
LLM-Powered Intelligence
Go beyond extraction. Use vision language models as an OCR backend, extract structured JSON from documents using a schema, and generate embeddings, all via 146 LLM providers, including local models that need no API key configuration.
Built for AI Teams
Kreuzberg is a full toolbox: text extraction, metadata extraction, NER, embedding, and chunking, all in a CPU-optimized binary.
Code Intelligence
Extract functions, classes, imports, and symbols from code files across 305 programming languages. Structured output, ready for semantic chunking and RAG pipelines.
Polyglot and multiplatform
Get native performance in the language of your choice. Kreuzberg is written in Rust and is available in 11 other programming languages. It runs on Linux, macOS, and Windows.
Three steps. One API
01
Upload
Upload documents via our API or dedicated SDKs. Supports PDFs, images, DOCX, PPTX, and many other formats.
02
Process
Kreuzberg extracts text, tables, images, and metadata with high accuracy. Optionally extract structured JSON using a schema, run VLM OCR for complex layouts, or generate embeddings, all in one step.
03
Integrate
Get a JSON response with the full document structure, or use webhook delivery for async workflows. Plug the output directly into your embeddings pipeline or RAG framework.
Native, in your language
Join thousands of developers already building document intelligence pipelines using Kreuzberg, in their language of choice!
Rust
Python
TypeScript
PHP
JavaScript
Ruby
Elixir
Go
C#
Node.js
R
WASM
Pay only for what you extract: no seats, no minimums
Cloud · Pay-as-you-go
Production-ready extraction, managed by us.
$0.008/page
First 10,000 pages free
92 file formats, 305 code formats
Images and scanned PDFs supported
OCR, layout detection, table extraction
No monthly minimum
Get started instantly, no card required
Try it for free!
High volume
100K+ pages a month? Let's talk pricing.
Custom per-page pricing
Everything from the Pay-as-you-go plan
Discounted per-page rate on the cloud
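As a rough illustration of the pay-as-you-go pricing, the sketch below estimates a bill at $0.008 per page with the first 10,000 pages free. It assumes the free allowance is subtracted once from the total page count; whether that allowance is one-time or per billing period is an assumption here, not something stated above. The helper name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Rate and free allowance as listed on the pay-as-you-go card.
PRICE_PER_PAGE = Decimal("0.008")  # USD per page
FREE_PAGES = 10_000


def estimate_cost(pages: int) -> Decimal:
    """Hypothetical helper: estimated USD cost for `pages` pages.

    Assumes the 10,000 free pages are subtracted once from the total.
    Decimal avoids floating-point rounding errors on currency amounts.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    billable = max(0, pages - FREE_PAGES)
    return billable * PRICE_PER_PAGE


if __name__ == "__main__":
    for pages in (5_000, 50_000, 100_000):
        print(f"{pages:>7,} pages -> ${estimate_cost(pages):,.2f}")
```

Under these assumptions, 50,000 pages would cost (50,000 − 10,000) × $0.008 = $320. At 100K+ pages a month, the High volume plan's custom rate would apply instead of this flat estimate.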
Start Building Today
Join thousands of developers already building document intelligence pipelines using Kreuzberg, in their language of choice!