Extract structured content from any document.
Open source. High-performance extraction. Built for AI pipelines and applications.
97 file formats
16 bindings
305 programming languages
pip install kreuzberg
The full stack
Document extraction
Extract text, tables, images, and metadata from 97+ file formats. PDF, Office, images, HTML, archives, email, and more - one API handles them all.
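A minimal sketch of what "one API" can look like from Python after `pip install kreuzberg`. It assumes the package exposes a synchronous `extract_file_sync` function whose result carries the text in a `content` attribute; check the library documentation for the exact names in your installed version. The `extract_text` helper is hypothetical and exists only for illustration.

```python
def extract_text(path: str) -> str:
    """Return the extracted text of a single document (hypothetical helper)."""
    # Imported lazily so this module still loads where kreuzberg is not installed.
    # Assumption: kreuzberg exposes `extract_file_sync`, and its result has `.content`.
    from kreuzberg import extract_file_sync

    result = extract_file_sync(path)
    return result.content
```

Under these assumptions, the same call would serve a PDF, an Office file, or an HTML page, for example `extract_text("report.pdf")` with a file of your own.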
Code intelligence
Understand code structure across 248 programming languages. Extract functions, classes, imports, symbols, and docstrings with semantic chunking.
Powered by tree-sitter-language-pack
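To make "functions, classes, imports, and docstrings" concrete, here is a small illustration that uses Python's standard-library `ast` module on a Python snippet. It is not Kreuzberg's API and does not use tree-sitter; the `outline` function and the `SOURCE` sample are hypothetical and only show the kind of structure such a tool extracts, including one chunk per top-level definition.

```python
import ast

# Hypothetical sample input, used only for this illustration.
SOURCE = '''
import os
from pathlib import Path

class Loader:
    """Load files from disk."""

    def read(self, name):
        """Return the text of the file."""
        return Path(name).read_text()

def main():
    print(os.getcwd())
'''


def outline(source: str) -> dict:
    """Collect imports, classes, functions, docstrings, and top-level chunks."""
    tree = ast.parse(source)
    found = {"imports": [], "classes": [], "functions": [],
             "docstrings": {}, "chunks": []}

    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            found["imports"].append(node.module or "")
        elif isinstance(node, ast.ClassDef):
            found["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found["functions"].append(node.name)

        # Record a docstring for any class or function that has one.
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                found["docstrings"][node.name] = doc

    # One chunk per top-level class or function, so no definition is split.
    for node in tree.body:
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            found["chunks"].append(ast.get_source_segment(source, node))

    return found


result = outline(SOURCE)
```

Splitting on definition boundaries rather than on a fixed character count keeps each chunk self-contained, which is the general idea behind semantic chunking of code.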
Web crawling
Scrape and crawl any website with structured output. Text, metadata, links, and clean Markdown - ready for AI pipelines.
Powered by kreuzcrawl
LLM integration
Connect to 142 LLM providers from any language. VLM OCR, structured JSON extraction, and embeddings - one unified API.
Powered by liter-llm
Runs on
Windows
Linux
macOS
Android
iOS
Flutter
Kotlin
CLI
Docker
Homebrew
In your language
Lightning-fast Rust core with polyglot bindings. Build document intelligence pipelines in your language of choice.
Rust
Python
TypeScript
PHP
JavaScript
Ruby
Elixir
Go
C#
Node.js
R
WASM
Dart
Kotlin
Swift
Java
Zig
Fits into your workflow
LangChain
Document loader for LangChain pipelines
LlamaIndex
Native reader for LlamaIndex RAG workflows
Haystack
File converter for Haystack pipelines
CrewAI
Document tool for CrewAI agents
txtAI
Extractor for txtAI semantic search
SurrealDB
Document ingestion for SurrealDB
Open WebUI
Built-in extraction for Open WebUI
See how Kreuzberg performs
Benchmarked across format types and document sizes.
Frequently Asked Questions
Already using the open source library? Kreuzberg Cloud runs the extraction layer for you. Start with 10,000 free pages.