Docker Deployment¶
Kreuzberg provides official Docker images that package its high-performance Rust core on a Debian 13 (Trixie) base. Each image supports three execution modes through a flexible entrypoint pattern, so the same image can run as an API server, a CLI tool, or an MCP server.
Image Variants¶
Kreuzberg offers two Docker image variants optimized for different use cases:
Core Image¶
Size: ~1.0-1.3GB
Image: goldziher/kreuzberg:v4-core
Included Features:
- Tesseract OCR with 12 language packs (eng, spa, fra, deu, ita, por, chi-sim, chi-tra, jpn, ara, rus, hin)
- Pandoc 3.6.3 for document conversion
- pdfium for PDF rendering
- Full support for modern file formats
Supported Formats:
- PDF, DOCX, PPTX, XLSX (modern Office formats)
- Images (PNG, JPG, TIFF, BMP, etc.)
- HTML, XML, JSON, YAML, TOML
- Email (EML, MSG)
- Archives (ZIP, TAR, GZ)
Best For:
- Production deployments where image size matters
- Cloud environments with size/bandwidth constraints
- Kubernetes deployments with frequent pod scaling
- Workflows that don't require legacy Office format support
Full Image¶
Size: ~1.5-2.1GB
Image: goldziher/kreuzberg:v4-full
Included Features:
- All Core image features
- LibreOffice 25.8.2 for legacy format conversion
Additional Formats:
- Legacy Word (.doc)
- Legacy PowerPoint (.ppt)
- Legacy Excel (.xls)
Best For:
- Complete document intelligence pipelines
- Processing legacy MS Office files
- Development and testing environments
- When image size is not a constraint
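The choice between the two variants comes down to whether legacy Office files are involved. A minimal sketch of that decision in shell; `kz_image_for` is a hypothetical helper, not part of Kreuzberg, and it looks only at the file extension:

```shell
# Hypothetical helper: pick the image variant from a file's extension.
# Only .doc, .ppt, and .xls need the Full image (LibreOffice).
kz_image_for() {
  # Lowercase the name so .DOC and .doc are treated alike.
  name="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$name" in
    *.doc|*.ppt|*.xls) echo "goldziher/kreuzberg:v4-full" ;;
    *)                 echo "goldziher/kreuzberg:v4-core" ;;
  esac
}
```

For example, `docker run -v $(pwd):/data "$(kz_image_for old.xls)" extract /data/old.xls` would run the Full image, while a `.xlsx` file would resolve to Core.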
Quick Start¶
Pull Image¶
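Both variants are pulled by tag. A minimal sketch using the image names listed above; the `kz_pull` wrapper is hypothetical and simply expands to `docker pull goldziher/kreuzberg:v4-core` or `docker pull goldziher/kreuzberg:v4-full`:

```shell
# Hypothetical wrapper: pull a variant by short name (core or full).
kz_pull() {
  variant="${1:-core}"   # defaults to the Core image
  docker pull "goldziher/kreuzberg:v4-${variant}"
}
```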
Basic Usage¶
# Extract a single file
docker run -v $(pwd):/data goldziher/kreuzberg:v4-core \
extract /data/document.pdf
# Batch process multiple files
docker run -v $(pwd):/data goldziher/kreuzberg:v4-core \
batch /data/*.pdf --output-format json
# Detect MIME type
docker run -v $(pwd):/data goldziher/kreuzberg:v4-core \
detect /data/unknown-file.bin
Execution Modes¶
Kreuzberg Docker images use a flexible ENTRYPOINT pattern that supports three execution modes:
1. API Server Mode (Default)¶
The default mode starts an HTTP REST API server.
Default Behavior:
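With no arguments after the image name, the container starts the API server, as the bare `docker run` examples elsewhere in this guide show. A sketch, assuming the server listens on port 8000 inside the container (the port used throughout this guide); `kz_serve_default` is a hypothetical wrapper:

```shell
kz_serve_default() {
  # No command after the image name: the entrypoint falls back to API server mode.
  # --rm removes the container when it stops.
  docker run --rm -p 8000:8000 goldziher/kreuzberg:v4-core
}
```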
Custom Configuration:
# Change host and port
docker run -p 9000:9000 goldziher/kreuzberg:v4-core \
serve --host 0.0.0.0 --port 9000
# With environment variables
docker run -p 8000:8000 \
-e KREUZBERG_CORS_ORIGINS="https://myapp.com" \
-e KREUZBERG_MAX_UPLOAD_SIZE_MB=200 \
goldziher/kreuzberg:v4-core
# With configuration file
docker run -p 8000:8000 \
-v $(pwd)/kreuzberg.toml:/config/kreuzberg.toml \
goldziher/kreuzberg:v4-core \
serve --config /config/kreuzberg.toml
See API Server Guide for complete API documentation.
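Scripts that start the container and then call the API usually need to wait until the server answers. A minimal sketch, assuming the `/health` path used by the Kubernetes readiness probe later in this guide returns a 2xx status once the server is ready; `kz_wait_ready` is a hypothetical helper and requires `curl` on the host:

```shell
# Hypothetical helper: poll the API until it answers, or give up.
# Assumes GET /health returns a 2xx status once the server is ready.
kz_wait_ready() {
  url="${1:-http://localhost:8000/health}"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    # -f makes curl exit non-zero on HTTP errors; -s silences progress output.
    if curl -fs -o /dev/null "$url"; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

`kz_wait_ready` returns 0 as soon as one request succeeds and 1 after the retry budget (30 attempts, one second apart, by default) is used up.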
2. CLI Mode¶
Run Kreuzberg as a command-line tool for file processing.
Extract Files:
# Mount directory and extract file
docker run -v $(pwd):/data goldziher/kreuzberg:v4-core \
extract /data/document.pdf
# Extract with OCR
docker run -v $(pwd):/data goldziher/kreuzberg:v4-core \
extract /data/scanned.pdf --ocr true
# Output as JSON
docker run -v $(pwd):/data goldziher/kreuzberg:v4-core \
extract /data/document.pdf --output-format json > result.json
Batch Processing:
# Process multiple files
docker run -v $(pwd):/data goldziher/kreuzberg:v4-core \
batch /data/*.pdf --output-format json
# With custom concurrency
docker run -v $(pwd):/data goldziher/kreuzberg:v4-core \
batch /data/*.pdf --concurrency 8
MIME Detection:
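MIME detection uses the `detect` subcommand shown in Basic Usage. A sketch with a hypothetical `kz_detect` wrapper that mounts the current directory at `/data`:

```shell
kz_detect() {
  # $1 is a file name inside the current directory, which is mounted at /data.
  docker run --rm -v "$(pwd):/data" goldziher/kreuzberg:v4-core detect "/data/$1"
}
```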
Cache Management:
# View cache statistics
docker run goldziher/kreuzberg:v4-core cache stats
# Clear cache
docker run goldziher/kreuzberg:v4-core cache clear
See CLI Usage Guide for complete CLI documentation.
3. MCP Server Mode¶
Run Kreuzberg as a Model Context Protocol server for AI agent integration.
Start MCP Server:
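A minimal sketch of starting the server with the `mcp` subcommand. The `-i` flag is an assumption: it keeps stdin open, which an MCP client that talks to the container over stdio would need. `kz_mcp` is a hypothetical wrapper:

```shell
kz_mcp() {
  # -i keeps stdin attached for a stdio-based client (assumed transport);
  # --rm removes the container when the client disconnects.
  docker run -i --rm goldziher/kreuzberg:v4-core mcp
}
```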
With Configuration:
docker run \
-v $(pwd)/kreuzberg.toml:/config/kreuzberg.toml \
goldziher/kreuzberg:v4-core \
mcp --config /config/kreuzberg.toml
See API Server Guide - MCP Section for integration examples.
Architecture¶
Multi-Stage Build¶
Kreuzberg Docker images use multi-stage builds for optimal size and security:
- Builder Stage: Compiles Rust binary with all dependencies
- Runtime Stage: Minimal Debian Trixie slim base with only runtime dependencies
Benefits:
- No build tools or intermediate artifacts in final image
- Smaller image size (builder stage not included)
- Reduced attack surface
Rust Core¶
Docker images use the native Rust core directly, providing:
- 10-50x faster than pure-Python alternatives
- Memory efficiency through streaming parsers for large files
- Async processing with Tokio runtime
- Zero-copy operations where possible
Multi-Architecture Support¶
Images are built for multiple architectures:
- linux/amd64 (x86_64)
- linux/arm64 (aarch64)
Architecture-specific binaries are automatically selected during build.
Security Features¶
Non-Root User:
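The images run as the non-root user `kreuzberg` (UID 1000, as noted under Troubleshooting). A sketch for confirming this, assuming the `id` utility is present in the image; `kz_whoami` is a hypothetical wrapper that overrides the entrypoint for a single run:

```shell
kz_whoami() {
  # --entrypoint replaces the kreuzberg binary with `id` for this one run,
  # so the container prints the user it runs as and exits.
  docker run --rm --entrypoint id goldziher/kreuzberg:v4-core
}
```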
Security Options:
# Run with additional security constraints
docker run --security-opt no-new-privileges \
--read-only \
--tmpfs /tmp \
-p 8000:8000 \
goldziher/kreuzberg:v4-core
Production Deployment¶
Docker Compose¶
Basic Configuration:
version: '3.8'
services:
  kreuzberg-api:
    image: goldziher/kreuzberg:v4-core
    ports:
      - "8000:8000"
    environment:
      - KREUZBERG_CORS_ORIGINS=https://myapp.com,https://api.myapp.com
      - KREUZBERG_MAX_UPLOAD_SIZE_MB=500
      - RUST_LOG=info
    volumes:
      - ./config:/config
      - cache-data:/app/.kreuzberg
    command: serve --host 0.0.0.0 --port 8000 --config /config/kreuzberg.toml
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "kreuzberg", "--version"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 5s
volumes:
  cache-data:
With LibreOffice (Full Image):
services:
  kreuzberg-full:
    image: goldziher/kreuzberg:v4-full
    ports:
      - "8000:8000"
    environment:
      - KREUZBERG_CORS_ORIGINS=https://myapp.com
    volumes:
      - cache-data:/app/.kreuzberg
    restart: unless-stopped
# The named volume must be declared, or Compose rejects the file
volumes:
  cache-data:
Start Services:
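A minimal sketch, assuming the configuration above is saved as `docker-compose.yml` in the current directory. It uses the Compose v2 syntax (`docker compose`); older installations use the standalone `docker-compose` binary with the same arguments. `kz_up` is a hypothetical wrapper:

```shell
kz_up() {
  # Start every service in docker-compose.yml in the background,
  # then list the running services.
  docker compose up -d &&
    docker compose ps
}
```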
Kubernetes Deployment¶
Deployment Manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kreuzberg-api
  labels:
    app: kreuzberg
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kreuzberg
  template:
    metadata:
      labels:
        app: kreuzberg
    spec:
      containers:
        - name: kreuzberg
          image: goldziher/kreuzberg:v4-core
          ports:
            - containerPort: 8000
              name: http
          env:
            - name: KREUZBERG_CORS_ORIGINS
              value: "https://myapp.com"
            - name: KREUZBERG_MAX_UPLOAD_SIZE_MB
              value: "500"
            - name: RUST_LOG
              value: "info"
          args: ["serve", "--host", "0.0.0.0", "--port", "8000"]
          livenessProbe:
            exec:
              command:
                - kreuzberg
                - --version
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          volumeMounts:
            - name: cache
              mountPath: /app/.kreuzberg
      volumes:
        - name: cache
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: kreuzberg-api
spec:
  selector:
    app: kreuzberg
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer
Apply Configuration:
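A minimal sketch, assuming the manifest above is saved as `kreuzberg.yaml` (a hypothetical file name). `kz_k8s_apply` is a hypothetical wrapper that applies the manifest and then waits for the `kreuzberg-api` Deployment to finish rolling out:

```shell
kz_k8s_apply() {
  manifest="${1:-kreuzberg.yaml}"   # hypothetical file name
  # Create or update the Deployment and Service, then block until
  # the rollout of the kreuzberg-api Deployment completes.
  kubectl apply -f "$manifest" &&
    kubectl rollout status deployment/kreuzberg-api
}
```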
Environment Variables¶
Configure Docker containers via environment variables:
Server Binding:
KREUZBERG_HOST=0.0.0.0 # Listen address (default: 127.0.0.1)
KREUZBERG_PORT=8000 # Port number (default: 8000)
Upload Limits:
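A sketch in the same style as the other variables in this section, using the `KREUZBERG_MAX_UPLOAD_SIZE_MB` variable from the earlier examples. The value is in megabytes; 200 is only an illustrative figure, and the default is not stated here:

```shell
KREUZBERG_MAX_UPLOAD_SIZE_MB=200   # Maximum upload size in MB (example value)
```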
CORS Configuration:
# Comma-separated list of allowed origins
KREUZBERG_CORS_ORIGINS="https://app.example.com,https://api.example.com"
Logging:
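Log verbosity is controlled through `RUST_LOG`, as in the Compose and Kubernetes examples above. A sketch, assuming the standard Rust log levels apply:

```shell
RUST_LOG=info   # Log level: error, warn, info, debug, or trace
```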
Cache Configuration:
Volume Mounts¶
Cache Persistence:
# Mount cache directory for persistence
docker run -p 8000:8000 \
-v kreuzberg-cache:/app/.kreuzberg \
goldziher/kreuzberg:v4-core
Configuration Files:
# Mount configuration file
docker run -p 8000:8000 \
-v $(pwd)/kreuzberg.toml:/config/kreuzberg.toml \
goldziher/kreuzberg:v4-core \
serve --config /config/kreuzberg.toml
File Processing:
# Mount documents directory (read-only)
docker run -v $(pwd)/documents:/data:ro \
goldziher/kreuzberg:v4-core \
extract /data/document.pdf
Image Comparison¶
| Feature | Core | Full | Difference |
|---|---|---|---|
| Base Image | debian:trixie-slim | debian:trixie-slim | - |
| Size | ~1.0-1.3GB | ~1.5-2.1GB | ~500-800MB |
| Tesseract OCR | ✅ 12 languages | ✅ 12 languages | - |
| Pandoc | ✅ 3.6.3 | ✅ 3.6.3 | - |
| pdfium | ✅ | ✅ | - |
| Modern Office | ✅ DOCX, PPTX, XLSX | ✅ DOCX, PPTX, XLSX | - |
| Legacy Office | ❌ | ✅ DOC, PPT, XLS | LibreOffice 25.8.2 |
| Pull Time | ~30s | ~45s | ~15s slower |
| Startup Time | ~1s | ~1s | Negligible |
Building Custom Images¶
Building from Source¶
Clone the repository and build:
Custom Dockerfiles¶
Create a custom Dockerfile based on official images:
FROM goldziher/kreuzberg:v4-core
# Install additional system dependencies
USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    your-package-here && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Switch back to non-root user
USER kreuzberg
# Custom configuration
COPY kreuzberg.toml /app/kreuzberg.toml
# Default arguments passed to the image's entrypoint
CMD ["serve", "--config", "/app/kreuzberg.toml"]
Performance Tuning¶
Resource Allocation¶
Recommended Resources:
| Workload | Memory | CPU | Notes |
|---|---|---|---|
| Light | 512MB | 0.5 cores | Small documents, low concurrency |
| Medium | 1GB | 1 core | Typical documents, moderate concurrency |
| Heavy | 2GB+ | 2+ cores | Large documents, OCR, high concurrency |
Docker Run:
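With plain `docker run`, the limits are set through the `--memory` and `--cpus` flags. A minimal sketch that maps the workload tiers from the table to those flags; `kz_limits` is a hypothetical helper, and the "heavy" values are the table's lower bounds:

```shell
# Hypothetical helper: map a workload tier from the table to docker run limits.
kz_limits() {
  case "$1" in
    light)  echo "--memory 512m --cpus 0.5" ;;
    medium) echo "--memory 1g --cpus 1" ;;
    heavy)  echo "--memory 2g --cpus 2" ;;   # floor values; raise as needed
    *)      echo "unknown workload: $1" >&2; return 1 ;;
  esac
}
```

For example, `docker run $(kz_limits medium) -p 8000:8000 goldziher/kreuzberg:v4-core` expands to a run with a 1 GB memory limit and one CPU. The substitution is left unquoted on purpose so the flags split into separate arguments.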
Docker Compose:
services:
  kreuzberg:
    image: goldziher/kreuzberg:v4-core
    deploy:
      resources:
        limits:
          memory: 1G
          cpus: '1'
        reservations:
          memory: 512M
          cpus: '0.5'
Scaling¶
Horizontal Scaling:
# Scale to 5 replicas (drop the fixed "8000:8000" host port mapping first; replicas cannot share one host port)
docker-compose up -d --scale kreuzberg-api=5
# Kubernetes
kubectl scale deployment kreuzberg-api --replicas=5
Load Balancing:
- Use reverse proxy (Nginx, Caddy, Traefik)
- Kubernetes Service with LoadBalancer type
- Docker Swarm mode
Troubleshooting¶
Container Won't Start¶
Check logs:
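A minimal sketch using `docker logs`; `kz_logs` is a hypothetical wrapper, and the container name or ID comes from `docker ps -a`:

```shell
kz_logs() {
  # $1 is a container name or ID, as listed by `docker ps -a`.
  # --tail limits output to the most recent 100 lines.
  docker logs --tail 100 "$1"
}
```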
Common Issues:
- Port already in use: Change -p mapping
- Insufficient permissions: Ensure volume mounts have correct permissions
- Memory limit too low: Increase --memory limit
Permission Errors¶
Images run as non-root user kreuzberg (UID 1000). Ensure mounted volumes have correct permissions:
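A minimal sketch, assuming the simplest fix is acceptable: giving UID 1000 ownership of the host directory before mounting it. `kz_fix_perms` is a hypothetical helper and typically needs root (for example via `sudo`) on the host:

```shell
kz_fix_perms() {
  # Give UID 1000 (the in-container kreuzberg user) ownership of a host
  # directory and everything beneath it.
  chown -R 1000 "$1"
}
```

For data that is only read, such as a documents directory mounted with `:ro`, making the files world-readable is a less invasive alternative.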
Large File Processing¶
Increase memory limit:
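A sketch using Docker's `--memory` flag; the 4g default is only an example value, and `kz_serve_mem` is a hypothetical wrapper:

```shell
kz_serve_mem() {
  mem="${1:-4g}"   # example value; size it to your largest documents
  docker run --rm -p 8000:8000 --memory "$mem" goldziher/kreuzberg:v4-core
}
```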
Increase upload size:
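A sketch that raises the limit through the `KREUZBERG_MAX_UPLOAD_SIZE_MB` variable used earlier in this guide; the 1000 MB default is only an example value, and `kz_serve_upload` is a hypothetical wrapper:

```shell
kz_serve_upload() {
  mb="${1:-1000}"   # example value, in megabytes
  docker run --rm -p 8000:8000 \
    -e KREUZBERG_MAX_UPLOAD_SIZE_MB="$mb" \
    goldziher/kreuzberg:v4-core
}
```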
LibreOffice Not Available¶
LibreOffice is only available in the Full image variant. If you need legacy Office format support:
# Switch to full image
docker pull goldziher/kreuzberg:v4-full
docker run -p 8000:8000 goldziher/kreuzberg:v4-full
Next Steps¶
- API Server Guide - Complete API documentation
- CLI Usage - Command-line interface
- Configuration - Configuration options
- Advanced Features - Chunking, language detection, token reduction