Runtime for all open-weight models

Ollama as runtime of choice

We always run the best open-weight reasoning model on your own server. Ollama is our preferred runtime, alongside vLLM and other inference engines. No cloud, no external APIs.

What is Ollama?

Ollama is an open-source framework for running LLMs locally. It is our preferred runtime for models such as Kimi K2, GPT-OSS, DeepSeek R1, Qwen 3 and Llama on your own hardware, and we pick the strongest open-weight reasoning model per use case.
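To make "running locally" concrete, here is a minimal sketch of a call to Ollama's local REST API. It assumes an Ollama server listening on its default address (http://localhost:11434) and a model that has already been pulled; the model tag passed in is whatever you have installed, and the helper names are illustrative rather than part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_generate_request(model: str, prompt: str,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL,
             timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling generate("llama3", "Summarise this contract clause.") would only talk to the machine named in base_url; the model tag here is an example and has to match a model you have pulled.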

The right model type for every job

Every relevant open-weight model can run on your own server. For each use case we pick the model type that fits it best.

Reasoning models

For complex analysis, planning and agentic tasks. State-of-the-art chain-of-thought performance.

Kimi K2 · GPT-OSS · DeepSeek R1

Multilingual models

Strong performance in 100+ languages. Ideal for international organisations.

Qwen 3 · Llama 3 · Mistral

Code models

Specialised in code generation, review and software engineering. For developer tooling and automation.

Qwen Coder · DeepSeek Coder · Code Llama

Vision models

Multimodal models that understand images, documents and screenshots. For OCR, document analysis and visual reasoning.

Qwen-VL · Llama 3.2 Vision · LLaVA

Embedding models

For semantic search, RAG and knowledge retrieval. The engine behind every knowledge base.

Nomic Embed · BGE · Jina
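As an illustration of how embeddings drive semantic search, the sketch below ranks documents by cosine similarity to a query vector. It assumes the vectors were already produced by an embedding model; the toy two-dimensional vectors in the usage note and the function names are purely illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: Dict[str, Sequence[float]],
                   top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With docs = {"a": [1, 0], "b": [0, 1], "c": [1, 1]} and the query [1, 0], rank_documents returns "a" first and "c" second. In a RAG setup the top-ranked passages would then be passed to a generation model as context.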

Fine-tuned models

Custom models trained on your data and domain. For maximum precision in your field.

Custom fine-tunes · LoRA adapters
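One way a LoRA adapter can be attached to a base model in Ollama is through a Modelfile, using its FROM and ADAPTER instructions. The helper below is a sketch that only renders the Modelfile text; the function name, model tag and adapter path are hypothetical placeholders, not a description of our actual pipeline.

```python
def render_modelfile(base_model: str, adapter_path: str,
                     system_prompt: str = "") -> str:
    """Render Modelfile text that layers a LoRA adapter on a base model."""
    lines = [
        f"FROM {base_model}",       # base model the adapter was trained on
        f"ADAPTER {adapter_path}",  # path to the LoRA adapter (hypothetical)
    ]
    if system_prompt:
        # Optional system prompt baked into the custom model
        lines.append(f'SYSTEM """{system_prompt}"""')
    return "\n".join(lines) + "\n"
```

Writing the result to a file named Modelfile and running `ollama create my-model -f Modelfile` would register the combined model under the (example) name my-model. The adapter has to match the base model it was trained against.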

Why Ollama as runtime?

Local execution

Models run on your own servers. No external API calls.

Full privacy

Data never leaves your controlled environment.
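A simple guard can help enforce the "no external calls" rule in application code. The sketch below accepts an inference endpoint only if its host is localhost or a loopback/private IP address; it is an illustrative check with a made-up function name, and it deliberately rejects other hostnames rather than resolving them via DNS.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_endpoint(url: str) -> bool:
    """True if the URL points at localhost or a loopback/private IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: reject instead of resolving the hostname
        return False
    return address.is_loopback or address.is_private
```

An application could call this once at start-up and refuse to run if the configured runtime URL fails the check. Network-level controls such as firewall rules remain the stronger guarantee.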

Fast inference

GPU-accelerated. Speeds comparable to cloud APIs, depending on hardware and model size.

Model management

Easily switch between models per task.
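Switching models per task can be as simple as a lookup table in front of the runtime. The sketch below maps a task type to a model tag; the task names, the chosen tags and the default are illustrative assumptions and would be replaced by whatever models are installed on your server.

```python
# Illustrative mapping from task type to a locally installed model tag
TASK_TO_MODEL = {
    "reasoning": "deepseek-r1",
    "multilingual": "qwen3",
    "code": "qwen2.5-coder",
    "vision": "llama3.2-vision",
    "embedding": "nomic-embed-text",
}
DEFAULT_MODEL = "qwen3"  # fallback for tasks without an explicit entry


def pick_model(task: str) -> str:
    """Return the model tag for a task, ignoring case and surrounding spaces."""
    return TASK_TO_MODEL.get(task.strip().lower(), DEFAULT_MODEL)
```

Because the runtime takes the model name as a request parameter, changing the table is enough to move a task to a different model without touching the calling code.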

Enterprise ready

Built to scale for enterprise workloads, with load balancing and failover.
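Load balancing and failover across several inference servers can be sketched as a round-robin pool that skips backends that fail. This is a minimal illustration, not our production setup: the class name is made up, the backends are plain strings (for example base URLs), and the caller supplies the function that actually sends the request.

```python
from typing import Callable, List, Sequence


class FailoverPool:
    """Round-robin over backends, moving to the next one when a call fails."""

    def __init__(self, backends: Sequence[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends: List[str] = list(backends)
        self._next = 0  # index of the backend to try first on the next call

    def call(self, send: Callable[[str], str]) -> str:
        """Try each backend once, starting at the round-robin position."""
        errors = []
        count = len(self._backends)
        for offset in range(count):
            backend = self._backends[(self._next + offset) % count]
            try:
                result = send(backend)
            except OSError as exc:  # covers connection and URL errors
                errors.append(f"{backend}: {exc}")
                continue
            # Start the next call at the backend after the one that answered
            self._next = (self._next + offset + 1) % count
            return result
        raise ConnectionError("all backends failed: " + "; ".join(errors))
```

In practice send would wrap an HTTP request to the given backend. Production deployments usually add health checks and timeouts on top of this basic pattern, or delegate it to a dedicated reverse proxy.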

Fine-tuning ready

Support for fine-tuning on your organisation's data.

Why not Big Tech models?

OpenAI, Google and Anthropic models carry fundamental risks:

Data sent to American servers

No control over model updates

CLOUD Act: US authorities can compel US providers to hand over data, even when it is stored outside the US

Per-token pricing makes costs unpredictable
