Ollama as runtime of choice
We run the strongest open-weight reasoning model for your use case on your own server. Ollama is our preferred runtime, alongside vLLM and other inference engines. No cloud, no external APIs.
What is Ollama?
Ollama is an open-source framework for running large language models (LLMs) locally. We use it to run models such as Kimi K2, GPT-OSS, DeepSeek R1, Qwen 3 and Llama on your own hardware, choosing the strongest open-weight reasoning model for each use case.
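To make "running locally" concrete, here is a minimal sketch of a client talking to an Ollama server on the same machine through its HTTP API. It assumes Ollama is listening on its default address (localhost, port 11434) and that the model tag used has already been pulled; the helper names `build_generate_request` and `generate` are our own, chosen for illustration.

```python
import json
import urllib.request

# Assumption: Ollama runs on this machine at its default address.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,    # tag of a model that is already pulled locally
        "prompt": prompt,
        "stream": False,   # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL,
             timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read())
    return body["response"]
```

With a server running, a call such as `generate("llama3", "Summarise our leave policy in two sentences.")` would return the model's answer as a string. The request only ever targets `localhost`, so the prompt stays on the machine.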
The right model type for every job
Every relevant open-weight model can run on your own server. For each use case, we pick the strongest model of the right type.
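Picking a model per use case can be as simple as a lookup table. The sketch below is illustrative only: the task names and the model tags assigned to them are assumptions for this example, not a fixed recommendation, and `pick_model` is a hypothetical helper.

```python
# Illustrative mapping from task type to a model tag.
# Assumption: these tags are examples; the real choice depends on the use case.
MODEL_BY_TASK = {
    "reasoning": "deepseek-r1",
    "multilingual": "qwen3",
    "code": "qwen2.5-coder",
    "vision": "llama3.2-vision",
    "embedding": "nomic-embed-text",
}

# Task type used when a request does not match any known category.
DEFAULT_TASK = "reasoning"


def pick_model(task: str) -> str:
    """Return the model tag for a task type, falling back to the default."""
    return MODEL_BY_TASK.get(task, MODEL_BY_TASK[DEFAULT_TASK])
```

Keeping the mapping in one place means a stronger model can be swapped in for a task by changing a single entry.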
Reasoning models
For complex analysis, planning and agentic tasks. State-of-the-art chain-of-thought performance.
Kimi K2 · GPT-OSS · DeepSeek R1
Multilingual models
Strong performance in 100+ languages. Ideal for international organisations.
Qwen 3 · Llama 3 · Mistral
Code models
Specialised in code generation, review and software engineering. For developer tooling and automation.
Qwen Coder · DeepSeek Coder · Code Llama
Vision models
Multimodal models that understand images, documents and screenshots. For OCR, document analysis and visual reasoning.
Qwen-VL · Llama 3.2 Vision · LLaVA
Embedding models
For semantic search, RAG and knowledge retrieval. The engine behind every knowledge base.
Nomic Embed · BGE · Jina
Fine-tuned models
Custom models trained on your data and domain. For maximum precision in your field.
Custom fine-tunes · LoRA adapters
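As a rough illustration of how the embedding models above power semantic search: each text is turned into a vector, and documents are ranked by how closely their vector points in the same direction as the query's. The sketch below shows only the ranking step. The vectors are tiny made-up numbers, not real model output, and `rank_documents` is a hypothetical helper.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, treat as unrelated
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: list[float],
                   doc_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (document id, score) pairs, most similar document first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors (assumption: real embeddings have hundreds of
# dimensions and come from an embedding model, not from hand-written numbers).
documents = {
    "invoice_policy": [1.0, 0.0, 0.0],
    "holiday_policy": [0.0, 1.0, 0.0],
}
query = [0.9, 0.1, 0.0]
ranking = rank_documents(query, documents)
```

In a RAG setup, the top-ranked documents would then be passed to a reasoning model as context for its answer.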
Why Ollama as runtime?
Local execution
Models run on your own servers. No external API calls.
Full privacy
Data never leaves your controlled environment.
Fast inference
GPU-accelerated, with speeds comparable to cloud APIs on suitable hardware.
Model management
Easily switch between models per task.
Enterprise ready
Built to scale, with load balancing and failover across servers.
Fine-tuning ready
Runs models fine-tuned on your organisation's data, including LoRA adapters.
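Load balancing and failover can be pictured with a small client-side sketch: requests rotate across several inference servers, and if one is unreachable the next is tried. This is an assumption about one possible setup, not a description of a feature built into Ollama; in production the same idea is often handled by a reverse proxy. The class name `RoundRobinPool` and the host addresses in the usage note are hypothetical.

```python
from typing import Callable, Optional, Sequence


class RoundRobinPool:
    """Rotate requests across hosts and fail over when a host is down."""

    def __init__(self, hosts: Sequence[str]) -> None:
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts = list(hosts)
        self._start = 0  # index of the host that gets the next request first

    def call(self, send: Callable[[str], str]) -> str:
        """Try each host once, starting at a rotating offset.

        `send` takes a host address and returns the response, raising
        OSError (which includes connection and URL errors) on failure.
        """
        count = len(self._hosts)
        first = self._start
        self._start = (self._start + 1) % count  # rotate for the next call
        last_error: Optional[OSError] = None
        for offset in range(count):
            host = self._hosts[(first + offset) % count]
            try:
                return send(host)
            except OSError as exc:
                last_error = exc  # remember the failure, try the next host
        raise ConnectionError("all hosts failed") from last_error
```

A pool created with, say, `RoundRobinPool(["http://gpu-a:11434", "http://gpu-b:11434"])` would alternate between the two servers and fall through to the other one whenever the first choice does not respond.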
Why not Big Tech models?
OpenAI, Google and Anthropic models carry fundamental risks: