Skip to content
agentscamp
Tool

Ollama

An open-source tool to run open-weight LLMs locally with a single command, including a local OpenAI-compatible API.

open sourcecli
Updated Jun 4, 2026
local-llmcliinferenceopen-source

Ollama is the simplest way to run open-weight LLMs on your own machine. Install it, run ollama run llama3, and you have a model answering prompts locally — no API key, no account, and nothing leaving your computer. It handles downloading and quantizing models, manages a local model library, and exposes a local API (including OpenAI-compatible endpoints) so you can build against a model running on localhost.

It is aimed at developers who want a model for local development, prototyping, privacy-sensitive work, or offline use. Ollama is about single-machine convenience — it's how you try an open model or wire one into an app on your laptop, not how you serve thousands of concurrent users.

Highlights

  • One-command runollama run <model> pulls and runs a model with no setup; a curated library covers popular open models.
  • Local API — a REST API plus OpenAI-compatible endpoints, so app code can target a local model by changing the base URL.
  • Customizable — a Modelfile lets you set system prompts, parameters, and templates, or import your own GGUF weights.
  • Cross-platform — native apps for macOS, Windows, and Linux; runs on CPU or GPU depending on your hardware.
  • Private and offline — models run entirely on your machine, so no data leaves it and it works without a connection.

In an AI-assisted workflow

Run a model and call its local OpenAI-compatible endpoint from your app:

ollama run llama3.1            # pull + chat in the terminal
# or serve and call it like OpenAI:
#   base_url="http://localhost:11434/v1"  (any OpenAI client)

TIP

Model size and quantization decide whether a model fits your RAM/VRAM and how fast it runs — start with a smaller or more-quantized variant and size up. For a GUI alternative to the CLI, see LM Studio.

Good to know

Ollama is free and open source under MIT for local use on your own machine, and runs on macOS, Windows, and Linux; an optional paid Ollama Cloud (Pro/Max) runs larger hosted models but isn't required. It's built for local, single-user use; when you need to serve a model to many concurrent users in production, move to a dedicated serving engine like vLLM and weigh the trade-offs in Self-Host vs API.

Related