LM Studio
A desktop app for discovering, downloading, and running open-weight LLMs locally with a GUI and a local OpenAI-compatible server.
LM Studio is a desktop application for running open-weight LLMs locally through a graphical interface. Where a CLI tool asks you to know the model name and flags, LM Studio lets you browse and download models, chat with them in a built-in UI, and tune parameters with sliders — then, when you're ready to build, flip on a local server that exposes an OpenAI-compatible API. It's the most approachable on-ramp to local models for people who'd rather not live in the terminal.
It is aimed at developers, researchers, and power users who want to experiment with local models, keep data on their own machine, and develop against a local endpoint — all without managing a Python environment. It runs GGUF (and on Apple Silicon, MLX) models on CPU or GPU.
Highlights
- Model discovery & download — browse and pull open models from within the app, with guidance on what fits your hardware.
- Built-in chat UI — converse with a local model and adjust parameters visually, no code required.
- Local OpenAI-compatible server — serve the loaded model on localhost so your app's OpenAI client works unchanged.
- GGUF & MLX — runs quantized models efficiently on CPU/GPU, with native Apple Silicon (MLX) support.
- Private by default — everything runs locally; no account needed and no data leaves your machine.
In an AI-assisted workflow
Download a model in the GUI, start the local server, and point your OpenAI client at it:
# in LM Studio: pick a model → "Local Server" → Start
# base_url="http://localhost:1234/v1" (any OpenAI client)TIP
LM Studio (GUI) and Ollama (CLI) solve the same problem — running models locally — from opposite ends. Choose by preference: a visual app for exploring and tuning, a command line for scripting and automation.
Good to know
LM Studio is free to download and use for both personal and commercial/work use, and runs on macOS, Windows, and Linux; organizations can buy an optional Enterprise tier (SSO, governance). Like other local runners it's built for single-machine development and privacy, not high-concurrency production serving — for that, see vLLM and the Self-Host vs API trade-offs.
Related
- OllamaAn open-source tool to run open-weight LLMs locally with a single command, including a local OpenAI-compatible API.
- vLLMA high-throughput, memory-efficient inference and serving engine for LLMs, with PagedAttention, continuous batching, and an OpenAI-compatible API server.
- Self-Host vs API: When Does Running Your Own LLM Actually Pay Off?The real economics of self-hosting an LLM vs. calling a hosted API — GPU utilization, privacy, latency, and the hidden ops costs that decide the crossover.
- LLM Inference EngineerUse this agent to serve and optimize self-hosted LLM inference — sizing GPUs, configuring a serving engine like vLLM (continuous batching, PagedAttention, tensor parallelism), applying quantization, and tuning throughput and tail latency against a cost and p95 budget. Examples — "serve Llama-3-70B at p95 under 2s on our GPUs", "our self-hosted model is slow and the GPUs sit half-idle — raise throughput", "quantize this model to fit one GPU without wrecking quality".