Pipecat
An open-source Python framework for real-time voice and multimodal conversational AI — it orchestrates streaming STT, LLM, and TTS into composable pipelines.
Pipecat is an open-source Python framework for real-time voice and multimodal conversational AI. It solves the hard, generic part of a voice agent: orchestrating the streaming STT → LLM → TTS loop, managing the audio transport, and handling turn-taking — all as composable pipelines. You assemble a pipeline from provider-backed components and Pipecat runs the real-time hand-offs, so you focus on the agent's behavior rather than the streaming infrastructure.
It's aimed at developers building custom voice agents who want best-of-breed providers per stage instead of a single bundled API — and the control over latency, cost, and model choice that brings.
Highlights
- Composable real-time pipeline — wire streaming STT, LLM, and TTS into one low-latency loop.
- Broad integrations — works with dozens of STT/LLM/TTS providers (Deepgram, ElevenLabs, OpenAI, Anthropic, and many more).
- Transports built in — WebRTC and WebSocket for browser, phone, and app audio.
- Turn-taking & interruptions — voice-activity detection, endpointing, and barge-in handled in the framework.
- Single or multi-agent — compose one agent or coordinate several with handoff and parallel processing.
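The turn-taking highlight can be illustrated with a toy detector. This is a minimal sketch, not Pipecat's implementation: the `TurnDetector` class, its threshold, its frame counts, and the event names are all hypothetical, and production voice-activity detection typically relies on a trained model rather than a raw energy cutoff.

```python
from dataclasses import dataclass, field


@dataclass
class TurnDetector:
    """Toy energy-threshold VAD with silence-based endpointing (illustrative only)."""

    speech_threshold: float = 0.5  # hypothetical energy cutoff for "speech"
    end_silence_frames: int = 3    # consecutive silent frames that close a user turn
    bot_speaking: bool = False     # True while TTS audio is being played
    _user_speaking: bool = False
    _silence_run: int = 0
    events: list = field(default_factory=list)

    def on_frame(self, energy: float) -> None:
        """Consume one audio frame's energy and record any turn-taking events."""
        if energy >= self.speech_threshold:
            self._silence_run = 0
            if not self._user_speaking:
                self._user_speaking = True
                self.events.append("user_started")
                if self.bot_speaking:
                    # Barge-in: the user talks over the bot, so stop playback.
                    self.bot_speaking = False
                    self.events.append("interrupt_bot")
        elif self._user_speaking:
            self._silence_run += 1
            if self._silence_run >= self.end_silence_frames:
                # Endpointing: enough silence has passed to end the user's turn.
                self._user_speaking = False
                self._silence_run = 0
                self.events.append("user_stopped")


# The bot is mid-sentence when the user starts talking, then falls silent.
detector = TurnDetector(bot_speaking=True)
for energy in [0.1, 0.8, 0.9, 0.2, 0.1, 0.0]:
    detector.on_frame(energy)
print(detector.events)  # ['user_started', 'interrupt_bot', 'user_stopped']
```

The point of the sketch is the ordering of events: the interruption fires as soon as user speech begins, while the end of the turn waits for a run of silence, which is the trade-off a framework tunes between responsiveness and cutting the user off.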
In an AI-assisted workflow
# A Pipecat pipeline wires the streaming stages into one real-time loop.
# transport, stt, llm, and tts are assumed to be configured elsewhere.
from pipecat.pipeline.pipeline import Pipeline

pipeline = Pipeline([transport.input(), stt, llm, tts, transport.output()])

Swap any stage's provider without rewriting the loop; the pipeline structure stays the same.
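To see why swapping a stage leaves the loop untouched, here is a conceptual sketch in plain `asyncio`. It deliberately does not use Pipecat's classes: `make_stage`, `source`, and `run_pipeline` are hypothetical stand-ins, and the stages only tag strings instead of processing audio. The idea it shows is that every stage shares one streaming interface, so the runner never needs to know which provider sits behind a stage.

```python
import asyncio
from typing import AsyncIterator, Callable, Iterable, List

# Every stage consumes one async stream and produces another.
Stage = Callable[[AsyncIterator[str]], AsyncIterator[str]]


def make_stage(label: str) -> Stage:
    """Build a stand-in stage that tags each item streaming through it."""

    async def stage(items: AsyncIterator[str]) -> AsyncIterator[str]:
        async for item in items:
            yield f"{label}({item})"

    return stage


async def source(chunks: Iterable[str]) -> AsyncIterator[str]:
    """Stand-in for transport input: emit chunks one at a time."""
    for chunk in chunks:
        yield chunk


async def run_pipeline(stages: List[Stage], chunks: Iterable[str]) -> List[str]:
    """Chain the stages in order and collect whatever the last one emits."""
    stream = source(chunks)
    for stage in stages:
        stream = stage(stream)
    return [item async for item in stream]


# Same loop, two different (hypothetical) STT providers.
out_a = asyncio.run(
    run_pipeline([make_stage("stt_a"), make_stage("llm"), make_stage("tts")], ["audio0"])
)
out_b = asyncio.run(
    run_pipeline([make_stage("stt_b"), make_stage("llm"), make_stage("tts")], ["audio0"])
)
print(out_a)  # ['tts(llm(stt_a(audio0)))']
print(out_b)  # ['tts(llm(stt_b(audio0)))']
```

Only the first element of the stage list changes between the two runs; `run_pipeline` itself is identical, which mirrors the claim that the pipeline structure stays the same when a provider is replaced.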
TIP
Pipecat shines when you want to mix providers — e.g. Deepgram for STT, your own LLM via a gateway, and ElevenLabs for TTS — and still get tuned turn-taking and barge-in for free. For a single-vendor prototype, a bundled voice-agent API is faster to start.
Good to know
Pipecat is open source (BSD-2-Clause) and free; you pay the underlying STT/LLM/TTS providers for usage. It's a Python framework you run yourself (locally or in the cloud), with WebRTC/WebSocket transports for getting audio in and out. To pick the providers it orchestrates, see Deepgram (STT) and ElevenLabs (TTS); the Voice Agent Engineer agent builds and tunes the whole pipeline.
Related
- How to Build a Voice Agent: The STT → LLM → TTS Pipeline
  How to build a real-time voice agent: the STT → LLM → TTS pipeline, the latency budget that makes or breaks it, and how to wire each stage.
- Voice Agent Engineer
  Use this agent to build or fix a real-time voice agent: the streaming STT → LLM → TTS pipeline, conversational (mouth-to-ear) latency, turn-taking, barge-in/interruptions, and per-stage provider selection. Examples: "our voice bot feels laggy and talks over people, fix the turn-taking and latency", "build a phone agent that transcribes, answers with our LLM, and speaks back", "get our voice agent's response time under a second".
- Deepgram
  A voice-AI platform with fast, accurate speech-to-text (Nova) and low-latency text-to-speech (Aura), plus a bundled Voice Agent API.
- ElevenLabs
  A voice-AI platform for high-quality text-to-speech, voice cloning, dubbing, and real-time conversational agents, via API.