
Pipecat

An open-source Python framework for real-time voice and multimodal conversational AI — it orchestrates streaming STT, LLM, and TTS into composable pipelines.

open source, sdk
Updated Jun 4, 2026
voice, real-time, framework, open-source, python

Pipecat is an open-source Python framework for building real-time voice and multimodal conversational agents. It orchestrates the streaming STT → LLM → TTS loop, the audio transport (WebRTC/WebSocket), and turn-taking into composable pipelines, with integrations across dozens of speech and model providers — so you build the agent's behavior instead of the real-time plumbing.

It handles the hard, generic part of a voice agent: orchestrating the streaming STT → LLM → TTS loop, managing the audio transport, and handling turn-taking, all as composable pipelines. You assemble a pipeline from provider-backed components, and Pipecat runs the real-time hand-offs between them.

It's aimed at developers building custom voice agents who want best-of-breed providers per stage instead of a single bundled API — and the control over latency, cost, and model choice that brings.

Highlights

  • Composable real-time pipeline — wire streaming STT, LLM, and TTS into one low-latency loop.
  • Broad integrations — works with dozens of STT/LLM/TTS providers (Deepgram, ElevenLabs, OpenAI, Anthropic, and many more).
  • Transports built in — WebRTC and WebSocket for browser, phone, and app audio.
  • Turn-taking & interruptions — voice-activity detection, endpointing, and barge-in handled in the framework.
  • Single or multi-agent — compose one agent or coordinate several with handoff and parallel processing.
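
To make the turn-taking bullet concrete, here is a toy sketch of endpointing and barge-in in plain Python. It is not Pipecat's implementation or API: the function names (`find_end_of_turn`, `should_interrupt`), the per-frame speech flags, and the three-frame silence threshold are all illustrative assumptions.

```python
from typing import Optional, Sequence


def find_end_of_turn(
    speech_flags: Sequence[bool], min_silence_frames: int = 3
) -> Optional[int]:
    """Return the frame index at which the user's turn is considered over.

    speech_flags holds one voice-activity decision per audio frame
    (True = speech). The turn ends once speech has been heard and is
    followed by min_silence_frames consecutive non-speech frames.
    Returns None if no end of turn is found.
    """
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i
    return None


def should_interrupt(bot_is_speaking: bool, user_is_speaking: bool) -> bool:
    """Barge-in rule: cut the bot's audio when the user talks over it."""
    return bot_is_speaking and user_is_speaking


# Hypothetical VAD output: a short pause at frame 3 does not end the turn.
frames = [False, True, True, False, True, False, False, False, True]
end_index = find_end_of_turn(frames)
```

A framework that handles this for you typically exposes the silence threshold as a tunable setting: a shorter threshold lowers response latency but risks cutting the user off mid-sentence.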

In an AI-assisted workflow

# A Pipecat pipeline wires the streaming stages into one real-time loop.
# transport, stt, llm, and tts are assumed to be provider-backed objects created earlier.
from pipecat.pipeline.pipeline import Pipeline

pipeline = Pipeline([transport.input(), stt, llm, tts, transport.output()])

Swap any stage's provider without rewriting the loop — the pipeline structure stays the same.
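
The provider-swap idea can be illustrated without any real services. The sketch below is a plain-Python toy model of a staged pipeline, not Pipecat's API: the `Stage` type, `run_pipeline`, and the `fake_*` stages are hypothetical stand-ins, and a real pipeline streams audio frames continuously rather than passing a single string through.

```python
import asyncio
from typing import Awaitable, Callable, List

# A stage takes one item and returns the transformed item.
Stage = Callable[[str], Awaitable[str]]


async def fake_stt(audio: str) -> str:
    # Hypothetical stand-in for a speech-to-text provider.
    return audio.replace("audio:", "")


async def fake_llm(text: str) -> str:
    # Hypothetical stand-in for a language model.
    return f"You said: {text}"


async def fake_tts_a(text: str) -> str:
    # Hypothetical stand-in for one text-to-speech provider.
    return f"<voice-a>{text}</voice-a>"


async def fake_tts_b(text: str) -> str:
    # A second TTS stand-in with the same call shape as fake_tts_a.
    return f"<voice-b>{text}</voice-b>"


async def run_pipeline(stages: List[Stage], item: str) -> str:
    # Pass the item through each stage in order.
    for stage in stages:
        item = await stage(item)
    return item


# Only the last stage differs; the loop itself is untouched.
reply_a = asyncio.run(run_pipeline([fake_stt, fake_llm, fake_tts_a], "audio:hello"))
reply_b = asyncio.run(run_pipeline([fake_stt, fake_llm, fake_tts_b], "audio:hello"))
```

Because every stage shares one call shape, changing a provider means replacing a single list element, which mirrors the property the pipeline structure is meant to give you.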

TIP

Pipecat shines when you want to mix providers — e.g. Deepgram for STT, your own LLM via a gateway, and ElevenLabs for TTS — and still get tuned turn-taking and barge-in for free. For a single-vendor prototype, a bundled voice-agent API is faster to start.

Good to know

Pipecat is open source (BSD-2-Clause) and free; you pay the underlying STT/LLM/TTS providers for usage. It's a Python framework you run yourself (locally or in the cloud), with WebRTC/WebSocket transports for getting audio in and out. To pick the providers it orchestrates, compare Deepgram (STT) and ElevenLabs (TTS); the voice-agent-engineer builds and tunes the whole pipeline.
