AI Glossary

80 AI and LLM-engineering terms, defined precisely — answer first, with the deeper guide linked.

A

A2A (Agent2Agent Protocol): A2A is an open protocol that lets AI agents discover each other's capabilities and delegate tasks across vendors, complementing MCP's tool connections.
Agent Engineering: Agent engineering is the discipline of building reliable AI agents by designing the tools, context, guardrails, evals, and recovery paths around the model.
Agent Harness: An agent harness is the system around the model that makes it an agent, comprising the loop, tools, context management, permissions, and recovery machinery.
Agent Memory: Agent memory is how an AI agent retains information beyond its context window, covering both working state during a task and persistent knowledge across sessions.
Agent Skills: Agent Skills are reusable procedures packaged as folders with a SKILL.md file, loaded by an AI agent on demand when a task matches. The format is now an open standard.
Agentic AI: Agentic AI is the class of AI systems that act toward goals (planning, calling tools, and iterating on results) rather than only generating content.
AI Agent: An AI agent is an LLM-driven system that pursues a goal in a loop (calling tools, observing results, iterating) instead of returning one answer.
AI Slop: AI slop is low-effort, mass-produced AI-generated content (fluent, generic, and unchecked) that floods feeds, search results, and codebases.
Attention Mechanism: Attention lets a model weigh how relevant every other token is to each token, building a context-aware representation as a weighted blend of their values.
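
To make the "weighted blend of their values" in the attention entry concrete, here is a minimal sketch in plain Python. It assumes the common scaled dot-product form for a single query; the names `attend`, `keys`, and `values` and the toy vectors are illustrative, not taken from any particular library.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return (weights, blended) for one query token.

    Assumption: scaled dot-product scoring, one head, no masking.
    """
    d = len(query)
    # Relevance of every token to the query token.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Context-aware representation: a weighted blend of the value vectors.
    blended = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, blended

# Toy data (hypothetical): three tokens with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, blended = attend([1.0, 0.0], keys, values)
print(weights, blended)
```

The weights sum to 1, and tokens whose keys align with the query contribute more to the blended result.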

B

Batch Inference: Batch inference processes many LLM requests asynchronously instead of one at a time interactively, typically at a ~50% discount via provider batch APIs.

C

D

E

F

G

H

I

Inference: Inference is running a trained model to produce output. For LLMs, that means generating tokens one at a time. Its cost and latency define the economics of AI products.
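
As a rough illustration of "generating tokens one at a time", the sketch below runs a greedy decoding loop over a toy lookup table. `TOY_MODEL` is a hypothetical stand-in for a real model, and greedy selection is an assumption; real systems usually sample from the scores.

```python
# Hypothetical toy "model": maps the last token to next-token scores.
TOY_MODEL = {
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "sat": {"</s>": 1.0},
}

def generate(max_tokens=10):
    """Greedy decoding: one model call per generated token."""
    tokens = ["<s>"]
    steps = 0
    while steps < max_tokens:
        # Unknown tokens fall back to ending the sequence.
        scores = TOY_MODEL.get(tokens[-1], {"</s>": 1.0})
        next_token = max(scores, key=scores.get)
        steps += 1  # each step stands in for one forward pass
        if next_token == "</s>":
            break
        tokens.append(next_token)
    return tokens[1:], steps

output, steps = generate()
print(output, steps)
```

Because every output token costs one pass through the model, both latency and cost grow with the length of the response.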

J

Jailbreak: A jailbreak is a prompt crafted to bypass a model's safety training and policies, making it produce output it was trained to refuse.

K

L

M

N

Needle in a Haystack: Needle in a haystack is a long-context eval that hides a fact in filler text and tests whether the model can retrieve it at varying depths and lengths.
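
A minimal sketch of how such test prompts might be assembled. It builds the haystacks and a simple substring scorer only; calling a model is left out. The helper names, the needle, and the filler sentence are all hypothetical.

```python
def build_haystack(needle, filler_sentence, total_sentences, depth):
    """Insert the needle into repeated filler text.

    depth 0.0 places the needle at the start, 1.0 at the end.
    """
    position = round(depth * total_sentences)
    sentences = [filler_sentence] * total_sentences
    sentences.insert(position, needle)
    return " ".join(sentences)

def score(answer, expected):
    # Simplistic check (an assumption): the expected fact appears in the answer.
    return expected.lower() in answer.lower()

NEEDLE = "The secret code is 7421."
FILLER = "The sky was grey that day."

# One prompt per (length, depth) cell of the evaluation grid.
prompts = {
    (n, d): build_haystack(NEEDLE, FILLER, n, d)
    for n in (10, 100)
    for d in (0.0, 0.5, 1.0)
}
print(len(prompts))
```

Each prompt would then be sent to the model with a question such as "What is the secret code?", and `score` applied to the reply for every cell of the grid.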

O

Open Weights: An open-weights model publishes its parameters for anyone to download and run, unlike API-only models, under licenses that range from permissive to restricted.

P

Q

Quantization: Quantization shrinks a model by storing weights in lower precision (8-, 4-, even 2-bit), cutting memory and speeding inference at a small accuracy cost.
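
The trade-off can be seen in a small sketch. It assumes symmetric, per-tensor, round-to-nearest quantization, which is only one of several schemes in use; the weights are made-up values.

```python
def quantize(weights, bits=8):
    """Map floats to signed integers in [-qmax, qmax] with a shared scale."""
    qmax = 2 ** (bits - 1) - 1
    # Fall back to 1.0 so an all-zero tensor does not divide by zero.
    scale = (max(abs(w) for w in weights) / qmax) or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats from the stored integers.
    return [v * scale for v in q]

def max_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))

weights = [0.42, -1.27, 0.03, 0.88]  # hypothetical weights
q8, s8 = quantize(weights, bits=8)
q4, s4 = quantize(weights, bits=4)
err8 = max_error(weights, dequantize(q8, s8))
err4 = max_error(weights, dequantize(q4, s4))
print(q8, err8)
print(q4, err4)
```

Fewer bits means a coarser grid of representable values, so the 4-bit round trip loses more precision than the 8-bit one on these weights.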

R

S

T

V

Z

Zero-Shot Prompting: Zero-shot prompting asks a model to perform a task from instructions alone, with no examples. It is the default mode for capable modern LLMs.