Agentic Complexity: Medium

Memory in Go

Give an agent a pluggable memory store — short-term context buffer, episodic session history, and optional long-term semantic store — through one interface so the storage backend can be swapped.

The Problem

LLMs are stateless — every call starts with a blank slate. Without explicit memory management, an agent forgets what the user said three turns ago, can’t reference previous conclusions, and gives inconsistent answers across a session. Naively dumping the entire conversation into every prompt hits the context window limit quickly.

The Solution

Define a Memory interface with Add(), Recent(), and Search(). Concrete implementations swap the storage strategy: InMemoryStore keeps a fixed-size circular buffer of recent entries, EpisodicStore maintains a full append-only log grouped by session, and a VectorStore stub shows how semantic search fits the same interface. The agent calls memory.Recent(n) before each LLM call to hydrate the prompt, and memory.Add() after each turn to record the exchange.
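The hydrate-call-record loop can be sketched as follows. This is a minimal illustration, not the full implementation: `store` is a stripped-down stand-in for the `Memory` interface defined later, and `callLLM` is a hypothetical placeholder for a real model call.

```go
package main

import (
	"fmt"
	"time"
)

// Entry is a minimal stand-in for MemoryEntry.
type Entry struct {
	Role, Content string
	Timestamp     time.Time
}

// store is a minimal stand-in for the Memory interface.
type store struct{ entries []Entry }

func (s *store) Add(e Entry) { s.entries = append(s.entries, e) }

func (s *store) Recent(n int) []Entry {
	if n > len(s.entries) {
		n = len(s.entries)
	}
	return s.entries[len(s.entries)-n:]
}

// callLLM is a hypothetical placeholder for a real model call.
func callLLM(prompt string) string { return "echo: " + prompt }

// turn runs one hydrate → call → record cycle.
func turn(mem *store, userMsg string) string {
	// Hydrate: pull a bounded window of history into the prompt.
	prompt := ""
	for _, e := range mem.Recent(4) {
		prompt += e.Role + ": " + e.Content + "\n"
	}
	prompt += "user: " + userMsg

	reply := callLLM(prompt)

	// Record: persist both sides of the exchange.
	mem.Add(Entry{Role: "user", Content: userMsg, Timestamp: time.Now()})
	mem.Add(Entry{Role: "assistant", Content: reply, Timestamp: time.Now()})
	return reply
}

func main() {
	mem := &store{}
	fmt.Println(turn(mem, "hello"))
	fmt.Println(len(mem.entries)) // 2 after one turn
}
```

Because `turn` only ever touches `Recent` and `Add`, swapping `store` for any other `Memory` implementation leaves the agent loop unchanged.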

Structure

Memory Pattern (Step 1 of 4): Hydrating the Prompt

Before each LLM call the agent calls memory.Recent(n) to retrieve the last n turns. The returned slice is formatted into the prompt as conversation history, giving the model context without overwhelming the token budget.

Implementation

package main

import "time"

// Role identifies who produced a memory entry.
type Role string

const (
	RoleUser      Role = "user"
	RoleAssistant Role = "assistant"
	RoleSystem    Role = "system"
)

// MemoryEntry is one recorded turn in the agent's history.
type MemoryEntry struct {
	Role      Role
	Content   string
	Timestamp time.Time
	SessionID string
}

// Memory is a pluggable store for agent conversation history.
type Memory interface {
	Add(entry MemoryEntry)
	Recent(n int) []MemoryEntry
	Search(query string) []MemoryEntry
	Clear()
}

Real-World Analogy

A doctor’s consultation notes: during the appointment (InMemoryStore), the doctor remembers the last few things discussed. After the visit, notes go into the patient’s full record (EpisodicStore). When a specialist is needed, they search the record by symptom keyword (VectorStore). Each retrieval mode serves a different clinical need from the same underlying record.

Pros and Cons

Pros

  • Single Memory interface lets you swap backends without changing agent code
  • InMemoryStore has zero dependencies for simple use cases
  • EpisodicStore enables full session replay for debugging
  • VectorStore enables semantic search across thousands of entries

Cons

  • Managing token budget across memory and response requires care
  • Circular buffer discards old entries — fine for recency, bad for recall
  • Growing history increases prompt cost on every turn
  • Embedding and vector search add infrastructure complexity

Best Practices

  • Always bound the context you inject — call Recent(n) with a fixed n and measure the token count before sending to the LLM.
  • Use RWMutex in all memory implementations — agents that run tools concurrently will read and write simultaneously.
  • Tag each MemoryEntry with SessionID from the start; retrofitting session isolation into a flat store is painful.
  • Summarize old episodes periodically rather than discarding them — a summary entry preserves semantics without burning tokens.
  • Write a Search() implementation even if it’s just keyword matching initially; the interface makes upgrading to semantic search a drop-in replacement.
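The summarization advice can be sketched as a compaction pass. Everything here is illustrative: `compact` and `summarize` are hypothetical names, and in practice `summarize` would be an LLM call rather than the string join shown.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a minimal stand-in for MemoryEntry.
type Entry struct {
	Role, Content string
}

// summarize is a hypothetical placeholder; a real agent would
// ask the LLM to condense the old entries.
func summarize(old []Entry) string {
	var topics []string
	for _, e := range old {
		topics = append(topics, e.Content)
	}
	return "Earlier discussion covered: " + strings.Join(topics, "; ")
}

// compact replaces all but the newest `keep` entries with a single
// summary entry, preserving semantics while shrinking the token footprint.
func compact(entries []Entry, keep int) []Entry {
	if len(entries) <= keep {
		return entries
	}
	cut := len(entries) - keep
	summary := Entry{Role: "system", Content: summarize(entries[:cut])}
	return append([]Entry{summary}, entries[cut:]...)
}

func main() {
	history := []Entry{
		{"user", "deployment options"},
		{"assistant", "use containers"},
		{"user", "what about rollbacks?"},
	}
	compacted := compact(history, 1)
	fmt.Println(len(compacted)) // 2: one summary plus one recent entry
}
```

Running a pass like this whenever the history crosses a size threshold keeps `Recent(n)` cheap while the summary entry still surfaces old topics.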

When to Use

  • Any multi-turn conversational agent that needs to reference earlier messages.
  • Long-running assistants that persist across sessions.
  • Agents that must recall specific facts from a large history without re-sending everything.

When NOT to Use

  • Single-turn request/response agents with no conversational state.
  • Workflows where the full context always fits in one LLM call and memory management adds unnecessary overhead.