Memory System
Long-term memory with hybrid vector and keyword search, auto-capture, auto-recall, and offline embeddings.
The desktop app includes a long-term memory system that lets agents remember important facts, preferences, and context across sessions. This means your agent learns from previous interactions and provides more relevant responses over time.
How It Works
The memory system uses a hybrid search architecture combining two techniques:
- Vector search (sqlite-vec) -- Finds semantically similar memories using cosine similarity. Good for paraphrase matching ("user prefers dark mode" matches "what theme does the user like?").
- Full-text search (FTS5 BM25) -- Finds exact keyword matches. Good for code symbols, IDs, and specific terms.
Results from both searches are combined using Reciprocal Rank Fusion (RRF), then boosted by:
- Recency -- More recent memories rank higher
- Importance -- Explicitly important memories rank higher
- Access frequency -- Frequently recalled memories rank higher
Memory Lifecycle
Auto-Recall (Before Each Task)
When you start a new task, the system automatically:
- Searches existing memories using your prompt as the query
- Ranks and returns the most relevant memories
- Formats them as context for the agent
The agent then sees these memories and uses them to inform its work.
This happens transparently -- you don't need to manually tell the agent what to remember.
Auto-Capture (After Messages)
After you send messages, the system:
- Extracts potential facts from your messages using rules
- Checks for duplicates (0.95 similarity threshold)
- Stores new facts as memories with appropriate categories
LLM-Based Capture (Optional)
At configurable intervals, a lightweight model (Haiku) can:
- Analyze the conversation so far
- Extract structured facts
- Store them with categories and importance scores
This produces higher-quality memories than rule-based extraction.
Session Indexing (Optional)
When enabled, the system:
- Chunks completed task conversations into ~400-token overlapping segments
- Generates vector embeddings for each chunk
- Makes entire past conversations searchable by semantic similarity
Memory Categories
Memories are categorized for better organization:
| Category | Examples |
|---|---|
| Preference | "User prefers TypeScript over JavaScript" |
| Fact | "The project uses PostgreSQL 16" |
| Decision | "Team decided to use Tailwind CSS 4" |
| Entity | "Alice is the team lead for the API project" |
| Other | Uncategorized facts |
Memory Tools
Agents have four memory tools available during execution:
| Tool | Description |
|---|---|
| recall | Search memories by query |
| store | Save a new memory |
| forget | Remove a specific memory |
| list | List all memories with filtering |
These tools are automatically registered when memory is enabled.
Embedding Providers
Choose how text is converted to vectors for similarity search:
| Provider | Model | Dimensions | Speed | Cost |
|---|---|---|---|---|
| Local (default) | gte-multilingual-base | 768 | ~40-60ms | Free |
| OpenAI | text-embedding-3-small | 1536 | Network-dependent | API-priced |
| Gemini | text-embedding-004 | 768 | Network-dependent | API-priced |
The local provider is recommended for most users:
- No API key required
- Works offline
- Supports 75 languages
- Uses ONNX Runtime for fast inference
- Model auto-downloads on first use (~340 MB)
Configuration
Configure memory in Settings > Memory:
| Setting | Description | Default |
|---|---|---|
| Enable Memory | Turn the memory system on/off | Off |
| Auto-Capture | Automatically extract facts from messages | On |
| Auto-Recall | Automatically search memories before tasks | On |
| Embedding Provider | Local, OpenAI, or Gemini | Local |
| LLM Capture Interval | How often LLM extraction runs | Configurable |
| Session Indexing | Index completed conversations | Off |
Memory Management
- Reindex -- Rebuild all vector embeddings (useful after changing providers)
- Export -- Download all memories as a file
- Import -- Load memories from a file
- Capacity limits -- Configurable maximum with LRU eviction
Safety Features
The memory system includes several safety measures:
- Prompt injection guard -- Memories are XML-escaped before injection into agent context
- Capacity limits -- Prevents unbounded memory growth
- Deduplication -- 0.95 similarity threshold prevents storing near-identical facts
- Source tracking -- Each memory records its origin (manual, auto_capture, mcp_tool, or api)
Learn More
- Agent System -- How agents use memory during execution
- Workspace Security -- Data protection and isolation