Your n8n AI agent works brilliantly during a conversation. It remembers context, builds on previous answers, and handles follow-up questions like a pro. Then the workflow ends, and all that intelligence evaporates.
The next time a user returns? Your agent has no idea who they are, what they discussed, or what preferences they expressed. It’s like talking to someone with amnesia: every conversation starts from scratch.
This is the single biggest limitation holding back production AI agents in n8n. Without persistent memory, you can’t build:
- Customer support bots that remember past tickets and preferences
- Sales assistants that track conversation history across sessions
- Research agents that accumulate knowledge over time
- Personal assistants that learn user patterns and adapt
The good news? Solving n8n AI agent memory isn’t complicated once you understand the options. This guide walks through every approach, from n8n’s built-in memory nodes to custom vector database implementations, with working code you can deploy today.
Why Default AI Agents Forget Everything
Before diving into solutions, let’s understand the problem.
n8n’s AI Agent node is usually paired with Window Buffer Memory, the simplest of the memory sub-nodes. It keeps conversation history in an in-process JavaScript array rather than in durable storage, so nothing is written to disk or a database. Once that in-memory state is released, the history is gone.
// What happens internally (simplified; `agent` and `llm` are illustrative stand-ins)
const conversationHistory = []; // Lives only in process memory
agent.onMessage(async (msg) => { // async, because the handler awaits the LLM call
  conversationHistory.push({ role: 'user', content: msg });
  const response = await llm.chat(conversationHistory);
  conversationHistory.push({ role: 'assistant', content: response });
});
// Workflow ends → conversationHistory disappears
This design makes sense for simple, stateless workflows. But modern AI agent use cases demand more.
The n8n community has been vocal about this limitation. Threads like “How to persist agent memory between executions” and “Long-term memory for AI agents” consistently rank among the most active discussions in the community forums.
Let’s fix it.
Built-In Memory Options: What n8n Provides
n8n ships with several memory nodes designed to address persistence. Here’s what each offers, and where each falls short.
Window Buffer Memory
What it does: Keeps the last N messages in memory during execution.
Configuration: Set the window size (default: 5 messages) to control context length.
Best for: Single-session chatbots where users complete their task in one go.
Limitation: No durable storage. History is held in memory only, so it cannot be reloaded once that memory is released (for example, after a restart).
{
  "parameters": {
    "sessionIdType": "customKey",
    "sessionKey": "={{ $json.sessionId }}",
    "contextWindowLength": 10
  },
  "type": "@n8n/n8n-nodes-langchain.memoryBufferWindow"
}
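To make the windowing behaviour concrete, here is a minimal sketch of a sliding-window buffer in plain JavaScript. It is not n8n’s internal code: the `createWindowBuffer` name and the message shape are assumptions made for illustration, and the sketch counts individual messages as the description above does.

```javascript
// Illustrative sliding-window buffer (not n8n's actual implementation).
// Keeps only the most recent `windowSize` messages.
function createWindowBuffer(windowSize) {
  const messages = [];
  return {
    add(role, content) {
      messages.push({ role, content });
      // Drop the oldest entries once the window is exceeded.
      while (messages.length > windowSize) messages.shift();
    },
    // Return a copy so callers cannot mutate the buffer directly.
    getMessages() {
      return messages.slice();
    },
  };
}

const buffer = createWindowBuffer(3);
buffer.add('user', 'Hi, I am Dana.');
buffer.add('assistant', 'Hello Dana!');
buffer.add('user', 'I prefer email updates.');
buffer.add('assistant', 'Noted.');

// The first message ("Hi, I am Dana.") has already fallen out of the window.
console.log(buffer.getMessages().map((m) => m.content));
```

Notice that the user’s name is the first thing to be forgotten, which is exactly the kind of detail a returning user expects the agent to keep.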
Postgres Chat Memory
What it does: Stores conversation history in a PostgreSQL database table.
Configuration: Connect to your Postgres instance, specify a session key field to partition conversations.
Best for: Applications where you need conversation history persistence and already use Postgres.
Limitation: Stores raw chat messages only. No semantic search, no embedding-based retrieval. As conversations grow, you’ll hit token limits when injecting full history into the LLM context.
{
  "parameters": {
    "sessionIdType": "customKey",
    "sessionKey": "={{ $json.userId }}",
    "tableName": "chat_memory"
  },
  "type": "@n8n/n8n-nodes-langchain.memoryPostgresChat"
}
Redis Chat Memory
What it does: Stores conversation history in Redis with automatic TTL (time-to-live) expiration.
Configuration: Connect to Redis, set session key and expiration time.
Best for: High-throughput applications where you want fast access and automatic cleanup of old conversations.
Limitation: As with Postgres, it stores raw messages without semantic capabilities. Redis memory pressure can also become an issue at scale.
{
  "parameters": {
    "sessionIdType": "customKey",
    "sessionKey": "={{ $json.sessionId }}",
    "sessionTTL": 3600
  },
  "type": "@n8n/n8n-nodes-langchain.memoryRedisChat"
}
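If TTL semantics are new to you, the following sketch mimics them with an in-memory `Map`. Redis handles expiry on the server; this stand-in only illustrates the behaviour. The `createTtlStore` name is hypothetical, and refreshing the expiry on every write is an assumption about the behaviour you want, not a statement about how the n8n node works.

```javascript
// Illustrative in-memory stand-in for a TTL-based session store.
// `now` is injectable so expiry can be demonstrated without waiting.
function createTtlStore(ttlSeconds, now = () => Date.now()) {
  const sessions = new Map(); // sessionKey -> { messages, expiresAt }

  function liveEntry(sessionKey) {
    const entry = sessions.get(sessionKey);
    if (entry && entry.expiresAt <= now()) {
      sessions.delete(sessionKey); // expired: behave as if it never existed
      return undefined;
    }
    return entry;
  }

  return {
    append(sessionKey, message) {
      const entry = liveEntry(sessionKey) || { messages: [] };
      entry.messages.push(message);
      entry.expiresAt = now() + ttlSeconds * 1000; // each write refreshes the TTL
      sessions.set(sessionKey, entry);
    },
    read(sessionKey) {
      const entry = liveEntry(sessionKey);
      return entry ? entry.messages.slice() : [];
    },
  };
}

let clock = 0; // fake clock in milliseconds
const store = createTtlStore(3600, () => clock);
store.append('session-1', { role: 'user', content: 'Where is my order?' });

clock = 3599 * 1000;
console.log(store.read('session-1').length); // still inside the one-hour TTL

clock = 3600 * 1000;
console.log(store.read('session-1').length); // TTL elapsed, history is gone
```

The second read shows the trade-off: cleanup is automatic, but so is forgetting.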
The Fundamental Problem
All three built-in options share the same architectural flaw: they store literal conversation history, not semantic knowledge.
This means:
- Token costs explode as conversations grow
- Retrieval is chronological, not relevance-based
- Knowledge can’t be synthesized across sessions
- Unrelated details clog context windows
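The difference between chronological and relevance-based retrieval is easy to see with toy data. In this sketch the three-dimensional "embeddings" are hand-made for illustration (real embeddings have hundreds or thousands of dimensions), and the function names are hypothetical.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy memories with hand-made embeddings.
const memories = [
  { text: 'Prefers invoices as PDF', createdAt: 1, embedding: [0.9, 0.1, 0.0] },
  { text: 'Asked about the weather', createdAt: 2, embedding: [0.0, 0.2, 0.9] },
  { text: 'Said thanks and goodbye', createdAt: 3, embedding: [0.1, 0.9, 0.1] },
];

// Chronological retrieval: newest first, regardless of topic.
function mostRecent(items, n) {
  return [...items].sort((a, b) => b.createdAt - a.createdAt).slice(0, n);
}

// Relevance-based retrieval: closest in meaning to the query first.
function mostRelevant(items, queryEmbedding, k) {
  return [...items]
    .sort(
      (a, b) =>
        cosineSimilarity(b.embedding, queryEmbedding) -
        cosineSimilarity(a.embedding, queryEmbedding)
    )
    .slice(0, k);
}

const billingQuery = [1.0, 0.0, 0.1]; // pretend embedding of a billing question
console.log(mostRecent(memories, 1)[0].text);
console.log(mostRelevant(memories, billingQuery, 1)[0].text);
```

Chronological retrieval surfaces the goodbye message, while the relevance-based lookup surfaces the invoice preference that actually matters for a billing question.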
For production AI agents, you need something smarter.
Custom Memory with Supabase and pgvector
The first step up from built-in options is combining Postgres with vector embeddings. This gives you semantic memory: the ability to retrieve relevant past context based on meaning, not just recency.
Supabase makes this particularly easy because pgvector is available as an extension on its managed Postgres instances; you only need to enable it.
Architecture Overview
User Message → Generate Embedding → Query Similar Memories →
Inject Relevant Context → LLM Response → Store New Memory
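As a rough sketch, one turn of that loop could look like the function below. Every dependency (`embed`, `searchMemories`, `callLlm`, `storeMemory`) is a hypothetical placeholder injected by the caller, so the flow can be exercised with stubs before any real service is wired in.

```javascript
// One turn of the memory loop, with each stage injected as a function.
// All four dependencies are hypothetical placeholders, not n8n APIs.
async function handleMessage(userId, message, deps) {
  const { embed, searchMemories, callLlm, storeMemory } = deps;

  const embedding = await embed(message); // Generate Embedding
  const memories = await searchMemories(userId, embedding); // Query Similar Memories
  const context = memories.map((m) => `- ${m.content}`).join('\n'); // Inject Relevant Context
  const reply = await callLlm({ context, message }); // LLM Response
  await storeMemory({ userId, content: message, embedding }); // Store New Memory

  return { reply, memoriesUsed: memories.length };
}

// Stub dependencies for a dry run without any external services.
const stored = [];
const stubs = {
  embed: async (text) => [text.length, 0, 1],
  searchMemories: async () => [{ content: 'Prefers email over phone' }],
  callLlm: async ({ context, message }) => `Context:\n${context}\nAnswer to: ${message}`,
  storeMemory: async (row) => {
    stored.push(row);
  },
};

handleMessage('user-42', 'How will you contact me?', stubs).then((result) => {
  console.log(result.reply);
});
```

In n8n each stage would typically be its own node or sub-workflow; the function form is only meant to make the order of operations explicit.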
Step 1: Create the Memory Table
-- Enable pgvector extension (one-time setup)
CREATE EXTENSION IF NOT EXISTS vector;

-- Create memory table
CREATE TABLE agent_memory (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id TEXT NOT NULL,
  content TEXT NOT NULL,
  embedding VECTOR(1536), -- OpenAI text-embedding-3-small dimension
  memory_type TEXT DEFAULT 'conversation',
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create index for fast similarity search
CREATE INDEX ON agent_memory
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
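Writing to this table means turning a JavaScript array into pgvector’s text format, which looks like `[0.1,0.2,0.3]`. The helpers below are a sketch under that assumption; `toPgVector` and `buildInsertMemory` are hypothetical names, and the `$1`-style placeholders assume a client that supports numbered query parameters.

```javascript
// Format a JavaScript number array as a pgvector text literal, e.g. "[0.1,0.2,0.3]".
function toPgVector(embedding, expectedDimensions = 1536) {
  if (!Array.isArray(embedding) || embedding.length !== expectedDimensions) {
    throw new Error(`Expected an array of ${expectedDimensions} numbers`);
  }
  if (!embedding.every((value) => Number.isFinite(value))) {
    throw new Error('Embedding contains non-numeric values');
  }
  return `[${embedding.join(',')}]`;
}

// Build a parameterised INSERT for the agent_memory table.
function buildInsertMemory(memory, dimensions = 1536) {
  const { userId, content, embedding, memoryType = 'conversation', metadata = {} } = memory;
  return {
    text:
      'INSERT INTO agent_memory (user_id, content, embedding, memory_type, metadata) ' +
      'VALUES ($1, $2, $3::vector, $4, $5::jsonb)',
    values: [
      userId,
      content,
      toPgVector(embedding, dimensions),
      memoryType,
      JSON.stringify(metadata),
    ],
  };
}

const demo = buildInsertMemory(
  { userId: 'user-42', content: 'Prefers email updates', embedding: [0.1, 0.2, 0.3] },
  3 // toy dimension for the demo; use 1536 with text-embedding-3-small
);
console.log(demo.values[2]); // "[0.1,0.2,0.3]"
```

Checking the dimension before inserting gives a clearer error than letting the database reject a vector of the wrong length.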
Step 2: Build the Memory Retrieval Sub-Workflow
Create a sub-workflow that your AI agent can call as a tool:
Trigger: Execute Workflow Trigger (for calling from main workflow)
Node 1: Generate Embedding
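// Code node: Generate embedding for the query
// Note: this assumes `fetch` and `$env` are available in your Code node;
// if they are not, make the same call from an HTTP Request node instead.
const query = $input.first().json.query;
const response = await fetch('https://api.openai.com/v1/embeddings', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${$env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'text-embedding-3-small',
    input: query
  })
});
// Fail loudly instead of reading `data.data[0]` from an error payload
if (!response.ok) {
  throw new Error(`Embedding request failed: ${response.status}`);
}
const data = await response.json();
return { embedding: data.data[0].embedding };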
Node 2: Query Supabase
// HTTP Request node to Supabase
// POST to https://<your-project-ref>.supabase.co/rest/v1/rpc/search_memories
// with your API key in the `apikey` and `Authorization: Bearer` headers and a JSON body of
// { "query_embedding": [...], "match_user_id": "...", "match_count": 5 }
// SQL function to create (run once):
/*
CREATE OR REPLACE FUNCTION search_memories(
  query_embedding VECTOR(1536),
  match_user_id TEXT,
  match_count INT DEFAULT 5
)
RETURNS TABLE (content TEXT, similarity FLOAT)
LANGUAGE plpgsql AS $$
BEGIN
  RETURN QUERY
  SELECT
    agent_memory.content,
    1 - (agent_memory.embedding <=> query_embedding) AS similarity
  FROM agent_memory
  WHERE agent_memory.user_id = match_user_id
  ORDER BY agent_memory.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
*/
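For reference, here is a sketch of the request that calls `search_memories` through Supabase’s REST interface, which exposes database functions under `/rest/v1/rpc/<function name>`. The `buildSearchRequest` helper and the example project URL are hypothetical; the body keys must match the SQL function’s argument names.

```javascript
// Build the HTTP request for calling search_memories via Supabase's REST interface.
function buildSearchRequest({ supabaseUrl, apiKey, embedding, userId, matchCount = 5 }) {
  return {
    method: 'POST',
    // Strip any trailing slash so the path is not doubled up.
    url: `${supabaseUrl.replace(/\/+$/, '')}/rest/v1/rpc/search_memories`,
    headers: {
      apikey: apiKey,
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    // Keys must match the SQL function's argument names.
    body: JSON.stringify({
      query_embedding: embedding,
      match_user_id: userId,
      match_count: matchCount,
    }),
  };
}

const request = buildSearchRequest({
  supabaseUrl: 'https://your-project.supabase.co/', // placeholder project URL
  apiKey: 'YOUR_SUPABASE_KEY', // placeholder; load from credentials in practice
  embedding: [0.1, 0.2, 0.3], // toy vector; the function expects 1536 dimensions
  userId: 'user-42',
});
console.log(request.url);
```

The same values map directly onto the URL, header, and body fields of an n8n HTTP Request node.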
Node 3: Format Results
// Return formatted context for the agent
const memories = $input.all().map(item => item.json.content);
return {
  relevantContext: memories.join('\n---\n'),
  memoryCount: memories.length
};
Step 3: Wire Into Your AI Agent
In your main AI Agent workflow, add this sub-workflow as a tool:
{
  "name": "recall_memory",
  "description": "Search past conversations and stored knowledge for relevant context. Use this before answering questions that might benefit from historical information.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "What to search for in memory"
      }
    },
    "required": ["query"]
  }
}
The agent will now autonomously decide when to consult memory based on the user’s question.
Qdrant or Pinecone for Dedicated Vector Memory
When your memory requirements outgrow a Postgres side-table, dedicated vector databases offer better performance and more features.
When to Choose Dedicated Vector DBs
| Factor | Postgres + pgvector | Dedicated Vector DB |
|---|---|---|
| Memory size | < 100K vectors | > 100K vectors |
| Query latency | ~50-200ms | ~10-50ms |
| Filtering | Basic SQL WHERE | Advanced metadata filters |
| Scaling | Vertical only | Horizontal sharding |
| Cost | Included with DB | Separate service cost |
| Setup complexity | Low | Medium |
Qdrant Implementation (Self-Hosted)
Qdrant is particularly attractive for n8n users because it’s open source and runs well in Docker alongside your n8n instance.
Docker setup:
# docker-compose.yml addition
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_storage:/qdrant/storage

volumes:
  qdrant_storage:
Creating a collection:
curl -X PUT 'http://localhost:6333/collections/agent_memory' \
  -H 'Content-Type: application/json' \
  -d '{
    "vectors": {
      "size": 1536,
      "distance": "Cosine"
    }
  }'
n8n HTTP Request node for search:
{
  "method": "POST",
  "url": "http://qdrant:6333/collections/agent_memory/points/search",
  "body": {
    "vector": "={{ $json.embedding }}",
    "limit": 5,
    "filter": {
      "must": [
        { "key": "user_id", "match": { "value": "={{ $json.userId }}" } }
      ]
    },
    "with_payload": true
  }
}
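Search only helps once memories have been written. The sketch below builds the matching upsert request and unpacks a search response, which lists its hits under a `result` key. The helper names are hypothetical, and the caller supplies the point ID because Qdrant accepts only unsigned integers or UUIDs there.

```javascript
// Build the request for storing one memory as a Qdrant point.
function buildQdrantUpsert({ baseUrl, collection, id, embedding, userId, content }) {
  return {
    method: 'PUT',
    // wait=true asks Qdrant to respond only after the write has been applied.
    url: `${baseUrl}/collections/${collection}/points?wait=true`,
    headers: { 'Content-Type': 'application/json' },
    body: {
      points: [
        {
          id, // must be an unsigned integer or a UUID string
          vector: embedding,
          payload: { user_id: userId, content, created_at: new Date().toISOString() },
        },
      ],
    },
  };
}

// Pull the stored text and similarity score out of a search response.
function extractMemories(searchResponse) {
  return (searchResponse.result || []).map((hit) => ({
    content: hit.payload ? hit.payload.content : undefined,
    score: hit.score,
  }));
}

const upsert = buildQdrantUpsert({
  baseUrl: 'http://qdrant:6333',
  collection: 'agent_memory',
  id: '3f2b8c1e-5d4a-4b6f-9a7e-1c2d3e4f5a6b', // example UUID
  embedding: [0.1, 0.2, 0.3], // toy vector; the collection above expects 1536 dimensions
  userId: 'user-42',
  content: 'Prefers email updates',
});
console.log(upsert.url);

// Example response shape, trimmed to the fields used here.
const sampleResponse = {
  result: [{ id: 1, score: 0.91, payload: { user_id: 'user-42', content: 'Prefers email updates' } }],
  status: 'ok',
};
console.log(extractMemories(sampleResponse));
```

Storing `user_id` in the payload is what makes the `filter` in the search request work, so the key name has to match on both sides.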
Pinecone Implementation (Managed)
For teams that prefer managed infrastructure, Pinecone offers a serverless option that scales automatically.
// Code node: Pinecone upsert
// PINECONE_INDEX_HOST is an assumed env var holding the index's host name as
// shown in the Pinecone console (serverless indexes have no "environment").
const pinecone = {
  apiKey: $env.PINECONE_API_KEY,
  indexHost: $env.PINECONE_INDEX_HOST
};
const vectors = [{
  id: $json.memoryId,
  values: $json.embedding,
  metadata: {
    userId: $json.userId,
    content: $json.content,
    timestamp: Date.now()
  }
}];
const response = await fetch(
  `https://${pinecone.indexHost}/vectors/upsert`,
  {
    method: 'POST',
    headers: {
      'Api-Key': pinecone.apiKey,
      'Content-Type': 'application/json'
    },
    // One namespace per user keeps each user's memories separate
    body: JSON.stringify({ vectors, namespace: $json.userId })
  }
);
return { success: response.ok };
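The read side is a query against the same index. This sketch assumes you address the index by its host name (copied from the Pinecone console) and reuse the per-user namespace from the upsert; `buildPineconeQuery` and `extractMatches` are hypothetical helper names.

```javascript
// Build a Pinecone query request for one user's namespace.
function buildPineconeQuery({ indexHost, apiKey, embedding, userId, topK = 5 }) {
  return {
    method: 'POST',
    url: `https://${indexHost}/query`,
    headers: {
      'Api-Key': apiKey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      vector: embedding,
      topK,
      namespace: userId, // same namespace the upsert wrote to
      includeMetadata: true, // needed to get the stored content back
    }),
  };
}

// Keep only matches at or above a minimum score and return their stored text.
function extractMatches(queryResponse, minScore = 0) {
  return (queryResponse.matches || [])
    .filter((match) => match.score >= minScore)
    .map((match) => ({
      content: match.metadata ? match.metadata.content : undefined,
      score: match.score,
    }));
}

const queryRequest = buildPineconeQuery({
  indexHost: 'agent-memory-example.svc.pinecone.io', // placeholder host
  apiKey: 'YOUR_PINECONE_KEY', // placeholder; load from credentials in practice
  embedding: [0.1, 0.2, 0.3], // toy vector
  userId: 'user-42',
});
console.log(queryRequest.url);

// Example response shape, trimmed to the fields used here.
const sampleMatches = {
  matches: [
    { id: 'm1', score: 0.88, metadata: { content: 'Prefers email updates' } },
    { id: 'm2', score: 0.41, metadata: { content: 'Asked about the weather' } },
  ],
};
console.log(extractMatches(sampleMatches, 0.5));
```

Without `includeMetadata: true` the response carries only IDs and scores, so the agent would have nothing to inject into its context.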
The Hybrid Approach: Production-Ready Architecture
For serious production deployments, the answer isn’t choosing one memory type; it’s combining them strategically.
Three-Tier Memory Architecture
Tier 1: Session Context (Redis)
- Current conversation buffer
- Fast read/write
- Auto-expires after session timeout
- 5-10 most recent messages
Tier 2: Semantic Memory (Vector DB)
- Long-term knowledge and context
- Similarity-based retrieval
- Summarized insights, not raw transcripts
- Pruned and consolidated regularly
Tier 3: Structured Facts (Postgres)
- User preferences and settings
- Explicit facts (name, company, timezone)
- Relationship data
- Queryable with standard SQL
Implementation Pattern
Incoming Message
↓
┌──────────────────┐
│ Load from Redis │ → Recent conversation context
└──────────────────┘
↓
┌──────────────────┐
│ Query Vector DB │ → Relevant past knowledge
└──────────────────┘
↓
┌──────────────────┐
│ Fetch from PG │ → User facts and preferences
└──────────────────┘
↓
┌──────────────────┐
│ Compose Context │ → Merge all memory layers
└──────────────────┘
↓
┌──────────────────┐
│ AI Agent │ → Generate response
└──────────────────┘
↓
┌──────────────────┐
│ Store Memories │ → Update all three tiers
└──────────────────┘
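The three reads in this pattern do not depend on each other, so one option is to run them in parallel and fall back to empty values when a tier is unavailable. This is a design choice of the sketch, not something the diagram requires, and the three loader functions are hypothetical placeholders for your Redis, vector DB, and Postgres lookups.

```javascript
// Load all three memory tiers in parallel, degrading gracefully if one fails.
async function loadMemoryLayers({ userId, sessionId, queryEmbedding }, sources) {
  const [session, semantic, facts] = await Promise.allSettled([
    sources.loadSession(sessionId), // Tier 1: recent conversation
    sources.searchSemantic(userId, queryEmbedding), // Tier 2: relevant past knowledge
    sources.loadFacts(userId), // Tier 3: structured user facts
  ]);

  // Use the fallback when a tier rejected, so one outage does not block the reply.
  const valueOr = (outcome, fallback) =>
    outcome.status === 'fulfilled' ? outcome.value : fallback;

  return {
    sessionContext: valueOr(session, []),
    semanticMemories: valueOr(semantic, []),
    userFacts: valueOr(facts, {}),
  };
}

// Dry run with stubbed tiers; the vector DB is "down" in this example.
const stubSources = {
  loadSession: async () => [{ role: 'user', content: 'Hi again' }],
  searchSemantic: async () => {
    throw new Error('vector DB unavailable');
  },
  loadFacts: async () => ({ name: 'Dana', company: 'Acme' }),
};

loadMemoryLayers(
  { userId: 'user-42', sessionId: 'session-1', queryEmbedding: [0.1, 0.2, 0.3] },
  stubSources
).then((layers) => console.log(layers));
```

The agent still answers with session context and user facts even though semantic memory failed, which is usually preferable to failing the whole request.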
Context Composition Code
// Code node: Merge memory layers
const sessionContext = $('Redis').item.json.messages || [];
// Qdrant's search response lists its hits under the `result` key
const semanticMemories = $('Qdrant').item.json.result || [];
const userFacts = $('Postgres').item.json || {};

// Build context string for injection
let context = '';
if (userFacts.name) {
  context += `User: ${userFacts.name}`;
  if (userFacts.company) context += ` (${userFacts.company})`;
  if (userFacts.preferences) context += `\nPreferences: ${userFacts.preferences}`;
  context += '\n\n';
}
if (semanticMemories.length > 0) {
  context += 'Relevant past context:\n';
  context += semanticMemories
    .map(m => `- ${m.payload.content}`)
    .join('\n');
  context += '\n\n';
}
if (sessionContext.length > 0) {
  context += 'Recent conversation:\n';
  context += sessionContext
    .map(m => `${m.role}: ${m.content}`)
    .join('\n');
}
return { composedContext: context };
Memory Options Comparison Table
| Approach | Persistence | Semantic Search | Complexity | Cost | Best For |
|---|---|---|---|---|---|
| Window Buffer | ❌ None | ❌ No | ⭐ Low | Free | Simple chatbots |
| Redis Chat | ✅ TTL-based | ❌ No | ⭐⭐ Low | Low | High-throughput sessions |
| Postgres Chat | ✅ Permanent | ❌ No | ⭐⭐ Low | Low | Basic persistence |
| Supabase + pgvector | ✅ Permanent | ✅ Yes | ⭐⭐⭐ Medium | Low | Growing projects |
| Qdrant (self-hosted) | ✅ Permanent | ✅ Yes | ⭐⭐⭐ Medium | Low | Full control |
| Pinecone (managed) | ✅ Permanent | ✅ Yes | ⭐⭐ Low | Medium | Hands-off scaling |
| Hybrid (all three) | ✅ Permanent | ✅ Yes | ⭐⭐⭐⭐ High | Medium | Production systems |
Performance Considerations
Token Costs Add Up Fast
Every memory you inject consumes tokens. At GPT-4 pricing (~$30/1M input tokens), retrieving 2,000 tokens of context per request adds ~$0.06 per request, or ~$60 per 1,000 requests. At even 1,000 requests a month, that’s $60/month for memory context alone.
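The arithmetic is worth checking against your own numbers. Here is a small calculator, assuming a flat per-token input price; the function name and the 1,000 requests/month figure are illustrative.

```javascript
// Estimate what injected memory context costs in input tokens.
function contextCost({ tokensPerRequest, requestsPerMonth, pricePerMillionTokens }) {
  const costPerRequest = (tokensPerRequest / 1_000_000) * pricePerMillionTokens;
  return {
    costPerRequest,
    monthlyCost: costPerRequest * requestsPerMonth,
  };
}

const estimate = contextCost({
  tokensPerRequest: 2000, // memory context injected per request
  requestsPerMonth: 1000, // illustrative volume
  pricePerMillionTokens: 30, // USD per 1M input tokens
});

console.log(estimate.costPerRequest.toFixed(2)); // "0.06"
console.log(estimate.monthlyCost.toFixed(2)); // "60.00"
```

Halving the injected context halves this line item, which is why the mitigation strategies focus on retrieving less, not on retrieving more cheaply.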
Mitigation strategies:
- Summarize long conversations before storing
- Use embedding similarity thresholds to filter weak matches
- Implement tiered retrieval (facts first, then semantic search only if needed)
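The last two strategies can be sketched together: check structured facts first, and only fall back to semantic search (with a similarity cutoff) when the facts come up empty. The `lookupFacts` and `searchSemantic` functions are hypothetical placeholders, and the 0.75 threshold is an arbitrary starting point to tune against your own data.

```javascript
// Drop weak matches so they never reach the prompt.
function filterByScore(matches, minScore) {
  return matches.filter((match) => match.score >= minScore);
}

// Tiered retrieval: cheap structured lookup first, semantic search only if needed.
async function tieredRetrieve(question, { lookupFacts, searchSemantic, minScore = 0.75 }) {
  const facts = await lookupFacts(question);
  if (facts.length > 0) {
    return { tier: 'facts', items: facts };
  }
  const matches = await searchSemantic(question);
  return { tier: 'semantic', items: filterByScore(matches, minScore) };
}

// Dry run: no stored fact answers this question, so semantic search runs.
tieredRetrieve('What did we decide about the launch date?', {
  lookupFacts: async () => [],
  searchSemantic: async () => [
    { content: 'Launch moved to March', score: 0.86 },
    { content: 'Likes dark mode', score: 0.32 },
  ],
}).then((result) => console.log(result));
```

Only the launch-date memory survives the cutoff, so the unrelated preference never consumes tokens.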
When to Summarize vs Retrieve
| Scenario | Strategy |
|---|---|
| Conversation > 20 messages | Summarize older messages |
| Memory > 50 items for user | Prune or consolidate |
| Similar memories clustering | Merge into single insight |
| Time-sensitive info | Add decay factor to relevance |
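One simple way to add a decay factor is to multiply the similarity score by an exponential half-life term, so a memory’s weight halves every fixed number of days. This is one possible formula, not the only one, and the 30-day half-life is an assumption to tune.

```javascript
// Relevance score that halves every `halfLifeDays` days.
function decayedScore(similarity, ageDays, halfLifeDays = 30) {
  return similarity * Math.pow(0.5, ageDays / halfLifeDays);
}

// Re-rank memories by decayed score, highest first.
function rankWithDecay(memories, halfLifeDays = 30) {
  return memories
    .map((memory) => ({
      ...memory,
      finalScore: decayedScore(memory.similarity, memory.ageDays, halfLifeDays),
    }))
    .sort((a, b) => b.finalScore - a.finalScore);
}

const ranked = rankWithDecay([
  { content: 'Office is in Berlin', similarity: 0.9, ageDays: 90 },
  { content: 'Office moved to Lisbon', similarity: 0.7, ageDays: 3 },
]);
console.log(ranked.map((m) => `${m.content}: ${m.finalScore.toFixed(3)}`));
```

The newer, slightly less similar memory now outranks the stale one, which is the behaviour you want for facts that change over time.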
Pruning Implementation
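// Scheduled workflow: Memory maintenance
// Run nightly to consolidate and prune
// 1. Find memories older than 30 days with low access count
// 2. Cluster similar memories using embedding distance
// 3. Summarize clusters into single memories
// 4. Delete originals, keep summaries
// Assumptions: `supabase` is an initialized supabase-js client, and agent_memory
// has an extra `access_count` column (not part of the table created earlier).
const thirtyDaysAgo = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000).toISOString();
const { data: oldMemories, error } = await supabase
  .from('agent_memory')
  .select('*')
  .lt('created_at', thirtyDaysAgo)
  .eq('access_count', 0);
if (error) throw error;
// Cluster and summarize logic here...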
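For the clustering step, a greedy single-pass grouping is often enough to start with: each memory joins the first cluster whose first member is similar enough, otherwise it starts a new cluster. This is a deliberately simple sketch rather than a full clustering algorithm, and the 0.9 threshold is an assumption to tune.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy clustering: compare each memory with the first member of each cluster.
function clusterMemories(memories, threshold = 0.9) {
  const clusters = [];
  for (const memory of memories) {
    const home = clusters.find(
      (cluster) => cosineSimilarity(cluster[0].embedding, memory.embedding) >= threshold
    );
    if (home) {
      home.push(memory);
    } else {
      clusters.push([memory]);
    }
  }
  return clusters;
}

// Toy 2-dimensional embeddings for illustration.
const candidates = [
  { content: 'Prefers email', embedding: [1, 0] },
  { content: 'Wants updates by email', embedding: [0.99, 0.1] },
  { content: 'Based in Lisbon', embedding: [0, 1] },
];

const clusters = clusterMemories(candidates);
console.log(clusters.map((cluster) => cluster.map((m) => m.content)));
```

Each multi-member cluster is a candidate for summarizing into a single memory, while singletons can be kept as they are or pruned by age.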
Recommendation: Start Simple, Scale Smart
If you’re just getting started with n8n AI agent memory, here’s the path of least resistance:
- Week 1: Implement Postgres Chat Memory for basic persistence
- Week 2-4: Add Supabase/pgvector for semantic retrieval
- Month 2+: Evaluate whether you need Redis for session speed or a dedicated vector DB for scale
Don’t over-engineer on day one. Most AI agent projects don’t need Pinecone-level infrastructure until they’re handling thousands of daily users.
The hybrid architecture is the goal, but you can build toward it incrementally.
Build Production AI Agents Faster
Implementing persistent memory is just one piece of production-ready AI agents. You also need error handling, rate limiting, monitoring, and graceful degradation. (For more on building self-maintaining AI agent systems, check out our guide on Claude Code building self-healing n8n workflows.)
At Marden SEO, we help businesses deploy AI automation systems that actually work in production, not just demos that look impressive but break under real load. Our n8n expertise spans custom memory implementations, multi-agent orchestration, and the operational infrastructure that keeps agents running reliably.
If you’re building AI agents and want to skip the trial-and-error phase, reach out to discuss your project. We’ve solved these problems before.
Have questions about implementing AI agent memory in n8n? Drop them in the comments below or reach out directly. This is an evolving space, and the best practices are being written in real-time by practitioners like you.