Your n8n AI agent works brilliantly during a conversation. It remembers context, builds on previous answers, and handles follow-up questions like a pro. Then the workflow ends, and all that intelligence evaporates.

The next time a user returns? Your agent has no idea who they are, what they discussed, or what preferences they expressed. It’s like talking to someone with amnesia: every conversation starts from scratch.

This is the single biggest limitation holding back production AI agents in n8n. Without persistent memory, you can’t build:

  • Customer support bots that remember past tickets and preferences
  • Sales assistants that track conversation history across sessions
  • Research agents that accumulate knowledge over time
  • Personal assistants that learn user patterns and adapt

The good news? Solving n8n AI agent memory isn’t complicated once you understand the options. This guide walks through every approach, from n8n’s built-in memory nodes to custom vector database implementations, with working code you can deploy today.

Why Default AI Agents Forget Everything

Before diving into solutions, let’s understand the problem.

n8n’s AI Agent node uses window buffer memory by default. This stores conversation history in a JavaScript array that lives only during workflow execution. When the workflow completes, that array is garbage collected. Gone.

// What happens internally (simplified)
const conversationHistory = []; // Lives only during execution

agent.onMessage(async (msg) => { // async is required because the handler awaits the LLM call
  conversationHistory.push({ role: 'user', content: msg });
  const response = await llm.chat(conversationHistory);
  conversationHistory.push({ role: 'assistant', content: response });
});

// Workflow ends → conversationHistory disappears

This design makes sense for simple, stateless workflows. But modern AI agent use cases demand more.

The n8n community has been vocal about this limitation. Threads like “How to persist agent memory between executions” and “Long-term memory for AI agents” consistently rank among the most active discussions in the community forums.

Let’s fix it.

Built-In Memory Options: What n8n Provides

n8n ships with several memory nodes designed to address persistence. Here’s what each offers, and where each falls short.

Window Buffer Memory

What it does: Keeps the last N messages in memory during execution.

Configuration: Set the window size (default: 5 messages) to control context length.

Best for: Single-session chatbots where users complete their task in one go.

Limitation: No persistence whatsoever. When the workflow ends, memory is lost.

{
  "parameters": {
    "sessionKey": "={{ $json.sessionId }}",
    "contextWindowLength": 10
  },
  "type": "@n8n/n8n-nodes-langchain.memoryBufferWindow"
}
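To make the window behavior concrete, here is a minimal sketch of a sliding window over a message list. It assumes the window counts individual messages, as described above; `applyWindow` is a hypothetical helper for illustration, not n8n's internal implementation.

```javascript
// Keep only the most recent `windowSize` messages.
// Hypothetical helper, not n8n's actual code.
function applyWindow(messages, windowSize) {
  if (windowSize <= 0) return []; // slice(-0) would return everything
  return messages.slice(-windowSize);
}

const history = [
  { role: 'user', content: 'Hi, I am Dana.' },
  { role: 'assistant', content: 'Hello Dana!' },
  { role: 'user', content: 'I need help with invoices.' },
  { role: 'assistant', content: 'Sure, which invoice?' },
  { role: 'user', content: 'What is my name?' },
];

const windowed = applyWindow(history, 3);
console.log(windowed.map((m) => m.content));
// The message that contained the user's name is no longer in the window.
```

With a window of 3, the introduction falls out of context even within a single session, which is why window size is a trade-off between token cost and recall.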

Postgres Chat Memory

What it does: Stores conversation history in a PostgreSQL database table.

Configuration: Connect to your Postgres instance, specify a session key field to partition conversations.

Best for: Applications where you need conversation history persistence and already use Postgres.

Limitation: Stores raw chat messages only. No semantic search, no embedding-based retrieval. As conversations grow, you’ll hit token limits when injecting full history into the LLM context.

{
  "parameters": {
    "sessionIdType": "customKey",
    "sessionKey": "={{ $json.userId }}",
    "tableName": "chat_memory"
  },
  "type": "@n8n/n8n-nodes-langchain.memoryPostgresChat"
}

Redis Chat Memory

What it does: Stores conversation history in Redis with automatic TTL (time-to-live) expiration.

Configuration: Connect to Redis, set session key and expiration time.

Best for: High-throughput applications where you want fast access and automatic cleanup of old conversations.

Limitation: Same as Postgres: it stores raw messages without semantic capabilities. Redis memory pressure can also become an issue at scale.

{
  "parameters": {
    "sessionKey": "={{ $json.sessionId }}",
    "sessionTTL": 3600
  },
  "type": "@n8n/n8n-nodes-langchain.memoryRedisChat"
}

The Fundamental Problem

All three built-in options share the same architectural flaw: they store literal conversation history, not semantic knowledge.

This means:

  1. Token costs explode as conversations grow
  2. Retrieval is chronological, not relevance-based
  3. Knowledge can’t be synthesized across sessions
  4. Unrelated details clog context windows
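The difference between chronological and relevance-based retrieval (point 2) is easy to see with a toy example. The sketch below uses made-up three-dimensional vectors in place of real embeddings; the memory contents and numbers are illustrative assumptions only.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy memories; createdAt is just an increasing counter.
const memories = [
  { content: 'Prefers invoices as PDF', embedding: [0.9, 0.1, 0.0], createdAt: 1 },
  { content: 'Asked about the weather', embedding: [0.0, 0.2, 0.9], createdAt: 2 },
  { content: 'Complained about login bug', embedding: [0.1, 0.9, 0.1], createdAt: 3 },
];

// Stands in for the embedding of "How do I want my invoices?"
const queryEmbedding = [0.8, 0.2, 0.1];

// Chronological retrieval: newest memory wins.
const mostRecent = [...memories].sort((a, b) => b.createdAt - a.createdAt)[0];

// Relevance-based retrieval: closest meaning wins.
const mostRelevant = [...memories].sort(
  (a, b) =>
    cosineSimilarity(queryEmbedding, b.embedding) -
    cosineSimilarity(queryEmbedding, a.embedding)
)[0];

console.log(mostRecent.content);   // Complained about login bug
console.log(mostRelevant.content); // Prefers invoices as PDF
```

A chat-history store can only give you the first answer. The second requires embeddings.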

For production AI agents, you need something smarter.

Custom Memory with Supabase and pgvector

The first step up from built-in options is combining Postgres with vector embeddings. This gives you semantic memory, the ability to retrieve relevant past context based on meaning, not just recency.

Supabase makes this particularly easy because pgvector comes pre-installed on their managed Postgres instances.

Architecture Overview

User Message → Generate Embedding → Query Similar Memories → 
Inject Relevant Context → LLM Response → Store New Memory
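As a rough sketch of that flow in code, the function below walks through the five stages in order. Every dependency (`embed`, `searchMemories`, `callLlm`, `storeMemory`) is a hypothetical function injected by the caller, so the stubs here only stand in for OpenAI, Supabase, and your model of choice.

```javascript
// One pass through the pipeline. All dependencies are injected and hypothetical.
async function handleMessage(userId, message, deps) {
  const { embed, searchMemories, callLlm, storeMemory } = deps;

  const embedding = await embed(message);                   // Generate Embedding
  const memories = await searchMemories(userId, embedding); // Query Similar Memories

  // Inject Relevant Context
  const context = memories.map((m) => `- ${m.content}`).join('\n');
  const prompt = context
    ? `Relevant past context:\n${context}\n\nUser: ${message}`
    : `User: ${message}`;

  const reply = await callLlm(prompt);                      // LLM Response
  await storeMemory(userId, message, embedding);            // Store New Memory
  return reply;
}

// In-memory stubs so the flow can be exercised without any external service.
const memoryStore = [];
const stubDeps = {
  embed: async (text) => [text.length, 1],
  searchMemories: async (userId) => memoryStore.filter((m) => m.userId === userId),
  callLlm: async (prompt) => `echo: ${prompt}`,
  storeMemory: async (userId, content, embedding) => {
    memoryStore.push({ userId, content, embedding });
  },
};
```

Calling `handleMessage('u1', 'I like PDF invoices', stubDeps)` twice shows the loop closing: the second call's prompt includes the first message as past context.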

Step 1: Create the Memory Table

-- Enable pgvector extension (one-time setup)
CREATE EXTENSION IF NOT EXISTS vector;

-- Create memory table
CREATE TABLE agent_memory (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id TEXT NOT NULL,
  content TEXT NOT NULL,
  embedding VECTOR(1536),  -- OpenAI text-embedding-3-small dimension
  memory_type TEXT DEFAULT 'conversation',
  metadata JSONB DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create index for fast similarity search
CREATE INDEX ON agent_memory 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
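Once the table exists, new memories have to be written to it. One way to do that from n8n is through Supabase's auto-generated REST interface. The sketch below only builds the request options; `buildInsertMemoryRequest` and its parameter names are hypothetical, and the embedding is sent in pgvector's text form as an assumption about what is simplest to debug.

```javascript
// Build an HTTP request that inserts one row into agent_memory.
// Hypothetical helper: pass the result to fetch or mirror it in an HTTP Request node.
function buildInsertMemoryRequest({ supabaseUrl, serviceKey, userId, content, embedding }) {
  if (!Array.isArray(embedding) || embedding.length !== 1536) {
    throw new Error('embedding must have 1536 dimensions to match VECTOR(1536)');
  }
  return {
    method: 'POST',
    url: `${supabaseUrl}/rest/v1/agent_memory`,
    headers: {
      apikey: serviceKey,
      Authorization: `Bearer ${serviceKey}`,
      'Content-Type': 'application/json',
      Prefer: 'return=minimal', // do not echo the inserted row back
    },
    body: JSON.stringify({
      user_id: userId,
      content,
      embedding: JSON.stringify(embedding), // pgvector text form: "[0.1,0.2,...]"
    }),
  };
}
```

The dimension check is worth keeping: a mismatch between the embedding model and the column definition otherwise only surfaces as a database error at insert time.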

Step 2: Build the Memory Retrieval Sub-Workflow

Create a sub-workflow that your AI agent can call as a tool:

Trigger: Execute Workflow Trigger (for calling from main workflow)

Node 1: Generate Embedding

// Code node: Generate embedding for the query
const query = $input.first().json.query;

const response = await fetch('https://api.openai.com/v1/embeddings', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${$env.OPENAI_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'text-embedding-3-small',
    input: query
  })
});

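// Fail loudly on API errors instead of reading data.data from an error payload
if (!response.ok) {
  throw new Error(`Embedding request failed: ${response.status} ${await response.text()}`);
}

const data = await response.json();
return { embedding: data.data[0].embedding };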

Node 2: Query Supabase

// HTTP Request node to Supabase
// POST to your Supabase REST endpoint with RPC call

// SQL function to create (run once):
/*
CREATE OR REPLACE FUNCTION search_memories(
  query_embedding VECTOR(1536),
  match_user_id TEXT,
  match_count INT DEFAULT 5
)
RETURNS TABLE (content TEXT, similarity FLOAT)
LANGUAGE plpgsql AS $$
BEGIN
  RETURN QUERY
  SELECT 
    agent_memory.content,
    1 - (agent_memory.embedding <=> query_embedding) AS similarity
  FROM agent_memory
  WHERE agent_memory.user_id = match_user_id
  ORDER BY agent_memory.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
*/
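For reference, the request that calls this function could look like the sketch below. Supabase exposes database functions under `/rest/v1/rpc/<function_name>`, and the JSON body keys must match the SQL parameter names. The helper name `buildSearchMemoriesRequest` and its arguments are assumptions for illustration.

```javascript
// Build the RPC call to search_memories.
// Hypothetical helper: mirror these fields in the HTTP Request node.
function buildSearchMemoriesRequest({ supabaseUrl, apiKey, embedding, userId, matchCount = 5 }) {
  return {
    method: 'POST',
    url: `${supabaseUrl}/rest/v1/rpc/search_memories`,
    headers: {
      apikey: apiKey,
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    // Keys must match the SQL function's parameter names exactly.
    body: JSON.stringify({
      query_embedding: embedding,
      match_user_id: userId,
      match_count: matchCount,
    }),
  };
}
```

The response is a JSON array of `{ content, similarity }` rows, which n8n splits into one item per row for the next node.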

Node 3: Format Results

// Return formatted context for the agent
const memories = $input.all().map(item => item.json.content);
return {
  relevantContext: memories.join('\n---\n'),
  memoryCount: memories.length
};

Step 3: Wire Into Your AI Agent

In your main AI Agent workflow, add this sub-workflow as a tool:

{
  "name": "recall_memory",
  "description": "Search past conversations and stored knowledge for relevant context. Use this before answering questions that might benefit from historical information.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "What to search for in memory"
      }
    },
    "required": ["query"]
  }
}

The agent will now autonomously decide when to consult memory based on the user’s question.

Qdrant or Pinecone for Dedicated Vector Memory

When your memory requirements outgrow a Postgres side-table, dedicated vector databases offer better performance and more features.

When to Choose Dedicated Vector DBs

| Factor | Postgres + pgvector | Dedicated Vector DB |
| --- | --- | --- |
| Memory size | < 100K vectors | > 100K vectors |
| Query latency | ~50-200ms | ~10-50ms |
| Filtering | Basic SQL WHERE | Advanced metadata filters |
| Scaling | Vertical only | Horizontal sharding |
| Cost | Included with DB | Separate service cost |
| Setup complexity | Low | Medium |

Qdrant Implementation (Self-Hosted)

Qdrant is particularly attractive for n8n users because it’s open source and runs well in Docker alongside your n8n instance.

Docker setup:

# docker-compose.yml addition
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_storage:/qdrant/storage

volumes:
  qdrant_storage:

Creating a collection:

curl -X PUT 'http://localhost:6333/collections/agent_memory' \
  -H 'Content-Type: application/json' \
  -d '{
    "vectors": {
      "size": 1536,
      "distance": "Cosine"
    }
  }'

n8n HTTP Request node for search:

{
  "method": "POST",
  "url": "http://qdrant:6333/collections/agent_memory/points/search",
  "body": {
    "vector": "={{ $json.embedding }}",
    "limit": 5,
    "filter": {
      "must": [
        { "key": "user_id", "match": { "value": "={{ $json.userId }}" } }
      ]
    },
    "with_payload": true
  }
}
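Search only returns something once points have been stored. A minimal sketch of the matching write request follows, assuming the same `agent_memory` collection and a `user_id` payload field so the filter above has something to match. `buildQdrantUpsert` is a hypothetical helper; note that Qdrant point IDs must be unsigned integers or UUIDs.

```javascript
// Build a Qdrant upsert request for a single memory point.
// Hypothetical helper: the payload field names are this guide's convention, not Qdrant's.
function buildQdrantUpsert({
  baseUrl = 'http://qdrant:6333',
  collection = 'agent_memory',
  id, // unsigned integer or UUID string
  embedding,
  userId,
  content,
  createdAt = new Date().toISOString(),
}) {
  return {
    method: 'PUT',
    // wait=true makes the call return only after the point is persisted
    url: `${baseUrl}/collections/${collection}/points?wait=true`,
    body: {
      points: [
        {
          id,
          vector: embedding,
          payload: { user_id: userId, content, created_at: createdAt },
        },
      ],
    },
  };
}
```

Storing `content` in the payload means the search response (with `with_payload: true`) already contains the text, so no second lookup is needed.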

Pinecone Implementation (Managed)

For teams that prefer managed infrastructure, Pinecone offers a serverless option that scales automatically.

// Code node: Pinecone upsert
const pinecone = {
  apiKey: $env.PINECONE_API_KEY,
  // Index host as shown in the Pinecone console for your index.
  // PINECONE_INDEX_HOST is an assumed variable name.
  indexHost: $env.PINECONE_INDEX_HOST
};

const vectors = [{
  id: $json.memoryId,
  values: $json.embedding,
  metadata: {
    userId: $json.userId,
    content: $json.content,
    timestamp: Date.now()
  }
}];

// Data operations go to the index's own host, not a shared endpoint
const response = await fetch(
  `https://${pinecone.indexHost}/vectors/upsert`,
  {
    method: 'POST',
    headers: {
      'Api-Key': pinecone.apiKey,
      'Content-Type': 'application/json'
    },
    // One namespace per user keeps each user's memories isolated
    body: JSON.stringify({ vectors, namespace: $json.userId })
  }
);

return { success: response.ok };
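The read side is a query against the same index. The sketch below builds that request, assuming you have the index host at hand and that memories were written with one namespace per user as in the upsert. `buildPineconeQuery` is a hypothetical helper.

```javascript
// Build a Pinecone query request for the top-K most similar memories.
// Hypothetical helper: indexHost is the per-index hostname.
function buildPineconeQuery({ indexHost, apiKey, embedding, userId, topK = 5 }) {
  return {
    method: 'POST',
    url: `https://${indexHost}/query`,
    headers: {
      'Api-Key': apiKey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      vector: embedding,
      topK,
      includeMetadata: true, // return the stored content with each match
      namespace: userId,     // same per-user namespace used at upsert time
    }),
  };
}
```

Because the namespace already scopes the search to one user, no extra metadata filter on `userId` is needed here.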

The Hybrid Approach: Production-Ready Architecture

For serious production deployments, the answer isn’t choosing one memory type; it’s combining them strategically.

Three-Tier Memory Architecture

Tier 1: Session Context (Redis)

  • Current conversation buffer
  • Fast read/write
  • Auto-expires after session timeout
  • 5-10 most recent messages

Tier 2: Semantic Memory (Vector DB)

  • Long-term knowledge and context
  • Similarity-based retrieval
  • Summarized insights, not raw transcripts
  • Pruned and consolidated regularly

Tier 3: Structured Facts (Postgres)

  • User preferences and settings
  • Explicit facts (name, company, timezone)
  • Relationship data
  • Queryable with standard SQL

Implementation Pattern

Incoming Message
        ↓
┌──────────────────┐
│ Load from Redis  │ → Recent conversation context
└──────────────────┘
        ↓
┌──────────────────┐
│ Query Vector DB  │ → Relevant past knowledge
└──────────────────┘
        ↓
┌──────────────────┐
│ Fetch from PG    │ → User facts and preferences
└──────────────────┘
        ↓
┌──────────────────┐
│ Compose Context  │ → Merge all memory layers
└──────────────────┘
        ↓
┌──────────────────┐
│ AI Agent         │ → Generate response
└──────────────────┘
        ↓
┌──────────────────┐
│ Store Memories   │ → Update all three tiers
└──────────────────┘
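The three read steps do not depend on each other, so outside of n8n's node graph they could be loaded concurrently. The sketch below is one possible way to do that with a fallback per tier, so a failing vector database does not block the reply. The loader functions are hypothetical placeholders for your Redis, vector DB, and Postgres calls.

```javascript
// Load all three memory tiers concurrently, falling back to empty values
// when a tier fails. The loaders are hypothetical async functions.
async function loadMemoryLayers(loaders) {
  const [session, semantic, facts] = await Promise.allSettled([
    loaders.loadSession(),
    loaders.searchSemantic(),
    loaders.fetchFacts(),
  ]);

  return {
    sessionContext: session.status === 'fulfilled' ? session.value : [],
    semanticMemories: semantic.status === 'fulfilled' ? semantic.value : [],
    userFacts: facts.status === 'fulfilled' ? facts.value : {},
  };
}
```

`Promise.allSettled` is used instead of `Promise.all` on purpose: with `Promise.all`, one rejected loader would discard the results of the other two.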

Context Composition Code

// Code node: Merge memory layers
const sessionContext = $('Redis').item.json.messages || [];
const semanticMemories = $('Qdrant').item.json.result || []; // Qdrant's search response puts hits under "result"
const userFacts = $('Postgres').item.json || {};

// Build context string for injection
let context = '';

if (userFacts.name) {
  context += `User: ${userFacts.name}`;
  if (userFacts.company) context += ` (${userFacts.company})`;
  if (userFacts.preferences) context += `\nPreferences: ${userFacts.preferences}`;
  context += '\n\n';
}

if (semanticMemories.length > 0) {
  context += 'Relevant past context:\n';
  context += semanticMemories
    .map(m => `- ${m.payload.content}`)
    .join('\n');
  context += '\n\n';
}

if (sessionContext.length > 0) {
  context += 'Recent conversation:\n';
  context += sessionContext
    .map(m => `${m.role}: ${m.content}`)
    .join('\n');
}

return { composedContext: context };

Memory Options Comparison Table

| Approach | Persistence | Semantic Search | Complexity | Cost | Best For |
| --- | --- | --- | --- | --- | --- |
| Window Buffer | ❌ None | ❌ No | ⭐ Low | Free | Simple chatbots |
| Redis Chat | ✅ TTL-based | ❌ No | ⭐⭐ Low | Low | High-throughput sessions |
| Postgres Chat | ✅ Permanent | ❌ No | ⭐⭐ Low | Low | Basic persistence |
| Supabase + pgvector | ✅ Permanent | ✅ Yes | ⭐⭐⭐ Medium | Low | Growing projects |
| Qdrant (self-hosted) | ✅ Permanent | ✅ Yes | ⭐⭐⭐ Medium | Low | Full control |
| Pinecone (managed) | ✅ Permanent | ✅ Yes | ⭐⭐ Low | Medium | Hands-off scaling |
| Hybrid (all three) | ✅ Permanent | ✅ Yes | ⭐⭐⭐⭐ High | Medium | Production systems |

Performance Considerations

Token Costs Add Up Fast

Every memory you inject consumes tokens. At GPT-4 pricing (~$30/1M input tokens), retrieving 2,000 tokens of context per request adds ~$0.06 per request, or ~$60 per 1,000 requests. That’s $60/month at just 1,000 requests a month, and it grows linearly with volume.
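The arithmetic is worth checking against your own numbers. At $30 per 1M input tokens, 2,000 tokens of injected context cost about $0.06 per request, so 1,000 requests come to roughly $60. The helper below is a small sketch for plugging in other prices and volumes; the function name and parameters are illustrative.

```javascript
// Estimate the cost of injected memory context alone (not the whole prompt).
function contextCost({ pricePerMillionTokens, contextTokens, requestsPerMonth }) {
  const perRequest = (contextTokens / 1_000_000) * pricePerMillionTokens;
  return { perRequest, perMonth: perRequest * requestsPerMonth };
}

const gpt4Estimate = contextCost({
  pricePerMillionTokens: 30,
  contextTokens: 2000,
  requestsPerMonth: 1000,
});

console.log(gpt4Estimate.perRequest.toFixed(2)); // 0.06
console.log(gpt4Estimate.perMonth.toFixed(2));   // 60.00
```

Halving the retrieved context or switching to a cheaper model scales the result linearly, which is why the mitigation strategies focus on injecting less.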

Mitigation strategies:

  • Summarize long conversations before storing
  • Use embedding similarity thresholds to filter weak matches
  • Implement tiered retrieval (facts first, then semantic search only if needed)
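The similarity-threshold strategy can be sketched in a few lines. It assumes matches shaped like the rows returned by `search_memories` (`content` plus `similarity`); the 0.75 cutoff is an arbitrary starting point to tune against your own data, not a recommended value.

```javascript
// Drop weak matches, then keep at most maxItems of the strongest ones.
function filterBySimilarity(matches, minSimilarity = 0.75, maxItems = 5) {
  return matches
    .filter((m) => m.similarity >= minSimilarity)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, maxItems);
}

const rawMatches = [
  { content: 'Prefers invoices as PDF', similarity: 0.91 },
  { content: 'Asked about the weather', similarity: 0.42 },
  { content: 'Uses net-30 payment terms', similarity: 0.8 },
];

const strongMatches = filterBySimilarity(rawMatches);
console.log(strongMatches.map((m) => m.content));
// [ 'Prefers invoices as PDF', 'Uses net-30 payment terms' ]
```

Returning nothing is a valid outcome: an empty context block is cheaper and less misleading than five barely related memories.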

When to Summarize vs Retrieve

| Scenario | Strategy |
| --- | --- |
| Conversation > 20 messages | Summarize older messages |
| Memory > 50 items for user | Prune or consolidate |
| Similar memories clustering | Merge into single insight |
| Time-sensitive info | Add decay factor to relevance |
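One simple form of decay factor is an exponential half-life applied to the similarity score. This is a sketch under that assumption; the 30-day half-life and the function name are illustrative choices, not something the vector databases provide out of the box.

```javascript
const MS_PER_DAY = 86_400_000;

// Scale a similarity score down by half for every halfLifeDays of age.
function decayedScore(similarity, createdAtMs, nowMs, halfLifeDays = 30) {
  const ageDays = Math.max(0, (nowMs - createdAtMs) / MS_PER_DAY);
  return similarity * Math.pow(0.5, ageDays / halfLifeDays);
}

const now = Date.UTC(2024, 5, 1);
const fresh = decayedScore(0.8, now, now);                      // no age, score unchanged
const monthOld = decayedScore(0.8, now - 30 * MS_PER_DAY, now); // one half-life, score halved
console.log(fresh, monthOld);
```

Re-rank retrieved memories by the decayed score rather than filtering on it, so an old but highly relevant memory can still outrank a fresh, weak one.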

Pruning Implementation

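// Scheduled workflow: Memory maintenance
// Run nightly to consolidate and prune
//
// Assumes `supabase` is an initialized supabase-js client and that
// agent_memory has an access_count column (not part of the Step 1 schema).

// 1. Find memories older than 30 days with low access count
// 2. Cluster similar memories using embedding distance
// 3. Summarize clusters into single memories
// 4. Delete originals, keep summaries

const thirtyDaysAgo = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000).toISOString();

const { data: oldMemories, error } = await supabase
  .from('agent_memory')
  .select('*')
  .lt('created_at', thirtyDaysAgo)
  .eq('access_count', 0);

if (error) throw new Error(`Memory lookup failed: ${error.message}`);

// Cluster and summarize logic here...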
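For the clustering step, one possible starting point is a greedy single pass: each memory joins the first cluster whose representative (its first member) is similar enough, otherwise it starts a new cluster. This is a deliberately simple sketch, not a proper clustering algorithm, and the 0.9 threshold is an assumption to tune.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy clustering: compare each memory to the first member of every cluster.
function clusterMemories(items, threshold = 0.9) {
  const clusters = [];
  for (const item of items) {
    const home = clusters.find((c) => cosine(c[0].embedding, item.embedding) >= threshold);
    if (home) {
      home.push(item);
    } else {
      clusters.push([item]);
    }
  }
  return clusters;
}

// Toy two-dimensional embeddings for illustration.
const staleMemories = [
  { content: 'Wants PDF invoices', embedding: [1, 0] },
  { content: 'Asked for invoices as PDF again', embedding: [0.99, 0.05] },
  { content: 'Reported a login bug', embedding: [0, 1] },
];

const clusters = clusterMemories(staleMemories);
console.log(clusters.map((c) => c.length)); // [ 2, 1 ]
```

Each cluster with more than one member is a candidate for summarization: send its contents to the LLM, store the summary as a new memory, and delete the originals.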

Recommendation: Start Simple, Scale Smart

If you’re just getting started with n8n AI agent memory, here’s the path of least resistance:

  1. Week 1: Implement Postgres Chat Memory for basic persistence
  2. Week 2-4: Add Supabase/pgvector for semantic retrieval
  3. Month 2+: Evaluate whether you need Redis for session speed or a dedicated vector DB for scale

Don’t over-engineer on day one. Most AI agent projects don’t need Pinecone-level infrastructure until they’re handling thousands of daily users.

The hybrid architecture is the goal, but you can build toward it incrementally.

Build Production AI Agents Faster

Implementing persistent memory is just one piece of production-ready AI agents. You also need error handling, rate limiting, monitoring, and graceful degradation. (For more on building self-maintaining AI agent systems, check out our guide on Claude Code building self-healing n8n workflows.)

At Marden SEO, we help businesses deploy AI automation systems that actually work in production, not just demos that look impressive but break under real load. Our n8n expertise spans custom memory implementations, multi-agent orchestration, and the operational infrastructure that keeps agents running reliably.

If you’re building AI agents and want to skip the trial-and-error phase, reach out to discuss your project. We’ve solved these problems before.


Have questions about implementing AI agent memory in n8n? Drop them in the comments below or reach out directly. This is an evolving space, and the best practices are being written in real-time by practitioners like you.
