How OpenClaw Manages AI Agent Memory (A Case Study)
OpenClaw is an autonomous AI agent framework that runs continuously, handles tasks, and maintains state across sessions. Unlike ChatGPT (which resets every conversation), OpenClaw agents persist.
This creates a hard problem: how do you manage memory for an AI that runs 24/7?
I’ve been running OpenClaw agents for six months. In this case study, I’ll walk through:
- How OpenClaw’s memory system works (file-based architecture)
- What it does well (and why)
- Where it breaks down at scale
- What I’ve learned building on top of it
This isn’t theory. This is lived experience from running autonomous agents in production.
OpenClaw’s Memory Architecture (File-Based)
OpenClaw uses a flat file system for memory. No databases, no vector stores: just Markdown files in your workspace.
Here’s the structure:
workspace/
├── AGENTS.md          # Core identity (who you are, rules)
├── MEMORY.md          # Long-term curated memories
├── USER.md            # Info about the human you're helping
├── TOOLS.md           # Tool-specific config (SSH hosts, API keys)
└── memory/
    ├── 2026-02-18.md  # Today's raw logs
    ├── 2026-02-17.md  # Yesterday's logs
    └── 2026-02-16.md  # Older logs
How It Works (Session Lifecycle)
1. Agent Wakes Up (New Session)
Before doing anything, the agent reads:
- `AGENTS.md`: Core identity and instructions
- `MEMORY.md`: Long-term memories (manually curated)
- `memory/YYYY-MM-DD.md` (today + yesterday): Recent context
Total context loaded: ~10,000-15,000 tokens (at the start)
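The wake-up step above is straightforward to sketch. The following is a minimal, hypothetical loader (the function names and the ~4-chars-per-token heuristic are my assumptions, not OpenClaw's actual code):

```python
from datetime import date, timedelta
from pathlib import Path

def load_session_context(workspace: str) -> str:
    """Concatenate the files read at session start: core identity,
    curated memory, and the last two daily logs."""
    ws = Path(workspace)
    today = date.today()
    files = [
        ws / "AGENTS.md",
        ws / "MEMORY.md",
        ws / "memory" / f"{today:%Y-%m-%d}.md",
        ws / "memory" / f"{today - timedelta(days=1):%Y-%m-%d}.md",
    ]
    parts = []
    for f in files:
        if f.exists():  # yesterday's log may not exist yet
            parts.append(f"## {f.name}\n{f.read_text()}")
    return "\n\n".join(parts)

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text
    return len(text) // 4
```

With a typical workspace, `estimate_tokens(load_session_context(...))` lands in the 10,000-15,000 range the post mentions.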
2. Agent Works (Handles Tasks)
As the agent interacts:
- All conversations get logged to `memory/YYYY-MM-DD.md` (append-only)
- Decisions and new facts are supposed to be added to `MEMORY.md` (manual or semi-automated)
3. Agent Sleeps (End of Session)
Nothing happens automatically. The memory files just… sit there. You’re expected to manually review and consolidate.
What Gets Stored in Each File
AGENTS.md (Core Identity)
# Who You Are
You are Pappu, a personal AI assistant for Rahul.
# Rules
- Never send emails without explicit approval
- Always use `--account pappu@bluntedges.com` for Google Workspace
- Be concise, no fluff
MEMORY.md (Long-Term Curated Memories)
# Insurance Policies
- HDFC Life: Renews April 15, 2026
- ICICI Pru: Renews June 22, 2026
# Preferences
- Prefers Python over JavaScript
- Uses Obsidian for PKM
- Lives in Mumbai, uses IST timezone
memory/2026-02-18.md (Daily Raw Logs)
[09:15] User: Check my calendar
[09:16] Agent: You have 2 meetings today
[10:42] User: Draft email to Sarah about Q2 budget
[10:45] Agent: [email draft]
[10:46] User: Send it
[14:22] User: I'm switching from Notion to Obsidian
[14:23] Agent: Noted. Want help migrating?
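The append-only log format above can be produced by a few lines of code. Here's a minimal sketch (the `log_turn` helper is hypothetical, not part of OpenClaw):

```python
from datetime import datetime
from pathlib import Path

def log_turn(workspace: str, role: str, text: str, now=None) -> Path:
    """Append one timestamped line to today's daily log.

    Append-only by design: nothing is ever rewritten, so the log
    doubles as an audit trail."""
    now = now or datetime.now()
    log_dir = Path(workspace) / "memory"
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / f"{now:%Y-%m-%d}.md"
    with log_file.open("a") as f:
        f.write(f"[{now:%H:%M}] {role}: {text}\n")
    return log_file
```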
What OpenClaw Memory Does Well
1. Transparency
Everything is plain text. You can grep, rg, or manually search memory files. No black boxes.
2. Portability
Files are Markdown. You can version them with git, sync them across machines, backup easily.
3. Simplicity
No databases to configure. No vector embeddings to compute. Just files.
4. Git-Friendly
I commit MEMORY.md and config files to git. Daily logs stay local. This gives me version history for important context.
5. Human-Readable
Non-technical users can open MEMORY.md in any text editor and understand it. No SQL queries required.
Where OpenClaw Memory Breaks Down
Now for the problems. These aren't bugs in OpenClaw; they're fundamental limitations of flat file memory.
1. No Automatic Consolidation
After a month of daily use, I had:
- `MEMORY.md`: 15,000 tokens
- `memory/` folder: 30 daily log files, totaling 200,000 tokens
The agent only loads today + yesterday by default. Anything older is invisible unless you manually reference it.
Result: The agent “forgets” decisions made last week because they’re buried in old daily logs.
Workaround: Manually review weekly and update MEMORY.md. This takes hours.
2. No Search (Beyond grep)
Want to find all mentions of “insurance”? You can rg insurance memory/, but:
- No semantic search (can’t find “policy renewals” when searching “insurance”)
- No ranking by relevance
- No filtering by date/category
Result: You waste time manually scanning results.
3. Context Bloat
As MEMORY.md grows, it consumes more tokens. After 3 months:
- `MEMORY.md`: 40,000 tokens
- Core files (`AGENTS.md`, `USER.md`, etc.): 8,000 tokens
- Total loaded context: 48,000 tokens before any conversation starts
Result: Slower responses, higher costs, attention dilution.
4. No Relevance Scoring
All facts in MEMORY.md are treated equally. The agent can’t distinguish:
- “User’s name is Rahul” (permanent, high priority)
- “User was debugging a React component on Jan 15” (transient, low priority)
Result: Outdated trivia clutters your context.
5. No Structure or Relationships
Memory is a flat list. There’s no way to represent:
- “Project X depends on Library Y”
- “Decision A supersedes Decision B”
- “Preference C only applies to Context D”
Result: The agent can’t reason about relationships between memories.
My Hacks for Scaling OpenClaw Memory
Here’s what I built on top of OpenClaw to make memory usable:
Hack 1: Weekly Consolidation Script
#!/bin/bash
# weekly-consolidate.sh

# Combine last 7 days of logs
cat memory/2026-02-{12..18}.md > /tmp/week-logs.md

# Build the request body with jq so quotes and newlines in the logs
# don't break the JSON (naive shell interpolation would)
jq -n --rawfile logs /tmp/week-logs.md '{
  model: "gpt-4",
  messages: [
    {role: "system", content: "Extract decisions, preferences, and important facts. Discard noise."},
    {role: "user", content: $logs}
  ]
}' > /tmp/payload.json

# Use GPT-4 to extract key facts
curl -s -X POST https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @/tmp/payload.json | jq -r '.choices[0].message.content' >> MEMORY-updates.md

# Manually review MEMORY-updates.md and merge into MEMORY.md
Result: Reduces manual review time from 3 hours to 30 minutes.
Hack 2: Semantic Search with Qdrant
I built a local vector database for archived memories:
import glob
import uuid

import openai
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = QdrantClient("localhost", port=6333)

# Index old daily logs
for log_file in glob.glob("memory/2026-*.md"):
    text = open(log_file).read()
    embedding = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=text
    )["data"][0]["embedding"]
    client.upsert(
        collection_name="memory_archive",
        points=[PointStruct(
            # Qdrant point IDs must be integers or UUIDs, so derive a
            # stable UUID from the file path
            id=str(uuid.uuid5(uuid.NAMESPACE_URL, log_file)),
            vector=embedding,
            payload={"text": text, "date": log_file},
        )]
    )
Now I can semantically search archived logs:
def search_memory(query):
    embedding = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=query
    )["data"][0]["embedding"]
    results = client.search(
        collection_name="memory_archive",
        query_vector=embedding,
        limit=5
    )
    return results
Result: Can find relevant context from months ago without manual scanning.
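Once the hits come back, they still need to be packed into the agent's prompt without reintroducing context bloat. A minimal sketch (the `build_prompt` helper and its character budget are my own assumptions; `hits` here are plain payload dicts like the ones indexed above):

```python
def build_prompt(query: str, hits: list, budget_chars: int = 4000) -> str:
    """Pair the user's query with retrieved archive snippets,
    trimmed to a rough character budget so old context can't
    crowd out the conversation itself."""
    context, used = [], 0
    for hit in hits:  # hits are payload dicts: {"text": ..., "date": ...}
        snippet = f"[{hit['date']}] {hit['text']}"
        if used + len(snippet) > budget_chars:
            break  # stop before blowing the budget
        context.append(snippet)
        used += len(snippet)
    preamble = "Relevant archived memory:\n" + "\n".join(context)
    return f"{preamble}\n\nUser query: {query}"
```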
Hack 3: Token Budget Enforcer
Before each API call, I prune MEMORY.md to fit a 5,000-token budget:
import tiktoken

def enforce_token_budget(memory_file, budget=5000):
    lines = open(memory_file).readlines()
    encoding = tiktoken.encoding_for_model("gpt-4")
    loaded = []
    tokens = 0
    for line in lines:
        line_tokens = len(encoding.encode(line))
        if tokens + line_tokens <= budget:
            loaded.append(line)
            tokens += line_tokens
        else:
            break
    return "".join(loaded)
Result: Prevents context bloat, keeps costs manageable.
Lessons Learned (6 Months of OpenClaw Agents)
Lesson 1: Manual Memory Curation Doesn’t Scale
After month 2, I stopped manually updating MEMORY.md weekly. It was too time-consuming.
Takeaway: Automation is mandatory for long-running agents.
Lesson 2: Flat Files Are Good for Bootstrapping
For the first few weeks, plain Markdown files worked great. Simple, transparent, easy to debug.
Takeaway: Start simple. Add complexity only when needed.
Lesson 3: You Need Tiered Storage Eventually
Once you have >10K tokens of memory, you can’t load it all into context.
Takeaway: Separate working memory (always loaded) from warm storage (searchable).
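The working/warm split can be expressed in a few lines. This is a toy sketch of the idea only; the `TieredMemory` class is hypothetical, and plain keyword overlap stands in for the semantic search a real warm tier would use:

```python
class TieredMemory:
    """Two tiers: 'hot' facts are always injected into context;
    'warm' entries are only surfaced when a query matches them."""

    def __init__(self):
        self.hot = []   # small, always loaded (working memory)
        self.warm = []  # large, searched on demand (warm storage)

    def remember(self, fact: str, hot: bool = False) -> None:
        (self.hot if hot else self.warm).append(fact)

    def context_for(self, query: str, max_warm: int = 3) -> list:
        # Keyword overlap as a cheap stand-in for semantic search
        words = set(query.lower().split())
        matches = [m for m in self.warm if words & set(m.lower().split())]
        return self.hot + matches[:max_warm]
```

The point of the design: the hot tier's size is bounded by hand, so context cost stays flat no matter how large the warm tier grows.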
Lesson 4: Daily Logs Are Gold (Don’t Delete Them)
I almost deleted old daily logs to save disk space. Glad I didn’t โ they’re invaluable for debugging and auditing.
Takeaway: Archive logs, don’t delete. Storage is cheap.
Lesson 5: Git + Memory = Time Machine
Committing MEMORY.md to git means I can revert if a consolidation goes wrong.
Takeaway: Version control isn’t just for code.
What I’d Build Next (If I Had Time)
Here’s what OpenClaw memory needs:
- Automatic nightly consolidation (Dream Routine)
- Semantic search built-in (no manual Qdrant setup)
- Relevance scoring + decay (old memories fade unless reinforced)
- Structured memory (JSON or SQLite, not just Markdown)
- Memory graph visualization (see relationships between facts)
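The relevance-scoring-plus-decay item is the easiest to sketch. One common approach (this is my illustration, not a planned implementation) is exponential decay with a half-life, where accessing a memory resets its clock:

```python
def relevance(base: float, last_access: float, now: float,
              half_life_days: float = 30.0) -> float:
    """Exponential decay: a memory's score halves every
    `half_life_days` unless it is accessed again (which would
    reset `last_access`). Timestamps are Unix seconds."""
    age_days = (now - last_access) / 86400
    return base * 0.5 ** (age_days / half_life_days)
```

"User's name is Rahul" would be stored with a high base score and touched on every session, so it never fades; "debugging a React component on Jan 15" starts low and decays to noise within weeks.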
This is what MyDeepBrain is designed to provide.
Key Takeaways
- OpenClaw uses flat file memory (Markdown in a `memory/` folder)
- Transparent and portable, but doesn't scale beyond a few weeks
- No auto-consolidation: manual curation required
- No semantic search: grep is your only option
- Hacks exist (consolidation scripts, vector DBs) but they're brittle
OpenClaw’s memory system is good enough for prototyping, but if you’re running agents long-term, you need better infrastructure.
Building on OpenClaw? MyDeepBrain adds automatic consolidation, semantic search, and tiered memory to OpenClaw agents. Join the waitlist.
Want early access to MyDeepBrain?
We're building a self-hosted memory platform for AI assistants. Join the waitlist to be notified when we launch.
Join Waitlist