SECTION 01
Setup, plus kick off your exports in the background.
~ 5 minutes active, then waiting
Two exports run on a timer: the ChatGPT export takes 1–3 days to arrive, the Claude.ai export about 5 minutes. Request both right now so you're not stuck waiting later.
What you need
- Claude Code — the CLI (install docs)
- Obsidian — the viewer for the vault (obsidian.md, free)
- Python 3.10+ — for the parser scripts
- ~500 MB free disk for a typical 1–2 year history
Request your ChatGPT export FIRST (it's the slow one)
- Go to chatgpt.com → Settings → Data Controls → Export data
- Click Export. You'll get an email in 1–3 days with a zip.
- Move on. We'll handle it in section 05 when it arrives.
Request your Claude.ai export (5-minute timer)
- Go to claude.ai → profile icon → Settings → Privacy → Export data
- Click Export. An email lands in about 5 minutes with a download link.
Clone this repo (or download the scripts)
$ git clone https://github.com/M1w234/claude-brain-builder ~/claude-brain-builder
$ cd ~/claude-brain-builder
Every script and prompt referenced in the next sections is embedded inline below each step as a collapsible <details> block with a Copy button. You can either work from the cloned repo or copy each script from this page.
SECTION 02
Build the engine from your terminal Claude Code sessions.
~ 20 minutes
Claude Code stores every conversation locally as JSONL under ~/.claude/projects/. This section parses that into markdown, fans out 10 parallel sub-agents to backlink it by topic / tool / entity, and consolidates the result into hub pages.
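For orientation, a single JSONL line looks roughly like this (trimmed and illustrative — the exact schema varies by Claude Code version, but these are the fields the export script below actually reads):

```json
{"type": "assistant", "sessionId": "6f2c0d9e-0000-4000-8000-000000000000", "cwd": "/Users/you/dev/my-app", "isSidechain": false, "timestamp": "2025-03-14T18:22:03Z", "message": {"content": [{"type": "text", "text": "Done — committed the fix."}]}}
```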
1. Scaffold the vault
$ mkdir -p ~/ClaudeCodeBrain/{projects,topics,tools,entities,_build/scripts,_build/reports,_build/assignments}
$ cp scripts/*.py ~/ClaudeCodeBrain/_build/scripts/
2. Parse your terminal JSONL → markdown
Reads every .jsonl file under ~/.claude/projects/, dedupes by cwd (worktrees collapse to their root project), wraps each tool call in a collapsible <details> block, caps tool-result content at ~4000 chars to keep files readable, and writes one markdown file per session.
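Each session file comes out roughly like this (values are made up; the field names are exactly what export.py writes — the empty summary/topics/tools/entities get filled in by the backlink agents in step 4):

```
---
session_id: 6f2c0d9e-0000-4000-8000-000000000000
project: my-app
cwd: /Users/you/dev/my-app
date: 2025-03-14
started: 2025-03-14T18:20:11+00:00
is_subagent: false
user_messages: 4
assistant_messages: 9
tool_uses: 12
summary: 
topics: []
tools: []
entities: []
---

# fix the flaky deploy script

### 👤 User
…

### 🤖 Assistant
…
```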
$ python3 ~/ClaudeCodeBrain/_build/scripts/export.py
scripts/export.py
#!/usr/bin/env python3
"""Export ~/.claude/projects/ JSONL into an Obsidian vault.
Layout produced:
projects/<project>/sessions/<date>_<slug>_<id8>.md
projects/<project>/subagents/<parent-id8>__<agent-id8>.md
projects/<project>/_project.md
README.md
"""
from __future__ import annotations
import hashlib
import json
import re
import sys
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path
HOME = Path.home()
SRC = HOME / ".claude" / "projects"
VAULT = HOME / "ClaudeCodeBrain"
TOOL_RESULT_CAP = 4000
USER_MSG_PREVIEW = 80
SUMMARY_LEN = 160
def slugify(s: str, max_len: int = 60) -> str:
s = (s or "").strip().lower()
s = re.sub(r"[^\w\s-]", "", s)
s = re.sub(r"[\s_]+", "-", s)
s = re.sub(r"-+", "-", s).strip("-")
return s[:max_len] or "session"
def project_name_from_cwd(cwd: str) -> str:
"""Derive a kebab-case project slug from a cwd path.
Handles worktrees by stripping everything from /.claude/worktrees/...
"""
if not cwd:
return "unknown"
# Strip worktree path
if "/.claude/worktrees/" in cwd:
cwd = cwd.split("/.claude/worktrees/")[0]
# Special case: home directory itself
home = str(HOME)
if cwd == home:
return "home"
# Take last segment
name = Path(cwd).name
return slugify(name) or "unknown"
def short_id(uuid_str: str) -> str:
if not uuid_str:
return "xxxxxxxx"
return uuid_str.replace("-", "")[:8]
def parse_iso_ts(ts: str) -> datetime | None:
if not ts:
return None
try:
if ts.endswith("Z"):
ts = ts[:-1] + "+00:00"
return datetime.fromisoformat(ts)
except Exception:
return None
def render_content_item(item: dict, ctx: dict) -> str:
"""Render a single content item from a message into markdown."""
t = item.get("type", "")
if t == "text":
return (item.get("text") or "").strip()
if t == "thinking":
thinking = (item.get("thinking") or "").strip()
if not thinking:
return ""
return f"<details><summary>💭 Thinking</summary>\n\n{thinking}\n\n</details>"
if t == "tool_use":
name = item.get("name", "Tool")
input_obj = item.get("input", {})
ctx["tool_uses"][item.get("id", "")] = name
try:
input_str = json.dumps(input_obj, indent=2, ensure_ascii=False)
except Exception:
input_str = str(input_obj)
# Pull a one-line summary from common fields
summary_bits = []
if isinstance(input_obj, dict):
for key in ("description", "command", "file_path", "pattern", "url", "prompt"):
v = input_obj.get(key)
if v:
s = str(v).split("\n", 1)[0]
summary_bits.append(f"`{key}`: {s[:100]}")
break
summary = " — ".join(summary_bits) if summary_bits else ""
head = f"🔧 **{name}**" + (f" — {summary}" if summary else "")
return (
f"<details><summary>{head}</summary>\n\n"
f"```json\n{input_str}\n```\n\n</details>"
)
if t == "tool_result":
content = item.get("content", "")
if isinstance(content, list):
parts = []
for c in content:
if isinstance(c, dict):
if c.get("type") == "text":
parts.append(c.get("text", ""))
elif c.get("type") == "image":
parts.append("[image]")
else:
parts.append(str(c))
else:
parts.append(str(c))
content = "\n".join(parts)
elif not isinstance(content, str):
content = str(content)
        full_len = len(content)
        if full_len > TOOL_RESULT_CAP:
            content = content[:TOOL_RESULT_CAP] + f"\n\n…[truncated {full_len - TOOL_RESULT_CAP} chars]"
is_error = item.get("is_error")
tu_id = item.get("tool_use_id", "")
tool_name = ctx["tool_uses"].get(tu_id, "tool")
flag = " ⚠️ error" if is_error else ""
return (
f"<details><summary>📤 {tool_name} result{flag}</summary>\n\n"
f"```\n{content}\n```\n\n</details>"
)
if t == "image":
return "[image]"
return ""
def render_message(line: dict, ctx: dict) -> str | None:
"""Render one JSONL line into a markdown block, or None to skip."""
t = line.get("type")
msg = line.get("message", {})
if t == "user":
if not isinstance(msg, dict):
return None
content = msg.get("content", "")
# Tool results often live inside user messages as content list items
if isinstance(content, list):
rendered = [render_content_item(c, ctx) for c in content if isinstance(c, dict)]
rendered = [r for r in rendered if r]
if not rendered:
return None
# If everything is a tool_result, label section accordingly
kinds = {c.get("type") for c in content if isinstance(c, dict)}
if kinds == {"tool_result"}:
return "\n\n".join(rendered)
return "### 👤 User\n\n" + "\n\n".join(rendered)
if isinstance(content, str):
text = content.strip()
if not text:
return None
# Stash first user prompt as session slug source
if "first_user_text" not in ctx:
ctx["first_user_text"] = text
return f"### 👤 User\n\n{text}"
return None
if t == "assistant":
if not isinstance(msg, dict):
return None
content = msg.get("content", [])
if isinstance(content, list):
rendered = [render_content_item(c, ctx) for c in content if isinstance(c, dict)]
rendered = [r for r in rendered if r]
if not rendered:
return None
return "### 🤖 Assistant\n\n" + "\n\n".join(rendered)
return None
if t == "attachment":
att = line.get("attachment", {}) or {}
kind = att.get("type") or "attachment"
return f"*[attachment: {kind}]*"
return None
def parse_session(jsonl_path: Path) -> dict | None:
"""Read a session JSONL and return a dict with messages, cwd, metadata."""
cwd = None
session_id = jsonl_path.stem
is_sidechain = False
parent_session_id = None
first_ts = None
last_ts = None
rendered_blocks: list[str] = []
ctx = {"tool_uses": {}}
user_msg_count = 0
assistant_msg_count = 0
tool_use_count = 0
try:
with open(jsonl_path, "r", encoding="utf-8", errors="replace") as f:
for raw in f:
raw = raw.strip()
if not raw:
continue
try:
obj = json.loads(raw)
except Exception:
continue
t = obj.get("type")
if t in ("queue-operation", "last-prompt", "system"):
continue
if cwd is None and obj.get("cwd"):
cwd = obj["cwd"]
if obj.get("isSidechain"):
is_sidechain = True
if obj.get("sessionId"):
session_id = obj["sessionId"]
ts = obj.get("timestamp")
if ts:
parsed = parse_iso_ts(ts)
if parsed:
if first_ts is None or parsed < first_ts:
first_ts = parsed
if last_ts is None or parsed > last_ts:
last_ts = parsed
if t == "user":
msg = obj.get("message", {})
c = msg.get("content") if isinstance(msg, dict) else None
is_pure_tool_result = isinstance(c, list) and all(
isinstance(x, dict) and x.get("type") == "tool_result" for x in c
)
if not is_pure_tool_result:
user_msg_count += 1
if t == "assistant":
assistant_msg_count += 1
msg = obj.get("message", {})
c = msg.get("content") if isinstance(msg, dict) else None
if isinstance(c, list):
for it in c:
if isinstance(it, dict) and it.get("type") == "tool_use":
tool_use_count += 1
block = render_message(obj, ctx)
if block:
rendered_blocks.append(block)
except Exception as e:
print(f" !! parse error in {jsonl_path}: {e}", file=sys.stderr)
return None
if not rendered_blocks:
return None
# Detect parent session for subagents from path: .../<parent-uuid>/subagents/agent-xxx.jsonl
# Subagent JSONLs share the parent's sessionId, so override with file stem for uniqueness.
agent_file_id = None
if jsonl_path.parent.name == "subagents":
parent_session_id = jsonl_path.parent.parent.name
is_sidechain = True
agent_file_id = jsonl_path.stem # e.g. "agent-ad4b252958c8059b3"
return {
"session_id": session_id,
"agent_file_id": agent_file_id,
"cwd": cwd,
"is_sidechain": is_sidechain,
"parent_session_id": parent_session_id,
"first_ts": first_ts,
"last_ts": last_ts,
"blocks": rendered_blocks,
"ctx": ctx,
"user_msg_count": user_msg_count,
"assistant_msg_count": assistant_msg_count,
"tool_use_count": tool_use_count,
"jsonl_path": str(jsonl_path),
}
def write_session_md(session: dict, vault: Path) -> tuple[Path, str] | None:
cwd = session["cwd"]
if not cwd:
cwd = "/Users/unknown"
project = project_name_from_cwd(cwd)
sid8 = short_id(session["session_id"])
first_ts = session["first_ts"] or datetime.now(timezone.utc)
date_str = first_ts.strftime("%Y-%m-%d")
summary_seed = (session["ctx"].get("first_user_text") or "").split("\n", 1)[0]
slug = slugify(summary_seed[:80]) if summary_seed else "session"
if session["is_sidechain"]:
out_dir = vault / "projects" / project / "subagents"
parent_id8 = short_id(session["parent_session_id"] or "")
agent_id = session.get("agent_file_id") or session["session_id"]
# Hash the full agent file stem so named agents (agent-X-Y) don't collide
agent_hash = hashlib.sha1(agent_id.encode()).hexdigest()[:10]
# Try to keep a readable slug from named agents (e.g. "prompt_suggestion")
readable = re.sub(r"^agent-?a?", "", agent_id)
readable_slug = slugify(readable, max_len=30)
filename = f"{parent_id8}__{readable_slug}_{agent_hash}.md"
else:
out_dir = vault / "projects" / project / "sessions"
filename = f"{date_str}_{slug}_{sid8}.md"
out_dir.mkdir(parents=True, exist_ok=True)
out_path = out_dir / filename
# Frontmatter
fm_lines = [
"---",
f"session_id: {session['session_id']}",
f"project: {project}",
f"cwd: {cwd}",
f"date: {date_str}",
f"started: {first_ts.isoformat()}",
]
if session["last_ts"]:
fm_lines.append(f"ended: {session['last_ts'].isoformat()}")
fm_lines.append(f"is_subagent: {str(session['is_sidechain']).lower()}")
if session["parent_session_id"]:
fm_lines.append(f"parent_session: {session['parent_session_id']}")
fm_lines.append(f"user_messages: {session['user_msg_count']}")
fm_lines.append(f"assistant_messages: {session['assistant_msg_count']}")
fm_lines.append(f"tool_uses: {session['tool_use_count']}")
fm_lines.append("summary: ")
fm_lines.append("topics: []")
fm_lines.append("tools: []")
fm_lines.append("entities: []")
fm_lines.append("---")
fm = "\n".join(fm_lines)
# Heading + body
title_seed = summary_seed or "(no user prompt)"
title = title_seed[:SUMMARY_LEN].strip()
if session["is_sidechain"]:
parent_link = ""
if session["parent_session_id"]:
parent_link = f"\n\n*Subagent of parent session `{short_id(session['parent_session_id'])}`*"
header = f"# 🤖 Subagent: {title}{parent_link}"
else:
header = f"# {title}"
body = "\n\n".join(session["blocks"])
return out_path, f"{fm}\n\n{header}\n\n{body}\n"
def collect_existing_session_ids(vault: Path) -> set[str]:
"""Scan existing session markdown frontmatter for session_id values."""
seen: set[str] = set()
for md in (vault / "projects").rglob("*.md"):
if md.name.startswith("_"):
continue
try:
head = md.read_text(encoding="utf-8", errors="replace")[:600]
except Exception:
continue
for line in head.splitlines():
if line.startswith("session_id:"):
seen.add(line.split(":", 1)[1].strip())
break
return seen
def main():
incremental = "--incremental" in sys.argv
if not SRC.exists():
print(f"!! Source missing: {SRC}", file=sys.stderr)
sys.exit(1)
VAULT.mkdir(parents=True, exist_ok=True)
existing_ids: set[str] = set()
if incremental:
existing_ids = collect_existing_session_ids(VAULT)
print(f"Incremental mode: {len(existing_ids)} sessions already in vault.")
# Walk all JSONL files
jsonl_paths: list[Path] = []
for p in SRC.rglob("*.jsonl"):
# Skip empty files
try:
if p.stat().st_size == 0:
continue
except OSError:
continue
jsonl_paths.append(p)
print(f"Found {len(jsonl_paths)} JSONL files. Parsing…")
parsed: list[dict] = []
skipped_empty = 0
seen_session_ids: set[str] = set()
for i, p in enumerate(jsonl_paths, 1):
if i % 25 == 0:
print(f" [{i}/{len(jsonl_paths)}]")
s = parse_session(p)
if s is None:
skipped_empty += 1
continue
# Dedupe main sessions by session_id (same UUID may exist in two cwd folders).
# Subagents share the parent's sessionId, so don't dedupe them and don't pollute the set.
if not s["is_sidechain"]:
if s["session_id"] in seen_session_ids:
skipped_empty += 1
continue
seen_session_ids.add(s["session_id"])
# Incremental: skip sessions already in the vault (preserves backlinks)
if incremental and s["session_id"] in existing_ids and not s["is_sidechain"]:
skipped_empty += 1
continue
parsed.append(s)
print(f"Parsed {len(parsed)} non-empty sessions ({skipped_empty} skipped).")
# Write session files
project_sessions: dict[str, list[dict]] = defaultdict(list)
project_subagents: dict[str, list[dict]] = defaultdict(list)
for s in parsed:
project = project_name_from_cwd(s["cwd"] or "")
result = write_session_md(s, VAULT)
if result is None:
continue
out_path, contents = result
out_path.write_text(contents, encoding="utf-8")
if s["is_sidechain"]:
project_subagents[project].append({"session": s, "path": out_path})
else:
project_sessions[project].append({"session": s, "path": out_path})
print(f"Wrote {sum(len(v) for v in project_sessions.values())} main sessions, "
f"{sum(len(v) for v in project_subagents.values())} subagents across "
f"{len(set(list(project_sessions) + list(project_subagents)))} projects.")
# Project rollups
all_projects = sorted(set(list(project_sessions) + list(project_subagents)))
for project in all_projects:
sessions = project_sessions.get(project, [])
subagents = project_subagents.get(project, [])
sessions_sorted = sorted(
sessions,
key=lambda r: r["session"]["first_ts"] or datetime.min.replace(tzinfo=timezone.utc),
reverse=True,
)
cwds = sorted({r["session"]["cwd"] for r in sessions if r["session"]["cwd"]})
first_seen = min(
(r["session"]["first_ts"] for r in sessions if r["session"]["first_ts"]),
default=None,
)
last_seen = max(
(r["session"]["last_ts"] for r in sessions if r["session"]["last_ts"]),
default=None,
)
lines = ["---", f"project: {project}", "type: project-rollup"]
if cwds:
lines.append(f"cwds: {len(cwds)}")
lines.append(f"sessions: {len(sessions)}")
lines.append(f"subagents: {len(subagents)}")
if first_seen:
lines.append(f"first_seen: {first_seen.date()}")
if last_seen:
lines.append(f"last_seen: {last_seen.date()}")
lines.append("category: ")
lines.append("---")
lines.append("")
lines.append(f"# {project}")
lines.append("")
if cwds:
lines.append("**Working directories:**")
for c in cwds:
lines.append(f"- `{c}`")
lines.append("")
lines.append(f"**{len(sessions)} sessions** · **{len(subagents)} subagents**")
if first_seen and last_seen:
lines.append(f" · {first_seen.date()} → {last_seen.date()}")
lines.append("")
lines.append("## Sessions")
lines.append("")
lines.append("| Date | Title | Msgs | Tools |")
lines.append("|------|-------|------|-------|")
for r in sessions_sorted:
s = r["session"]
stem = r["path"].stem
d = (s["first_ts"] or datetime.now(timezone.utc)).date()
seed = (s["ctx"].get("first_user_text") or "(no prompt)").split("\n", 1)[0]
title = seed[:USER_MSG_PREVIEW].replace("|", "\\|")
link = f"[[projects/{project}/sessions/{stem}|{title}]]"
msgs = s["user_msg_count"] + s["assistant_msg_count"]
tools = s["tool_use_count"]
lines.append(f"| {d} | {link} | {msgs} | {tools} |")
lines.append("")
if subagents:
lines.append("## Subagent runs")
lines.append("")
for r in sorted(subagents, key=lambda r: r["session"]["first_ts"] or datetime.min.replace(tzinfo=timezone.utc), reverse=True):
s = r["session"]
stem = r["path"].stem
seed = (s["ctx"].get("first_user_text") or "(no prompt)").split("\n", 1)[0]
title = seed[:USER_MSG_PREVIEW].replace("|", "\\|")
lines.append(f"- [[projects/{project}/subagents/{stem}|{title}]]")
lines.append("")
rollup_path = VAULT / "projects" / project / f"{project}.md"
rollup_path.parent.mkdir(parents=True, exist_ok=True)
rollup_path.write_text("\n".join(lines), encoding="utf-8")
# Top-level README.md
readme = ["# Claude Code Brain", "",
f"Generated {datetime.now().strftime('%Y-%m-%d %H:%M')}",
"",
f"**{len(parsed)} sessions** across **{len(all_projects)} projects** "
f"({sum(len(v) for v in project_subagents.values())} subagent runs).",
"",
"## Projects",
""]
# Sort projects by recency of last session
project_recency = {}
for p in all_projects:
sessions = project_sessions.get(p, [])
latest = max(
(s["session"]["last_ts"] or s["session"]["first_ts"] for s in sessions if s["session"]["first_ts"]),
default=None,
)
project_recency[p] = latest or datetime.min.replace(tzinfo=timezone.utc)
sorted_projects = sorted(all_projects, key=lambda p: project_recency[p], reverse=True)
readme.append("| Project | Sessions | Subagents | Last activity |")
readme.append("|---------|---------:|----------:|---------------|")
for p in sorted_projects:
ses = len(project_sessions.get(p, []))
sub = len(project_subagents.get(p, []))
last = project_recency[p]
last_s = last.date().isoformat() if last and last.year > 1900 else "—"
readme.append(f"| [[projects/{p}/{p}\\|{p}]] | {ses} | {sub} | {last_s} |")
readme.append("")
readme.append("## Hubs")
readme.append("")
readme.append("- [[TOP-TOPICS]] — leaderboard (generated after backlink pass)")
readme.append("- `topics/` — topic hub pages (generated after backlink pass)")
readme.append("- `tools/` — framework/service hub pages")
readme.append("- `entities/` — products, companies, people")
(VAULT / "README.md").write_text("\n".join(readme) + "\n", encoding="utf-8")
print(f"\n✓ Wrote vault at {VAULT}")
print(f" {len(all_projects)} projects, {len(parsed)} sessions")
if __name__ == "__main__":
main()
3. Split work into 10 balanced agent slices
The next step spawns 10 parallel sub-agents that each need a disjoint list of files. This script does the LPT (longest-processing-time) bin-packing: sort files largest-first, then hand each to whichever slice is currently lightest, so no single agent ends up with all the big sessions — see the toy sketch below.
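A toy version of the same greedy rule, shrunk to 2 bins instead of 10 (not part of the pipeline, just an illustration):

```python
# LPT in miniature: "file sizes" packed into 2 bins.
sizes = [9, 7, 5, 4, 3]
bins = [[], []]
for s in sorted(sizes, reverse=True):
    min(bins, key=sum).append(s)  # lightest bin takes the next-largest item
print(bins)  # [[9, 4], [7, 5, 3]] — totals 13 and 15, nearly balanced
```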
$ python3 ~/ClaudeCodeBrain/_build/scripts/split_assignments.py
scripts/split_assignments.py
#!/usr/bin/env python3
"""Split session markdown files into 10 balanced slices for parallel backlink agents."""
from __future__ import annotations
import json
from pathlib import Path
VAULT = Path.home() / "ClaudeCodeBrain"
N_AGENTS = 10
ASSIGN_DIR = VAULT / "_build" / "assignments"
def main():
ASSIGN_DIR.mkdir(parents=True, exist_ok=True)
# Collect all session and subagent files with their byte size
files: list[tuple[Path, int]] = []
for proj_dir in sorted((VAULT / "projects").iterdir()):
if not proj_dir.is_dir():
continue
for sub in ("sessions", "subagents"):
d = proj_dir / sub
if d.exists():
for f in sorted(d.iterdir()):
if f.suffix == ".md" and not f.name.startswith("_"):
files.append((f, f.stat().st_size))
# Greedy LPT (longest-processing-time) bin packing by size
# Sort largest first, put each into the currently smallest bin
files.sort(key=lambda x: -x[1])
bins: list[dict] = [{"files": [], "total": 0} for _ in range(N_AGENTS)]
for f, sz in files:
target = min(bins, key=lambda b: b["total"])
target["files"].append(str(f))
target["total"] += sz
summary = []
for i, b in enumerate(bins, 1):
out = ASSIGN_DIR / f"agent_{i:02d}.txt"
out.write_text("\n".join(b["files"]) + "\n", encoding="utf-8")
summary.append({
"agent": i,
"files": len(b["files"]),
"bytes": b["total"],
"kb": round(b["total"] / 1024),
})
report = {
"total_files": len(files),
"total_bytes": sum(s["bytes"] for s in summary),
"agents": summary,
}
print(json.dumps(report, indent=2))
(ASSIGN_DIR / "_split_report.json").write_text(json.dumps(report, indent=2), encoding="utf-8")
if __name__ == "__main__":
main()
4. Spawn 10 backlink agents in Claude Code
Open a Claude Code session in your home directory. Paste the prompt below 10 times, changing the NN from 01 through 10 each time. Each agent processes ~10% of your sessions in parallel.
Or — paste this orchestrator prompt and let Claude dispatch all 10 for you:
Spawn 10 parallel backlink agents using the Agent tool. Each agent gets a number 01-10. Brief each one with the instructions at ~/ClaudeCodeBrain/_build/agent_instructions.md and its assignment list at ~/ClaudeCodeBrain/_build/assignments/agent_NN.txt. Set run_in_background: true for all 10. Reports go to ~/ClaudeCodeBrain/_build/reports/agent_NN.json.
The instructions each agent reads:
prompts/backlink_terminal_agent.md
# Backlink Agent Instructions
You are one of 10 parallel agents extracting topics/tools/entities from Claude Code session markdown files. Every agent has its own disjoint slice — DO NOT touch files outside your assignment list.
## Your inputs
- **Assignment file**: `~/ClaudeCodeBrain/_build/assignments/agent_NN.txt` (where NN is your number, e.g. `01`, `02`). One file path per line.
- **Vault root**: `~/ClaudeCodeBrain/`
## What to extract per session
For each session file in your assignment list:
1. **topics** — 3 to 8 kebab-case slugs naming what the session was about
(good: `lead-capture`, `vercel-deploy`, `instagram-scraper`, `prompt-design`)
(bad: `code`, `help`, `debug`, `general`, `coding`)
Be specific. If the session is just a one-shot prompt-suggestion subagent or has very little content, return 1-2 topics; don't pad.
2. **tools** — frameworks, libraries, services, or named tooling actually used or discussed
(e.g. `nextjs`, `supabase`, `apify`, `obsidian`, `wordpress`, `react`, `claude-code`, `n8n`, `gh-cli`)
Lowercase, kebab-case. Skip if none.
3. **entities** — products, companies, people, real estate listings, brands by name
(e.g. `team-wong-hawaii`, `michael-wong`, `openai`, `notion`, `nch`, `kailua`)
Skip if none.
4. **summary** — ONE sentence (max 25 words) describing what the session actually accomplished or attempted. Past tense. No filler.
## What to write per session
For each file, **make exactly one Edit call** that replaces the empty frontmatter fields with populated values AND appends the wikilink section. You can do this in a single Edit by replacing a multi-line block.
**Find:**
```
summary:
topics: []
tools: []
entities: []
---
```
**Replace with:**
```
summary: "<your summary>"
topics: [topic-1, topic-2, topic-3]
tools: [tool-1, tool-2]
entities: [entity-1]
---
```
Then make a SECOND Edit (or append using a single edit by including more context) to add at the end of the file:
```
## 🔗 Topics & Entities
- **Topics:** [[topics/topic-1]] · [[topics/topic-2]] · [[topics/topic-3]]
- **Tools:** [[tools/tool-1]] · [[tools/tool-2]]
- **Entities:** [[entities/entity-1]]
```
Skip a line if its list is empty.
The cleanest pattern is to do BOTH edits in one Edit call by replacing the closing frontmatter `---\n` plus the next line of the heading with frontmatter+heading+backlinks. Or do two separate edits. Either works.
## Your report
After processing every file in your assignment, write a JSON report to:
`~/ClaudeCodeBrain/_build/reports/agent_NN.json`
Shape:
```json
{
"agent": NN,
"files_processed": <int>,
"files_failed": <int>,
"errors": ["path: reason", ...],
"sessions": [
{
"file": "/absolute/path.md",
"session_id": "<uuid>",
"project": "<kebab-case>",
"topics": ["topic-1", "topic-2"],
"tools": ["nextjs"],
"entities": ["openai"],
"summary": "..."
},
...
]
}
```
`session_id` and `project` come from the YAML frontmatter — don't guess.
## Rules
- **Read-only on `topics/`, `tools/`, `entities/` directories** — those hub pages are written by the consolidation pass, not by you. Editing them = race condition.
- **Don't read or edit files outside your assignment list.**
- **Don't use the Bash tool to grep across the vault** — process only your assignment files.
- If a file fails to parse or edit, log the error and continue. Don't abort.
- Keep your responses minimal. The work is the file edits and the final JSON report.
- DO NOT run any tools in `_build/scripts/` or modify any other agent's reports.
- Process files in batches of ~5 using parallel tool calls (multiple Read calls in one message, then multiple Edit calls in one message) for speed.
## When to skip
- If a session is trivially short (<5 lines of body content) AND has no extractable signal, write minimal frontmatter with `topics: [unclassified]` and a one-line summary, but still add the wikilink section pointing to `[[topics/unclassified]]`.
- If a file has already been processed (frontmatter `topics:` is not `[]`), skip it but still include it in your report with whatever values are in the frontmatter.
When you're done, write the JSON report and respond with: `Done. {N} files processed, {E} errors.`
Wait for all 10 completion notifications. Each agent works through ~25 files in 5–7 minutes; total wall-clock time is set by the slowest agent.
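Before consolidating, it's worth sanity-checking that all 10 reports actually landed — a minimal sketch, reading the report shape defined above:

```python
#!/usr/bin/env python3
"""Quick check: did every backlink agent write its report?"""
import json
from pathlib import Path

reports_dir = Path.home() / "ClaudeCodeBrain" / "_build" / "reports"
reports = sorted(reports_dir.glob("agent_*.json"))
print(f"{len(reports)}/10 reports present")
for r in reports:
    data = json.loads(r.read_text(encoding="utf-8"))
    print(f"  {r.name}: {data.get('files_processed', 0)} processed, "
          f"{data.get('files_failed', 0)} failed")
```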
5. Consolidate the reports into topic / tool / entity hubs
Reads all 10 JSON reports, builds inverted indexes, and writes one markdown page per topic/tool/entity into topics/, tools/, entities/. Also writes the TOP-TOPICS.md leaderboard.
$ python3 ~/ClaudeCodeBrain/_build/scripts/consolidate.py
scripts/consolidate.py
#!/usr/bin/env python3
"""Consolidate the 10 backlink agent reports into hub pages.
Reads:
_build/reports/agent_*.json
Writes (atomically — only this script touches these dirs):
topics/<slug>.md — one per topic, with sessions grouped by project
tools/<slug>.md — one per tool/framework
entities/<slug>.md — one per entity
TOP-TOPICS.md — leaderboard sorted by frequency
"""
from __future__ import annotations
import argparse
import json
import re
from collections import defaultdict
from pathlib import Path
VAULT = Path.home() / "ClaudeCodeBrain"
REPORTS_DIR = VAULT / "_build" / "reports"
WEB_REPORTS_DIR = VAULT / "_build" / "web_reports"
FM_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
def slug_clean(s: str) -> str:
s = (s or "").strip().lower()
s = re.sub(r"[^a-z0-9\s-]", "", s)
s = re.sub(r"\s+", "-", s)
s = re.sub(r"-+", "-", s).strip("-")
return s
def parse_frontmatter(text: str) -> dict:
m = FM_RE.match(text)
if not m:
return {}
fm: dict = {}
for line in m.group(1).splitlines():
if ":" not in line:
continue
key, _, value = line.partition(":")
fm[key.strip()] = value.strip()
return fm
def parse_list(value: str) -> list[str]:
value = (value or "").strip()
if not value or value == "[]":
return []
if value.startswith("[") and value.endswith("]"):
inner = value[1:-1].strip()
if not inner:
return []
return [
item.strip().strip('"').strip("'")
for item in inner.split(",")
if item.strip().strip('"').strip("'")
]
return [value.strip().strip('"').strip("'")]
def sessions_from_frontmatter(existing_files: set[str]) -> list[dict]:
"""Load session metadata directly from markdown for files missing reports.
Walks both projects/ (terminal sessions) and web-archive/conversations/.
"""
sessions: list[dict] = []
candidates: list[Path] = []
candidates.extend(sorted((VAULT / "projects").glob("*/*/*.md")))
web_dir = VAULT / "web-archive" / "conversations"
if web_dir.exists():
candidates.extend(sorted(web_dir.iterdir()))
for md in candidates:
if md.suffix != ".md" or md.name.startswith("_"):
continue
# Filter projects/ to sessions/subagents only
if "/projects/" in str(md) and md.parent.name not in ("sessions", "subagents"):
continue
md_s = str(md)
if md_s in existing_files:
continue
try:
text = md.read_text(encoding="utf-8", errors="replace")
except Exception:
continue
fm = parse_frontmatter(text)
if not fm:
continue
topics = parse_list(fm.get("topics", ""))
tools = parse_list(fm.get("tools", ""))
entities = parse_list(fm.get("entities", ""))
if not (topics or tools or entities):
continue
# Determine project: terminal session uses parent dir; web uses "web-archive"
if "/web-archive/" in md_s:
project = "web-archive"
sid = fm.get("conversation_id", fm.get("session_id", ""))
else:
project = fm.get("project") or md.parent.parent.name
sid = fm.get("session_id", "")
sessions.append({
"file": md_s,
"session_id": sid,
"project": project,
"summary": fm.get("summary", "").strip('"').strip("'"),
"topics": topics,
"tools": tools,
"entities": entities,
})
return sessions
def session_link(session: dict) -> str:
"""Build a wikilink to the session file using project + filename."""
file_path = session.get("file", "")
if not file_path:
return f"`{session.get('session_id', 'unknown')}`"
p = Path(file_path)
stem = p.stem
title = session.get("summary") or stem
title = title.replace("|", "\\|").replace("[", "(").replace("]", ")")
if len(title) > 100:
title = title[:97] + "…"
# Web archive: linked under web-archive/conversations/
if "/web-archive/" in str(p):
return f"[[web-archive/conversations/{stem}|{title}]]"
project = session.get("project") or p.parent.parent.name
parent_kind = p.parent.name # "sessions" or "subagents"
return f"[[projects/{project}/{parent_kind}/{stem}|{title}]]"
def write_hub_page(kind: str, slug: str, sessions: list[dict], outdir: Path) -> None:
outdir.mkdir(parents=True, exist_ok=True)
by_project: dict[str, list[dict]] = defaultdict(list)
for s in sessions:
by_project[s.get("project", "unknown")].append(s)
title = slug.replace("-", " ").title()
lines = [
"---",
f"type: {kind}-hub",
f"slug: {slug}",
f"sessions: {len(sessions)}",
f"projects: {len(by_project)}",
"---",
"",
f"# {title}",
"",
f"**{len(sessions)} sessions** across **{len(by_project)} projects**.",
"",
]
for project in sorted(by_project, key=lambda p: -len(by_project[p])):
proj_sessions = by_project[project]
if project == "web-archive":
heading = f"## [[web-archive/README|web archive]] · {len(proj_sessions)}"
else:
heading = f"## [[projects/{project}/{project}|{project}]] · {len(proj_sessions)}"
lines.append(heading)
lines.append("")
for s in proj_sessions:
lines.append(f"- {session_link(s)}")
lines.append("")
out_path = outdir / f"{slug}.md"
out_path.write_text("\n".join(lines), encoding="utf-8")
def main():
parser = argparse.ArgumentParser(description="Build topic/tool/entity hub pages from agent reports.")
parser.add_argument(
"--min-sessions", type=int, default=1,
help="Only generate hub pages for topics/tools/entities that appear in at least "
"this many sessions. Default 1 (everything gets a page). Bump to 3+ if your "
"vault is large and the graph is overwhelmed by long-tail topics.",
)
args = parser.parse_args()
min_sessions = max(1, args.min_sessions)
if not REPORTS_DIR.exists():
raise SystemExit(f"!! reports dir missing: {REPORTS_DIR}")
# Aggregate all session entries from terminal AND web reports
all_sessions: list[dict] = []
reports_seen = 0
existing_files: set[str] = set()
report_dirs = [REPORTS_DIR]
if WEB_REPORTS_DIR.exists():
report_dirs.append(WEB_REPORTS_DIR)
for rd in report_dirs:
for report_path in sorted(rd.glob("agent_*.json")):
try:
data = json.loads(report_path.read_text(encoding="utf-8"))
except Exception as e:
print(f"!! could not parse {report_path}: {e}")
continue
reports_seen += 1
for s in data.get("sessions", []):
all_sessions.append(s)
if s.get("file"):
existing_files.add(str(s["file"]))
fallback_sessions = sessions_from_frontmatter(existing_files)
all_sessions.extend(fallback_sessions)
print(
f"Loaded {reports_seen} reports / {len(all_sessions)} session entries "
f"({len(fallback_sessions)} from markdown frontmatter)."
)
# Build inverted indexes
topic_index: dict[str, list[dict]] = defaultdict(list)
tool_index: dict[str, list[dict]] = defaultdict(list)
entity_index: dict[str, list[dict]] = defaultdict(list)
for s in all_sessions:
for t in s.get("topics", []) or []:
slug = slug_clean(t)
if slug:
topic_index[slug].append(s)
for t in s.get("tools", []) or []:
slug = slug_clean(t)
if slug:
tool_index[slug].append(s)
for e in s.get("entities", []) or []:
slug = slug_clean(e)
if slug:
entity_index[slug].append(s)
print(f" Unique topics: {len(topic_index)}")
print(f" Unique tools: {len(tool_index)}")
print(f" Unique entities: {len(entity_index)}")
# Wipe + regenerate hub dirs (atomic-ish — only this script writes here)
for sub in ("topics", "tools", "entities"):
d = VAULT / sub
if d.exists():
for old in d.glob("*.md"):
old.unlink()
written = {"topic": 0, "tool": 0, "entity": 0}
skipped = {"topic": 0, "tool": 0, "entity": 0}
for slug, sessions in topic_index.items():
if len(sessions) < min_sessions:
skipped["topic"] += 1
continue
write_hub_page("topic", slug, sessions, VAULT / "topics")
written["topic"] += 1
for slug, sessions in tool_index.items():
if len(sessions) < min_sessions:
skipped["tool"] += 1
continue
write_hub_page("tool", slug, sessions, VAULT / "tools")
written["tool"] += 1
for slug, sessions in entity_index.items():
if len(sessions) < min_sessions:
skipped["entity"] += 1
continue
write_hub_page("entity", slug, sessions, VAULT / "entities")
written["entity"] += 1
if min_sessions > 1:
print(f" Hub pages written (≥{min_sessions} sessions): "
f"{written['topic']} topics, {written['tool']} tools, {written['entity']} entities")
print(f" Long-tail dropped: {skipped['topic']} topics, {skipped['tool']} tools, {skipped['entity']} entities")
else:
print(f" Hub pages written: "
f"{written['topic']} topics, {written['tool']} tools, {written['entity']} entities")
# TOP-TOPICS leaderboard (top 50 by session count)
leaderboard = sorted(topic_index.items(), key=lambda kv: -len(kv[1]))
lines = ["# Top Topics", "", "Sorted by session count.", "",
"| Rank | Topic | Sessions |", "|-----:|-------|---------:|"]
for i, (slug, sessions) in enumerate(leaderboard[:50], 1):
lines.append(f"| {i} | [[topics/{slug}\\|{slug}]] | {len(sessions)} |")
lines.append("")
lines.append(f"## Tools (top 30)")
lines.append("")
lines.append("| Rank | Tool | Sessions |")
lines.append("|-----:|------|---------:|")
tool_lb = sorted(tool_index.items(), key=lambda kv: -len(kv[1]))
for i, (slug, sessions) in enumerate(tool_lb[:30], 1):
lines.append(f"| {i} | [[tools/{slug}\\|{slug}]] | {len(sessions)} |")
lines.append("")
lines.append(f"## Entities (top 30)")
lines.append("")
lines.append("| Rank | Entity | Sessions |")
lines.append("|-----:|--------|---------:|")
ent_lb = sorted(entity_index.items(), key=lambda kv: -len(kv[1]))
for i, (slug, sessions) in enumerate(ent_lb[:30], 1):
lines.append(f"| {i} | [[entities/{slug}\\|{slug}]] | {len(sessions)} |")
lines.append("")
(VAULT / "TOP-TOPICS.md").write_text("\n".join(lines), encoding="utf-8")
print(f"\n✓ Wrote {len(topic_index)} topic / {len(tool_index)} tool / "
f"{len(entity_index)} entity hub pages + TOP-TOPICS.md")
if __name__ == "__main__":
main()
Pass `--min-sessions 3` (e.g. `python3 ~/ClaudeCodeBrain/_build/scripts/consolidate.py --min-sessions 3`) to only create hub pages for topics that appear in 3 or more sessions. Long-tail topics stay in session frontmatter for full-text search but don't get their own graph node.
6. Refresh the project rollups + README
$ python3 ~/ClaudeCodeBrain/_build/scripts/rebuild_rollups.py
scripts/rebuild_rollups.py
#!/usr/bin/env python3
"""Walk the vault and rebuild every _project.md rollup + the top-level README.md.
Source of truth is the YAML frontmatter inside each session/subagent markdown.
Run after incremental exports to refresh rollups without parsing JSONL.
"""
from __future__ import annotations
import re
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path
VAULT = Path.home() / "ClaudeCodeBrain"
PROJECTS = VAULT / "projects"
FM_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
def parse_frontmatter(text: str) -> dict:
m = FM_RE.match(text)
if not m:
return {}
fm: dict = {}
for line in m.group(1).splitlines():
if ":" in line:
k, _, v = line.partition(":")
fm[k.strip()] = v.strip()
return fm
def parse_iso(s: str) -> datetime | None:
if not s:
return None
try:
if s.endswith("Z"):
s = s[:-1] + "+00:00"
return datetime.fromisoformat(s)
except Exception:
return None
def first_heading_or_summary(text: str, fm: dict) -> str:
summary = fm.get("summary", "").strip().strip('"').strip("'")
if summary:
return summary[:80]
# Fall back to first H1
for line in text.splitlines():
if line.startswith("# "):
return line[2:].strip()[:80]
return "(no title)"
def main():
if not PROJECTS.exists():
raise SystemExit(f"!! No projects dir at {PROJECTS}")
project_sessions: dict[str, list[dict]] = defaultdict(list)
project_subagents: dict[str, list[dict]] = defaultdict(list)
for proj_dir in sorted(PROJECTS.iterdir()):
if not proj_dir.is_dir():
continue
project = proj_dir.name
for sub, bucket in (("sessions", project_sessions), ("subagents", project_subagents)):
d = proj_dir / sub
if not d.exists():
continue
for f in sorted(d.iterdir()):
if f.suffix != ".md" or f.name.startswith("_"):
continue
try:
text = f.read_text(encoding="utf-8", errors="replace")
except Exception:
continue
fm = parse_frontmatter(text)
bucket[project].append({
"path": f,
"fm": fm,
"title": first_heading_or_summary(text, fm),
"first_ts": parse_iso(fm.get("started", "")),
"last_ts": parse_iso(fm.get("ended", "")),
"cwd": fm.get("cwd", ""),
"user_msgs": int(fm.get("user_messages", 0) or 0),
"asst_msgs": int(fm.get("assistant_messages", 0) or 0),
"tool_uses": int(fm.get("tool_uses", 0) or 0),
})
all_projects = sorted(set(list(project_sessions) + list(project_subagents)))
print(f"Rebuilding rollups for {len(all_projects)} projects…")
for project in all_projects:
sessions = project_sessions.get(project, [])
subagents = project_subagents.get(project, [])
sessions_sorted = sorted(
sessions, key=lambda r: r["first_ts"] or datetime.min.replace(tzinfo=timezone.utc),
reverse=True,
)
cwds = sorted({r["cwd"] for r in sessions if r["cwd"]})
first_seen = min((r["first_ts"] for r in sessions if r["first_ts"]), default=None)
last_seen = max((r["last_ts"] or r["first_ts"] for r in sessions if r["first_ts"]), default=None)
lines = ["---", f"project: {project}", "type: project-rollup"]
if cwds:
lines.append(f"cwds: {len(cwds)}")
lines.append(f"sessions: {len(sessions)}")
lines.append(f"subagents: {len(subagents)}")
if first_seen:
lines.append(f"first_seen: {first_seen.date()}")
if last_seen:
lines.append(f"last_seen: {last_seen.date()}")
lines.append("category: ")
lines.append("---")
lines.append("")
lines.append(f"# {project}")
lines.append("")
if cwds:
lines.append("**Working directories:**")
for c in cwds:
lines.append(f"- `{c}`")
lines.append("")
lines.append(f"**{len(sessions)} sessions** · **{len(subagents)} subagents**")
if first_seen and last_seen:
lines.append(f" · {first_seen.date()} → {last_seen.date()}")
lines.append("")
lines.append("## Sessions")
lines.append("")
lines.append("| Date | Title | Msgs | Tools |")
lines.append("|------|-------|------|-------|")
for r in sessions_sorted:
stem = r["path"].stem
d = (r["first_ts"] or datetime.now(timezone.utc)).date()
title = r["title"].replace("|", "\\|")
link = f"[[projects/{project}/sessions/{stem}|{title}]]"
msgs = r["user_msgs"] + r["asst_msgs"]
lines.append(f"| {d} | {link} | {msgs} | {r['tool_uses']} |")
lines.append("")
if subagents:
lines.append("## Subagent runs")
lines.append("")
for r in sorted(subagents, key=lambda r: r["first_ts"] or datetime.min.replace(tzinfo=timezone.utc), reverse=True):
stem = r["path"].stem
title = r["title"].replace("|", "\\|")
lines.append(f"- [[projects/{project}/subagents/{stem}|{title}]]")
lines.append("")
proj_dir = PROJECTS / project
(proj_dir / f"{project}.md").write_text("\n".join(lines), encoding="utf-8")
# README
readme = ["# Claude Code Brain", "",
f"Generated {datetime.now().strftime('%Y-%m-%d %H:%M')}",
"",
f"**{sum(len(v) for v in project_sessions.values())} sessions** across "
f"**{len(all_projects)} projects** "
f"({sum(len(v) for v in project_subagents.values())} subagent runs).",
"",
"## Projects",
""]
project_recency: dict[str, datetime] = {}
for p in all_projects:
latest = max(
((r["last_ts"] or r["first_ts"]) for r in project_sessions.get(p, []) if r["first_ts"]),
default=None,
)
project_recency[p] = latest or datetime.min.replace(tzinfo=timezone.utc)
sorted_projects = sorted(all_projects, key=lambda p: project_recency[p], reverse=True)
readme.append("| Project | Sessions | Subagents | Last activity |")
readme.append("|---------|---------:|----------:|---------------|")
for p in sorted_projects:
ses = len(project_sessions.get(p, []))
sub = len(project_subagents.get(p, []))
last = project_recency[p]
last_s = last.date().isoformat() if last and last.year > 1900 else "—"
readme.append(f"| [[projects/{p}/{p}\\|{p}]] | {ses} | {sub} | {last_s} |")
readme.append("")
readme.append("## Hubs")
readme.append("")
readme.append("- [[TOP-TOPICS]] — leaderboard")
readme.append("- `topics/` — topic hub pages")
readme.append("- `tools/` — framework/service hub pages")
readme.append("- `entities/` — products, companies, people")
(VAULT / "README.md").write_text("\n".join(readme) + "\n", encoding="utf-8")
print(f"✓ Rollups + README rebuilt for {len(all_projects)} projects")
if __name__ == "__main__":
main()
7. Open it in Obsidian
Launch Obsidian → Open folder as vault → pick ~/ClaudeCodeBrain/. Open README.md and confirm the project index renders. Hit Cmd+G (Mac) or Ctrl+G (Win) for the graph view.
SECTION 03
Install the recall skill — this is the point.
~ 3 minutes
Without this step, the vault is just a searchable archive. With it, Claude Code reads your past sessions before any code work whenever you use a trigger phrase like "let's work on X" or "check my brain for Y". That's the difference between an archive and a brain.
1. Drop the skill into Claude Code
$ mkdir -p ~/.claude/skills/claude-brain
$ cp skill/claude-brain/SKILL.md ~/.claude/skills/claude-brain/
skill/claude-brain/SKILL.md
---
name: claude-brain
description: Read the user's Claude Code Brain vault before working on a project. Triggers on phrases like "let's work on X", "let's pick up X", "last time on Y", "what did we do with Z", "check my brain for X", "what have I built with X", "remember X". The vault at ~/ClaudeCodeBrain/ contains every Claude Code session backlinked by topic, tool, and entity. This skill loads relevant past context BEFORE you touch any code so you don't re-prime, re-explain, or repeat past mistakes.
---
# Claude Brain — recall
The user has a standalone Obsidian vault at `~/ClaudeCodeBrain/` containing every Claude Code session they've ever had, organized by project, backlinked by topic, tool, and entity. This skill is the recall side: it loads relevant past context BEFORE you start coding.
## When to fire
Trigger phrases (any rough match):
- "let's work on <X>" / "let's pick up <X>" / "let's keep going on <X>"
- "last time on <X>" / "remind me where we left off on <X>"
- "check my brain for <X>" / "what have I done with <X>"
- "who is <X>?" / "what is <X>?" — when X looks like a project, person, or tool
- "what have I built with <tool>" / "any past work on <topic>"
Do NOT fire on:
- General coding questions ("how does Python decorator syntax work")
- Brand-new tasks with no historical context cue
- Requests where the user explicitly says "fresh start" or "ignore history"
## What to do (in order)
1. **Read** `~/ClaudeCodeBrain/README.md` — the project index with last-activity dates
2. **Identify** the relevant project, topic, tool, or entity from the user's phrase
3. **Open the right hub:**
- Project name match → `~/ClaudeCodeBrain/projects/<slug>/<slug>.md` (rollup with session table)
- Tool/framework name → `~/ClaudeCodeBrain/tools/<slug>.md`
- Topic name → `~/ClaudeCodeBrain/topics/<slug>.md`
- Person/product name → `~/ClaudeCodeBrain/entities/<slug>.md`
4. **Skim** the 2–5 most recent linked sessions (top of the rollup table or hub page). Read their YAML frontmatter and any visible summary first; only open the full body if you need detail.
5. **Surface** to the user a 1-paragraph "here's what I see in your brain" summary BEFORE starting any code: what they built, what was decided, any unfinished business or known issues.
6. **Then** proceed with the task. Don't re-explain things the brain already shows you know.
## Read-only contract
- This skill ONLY reads from `~/ClaudeCodeBrain/`. Never edit any file there during recall.
- New sessions get synced into the vault by the separate `/slay` command, which the user runs at the end of a session.
- If you find an obvious stale fact (e.g. memory says X exists but it doesn't), surface it instead of acting on it.
## Search shortcuts
- For fuzzy lookup across the whole vault, use `grep` over `~/ClaudeCodeBrain/projects/*/sessions/*.md` — the YAML frontmatter has `topics:`, `tools:`, `entities:` fields you can filter on.
- The graph view in Obsidian is for the human; you don't need it. Wikilinks in markdown (`[[topics/foo]]`) are sufficient for navigation.
## Edge cases
- **No matching project:** Tell the user explicitly. Don't make one up. Suggest 2–3 nearest matches from the README index.
- **Project hub exists but is sparse:** Fall back to topic/tool hubs that mention the same name.
- **Multiple matches** (e.g. "vision-studio-unleashed" merged from `*-test`): all sessions are now under the canonical name. Just open that.
- **User asks about a session that's clearly older than the vault** (predates first session date in README): say so plainly.
2. Drop the /slay sync command
$ mkdir -p ~/.claude/commands
$ cp commands/slay.md ~/.claude/commands/
commands/slay.md
---
name: slay
description: Sync new Claude Code sessions into the ~/ClaudeCodeBrain/ Obsidian vault. Adds any sessions written since the last run, refreshes project rollups and the top-level README, and leaves existing backlinks intact. Run at the end of a coding session (or whenever you remember).
---
# /slay — sync new sessions into the brain
Pulls in any Claude Code sessions written to `~/.claude/projects/` since your last sync and writes them into the Obsidian vault at `~/ClaudeCodeBrain/`. **Does NOT re-run the 10 backlink agents** — that's expensive. New sessions land with empty `topics: []` until you do a full re-pass.
## What you do
Run these commands in order. Don't skip the rollup rebuild — that's what makes the new sessions show up in the project index.
```bash
# 1. Add any new session markdown files (skips ones already in the vault)
python3 ~/ClaudeCodeBrain/_build/scripts/export.py --incremental
# 2. Refresh _project.md rollups + README.md from disk
python3 ~/ClaudeCodeBrain/_build/scripts/rebuild_rollups.py
# 3. (Optional) refresh hub pages from existing reports
python3 ~/ClaudeCodeBrain/_build/scripts/consolidate.py
```
Tell the user how many sessions were added and where to find the new files.
## When the user wants a full re-pass
If they say "do a full slay" or "rebuild backlinks" or "fresh brain":
1. Run the export script with NO `--incremental` flag (full rebuild — session bodies are rewritten and their frontmatter reset, so the backlink agent passes must be re-run afterwards).
2. Re-run the 10 parallel backlink agents (see `~/ClaudeCodeBrain/_build/agent_instructions.md`).
3. Re-run consolidate.py.
4. Re-run apply_renames.py if `_build/audit_plan.json` has changed.
Confirm with the user before doing the full re-pass — it's a substantial run, especially if they have a lot of new history.
## Common gotchas
- Don't run `apply_renames.py` automatically — only after the user has reviewed the audit plan and given approval.
- If the user's vault grew significantly (>50 new sessions), suggest the full re-pass so backlinks stay accurate.
- New sessions created during the *current* Claude Code session won't be in `~/.claude/projects/` yet — they're only flushed when the session ends.
## Done when
- New session markdown files exist under `~/ClaudeCodeBrain/projects/<project>/sessions/`.
- `_project.md` rollups for affected projects show the new sessions.
- Top-level `README.md` reflects updated session counts.
3. Test it
Quit any open Claude Code sessions and open a fresh one. Type one of these phrases (substitute your real project name):
let's work on my-project
Claude should pause, read ~/ClaudeCodeBrain/README.md and the matching project rollup, and reply with a summary of what you built before doing anything else. If it just dives into code, the skill didn't fire — double-check the path and the frontmatter in SKILL.md.
SECTION 04
Fold in your Claude.ai web archive.
~ 25 minutes, once the export email arrives
Same parse → backlink → consolidate pattern as section 02, just pointed at the Claude.ai web export. The result lives under web-archive/ in the same vault and gets woven into the same topic/tool/entity hubs.
1. Download & unzip the Claude.ai export
The email gives you a link to a zip. Unzip it anywhere; you'll point the importer at the resulting folder.
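For example (the zip and folder names vary by export; these paths are illustrative):
$ unzip ~/Downloads/data-2025-03-14-batch-0000.zip -d ~/claude-export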
2. Convert JSON → markdown
One file per conversation, with YAML frontmatter for date / title / message counts. Monthly index pages are generated automatically so the graph doesn't blow up.
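Illustrative frontmatter for one converted conversation (field names are what import_claude_web.py writes; values are made up — `topics`/`tools`/`entities` stay empty until the web backlink agents run):

```
---
conversation_id: 9a1b2c3d-0000-4000-8000-000000000000
source: claude-web
date: 2025-02-02
started: 2025-02-02T09:15:00+00:00
updated: 2025-02-02T09:48:12+00:00
user_messages: 6
assistant_messages: 6
summary: "Compared static site hosts and drafted a deploy checklist."
topics: []
tools: []
entities: []
---
```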
$ python3 ~/ClaudeCodeBrain/_build/scripts/import_claude_web.py /path/to/claude-export-dir
scripts/import_claude_web.py
#!/usr/bin/env python3
"""Import Claude.ai web export into the same Obsidian vault under web-archive/.
Usage:
python3 import_claude_web.py /path/to/data-XXXX-batch-0000
Reads conversations.json, memories.json, projects/, and writes one markdown
file per conversation under ~/ClaudeCodeBrain/web-archive/conversations/.
"""
from __future__ import annotations
import json
import re
import sys
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path
VAULT = Path.home() / "ClaudeCodeBrain"
ARCHIVE = VAULT / "web-archive"
def slugify(s: str, max_len: int = 60) -> str:
s = (s or "").strip().lower()
s = re.sub(r"[^\w\s-]", "", s)
s = re.sub(r"[\s_]+", "-", s)
s = re.sub(r"-+", "-", s).strip("-")
return s[:max_len] or "conversation"
def short_id(uuid_str: str) -> str:
return (uuid_str or "").replace("-", "")[:8] or "xxxxxxxx"
def parse_iso(s: str | None) -> datetime | None:
if not s:
return None
try:
if s.endswith("Z"):
s = s[:-1] + "+00:00"
return datetime.fromisoformat(s)
except Exception:
return None
def render_message(msg: dict) -> str:
sender = msg.get("sender", "?")
label = "👤 User" if sender == "human" else "🤖 Claude" if sender == "assistant" else f"❔ {sender}"
parts: list[str] = []
# Prefer the structured content list — may have multiple text blocks
content = msg.get("content")
if isinstance(content, list):
for item in content:
if not isinstance(item, dict):
continue
t = item.get("type")
if t == "text":
txt = item.get("text") or ""
if txt.strip():
parts.append(txt)
elif t == "tool_use":
name = item.get("name", "tool")
parts.append(f"*[tool call: {name}]*")
elif t == "tool_result":
tname = item.get("name", "tool")
parts.append(f"*[tool result: {tname}]*")
elif t == "image":
parts.append("*[image]*")
elif t == "thinking":
think = (item.get("thinking") or "").strip()
if think:
parts.append(f"<details><summary>💭 Thinking</summary>\n\n{think}\n\n</details>")
elif isinstance(content, str) and content.strip():
parts.append(content)
# Fallback to flat .text
if not parts:
text = msg.get("text") or ""
if text.strip():
parts.append(text)
# Note attachments / files separately
extras = []
for f in msg.get("files") or []:
name = f.get("file_name") or f.get("name") or "file"
extras.append(f"*[attached file: {name}]*")
for a in msg.get("attachments") or []:
name = a.get("file_name") or a.get("name") or "attachment"
extras.append(f"*[attachment: {name}]*")
body = "\n\n".join(parts).strip() or "*(empty)*"
if extras:
body = body + "\n\n" + "\n".join(extras)
return f"### {label}\n\n{body}"
def write_conversation(conv: dict, outdir: Path) -> Path | None:
msgs = conv.get("chat_messages") or []
if not msgs:
return None
name = conv.get("name") or "(untitled)"
uuid = conv.get("uuid") or ""
created = parse_iso(conv.get("created_at"))
updated = parse_iso(conv.get("updated_at"))
date_str = (created or datetime.now(timezone.utc)).strftime("%Y-%m-%d")
slug = slugify(name, max_len=80)
sid = short_id(uuid)
filename = f"{date_str}_{slug}_{sid}.md"
out_path = outdir / filename
user_msg_count = sum(1 for m in msgs if m.get("sender") == "human")
asst_msg_count = sum(1 for m in msgs if m.get("sender") == "assistant")
fm = [
"---",
f"conversation_id: {uuid}",
f"source: claude-web",
f"date: {date_str}",
]
if created:
fm.append(f"started: {created.isoformat()}")
if updated:
fm.append(f"updated: {updated.isoformat()}")
fm.append(f"user_messages: {user_msg_count}")
fm.append(f"assistant_messages: {asst_msg_count}")
if conv.get("summary"):
# Single-line summary
s = (conv["summary"] or "").replace("\n", " ").replace('"', "'")
fm.append(f'summary: "{s[:200]}"')
fm.append("topics: []")
fm.append("tools: []")
fm.append("entities: []")
fm.append("---")
body = "\n\n".join(render_message(m) for m in msgs)
contents = "\n".join(fm) + f"\n\n# {name}\n\n{body}\n"
out_path.write_text(contents, encoding="utf-8")
return out_path
def main():
if len(sys.argv) < 2:
print("usage: import_claude_web.py /path/to/export-dir", file=sys.stderr)
sys.exit(1)
src = Path(sys.argv[1])
if not src.exists():
print(f"!! source missing: {src}", file=sys.stderr)
sys.exit(1)
convs_path = src / "conversations.json"
if not convs_path.exists():
print(f"!! conversations.json missing in {src}", file=sys.stderr)
sys.exit(1)
ARCHIVE.mkdir(parents=True, exist_ok=True)
convs_dir = ARCHIVE / "conversations"
convs_dir.mkdir(exist_ok=True)
print(f"Loading {convs_path} ({convs_path.stat().st_size // 1024 // 1024} MB)…")
with open(convs_path, "r", encoding="utf-8") as f:
conversations = json.load(f)
print(f" {len(conversations)} conversations")
written = 0
by_year_month: dict[str, list[dict]] = defaultdict(list)
for i, conv in enumerate(conversations, 1):
if i % 100 == 0:
print(f" [{i}/{len(conversations)}]")
out = write_conversation(conv, convs_dir)
if out:
written += 1
ym = (parse_iso(conv.get("created_at")) or datetime.now(timezone.utc)).strftime("%Y-%m")
by_year_month[ym].append({
"uuid": conv.get("uuid"),
"name": conv.get("name") or "(untitled)",
"path": out,
"created": parse_iso(conv.get("created_at")),
"msgs": len(conv.get("chat_messages") or []),
})
print(f" wrote {written} conversation files")
# Per-month index files (avoids a single mega-README that creates
# a 1000-link starburst in Obsidian's graph view).
by_month_dir = ARCHIVE / "by-month"
by_month_dir.mkdir(exist_ok=True)
for old in by_month_dir.glob("*.md"):
old.unlink()
for ym, items in by_year_month.items():
items_sorted = sorted(
items,
key=lambda x: x["created"] or datetime.min.replace(tzinfo=timezone.utc),
reverse=True,
)
lines = [
"---",
"type: web-archive-month",
f"month: {ym}",
f"conversations: {len(items_sorted)}",
"---",
"",
f"# Web archive — {ym}",
"",
f"**{len(items_sorted)} conversations**",
"",
"| Date | Title | Msgs |",
"|------|-------|-----:|",
]
for it in items_sorted:
d = (it["created"] or datetime.now(timezone.utc)).date()
stem = it["path"].stem
title = it["name"].replace("|", "\\|")[:90]
lines.append(f"| {d} | [[web-archive/conversations/{stem}\\|{title}]] | {it['msgs']} |")
(by_month_dir / f"{ym}.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
# Slim top-level README that links to monthly indexes (not to every conversation)
months_sorted = sorted(by_year_month.keys(), reverse=True)
readme = [
"# Claude Web Archive",
"",
f"Imported from `{src.name}` on {datetime.now().strftime('%Y-%m-%d %H:%M')}.",
"",
f"**{written} conversations** spanning "
f"**{min(by_year_month.keys()) if by_year_month else '—'}** → "
f"**{max(by_year_month.keys()) if by_year_month else '—'}**.",
"",
"## By month",
"",
]
for ym in months_sorted:
readme.append(f"- [[web-archive/by-month/{ym}|{ym}]] — {len(by_year_month[ym])} conversations")
readme.append("")
readme.append("## Other")
readme.append("")
readme.append("- [[web-archive/memories|memories]] — Claude memory export, if present")
readme.append("- `projects/` — Claude.ai project knowledge files")
(ARCHIVE / "README.md").write_text("\n".join(readme) + "\n", encoding="utf-8")
# Memories
mem_path = src / "memories.json"
if mem_path.exists():
try:
mem_data = json.loads(mem_path.read_text(encoding="utf-8"))
mem_lines = ["# Claude Memory", "",
"Imported from `memories.json`.",
""]
if isinstance(mem_data, list) and mem_data:
obj = mem_data[0]
cm = obj.get("conversations_memory")
pm = obj.get("project_memories")
if cm:
mem_lines.append("## Conversations memory")
mem_lines.append("")
if isinstance(cm, str):
mem_lines.append(cm)
else:
mem_lines.append("```json")
mem_lines.append(json.dumps(cm, indent=2)[:8000])
mem_lines.append("```")
mem_lines.append("")
if pm:
mem_lines.append("## Project memories")
mem_lines.append("")
mem_lines.append("```json")
mem_lines.append(json.dumps(pm, indent=2)[:8000])
mem_lines.append("```")
mem_lines.append("")
(ARCHIVE / "memories.md").write_text("\n".join(mem_lines), encoding="utf-8")
print(" wrote memories.md")
except Exception as e:
print(f" !! failed to write memories.md: {e}")
# Projects (each is a JSON describing a Claude.ai project)
projects_dir = src / "projects"
if projects_dir.exists():
out_proj = ARCHIVE / "projects"
out_proj.mkdir(exist_ok=True)
for pf in sorted(projects_dir.glob("*.json")):
try:
pdata = json.loads(pf.read_text(encoding="utf-8"))
except Exception:
continue
name = pdata.get("name") or pf.stem
slug = slugify(name)
description = pdata.get("description") or ""
instr = pdata.get("prompt_template") or pdata.get("custom_instructions") or ""
files = pdata.get("docs") or pdata.get("files") or []
lines = [
"---",
f"project_id: {pdata.get('uuid', pf.stem)}",
f"source: claude-web-project",
f"name: \"{(name or '').replace(chr(10), ' ').replace(chr(34), chr(39))[:120]}\"",
"---",
"",
f"# {name}",
"",
]
if description:
lines.extend([f"**Description:** {description}", ""])
if instr:
lines.extend(["## Instructions", "", instr, ""])
if files:
lines.extend(["## Files", ""])
for f in files:
fn = f.get("file_name") or f.get("name") or "(unnamed)"
fc = f.get("content") or ""
lines.append(f"<details><summary>{fn}</summary>\n\n```\n{fc[:8000]}\n```\n\n</details>")
lines.append("")
(out_proj / f"{slug}-{short_id(pdata.get('uuid', ''))}.md").write_text(
"\n".join(lines), encoding="utf-8"
)
print(f" wrote {len(list(out_proj.glob('*.md')))} project files")
print(f"\n✓ Web archive built at {ARCHIVE}")
if __name__ == "__main__":
main()
3. Split + dispatch 10 web backlink agents
$ mkdir -p ~/ClaudeCodeBrain/_build/{web_assignments,web_reports}
$ python3 ~/ClaudeCodeBrain/_build/scripts/split_web_assignments.py
scripts/split_web_assignments.py
#!/usr/bin/env python3
"""Split web archive conversation files into 10 balanced slices for parallel backlink agents."""
from __future__ import annotations
import json
from pathlib import Path
VAULT = Path.home() / "ClaudeCodeBrain"
WEB_CONVS = VAULT / "web-archive" / "conversations"
N_AGENTS = 10
ASSIGN_DIR = VAULT / "_build" / "web_assignments"
def main():
ASSIGN_DIR.mkdir(parents=True, exist_ok=True)
files: list[tuple[Path, int]] = []
for f in sorted(WEB_CONVS.iterdir()):
if f.suffix == ".md":
files.append((f, f.stat().st_size))
files.sort(key=lambda x: -x[1])
bins: list[dict] = [{"files": [], "total": 0} for _ in range(N_AGENTS)]
for f, sz in files:
target = min(bins, key=lambda b: b["total"])
target["files"].append(str(f))
target["total"] += sz
summary = []
for i, b in enumerate(bins, 1):
out = ASSIGN_DIR / f"agent_{i:02d}.txt"
out.write_text("\n".join(b["files"]) + "\n", encoding="utf-8")
summary.append({"agent": i, "files": len(b["files"]), "kb": round(b["total"] / 1024)})
report = {"total_files": len(files), "agents": summary}
print(json.dumps(report, indent=2))
(ASSIGN_DIR / "_split_report.json").write_text(json.dumps(report, indent=2), encoding="utf-8")
if __name__ == "__main__":
main()
The web archive has a different frontmatter shape than terminal sessions (no tool_uses, has conversation_id, etc.), so the agents use a separate prompt:
prompts/backlink_web_agent.md
# Web Archive Backlink Agent Instructions
You are one of 10 parallel agents extracting topics/tools/entities from Claude.ai **web archive** conversation files. Each agent has its own disjoint slice — DO NOT touch files outside your assignment list.
## Your inputs
- **Assignment file**: `/Users/michaelwong/ClaudeCodeBrain/_build/web_assignments/agent_NN.txt` (NN = your number, e.g. `01`)
- **Vault root**: `/Users/michaelwong/ClaudeCodeBrain/`
- **Output report**: `/Users/michaelwong/ClaudeCodeBrain/_build/web_reports/agent_NN.json`
## Frontmatter shape (different from terminal sessions!)
Web archive files look like this:
```yaml
---
conversation_id: <uuid>
source: claude-web
date: 2024-03-12
started: 2024-03-12T...
updated: 2024-03-12T...
user_messages: 5
assistant_messages: 5
summary: "..." # optional, may not exist
topics: []
entities: []
---
```
**Important:** there is NO `tools:` line. You need to ADD it during the edit.
## What to extract
For each conversation in your assignment:
1. **topics** — 3 to 8 kebab-case slugs (good: `nano-banana-pro`, `prompt-engineering`, `kailua-real-estate`; bad: `chat`, `help`)
2. **tools** — services / frameworks / named software discussed (e.g. `nano-banana-pro`, `obsidian`, `n8n`, `wordpress`, `skool`, `twilio`). Lowercase, kebab-case.
3. **entities** — named products, companies, people, places (e.g. `team-wong-hawaii`, `michael-wong`, `claude`, `gpt-4`, `kailua`)
4. **summary** — ONE sentence (max 25 words), past tense. Skip if a `summary:` line already exists with content.
Be specific. Short conversations (1–2 message pairs) often only justify 1–2 topics — don't pad.
## The edit
For each file, do **one Edit call** that:
**Find:**
```
topics: []
entities: []
---
```
**Replace with:**
```
topics: [topic-1, topic-2, topic-3]
tools: [tool-1, tool-2]
entities: [entity-1]
---
## 🔗 Topics & Entities
- **Topics:** [[topics/topic-1]] · [[topics/topic-2]] · [[topics/topic-3]]
- **Tools:** [[tools/tool-1]] · [[tools/tool-2]]
- **Entities:** [[entities/entity-1]]
```
Skip the corresponding line if a list is empty (e.g. no tools → omit the **Tools:** line).
If `summary: "..."` already exists in frontmatter, leave it alone — only update topics/tools/entities. If there's no summary line, add one above topics:
```
summary: "<your one-sentence summary>"
topics: [...]
tools: [...]
entities: [...]
---
```
## Reading the file
Read each file with the Read tool — default 2000 lines is plenty. Most web conversations are 1–20 message pairs. For very long ones, the first user message + scanning the first assistant response is usually enough to extract topics.
## Your report
Write JSON to `/Users/michaelwong/ClaudeCodeBrain/_build/web_reports/agent_NN.json`:
```json
{
"agent": NN,
"files_processed": <int>,
"files_failed": <int>,
"errors": [],
"sessions": [
{
"file": "/absolute/path.md",
"session_id": "<conversation_id>",
"project": "web-archive",
"topics": [...],
"tools": [...],
"entities": [...],
"summary": "..."
},
...
]
}
```
The consolidate script merges these reports with the terminal-session reports and groups everything into the same topic/tool/entity hubs. Use `"project": "web-archive"` so they bucket separately within hub pages.
## Rules
- **Read-only on `topics/`, `tools/`, `entities/`** — those get rewritten by the consolidation pass.
- **Process files in parallel batches of 5** — multiple Read calls in one message, then multiple Edit calls in one message.
- **Don't open files outside your assignment.**
- **Skip files where `topics:` is already populated** (not `[]`). Include them in the report with the existing values.
- If you hit a parse error or weird file, log it in `errors` and continue.
When done, write the JSON report and respond with: `Done. {N} files processed, {E} errors.`
Same orchestration as before — open Claude Code and paste an orchestrator prompt that spawns 10 agents using backlink_web_agent.md and the web_assignments/agent_NN.txt files, with reports written to web_reports/. A minimal sketch of such a prompt:
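This is a sketch rather than the canonical orchestrator prompt; adjust paths for your machine:

```
Spawn 10 parallel sub-agents, numbered 01–10. Each agent NN must:
1. Read its slice list from ~/ClaudeCodeBrain/_build/web_assignments/agent_NN.txt
2. Follow the instructions in prompts/backlink_web_agent.md for every file in its slice
3. Write its JSON report to ~/ClaudeCodeBrain/_build/web_reports/agent_NN.json
Wait for all 10 to finish, then summarize files processed and errors per agent.
```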
4. Re-consolidate to merge web + terminal into the same hubs
$ python3 ~/ClaudeCodeBrain/_build/scripts/consolidate.py
That's it. The same topics/n8n-workflow.md page now lists every conversation from both sources that mentioned n8n, grouped by data source.
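If you're curious what that merge amounts to: the shipped consolidate.py does more (tools/entities hubs, rollups, thresholds), but the core is a group-by over the agent reports. A minimal sketch, assuming only the report shape documented above:

```python
#!/usr/bin/env python3
"""Illustrative sketch: group agent reports into topic hub pages."""
import json
from collections import defaultdict
from pathlib import Path

VAULT = Path.home() / "ClaudeCodeBrain"

# Terminal-session reports and web-archive reports merge into the same hubs
topic_sessions: dict[str, list[dict]] = defaultdict(list)
for report_dir in ("reports", "web_reports"):
    for report in sorted((VAULT / "_build" / report_dir).glob("agent_*.json")):
        for sess in json.loads(report.read_text(encoding="utf-8")).get("sessions", []):
            for topic in sess.get("topics", []):
                topic_sessions[topic].append(sess)

(VAULT / "topics").mkdir(exist_ok=True)
for topic, sessions in topic_sessions.items():
    lines = [f"# {topic}", ""]
    by_source = defaultdict(list)  # "web-archive" buckets separately from projects
    for s in sessions:
        by_source[s.get("project", "unknown")].append(s)
    for source, items in sorted(by_source.items()):
        lines.append(f"## {source}")
        lines.extend(f"- [[{Path(s['file']).stem}]]: {s.get('summary', '')}" for s in items)
        lines.append("")
    (VAULT / "topics" / f"{topic}.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
```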
SECTION 05
Fold in your ChatGPT archive when it lands.
~ 25 minutes, 1–3 days from now
ChatGPT's export takes 1–3 days, which is why you requested it first in section 01. When the email lands, you'll repeat the section 04 pattern with the ChatGPT importer.
1. Download & unzip the ChatGPT export
ChatGPT sends a zip with one or more conversations.json / conversations-XXX.json files plus some auxiliary metadata.
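Assuming the zip landed in ~/Downloads (the actual filename varies per export):
$ unzip ~/Downloads/chatgpt-export.zip -d ~/chatgpt-export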
2. Convert JSON → markdown
$ python3 ~/ClaudeCodeBrain/_build/scripts/import_chatgpt.py /path/to/chatgpt-export-dir
scripts/import_chatgpt.py
#!/usr/bin/env python3
"""Import a ChatGPT export into the same Obsidian vault under web-archive/.
ChatGPT's export ZIP contains one or more `conversations*.json` files plus
some auxiliary metadata. Schema is different from Claude.ai's export.
Usage:
python3 import_chatgpt.py /path/to/chatgpt-export-dir
Writes:
web-archive/conversations/<date>_<slug>_<id8>.md (one per conversation)
web-archive/by-month/<YYYY-MM>.md (monthly index pages)
web-archive/README.md (top-level archive index)
Each conversation file uses the SAME frontmatter shape as Claude.ai imports
so the consolidation pass and the claude-brain skill don't have to special-case them.
⚠️ This is a working skeleton. ChatGPT's export schema can change between
releases, so when you run this for the first time, inspect a sample
conversations*.json and adjust message-walking code to match.
The key thing ChatGPT does differently:
- Each conversation has a `mapping` dict keyed by message UUID
- Messages form a tree (one user can have multiple regen siblings).
We follow the `current_node` chain by default to get the displayed thread.
- Roles are "user" / "assistant" / "system" / "tool".
- Content lives under `message.content.parts` (list of strings).
"""
from __future__ import annotations
import json
import re
import sys
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path
VAULT = Path.home() / "ClaudeCodeBrain"
ARCHIVE = VAULT / "web-archive"
def slugify(s: str, max_len: int = 60) -> str:
s = (s or "").strip().lower()
s = re.sub(r"[^\w\s-]", "", s)
s = re.sub(r"[\s_]+", "-", s)
s = re.sub(r"-+", "-", s).strip("-")
return s[:max_len] or "conversation"
def short_id(uuid_str: str) -> str:
return (uuid_str or "").replace("-", "")[:8] or "xxxxxxxx"
def parse_chatgpt_timestamp(ts) -> datetime | None:
"""ChatGPT exports use Unix epoch floats for `create_time` / `update_time`."""
if ts is None:
return None
try:
return datetime.fromtimestamp(float(ts), tz=timezone.utc)
except (ValueError, TypeError):
return None
def walk_displayed_thread(conv: dict) -> list[dict]:
"""Walk from the conversation's `current_node` up to root, returning the
visible message thread in chronological order.
ChatGPT's tree can have regen siblings; `current_node` is the leaf of the
thread the user last saw, so this gives the "displayed" conversation.
"""
mapping = conv.get("mapping", {}) or {}
current = conv.get("current_node")
if not current or current not in mapping:
# Fall back: any leaf
leaves = [k for k, v in mapping.items() if not v.get("children")]
current = leaves[0] if leaves else None
chain: list[str] = []
node_id = current
seen = set()
while node_id and node_id in mapping and node_id not in seen:
seen.add(node_id)
chain.append(node_id)
node_id = mapping[node_id].get("parent")
chain.reverse()
messages = []
for nid in chain:
node = mapping.get(nid, {})
msg = node.get("message") or {}
if not msg:
continue
author = msg.get("author") or {}
role = author.get("role") or "user"
if role == "system":
continue # skip system noise
content = msg.get("content") or {}
parts = content.get("parts") if isinstance(content, dict) else None
if parts:
text = "\n\n".join(str(p) for p in parts if p)
else:
text = content.get("text", "") if isinstance(content, dict) else ""
if not text.strip():
continue
messages.append({
"role": role,
"text": text.strip(),
"create_time": msg.get("create_time"),
})
return messages
def render_messages(messages: list[dict]) -> str:
blocks: list[str] = []
for m in messages:
role = m["role"]
label = "👤 User" if role == "user" else "🤖 ChatGPT" if role == "assistant" else f"❔ {role}"
blocks.append(f"### {label}\n\n{m['text']}")
return "\n\n".join(blocks)
def write_conversation(conv: dict, outdir: Path) -> tuple[Path, datetime] | None:
messages = walk_displayed_thread(conv)
if not messages:
return None
title = conv.get("title") or "(untitled)"
cid = conv.get("conversation_id") or conv.get("id") or ""
created = parse_chatgpt_timestamp(conv.get("create_time"))
updated = parse_chatgpt_timestamp(conv.get("update_time"))
date_str = (created or datetime.now(timezone.utc)).strftime("%Y-%m-%d")
sid = short_id(cid)
filename = f"{date_str}_{slugify(title, max_len=80)}_{sid}.md"
out_path = outdir / filename
user_msg_count = sum(1 for m in messages if m["role"] == "user")
asst_msg_count = sum(1 for m in messages if m["role"] == "assistant")
fm = [
"---",
f"conversation_id: {cid}",
"source: chatgpt-web",
f"date: {date_str}",
]
if created:
fm.append(f"started: {created.isoformat()}")
if updated:
fm.append(f"updated: {updated.isoformat()}")
fm.append(f"user_messages: {user_msg_count}")
fm.append(f"assistant_messages: {asst_msg_count}")
fm.append("topics: []")
fm.append("tools: []")
fm.append("entities: []")
fm.append("---")
body = render_messages(messages)
out_path.write_text("\n".join(fm) + f"\n\n# {title}\n\n{body}\n", encoding="utf-8")
return out_path, (created or datetime.now(timezone.utc))
def main():
if len(sys.argv) < 2:
print("usage: import_chatgpt.py /path/to/chatgpt-export-dir", file=sys.stderr)
sys.exit(1)
src = Path(sys.argv[1])
if not src.exists():
print(f"!! source missing: {src}", file=sys.stderr)
sys.exit(1)
# ChatGPT exports may name the file conversations.json or conversations-XXX.json
conv_files = list(src.glob("conversations*.json"))
if not conv_files:
print(f"!! no conversations*.json found in {src}", file=sys.stderr)
sys.exit(1)
ARCHIVE.mkdir(parents=True, exist_ok=True)
convs_dir = ARCHIVE / "conversations"
convs_dir.mkdir(exist_ok=True)
conversations: list[dict] = []
for f in conv_files:
print(f"Loading {f.name} ({f.stat().st_size // 1024 // 1024} MB)…")
with open(f, "r", encoding="utf-8") as fh:
data = json.load(fh)
if isinstance(data, list):
conversations.extend(data)
elif isinstance(data, dict):
conversations.append(data)
print(f" {len(conversations)} conversations total")
by_year_month: dict[str, list[dict]] = defaultdict(list)
written = 0
for i, conv in enumerate(conversations, 1):
if i % 100 == 0:
print(f" [{i}/{len(conversations)}]")
result = write_conversation(conv, convs_dir)
if result:
out_path, created = result
written += 1
ym = created.strftime("%Y-%m")
by_year_month[ym].append({
"name": conv.get("title") or "(untitled)",
"path": out_path,
"created": created,
"msgs": len(walk_displayed_thread(conv)),
})
print(f" wrote {written} conversation files")
# Per-month index files
by_month_dir = ARCHIVE / "by-month"
by_month_dir.mkdir(exist_ok=True)
# NOTE: we DON'T wipe by_month/ here — Claude.ai exports may already
# have written months. We update months we touch.
for ym, items in by_year_month.items():
target = by_month_dir / f"{ym}.md"
items_sorted = sorted(
items,
key=lambda x: x["created"] or datetime.min.replace(tzinfo=timezone.utc),
reverse=True,
)
lines = [
"---",
"type: web-archive-month",
f"month: {ym}",
f"conversations: {len(items_sorted)}",
"---",
"",
f"# Web archive — {ym}",
"",
f"**{len(items_sorted)} ChatGPT conversations imported**",
"",
"| Date | Title | Msgs |",
"|------|-------|-----:|",
]
for it in items_sorted:
d = (it["created"] or datetime.now(timezone.utc)).date()
stem = it["path"].stem
title = it["name"].replace("|", "\\|")[:90]
lines.append(f"| {d} | [[web-archive/conversations/{stem}\\|{title}]] | {it['msgs']} |")
# If a Claude.ai month index already exists, append the ChatGPT block
if target.exists():
existing = target.read_text(encoding="utf-8")
target.write_text(existing + "\n\n---\n\n" + "\n".join(lines) + "\n", encoding="utf-8")
else:
target.write_text("\n".join(lines) + "\n", encoding="utf-8")
print(f"\n✓ ChatGPT archive merged into {ARCHIVE}")
if __name__ == "__main__":
main()
⚠️ Inspect a sample conversations.json entry before running on the full set — verify the mapping tree, current_node, and message.content.parts still match what the script expects.
3. Same 10-agent backlink + consolidate pattern
Re-run split_web_assignments.py (it picks up new files), dispatch 10 agents with the web backlink prompt, then re-run consolidate.py.
Now your topics/prompt-engineering.md page has rows from your earliest ChatGPT explorations through your most recent Claude Code work. One brain, three data sources.
SECTION 06
Keep it alive: the /slay sync loop.
~ 30 seconds, weekly
Every time you finish a meaningful Claude Code session and want it folded into the brain, run /slay from any Claude Code window:
/slay
It runs an incremental export (skipping sessions that are already in the vault, so your backlinks aren't clobbered) and refreshes the project rollups + README. New sessions land with empty topics: [] until the next time you run a full backlink pass.
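The skip check is simple in principle. A minimal sketch of the idea, assuming the <date>_<slug>_<id8>.md filename convention the exporter uses (illustrative, not the shipped code):

```python
from pathlib import Path

VAULT = Path.home() / "ClaudeCodeBrain"

def already_exported(project: str, session_id: str) -> bool:
    """True if a markdown file for this session already exists in the vault.

    Session filenames end in _<id8>.md, so match on the first 8 hex chars of
    the session uuid and leave the existing (already backlinked) file untouched.
    """
    id8 = session_id.replace("-", "")[:8]
    sessions_dir = VAULT / "projects" / project / "sessions"
    if not sessions_dir.exists():
        return False
    return any(sessions_dir.glob(f"*_{id8}.md"))
```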
When to re-run the 10-agent backlink pass
- You've accumulated 30+ new sessions since last backlink
- You started a new big project and want it properly indexed
- Search starts feeling "off" — old hubs don't reflect recent work
The same flow from section 02: split_assignments.py → 10 agents → consolidate.py → rebuild_rollups.py.
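Concretely, that's three commands plus one agent dispatch inside Claude Code between the first and second (paths assume the section 02 layout):
$ python3 ~/ClaudeCodeBrain/_build/scripts/split_assignments.py
$ python3 ~/ClaudeCodeBrain/_build/scripts/consolidate.py
$ python3 ~/ClaudeCodeBrain/_build/scripts/rebuild_rollups.py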
The skill never needs maintenance
It's read-only: it just reads whatever's in the vault. As long as you keep the vault current with /slay, the skill stays useful.
SECTION 07
Gotchas & pitfalls.
Read before you hit them
- Case-only renames on macOS: APFS is case-insensitive, so mv Home home silently deletes data. If you ever need to rename a folder case-only, go via a temp name first: Home → __tmp_home → home.
- Secrets in transcripts: before sharing the vault, grep for sk-, sk-ant-, AIzaSy, AKIA, and JWT signatures (eyJ...). Real keys often get pasted into chats and end up baked into the markdown. A starter grep follows this list.
- Odd agent filenames: export.py handles these by hashing the agent filename stem.
- import_claude_web.py writes per-month index files automatically.
- import_chatgpt.py is a working stub that follows the displayed-thread (current_node) chain; verify that a couple of conversations render correctly before processing the full export.
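A starter grep for the secrets sweep (the patterns are rough heuristics, not exhaustive):
$ grep -rEn "sk-ant-|sk-[A-Za-z0-9]{20}|AIzaSy|AKIA|eyJ[A-Za-z0-9_-]{10}" --include="*.md" ~/ClaudeCodeBrain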
APPENDIX
Scaling tips for big vaults.
If your archive is 500+ conversations or your topic count balloons past a few hundred, here are optional tweaks. Skip unless you actually hit the issue.
Long-tail topic filtering
Re-run consolidate with --min-sessions N to only render hub pages for topics that connect ≥ N sessions. The long-tail topics stay in session frontmatter (so full-text search still finds them) but don't clutter the graph.
$ python3 ~/ClaudeCodeBrain/_build/scripts/consolidate.py --min-sessions 3
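Inside consolidate.py this amounts to a threshold check before rendering. A tiny illustration (the function and sample data are made up; only the flag semantics match the flag above):

```python
def hub_topics(sessions_by_topic: dict[str, list[str]], min_sessions: int) -> list[str]:
    """Topics connecting at least min_sessions sessions earn a hub page; the
    rest survive only in session frontmatter, where full-text search finds them."""
    return sorted(t for t, files in sessions_by_topic.items() if len(files) >= min_sessions)

# hub_topics({"n8n-workflow": ["a.md", "b.md", "c.md"], "one-off": ["d.md"]}, min_sessions=3)
# -> ["n8n-workflow"]
```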
Messy folder names from worktrees
If your terminal projects have a lot of git worktrees, your projects/ directory may have duplicates like my-project, my-project-worktree-foo. Run an audit agent (read-only) to produce a rename plan, review it, then apply:
$ python3 ~/ClaudeCodeBrain/_build/scripts/apply_renames.py
scripts/apply_renames.py
#!/usr/bin/env python3
"""Apply the audit plan: rename project folders, merge where requested,
and rewrite all wikilinks across the vault.
Reads:
_build/audit_plan.json (produced by audit agent)
Notes:
macOS APFS is case-insensitive. Case-only renames are done via a __tmp_ name first.
"""
from __future__ import annotations
import json
import re
import shutil
import sys
from pathlib import Path
VAULT = Path.home() / "ClaudeCodeBrain"
PROJECTS = VAULT / "projects"
AUDIT_PLAN = VAULT / "_build" / "audit_plan.json"
def safe_rename(src: Path, dst: Path) -> None:
"""Rename src → dst, handling case-only renames safely on APFS."""
if src == dst:
return
if src.name.lower() == dst.name.lower() and src.parent == dst.parent:
# Case-only rename: go via __tmp_ first
tmp = src.parent / f"__tmp_{src.name}"
src.rename(tmp)
tmp.rename(dst)
return
src.rename(dst)
def merge_project(src: Path, dst: Path) -> None:
"""Merge src project into dst, copying sessions/subagents and removing src."""
dst.mkdir(parents=True, exist_ok=True)
for sub in ("sessions", "subagents"):
s = src / sub
d = dst / sub
if not s.exists():
continue
d.mkdir(parents=True, exist_ok=True)
for f in s.iterdir():
target = d / f.name
if target.exists():
# Suffix to avoid collision
i = 2
while target.exists():
target = d / f"{f.stem}_{i}{f.suffix}"
i += 1
shutil.move(str(f), str(target))
# Drop the src directory (its _project.md will get regenerated)
shutil.rmtree(src, ignore_errors=True)
def rewrite_wikilinks(vault: Path, name_map: dict[str, str]) -> int:
"""Walk every .md and rewrite [[projects/<old>/...]] references."""
if not name_map:
return 0
# Build a single regex that matches any old name
keys = sorted(name_map.keys(), key=len, reverse=True)
pattern = re.compile(
r"(\[\[projects/)("
+ "|".join(re.escape(k) for k in keys)
+ r")(/)"
)
# Also rewrite the YAML `project:` field
project_field = re.compile(r"(?m)^project:\s*(\S+)\s*$")
# cwd values in session frontmatter are intentionally left alone
edits = 0
for md in vault.rglob("*.md"):
if md.is_dir() or "/_build/" in str(md):
continue
try:
text = md.read_text(encoding="utf-8")
except Exception:
continue
new_text = pattern.sub(lambda m: m.group(1) + name_map[m.group(2)] + m.group(3), text)
def _project_sub(m):
old = m.group(1).strip()
if old in name_map:
return f"project: {name_map[old]}"
return m.group(0)
new_text = project_field.sub(_project_sub, new_text)
if new_text != text:
md.write_text(new_text, encoding="utf-8")
edits += 1
return edits
def main():
if not AUDIT_PLAN.exists():
print(f"!! audit plan missing: {AUDIT_PLAN}", file=sys.stderr)
sys.exit(1)
plan = json.loads(AUDIT_PLAN.read_text(encoding="utf-8"))
projects_plan = plan.get("projects", [])
# Build name map (old → new) and merge plan (old → merge_target)
name_map: dict[str, str] = {}
merges: list[tuple[str, str]] = []
for entry in projects_plan:
old = entry["current_name"]
new = entry.get("canonical_name") or old
merge_into = entry.get("merge_into")
if merge_into:
merges.append((old, merge_into))
elif old != new:
name_map[old] = new
print(f"Plan: {len(name_map)} renames, {len(merges)} merges")
# 1) Apply merges first (move files, then delete src dirs)
for old, target in merges:
src = PROJECTS / old
dst = PROJECTS / target
if not src.exists():
print(f" skip merge {old} → {target}: source missing")
continue
if not dst.exists():
print(f" rename instead of merge: {old} → {target} (target missing)")
safe_rename(src, dst)
continue
print(f" merging {old} → {target}")
merge_project(src, dst)
# After merge, links to the old name should point to target
name_map[old] = target
# 2) Apply renames
for old, new in list(name_map.items()):
src = PROJECTS / old
dst = PROJECTS / new
if not src.exists():
# likely already merged
continue
if dst.exists() and src != dst:
# collision — merge instead
print(f" collision on rename: {old} → {new}, merging")
merge_project(src, dst)
continue
if old != new:
print(f" renaming {old} → {new}")
safe_rename(src, dst)
# 3) Rewrite wikilinks across the vault
edits = rewrite_wikilinks(VAULT, name_map)
print(f" rewrote wikilinks in {edits} files")
# 4) Rollups + README for affected projects.
# Re-running export.py to regenerate everything would clobber backlinks,
# so leave the rollups alone; consolidate.py refreshes hub pages from reports.
# Update the README.md project list in place:
readme_path = VAULT / "README.md"
if readme_path.exists():
text = readme_path.read_text(encoding="utf-8")
for old, new in name_map.items():
text = text.replace(f"projects/{old}/_project", f"projects/{new}/_project")
text = text.replace(f"|{old}]]", f"|{new}]]")
readme_path.write_text(text, encoding="utf-8")
print(f"\n✓ Renames + merges applied. Final project count: "
f"{len([p for p in PROJECTS.iterdir() if p.is_dir()])}")
if __name__ == "__main__":
main()
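For reference, this is the plan shape the script reads. The keys (projects, current_name, canonical_name, merge_into) come straight from the code above; the values here are invented examples:

```json
{
  "projects": [
    {"current_name": "my-project-worktree-foo", "merge_into": "my-project"},
    {"current_name": "Home", "canonical_name": "home"}
  ]
}
```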
Customizing the graph view
Obsidian's graph view supports color groups (search-query → color) and CSS snippets. See Obsidian's graph docs. A reasonable starting config: dim everything under web-archive/conversations, color your active projects bright, hide README + monthly indexes from the visible graph via the filter -file:README -path:web-archive/by-month.
FINAL
You're done when…
- ChatGPT export requested (1–3 day timer started)
- Claude.ai export downloaded (~5 min)
- ~/ClaudeCodeBrain/ opens in Obsidian and the project index renders
- 10 backlink agent reports under _build/reports/
- topics/, tools/, entities/ have hub pages
- TOP-TOPICS.md shows your real workstreams in the leaderboard
- ~/.claude/skills/claude-brain/SKILL.md exists
- ~/.claude/commands/slay.md exists
- Fresh Claude Code session — "let's work on X" → Claude reads the vault before responding
- Claude.ai web archive folded in (section 04 done)
- ChatGPT archive folded in (section 05 done, after the email arrives)
- API keys grepped + redacted before sharing the vault