Field Guide · May 2026

Turn every AI chat into one searchable brain, then teach Claude to read it.

A complete walkthrough: terminal Claude Code history, Claude.ai web exports, and ChatGPT exports, all unified into a single Obsidian vault with auto-recall wired up. ~60 minutes of work today + 30 minutes when your exports arrive.

SECTION 01

Setup, plus kick off your exports in the background.

~ 5 minutes active, then waiting

Two things start the timer: the ChatGPT export takes 1–3 days; the Claude.ai export takes about 5 minutes. Request both right now so you don't sit around later.

What you need

  • Claude Code — the CLI (install docs)
  • Obsidian — the viewer for the vault (obsidian.md, free)
  • Python 3.10+ — for the parser scripts
  • ~500 MB free disk for a typical 1–2 year history

Request your ChatGPT export FIRST (it's the slow one)

  1. Go to chatgpt.com → Settings → Data Controls → Export data
  2. Click Export. You'll get an email in 1–3 days with a zip.
  3. Move on. We'll handle it in section 05 when it arrives.

Request your Claude.ai export (5-minute timer)

  1. Go to claude.ai → profile icon → Settings → Privacy → Export data
  2. Click Export. An email lands in about 5 minutes with a download.

Clone this repo (or download the scripts)

$ git clone https://github.com/M1w234/claude-brain-builder ~/claude-brain-builder
$ cd ~/claude-brain-builder

Every script and prompt referenced in the next sections is embedded inline below each step as a collapsible <details> block with a Copy button. You can either work from the cloned repo or copy each script from this page.

Why three data sources? Your Claude Code (terminal) work, your Claude.ai web chats, and your ChatGPT chats each tell a different part of the story. Unified, they become one searchable archive that Claude can consult. Keeping them separate would mean re-priming context every time you switched between them.

SECTION 02

Build the engine from your terminal Claude Code sessions.

~ 20 minutes

Claude Code stores every conversation locally as JSONL under ~/.claude/projects/. This section parses that into markdown, fans out 10 parallel sub-agents to backlink it by topic / tool / entity, and consolidates the result into hub pages.
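For orientation, here is roughly what one line of that JSONL looks like. The sample below is hypothetical — the real schema is undocumented and carries many more fields — but these are the keys the export script below actually reads:

```python
import json

# Hypothetical sample line; real session files carry many more fields,
# but type/sessionId/cwd/timestamp/message are the ones the parser uses.
sample = (
    '{"type": "user",'
    ' "sessionId": "3f2a9c1e-0000-0000-0000-000000000000",'
    ' "cwd": "/Users/you/my-project",'
    ' "timestamp": "2026-05-01T12:00:00Z",'
    ' "message": {"role": "user", "content": "fix the login redirect bug"}}'
)
obj = json.loads(sample)
print(obj["type"], obj["cwd"])  # → user /Users/you/my-project
```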

1. Scaffold the vault

$ mkdir -p ~/ClaudeCodeBrain/{projects,topics,tools,entities,_build/scripts,_build/reports,_build/assignments}
$ cp scripts/*.py ~/ClaudeCodeBrain/_build/scripts/

2. Parse your terminal JSONL → markdown

Reads every .jsonl file under ~/.claude/projects/, dedupes sessions by ID (worktree cwds collapse to their root project), wraps each tool call in a collapsible <details> block, caps tool-result content at ~4000 chars to keep files readable, and writes one markdown file per session.

$ python3 ~/ClaudeCodeBrain/_build/scripts/export.py
scripts/export.py · 544 lines
#!/usr/bin/env python3
"""Export ~/.claude/projects/ JSONL into an Obsidian vault.

Layout produced:
  projects/<project>/sessions/<date>_<slug>_<id8>.md
  projects/<project>/subagents/<parent-id8>__<agent-id8>.md
  projects/<project>/_project.md
  README.md
"""
from __future__ import annotations

import hashlib
import json
import re
import sys
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

HOME = Path.home()
SRC = HOME / ".claude" / "projects"
VAULT = HOME / "ClaudeCodeBrain"
TOOL_RESULT_CAP = 4000
USER_MSG_PREVIEW = 80
SUMMARY_LEN = 160


def slugify(s: str, max_len: int = 60) -> str:
    s = (s or "").strip().lower()
    s = re.sub(r"[^\w\s-]", "", s)
    s = re.sub(r"[\s_]+", "-", s)
    s = re.sub(r"-+", "-", s).strip("-")
    return s[:max_len] or "session"


def project_name_from_cwd(cwd: str) -> str:
    """Derive a kebab-case project slug from a cwd path.

    Handles worktrees by stripping everything from /.claude/worktrees/...
    """
    if not cwd:
        return "unknown"
    # Strip worktree path
    if "/.claude/worktrees/" in cwd:
        cwd = cwd.split("/.claude/worktrees/")[0]
    # Special case: home directory itself
    home = str(HOME)
    if cwd == home:
        return "home"
    # Take last segment
    name = Path(cwd).name
    return slugify(name) or "unknown"


def short_id(uuid_str: str) -> str:
    if not uuid_str:
        return "xxxxxxxx"
    return uuid_str.replace("-", "")[:8]


def parse_iso_ts(ts: str) -> datetime | None:
    if not ts:
        return None
    try:
        if ts.endswith("Z"):
            ts = ts[:-1] + "+00:00"
        return datetime.fromisoformat(ts)
    except Exception:
        return None


def render_content_item(item: dict, ctx: dict) -> str:
    """Render a single content item from a message into markdown."""
    t = item.get("type", "")
    if t == "text":
        return (item.get("text") or "").strip()
    if t == "thinking":
        thinking = (item.get("thinking") or "").strip()
        if not thinking:
            return ""
        return f"<details><summary>💭 Thinking</summary>\n\n{thinking}\n\n</details>"
    if t == "tool_use":
        name = item.get("name", "Tool")
        input_obj = item.get("input", {})
        ctx["tool_uses"][item.get("id", "")] = name
        try:
            input_str = json.dumps(input_obj, indent=2, ensure_ascii=False)
        except Exception:
            input_str = str(input_obj)
        # Pull a one-line summary from common fields
        summary_bits = []
        if isinstance(input_obj, dict):
            for key in ("description", "command", "file_path", "pattern", "url", "prompt"):
                v = input_obj.get(key)
                if v:
                    s = str(v).split("\n", 1)[0]
                    summary_bits.append(f"`{key}`: {s[:100]}")
                    break
        summary = " — ".join(summary_bits) if summary_bits else ""
        head = f"🔧 **{name}**" + (f" — {summary}" if summary else "")
        return (
            f"<details><summary>{head}</summary>\n\n"
            f"```json\n{input_str}\n```\n\n</details>"
        )
    if t == "tool_result":
        content = item.get("content", "")
        if isinstance(content, list):
            parts = []
            for c in content:
                if isinstance(c, dict):
                    if c.get("type") == "text":
                        parts.append(c.get("text", ""))
                    elif c.get("type") == "image":
                        parts.append("[image]")
                    else:
                        parts.append(str(c))
                else:
                    parts.append(str(c))
            content = "\n".join(parts)
        elif not isinstance(content, str):
            content = str(content)
        full_len = len(content)  # measure before truncating; item["content"] may be a list
        if full_len > TOOL_RESULT_CAP:
            content = content[:TOOL_RESULT_CAP] + f"\n\n…[truncated {full_len - TOOL_RESULT_CAP} chars]"
        is_error = item.get("is_error")
        tu_id = item.get("tool_use_id", "")
        tool_name = ctx["tool_uses"].get(tu_id, "tool")
        flag = " ⚠️ error" if is_error else ""
        return (
            f"<details><summary>📤 {tool_name} result{flag}</summary>\n\n"
            f"```\n{content}\n```\n\n</details>"
        )
    if t == "image":
        return "[image]"
    return ""


def render_message(line: dict, ctx: dict) -> str | None:
    """Render one JSONL line into a markdown block, or None to skip."""
    t = line.get("type")
    msg = line.get("message", {})
    if t == "user":
        if not isinstance(msg, dict):
            return None
        content = msg.get("content", "")
        # Tool results often live inside user messages as content list items
        if isinstance(content, list):
            rendered = [render_content_item(c, ctx) for c in content if isinstance(c, dict)]
            rendered = [r for r in rendered if r]
            if not rendered:
                return None
            # If everything is a tool_result, label section accordingly
            kinds = {c.get("type") for c in content if isinstance(c, dict)}
            if kinds == {"tool_result"}:
                return "\n\n".join(rendered)
            return "### 👤 User\n\n" + "\n\n".join(rendered)
        if isinstance(content, str):
            text = content.strip()
            if not text:
                return None
            # Stash first user prompt as session slug source
            if "first_user_text" not in ctx:
                ctx["first_user_text"] = text
            return f"### 👤 User\n\n{text}"
        return None
    if t == "assistant":
        if not isinstance(msg, dict):
            return None
        content = msg.get("content", [])
        if isinstance(content, list):
            rendered = [render_content_item(c, ctx) for c in content if isinstance(c, dict)]
            rendered = [r for r in rendered if r]
            if not rendered:
                return None
            return "### 🤖 Assistant\n\n" + "\n\n".join(rendered)
        return None
    if t == "attachment":
        att = line.get("attachment", {}) or {}
        kind = att.get("type") or "attachment"
        return f"*[attachment: {kind}]*"
    return None


def parse_session(jsonl_path: Path) -> dict | None:
    """Read a session JSONL and return a dict with messages, cwd, metadata."""
    cwd = None
    session_id = jsonl_path.stem
    is_sidechain = False
    parent_session_id = None
    first_ts = None
    last_ts = None
    rendered_blocks: list[str] = []
    ctx = {"tool_uses": {}}
    user_msg_count = 0
    assistant_msg_count = 0
    tool_use_count = 0

    try:
        with open(jsonl_path, "r", encoding="utf-8", errors="replace") as f:
            for raw in f:
                raw = raw.strip()
                if not raw:
                    continue
                try:
                    obj = json.loads(raw)
                except Exception:
                    continue
                t = obj.get("type")
                if t in ("queue-operation", "last-prompt", "system"):
                    continue
                if cwd is None and obj.get("cwd"):
                    cwd = obj["cwd"]
                if obj.get("isSidechain"):
                    is_sidechain = True
                if obj.get("sessionId"):
                    session_id = obj["sessionId"]
                ts = obj.get("timestamp")
                if ts:
                    parsed = parse_iso_ts(ts)
                    if parsed:
                        if first_ts is None or parsed < first_ts:
                            first_ts = parsed
                        if last_ts is None or parsed > last_ts:
                            last_ts = parsed
                if t == "user":
                    msg = obj.get("message", {})
                    c = msg.get("content") if isinstance(msg, dict) else None
                    is_pure_tool_result = isinstance(c, list) and all(
                        isinstance(x, dict) and x.get("type") == "tool_result" for x in c
                    )
                    if not is_pure_tool_result:
                        user_msg_count += 1
                if t == "assistant":
                    assistant_msg_count += 1
                    msg = obj.get("message", {})
                    c = msg.get("content") if isinstance(msg, dict) else None
                    if isinstance(c, list):
                        for it in c:
                            if isinstance(it, dict) and it.get("type") == "tool_use":
                                tool_use_count += 1
                block = render_message(obj, ctx)
                if block:
                    rendered_blocks.append(block)
    except Exception as e:
        print(f"  !! parse error in {jsonl_path}: {e}", file=sys.stderr)
        return None

    if not rendered_blocks:
        return None

    # Detect parent session for subagents from path: .../<parent-uuid>/subagents/agent-xxx.jsonl
    # Subagent JSONLs share the parent's sessionId, so override with file stem for uniqueness.
    agent_file_id = None
    if jsonl_path.parent.name == "subagents":
        parent_session_id = jsonl_path.parent.parent.name
        is_sidechain = True
        agent_file_id = jsonl_path.stem  # e.g. "agent-ad4b252958c8059b3"

    return {
        "session_id": session_id,
        "agent_file_id": agent_file_id,
        "cwd": cwd,
        "is_sidechain": is_sidechain,
        "parent_session_id": parent_session_id,
        "first_ts": first_ts,
        "last_ts": last_ts,
        "blocks": rendered_blocks,
        "ctx": ctx,
        "user_msg_count": user_msg_count,
        "assistant_msg_count": assistant_msg_count,
        "tool_use_count": tool_use_count,
        "jsonl_path": str(jsonl_path),
    }


def write_session_md(session: dict, vault: Path) -> tuple[Path, str]:
    cwd = session["cwd"]
    if not cwd:
        cwd = "/Users/unknown"
    project = project_name_from_cwd(cwd)

    sid8 = short_id(session["session_id"])
    first_ts = session["first_ts"] or datetime.now(timezone.utc)
    date_str = first_ts.strftime("%Y-%m-%d")

    summary_seed = (session["ctx"].get("first_user_text") or "").split("\n", 1)[0]
    slug = slugify(summary_seed[:80]) if summary_seed else "session"

    if session["is_sidechain"]:
        out_dir = vault / "projects" / project / "subagents"
        parent_id8 = short_id(session["parent_session_id"] or "")
        agent_id = session.get("agent_file_id") or session["session_id"]
        # Hash the full agent file stem so named agents (agent-X-Y) don't collide
        agent_hash = hashlib.sha1(agent_id.encode()).hexdigest()[:10]
        # Try to keep a readable slug from named agents (e.g. "prompt_suggestion")
        readable = re.sub(r"^agent-?a?", "", agent_id)
        readable_slug = slugify(readable, max_len=30)
        filename = f"{parent_id8}__{readable_slug}_{agent_hash}.md"
    else:
        out_dir = vault / "projects" / project / "sessions"
        filename = f"{date_str}_{slug}_{sid8}.md"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename

    # Frontmatter
    fm_lines = [
        "---",
        f"session_id: {session['session_id']}",
        f"project: {project}",
        f"cwd: {cwd}",
        f"date: {date_str}",
        f"started: {first_ts.isoformat()}",
    ]
    if session["last_ts"]:
        fm_lines.append(f"ended: {session['last_ts'].isoformat()}")
    fm_lines.append(f"is_subagent: {str(session['is_sidechain']).lower()}")
    if session["parent_session_id"]:
        fm_lines.append(f"parent_session: {session['parent_session_id']}")
    fm_lines.append(f"user_messages: {session['user_msg_count']}")
    fm_lines.append(f"assistant_messages: {session['assistant_msg_count']}")
    fm_lines.append(f"tool_uses: {session['tool_use_count']}")
    fm_lines.append("summary: ")
    fm_lines.append("topics: []")
    fm_lines.append("tools: []")
    fm_lines.append("entities: []")
    fm_lines.append("---")
    fm = "\n".join(fm_lines)

    # Heading + body
    title_seed = summary_seed or "(no user prompt)"
    title = title_seed[:SUMMARY_LEN].strip()
    if session["is_sidechain"]:
        parent_link = ""
        if session["parent_session_id"]:
            parent_link = f"\n\n*Subagent of parent session `{short_id(session['parent_session_id'])}`*"
        header = f"# 🤖 Subagent: {title}{parent_link}"
    else:
        header = f"# {title}"

    body = "\n\n".join(session["blocks"])
    return out_path, f"{fm}\n\n{header}\n\n{body}\n"


def collect_existing_session_ids(vault: Path) -> set[str]:
    """Scan existing session markdown frontmatter for session_id values."""
    seen: set[str] = set()
    for md in (vault / "projects").rglob("*.md"):
        if md.name.startswith("_"):
            continue
        try:
            head = md.read_text(encoding="utf-8", errors="replace")[:600]
        except Exception:
            continue
        for line in head.splitlines():
            if line.startswith("session_id:"):
                seen.add(line.split(":", 1)[1].strip())
                break
    return seen


def main():
    incremental = "--incremental" in sys.argv
    if not SRC.exists():
        print(f"!! Source missing: {SRC}", file=sys.stderr)
        sys.exit(1)
    VAULT.mkdir(parents=True, exist_ok=True)

    existing_ids: set[str] = set()
    if incremental:
        existing_ids = collect_existing_session_ids(VAULT)
        print(f"Incremental mode: {len(existing_ids)} sessions already in vault.")

    # Walk all JSONL files
    jsonl_paths: list[Path] = []
    for p in SRC.rglob("*.jsonl"):
        # Skip empty files
        try:
            if p.stat().st_size == 0:
                continue
        except OSError:
            continue
        jsonl_paths.append(p)

    print(f"Found {len(jsonl_paths)} JSONL files. Parsing…")

    parsed: list[dict] = []
    skipped_empty = 0
    seen_session_ids: set[str] = set()
    for i, p in enumerate(jsonl_paths, 1):
        if i % 25 == 0:
            print(f"  [{i}/{len(jsonl_paths)}]")
        s = parse_session(p)
        if s is None:
            skipped_empty += 1
            continue
        # Dedupe main sessions by session_id (same UUID may exist in two cwd folders).
        # Subagents share the parent's sessionId, so don't dedupe them and don't pollute the set.
        if not s["is_sidechain"]:
            if s["session_id"] in seen_session_ids:
                skipped_empty += 1
                continue
            seen_session_ids.add(s["session_id"])
        # Incremental: skip sessions already in the vault (preserves backlinks)
        if incremental and s["session_id"] in existing_ids and not s["is_sidechain"]:
            skipped_empty += 1
            continue
        parsed.append(s)

    print(f"Parsed {len(parsed)} non-empty sessions ({skipped_empty} skipped).")

    # Write session files
    project_sessions: dict[str, list[dict]] = defaultdict(list)
    project_subagents: dict[str, list[dict]] = defaultdict(list)
    for s in parsed:
        project = project_name_from_cwd(s["cwd"] or "")
        result = write_session_md(s, VAULT)
        if result is None:
            continue
        out_path, contents = result
        out_path.write_text(contents, encoding="utf-8")
        if s["is_sidechain"]:
            project_subagents[project].append({"session": s, "path": out_path})
        else:
            project_sessions[project].append({"session": s, "path": out_path})

    print(f"Wrote {sum(len(v) for v in project_sessions.values())} main sessions, "
          f"{sum(len(v) for v in project_subagents.values())} subagents across "
          f"{len(set(list(project_sessions) + list(project_subagents)))} projects.")

    # Project rollups
    all_projects = sorted(set(list(project_sessions) + list(project_subagents)))
    for project in all_projects:
        sessions = project_sessions.get(project, [])
        subagents = project_subagents.get(project, [])
        sessions_sorted = sorted(
            sessions,
            key=lambda r: r["session"]["first_ts"] or datetime.min.replace(tzinfo=timezone.utc),
            reverse=True,
        )
        cwds = sorted({r["session"]["cwd"] for r in sessions if r["session"]["cwd"]})
        first_seen = min(
            (r["session"]["first_ts"] for r in sessions if r["session"]["first_ts"]),
            default=None,
        )
        last_seen = max(
            (r["session"]["last_ts"] for r in sessions if r["session"]["last_ts"]),
            default=None,
        )

        lines = ["---", f"project: {project}", "type: project-rollup"]
        if cwds:
            lines.append(f"cwds: {len(cwds)}")
        lines.append(f"sessions: {len(sessions)}")
        lines.append(f"subagents: {len(subagents)}")
        if first_seen:
            lines.append(f"first_seen: {first_seen.date()}")
        if last_seen:
            lines.append(f"last_seen: {last_seen.date()}")
        lines.append("category: ")
        lines.append("---")
        lines.append("")
        lines.append(f"# {project}")
        lines.append("")
        if cwds:
            lines.append("**Working directories:**")
            for c in cwds:
                lines.append(f"- `{c}`")
            lines.append("")
        stats = f"**{len(sessions)} sessions** · **{len(subagents)} subagents**"
        if first_seen and last_seen:
            stats += f" · {first_seen.date()} → {last_seen.date()}"
        lines.append(stats)
        lines.append("")
        lines.append("## Sessions")
        lines.append("")
        lines.append("| Date | Title | Msgs | Tools |")
        lines.append("|------|-------|------|-------|")
        for r in sessions_sorted:
            s = r["session"]
            stem = r["path"].stem
            d = (s["first_ts"] or datetime.now(timezone.utc)).date()
            seed = (s["ctx"].get("first_user_text") or "(no prompt)").split("\n", 1)[0]
            title = seed[:USER_MSG_PREVIEW].replace("|", "\\|")
            link = f"[[projects/{project}/sessions/{stem}|{title}]]"
            msgs = s["user_msg_count"] + s["assistant_msg_count"]
            tools = s["tool_use_count"]
            lines.append(f"| {d} | {link} | {msgs} | {tools} |")
        lines.append("")
        if subagents:
            lines.append("## Subagent runs")
            lines.append("")
            for r in sorted(subagents, key=lambda r: r["session"]["first_ts"] or datetime.min.replace(tzinfo=timezone.utc), reverse=True):
                s = r["session"]
                stem = r["path"].stem
                seed = (s["ctx"].get("first_user_text") or "(no prompt)").split("\n", 1)[0]
                title = seed[:USER_MSG_PREVIEW].replace("|", "\\|")
                lines.append(f"- [[projects/{project}/subagents/{stem}|{title}]]")
            lines.append("")

        rollup_path = VAULT / "projects" / project / f"{project}.md"
        rollup_path.parent.mkdir(parents=True, exist_ok=True)
        rollup_path.write_text("\n".join(lines), encoding="utf-8")

    # Top-level README.md
    readme = ["# Claude Code Brain", "",
              f"Generated {datetime.now().strftime('%Y-%m-%d %H:%M')}",
              "",
              f"**{len(parsed)} sessions** across **{len(all_projects)} projects** "
              f"({sum(len(v) for v in project_subagents.values())} subagent runs).",
              "",
              "## Projects",
              ""]
    # Sort projects by recency of last session
    project_recency = {}
    for p in all_projects:
        sessions = project_sessions.get(p, [])
        latest = max(
            (s["session"]["last_ts"] or s["session"]["first_ts"] for s in sessions if s["session"]["first_ts"]),
            default=None,
        )
        project_recency[p] = latest or datetime.min.replace(tzinfo=timezone.utc)
    sorted_projects = sorted(all_projects, key=lambda p: project_recency[p], reverse=True)
    readme.append("| Project | Sessions | Subagents | Last activity |")
    readme.append("|---------|---------:|----------:|---------------|")
    for p in sorted_projects:
        ses = len(project_sessions.get(p, []))
        sub = len(project_subagents.get(p, []))
        last = project_recency[p]
        last_s = last.date().isoformat() if last and last.year > 1900 else "—"
        readme.append(f"| [[projects/{p}/{p}\\|{p}]] | {ses} | {sub} | {last_s} |")
    readme.append("")
    readme.append("## Hubs")
    readme.append("")
    readme.append("- [[TOP-TOPICS]] — leaderboard (generated after backlink pass)")
    readme.append("- `topics/` — topic hub pages (generated after backlink pass)")
    readme.append("- `tools/` — framework/service hub pages")
    readme.append("- `entities/` — products, companies, people")
    (VAULT / "README.md").write_text("\n".join(readme) + "\n", encoding="utf-8")

    print(f"\n✓ Wrote vault at {VAULT}")
    print(f"  {len(all_projects)} projects, {len(parsed)} sessions")


if __name__ == "__main__":
    main()
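Before moving on, it's worth a quick sanity check that the export produced what you expect. A minimal sketch, assuming the vault layout written by export.py above:

```python
from pathlib import Path

def vault_counts(vault: Path) -> dict[str, int]:
    """Count session markdown files per project under vault/projects/."""
    counts: dict[str, int] = {}
    projects = vault / "projects"
    if not projects.exists():
        return counts
    for proj in sorted(p for p in projects.iterdir() if p.is_dir()):
        sessions = proj / "sessions"
        counts[proj.name] = len(list(sessions.glob("*.md"))) if sessions.exists() else 0
    return counts

if __name__ == "__main__":
    for name, n in vault_counts(Path.home() / "ClaudeCodeBrain").items():
        print(f"{name:32} {n:4} sessions")
```

If a project you use daily shows zero sessions, check that its JSONL files under ~/.claude/projects/ aren't empty before continuing.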

3. Split work into 10 balanced agent slices

The next step spawns 10 parallel sub-agents that each need a disjoint list of files of roughly equal total size. This script does the greedy longest-processing-time (LPT) bin-packing by byte count.
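A tiny worked example of the same heuristic, with made-up sizes and two bins instead of ten: sort items largest-first, then always drop the next item into the currently lightest bin.

```python
sizes = [9, 7, 6, 5, 4]              # pretend file sizes
bins = [[], []]                      # two bins for illustration
for s in sorted(sizes, reverse=True):
    min(bins, key=sum).append(s)     # lightest bin receives the next item
print(bins)                    # → [[9, 5], [7, 6, 4]]
print([sum(b) for b in bins])  # → [14, 17]
```

LPT isn't optimal, but it keeps bin totals within a small factor of each other, which is all the parallel agents need.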

$ python3 ~/ClaudeCodeBrain/_build/scripts/split_assignments.py
scripts/split_assignments.py · 58 lines
#!/usr/bin/env python3
"""Split session markdown files into 10 balanced slices for parallel backlink agents."""
from __future__ import annotations
import json
from pathlib import Path

VAULT = Path.home() / "ClaudeCodeBrain"
N_AGENTS = 10
ASSIGN_DIR = VAULT / "_build" / "assignments"


def main():
    ASSIGN_DIR.mkdir(parents=True, exist_ok=True)

    # Collect all session and subagent files with their byte size
    files: list[tuple[Path, int]] = []
    for proj_dir in sorted((VAULT / "projects").iterdir()):
        if not proj_dir.is_dir():
            continue
        for sub in ("sessions", "subagents"):
            d = proj_dir / sub
            if d.exists():
                for f in sorted(d.iterdir()):
                    if f.suffix == ".md" and not f.name.startswith("_"):
                        files.append((f, f.stat().st_size))

    # Greedy LPT (longest-processing-time) bin packing by size
    # Sort largest first, put each into the currently smallest bin
    files.sort(key=lambda x: -x[1])
    bins: list[dict] = [{"files": [], "total": 0} for _ in range(N_AGENTS)]
    for f, sz in files:
        target = min(bins, key=lambda b: b["total"])
        target["files"].append(str(f))
        target["total"] += sz

    summary = []
    for i, b in enumerate(bins, 1):
        out = ASSIGN_DIR / f"agent_{i:02d}.txt"
        out.write_text("\n".join(b["files"]) + "\n", encoding="utf-8")
        summary.append({
            "agent": i,
            "files": len(b["files"]),
            "bytes": b["total"],
            "kb": round(b["total"] / 1024),
        })

    report = {
        "total_files": len(files),
        "total_bytes": sum(s["bytes"] for s in summary),
        "agents": summary,
    }
    print(json.dumps(report, indent=2))
    (ASSIGN_DIR / "_split_report.json").write_text(json.dumps(report, indent=2), encoding="utf-8")


if __name__ == "__main__":
    main()

4. Spawn 10 backlink agents in Claude Code

Open a Claude Code session in your home directory. Paste the prompt below 10 times, changing the NN from 01 through 10 each time. Each agent processes ~10% of your sessions in parallel.

Or — paste this orchestrator prompt and let Claude dispatch all 10 for you:

Spawn 10 parallel backlink agents using the Agent tool. Each agent gets a number 01-10. Brief each one with the instructions at ~/ClaudeCodeBrain/_build/agent_instructions.md and its assignment list at ~/ClaudeCodeBrain/_build/assignments/agent_NN.txt. Set run_in_background: true for all 10. Reports go to ~/ClaudeCodeBrain/_build/reports/agent_NN.json.

The instructions each agent reads:

prompts/backlink_terminal_agent.md · 109 lines
# Backlink Agent Instructions

You are one of 10 parallel agents extracting topics/tools/entities from Claude Code session markdown files. Every agent has its own disjoint slice — DO NOT touch files outside your assignment list.

## Your inputs

- **Assignment file**: `/Users/michaelwong/ClaudeCodeBrain/_build/assignments/agent_NN.txt` (where NN is your number, e.g. `01`, `02`). One file path per line.
- **Vault root**: `/Users/michaelwong/ClaudeCodeBrain/`

## What to extract per session

For each session file in your assignment list:

1. **topics** — 3 to 8 kebab-case slugs naming what the session was about
   (good: `lead-capture`, `vercel-deploy`, `instagram-scraper`, `prompt-design`)
   (bad: `code`, `help`, `debug`, `general`, `coding`)
   Be specific. If the session is just a one-shot prompt-suggestion subagent or has very little content, return 1-2 topics; don't pad.

2. **tools** — frameworks, libraries, services, or named tooling actually used or discussed
   (e.g. `nextjs`, `supabase`, `apify`, `obsidian`, `wordpress`, `react`, `claude-code`, `n8n`, `gh-cli`)
   Lowercase, kebab-case. Skip if none.

3. **entities** — products, companies, people, real estate listings, brands by name
   (e.g. `team-wong-hawaii`, `michael-wong`, `openai`, `notion`, `nch`, `kailua`)
   Skip if none.

4. **summary** — ONE sentence (max 25 words) describing what the session actually accomplished or attempted. Past tense. No filler.

## What to write per session

For each file, **make exactly one Edit call** that replaces the empty frontmatter fields with populated values AND appends the wikilink section. You can do this in a single Edit by replacing a multi-line block.

**Find:**
```
summary: 
topics: []
tools: []
entities: []
---
```

**Replace with:**
```
summary: "<your summary>"
topics: [topic-1, topic-2, topic-3]
tools: [tool-1, tool-2]
entities: [entity-1]
---
```

Then make a SECOND Edit (or append using a single edit by including more context) to add at the end of the file:

```
## 🔗 Topics & Entities

- **Topics:** [[topics/topic-1]] · [[topics/topic-2]] · [[topics/topic-3]]
- **Tools:** [[tools/tool-1]] · [[tools/tool-2]]
- **Entities:** [[entities/entity-1]]
```

Skip a line if its list is empty.

The cleanest pattern is to do BOTH edits in one Edit call by replacing the closing frontmatter `---\n` plus the next line of the heading with frontmatter+heading+backlinks. Or do two separate edits. Either works.

## Your report

After processing every file in your assignment, write a JSON report to:
`/Users/michaelwong/ClaudeCodeBrain/_build/reports/agent_NN.json`

Shape:
```json
{
  "agent": NN,
  "files_processed": <int>,
  "files_failed": <int>,
  "errors": ["path: reason", ...],
  "sessions": [
    {
      "file": "/absolute/path.md",
      "session_id": "<uuid>",
      "project": "<kebab-case>",
      "topics": ["topic-1", "topic-2"],
      "tools": ["nextjs"],
      "entities": ["openai"],
      "summary": "..."
    },
    ...
  ]
}
```

`session_id` and `project` come from the YAML frontmatter — don't guess.

## Rules

- **Read-only on `topics/`, `tools/`, `entities/` directories** — those hub pages are written by the consolidation pass, not by you. Editing them = race condition.
- **Don't read or edit files outside your assignment list.**
- **Don't use the Bash tool to grep across the vault** — process only your assignment files.
- If a file fails to parse or edit, log the error and continue. Don't abort.
- Keep your responses minimal. The work is the file edits and the final JSON report.
- DO NOT run any tools in `_build/scripts/` or modify any other agent's reports.
- Process files in batches of ~5 using parallel tool calls (multiple Read calls in one message, then multiple Edit calls in one message) for speed.

## When to skip

- If a session is trivially short (<5 lines of body content) AND has no extractable signal, write minimal frontmatter with `topics: [unclassified]` and a one-line summary, but still add the wikilink section pointing to `[[topics/unclassified]]`.
- If a file has already been processed (frontmatter `topics:` is not `[]`), skip it but still include it in your report with whatever values are in the frontmatter.

When you're done, write the JSON report and respond with: `Done. {N} files processed, {E} errors.`

Wait for all 10 completion notifications. Each agent processes ~25 files in 5–7 minutes; since they run in parallel, the wall-clock total is set by the slowest agent.
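While you wait, you can poll progress with a quick script. This is a sketch that assumes the report shape specified in the agent instructions above (`files_processed` / `files_failed` keys):

```python
import json
from pathlib import Path

def summarize_reports(reports_dir: Path) -> tuple[int, int, int]:
    """Return (reports_found, files_processed, files_failed) across agent_*.json."""
    found = processed = failed = 0
    for r in sorted(reports_dir.glob("agent_*.json")):
        data = json.loads(r.read_text(encoding="utf-8"))
        found += 1
        processed += int(data.get("files_processed", 0))
        failed += int(data.get("files_failed", 0))
    return found, processed, failed

if __name__ == "__main__":
    n, ok, bad = summarize_reports(Path.home() / "ClaudeCodeBrain" / "_build" / "reports")
    print(f"{n}/10 reports in · {ok} files processed · {bad} failed")
```

When it shows 10/10, move on to the consolidation pass.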

5. Consolidate the reports into topic / tool / entity hubs

Reads all 10 JSON reports, builds inverted indexes, and writes one markdown page per topic/tool/entity into topics/, tools/, entities/. Also writes the TOP-TOPICS.md leaderboard.

$ python3 ~/ClaudeCodeBrain/_build/scripts/consolidate.py
scripts/consolidate.py · 297 lines
#!/usr/bin/env python3
"""Consolidate the 10 backlink agent reports into hub pages.

Reads:
  _build/reports/agent_*.json

Writes (atomically — only this script touches these dirs):
  topics/<slug>.md       — one per topic, with sessions grouped by project
  tools/<slug>.md        — one per tool/framework
  entities/<slug>.md     — one per entity
  TOP-TOPICS.md          — leaderboard sorted by frequency
"""
from __future__ import annotations

import argparse
import json
import re
from collections import defaultdict
from pathlib import Path

VAULT = Path.home() / "ClaudeCodeBrain"
REPORTS_DIR = VAULT / "_build" / "reports"
WEB_REPORTS_DIR = VAULT / "_build" / "web_reports"

FM_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def slug_clean(s: str) -> str:
    s = (s or "").strip().lower()
    s = re.sub(r"[^a-z0-9\s-]", "", s)
    s = re.sub(r"\s+", "-", s)
    s = re.sub(r"-+", "-", s).strip("-")
    return s


def parse_frontmatter(text: str) -> dict:
    m = FM_RE.match(text)
    if not m:
        return {}
    fm: dict = {}
    for line in m.group(1).splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fm[key.strip()] = value.strip()
    return fm


def parse_list(value: str) -> list[str]:
    value = (value or "").strip()
    if not value or value == "[]":
        return []
    if value.startswith("[") and value.endswith("]"):
        inner = value[1:-1].strip()
        if not inner:
            return []
        return [
            item.strip().strip('"').strip("'")
            for item in inner.split(",")
            if item.strip().strip('"').strip("'")
        ]
    return [value.strip().strip('"').strip("'")]


def sessions_from_frontmatter(existing_files: set[str]) -> list[dict]:
    """Load session metadata directly from markdown for files missing reports.

    Walks both projects/ (terminal sessions) and web-archive/conversations/.
    """
    sessions: list[dict] = []
    candidates: list[Path] = []
    candidates.extend(sorted((VAULT / "projects").glob("*/*/*.md")))
    web_dir = VAULT / "web-archive" / "conversations"
    if web_dir.exists():
        candidates.extend(sorted(web_dir.iterdir()))
    for md in candidates:
        if md.suffix != ".md" or md.name.startswith("_"):
            continue
        # Filter projects/ to sessions/subagents only
        if "/projects/" in str(md) and md.parent.name not in ("sessions", "subagents"):
            continue
        md_s = str(md)
        if md_s in existing_files:
            continue
        try:
            text = md.read_text(encoding="utf-8", errors="replace")
        except Exception:
            continue
        fm = parse_frontmatter(text)
        if not fm:
            continue
        topics = parse_list(fm.get("topics", ""))
        tools = parse_list(fm.get("tools", ""))
        entities = parse_list(fm.get("entities", ""))
        if not (topics or tools or entities):
            continue
        # Determine project: terminal session uses parent dir; web uses "web-archive"
        if "/web-archive/" in md_s:
            project = "web-archive"
            sid = fm.get("conversation_id", fm.get("session_id", ""))
        else:
            project = fm.get("project") or md.parent.parent.name
            sid = fm.get("session_id", "")
        sessions.append({
            "file": md_s,
            "session_id": sid,
            "project": project,
            "summary": fm.get("summary", "").strip('"').strip("'"),
            "topics": topics,
            "tools": tools,
            "entities": entities,
        })
    return sessions


def session_link(session: dict) -> str:
    """Build a wikilink to the session file using project + filename."""
    file_path = session.get("file", "")
    if not file_path:
        return f"`{session.get('session_id', 'unknown')}`"
    p = Path(file_path)
    stem = p.stem
    title = session.get("summary") or stem
    title = title.replace("|", "\\|").replace("[", "(").replace("]", ")")
    if len(title) > 100:
        title = title[:97] + "…"
    # Web archive: linked under web-archive/conversations/
    if "/web-archive/" in str(p):
        return f"[[web-archive/conversations/{stem}|{title}]]"
    project = session.get("project") or p.parent.parent.name
    parent_kind = p.parent.name  # "sessions" or "subagents"
    return f"[[projects/{project}/{parent_kind}/{stem}|{title}]]"


def write_hub_page(kind: str, slug: str, sessions: list[dict], outdir: Path) -> None:
    outdir.mkdir(parents=True, exist_ok=True)
    by_project: dict[str, list[dict]] = defaultdict(list)
    for s in sessions:
        by_project[s.get("project", "unknown")].append(s)
    title = slug.replace("-", " ").title()
    lines = [
        "---",
        f"type: {kind}-hub",
        f"slug: {slug}",
        f"sessions: {len(sessions)}",
        f"projects: {len(by_project)}",
        "---",
        "",
        f"# {title}",
        "",
        f"**{len(sessions)} sessions** across **{len(by_project)} projects**.",
        "",
    ]
    for project in sorted(by_project, key=lambda p: -len(by_project[p])):
        proj_sessions = by_project[project]
        if project == "web-archive":
            heading = f"## [[web-archive/README|web archive]] · {len(proj_sessions)}"
        else:
            heading = f"## [[projects/{project}/{project}|{project}]] · {len(proj_sessions)}"
        lines.append(heading)
        lines.append("")
        for s in proj_sessions:
            lines.append(f"- {session_link(s)}")
        lines.append("")
    out_path = outdir / f"{slug}.md"
    out_path.write_text("\n".join(lines), encoding="utf-8")


def main():
    parser = argparse.ArgumentParser(description="Build topic/tool/entity hub pages from agent reports.")
    parser.add_argument(
        "--min-sessions", type=int, default=1,
        help="Only generate hub pages for topics/tools/entities that appear in at least "
             "this many sessions. Default 1 (everything gets a page). Bump to 3+ if your "
             "vault is large and the graph is overwhelmed by long-tail topics.",
    )
    args = parser.parse_args()
    min_sessions = max(1, args.min_sessions)

    if not REPORTS_DIR.exists():
        raise SystemExit(f"!! reports dir missing: {REPORTS_DIR}")

    # Aggregate all session entries from terminal AND web reports
    all_sessions: list[dict] = []
    reports_seen = 0
    existing_files: set[str] = set()
    report_dirs = [REPORTS_DIR]
    if WEB_REPORTS_DIR.exists():
        report_dirs.append(WEB_REPORTS_DIR)
    for rd in report_dirs:
        for report_path in sorted(rd.glob("agent_*.json")):
            try:
                data = json.loads(report_path.read_text(encoding="utf-8"))
            except Exception as e:
                print(f"!! could not parse {report_path}: {e}")
                continue
            reports_seen += 1
            for s in data.get("sessions", []):
                all_sessions.append(s)
                if s.get("file"):
                    existing_files.add(str(s["file"]))
    fallback_sessions = sessions_from_frontmatter(existing_files)
    all_sessions.extend(fallback_sessions)
    print(
        f"Loaded {reports_seen} reports / {len(all_sessions)} session entries "
        f"({len(fallback_sessions)} from markdown frontmatter)."
    )

    # Build inverted indexes
    topic_index: dict[str, list[dict]] = defaultdict(list)
    tool_index: dict[str, list[dict]] = defaultdict(list)
    entity_index: dict[str, list[dict]] = defaultdict(list)

    for s in all_sessions:
        for t in s.get("topics", []) or []:
            slug = slug_clean(t)
            if slug:
                topic_index[slug].append(s)
        for t in s.get("tools", []) or []:
            slug = slug_clean(t)
            if slug:
                tool_index[slug].append(s)
        for e in s.get("entities", []) or []:
            slug = slug_clean(e)
            if slug:
                entity_index[slug].append(s)

    print(f"  Unique topics: {len(topic_index)}")
    print(f"  Unique tools: {len(tool_index)}")
    print(f"  Unique entities: {len(entity_index)}")

    # Wipe + regenerate hub dirs (atomic-ish — only this script writes here)
    for sub in ("topics", "tools", "entities"):
        d = VAULT / sub
        if d.exists():
            for old in d.glob("*.md"):
                old.unlink()

    written = {"topic": 0, "tool": 0, "entity": 0}
    skipped = {"topic": 0, "tool": 0, "entity": 0}
    for slug, sessions in topic_index.items():
        if len(sessions) < min_sessions:
            skipped["topic"] += 1
            continue
        write_hub_page("topic", slug, sessions, VAULT / "topics")
        written["topic"] += 1
    for slug, sessions in tool_index.items():
        if len(sessions) < min_sessions:
            skipped["tool"] += 1
            continue
        write_hub_page("tool", slug, sessions, VAULT / "tools")
        written["tool"] += 1
    for slug, sessions in entity_index.items():
        if len(sessions) < min_sessions:
            skipped["entity"] += 1
            continue
        write_hub_page("entity", slug, sessions, VAULT / "entities")
        written["entity"] += 1
    if min_sessions > 1:
        print(f"  Hub pages written (≥{min_sessions} sessions): "
              f"{written['topic']} topics, {written['tool']} tools, {written['entity']} entities")
        print(f"  Long-tail dropped: {skipped['topic']} topics, {skipped['tool']} tools, {skipped['entity']} entities")
    else:
        print(f"  Hub pages written: "
              f"{written['topic']} topics, {written['tool']} tools, {written['entity']} entities")

    # TOP-TOPICS leaderboard (top 50 by session count)
    leaderboard = sorted(topic_index.items(), key=lambda kv: -len(kv[1]))
    lines = ["# Top Topics", "", "Sorted by session count.", "",
             "| Rank | Topic | Sessions |", "|-----:|-------|---------:|"]
    for i, (slug, sessions) in enumerate(leaderboard[:50], 1):
        lines.append(f"| {i} | [[topics/{slug}\\|{slug}]] | {len(sessions)} |")
    lines.append("")
    lines.append("## Tools (top 30)")
    lines.append("")
    lines.append("| Rank | Tool | Sessions |")
    lines.append("|-----:|------|---------:|")
    tool_lb = sorted(tool_index.items(), key=lambda kv: -len(kv[1]))
    for i, (slug, sessions) in enumerate(tool_lb[:30], 1):
        lines.append(f"| {i} | [[tools/{slug}\\|{slug}]] | {len(sessions)} |")
    lines.append("")
    lines.append("## Entities (top 30)")
    lines.append("")
    lines.append("| Rank | Entity | Sessions |")
    lines.append("|-----:|--------|---------:|")
    ent_lb = sorted(entity_index.items(), key=lambda kv: -len(kv[1]))
    for i, (slug, sessions) in enumerate(ent_lb[:30], 1):
        lines.append(f"| {i} | [[entities/{slug}\\|{slug}]] | {len(sessions)} |")
    lines.append("")
    (VAULT / "TOP-TOPICS.md").write_text("\n".join(lines), encoding="utf-8")

    # Report what was actually written (differs from index sizes when --min-sessions > 1)
    print(f"\n✓ Wrote {written['topic']} topic / {written['tool']} tool / "
          f"{written['entity']} entity hub pages + TOP-TOPICS.md")


if __name__ == "__main__":
    main()
Long-tail filter: If you have a lot of one-off topics from short sessions, pass --min-sessions 3 to create hub pages only for topics that appear in 3 or more sessions. Long-tail topics stay in session frontmatter for full-text search but don't get their own graph node.
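The filter is nothing more than a threshold on the inverted index. A toy sketch of the same logic (fake session data for illustration):

```python
from collections import defaultdict

# Three fake sessions; only "twilio" appears in 3 or more of them
sessions = [
    {"file": "a.md", "topics": ["twilio", "auth"]},
    {"file": "b.md", "topics": ["twilio"]},
    {"file": "c.md", "topics": ["twilio", "dns"]},
]

# Build the inverted index: topic slug -> list of sessions mentioning it
topic_index: dict[str, list[dict]] = defaultdict(list)
for s in sessions:
    for t in s["topics"]:
        topic_index[t].append(s)

min_sessions = 3
hub_topics = sorted(t for t, hits in topic_index.items() if len(hits) >= min_sessions)
print(hub_topics)  # ['twilio']; 'auth' and 'dns' stay long-tail
```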

6. Refresh the project rollups + README

$ python3 ~/ClaudeCodeBrain/_build/scripts/rebuild_rollups.py
scripts/rebuild_rollups.py · 189 lines
#!/usr/bin/env python3
"""Walk the vault and rebuild every per-project rollup (projects/<project>/<project>.md) + the top-level README.md.

Source of truth is the YAML frontmatter inside each session/subagent markdown.
Run after incremental exports to refresh rollups without parsing JSONL.
"""
from __future__ import annotations

import re
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

VAULT = Path.home() / "ClaudeCodeBrain"
PROJECTS = VAULT / "projects"


FM_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def parse_frontmatter(text: str) -> dict:
    m = FM_RE.match(text)
    if not m:
        return {}
    fm: dict = {}
    for line in m.group(1).splitlines():
        if ":" in line:
            k, _, v = line.partition(":")
            fm[k.strip()] = v.strip()
    return fm


def parse_iso(s: str) -> datetime | None:
    if not s:
        return None
    try:
        if s.endswith("Z"):
            s = s[:-1] + "+00:00"
        return datetime.fromisoformat(s)
    except Exception:
        return None


def first_heading_or_summary(text: str, fm: dict) -> str:
    summary = fm.get("summary", "").strip().strip('"').strip("'")
    if summary:
        return summary[:80]
    # Fall back to first H1
    for line in text.splitlines():
        if line.startswith("# "):
            return line[2:].strip()[:80]
    return "(no title)"


def main():
    if not PROJECTS.exists():
        raise SystemExit(f"!! No projects dir at {PROJECTS}")

    project_sessions: dict[str, list[dict]] = defaultdict(list)
    project_subagents: dict[str, list[dict]] = defaultdict(list)

    for proj_dir in sorted(PROJECTS.iterdir()):
        if not proj_dir.is_dir():
            continue
        project = proj_dir.name
        for sub, bucket in (("sessions", project_sessions), ("subagents", project_subagents)):
            d = proj_dir / sub
            if not d.exists():
                continue
            for f in sorted(d.iterdir()):
                if f.suffix != ".md" or f.name.startswith("_"):
                    continue
                try:
                    text = f.read_text(encoding="utf-8", errors="replace")
                except Exception:
                    continue
                fm = parse_frontmatter(text)
                bucket[project].append({
                    "path": f,
                    "fm": fm,
                    "title": first_heading_or_summary(text, fm),
                    "first_ts": parse_iso(fm.get("started", "")),
                    "last_ts": parse_iso(fm.get("ended", "")),
                    "cwd": fm.get("cwd", ""),
                    "user_msgs": int(fm.get("user_messages", 0) or 0),
                    "asst_msgs": int(fm.get("assistant_messages", 0) or 0),
                    "tool_uses": int(fm.get("tool_uses", 0) or 0),
                })

    all_projects = sorted(set(list(project_sessions) + list(project_subagents)))
    print(f"Rebuilding rollups for {len(all_projects)} projects…")

    for project in all_projects:
        sessions = project_sessions.get(project, [])
        subagents = project_subagents.get(project, [])
        sessions_sorted = sorted(
            sessions, key=lambda r: r["first_ts"] or datetime.min.replace(tzinfo=timezone.utc),
            reverse=True,
        )
        cwds = sorted({r["cwd"] for r in sessions if r["cwd"]})
        first_seen = min((r["first_ts"] for r in sessions if r["first_ts"]), default=None)
        last_seen = max((r["last_ts"] or r["first_ts"] for r in sessions if r["first_ts"]), default=None)

        lines = ["---", f"project: {project}", "type: project-rollup"]
        if cwds:
            lines.append(f"cwds: {len(cwds)}")
        lines.append(f"sessions: {len(sessions)}")
        lines.append(f"subagents: {len(subagents)}")
        if first_seen:
            lines.append(f"first_seen: {first_seen.date()}")
        if last_seen:
            lines.append(f"last_seen: {last_seen.date()}")
        lines.append("category: ")
        lines.append("---")
        lines.append("")
        lines.append(f"# {project}")
        lines.append("")
        if cwds:
            lines.append("**Working directories:**")
            for c in cwds:
                lines.append(f"- `{c}`")
            lines.append("")
        # Build the stats line as one string so the date range stays on the same source line
        stats = f"**{len(sessions)} sessions** · **{len(subagents)} subagents**"
        if first_seen and last_seen:
            stats += f" · {first_seen.date()} → {last_seen.date()}"
        lines.append(stats)
        lines.append("")
        lines.append("## Sessions")
        lines.append("")
        lines.append("| Date | Title | Msgs | Tools |")
        lines.append("|------|-------|------|-------|")
        for r in sessions_sorted:
            stem = r["path"].stem
            d = (r["first_ts"] or datetime.now(timezone.utc)).date()
            title = r["title"].replace("|", "\\|")
            link = f"[[projects/{project}/sessions/{stem}\\|{title}]]"  # escape the alias pipe inside the table row
            msgs = r["user_msgs"] + r["asst_msgs"]
            lines.append(f"| {d} | {link} | {msgs} | {r['tool_uses']} |")
        lines.append("")
        if subagents:
            lines.append("## Subagent runs")
            lines.append("")
            for r in sorted(subagents, key=lambda r: r["first_ts"] or datetime.min.replace(tzinfo=timezone.utc), reverse=True):
                stem = r["path"].stem
                title = r["title"].replace("|", "\\|")
                lines.append(f"- [[projects/{project}/subagents/{stem}|{title}]]")
            lines.append("")

        proj_dir = PROJECTS / project
        (proj_dir / f"{project}.md").write_text("\n".join(lines), encoding="utf-8")

    # README
    readme = ["# Claude Code Brain", "",
              f"Generated {datetime.now().strftime('%Y-%m-%d %H:%M')}",
              "",
              f"**{sum(len(v) for v in project_sessions.values())} sessions** across "
              f"**{len(all_projects)} projects** "
              f"({sum(len(v) for v in project_subagents.values())} subagent runs).",
              "",
              "## Projects",
              ""]
    project_recency: dict[str, datetime] = {}
    for p in all_projects:
        latest = max(
            ((r["last_ts"] or r["first_ts"]) for r in project_sessions.get(p, []) if r["first_ts"]),
            default=None,
        )
        project_recency[p] = latest or datetime.min.replace(tzinfo=timezone.utc)
    sorted_projects = sorted(all_projects, key=lambda p: project_recency[p], reverse=True)
    readme.append("| Project | Sessions | Subagents | Last activity |")
    readme.append("|---------|---------:|----------:|---------------|")
    for p in sorted_projects:
        ses = len(project_sessions.get(p, []))
        sub = len(project_subagents.get(p, []))
        last = project_recency[p]
        last_s = last.date().isoformat() if last and last.year > 1900 else "—"
        readme.append(f"| [[projects/{p}/{p}\\|{p}]] | {ses} | {sub} | {last_s} |")
    readme.append("")
    readme.append("## Hubs")
    readme.append("")
    readme.append("- [[TOP-TOPICS]] — leaderboard")
    readme.append("- `topics/` — topic hub pages")
    readme.append("- `tools/` — framework/service hub pages")
    readme.append("- `entities/` — products, companies, people")
    (VAULT / "README.md").write_text("\n".join(readme) + "\n", encoding="utf-8")
    print(f"✓ Rollups + README rebuilt for {len(all_projects)} projects")


if __name__ == "__main__":
    main()

7. Open it in Obsidian

Launch Obsidian → Open folder as vault → pick ~/ClaudeCodeBrain/. Open README.md and confirm the project index renders. Hit Cmd+G (Mac) or Ctrl+G (Win) for the graph view.

SECTION 03

Install the recall skill — this is the point.

~ 3 minutes

Without this step, the vault is just a searchable archive. With it, Claude Code reads your past sessions before any code work whenever you use a trigger phrase like "let's work on X" or "check my brain for Y". That's the difference between an archive and a brain.

1. Drop the skill into Claude Code

$ mkdir -p ~/.claude/skills/claude-brain
$ cp skill/claude-brain/SKILL.md ~/.claude/skills/claude-brain/
skill/claude-brain/SKILL.md · 53 lines
---
name: claude-brain
description: Read the user's Claude Code Brain vault before working on a project. Triggers on phrases like "let's work on X", "let's pick up X", "last time on Y", "what did we do with Z", "check my brain for X", "what have I built with X", "remember X". The vault at ~/ClaudeCodeBrain/ contains every Claude Code session backlinked by topic, tool, and entity. This skill loads relevant past context BEFORE you touch any code so you don't re-prime, re-explain, or repeat past mistakes.
---

# Claude Brain — recall

The user has a standalone Obsidian vault at `~/ClaudeCodeBrain/` containing every Claude Code session they've ever had, organized by project, backlinked by topic, tool, and entity. This skill is the recall side: it loads relevant past context BEFORE you start coding.

## When to fire

Trigger phrases (any rough match):
- "let's work on <X>" / "let's pick up <X>" / "let's keep going on <X>"
- "last time on <X>" / "remind me where we left off on <X>"
- "check my brain for <X>" / "what have I done with <X>"
- "who is <X>?" / "what is <X>?" — when X looks like a project, person, or tool
- "what have I built with <tool>" / "any past work on <topic>"

Do NOT fire on:
- General coding questions ("how does Python decorator syntax work")
- Brand-new tasks with no historical context cue
- Requests where the user explicitly says "fresh start" or "ignore history"

## What to do (in order)

1. **Read** `~/ClaudeCodeBrain/README.md` — the project index with last-activity dates
2. **Identify** the relevant project, topic, tool, or entity from the user's phrase
3. **Open the right hub:**
   - Project name match → `~/ClaudeCodeBrain/projects/<slug>/<slug>.md` (rollup with session table)
   - Tool/framework name → `~/ClaudeCodeBrain/tools/<slug>.md`
   - Topic name → `~/ClaudeCodeBrain/topics/<slug>.md`
   - Person/product name → `~/ClaudeCodeBrain/entities/<slug>.md`
4. **Skim** the 2–5 most recent linked sessions (top of the rollup table or hub page). Read their YAML frontmatter and any visible summary first; only open the full body if you need detail.
5. **Surface** to the user a 1-paragraph "here's what I see in your brain" summary BEFORE starting any code: what they built, what was decided, any unfinished business or known issues.
6. **Then** proceed with the task. Don't re-explain things the brain already shows you know.

## Read-only contract

- This skill ONLY reads from `~/ClaudeCodeBrain/`. Never edit any file there during recall.
- New sessions get synced into the vault by the separate `/slay` command, which the user runs at the end of a session.
- If you find an obvious stale fact (e.g. memory says X exists but it doesn't), surface it instead of acting on it.

## Search shortcuts

- For fuzzy lookup across the whole vault, use `grep` over `~/ClaudeCodeBrain/projects/*/sessions/*.md` — the YAML frontmatter has `topics:`, `tools:`, `entities:` fields you can filter on.
- The graph view in Obsidian is for the human; you don't need it. Wikilinks in markdown (`[[topics/foo]]`) are sufficient for navigation.

## Edge cases

- **No matching project:** Tell the user explicitly. Don't make one up. Suggest 2–3 nearest matches from the README index.
- **Project hub exists but is sparse:** Fall back to topic/tool hubs that mention the same name.
- **Multiple matches** (e.g. "vision-studio-unleashed" merged from `*-test`): all sessions are now under the canonical name. Just open that.
- **User asks about a session that's clearly older than the vault** (predates first session date in README): say so plainly.
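If you want to sanity-check the frontmatter grep that the skill's search shortcut relies on, here is a self-contained demo; in your real vault the glob would be `~/ClaudeCodeBrain/projects/*/sessions/*.md`, and the topic name here is hypothetical:

```shell
# Create two fake session files, then filter by a frontmatter topic
mkdir -p /tmp/brain-demo
printf -- '---\ntopics: [twilio, auth]\n---\n' > /tmp/brain-demo/2025-04-02_abc.md
printf -- '---\ntopics: [dns]\n---\n' > /tmp/brain-demo/2025-04-03_def.md
grep -l 'topics:.*twilio' /tmp/brain-demo/*.md   # prints only the first file
```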

2. Drop the /slay sync command

$ mkdir -p ~/.claude/commands
$ cp commands/slay.md ~/.claude/commands/
commands/slay.md · 48 lines
---
name: slay
description: Sync new Claude Code sessions into the ~/ClaudeCodeBrain/ Obsidian vault. Adds any sessions written since the last run, refreshes project rollups and the top-level README, and leaves existing backlinks intact. Run at the end of a coding session (or whenever you remember).
---

# /slay — sync new sessions into the brain

Pulls in any Claude Code sessions written to `~/.claude/projects/` since your last sync and writes them into the Obsidian vault at `~/ClaudeCodeBrain/`. **Does NOT re-run the 10 backlink agents** — that's expensive. New sessions land with empty `topics: []` until you do a full re-pass.

## What you do

Run these commands in order. Don't skip the rollup rebuild — that's what makes the new sessions show up in the project index.

```bash
# 1. Add any new session markdown files (skips ones already in the vault)
python3 ~/ClaudeCodeBrain/_build/scripts/export.py --incremental

# 2. Refresh per-project rollups + README.md from disk
python3 ~/ClaudeCodeBrain/_build/scripts/rebuild_rollups.py

# 3. (Optional) refresh hub pages from existing reports
python3 ~/ClaudeCodeBrain/_build/scripts/consolidate.py
```

Tell the user how many sessions were added and where to find the new files.

## When the user wants a full re-pass

If they say "do a full slay" or "rebuild backlinks" or "fresh brain":

1. Run the export script WITHOUT the `--incremental` flag (full rebuild — it overwrites session bodies, so a fresh backlink agent pass is required afterwards).
2. Re-run the 10 parallel backlink agents (see `~/ClaudeCodeBrain/_build/agent_instructions.md`).
3. Re-run consolidate.py.
4. Re-run apply_renames.py if `_build/audit_plan.json` has changed.

Confirm with the user before doing the full re-pass — it's a substantial run, especially if they have a lot of new history.

## Common gotchas

- Don't run `apply_renames.py` automatically — only after the user has reviewed the audit plan and given approval.
- If the user's vault grew significantly (>50 new sessions), suggest the full re-pass so backlinks stay accurate.
- New sessions created during the *current* Claude Code session won't be in `~/.claude/projects/` yet — they're only flushed when the session ends.

## Done when

- New session markdown files exist under `~/ClaudeCodeBrain/projects/<project>/sessions/`.
- Per-project rollup pages (`projects/<project>/<project>.md`) for affected projects show the new sessions.
- Top-level `README.md` reflects updated session counts.

3. Test it

Quit any open Claude Code sessions and open a fresh one. Type one of these phrases (substitute your real project name):

let's work on my-project

Claude should pause, read ~/ClaudeCodeBrain/README.md and the matching project rollup, and reply with a summary of what you built before doing anything else. If it just dives into code, the skill didn't fire — double-check the path and the frontmatter in SKILL.md.

That moment — when Claude says "I see from your brain that last time you got Twilio approved and were drafting the consent flow, want to continue there?" — is the entire payoff of this build.

SECTION 04

Fold in your Claude.ai web archive.

~ 25 minutes, once the export email arrives

Same parse → backlink → consolidate pattern as section 02, just pointed at the Claude.ai web export. The result lives under web-archive/ in the same vault and gets woven into the same topic/tool/entity hubs.

1. Download & unzip the Claude.ai export

The email gives you a link to a zip. Unzip it anywhere; you'll point the importer at the resulting folder.

2. Convert JSON → markdown

One file per conversation, with YAML frontmatter for date / title / message counts. Monthly index pages are generated automatically so the graph doesn't blow up.

$ python3 ~/ClaudeCodeBrain/_build/scripts/import_claude_web.py /path/to/claude-export-dir
scripts/import_claude_web.py · 321 lines
#!/usr/bin/env python3
"""Import Claude.ai web export into the same Obsidian vault under web-archive/.

Usage:
  python3 import_claude_web.py /path/to/data-XXXX-batch-0000

Reads conversations.json, memories.json, projects/, and writes one markdown
file per conversation under ~/ClaudeCodeBrain/web-archive/conversations/.
"""
from __future__ import annotations

import json
import re
import sys
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

VAULT = Path.home() / "ClaudeCodeBrain"
ARCHIVE = VAULT / "web-archive"


def slugify(s: str, max_len: int = 60) -> str:
    s = (s or "").strip().lower()
    s = re.sub(r"[^\w\s-]", "", s)
    s = re.sub(r"[\s_]+", "-", s)
    s = re.sub(r"-+", "-", s).strip("-")
    return s[:max_len] or "conversation"


def short_id(uuid_str: str) -> str:
    return (uuid_str or "").replace("-", "")[:8] or "xxxxxxxx"


def parse_iso(s: str | None) -> datetime | None:
    if not s:
        return None
    try:
        if s.endswith("Z"):
            s = s[:-1] + "+00:00"
        return datetime.fromisoformat(s)
    except Exception:
        return None


def render_message(msg: dict) -> str:
    sender = msg.get("sender", "?")
    label = "👤 User" if sender == "human" else "🤖 Claude" if sender == "assistant" else f"❔ {sender}"
    parts: list[str] = []

    # Prefer the structured content list — may have multiple text blocks
    content = msg.get("content")
    if isinstance(content, list):
        for item in content:
            if not isinstance(item, dict):
                continue
            t = item.get("type")
            if t == "text":
                txt = item.get("text") or ""
                if txt.strip():
                    parts.append(txt)
            elif t == "tool_use":
                name = item.get("name", "tool")
                parts.append(f"*[tool call: {name}]*")
            elif t == "tool_result":
                tname = item.get("name", "tool")
                parts.append(f"*[tool result: {tname}]*")
            elif t == "image":
                parts.append("*[image]*")
            elif t == "thinking":
                think = (item.get("thinking") or "").strip()
                if think:
                    parts.append(f"<details><summary>💭 Thinking</summary>\n\n{think}\n\n</details>")
    elif isinstance(content, str) and content.strip():
        parts.append(content)

    # Fallback to flat .text
    if not parts:
        text = msg.get("text") or ""
        if text.strip():
            parts.append(text)

    # Note attachments / files separately
    extras = []
    for f in msg.get("files") or []:
        name = f.get("file_name") or f.get("name") or "file"
        extras.append(f"*[attached file: {name}]*")
    for a in msg.get("attachments") or []:
        name = a.get("file_name") or a.get("name") or "attachment"
        extras.append(f"*[attachment: {name}]*")

    body = "\n\n".join(parts).strip() or "*(empty)*"
    if extras:
        body = body + "\n\n" + "\n".join(extras)
    return f"### {label}\n\n{body}"


def write_conversation(conv: dict, outdir: Path) -> Path | None:
    msgs = conv.get("chat_messages") or []
    if not msgs:
        return None

    name = conv.get("name") or "(untitled)"
    uuid = conv.get("uuid") or ""
    created = parse_iso(conv.get("created_at"))
    updated = parse_iso(conv.get("updated_at"))

    date_str = (created or datetime.now(timezone.utc)).strftime("%Y-%m-%d")
    slug = slugify(name, max_len=80)
    sid = short_id(uuid)
    filename = f"{date_str}_{slug}_{sid}.md"
    out_path = outdir / filename

    user_msg_count = sum(1 for m in msgs if m.get("sender") == "human")
    asst_msg_count = sum(1 for m in msgs if m.get("sender") == "assistant")

    fm = [
        "---",
        f"conversation_id: {uuid}",
        "source: claude-web",
        f"date: {date_str}",
    ]
    if created:
        fm.append(f"started: {created.isoformat()}")
    if updated:
        fm.append(f"updated: {updated.isoformat()}")
    fm.append(f"user_messages: {user_msg_count}")
    fm.append(f"assistant_messages: {asst_msg_count}")
    if conv.get("summary"):
        # Single-line summary
        s = (conv["summary"] or "").replace("\n", " ").replace('"', "'")
        fm.append(f'summary: "{s[:200]}"')
    fm.append("topics: []")
    fm.append("tools: []")
    fm.append("entities: []")
    fm.append("---")

    body = "\n\n".join(render_message(m) for m in msgs)
    contents = "\n".join(fm) + f"\n\n# {name}\n\n{body}\n"
    out_path.write_text(contents, encoding="utf-8")
    return out_path


def main():
    if len(sys.argv) < 2:
        print("usage: import_claude_web.py /path/to/export-dir", file=sys.stderr)
        sys.exit(1)
    src = Path(sys.argv[1])
    if not src.exists():
        print(f"!! source missing: {src}", file=sys.stderr)
        sys.exit(1)

    convs_path = src / "conversations.json"
    if not convs_path.exists():
        print(f"!! conversations.json missing in {src}", file=sys.stderr)
        sys.exit(1)

    ARCHIVE.mkdir(parents=True, exist_ok=True)
    convs_dir = ARCHIVE / "conversations"
    convs_dir.mkdir(exist_ok=True)

    print(f"Loading {convs_path} ({convs_path.stat().st_size // 1024 // 1024} MB)…")
    with open(convs_path, "r", encoding="utf-8") as f:
        conversations = json.load(f)
    print(f"  {len(conversations)} conversations")

    written = 0
    by_year_month: dict[str, list[dict]] = defaultdict(list)
    for i, conv in enumerate(conversations, 1):
        if i % 100 == 0:
            print(f"  [{i}/{len(conversations)}]")
        out = write_conversation(conv, convs_dir)
        if out:
            written += 1
            ym = (parse_iso(conv.get("created_at")) or datetime.now(timezone.utc)).strftime("%Y-%m")
            by_year_month[ym].append({
                "uuid": conv.get("uuid"),
                "name": conv.get("name") or "(untitled)",
                "path": out,
                "created": parse_iso(conv.get("created_at")),
                "msgs": len(conv.get("chat_messages") or []),
            })

    print(f"  wrote {written} conversation files")

    # Per-month index files (avoids a single mega-README that creates
    # a 1000-link starburst in Obsidian's graph view).
    by_month_dir = ARCHIVE / "by-month"
    by_month_dir.mkdir(exist_ok=True)
    for old in by_month_dir.glob("*.md"):
        old.unlink()
    for ym, items in by_year_month.items():
        items_sorted = sorted(
            items,
            key=lambda x: x["created"] or datetime.min.replace(tzinfo=timezone.utc),
            reverse=True,
        )
        lines = [
            "---",
            "type: web-archive-month",
            f"month: {ym}",
            f"conversations: {len(items_sorted)}",
            "---",
            "",
            f"# Web archive — {ym}",
            "",
            f"**{len(items_sorted)} conversations**",
            "",
            "| Date | Title | Msgs |",
            "|------|-------|-----:|",
        ]
        for it in items_sorted:
            d = (it["created"] or datetime.now(timezone.utc)).date()
            stem = it["path"].stem
            title = it["name"].replace("|", "\\|")[:90]
            lines.append(f"| {d} | [[web-archive/conversations/{stem}\\|{title}]] | {it['msgs']} |")
        (by_month_dir / f"{ym}.md").write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Slim top-level README that links to monthly indexes (not to every conversation)
    months_sorted = sorted(by_year_month.keys(), reverse=True)
    readme = [
        "# Claude Web Archive",
        "",
        f"Imported from `{src.name}` on {datetime.now().strftime('%Y-%m-%d %H:%M')}.",
        "",
        f"**{written} conversations** spanning "
        f"**{min(by_year_month.keys()) if by_year_month else '—'}** → "
        f"**{max(by_year_month.keys()) if by_year_month else '—'}**.",
        "",
        "## By month",
        "",
    ]
    for ym in months_sorted:
        readme.append(f"- [[web-archive/by-month/{ym}|{ym}]] — {len(by_year_month[ym])} conversations")
    readme.append("")
    readme.append("## Other")
    readme.append("")
    readme.append("- [[web-archive/memories|memories]] — Claude memory export, if present")
    readme.append("- `projects/` — Claude.ai project knowledge files")
    (ARCHIVE / "README.md").write_text("\n".join(readme) + "\n", encoding="utf-8")

    # Memories
    mem_path = src / "memories.json"
    if mem_path.exists():
        try:
            mem_data = json.loads(mem_path.read_text(encoding="utf-8"))
            mem_lines = ["# Claude Memory", "",
                         "Imported from `memories.json`.",
                         ""]
            if isinstance(mem_data, list) and mem_data:
                obj = mem_data[0]
                cm = obj.get("conversations_memory")
                pm = obj.get("project_memories")
                if cm:
                    mem_lines.append("## Conversations memory")
                    mem_lines.append("")
                    if isinstance(cm, str):
                        mem_lines.append(cm)
                    else:
                        mem_lines.append("```json")
                        mem_lines.append(json.dumps(cm, indent=2)[:8000])
                        mem_lines.append("```")
                    mem_lines.append("")
                if pm:
                    mem_lines.append("## Project memories")
                    mem_lines.append("")
                    mem_lines.append("```json")
                    mem_lines.append(json.dumps(pm, indent=2)[:8000])
                    mem_lines.append("```")
                    mem_lines.append("")
            (ARCHIVE / "memories.md").write_text("\n".join(mem_lines), encoding="utf-8")
            print("  wrote memories.md")
        except Exception as e:
            print(f"  !! failed to write memories.md: {e}")

    # Projects (each is a JSON describing a Claude.ai project)
    projects_dir = src / "projects"
    if projects_dir.exists():
        out_proj = ARCHIVE / "projects"
        out_proj.mkdir(exist_ok=True)
        for pf in sorted(projects_dir.glob("*.json")):
            try:
                pdata = json.loads(pf.read_text(encoding="utf-8"))
            except Exception:
                continue
            name = pdata.get("name") or pf.stem
            slug = slugify(name)
            description = pdata.get("description") or ""
            instr = pdata.get("prompt_template") or pdata.get("custom_instructions") or ""
            files = pdata.get("docs") or pdata.get("files") or []
            lines = [
                "---",
                f"project_id: {pdata.get('uuid', pf.stem)}",
                f"source: claude-web-project",
                f"name: \"{(name or '').replace(chr(10), ' ').replace(chr(34), chr(39))[:120]}\"",
                "---",
                "",
                f"# {name}",
                "",
            ]
            if description:
                lines.extend([f"**Description:** {description}", ""])
            if instr:
                lines.extend(["## Instructions", "", instr, ""])
            if files:
                lines.extend(["## Files", ""])
                for f in files:
                    fn = f.get("file_name") or f.get("name") or "(unnamed)"
                    fc = f.get("content") or ""
                    lines.append(f"<details><summary>{fn}</summary>\n\n```\n{fc[:8000]}\n```\n\n</details>")
                lines.append("")
            (out_proj / f"{slug}-{short_id(pdata.get('uuid', ''))}.md").write_text(
                "\n".join(lines), encoding="utf-8"
            )
        print(f"  wrote {len(list(out_proj.glob('*.md')))} project files")

    print(f"\n✓ Web archive built at {ARCHIVE}")


if __name__ == "__main__":
    main()

3. Split + dispatch 10 web backlink agents

$ mkdir -p ~/ClaudeCodeBrain/_build/{web_assignments,web_reports}
$ python3 ~/ClaudeCodeBrain/_build/scripts/split_web_assignments.py
scripts/split_web_assignments.py · 40 lines
#!/usr/bin/env python3
"""Split web archive conversation files into 10 balanced slices for parallel backlink agents."""
from __future__ import annotations
import json
from pathlib import Path

VAULT = Path.home() / "ClaudeCodeBrain"
WEB_CONVS = VAULT / "web-archive" / "conversations"
N_AGENTS = 10
ASSIGN_DIR = VAULT / "_build" / "web_assignments"


def main():
    ASSIGN_DIR.mkdir(parents=True, exist_ok=True)

    files: list[tuple[Path, int]] = []
    for f in sorted(WEB_CONVS.iterdir()):
        if f.suffix == ".md":
            files.append((f, f.stat().st_size))

    files.sort(key=lambda x: -x[1])
    bins: list[dict] = [{"files": [], "total": 0} for _ in range(N_AGENTS)]
    for f, sz in files:
        target = min(bins, key=lambda b: b["total"])
        target["files"].append(str(f))
        target["total"] += sz

    summary = []
    for i, b in enumerate(bins, 1):
        out = ASSIGN_DIR / f"agent_{i:02d}.txt"
        out.write_text("\n".join(b["files"]) + "\n", encoding="utf-8")
        summary.append({"agent": i, "files": len(b["files"]), "kb": round(b["total"] / 1024)})

    report = {"total_files": len(files), "agents": summary}
    print(json.dumps(report, indent=2))
    (ASSIGN_DIR / "_split_report.json").write_text(json.dumps(report, indent=2), encoding="utf-8")


if __name__ == "__main__":
    main()

The web archive has a different frontmatter shape than terminal sessions (no tool_uses, has conversation_id, etc.), so the agents use a separate prompt:

prompts/backlink_web_agent.md · 118 lines
# Web Archive Backlink Agent Instructions

You are one of 10 parallel agents extracting topics/tools/entities from Claude.ai **web archive** conversation files. Each agent has its own disjoint slice — DO NOT touch files outside your assignment list.

## Your inputs

- **Assignment file**: `/Users/michaelwong/ClaudeCodeBrain/_build/web_assignments/agent_NN.txt` (NN = your number, e.g. `01`)
- **Vault root**: `/Users/michaelwong/ClaudeCodeBrain/`
- **Output report**: `/Users/michaelwong/ClaudeCodeBrain/_build/web_reports/agent_NN.json`

## Frontmatter shape (different from terminal sessions!)

Web archive files look like this:

```yaml
---
conversation_id: <uuid>
source: claude-web
date: 2024-03-12
started: 2024-03-12T...
updated: 2024-03-12T...
user_messages: 5
assistant_messages: 5
summary: "..."   # optional, may not exist
topics: []
tools: []
entities: []
---
```

**Important:** `topics`, `tools`, and `entities` all start out as empty `[]` lists (that's what `import_claude_web.py` writes). Your edit populates all three.

## What to extract

For each conversation in your assignment:

1. **topics** — 3 to 8 kebab-case slugs (good: `nano-banana-pro`, `prompt-engineering`, `kailua-real-estate`; bad: `chat`, `help`)
2. **tools** — services / frameworks / named software discussed (e.g. `nano-banana-pro`, `obsidian`, `n8n`, `wordpress`, `skool`, `twilio`). Lowercase, kebab-case.
3. **entities** — named products, companies, people, places (e.g. `team-wong-hawaii`, `michael-wong`, `claude`, `gpt-4`, `kailua`)
4. **summary** — ONE sentence (max 25 words), past tense. Skip if a `summary:` line already exists with content.

Be specific. Short conversations (1–2 message pairs) often only justify 1–2 topics — don't pad.

## The edit

For each file, do **one Edit call** that:

**Find:**
```
topics: []
tools: []
entities: []
---
```

**Replace with:**
```
topics: [topic-1, topic-2, topic-3]
tools: [tool-1, tool-2]
entities: [entity-1]
---

## 🔗 Topics & Entities

- **Topics:** [[topics/topic-1]] · [[topics/topic-2]] · [[topics/topic-3]]
- **Tools:** [[tools/tool-1]] · [[tools/tool-2]]
- **Entities:** [[entities/entity-1]]
```

Skip the corresponding line if a list is empty (e.g. no tools → omit the **Tools:** line).

If `summary: "..."` already exists in frontmatter, leave it alone — only update topics/tools/entities. If there's no summary line, add one above topics:
```
summary: "<your one-sentence summary>"
topics: [...]
tools: [...]
entities: [...]
---
```

## Reading the file

Read each file with the Read tool — default 2000 lines is plenty. Most web conversations are 1–20 message pairs. For very long ones, the first user message + scanning the first assistant response is usually enough to extract topics.

## Your report

Write JSON to `/Users/michaelwong/ClaudeCodeBrain/_build/web_reports/agent_NN.json`:

```json
{
  "agent": NN,
  "files_processed": <int>,
  "files_failed": <int>,
  "errors": [],
  "sessions": [
    {
      "file": "/absolute/path.md",
      "session_id": "<conversation_id>",
      "project": "web-archive",
      "topics": [...],
      "tools": [...],
      "entities": [...],
      "summary": "..."
    },
    ...
  ]
}
```

The consolidate script merges these reports with the terminal-session reports and groups everything into the same topic/tool/entity hubs. Use `"project": "web-archive"` so they bucket separately within hub pages.

## Rules

- **Read-only on `topics/`, `tools/`, `entities/`** — those get rewritten by the consolidation pass.
- **Process files in parallel batches of 5** — multiple Read calls in one message, then multiple Edit calls in one message.
- **Don't open files outside your assignment.**
- **Skip files where `topics:` is already populated** (not `[]`). Include them in the report with the existing values.
- If you hit a parse error or weird file, log it in `errors` and continue.

When done, write the JSON report and respond with: `Done. {N} files processed, {E} errors.`

Same orchestration as before: open Claude Code and paste an orchestrator prompt that spawns 10 agents using backlink_web_agent.md and the web_assignments/agent_NN.txt files, with reports written to web_reports/.

4. Re-consolidate to merge web + terminal into the same hubs

$ python3 ~/ClaudeCodeBrain/_build/scripts/consolidate.py

That's it. The same topics/n8n-workflow.md page now lists every conversation from both sources that mentioned n8n, grouped by data source.

SECTION 05

Fold in your ChatGPT archive when it lands.

~ 25 minutes, 1–3 days from now

ChatGPT's export takes 1–3 days, which is why you requested it first in section 01. When the email lands, you'll repeat the section 04 pattern with the ChatGPT importer.

1. Download & unzip the ChatGPT export

ChatGPT sends a zip with one or more conversations.json / conversations-XXX.json files plus some auxiliary metadata.
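If you want to sanity-check the download before importing, a small helper can extract the zip and count the conversations in each JSON file. This is a sketch, not part of the shipped scripts; the zip path you pass in will be whatever your download is actually named.

```python
import json
import zipfile
from pathlib import Path


def inspect_export(zip_path: Path, dest: Path) -> dict[str, int]:
    """Extract a ChatGPT export zip and count conversations in each
    top-level conversations*.json file."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as z:
        z.extractall(dest)
    counts: dict[str, int] = {}
    for f in sorted(dest.glob("conversations*.json")):
        data = json.loads(f.read_text(encoding="utf-8"))
        # Exports are usually a list of conversation dicts; a lone dict counts as 1
        counts[f.name] = len(data) if isinstance(data, list) else 1
    return counts
```

Eyeball the returned counts before pointing the importer at the directory; a zero or a wildly wrong total usually means a partial download.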

2. Convert JSON → markdown

$ python3 ~/ClaudeCodeBrain/_build/scripts/import_chatgpt.py /path/to/chatgpt-export-dir
scripts/import_chatgpt.py · 249 lines
#!/usr/bin/env python3
"""Import a ChatGPT export into the same Obsidian vault under web-archive/.

ChatGPT's export ZIP contains one or more `conversations*.json` files plus
some auxiliary metadata. Schema is different from Claude.ai's export.

Usage:
  python3 import_chatgpt.py /path/to/chatgpt-export-dir

Writes:
  web-archive/conversations/<date>_<slug>_<id8>.md   (one per conversation)
  web-archive/by-month/<YYYY-MM>.md                  (monthly index pages)
  web-archive/README.md                              (top-level archive index)

Each conversation file uses the SAME frontmatter shape as Claude.ai imports
so the consolidation pass and the claude-brain skill don't have to special-case them.

⚠️  This is a working skeleton. ChatGPT's export schema can change between
    releases, so when you run this for the first time, inspect a sample
    conversations*.json and adjust message-walking code to match.

    The key thing ChatGPT does differently:
      - Each conversation has a `mapping` dict keyed by message UUID
      - Messages form a tree (one user can have multiple regen siblings).
        We follow the `current_node` chain by default to get the displayed thread.
      - Roles are "user" / "assistant" / "system" / "tool".
      - Content lives under `message.content.parts` (list of strings).
"""
from __future__ import annotations

import json
import re
import sys
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

VAULT = Path.home() / "ClaudeCodeBrain"
ARCHIVE = VAULT / "web-archive"


def slugify(s: str, max_len: int = 60) -> str:
    s = (s or "").strip().lower()
    s = re.sub(r"[^\w\s-]", "", s)
    s = re.sub(r"[\s_]+", "-", s)
    s = re.sub(r"-+", "-", s).strip("-")
    return s[:max_len] or "conversation"


def short_id(uuid_str: str) -> str:
    return (uuid_str or "").replace("-", "")[:8] or "xxxxxxxx"


def parse_chatgpt_timestamp(ts) -> datetime | None:
    """ChatGPT exports use Unix epoch floats for `create_time` / `update_time`."""
    if ts is None:
        return None
    try:
        return datetime.fromtimestamp(float(ts), tz=timezone.utc)
    except (ValueError, TypeError):
        return None


def walk_displayed_thread(conv: dict) -> list[dict]:
    """Walk from the conversation's `current_node` up to root, returning the
    visible message thread in chronological order.

    ChatGPT's tree can have regen siblings; `current_node` is the leaf of the
    thread the user last saw, so this gives the "displayed" conversation.
    """
    mapping = conv.get("mapping", {}) or {}
    current = conv.get("current_node")
    if not current or current not in mapping:
        # Fall back: any leaf
        leaves = [k for k, v in mapping.items() if not v.get("children")]
        current = leaves[0] if leaves else None
    chain: list[str] = []
    node_id = current
    seen = set()
    while node_id and node_id in mapping and node_id not in seen:
        seen.add(node_id)
        chain.append(node_id)
        node_id = mapping[node_id].get("parent")
    chain.reverse()
    messages = []
    for nid in chain:
        node = mapping.get(nid, {})
        msg = node.get("message") or {}
        if not msg:
            continue
        author = msg.get("author") or {}
        role = author.get("role") or "user"
        if role == "system":
            continue  # skip system noise
        content = msg.get("content") or {}
        parts = content.get("parts") if isinstance(content, dict) else None
        if parts:
            text = "\n\n".join(str(p) for p in parts if p)
        else:
            text = content.get("text", "") if isinstance(content, dict) else ""
        if not text.strip():
            continue
        messages.append({
            "role": role,
            "text": text.strip(),
            "create_time": msg.get("create_time"),
        })
    return messages


def render_messages(messages: list[dict]) -> str:
    blocks: list[str] = []
    for m in messages:
        role = m["role"]
        label = "👤 User" if role == "user" else "🤖 ChatGPT" if role == "assistant" else f"❔ {role}"
        blocks.append(f"### {label}\n\n{m['text']}")
    return "\n\n".join(blocks)


def write_conversation(conv: dict, outdir: Path) -> tuple[Path, datetime] | None:
    messages = walk_displayed_thread(conv)
    if not messages:
        return None

    title = conv.get("title") or "(untitled)"
    cid = conv.get("conversation_id") or conv.get("id") or ""
    created = parse_chatgpt_timestamp(conv.get("create_time"))
    updated = parse_chatgpt_timestamp(conv.get("update_time"))
    date_str = (created or datetime.now(timezone.utc)).strftime("%Y-%m-%d")
    sid = short_id(cid)
    filename = f"{date_str}_{slugify(title, max_len=80)}_{sid}.md"
    out_path = outdir / filename

    user_msg_count = sum(1 for m in messages if m["role"] == "user")
    asst_msg_count = sum(1 for m in messages if m["role"] == "assistant")

    fm = [
        "---",
        f"conversation_id: {cid}",
        "source: chatgpt-web",
        f"date: {date_str}",
    ]
    if created:
        fm.append(f"started: {created.isoformat()}")
    if updated:
        fm.append(f"updated: {updated.isoformat()}")
    fm.append(f"user_messages: {user_msg_count}")
    fm.append(f"assistant_messages: {asst_msg_count}")
    fm.append("topics: []")
    fm.append("tools: []")
    fm.append("entities: []")
    fm.append("---")

    body = render_messages(messages)
    out_path.write_text("\n".join(fm) + f"\n\n# {title}\n\n{body}\n", encoding="utf-8")
    return out_path, (created or datetime.now(timezone.utc))


def main():
    if len(sys.argv) < 2:
        print("usage: import_chatgpt.py /path/to/chatgpt-export-dir", file=sys.stderr)
        sys.exit(1)
    src = Path(sys.argv[1])
    if not src.exists():
        print(f"!! source missing: {src}", file=sys.stderr)
        sys.exit(1)

    # ChatGPT exports may name the file conversations.json or conversations-XXX.json
    conv_files = list(src.glob("conversations*.json"))
    if not conv_files:
        print(f"!! no conversations*.json found in {src}", file=sys.stderr)
        sys.exit(1)

    ARCHIVE.mkdir(parents=True, exist_ok=True)
    convs_dir = ARCHIVE / "conversations"
    convs_dir.mkdir(exist_ok=True)

    conversations: list[dict] = []
    for f in conv_files:
        print(f"Loading {f.name} ({f.stat().st_size // 1024 // 1024} MB)…")
        with open(f, "r", encoding="utf-8") as fh:
            data = json.load(fh)
        if isinstance(data, list):
            conversations.extend(data)
        elif isinstance(data, dict):
            conversations.append(data)
    print(f"  {len(conversations)} conversations total")

    by_year_month: dict[str, list[dict]] = defaultdict(list)
    written = 0
    for i, conv in enumerate(conversations, 1):
        if i % 100 == 0:
            print(f"  [{i}/{len(conversations)}]")
        result = write_conversation(conv, convs_dir)
        if result:
            out_path, created = result
            written += 1
            ym = created.strftime("%Y-%m")
            by_year_month[ym].append({
                "name": conv.get("title") or "(untitled)",
                "path": out_path,
                "created": created,
                "msgs": len(walk_displayed_thread(conv)),
            })
    print(f"  wrote {written} conversation files")

    # Per-month index files
    by_month_dir = ARCHIVE / "by-month"
    by_month_dir.mkdir(exist_ok=True)
    # NOTE: we DON'T wipe by_month/ here — Claude.ai exports may already
    # have written months. We update months we touch.
    for ym, items in by_year_month.items():
        target = by_month_dir / f"{ym}.md"
        items_sorted = sorted(
            items,
            key=lambda x: x["created"] or datetime.min.replace(tzinfo=timezone.utc),
            reverse=True,
        )
        lines = [
            "---",
            "type: web-archive-month",
            f"month: {ym}",
            f"conversations: {len(items_sorted)}",
            "---",
            "",
            f"# Web archive — {ym}",
            "",
            f"**{len(items_sorted)} ChatGPT conversations imported**",
            "",
            "| Date | Title | Msgs |",
            "|------|-------|-----:|",
        ]
        for it in items_sorted:
            d = (it["created"] or datetime.now(timezone.utc)).date()
            stem = it["path"].stem
            title = it["name"].replace("|", "\\|")[:90]
            lines.append(f"| {d} | [[web-archive/conversations/{stem}\\|{title}]] | {it['msgs']} |")
        # If a Claude.ai month index already exists, append the ChatGPT block
        if target.exists():
            existing = target.read_text(encoding="utf-8")
            target.write_text(existing + "\n\n---\n\n" + "\n".join(lines) + "\n", encoding="utf-8")
        else:
            target.write_text("\n".join(lines) + "\n", encoding="utf-8")

    print(f"\n✓ ChatGPT archive merged into {ARCHIVE}")


if __name__ == "__main__":
    main()
ChatGPT schema can drift. OpenAI occasionally tweaks the export format between releases. Inspect a sample conversations.json entry before running on the full set — verify the mapping tree, current_node, and message.content.parts still match what the script expects.
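One way to do that inspection mechanically: run a quick schema check over the first handful of conversations and see which of the importer's assumptions fail. This is a sketch under the same assumptions the importer makes (a `mapping` dict, a `current_node` key, content under `parts` or `text`); extend it if your export looks different.

```python
def check_schema(conv: dict) -> list[str]:
    """Return the importer's schema assumptions that FAIL for one conversation."""
    problems: list[str] = []
    mapping = conv.get("mapping")
    if not isinstance(mapping, dict) or not mapping:
        return ["no `mapping` dict"]
    if conv.get("current_node") not in mapping:
        problems.append("`current_node` missing or not a key in `mapping`")
    for node in mapping.values():
        msg = node.get("message")
        if not msg:
            continue  # root nodes legitimately carry no message
        content = msg.get("content") or {}
        has_text = isinstance(content, dict) and ("parts" in content or "text" in content)
        if not has_text:
            problems.append("message content without `parts`/`text`")
            break
    return problems
```

Run it over the first ~20 entries of conversations.json; an empty list for each means the importer's message-walking code should work as written.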

3. Same 10-agent backlink + consolidate pattern

Re-run split_web_assignments.py (it picks up new files), dispatch 10 agents with the web backlink prompt, then re-run consolidate.py.

Now your topics/prompt-engineering.md page has rows from your earliest ChatGPT explorations through your most recent Claude Code work. One brain, three data sources.

SECTION 06

Keep it alive: the /slay sync loop.

~ 30 seconds, weekly

Every time you finish a meaningful Claude Code session and want it folded into the brain, run /slay from any Claude Code window:

/slay

It runs an incremental export (skipping sessions that are already in the vault, so your backlinks aren't clobbered) and refreshes the project rollups + README. New sessions land with empty topics: [] until the next time you run a full backlink pass.
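The "skip sessions already in the vault" behavior boils down to a freshness check per session file. A simplified sketch of the idea; the shipped export.py may apply extra rules:

```python
from pathlib import Path


def should_export(jsonl_path: Path, md_path: Path) -> bool:
    """Export a session only if its vault copy is missing or stale.

    Skipping files that already exist is what keeps agent-written
    backlinks from being clobbered on each /slay run.
    """
    if not md_path.exists():
        return True
    # Re-export only if the JSONL source changed after the last export
    return jsonl_path.stat().st_mtime > md_path.stat().st_mtime
```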

When to re-run the 10-agent backlink pass

  • You've accumulated 30+ new sessions since last backlink
  • You started a new big project and want it properly indexed
  • Search starts feeling "off" — old hubs don't reflect recent work

The same flow from section 02: split_assignments.py → 10 agents → consolidate.py → rebuild_rollups.py.

The skill never needs maintenance

It's read-only: it consults whatever is currently in the vault. As long as you keep the vault current with /slay, the skill stays useful.

SECTION 07

Gotchas & pitfalls.

Read before you hit them

Obsidian rewrites graph.json on quit. If you're patching graph view settings (color groups, force tweaks) via the file directly, do it while Obsidian is FULLY quit (Cmd+Q). Otherwise your changes get overwritten by in-memory state on the next save.
macOS APFS is case-insensitive. A case-only rename like mv Home home can silently no-op or clobber data, because the filesystem treats both names as the same path. If you ever need to rename a folder case-only, go via a temp name first: Home → __tmp_home → home.
API keys lurking in old sessions. Before sharing the vault with anyone, grep for sk-, sk-ant-, AIzaSy, AKIA, and JWT signatures (eyJ...). Real keys often get pasted into chats and end up baked into the markdown.
Subagent files share their parent's sessionId. Naive dedup-by-sessionId will collapse all subagents into one file. The shipped export.py handles this by hashing the agent filename stem.
The 940-link starburst. If you don't generate monthly indexes, the web-archive README becomes a single mega-hub linking to every conversation, creating an unreadable starburst in the graph. The shipped import_claude_web.py writes per-month index files automatically.
ChatGPT export schema can change. Run on a small sample first. The shipped import_chatgpt.py is a working stub that follows the displayed-thread (current_node) chain, but verify a couple of conversations rendered correctly before processing the full export.
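The key grep from the pitfalls above can be scripted. A sketch; the patterns are rough prefixes, not an exhaustive secret scanner, so review hits by hand:

```python
import re
from pathlib import Path

# Rough shapes of commonly pasted secrets. sk-ant- keys also match the
# generic sk- pattern; two hits on the same line is fine for manual review.
KEY_PATTERNS = {
    "anthropic": re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}"),
    "openai":    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "google":    re.compile(r"AIzaSy[A-Za-z0-9_-]{33}"),
    "aws":       re.compile(r"AKIA[0-9A-Z]{16}"),
    "jwt":       re.compile(r"eyJ[A-Za-z0-9_-]{10,}\.eyJ[A-Za-z0-9_-]{10,}"),
}


def scan_vault(vault: Path) -> list[tuple[str, str, int]]:
    """Return (file, pattern_name, line_number) for every suspected secret."""
    hits: list[tuple[str, str, int]] = []
    for md in sorted(vault.rglob("*.md")):
        text = md.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            for name, pat in KEY_PATTERNS.items():
                if pat.search(line):
                    hits.append((str(md), name, lineno))
    return hits
```

Run it before sharing and redact every hit in place; the markdown is plain text, so a sed pass or manual edit both work.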

APPENDIX

Scaling tips for big vaults.

If your archive is 500+ conversations or your topic count balloons past a few hundred, here are optional tweaks. Skip unless you actually hit the issue.

Long-tail topic filtering

Re-run consolidate with --min-sessions N to only render hub pages for topics that connect ≥ N sessions. The long-tail topics stay in session frontmatter (so full-text search still finds them) but don't clutter the graph.

$ python3 ~/ClaudeCodeBrain/_build/scripts/consolidate.py --min-sessions 3
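Conceptually the flag is just a threshold on the topic → sessions map. A sketch of the idea only; the shipped consolidate.py structures this differently:

```python
def filter_hubs(topic_sessions: dict[str, list[str]],
                min_sessions: int) -> dict[str, list[str]]:
    """Render hub pages only for topics connecting >= min_sessions sessions.

    Filtered-out topics remain in each session's frontmatter, so full-text
    search still finds them; they just don't get a hub page or graph node.
    """
    return {t: s for t, s in topic_sessions.items() if len(s) >= min_sessions}
```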

Messy folder names from worktrees

If your terminal projects have a lot of git worktrees, your projects/ directory may have duplicates like my-project, my-project-worktree-foo. Run an audit agent (read-only) to produce a rename plan, review it, then apply:

$ python3 ~/ClaudeCodeBrain/_build/scripts/apply_renames.py
scripts/apply_renames.py · 172 lines
#!/usr/bin/env python3
"""Apply the audit plan: rename project folders, merge where requested,
and rewrite all wikilinks across the vault.

Reads:
  _build/audit_plan.json  (produced by audit agent)

Notes:
  macOS APFS is case-insensitive. Case-only renames are done via a __tmp_ name first.
"""
from __future__ import annotations

import json
import re
import shutil
import sys
from pathlib import Path

VAULT = Path.home() / "ClaudeCodeBrain"
PROJECTS = VAULT / "projects"
AUDIT_PLAN = VAULT / "_build" / "audit_plan.json"


def safe_rename(src: Path, dst: Path) -> None:
    """Rename src → dst, handling case-only renames safely on APFS."""
    if src == dst:
        return
    if src.name.lower() == dst.name.lower() and src.parent == dst.parent:
        # Case-only rename: go via __tmp_ first
        tmp = src.parent / f"__tmp_{src.name}"
        src.rename(tmp)
        tmp.rename(dst)
        return
    src.rename(dst)


def merge_project(src: Path, dst: Path) -> None:
    """Merge src project into dst, copying sessions/subagents and removing src."""
    dst.mkdir(parents=True, exist_ok=True)
    for sub in ("sessions", "subagents"):
        s = src / sub
        d = dst / sub
        if not s.exists():
            continue
        d.mkdir(parents=True, exist_ok=True)
        for f in s.iterdir():
            target = d / f.name
            if target.exists():
                # Suffix to avoid collision
                i = 2
                while target.exists():
                    target = d / f"{f.stem}_{i}{f.suffix}"
                    i += 1
            shutil.move(str(f), str(target))
    # Drop the src directory (its _project.md will get regenerated)
    shutil.rmtree(src, ignore_errors=True)


def rewrite_wikilinks(vault: Path, name_map: dict[str, str]) -> int:
    """Walk every .md and rewrite [[projects/<old>/...]] references."""
    if not name_map:
        return 0
    # Build a single regex that matches any old name
    keys = sorted(name_map.keys(), key=len, reverse=True)
    pattern = re.compile(
        r"(\[\[projects/)("
        + "|".join(re.escape(k) for k in keys)
        + r")(/)"
    )
    # Also rewrite the YAML `project:` field
    project_field = re.compile(r"(?m)^project:\s*(\S+)\s*$")
    # Note: `cwd` paths in session frontmatter are deliberately left untouched.

    edits = 0
    for md in vault.rglob("*.md"):
        if md.is_dir() or "/_build/" in str(md):
            continue
        try:
            text = md.read_text(encoding="utf-8")
        except Exception:
            continue
        new_text = pattern.sub(lambda m: m.group(1) + name_map[m.group(2)] + m.group(3), text)

        def _project_sub(m):
            old = m.group(1).strip()
            if old in name_map:
                return f"project: {name_map[old]}"
            return m.group(0)

        new_text = project_field.sub(_project_sub, new_text)
        if new_text != text:
            md.write_text(new_text, encoding="utf-8")
            edits += 1
    return edits


def main():
    if not AUDIT_PLAN.exists():
        print(f"!! audit plan missing: {AUDIT_PLAN}", file=sys.stderr)
        sys.exit(1)

    plan = json.loads(AUDIT_PLAN.read_text(encoding="utf-8"))
    projects_plan = plan.get("projects", [])

    # Build name map (old → new) and merge plan (old → merge_target)
    name_map: dict[str, str] = {}
    merges: list[tuple[str, str]] = []
    for entry in projects_plan:
        old = entry["current_name"]
        new = entry.get("canonical_name") or old
        merge_into = entry.get("merge_into")
        if merge_into:
            merges.append((old, merge_into))
        elif old != new:
            name_map[old] = new

    print(f"Plan: {len(name_map)} renames, {len(merges)} merges")

    # 1) Apply merges first (move files, then delete src dirs)
    for old, target in merges:
        src = PROJECTS / old
        dst = PROJECTS / target
        if not src.exists():
            print(f"  skip merge {old} → {target}: source missing")
            continue
        if not dst.exists():
            print(f"  rename instead of merge: {old} → {target} (target missing)")
            safe_rename(src, dst)
            continue
        print(f"  merging {old} → {target}")
        merge_project(src, dst)
        # After merge, links to the old name should point to target
        name_map[old] = target

    # 2) Apply renames
    for old, new in list(name_map.items()):
        src = PROJECTS / old
        dst = PROJECTS / new
        if not src.exists():
            # likely already merged
            continue
        if dst.exists() and src != dst:
            # collision — merge instead
            print(f"  collision on rename: {old} → {new}, merging")
            merge_project(src, dst)
            continue
        if old != new:
            print(f"  renaming {old} → {new}")
            safe_rename(src, dst)

    # 3) Rewrite wikilinks across the vault
    edits = rewrite_wikilinks(VAULT, name_map)
    print(f"  rewrote wikilinks in {edits} files")

    # 4) Patch the top-level README's project list.
    # Re-running export.py would regenerate every _project.md rollup but would
    # clobber agent-written backlinks, so rollups are left alone here;
    # consolidate.py refreshes hub pages from the agent reports.
    readme_path = VAULT / "README.md"
    if readme_path.exists():
        text = readme_path.read_text(encoding="utf-8")
        for old, new in name_map.items():
            text = text.replace(f"projects/{old}/_project", f"projects/{new}/_project")
            text = text.replace(f"|{old}]]", f"|{new}]]")
        readme_path.write_text(text, encoding="utf-8")

    print(f"\n✓ Renames + merges applied. Final project count: "
          f"{len([p for p in PROJECTS.iterdir() if p.is_dir()])}")


if __name__ == "__main__":
    main()
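For reference, here is the shape of _build/audit_plan.json that the script reads: `canonical_name` drives renames and `merge_into` drives merges, matching the fields consumed in `main()` above. The project names below are hypothetical.

```python
import json

# Hypothetical audit plan. The field names (current_name, canonical_name,
# merge_into) are exactly what apply_renames.py reads; the project names
# are made up for illustration.
plan = {
    "projects": [
        {"current_name": "my-project-worktree-foo", "merge_into": "my-project"},
        {"current_name": "My-Project", "canonical_name": "my-project"},
        {"current_name": "blog", "canonical_name": "blog"},  # unchanged: no-op
    ]
}
print(json.dumps(plan, indent=2))
```

The audit agent normally writes this file; if you hand-edit one, save it to ~/ClaudeCodeBrain/_build/audit_plan.json, review it, then run apply_renames.py.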

Customizing the graph view

Obsidian's graph view supports color groups (search-query → color) and CSS snippets. See Obsidian's graph docs. A reasonable starting config: dim everything under web-archive/conversations, color your active projects bright, hide README + monthly indexes from the visible graph via the filter -file:README -path:web-archive/by-month.

FINAL

You're done when…

  • ChatGPT export requested (1–3 day timer started)
  • Claude.ai export downloaded (~5 min)
  • ~/ClaudeCodeBrain/ opens in Obsidian and the project index renders
  • 10 backlink agent reports under _build/reports/
  • topics/, tools/, entities/ have hub pages
  • TOP-TOPICS.md shows your real workstreams in the leaderboard
  • ~/.claude/skills/claude-brain/SKILL.md exists
  • ~/.claude/commands/slay.md exists
  • Fresh Claude Code session — "let's work on X" → Claude reads the vault before responding
  • Claude.ai web archive folded in (section 04 done)
  • ChatGPT archive folded in (section 05 done, after the email arrives)
  • API keys grepped + redacted before sharing the vault