feat(backfill): implement backfill tool for generating missing embeddings
@@ -41,6 +41,7 @@ A Go MCP server for capturing and retrieving thoughts, memory, and project context
| `recall_context` | Semantic + recency context block for injection |
| `link_thoughts` | Create a typed relationship between thoughts |
| `related_thoughts` | Explicit links + semantic neighbours |
| `backfill_embeddings` | Generate missing embeddings for stored thoughts |

## Configuration

@@ -74,6 +75,89 @@ Alternatively, pass `client_id` and `client_secret` as body parameters instead of

See `llm/plan.md` for full architecture and implementation plan.

## Backfill

Run `backfill_embeddings` after switching embedding models or importing thoughts without vectors.

```json
{
  "project": "optional-project-name",
  "limit": 100,
  "include_archived": false,
  "older_than_days": 0,
  "dry_run": false
}
```

- `dry_run: true` — report counts without calling the embedding provider
- `limit` — max thoughts per call (default 100)
- Embeddings are generated in parallel (4 workers) and upserted; one failure does not abort the run
**Automatic backfill** (optional, config-gated):

```yaml
backfill:
  enabled: true
  run_on_startup: true   # run once on server start
  interval: "15m"        # repeat every 15 minutes
  batch_size: 20
  max_per_run: 100
  include_archived: false
```
**Search fallback**: when no embeddings exist for the active model in scope, `search_thoughts`, `recall_context`, `get_project_context`, `summarize_thoughts`, and `related_thoughts` automatically fall back to Postgres full-text search so results are never silently empty.
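The fallback decision itself is straightforward; a hypothetical sketch (the function and parameter names are illustrative, not the server's real internals):

```go
package main

import "fmt"

// searchThoughts uses semantic search when embeddings exist for the active
// model in scope, otherwise falls back to full-text search so the result
// set is never silently empty.
func searchThoughts(query string, haveEmbeddings bool,
	semantic, fullText func(string) []string) []string {
	if haveEmbeddings {
		return semantic(query)
	}
	return fullText(query) // Postgres full-text search path
}

func main() {
	semantic := func(q string) []string { return []string{"vector hit"} }
	fullText := func(q string) []string { return []string{"fts hit"} }
	// No embeddings in scope yet: the full-text path answers the query.
	fmt.Println(searchThoughts("backfill", false, semantic, fullText))
}
```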
## Client Setup

### Claude Code

```bash
# API key auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "x-brain-key: <key>"

# Bearer token auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "Authorization: Bearer <token>"
```

### OpenAI Codex

Add to `~/.codex/config.toml`:

```toml
[[mcp_servers]]
name = "amcs"
url = "http://localhost:8080/mcp"

[mcp_servers.headers]
x-brain-key = "<key>"
```

### OpenCode

```bash
# API key auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "x-brain-key=<key>"

# Bearer token auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "Authorization=Bearer <token>"
```

Or add directly to `opencode.json` / `~/.config/opencode/config.json`:

```json
{
  "mcp": {
    "amcs": {
      "type": "remote",
      "url": "http://localhost:8080/mcp",
      "headers": {
        "x-brain-key": "<key>"
      }
    }
  }
}
```

## Development

Run the SQL migrations against a local database with: