feat(backfill): implement backfill tool for generating missing embeddings

This commit is contained in:
2026-03-26 22:45:28 +02:00
parent 1dde7f233d
commit f4ef0e9163
19 changed files with 575 additions and 37 deletions

View File

@@ -41,6 +41,7 @@ A Go MCP server for capturing and retrieving thoughts, memory, and project conte
| `recall_context` | Semantic + recency context block for injection |
| `link_thoughts` | Create a typed relationship between thoughts |
| `related_thoughts` | Explicit links + semantic neighbours |
| `backfill_embeddings` | Generate missing embeddings for stored thoughts |
## Configuration
@@ -74,6 +75,89 @@ Alternatively, pass `client_id` and `client_secret` as body parameters instead o
See `llm/plan.md` for full architecture and implementation plan.
## Backfill
Run `backfill_embeddings` after switching embedding models or importing thoughts without vectors.
```json
{
"project": "optional-project-name",
"limit": 100,
"include_archived": false,
"older_than_days": 0,
"dry_run": false
}
```
- `dry_run: true` — report counts without calling the embedding provider
- `limit` — max thoughts per call (default 100)
- Embeddings are generated in parallel (4 workers) and upserted; one failure does not abort the run
**Automatic backfill** (optional, config-gated):
```yaml
backfill:
enabled: true
run_on_startup: true # run once on server start
interval: "15m" # repeat every 15 minutes
batch_size: 20
max_per_run: 100
include_archived: false
```
**Search fallback**: when no embeddings exist for the active model in scope, `search_thoughts`, `recall_context`, `get_project_context`, `summarize_thoughts`, and `related_thoughts` automatically fall back to Postgres full-text search so results are never silently empty.
## Client Setup
### Claude Code
```bash
# API key auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "x-brain-key: <key>"
# Bearer token auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "Authorization: Bearer <token>"
```
### OpenAI Codex
Add to `~/.codex/config.toml`:
```toml
[[mcp_servers]]
name = "amcs"
url = "http://localhost:8080/mcp"
[mcp_servers.headers]
x-brain-key = "<key>"
```
### OpenCode
```bash
# API key auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "x-brain-key=<key>"
# Bearer token auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "Authorization=Bearer <token>"
```
Or add directly to `opencode.json` / `~/.config/opencode/config.json`:
```json
{
"mcp": {
"amcs": {
"type": "remote",
"url": "http://localhost:8080/mcp",
"headers": {
"x-brain-key": "<key>"
}
}
}
}
```
## Development
Run the SQL migrations against a local database with: