feat(backfill): implement backfill tool for generating missing embeddings
@@ -41,6 +41,7 @@ A Go MCP server for capturing and retrieving thoughts, memory, and project context
| `recall_context` | Semantic + recency context block for injection |
| `link_thoughts` | Create a typed relationship between thoughts |
| `related_thoughts` | Explicit links + semantic neighbours |
| `backfill_embeddings` | Generate missing embeddings for stored thoughts |

## Configuration

@@ -74,6 +75,89 @@ Alternatively, pass `client_id` and `client_secret` as body parameters instead of

See `llm/plan.md` for full architecture and implementation plan.

## Backfill

Run `backfill_embeddings` after switching embedding models or importing thoughts without vectors.

```json
{
  "project": "optional-project-name",
  "limit": 100,
  "include_archived": false,
  "older_than_days": 0,
  "dry_run": false
}
```

- `dry_run: true` — report counts without calling the embedding provider
- `limit` — max thoughts per call (default 100)
- Embeddings are generated in parallel (4 workers) and upserted; one failure does not abort the run
**Automatic backfill** (optional, config-gated):

```yaml
backfill:
  enabled: true
  run_on_startup: true   # run once on server start
  interval: "15m"        # repeat every 15 minutes
  batch_size: 20
  max_per_run: 100
  include_archived: false
```
**Search fallback**: when no embeddings exist for the active model in scope, `search_thoughts`, `recall_context`, `get_project_context`, `summarize_thoughts`, and `related_thoughts` automatically fall back to Postgres full-text search so results are never silently empty.
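The fallback decision itself is straightforward; a hypothetical sketch (the function and parameter names are illustrative, not the server's real internals):

```go
package main

import "fmt"

// searchThoughts uses semantic search when embeddings exist for the active
// model in scope, otherwise falls back to full-text search so the result
// set is never silently empty.
func searchThoughts(query string, haveEmbeddings bool,
	semantic, fullText func(string) []string) []string {
	if haveEmbeddings {
		return semantic(query)
	}
	return fullText(query) // Postgres full-text search path
}

func main() {
	semantic := func(q string) []string { return []string{"vector hit"} }
	fullText := func(q string) []string { return []string{"fts hit"} }
	// No embeddings in scope yet: the full-text path answers the query.
	fmt.Println(searchThoughts("backfill", false, semantic, fullText))
}
```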
## Client Setup

### Claude Code

```bash
# API key auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "x-brain-key: <key>"

# Bearer token auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "Authorization: Bearer <token>"
```

### OpenAI Codex

Add to `~/.codex/config.toml`:

```toml
[[mcp_servers]]
name = "amcs"
url = "http://localhost:8080/mcp"

[mcp_servers.headers]
x-brain-key = "<key>"
```

### OpenCode

```bash
# API key auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "x-brain-key=<key>"

# Bearer token auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "Authorization=Bearer <token>"
```

Or add directly to `opencode.json` / `~/.config/opencode/config.json`:

```json
{
  "mcp": {
    "amcs": {
      "type": "remote",
      "url": "http://localhost:8080/mcp",
      "headers": {
        "x-brain-key": "<key>"
      }
    }
  }
}
```

## Development

Run the SQL migrations against a local database with: