AMCS TODO
Embedding Backfill and Text-Search Fallback Audit
This file originally described the planned backfill_embeddings work and semantic-to-text fallback behavior. Most of that work is now implemented. This document now tracks what landed, what still needs verification, and what follow-up work remains.
For current operator-facing behavior, prefer README.md.
Status summary
Implemented
The main work described in this file is already present in the repo:
- backfill_embeddings MCP tool exists
- missing-embedding selection helpers exist in the store layer
- embedding upsert helpers exist in the store layer
- semantic retrieval falls back to Postgres full-text search when the active model has no embeddings in scope
- fallback behavior is wired into the main query-driven tools
- a full-text index migration exists
- optional automatic backfill runner exists in config/startup flow
- retry and reparse maintenance tooling also exists around metadata quality
Still worth checking or improving
The broad feature is done, but some implementation-depth items are still worth tracking:
- test coverage around fallback/backfill behavior
- whether configured backfill batching is used consistently end-to-end
- observability depth beyond logs
- response visibility into which retrieval mode was used
What is already implemented
Backfill tool
Implemented:
- backfill_embeddings
- project scoping
- archived-thought filtering
- age filtering
- dry-run mode
- bounded concurrency
- best-effort per-item failure handling
- idempotent embedding upsert behavior
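The properties listed above can be illustrated with a minimal sketch. This is not the repo's implementation: Thought, fake_embed, and backfill are hypothetical names, the store is an in-memory dict, and archived-thought filtering is assumed to mean "skip archived thoughts". Project scoping and age filtering are left out for brevity.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Thought:
    id: int
    content: str
    archived: bool = False


async def fake_embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real embedding provider call.
    if not text:
        raise ValueError("empty content")
    return [float(len(text))]


async def backfill(thoughts, embeddings, model, *, dry_run=False, concurrency=2):
    """Embed thoughts that lack a vector for `model`; return a summary dict."""
    missing = [t for t in thoughts
               if not t.archived and (t.id, model) not in embeddings]
    if dry_run:
        # Dry run: report what would be embedded without writing anything.
        return {"candidates": len(missing), "embedded": 0, "failed": 0}

    sem = asyncio.Semaphore(concurrency)  # bounds in-flight embed calls
    failed = 0

    async def embed_one(t):
        nonlocal failed
        async with sem:
            try:
                vec = await fake_embed(t.content)
            except Exception:
                failed += 1  # best effort: one failure does not abort the run
                return
            embeddings[(t.id, model)] = vec  # keyed upsert, safe to repeat

    await asyncio.gather(*(embed_one(t) for t in missing))
    return {"candidates": len(missing),
            "embedded": len(missing) - failed,
            "failed": failed}
```

Because the upsert is keyed on (thought id, model), running the sketch again only retries the items that failed or were added since the last pass.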
Search fallback
Implemented:
- full-text fallback when no embeddings exist for the active model in scope
- fallback helper shared by query-based tools
- full-text index migration on thought content
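As a rough illustration of the fallback decision, the sketch below picks semantic ranking when the active model has at least one embedding in scope and otherwise falls back to text matching. All names (retrieve, cosine) are hypothetical, and the naive substring match only stands in for the Postgres full-text search that the real code uses.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, query_vec, thoughts, embeddings, model, limit=5):
    """Return (mode, ids). `thoughts` maps id -> content and is already scoped."""
    # Only vectors for the active model and for thoughts in scope count.
    scoped = {tid: vec for (tid, m), vec in embeddings.items()
              if m == model and tid in thoughts}
    if scoped:
        ranked = sorted(scoped,
                        key=lambda tid: cosine(query_vec, scoped[tid]),
                        reverse=True)
        return "semantic", ranked[:limit]
    # Fallback: naive term match standing in for full-text search.
    terms = query.lower().split()
    hits = [tid for tid, text in thoughts.items()
            if any(term in text.lower() for term in terms)]
    return "text", hits[:limit]
```

Note that embeddings produced by a different model do not count: a scope with only old-model vectors still takes the text path in this sketch.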
Tools using fallback
Implemented fallback coverage for:
- search_thoughts
- recall_context
- get_project_context when a query is provided
- summarize_thoughts when a query is provided
- semantic neighbors in related_thoughts
Optional automatic behavior
Implemented:
- config-gated startup backfill pass
- config-gated periodic backfill loop
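One way the two config gates could fit together is sketched below. The field names run_on_startup and interval_seconds are assumptions for illustration, not the repo's actual config keys, and backfill_once is a hypothetical callable that performs a single backfill pass.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class BackfillConfig:
    run_on_startup: bool = False
    interval_seconds: float = 0.0  # 0 disables the periodic loop


async def run_auto_backfill(cfg, backfill_once, stop: asyncio.Event):
    """Run the optional startup pass, then the optional periodic loop."""
    if cfg.run_on_startup:
        await backfill_once()
    if cfg.interval_seconds <= 0:
        return
    while not stop.is_set():
        try:
            # Sleep for one interval, but wake early if shutdown is requested.
            await asyncio.wait_for(stop.wait(), timeout=cfg.interval_seconds)
        except asyncio.TimeoutError:
            await backfill_once()
```

With both gates left at their defaults the function returns immediately, so the automatic behavior stays strictly opt-in.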
Remaining follow-ups
1. Expose retrieval mode in responses
Still outstanding.
Why it matters:
- callers currently benefit from fallback automatically
- but debugging is easier if responses explicitly say whether retrieval was semantic or text
Suggested shape:
- add a machine-readable field such as retrieval_mode: semantic|text
- keep it consistent across all query-based tools that use shared retrieval logic
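A possible response shape is sketched below. SearchResponse and build_response are hypothetical names, and the results payload is a placeholder. Only the retrieval_mode field and its two values come from the suggestion above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Literal

RetrievalMode = Literal["semantic", "text"]


@dataclass
class SearchResponse:
    results: list = field(default_factory=list)
    retrieval_mode: RetrievalMode = "semantic"


def build_response(results, used_fallback: bool) -> str:
    """Serialize results together with the retrieval mode that produced them."""
    mode: RetrievalMode = "text" if used_fallback else "semantic"
    return json.dumps(asdict(SearchResponse(results=results, retrieval_mode=mode)))
```

Setting the field in one shared helper, rather than in each tool, is what would keep it consistent across the query-based tools.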
2. Verify and improve tests
Still worth auditing.
Recommended checks:
- no-embedding scope falls back to text search
- project-scoped fallback only searches within project scope
- archived thoughts remain excluded by default
- related_thoughts falls back correctly when semantic vectors are unavailable
- backfill creates embeddings that later restore semantic search
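The scoping and archive checks could be expressed as small tests like the ones below. The function under test, text_fallback, is a hypothetical in-memory stand-in rather than the repo's store API. Real tests would run against the actual fallback helper and a test database.

```python
from dataclasses import dataclass


@dataclass
class Thought:
    id: int
    project: str
    content: str
    archived: bool = False


def text_fallback(thoughts, query, project=None, include_archived=False):
    # Hypothetical function under test: substring match with scope filters.
    q = query.lower()
    return [t.id for t in thoughts
            if q in t.content.lower()
            and (project is None or t.project == project)
            and (include_archived or not t.archived)]


FIXTURE = [
    Thought(1, "alpha", "postgres index notes"),
    Thought(2, "beta", "postgres vacuum notes"),
    Thought(3, "alpha", "old postgres notes", archived=True),
]


def test_project_scope_is_respected():
    assert text_fallback(FIXTURE, "postgres", project="alpha") == [1]


def test_archived_excluded_by_default():
    assert text_fallback(FIXTURE, "postgres") == [1, 2]
    assert text_fallback(FIXTURE, "postgres", include_archived=True) == [1, 2, 3]
```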
3. Re-embedding / migration ergonomics
Still optional future work.
Potential additions:
- count missing embeddings by project
- add missing_embeddings stats to thought_stats
- add a controlled re-embed or reindex flow for model migrations
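Counting missing embeddings by project amounts to a grouped count over thoughts that have no vector for the active model. The sketch below shows the idea over plain Python data. The function name and input shapes are assumptions, and a real version would most likely be a grouped query in the store layer.

```python
from collections import Counter


def missing_embeddings_by_project(thoughts, embedded_ids):
    """Count thoughts without an embedding, grouped by project.

    thoughts: iterable of (thought_id, project) pairs.
    embedded_ids: set of thought ids that have a vector for the active model.
    """
    counts = Counter(project for tid, project in thoughts
                     if tid not in embedded_ids)
    return dict(counts)
```

The same per-project counts could feed a missing_embeddings entry in thought_stats.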
Notes for maintainers
Do not read this file as a pending roadmap item. The repo has already implemented the core work described here.
If more backfill/fallback work is planned, append it as concrete follow-ups against the current codebase rather than preserving the old speculative rollout order.
Historical note
The original long-form proposal was replaced during the repo audit because it described work that is now largely complete and was causing issue/document drift.
If needed, recover the older version from git history.