# AMCS TODO

## Embedding Backfill and Text-Search Fallback Audit

This file originally described the planned `backfill_embeddings` work and semantic-to-text fallback behavior. Most of that work is now implemented. This document now tracks what landed, what still needs verification, and what follow-up work remains.

For current operator-facing behavior, prefer `README.md`.

---

## Status summary

### Implemented

The main work described in this file is already present in the repo:

- `backfill_embeddings` MCP tool exists
- missing-embedding selection helpers exist in the store layer
- embedding upsert helpers exist in the store layer
- semantic retrieval falls back to Postgres full-text search when the active model has no embeddings in scope
- fallback behavior is wired into the main query-driven tools
- a full-text index migration exists
- optional automatic backfill runner exists in config/startup flow
- retry and reparse maintenance tooling also exists around metadata quality

### Still worth checking or improving

The broad feature is done, but some implementation-depth items are still worth tracking:

- test coverage around fallback/backfill behavior
- whether configured backfill batching is used consistently end-to-end
- observability depth beyond logs
- response visibility into which retrieval mode was used

---

## What is already implemented

### Backfill tool

Implemented:

- `backfill_embeddings`
- project scoping
- archived-thought filtering
- age filtering
- dry-run mode
- bounded concurrency
- best-effort per-item failure handling
- idempotent embedding upsert behavior

### Search fallback

Implemented:

- full-text fallback when no embeddings exist for the active model in scope
- fallback helper shared by query-based tools
- full-text index migration on thought content

### Tools using fallback

Implemented fallback coverage for:

- `search_thoughts`
- `recall_context`
- `get_project_context` when a query is provided
- `summarize_thoughts` when a query is provided
- semantic neighbors in `related_thoughts`

### Optional automatic behavior

Implemented:

- config-gated startup backfill pass
- config-gated periodic backfill loop

---

## Remaining follow-ups

### 1. Expose retrieval mode in responses

Still outstanding.

Why it matters:

- callers currently benefit from fallback automatically
- but debugging is easier if responses explicitly say whether retrieval was `semantic` or `text`

Suggested shape:

- add a machine-readable field such as `retrieval_mode: semantic|text`
- keep it consistent across all query-based tools that use shared retrieval logic

### 2. Verify and improve tests

Still worth auditing.

Recommended checks:

- no-embedding scope falls back to text search
- project-scoped fallback only searches within project scope
- archived thoughts remain excluded by default
- `related_thoughts` falls back correctly when semantic vectors are unavailable
- backfill creates embeddings that later restore semantic search

### 3. Re-embedding / migration ergonomics

Still optional future work.

Potential additions:

- count missing embeddings by project
- add `missing_embeddings` stats to `thought_stats`
- add a controlled re-embed or reindex flow for model migrations

---

## Notes for maintainers

Do not read this file as an untouched future roadmap item anymore. The repo has already implemented the core work described here.

If more backfill/fallback work is planned, append it as concrete follow-ups against the current codebase rather than preserving the old speculative rollout order.

---

## Historical note

The original long-form proposal was replaced during the repo audit because it described work that is now largely complete and was causing issue/document drift. If needed, recover the older version from git history.
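
Illustrative sketch for follow-up 1 (exposing the retrieval mode). This is Python used purely for illustration and does not reflect the repo's actual code or language. `RetrievalResult`, `retrieve`, `has_embeddings_in_scope`, `semantic_search`, and `text_search` are hypothetical names; only the `retrieval_mode` field and its `semantic`/`text` values come from the suggestion above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Literal

# The two modes named in the follow-up; everything else here is assumed.
RetrievalMode = Literal["semantic", "text"]


@dataclass
class RetrievalResult:
    """Hypothetical response envelope shared by query-based tools."""
    retrieval_mode: RetrievalMode
    thoughts: List[str] = field(default_factory=list)


def retrieve(
    query: str,
    has_embeddings_in_scope: bool,
    semantic_search: Callable[[str], List[str]],
    text_search: Callable[[str], List[str]],
) -> RetrievalResult:
    """Pick a retrieval path and record which one was used.

    Mirrors the documented rule: fall back to full-text search when the
    active model has no embeddings in scope.
    """
    if has_embeddings_in_scope:
        return RetrievalResult("semantic", semantic_search(query))
    return RetrievalResult("text", text_search(query))
```

Routing every query-based tool through one helper of this kind would keep the field consistent, and it gives the fallback tests in follow-up 2 a single value to assert on.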