# AMCS TODO

## Embedding Backfill and Text-Search Fallback Audit

This file originally described the planned `backfill_embeddings` work and the semantic-to-text fallback behavior. Most of that work is now implemented, so the file now tracks what landed, what still needs verification, and what follow-up work remains.

For current operator-facing behavior, prefer `README.md`.

---

## Status summary

### Implemented

The main work described in this file is already present in the repo:

- `backfill_embeddings` MCP tool exists
- missing-embedding selection helpers exist in the store layer
- embedding upsert helpers exist in the store layer
- semantic retrieval falls back to Postgres full-text search when the active model has no embeddings in scope
- fallback behavior is wired into the main query-driven tools
- a full-text index migration exists
- optional automatic backfill runner exists in config/startup flow
- retry and reparse maintenance tooling also exists around metadata quality

### Still worth checking or improving

The broad feature is done, but some implementation-depth items are still worth tracking:

- test coverage around fallback/backfill behavior
- whether configured backfill batching is used consistently end-to-end
- observability depth beyond logs
- response visibility into which retrieval mode was used

---

## What is already implemented

### Backfill tool

Implemented:

- `backfill_embeddings`
- project scoping
- archived-thought filtering
- age filtering
- dry-run mode
- bounded concurrency
- best-effort per-item failure handling
- idempotent embedding upsert behavior

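As a rough illustration of how the properties listed above fit together, here is a minimal Python sketch. It is not the repo's implementation: the `backfill` function, the `BackfillResult` type, the dict-based thought shape, and the in-memory `embeddings` mapping are all hypothetical stand-ins, and age filtering is omitted for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class BackfillResult:
    scanned: int = 0
    embedded: int = 0
    failed: int = 0
    dry_run: bool = False
    errors: dict = field(default_factory=dict)  # thought id -> error message


def backfill(thoughts, embeddings, embed_fn, model, *, project=None,
             include_archived=False, dry_run=False, max_workers=4):
    """Embed thoughts that lack a vector for `model` (hypothetical helper).

    `thoughts` is a list of dicts; `embeddings` maps (thought_id, model)
    to a vector and stands in for the real store layer.
    """
    # Selection: project scope, archived filter, and "missing for this model".
    pending = [
        t for t in thoughts
        if (project is None or t["project"] == project)
        and (include_archived or not t["archived"])
        and (t["id"], model) not in embeddings
    ]
    result = BackfillResult(scanned=len(pending), dry_run=dry_run)
    if dry_run:
        return result  # report the pending work without calling the embedder

    def embed_one(thought):
        # Catch per item so one bad thought does not abort the whole batch.
        try:
            return thought["id"], embed_fn(thought["content"]), None
        except Exception as exc:
            return thought["id"], None, str(exc)

    # max_workers bounds the number of concurrent embedding calls.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for thought_id, vector, error in pool.map(embed_one, pending):
            if error is not None:
                result.failed += 1
                result.errors[thought_id] = error
                continue
            # Keyed write: a re-run overwrites rather than duplicates.
            embeddings[(thought_id, model)] = vector
            result.embedded += 1
    return result
```

Running the function twice over the same data only retries the items that failed the first time, which is the behavior that makes repeated or periodic passes safe.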
### Search fallback

Implemented:

- full-text fallback when no embeddings exist for the active model in scope
- fallback helper shared by query-based tools
- full-text index migration on thought content

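The fallback rule above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the shared helper from the repo: `retrieve`, `cosine`, and the dict-based data shapes are hypothetical, and a naive term-overlap score stands in for Postgres full-text search.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query, query_vector, thoughts, embeddings, model,
             project=None, limit=5):
    """Return (mode, thoughts): semantic when possible, text otherwise."""
    in_scope = [
        t for t in thoughts
        if not t["archived"] and (project is None or t["project"] == project)
    ]
    # Only vectors produced by the active model count.
    embedded = [t for t in in_scope if (t["id"], model) in embeddings]

    if embedded:
        ranked = sorted(
            embedded,
            key=lambda t: cosine(query_vector, embeddings[(t["id"], model)]),
            reverse=True,
        )
        return "semantic", ranked[:limit]

    # Fallback: term overlap is a stand-in for a real full-text query.
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(t["content"].lower().split())), t) for t in in_scope
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return "text", [t for score, t in scored if score > 0][:limit]
```

Note that the decision is made per scope and per model: vectors stored under a different model name do not prevent the text path from being taken.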
### Tools using fallback

Implemented fallback coverage for:

- `search_thoughts`
- `recall_context`
- `get_project_context` when a query is provided
- `summarize_thoughts` when a query is provided
- semantic neighbors in `related_thoughts`

### Optional automatic behavior

Implemented:

- config-gated startup backfill pass
- config-gated periodic backfill loop

---

## Remaining follow-ups

### 1. Expose retrieval mode in responses

Still outstanding.

Why it matters:

- callers currently benefit from fallback automatically
- but debugging is easier if responses explicitly say whether retrieval was `semantic` or `text`

Suggested shape:

- add a machine-readable field such as `retrieval_mode: semantic|text`
- keep it consistent across all query-based tools that use shared retrieval logic

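One possible shape for the suggested field, sketched in Python. Only the `retrieval_mode` name and its two values come from the suggestion above; `build_response`, `count`, and `results` are hypothetical names chosen for the example.

```python
import json

RETRIEVAL_MODES = ("semantic", "text")


def build_response(results, retrieval_mode):
    """Wrap results so every query tool reports the mode the same way."""
    if retrieval_mode not in RETRIEVAL_MODES:
        raise ValueError(f"unknown retrieval mode: {retrieval_mode!r}")
    return {
        "retrieval_mode": retrieval_mode,
        "count": len(results),
        "results": results,
    }


# Example payload for a query that was served by the text fallback.
print(json.dumps(build_response([{"id": 7, "content": "index notes"}], "text"), indent=2))
```

Routing every query-based tool through one builder like this is a simple way to keep the field consistent, since the set of allowed values lives in a single place.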
### 2. Verify and improve tests

Still worth auditing.

Recommended checks:

- no-embedding scope falls back to text search
- project-scoped fallback only searches within project scope
- archived thoughts remain excluded by default
- `related_thoughts` falls back correctly when semantic vectors are unavailable
- backfill creates embeddings that later restore semantic search

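To show the style of check intended, here is a small `unittest` sketch covering the project-scope and archived-thought cases. The `search_text` function is a hypothetical in-memory stand-in for the real fallback path, so these tests illustrate the assertions only and would need to target the actual store layer.

```python
import unittest


def search_text(thoughts, query, project=None, include_archived=False):
    """Stand-in for the fallback path: substring match within scope."""
    return [
        t["id"] for t in thoughts
        if query.lower() in t["content"].lower()
        and (project is None or t["project"] == project)
        and (include_archived or not t["archived"])
    ]


THOUGHTS = [
    {"id": 1, "project": "alpha", "archived": False, "content": "Postgres index tuning"},
    {"id": 2, "project": "beta", "archived": False, "content": "Postgres backup plan"},
    {"id": 3, "project": "alpha", "archived": True, "content": "Old Postgres notes"},
]


class FallbackChecks(unittest.TestCase):
    def test_project_scope_is_respected(self):
        self.assertEqual(search_text(THOUGHTS, "postgres", project="alpha"), [1])

    def test_archived_excluded_by_default(self):
        self.assertEqual(search_text(THOUGHTS, "postgres"), [1, 2])

    def test_archived_included_on_request(self):
        self.assertEqual(
            search_text(THOUGHTS, "postgres", include_archived=True), [1, 2, 3]
        )
```

Saved as a module, the cases can be run with `python -m unittest`.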
### 3. Re-embedding / migration ergonomics

Still optional future work.

Potential additions:

- count missing embeddings by project
- add `missing_embeddings` stats to `thought_stats`
- add a controlled re-embed or reindex flow for model migrations

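A per-project count of missing embeddings could look roughly like the sketch below. The function name, the dict-based thought shape, and the `(thought_id, model)` keying are assumptions made for the example; a real version would most likely be a grouped query in the store layer.

```python
from collections import Counter


def missing_embeddings_by_project(thoughts, embeddings, model,
                                  include_archived=False):
    """Count thoughts per project that have no vector for `model`."""
    counts = Counter(
        t["project"] for t in thoughts
        if (include_archived or not t["archived"])
        and (t["id"], model) not in embeddings
    )
    return dict(counts)


# Illustrative data: one thought in "alpha" already has a vector.
thoughts = [
    {"id": 1, "project": "alpha", "archived": False},
    {"id": 2, "project": "alpha", "archived": False},
    {"id": 3, "project": "beta", "archived": False},
    {"id": 4, "project": "beta", "archived": True},
]
embeddings = {(1, "model-a"): [0.1, 0.2]}
print(missing_embeddings_by_project(thoughts, embeddings, "model-a"))
```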
---

## Notes for maintainers

Do not read this file as an untouched roadmap of future work. The repo has already implemented the core work described here.

If more backfill/fallback work is planned, append it as concrete follow-ups against the current codebase rather than preserving the old speculative rollout order.

---

## Historical note

The original long-form proposal was replaced during the repo audit because it described work that is now largely complete and was causing drift between issues and documentation.

If needed, recover the older version from git history.