wdevs/amcs

Jack O'Neill 7c41a3e846 docs: audit plan and todo status

2026-04-03 13:37:45 +02:00

AMCS TODO

Embedding Backfill and Text-Search Fallback Audit

This file originally described the planned backfill_embeddings work and semantic-to-text fallback behavior. Most of that work is now implemented. This document now tracks what landed, what still needs verification, and what follow-up work remains.

For current operator-facing behavior, prefer README.md.

Status summary

Implemented

The main work described in this file is already present in the repo:

backfill_embeddings MCP tool exists
missing-embedding selection helpers exist in the store layer
embedding upsert helpers exist in the store layer
semantic retrieval falls back to Postgres full-text search when the active model has no embeddings in scope
fallback behavior is wired into the main query-driven tools
a full-text index migration exists
optional automatic backfill runner exists in config/startup flow
retry and reparse maintenance tooling for metadata quality also exists

Still worth checking or improving

The broad feature is done, but some implementation-depth items are still worth tracking:

test coverage around fallback/backfill behavior
whether configured backfill batching is used consistently end-to-end
observability depth beyond logs
response visibility into which retrieval mode was used

What is already implemented

Backfill tool

Implemented:

backfill_embeddings
project scoping
archived-thought filtering
age filtering
dry-run mode
bounded concurrency
best-effort per-item failure handling
idempotent embedding upsert behavior
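
The properties listed above can be sketched in a few lines. This is a minimal illustration in Python, not the repo's implementation: the `backfill` function, the `BackfillResult` type, and the dict-based `store` are hypothetical stand-ins, and the real tool's parameters may differ.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class BackfillResult:
    scanned: int = 0
    embedded: int = 0
    failed: list = field(default_factory=list)


async def backfill(thoughts, store, embed, model, concurrency=4, dry_run=False):
    """Embed every thought that has no vector for `model`.

    thoughts: dict of thought_id -> content (hypothetical stand-in for a store query)
    store:    dict keyed by (thought_id, model); plain assignment acts as an upsert
    embed:    async callable, content -> vector (hypothetical embedding client)
    """
    missing = [tid for tid in thoughts if (tid, model) not in store]
    result = BackfillResult(scanned=len(missing))
    if dry_run:
        return result  # report what would be embedded, write nothing

    gate = asyncio.Semaphore(concurrency)  # bound the number of in-flight embed calls

    async def embed_one(tid):
        async with gate:
            try:
                store[(tid, model)] = await embed(thoughts[tid])
                result.embedded += 1
            except Exception:
                # Best effort: record the failure and keep processing other items.
                result.failed.append(tid)

    await asyncio.gather(*(embed_one(tid) for tid in missing))
    return result
```

Because the selection step skips thoughts that already have a vector for the model, a second run only retries the items that failed earlier.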

Search fallback

Implemented:

full-text fallback when no embeddings exist for the active model in scope
fallback helper shared by query-based tools
full-text index migration on thought content
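
The fallback decision can be sketched as follows. This is an in-memory illustration under stated assumptions: the `search` function and the record fields are hypothetical, and a simple all-terms substring match stands in for Postgres full-text search.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(thoughts, embeddings, model, query, query_vec, project=None):
    """Return (mode, ids); use text search only when the scope has no vectors for `model`."""
    # Scope: non-archived thoughts, optionally limited to one project.
    scope = [t for t in thoughts
             if not t["archived"] and (project is None or t["project"] == project)]
    scored = [(cosine(embeddings[(t["id"], model)], query_vec), t["id"])
              for t in scope if (t["id"], model) in embeddings]
    if scored:
        return "semantic", [tid for _, tid in sorted(scored, reverse=True)]
    # Stand-in for full-text search: keep thoughts that contain every query term.
    terms = query.lower().split()
    return "text", [t["id"] for t in scope
                    if all(term in t["content"].lower() for term in terms)]
```

The check is made against the scoped set, so a project with no embeddings for the active model falls back to text even when other projects already have vectors.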

Tools using fallback

Implemented fallback coverage for:

search_thoughts
recall_context
get_project_context when a query is provided
summarize_thoughts when a query is provided
semantic neighbors in related_thoughts

Optional automatic behavior

Implemented:

config-gated startup backfill pass
config-gated periodic backfill loop
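
One way to picture the two gates is the sketch below. The config field names (`run_on_startup`, `interval_seconds`) and the `run_auto_backfill` function are assumptions made for illustration; the actual config keys are documented in README.md and may be named differently.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class BackfillConfig:
    run_on_startup: bool = False   # hypothetical flag: one pass at startup
    interval_seconds: float = 0.0  # hypothetical setting: 0 disables the periodic loop


async def run_auto_backfill(cfg, backfill_once, stop):
    """Run the optional startup pass, then the optional periodic loop.

    backfill_once: async callable that performs one backfill pass
    stop:          asyncio.Event set at shutdown
    """
    if cfg.run_on_startup:
        await backfill_once()
    if cfg.interval_seconds <= 0:
        return
    while not stop.is_set():
        try:
            # Sleep for one interval, but wake immediately on shutdown.
            await asyncio.wait_for(stop.wait(), timeout=cfg.interval_seconds)
        except asyncio.TimeoutError:
            await backfill_once()
```

With both settings left at their defaults, the runner returns without doing any work, which matches the "optional" framing above.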

Remaining follow-ups

1. Expose retrieval mode in responses

Still outstanding.

Why it matters:

callers already benefit from fallback automatically, but debugging is easier if responses state explicitly whether retrieval was semantic or text

Suggested shape:

add a machine-readable field such as retrieval_mode: semantic|text
keep it consistent across all query-based tools that use shared retrieval logic
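
A possible response shape, sketched in Python for illustration. Only the `retrieval_mode` field name and its two values come from the suggestion above; `SearchResponse`, `RetrievalMode`, and `to_json` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class RetrievalMode(str, Enum):
    SEMANTIC = "semantic"
    TEXT = "text"


@dataclass
class SearchResponse:
    results: list
    retrieval_mode: RetrievalMode


def to_json(resp):
    """Serialize a response with retrieval_mode as a plain string."""
    payload = asdict(resp)
    payload["retrieval_mode"] = resp.retrieval_mode.value
    return json.dumps(payload, sort_keys=True)
```

Using one shared enum for every query-based tool keeps the field values consistent and rejects unknown modes at construction time.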

2. Verify and improve tests

Still worth auditing.

Recommended checks:

no-embedding scope falls back to text search
project-scoped fallback only searches within project scope
archived thoughts remain excluded by default
related_thoughts falls back correctly when semantic vectors are unavailable
backfill creates embeddings that later restore semantic search

3. Re-embedding / migration ergonomics

Still optional future work.

Potential additions:

count missing embeddings by project
add missing_embeddings stats to thought_stats
add a controlled re-embed or reindex flow for model migrations
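
The first addition could look roughly like this. It is an in-memory sketch with hypothetical names and record fields; in the repo this would more likely be a store-layer query, and whether archived thoughts count is an assumption exposed here as a parameter.

```python
from collections import Counter


def missing_embeddings_by_project(thoughts, embedded_ids, include_archived=False):
    """Count thoughts without an embedding for the active model, grouped by project.

    thoughts:     iterable of dicts with "id", "project", "archived" (hypothetical shape)
    embedded_ids: set of thought ids that already have a vector for the active model
    """
    counts = Counter(
        t["project"]
        for t in thoughts
        if t["id"] not in embedded_ids and (include_archived or not t["archived"])
    )
    return dict(counts)
```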

Notes for maintainers

Do not read this file as a roadmap of unstarted work. The repo has already implemented the core work described here.

If more backfill/fallback work is planned, append it as concrete follow-ups against the current codebase rather than preserving the old speculative rollout order.

Historical note

The original long-form proposal was replaced during the repo audit because it described work that is now largely complete and was causing issue/document drift.

If needed, recover the older version from git history.
