AMCS TODO
Auto Embedding Backfill Tool
Objective
Add an MCP tool that automatically backfills missing embeddings for existing thoughts so semantic search keeps working after:
- embedding model changes
- earlier capture or update failures
- import or migration of raw thoughts without vectors
The tool should be safe to run repeatedly, should not duplicate work, and should make it easy to restore semantic coverage without rewriting existing thoughts.
Desired outcome
After this work:
- raw thought text remains the source of truth
- embeddings are treated as derived data per model
- search continues to query only embeddings from the active embedding model
- when no embeddings exist for the active model and scope, search falls back to Postgres text search
- operators or MCP clients can trigger a backfill for the current model
- AMCS can optionally auto-run a limited backfill pass on startup or on a schedule later
Why this is needed
Current search behavior is model-specific:
- query text is embedded with the configured provider model
- results are filtered by embeddings.model
- thoughts with no embedding for that model are invisible to semantic search
This means a model switch leaves old thoughts searchable only by listing and metadata filters until new embeddings are generated.
To avoid that dead zone, AMCS should also support a lexical fallback path backed by native Postgres text-search indexing.
Tool proposal
New MCP tool
backfill_embeddings
Purpose:
- find thoughts missing an embedding for the active model
- generate embeddings in batches
- write embeddings with upsert semantics
- report counts for scanned, embedded, skipped, and failed thoughts
Input
{
  "project": "optional project name or id",
  "limit": 100,
  "batch_size": 20,
  "include_archived": false,
  "older_than_days": 0,
  "dry_run": false
}
Notes:
- project scopes the backfill to a project when desired
- limit caps total thoughts processed in one tool call
- batch_size controls provider load
- include_archived defaults to false
- older_than_days is optional and mainly useful to avoid racing with fresh writes
- dry_run returns counts and sample IDs without calling the embedding provider
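The input handling described above could be sketched as follows. This is a minimal illustration, not the AMCS implementation: the type name BackfillInput, the function ParseBackfillInput, and the choice to treat the sample values (limit 100, batch_size 20) as defaults are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BackfillInput mirrors the JSON input of backfill_embeddings.
// The Go type name and the defaults below are assumptions for illustration.
type BackfillInput struct {
	Project         string `json:"project,omitempty"`
	Limit           int    `json:"limit"`
	BatchSize       int    `json:"batch_size"`
	IncludeArchived bool   `json:"include_archived"`
	OlderThanDays   int    `json:"older_than_days"`
	DryRun          bool   `json:"dry_run"`
}

// ParseBackfillInput decodes raw JSON on top of assumed defaults, so fields
// the caller omits keep their default, then rejects out-of-range values.
func ParseBackfillInput(raw []byte) (BackfillInput, error) {
	in := BackfillInput{Limit: 100, BatchSize: 20}
	if err := json.Unmarshal(raw, &in); err != nil {
		return BackfillInput{}, fmt.Errorf("decode backfill input: %w", err)
	}
	if in.Limit <= 0 {
		return BackfillInput{}, fmt.Errorf("limit must be > 0, got %d", in.Limit)
	}
	if in.BatchSize <= 0 {
		return BackfillInput{}, fmt.Errorf("batch_size must be > 0, got %d", in.BatchSize)
	}
	if in.OlderThanDays < 0 {
		return BackfillInput{}, fmt.Errorf("older_than_days must be >= 0, got %d", in.OlderThanDays)
	}
	// A batch never needs to be larger than the total cap for the call.
	if in.BatchSize > in.Limit {
		in.BatchSize = in.Limit
	}
	return in, nil
}

func main() {
	in, err := ParseBackfillInput([]byte(`{"project": "demo", "dry_run": true}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", in)
}
```

Decoding onto a pre-filled struct keeps the defaults in one place and lets clients send only the fields they care about.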
Output
{
  "model": "openai/text-embedding-3-small",
  "scanned": 100,
  "embedded": 87,
  "skipped": 13,
  "failed": 0,
  "dry_run": false,
  "failures": []
}
Optional:
- include a short next_cursor later if we add cursor-based paging
Backfill behavior
Core rules
- Backfill only when a thought is missing an embedding row for the active model.
- Do not recompute embeddings that already exist for that model unless an explicit future force flag is added.
- Keep embeddings per model side by side in the existing embeddings table.
- Use insert ... on conflict (thought_id, model) do update so retries stay idempotent.
Selection query
Add a store query that returns thoughts where no embedding exists for the requested model.
Shape:
- from thoughts t
- left join embeddings e on e.thought_id = t.guid and e.model = $model
- filter e.id is null
- optional filters for project, archived state, age
- order by t.created_at asc
- limit by requested batch
Ordering oldest first is useful because it steadily restores long-tail recall instead of repeatedly revisiting recent writes.
Processing loop
For each selected thought:
- read content
- call provider.Embed(content)
- upsert embedding row for thought_id + model
- continue on per-item failure and collect errors
Use bounded concurrency instead of fully serial processing so large backfills complete in reasonable time without overwhelming the provider.
Recommended first pass:
- one tool invocation handles batches internally
- concurrency defaults to a small fixed number like 4
- batch_size and concurrency are kept server-side defaults at first, even if only limit is exposed in MCP input
Search fallback behavior
Goal
If semantic retrieval cannot run because no embeddings exist for the active model in the selected scope, AMCS should fall back to Postgres text search instead of returning empty semantic results by default.
Fallback rules
- If embeddings exist for the active model, keep using vector search as the primary path.
- If no embeddings exist for the active model in scope, run Postgres text search against raw thought content.
- Fallback should apply to:
  - search_thoughts
  - recall_context
  - get_project_context when query is provided
  - summarize_thoughts when query is provided
  - semantic neighbors in related_thoughts
- Fallback should not mutate data. It is retrieval-only.
- Backfill remains the long-term fix; text search is the immediate safety net.
Postgres search approach
Add a native full-text index on thought content and query it with a matching text-search configuration.
Recommended first pass:
- add a migration creating a GIN index on to_tsvector('simple', content)
- use websearch_to_tsquery('simple', $query) for user-entered text
- rank results with ts_rank_cd(...)
- continue excluding archived thoughts by default
- continue honoring project scope
Using the simple configuration is a safer default for mixed prose, identifiers, and code-ish text than a language-specific stemmer.
Store additions for fallback
Add store methods such as:
- HasEmbeddingsForModel(ctx, model string, projectID *uuid.UUID) (bool, error)
- SearchThoughtsText(ctx, query string, limit int, projectID *uuid.UUID, excludeID *uuid.UUID) ([]SearchResult, error)
These should be used by a shared retrieval helper in internal/tools so semantic callers degrade consistently.
Notes on ranking
Text-search scores will not be directly comparable to vector similarity scores.
That is acceptable in v1 because:
- each request will use one retrieval mode at a time
- fallback is only used when semantic search is unavailable
- response payloads can continue to return similarity as a generic relevance score
Auto behavior
The user asked for an auto backfill tool, so define two layers:
Layer 1: explicit MCP tool
Ship backfill_embeddings first.
This is the lowest-risk path because:
- it is observable
- it is rate-limited by the caller
- it avoids surprise provider cost on startup
Layer 2: optional automatic runner
Add a config-gated background runner after the tool exists and is proven stable.
Config sketch:
backfill:
  enabled: false
  run_on_startup: false
  interval: "15m"
  batch_size: 20
  max_per_run: 100
  include_archived: false
Behavior:
- on startup, if enabled and run_on_startup=true, run a small bounded backfill pass
- if interval is set, periodically backfill missing embeddings for the active configured model
- log counts and failures, but never block server startup on backfill failure
This keeps the first implementation simple while still giving us a clean path to true automation.
Store changes
Add store methods focused on missing-model coverage.
New methods
- ListThoughtsMissingEmbedding(ctx, model string, limit int, projectID *uuid.UUID, includeArchived bool, olderThanDays int) ([]Thought, error)
- UpsertEmbedding(ctx, thoughtID uuid.UUID, model string, embedding []float32) error
Optional later methods
- CountThoughtsMissingEmbedding(ctx, model string, projectID *uuid.UUID, includeArchived bool) (int, error)
- ListThoughtIDsMissingEmbeddingPage(...) for cursor-based paging on large datasets
Why separate UpsertEmbedding
InsertThought and UpdateThought already contain embedding upsert logic, but a dedicated helper will:
- reduce duplication
- let backfill avoid full thought updates
- make future re-embedding jobs cleaner
Tooling changes
New file
internal/tools/backfill.go
Responsibilities:
- parse input
- resolve project if provided
- select missing thoughts
- run bounded embedding generation
- record per-item failures without aborting the whole batch
- return summary counts
MCP registration
Add the tool to:
- internal/mcpserver/server.go
- internal/mcpserver/schema.go and tests if needed
- internal/app/app.go wiring
Suggested tool description:
Generate missing embeddings for stored thoughts using the active embedding model.
Config changes
No config is required for the first manual tool beyond the existing embedding provider settings.
For the later automatic runner, add:
- backfill.enabled
- backfill.run_on_startup
- backfill.interval
- backfill.batch_size
- backfill.max_per_run
- backfill.include_archived
Validation rules:
- batch_size > 0
- max_per_run >= batch_size
- interval must parse when provided
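The validation rules above translate fairly directly into code. In this sketch the BackfillConfig type and its Go field names are assumptions, and interval is assumed to use Go duration syntax (as "15m" in the config sketch suggests).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// BackfillConfig mirrors the backfill.* keys. The Go type and field names
// are assumptions; only the key names come from the plan.
type BackfillConfig struct {
	Enabled         bool
	RunOnStartup    bool
	Interval        string // Go duration syntax assumed, e.g. "15m"; empty means no schedule
	BatchSize       int
	MaxPerRun       int
	IncludeArchived bool
}

// Validate applies the three rules: batch_size > 0,
// max_per_run >= batch_size, and interval must parse when provided.
func (c BackfillConfig) Validate() error {
	if c.BatchSize <= 0 {
		return errors.New("backfill.batch_size must be > 0")
	}
	if c.MaxPerRun < c.BatchSize {
		return fmt.Errorf("backfill.max_per_run (%d) must be >= backfill.batch_size (%d)", c.MaxPerRun, c.BatchSize)
	}
	if c.Interval != "" {
		if _, err := time.ParseDuration(c.Interval); err != nil {
			return fmt.Errorf("backfill.interval: %w", err)
		}
	}
	return nil
}

func main() {
	cfg := BackfillConfig{Interval: "15m", BatchSize: 20, MaxPerRun: 100}
	fmt.Println("valid config error:", cfg.Validate())

	cfg.MaxPerRun = 10
	fmt.Println("invalid config error:", cfg.Validate())
}
```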
Failure handling
The backfill tool should be best-effort, not all-or-nothing.
Rules:
- one thought failure does not abort the full run
- provider errors are captured and counted
- database upsert failures are captured and counted
- final tool response includes truncated failure details
- full details go to logs
Failure payloads should avoid returning raw thought content to the caller if that would create noisy or sensitive responses. Prefer thought IDs plus short error strings.
Observability
Add structured logs for:
- selected model
- project scope
- scan count
- success count
- failure count
- duration
Later, metrics can include:
- amcs_backfill_runs_total
- amcs_backfill_embeddings_total
- amcs_backfill_failures_total
- amcs_thoughts_missing_embeddings
Concurrency and rate limiting
Keep the first version conservative.
Plan:
- use a worker pool with a small fixed concurrency
- keep batch sizes small by default
- stop fetching new work once limit is reached
- respect ctx cancellation so long backfills can be interrupted cleanly
Do not add provider-specific rate-limit logic in v1 unless real failures show it is needed.
Security and safety
- Reuse existing MCP auth.
- Do not expose a broad force=true option in v1.
- Default to non-archived thoughts only.
- Do not mutate raw thought text or metadata during backfill.
- Treat embeddings as derived data that may be regenerated safely.
Testing plan
Store tests
Add tests for:
- listing thoughts missing embeddings for a model
- project-scoped missing-embedding queries
- archived thought filtering
- idempotent upsert behavior
Tool tests
Add tests for:
- dry-run mode
- successful batch embedding
- partial provider failures
- empty result set
- project resolution
- context cancellation
Integration tests
Add a flow covering:
- create thoughts without embeddings for a target model
- run backfill_embeddings
- confirm rows exist in embeddings
- confirm search_thoughts can now retrieve them when using that model
Fallback search tests
Add coverage for:
- no embeddings for model -> search_thoughts uses Postgres text search
- project-scoped queries only search matching project thoughts
- archived thoughts stay excluded by default
- related_thoughts falls back to text search neighbors when semantic vectors are unavailable
- once embeddings exist, semantic search remains the primary path
Rollout order
- Add store helpers for missing-embedding selection and embedding upsert.
- Add Postgres full-text index migration and text-search store helpers.
- Add shared semantic-or-text fallback retrieval logic for query-based tools.
- Add backfill_embeddings MCP tool and wire it into the server.
- Add unit and integration tests.
- Document usage in README.md.
- Add optional background auto-runner behind config.
- Consider a future force or reindex_model path only after v1 is stable.
Open questions
- Should the tool expose batch_size to clients, or should batching stay internal?
- Should the first version support only the active model, or allow a model override for admins?
- Should archived thoughts be backfilled by default during startup jobs but not MCP calls?
- Do we want a separate CLI/admin command for large one-time reindex jobs outside MCP?
Recommended answers for v1:
- keep batching mostly internal
- use only the active configured model
- exclude archived thoughts by default everywhere
- postpone a dedicated CLI until volume justifies it
Nice follow-ups
- add a missing_embeddings stat to thought_stats
- expose a read-only tool for counting missing embeddings by project
- add a re-embed path for migrating from one model to another in controlled waves
- add metadata extraction backfill as a separate job if imported content often lacks metadata
- expose the retrieval mode in responses for easier debugging of semantic vs text fallback