wdevs/amcs

Hein 8d0a91a961 feat(llm): add LLM integration instructions and handler

2026-03-25 18:02:42 +02:00

AMCS TODO

Auto Embedding Backfill Tool

Objective

Add an MCP tool that automatically backfills missing embeddings for existing thoughts so semantic search keeps working after:

embedding model changes
earlier capture or update failures
import or migration of raw thoughts without vectors

The tool should be safe to run repeatedly, should not duplicate work, and should make it easy to restore semantic coverage without rewriting existing thoughts.

Desired outcome

After this work:

raw thought text remains the source of truth
embeddings are treated as derived data per model
search continues to query only embeddings from the active embedding model
when no embeddings exist for the active model and scope, search falls back to Postgres text search
operators or MCP clients can trigger a backfill for the current model
AMCS can optionally auto-run a limited backfill pass on startup or on a schedule later

Why this is needed

Current search behavior is model-specific:

query text is embedded with the configured provider model
results are filtered by embeddings.model
thoughts with no embedding for that model are invisible to semantic search

This means a model switch leaves old thoughts searchable only by listing and metadata filters until new embeddings are generated.

To avoid that dead zone, AMCS should also support a lexical fallback path backed by native Postgres text-search indexing.

Tool proposal

New MCP tool

backfill_embeddings

Purpose:

find thoughts missing an embedding for the active model
generate embeddings in batches
write embeddings with upsert semantics
report counts for scanned, embedded, skipped, and failed thoughts

Input

{
  "project": "optional project name or id",
  "limit": 100,
  "batch_size": 20,
  "include_archived": false,
  "older_than_days": 0,
  "dry_run": false
}

Notes:

project scopes the backfill to a project when desired
limit caps total thoughts processed in one tool call
batch_size controls provider load
include_archived defaults to false
older_than_days is optional and mainly useful to avoid racing with fresh writes
dry_run returns counts and sample IDs without calling the embedding provider
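
A minimal sketch of how the input above could be decoded and normalized. The Go type and function names are hypothetical, and treating 100 and 20 as the defaults for limit and batch_size is an assumption taken from the example payload, not a settled decision.

```go
package main

import (
	"encoding/json"
	"errors"
)

// BackfillInput mirrors the JSON input sketched above.
// The Go names are illustrative, not existing AMCS types.
type BackfillInput struct {
	Project         string `json:"project,omitempty"`
	Limit           int    `json:"limit,omitempty"`
	BatchSize       int    `json:"batch_size,omitempty"`
	IncludeArchived bool   `json:"include_archived,omitempty"`
	OlderThanDays   int    `json:"older_than_days,omitempty"`
	DryRun          bool   `json:"dry_run,omitempty"`
}

// ParseBackfillInput decodes the raw tool arguments, rejects negative
// numbers, and fills assumed defaults for omitted fields.
func ParseBackfillInput(raw []byte) (BackfillInput, error) {
	var in BackfillInput
	if err := json.Unmarshal(raw, &in); err != nil {
		return in, err
	}
	if in.Limit < 0 || in.BatchSize < 0 || in.OlderThanDays < 0 {
		return in, errors.New("limit, batch_size, and older_than_days must not be negative")
	}
	if in.Limit == 0 {
		in.Limit = 100 // assumed default
	}
	if in.BatchSize == 0 {
		in.BatchSize = 20 // assumed default
	}
	if in.BatchSize > in.Limit {
		in.BatchSize = in.Limit // a batch never needs to exceed the total cap
	}
	return in, nil
}
```

Because include_archived and dry_run are plain booleans, omitting them yields false, which matches the defaults listed in the notes.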

Output

{
  "model": "openai/text-embedding-3-small",
  "scanned": 100,
  "embedded": 87,
  "skipped": 13,
  "failed": 0,
  "dry_run": false,
  "failures": []
}

Optional:

include a short next_cursor later if we add cursor-based paging

Backfill behavior

Core rules

Backfill only when a thought is missing an embedding row for the active model.
Do not recompute embeddings that already exist for that model unless an explicit future force flag is added.
Keep embeddings per model side by side in the existing embeddings table.
Use insert ... on conflict (thought_id, model) do update so retries stay idempotent.
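
As an illustration of the core rules, the sketch below pairs a possible upsert statement with a tiny in-memory model of the same semantics. The embedding column name and the existence of a unique constraint on (thought_id, model) are assumptions; the in-memory type is hypothetical and only useful for reasoning about or testing the behavior.

```go
package main

// UpsertEmbeddingSQL is one possible shape for the statement. It assumes a
// unique constraint on (thought_id, model) and a column named embedding.
const UpsertEmbeddingSQL = `
insert into embeddings (thought_id, model, embedding)
values ($1, $2, $3)
on conflict (thought_id, model)
do update set embedding = excluded.embedding`

// embeddingKey models the (thought_id, model) uniqueness of the table.
type embeddingKey struct {
	ThoughtID string
	Model     string
}

// MemEmbeddings is a hypothetical in-memory stand-in for the embeddings table.
type MemEmbeddings map[embeddingKey][]float32

// Upsert writes or overwrites the vector for one thought and model.
// Repeating the call leaves exactly one row, so retries stay idempotent.
func (m MemEmbeddings) Upsert(thoughtID, model string, vec []float32) {
	m[embeddingKey{ThoughtID: thoughtID, Model: model}] = vec
}

// Missing reports whether a thought still needs an embedding for the model.
func (m MemEmbeddings) Missing(thoughtID, model string) bool {
	_, ok := m[embeddingKey{ThoughtID: thoughtID, Model: model}]
	return !ok
}
```

Rows for different models coexist under different keys, which is what lets a model switch leave earlier embeddings untouched.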

Selection query

Add a store query that returns thoughts where no embedding exists for the requested model.

Shape:

from thoughts t
left join embeddings e on e.thought_id = t.guid and e.model = $model
filter e.id is null
optional filters for project, archived state, age
order by t.created_at asc
limit to the requested batch size

Ordering oldest first is useful because it steadily restores long-tail recall instead of repeatedly revisiting recent writes.
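
The query shape above could be assembled roughly as follows. This is a sketch: the column names project_id, archived_at, and content are assumptions about the schema, and the filter struct is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// MissingEmbeddingFilter carries the optional filters from the tool input.
type MissingEmbeddingFilter struct {
	Model           string
	ProjectID       string // empty means all projects
	IncludeArchived bool
	OlderThanDays   int
	Limit           int
}

// BuildMissingEmbeddingQuery returns the anti-join SQL and its positional
// arguments. Columns project_id, archived_at, and content are assumed.
func BuildMissingEmbeddingQuery(f MissingEmbeddingFilter) (string, []any) {
	args := []any{f.Model}
	var b strings.Builder
	b.WriteString("select t.guid, t.content from thoughts t\n")
	b.WriteString("left join embeddings e on e.thought_id = t.guid and e.model = $1\n")
	b.WriteString("where e.id is null\n")
	if f.ProjectID != "" {
		args = append(args, f.ProjectID)
		fmt.Fprintf(&b, "and t.project_id = $%d\n", len(args))
	}
	if !f.IncludeArchived {
		b.WriteString("and t.archived_at is null\n")
	}
	if f.OlderThanDays > 0 {
		args = append(args, f.OlderThanDays)
		fmt.Fprintf(&b, "and t.created_at < now() - make_interval(days => $%d)\n", len(args))
	}
	args = append(args, f.Limit)
	fmt.Fprintf(&b, "order by t.created_at asc\nlimit $%d", len(args))
	return b.String(), args
}
```

Only values are passed as parameters; the SQL text itself is built from fixed fragments, so no caller input is interpolated into the statement.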

Processing loop

For each selected thought:

read content
call provider.Embed(content)
upsert embedding row for thought_id + model
continue on per-item failure and collect errors

Use bounded concurrency instead of fully serial processing so large backfills complete in reasonable time without overwhelming the provider.

Recommended first pass:

one tool invocation handles batches internally
concurrency defaults to a small fixed number like 4
batch_size and concurrency can remain server-side defaults at first, even if that means only limit is exposed in the MCP input

Search fallback behavior

Goal

If semantic retrieval cannot run because no embeddings exist for the active model in the selected scope, AMCS should fall back to Postgres text search instead of returning empty semantic results by default.

Fallback rules

If embeddings exist for the active model, keep using vector search as the primary path.
If no embeddings exist for the active model in scope, run Postgres text search against raw thought content.
Fallback should apply to:
- search_thoughts
- recall_context
- get_project_context when query is provided
- summarize_thoughts when query is provided
- semantic neighbors in related_thoughts
Fallback should not mutate data. It is retrieval-only.
Backfill remains the long-term fix; text search is the immediate safety net.

Postgres search approach

Add a native full-text index on thought content and query it with a matching text-search configuration.

Recommended first pass:

add a migration creating a GIN index on to_tsvector('simple', content)
use websearch_to_tsquery('simple', $query) for user-entered text
rank results with ts_rank_cd(...)
continue excluding archived thoughts by default
continue honoring project scope

Using the simple configuration is a safer default for mixed prose, identifiers, and code-ish text than a language-specific stemmer.
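
For reference, the migration and query could take roughly this form, kept here as Go string constants. The index name and the columns archived_at and project_id are assumptions; websearch_to_tsquery requires PostgreSQL 11 or later.

```go
package main

// CreateThoughtsFTSIndexSQL is a possible migration statement.
// The index name is an assumption.
const CreateThoughtsFTSIndexSQL = `
create index if not exists thoughts_content_fts_idx
    on thoughts using gin (to_tsvector('simple', content))`

// SearchThoughtsTextSQL ranks matches with ts_rank_cd and exposes the rank
// under the name similarity. Parameters: $1 query text, $2 optional project
// id, $3 optional thought id to exclude, $4 limit.
// Columns archived_at and project_id are assumed.
const SearchThoughtsTextSQL = `
select t.guid,
       ts_rank_cd(to_tsvector('simple', t.content),
                  websearch_to_tsquery('simple', $1)) as similarity
from thoughts t
where to_tsvector('simple', t.content) @@ websearch_to_tsquery('simple', $1)
  and t.archived_at is null
  and ($2::uuid is null or t.project_id = $2::uuid)
  and ($3::uuid is null or t.guid <> $3::uuid)
order by similarity desc
limit $4`
```

The where clause repeats the exact indexed expression, to_tsvector('simple', content), which is what allows the planner to use the GIN index.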

Store additions for fallback

Add store methods such as:

HasEmbeddingsForModel(ctx, model string, projectID *uuid.UUID) (bool, error)
SearchThoughtsText(ctx, query string, limit int, projectID *uuid.UUID, excludeID *uuid.UUID) ([]SearchResult, error)

These should be used by a shared retrieval helper in internal/tools so semantic callers degrade consistently.

Notes on ranking

Text-search scores will not be directly comparable to vector similarity scores.

That is acceptable in v1 because:

each request will use one retrieval mode at a time
fallback is only used when semantic search is unavailable
response payloads can continue to return similarity as a generic relevance score

Auto behavior

The user asked for an auto backfill tool, so define two layers:

Layer 1: explicit MCP tool

Ship backfill_embeddings first.

This is the lowest-risk path because:

it is observable
it is rate-limited by the caller
it avoids surprise provider cost on startup

Layer 2: optional automatic runner

Add a config-gated background runner after the tool exists and is proven stable.

Config sketch:

backfill:
  enabled: false
  run_on_startup: false
  interval: "15m"
  batch_size: 20
  max_per_run: 100
  include_archived: false

Behavior:

on startup, if enabled and run_on_startup=true, run a small bounded backfill pass
if interval is set, periodically backfill missing embeddings for the active configured model
log counts and failures, but never block server startup on backfill failure

This keeps the first implementation simple while still giving us a clean path to true automation.

Store changes

Add store methods focused on missing-model coverage.

New methods

ListThoughtsMissingEmbedding(ctx, model string, limit int, projectID *uuid.UUID, includeArchived bool, olderThanDays int) ([]Thought, error)
UpsertEmbedding(ctx, thoughtID uuid.UUID, model string, embedding []float32) error

Optional later methods

CountThoughtsMissingEmbedding(ctx, model string, projectID *uuid.UUID, includeArchived bool) (int, error)
ListThoughtIDsMissingEmbeddingPage(...) for cursor-based paging on large datasets

Why a separate UpsertEmbedding

InsertThought and UpdateThought already contain embedding upsert logic, but a dedicated helper will:

reduce duplication
let backfill avoid full thought updates
make future re-embedding jobs cleaner

Tooling changes

New file

internal/tools/backfill.go

Responsibilities:

parse input
resolve project if provided
select missing thoughts
run bounded embedding generation
record per-item failures without aborting the whole batch
return summary counts

MCP registration

Add the tool to:

internal/mcpserver/server.go
internal/mcpserver/schema.go and tests if needed
internal/app/app.go wiring

Suggested tool description:

Generate missing embeddings for stored thoughts using the active embedding model.

Config changes

No config is required for the first manual tool beyond the existing embedding provider settings.

For the later automatic runner, add:

backfill.enabled
backfill.run_on_startup
backfill.interval
backfill.batch_size
backfill.max_per_run
backfill.include_archived

Validation rules:

batch_size > 0
max_per_run >= batch_size
interval must parse when provided
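
The validation rules translate fairly directly into code. In this sketch the struct is hypothetical, the interval is assumed to use Go duration syntax (as in "15m"), and the checks run regardless of enabled, which is a choice the plan does not specify.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// BackfillConfig mirrors the backfill.* keys listed above.
type BackfillConfig struct {
	Enabled         bool
	RunOnStartup    bool
	Interval        string // Go duration syntax assumed, e.g. "15m"
	BatchSize       int
	MaxPerRun       int
	IncludeArchived bool
}

// Validate applies the three rules: positive batch size, max_per_run at
// least batch_size, and a parseable interval when one is provided.
func (c BackfillConfig) Validate() error {
	if c.BatchSize <= 0 {
		return errors.New("backfill.batch_size must be > 0")
	}
	if c.MaxPerRun < c.BatchSize {
		return fmt.Errorf("backfill.max_per_run (%d) must be >= backfill.batch_size (%d)",
			c.MaxPerRun, c.BatchSize)
	}
	if c.Interval != "" {
		if _, err := time.ParseDuration(c.Interval); err != nil {
			return fmt.Errorf("backfill.interval: %w", err)
		}
	}
	return nil
}
```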

Failure handling

The backfill tool should be best-effort, not all-or-nothing.

Rules:

one thought failure does not abort the full run
provider errors are captured and counted
database upsert failures are captured and counted
final tool response includes truncated failure details
full details go to logs

Failure payloads should avoid returning raw thought content to the caller if that would create noisy or sensitive responses. Prefer thought IDs plus short error strings.
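
Truncating the failure details for the tool response could be as simple as the sketch below. The function name and the idea of passing both caps as arguments are assumptions; the actual limits are not decided in this plan.

```go
package main

// Failure identifies a thought by ID only, never by content.
type Failure struct {
	ThoughtID string `json:"thought_id"`
	Error     string `json:"error"`
}

// TruncateFailures keeps at most maxItems failures and shortens each error
// string to maxErrLen characters. The input slice is left unmodified so the
// full details can still be written to logs.
func TruncateFailures(all []Failure, maxItems, maxErrLen int) []Failure {
	if maxItems < 0 {
		maxItems = 0
	}
	if maxErrLen < 0 {
		maxErrLen = 0
	}
	if len(all) > maxItems {
		all = all[:maxItems]
	}
	out := make([]Failure, len(all))
	for i, f := range all {
		// Cut on rune boundaries so multi-byte characters are not split.
		if r := []rune(f.Error); len(r) > maxErrLen {
			f.Error = string(r[:maxErrLen]) + "..."
		}
		out[i] = f
	}
	return out
}
```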

Observability

Add structured logs for:

selected model
project scope
scan count
success count
failure count
duration
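
If the project logs through the standard log/slog package (an assumption; it requires Go 1.21 or later), the fields above could be emitted as one structured record per run. RunStats and both function names are hypothetical.

```go
package main

import (
	"context"
	"log/slog"
	"time"
)

// RunStats collects the values listed above for one backfill run.
type RunStats struct {
	Model    string
	Project  string
	Scanned  int
	Embedded int
	Failed   int
	Duration time.Duration
}

// BackfillLogAttrs maps the stats to structured attributes.
// The attribute keys are illustrative.
func BackfillLogAttrs(s RunStats) []slog.Attr {
	return []slog.Attr{
		slog.String("model", s.Model),
		slog.String("project", s.Project),
		slog.Int("scanned", s.Scanned),
		slog.Int("embedded", s.Embedded),
		slog.Int("failed", s.Failed),
		slog.Duration("duration", s.Duration),
	}
}

// LogBackfillRun writes one record per run, raising the level to warn
// when any thought failed.
func LogBackfillRun(ctx context.Context, logger *slog.Logger, s RunStats) {
	level := slog.LevelInfo
	if s.Failed > 0 {
		level = slog.LevelWarn
	}
	logger.LogAttrs(ctx, level, "backfill run finished", BackfillLogAttrs(s)...)
}
```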

Later, metrics can include:

amcs_backfill_runs_total
amcs_backfill_embeddings_total
amcs_backfill_failures_total
amcs_thoughts_missing_embeddings

Concurrency and rate limiting

Keep the first version conservative.

Plan:

use a worker pool with a small fixed concurrency
keep batch sizes small by default
stop fetching new work once limit is reached
respect ctx cancellation so long backfills can be interrupted cleanly

Do not add provider-specific rate-limit logic in v1 unless real failures show it is needed.

Security and safety

Reuse existing MCP auth.
Do not expose a broad force=true option in v1.
Default to non-archived thoughts only.
Do not mutate raw thought text or metadata during backfill.
Treat embeddings as derived data that may be regenerated safely.

Testing plan

Store tests

Add tests for:

listing thoughts missing embeddings for a model
project-scoped missing-embedding queries
archived thought filtering
idempotent upsert behavior

Tool tests