diff --git a/README.md b/README.md
index 907f34e..11fdb3c 100644
--- a/README.md
+++ b/README.md
@@ -262,7 +262,7 @@ Alternatively, pass `client_id` and `client_secret` as body parameters instead o
 - `ai.litellm.base_url` and `ai.litellm.api_key` — LiteLLM proxy
 - `ai.ollama.base_url` and `ai.ollama.api_key` — Ollama local or remote server
 
-See `llm/plan.md` for full architecture and implementation plan.
+See `llm/plan.md` for an audited, high-level status summary of the original implementation plan, and `llm/todo.md` for the status of the backfill/fallback follow-up work.
 
 ## Backfill
diff --git a/llm/plan.md b/llm/plan.md
index 4c8c404..d40e7c1 100644
--- a/llm/plan.md
+++ b/llm/plan.md
@@ -1,1809 +1,170 @@
 # Avalon Memory Crystal Server (amcs)
-## OB1 in Go — LiteLLM-First Implementation Plan
-Based of the Open Brain project. Reference it for detail: https://github.com/NateBJones-Projects/OB1
-## Objective
+## Implementation Plan Audit — Current State and Remaining Work
+
+This file started as the original LiteLLM-first implementation plan for the Go rewrite. The repo has moved well past that starting point, so this document now serves as an audit of what the plan covered, what is already done, and what still appears to be outstanding.
-Build a Go implementation of the OB1 project with: - -* **LiteLLM as the primary AI provider** -* **OpenRouter as the default upstream behind LiteLLM** -* **config-file-based keys and auth tokens** -* **MCP over Streamable HTTP** -* **Postgres with pgvector** -* parity with the current OB1 toolset: - - * `search_thoughts` - * `list_thoughts` - * `thought_stats` - * `capture_thought` - -* extended toolset for memory and project management: - - * `get_thought` - * `update_thought` - * `delete_thought` - * `archive_thought` - * `create_project` - * `list_projects` - * `get_project_context` - * `set_active_project` - * `get_active_project` - * `summarize_thoughts` - * `recall_context` - * `link_thoughts` - * `related_thoughts` - -The current OB1 reference implementation is a small MCP server backed by a `thoughts` table in Postgres, a `match_thoughts(...)` vector-search function, and OpenRouter calls for embeddings plus metadata extraction. +For current usage and setup, prefer `README.md`. +For the specific embedding-backfill follow-up work, see `llm/todo.md`. --- -## Why LiteLLM should be the primary provider +## Status summary -LiteLLM is the right primary abstraction because it gives one stable OpenAI-compatible API surface while allowing routing to multiple upstream providers, including OpenRouter. LiteLLM documents OpenAI-compatible proxy endpoints, including `/v1/embeddings`, and explicitly supports OpenRouter-backed models. +### Original v1 plan: substantially complete -That gives us: +The original core plan is no longer a future roadmap item. 
The repo already contains the major v1 pieces that this document proposed, including: -* one provider contract in the Go app -* centralized key management -* easier model swaps -* support for multiple upstreams later -* simpler production operations +- YAML-driven config and startup +- Postgres-backed storage with pgvector +- LiteLLM as the primary hosted AI path +- direct OpenRouter support +- MCP over Streamable HTTP +- authenticated MCP endpoints +- core thought tools +- project/context tools +- link traversal tools +- health/readiness endpoints +- file storage tools +- metadata retry and reparse maintenance tools +- embedding backfill and text-search fallback -### Provider strategy +In practice, the project has also grown beyond the original v1 scope with additional domains and tooling such as: -**Primary runtime mode** - -* App -> LiteLLM Proxy -> OpenRouter / other providers - -**Fallback mode** - -* App -> OpenRouter directly - -We will support both, but the codebase will be designed **LiteLLM-first**. +- stored files and binary resources +- agent skills and guardrails +- chat history tools +- household / maintenance / calendar / meal / CRM tools +- OAuth client-credentials support +- Ollama support +- tool discovery and persistent tool annotations --- -## Scope of the Go service +## What from the original plan is already implemented -The Go service will provide: +### Core server and platform work -1. **MCP server over Streamable HTTP** -2. **API-key authentication** -3. **thought capture** -4. **semantic search** -5. **thought listing** -6. **thought statistics** -7. **thought lifecycle management** (get, update, delete, archive) -8. **project grouping and context** (create, list, get context, active project) -9. **memory summarization and context recall** -10. **thought linking and relationship traversal** -11. **provider abstraction for embeddings + metadata extraction** -12. 
**config-file-driven startup** +Implemented in the current repo: + +- Streamable HTTP MCP server +- API-key auth middleware +- OAuth token flow support +- YAML config loader and validation +- Postgres connection verification and startup checks +- structured logging +- health and readiness endpoints +- version reporting + +### Original thought and project toolset + +Implemented in the current repo: + +- `capture_thought` +- `search_thoughts` +- `list_thoughts` +- `thought_stats` +- `get_thought` +- `update_thought` +- `delete_thought` +- `archive_thought` +- `create_project` +- `list_projects` +- `get_project_context` +- `set_active_project` +- `get_active_project` +- `summarize_thoughts` +- `recall_context` +- `link_thoughts` +- `related_thoughts` + +### Provider and retrieval work + +Implemented in the current repo: + +- LiteLLM provider path +- direct OpenRouter path +- Ollama path +- semantic search against model-specific embeddings +- full-text search fallback when embeddings do not exist for the active model in scope +- embedding backfill support +- metadata retry support for pending/failed extractions +- metadata reparse support + +### Scope that has gone beyond the original plan + +Additional capabilities now present in the repo: + +- file upload/save/load/list +- agent skills and guardrails plus project linking +- tool discovery via `describe_tools` +- persistent usage notes via `annotate_tool` +- chat history tools +- domain-specific household / maintenance / calendar / meals / CRM tooling --- -## Functional requirements +## Remaining deferred features from the original plan -### Required v1 features +These still appear to be legitimate future work items. -* start from YAML config -* connect to Postgres -* use pgvector for embeddings -* call LiteLLM for: +### 1. 
Slack ingestion
-
-  * embeddings
-  * metadata extraction
-* expose MCP tools over HTTP
-* protect the MCP endpoint with configured API keys
-* preserve OB1-compatible tool semantics
-* store thoughts with metadata and embeddings
-* search via `match_thoughts(...)`
+
+Still outstanding.
-### Deferred features
+### 2. Webhook ingestion
-* Slack ingestion
-* webhook ingestion
-* async metadata extraction
-* per-user tenancy
-* admin UI
-* background enrichment jobs
-* multi-provider routing policy inside the app
+
+Still outstanding.
+
+### 3. Per-user tenancy
+
+Still outstanding.
+
+### 4. Admin UI
+
+Still outstanding.
+
+### 5. Multi-provider routing policy inside the app
+
+Still outstanding.
 
 ---
 
-## Reference behavior to preserve
+## Deferred items from the original plan that are no longer open work
 
-The current OB1 server is small and direct:
+### Async metadata extraction
 
-* it stores thoughts in a `thoughts` table
-* it uses vector similarity for semantic search
-* it exposes four MCP tools
-* it uses an access key for auth
-* it generates embeddings and extracts metadata via OpenRouter.
+Async metadata extraction was originally listed as deferred, but the repo already behaves much closer to that target than this document suggests:
 
-The Go version should preserve those behaviors first, then improve structure and operability.
+- capture can persist the thought even when metadata extraction fails or times out
+- pending/failed metadata can be retried later
+- metadata retry can run automatically in the background
+
+That does not necessarily mean the design is final, but it is no longer accurate to treat async metadata handling as untouched future work.
+ +### Background enrichment jobs + +The exact phrase from the original plan is still broad, but the repo already includes background-style maintenance flows such as: + +- embedding backfill +- metadata retry +- metadata reparse support + +So this plan should not be read as if AMCS has no background maintenance/enrichment machinery at all. --- -## Architecture +## Notes for maintainers -```text - +----------------------+ - | MCP Client / AI App | - +----------+-----------+ - | - | Streamable HTTP - v - +----------------------+ - | Go OB1 Server | - | auth + MCP tools | - +----+-----------+-----+ - | | - | | - v v - +----------------+ +----------------------+ - | LiteLLM Proxy | | Postgres + pgvector | - | embeddings | | thoughts + pgvector | - | metadata | | RPC/search SQL | - +--------+-------+ +----------------------+ - | - v - +-------------+ - | OpenRouter | - | or others | - +-------------+ -``` +When using this file for planning, treat it as historical context plus a high-level gap list. + +Do **not** use the original milestone sections below as a literal roadmap. They describe the early intended build order, not the present state of the repo. + +If a fresh roadmap is needed, create a new planning document based on the current codebase rather than continuing to extend the original greenfield plan. --- -## High-level design +## Historical note -### Core components +The rest of the original implementation-plan content was intentionally removed from this file during the audit cleanup because it described a mostly pre-implementation future state and was drifting away from the actual repository. -1. **Config subsystem** - - * load YAML - * apply env overrides - * validate required fields - -2. **Auth subsystem** - - * API-key validation - * header-based auth - * optional query-param auth - -3. **AI provider subsystem** - - * provider interface - * LiteLLM implementation - * optional OpenRouter direct implementation - -4. 
**Store subsystem** - - * Postgres connection pool - * insert/search/list/stats operations - * pgvector support - -5. **MCP subsystem** - - * MCP server - * tool registration - * HTTP transport - -6. **Observability subsystem** - - * structured logs - * metrics - * health checks - ---- - -## Project layout - -```text -ob1-go/ - cmd/ - ob1-server/ - main.go - - internal/ - app/ - app.go - - config/ - config.go - loader.go - validate.go - - auth/ - middleware.go - keyring.go - - ai/ - provider.go - factory.go - prompts.go - types.go - - litellm/ - client.go - embeddings.go - metadata.go - - openrouter/ - client.go - embeddings.go - metadata.go - - mcpserver/ - server.go - transport.go - - tools/ - search.go - list.go - stats.go - capture.go - get.go - update.go - delete.go - archive.go - projects.go - context.go - summarize.go - recall.go - links.go - - store/ - db.go - thoughts.go - stats.go - projects.go - links.go - - metadata/ - schema.go - normalize.go - validate.go - - types/ - thought.go - filters.go - - observability/ - logger.go - metrics.go - tracing.go - - migrations/ - 001_enable_vector.sql - 002_create_thoughts.sql - 003_add_projects.sql - 004_create_thought_links.sql - 005_create_match_thoughts.sql - 006_rls_and_grants.sql - - configs/ - config.example.yaml - dev.yaml - - scripts/ - run-local.sh - migrate.sh - - go.mod - README.md -``` - ---- - -## Dependencies - -### Required Go packages - -* `github.com/modelcontextprotocol/go-sdk` -* `github.com/jackc/pgx/v5` -* `github.com/pgvector/pgvector-go` -* `gopkg.in/yaml.v3` -* `github.com/go-playground/validator/v10` -* `github.com/google/uuid` - -### Standard library usage - -* `net/http` -* `context` -* `log/slog` -* `time` -* `encoding/json` - -The Go MCP SDK is the right fit for implementing an MCP server in Go, and `pgvector-go` is the expected library for Go integration with pgvector-backed Postgres columns. - ---- - -## Config model - -Config files are the primary source of truth. 
- -### Rules - -* use **YAML config files** -* allow **environment overrides** -* do **not** commit real secrets -* commit only `config.example.yaml` -* keep local secrets in ignored files -* in production, mount config files as secrets or use env overrides for sensitive values - ---- - -## Example config - -```yaml -server: - host: "0.0.0.0" - port: 8080 - read_timeout: "15s" - write_timeout: "30s" - idle_timeout: "60s" - allowed_origins: - - "*" - -mcp: - path: "/mcp" - server_name: "open-brain" - version: "1.0.0" - transport: "streamable_http" - -auth: - mode: "api_keys" - header_name: "x-brain-key" - query_param: "key" - allow_query_param: false - keys: - - id: "local-client" - value: "replace-me" - description: "main local client key" - -database: - url: "postgres://user:pass@localhost:5432/ob1?sslmode=disable" - max_conns: 10 - min_conns: 2 - max_conn_lifetime: "30m" - max_conn_idle_time: "10m" - -ai: - provider: "litellm" - - embeddings: - model: "openai/text-embedding-3-small" - dimensions: 1536 - - metadata: - model: "gpt-4o-mini" - temperature: 0.1 - - litellm: - base_url: "http://localhost:4000/v1" - api_key: "replace-me" - use_responses_api: false - request_headers: {} - embedding_model: "openrouter/openai/text-embedding-3-small" - metadata_model: "gpt-4o-mini" - - openrouter: - base_url: "https://openrouter.ai/api/v1" - api_key: "" - app_name: "ob1-go" - site_url: "" - extra_headers: {} - -capture: - source: "mcp" - metadata_defaults: - type: "observation" - topic_fallback: "uncategorized" - -search: - default_limit: 10 - default_threshold: 0.5 - max_limit: 50 - -logging: - level: "info" - format: "json" - -observability: - metrics_enabled: true - pprof_enabled: false -``` - ---- - -## Config structs - -```go -type Config struct { - Server ServerConfig `yaml:"server"` - MCP MCPConfig `yaml:"mcp"` - Auth AuthConfig `yaml:"auth"` - Database DatabaseConfig `yaml:"database"` - AI AIConfig `yaml:"ai"` - Capture CaptureConfig `yaml:"capture"` - Search 
SearchConfig `yaml:"search"` - Logging LoggingConfig `yaml:"logging"` - Observability ObservabilityConfig `yaml:"observability"` -} - -type ServerConfig struct { - Host string `yaml:"host"` - Port int `yaml:"port"` - ReadTimeout time.Duration `yaml:"read_timeout"` - WriteTimeout time.Duration `yaml:"write_timeout"` - IdleTimeout time.Duration `yaml:"idle_timeout"` - AllowedOrigins []string `yaml:"allowed_origins"` -} - -type MCPConfig struct { - Path string `yaml:"path"` - ServerName string `yaml:"server_name"` - Version string `yaml:"version"` - Transport string `yaml:"transport"` -} - -type AuthConfig struct { - Mode string `yaml:"mode"` - HeaderName string `yaml:"header_name"` - QueryParam string `yaml:"query_param"` - AllowQueryParam bool `yaml:"allow_query_param"` - Keys []APIKey `yaml:"keys"` -} - -type APIKey struct { - ID string `yaml:"id"` - Value string `yaml:"value"` - Description string `yaml:"description"` -} - -type DatabaseConfig struct { - URL string `yaml:"url"` - MaxConns int32 `yaml:"max_conns"` - MinConns int32 `yaml:"min_conns"` - MaxConnLifetime time.Duration `yaml:"max_conn_lifetime"` - MaxConnIdleTime time.Duration `yaml:"max_conn_idle_time"` -} - -type AIConfig struct { - Provider string `yaml:"provider"` // litellm | openrouter - Embeddings AIEmbeddingConfig `yaml:"embeddings"` - Metadata AIMetadataConfig `yaml:"metadata"` - LiteLLM LiteLLMConfig `yaml:"litellm"` - OpenRouter OpenRouterAIConfig `yaml:"openrouter"` -} - -type AIEmbeddingConfig struct { - Model string `yaml:"model"` - Dimensions int `yaml:"dimensions"` -} - -type AIMetadataConfig struct { - Model string `yaml:"model"` - Temperature float64 `yaml:"temperature"` -} - -type LiteLLMConfig struct { - BaseURL string `yaml:"base_url"` - APIKey string `yaml:"api_key"` - UseResponsesAPI bool `yaml:"use_responses_api"` - RequestHeaders map[string]string `yaml:"request_headers"` - EmbeddingModel string `yaml:"embedding_model"` - MetadataModel string `yaml:"metadata_model"` -} - -type 
OpenRouterAIConfig struct { - BaseURL string `yaml:"base_url"` - APIKey string `yaml:"api_key"` - AppName string `yaml:"app_name"` - SiteURL string `yaml:"site_url"` - ExtraHeaders map[string]string `yaml:"extra_headers"` -} -``` - ---- - -## Config precedence - -### Order - -1. `--config /path/to/file.yaml` -2. `AMCS_CONFIG` -3. default `./configs/dev.yaml` -4. environment overrides for specific fields - -### Suggested env overrides - -* `AMCS_DATABASE_URL` -* `AMCS_LITELLM_API_KEY` -* `AMCS_OPENROUTER_API_KEY` -* `AMCS_SERVER_PORT` - ---- - -## Validation rules - -At startup, fail fast if: - -* `database.url` is empty -* `auth.keys` is empty -* `mcp.path` is empty -* `ai.provider` is unsupported -* `ai.embeddings.dimensions <= 0` -* provider-specific base URL or API key is missing -* the DB vector dimension does not match configured embedding dimensions - ---- - -## AI provider design - -### Provider interface - -```go -type Provider interface { - Embed(ctx context.Context, input string) ([]float32, error) - ExtractMetadata(ctx context.Context, input string) (ThoughtMetadata, error) - Name() string -} -``` - -### Factory - -```go -func NewProvider(cfg AIConfig, httpClient *http.Client, log *slog.Logger) (Provider, error) { - switch cfg.Provider { - case "litellm": - return litellm.New(cfg, httpClient, log) - case "openrouter": - return openrouter.New(cfg, httpClient, log) - default: - return nil, fmt.Errorf("unsupported ai.provider: %s", cfg.Provider) - } -} -``` - ---- - -## LiteLLM-first behavior - -### Embeddings - -The app will call LiteLLM at: - -* `POST /v1/embeddings` - -using an OpenAI-compatible request payload and Bearer auth. LiteLLM documents its proxy embeddings support through OpenAI-compatible endpoints. 
- -### Metadata extraction - -The app will call LiteLLM at: - -* `POST /v1/chat/completions` - -using: - -* configured metadata model -* system prompt -* user message -* JSON-oriented response handling - -LiteLLM’s proxy is intended to accept OpenAI-style chat completion requests. - -### Model routing - -In config, use: - -* `litellm.embedding_model` -* `litellm.metadata_model` - -These may be: - -* direct model names -* LiteLLM aliases -* OpenRouter-backed model identifiers - -Example: - -```yaml -litellm: - embedding_model: "openrouter/openai/text-embedding-3-small" - metadata_model: "gpt-4o-mini" -``` - -LiteLLM documents OpenRouter provider usage and OpenRouter-backed model naming. - ---- - -## OpenRouter fallback mode - -If `ai.provider: openrouter`, the app will directly call: - -* `POST /api/v1/embeddings` -* `POST /api/v1/chat/completions` - -with Bearer auth. - -OpenRouter documents the embeddings endpoint and its authentication model. - -This mode is mainly for: - -* local simplicity -* debugging provider issues -* deployments without LiteLLM - ---- - -## Metadata schema - -Use one stable metadata schema regardless of provider. 
- -```go -type ThoughtMetadata struct { - People []string `json:"people"` - ActionItems []string `json:"action_items"` - DatesMentioned []string `json:"dates_mentioned"` - Topics []string `json:"topics"` - Type string `json:"type"` - Source string `json:"source"` -} -``` - -### Accepted type values - -* `observation` -* `task` -* `idea` -* `reference` -* `person_note` - -### Normalization rules - -* trim all strings -* deduplicate arrays -* drop empty values -* cap topics count if needed -* default invalid `type` to `observation` -* set `source: "mcp"` for MCP-captured thoughts - -### Fallback defaults - -If metadata extraction fails: - -```json -{ - "people": [], - "action_items": [], - "dates_mentioned": [], - "topics": ["uncategorized"], - "type": "observation", - "source": "mcp" -} -``` - ---- - -## Database design - -The DB contract should match the current OB1 structure as closely as possible: - -* `thoughts` table -* `embedding vector(1536)` -* HNSW index -* metadata JSONB -* `match_thoughts(...)` function - ---- - -## Migrations - -### `001_enable_vector.sql` - -```sql -create extension if not exists vector; -``` - -### `002_create_thoughts.sql` - -```sql -create table if not exists thoughts ( - id uuid default gen_random_uuid() primary key, - content text not null, - embedding vector(1536), - metadata jsonb default '{}'::jsonb, - created_at timestamptz default now(), - updated_at timestamptz default now() -); - -create index if not exists thoughts_embedding_hnsw_idx - on thoughts using hnsw (embedding vector_cosine_ops); - -create index if not exists thoughts_metadata_gin_idx - on thoughts using gin (metadata); - -create index if not exists thoughts_created_at_idx - on thoughts (created_at desc); -``` - -### `003_add_projects.sql` - -```sql -create table if not exists projects ( - id uuid default gen_random_uuid() primary key, - name text not null unique, - description text, - created_at timestamptz default now(), - last_active_at timestamptz default now() 
-); - -alter table thoughts add column if not exists project_id uuid references projects(id); -alter table thoughts add column if not exists archived_at timestamptz; - -create index if not exists thoughts_project_id_idx on thoughts (project_id); -create index if not exists thoughts_archived_at_idx on thoughts (archived_at); -``` - -### `004_create_thought_links.sql` - -```sql -create table if not exists thought_links ( - from_id uuid references thoughts(id) on delete cascade, - to_id uuid references thoughts(id) on delete cascade, - relation text not null, - created_at timestamptz default now(), - primary key (from_id, to_id, relation) -); - -create index if not exists thought_links_from_idx on thought_links (from_id); -create index if not exists thought_links_to_idx on thought_links (to_id); -``` - -### `005_create_match_thoughts.sql` - -```sql -create or replace function match_thoughts( - query_embedding vector(1536), - match_threshold float default 0.7, - match_count int default 10, - filter jsonb default '{}'::jsonb -) -returns table ( - id uuid, - content text, - metadata jsonb, - similarity float, - created_at timestamptz -) -language plpgsql -as $$ -begin - return query - select - t.id, - t.content, - t.metadata, - 1 - (t.embedding <=> query_embedding) as similarity, - t.created_at - from thoughts t - where 1 - (t.embedding <=> query_embedding) > match_threshold - and (filter = '{}'::jsonb or t.metadata @> filter) - order by t.embedding <=> query_embedding - limit match_count; -end; -$$; -``` - -### `006_rls_and_grants.sql` - -```sql --- Grant full access to the application database user configured in database.url. --- Replace 'ob1_user' with the actual role name used in your database.url. 
-grant select, insert, update, delete on table public.thoughts to ob1_user; -grant select, insert, update, delete on table public.projects to ob1_user; -grant select, insert, update, delete on table public.thought_links to ob1_user; -``` - ---- - -## Store layer - -### Interfaces - -```go -type ThoughtStore interface { - InsertThought(ctx context.Context, thought Thought) error - GetThought(ctx context.Context, id uuid.UUID) (Thought, error) - UpdateThought(ctx context.Context, id uuid.UUID, patch ThoughtPatch) (Thought, error) - DeleteThought(ctx context.Context, id uuid.UUID) error - ArchiveThought(ctx context.Context, id uuid.UUID) error - SearchThoughts(ctx context.Context, embedding []float32, threshold float64, limit int, filter map[string]any) ([]SearchResult, error) - ListThoughts(ctx context.Context, filter ListFilter) ([]Thought, error) - Stats(ctx context.Context) (ThoughtStats, error) -} - -type ProjectStore interface { - InsertProject(ctx context.Context, project Project) error - GetProject(ctx context.Context, nameOrID string) (Project, error) - ListProjects(ctx context.Context) ([]ProjectSummary, error) - TouchProject(ctx context.Context, id uuid.UUID) error -} - -type LinkStore interface { - InsertLink(ctx context.Context, link ThoughtLink) error - GetLinks(ctx context.Context, thoughtID uuid.UUID) ([]ThoughtLink, error) -} -``` - -### DB implementation notes - -Use `pgxpool.Pool`. 
- -On startup: - -* parse DB config -* create pool -* register pgvector support -* ping DB -* verify required function exists -* verify vector extension exists - ---- - -## Domain types - -```go -type Thought struct { - ID uuid.UUID - Content string - Embedding []float32 - Metadata ThoughtMetadata - ProjectID *uuid.UUID - ArchivedAt *time.Time - CreatedAt time.Time - UpdatedAt time.Time -} - -type SearchResult struct { - ID uuid.UUID - Content string - Metadata ThoughtMetadata - Similarity float64 - CreatedAt time.Time -} - -type ListFilter struct { - Limit int - Type string - Topic string - Person string - Days int - ProjectID *uuid.UUID - IncludeArchived bool -} - -type ThoughtStats struct { - TotalCount int - TypeCounts map[string]int - TopTopics []KeyCount - TopPeople []KeyCount -} - -type ThoughtPatch struct { - Content *string - Metadata *ThoughtMetadata -} - -type Project struct { - ID uuid.UUID - Name string - Description string - CreatedAt time.Time - LastActiveAt time.Time -} - -type ProjectSummary struct { - Project - ThoughtCount int -} - -type ThoughtLink struct { - FromID uuid.UUID - ToID uuid.UUID - Relation string - CreatedAt time.Time -} -``` - ---- - -## Auth design - -The reference OB1 implementation uses a configured access key and accepts it via header or query param. - -We will keep compatibility but make it cleaner. - -### Auth behavior - -* primary auth via header, default: `x-brain-key` -* optional query param fallback -* support multiple keys in config -* attach key ID to request context for auditing - -### Middleware flow - -1. read configured header -2. if missing and allowed, read query param -3. compare against in-memory keyring -4. if matched, attach key ID to request context -5. else return `401 Unauthorized` - -### Recommendation - -Set: - -```yaml -auth: - allow_query_param: false -``` - -for production. - ---- - -## MCP server design - -Expose MCP over Streamable HTTP. 
MCP’s spec defines Streamable HTTP as the remote transport replacing the older HTTP+SSE approach. - -### HTTP routes - -* `POST /mcp` -* `GET /healthz` -* `GET /readyz` - -### Middleware stack - -* request ID -* panic recovery -* structured logging -* auth -* timeout -* optional CORS - ---- - -## MCP tools - -### 1. `capture_thought` - -**Input** - -* `content string` - -**Flow** - -1. validate content -2. concurrently: - - * call provider `Embed` - * call provider `ExtractMetadata` -3. normalize metadata -4. set `source = "mcp"` -5. insert into `thoughts` -6. return success payload - -### 2. `search_thoughts` - -**Input** - -* `query string` -* `limit int` -* `threshold float` - -**Flow** - -1. embed query -2. call `match_thoughts(...)` -3. format ranked results -4. return results - -### 3. `list_thoughts` - -**Input** - -* `limit` -* `type` -* `topic` -* `person` -* `days` - -**Flow** - -1. build SQL filters -2. query `thoughts` -3. order by `created_at desc` -4. return summaries - -### 4. `thought_stats` - -**Input** - -* none - -**Flow** - -1. count rows -2. aggregate metadata usage -3. return totals and top buckets - ---- - -### 5. `get_thought` - -**Input** - -* `id string` - -**Flow** - -1. validate UUID -2. query `thoughts` by ID -3. return full record or not-found error - ---- - -### 6. `update_thought` - -**Input** - -* `id string` -* `content string` (optional) -* `metadata map` (optional, merged not replaced) - -**Flow** - -1. validate inputs -2. if content provided: re-embed and re-extract metadata -3. merge metadata patch -4. update row, set `updated_at` -5. return updated record - ---- - -### 7. `delete_thought` - -**Input** - -* `id string` - -**Flow** - -1. validate UUID -2. hard-delete row (cascades to `thought_links`) -3. return confirmation - ---- - -### 8. `archive_thought` - -**Input** - -* `id string` - -**Flow** - -1. validate UUID -2. set `archived_at = now()` -3. 
return confirmation - -Note: archived thoughts are excluded from search and list results by default unless `include_archived: true` is passed. - ---- - -### 9. `create_project` - -**Input** - -* `name string` -* `description string` (optional) - -**Flow** - -1. validate name uniqueness -2. insert into `projects` -3. return project record - ---- - -### 10. `list_projects` - -**Input** - -* none - -**Flow** - -1. query `projects` ordered by `last_active_at desc` -2. join thought counts per project -3. return summaries - ---- - -### 11. `get_project_context` - -**Input** - -* `project string` (name or ID) -* `query string` (optional, semantic focus) -* `limit int` - -**Flow** - -1. resolve project -2. fetch recent thoughts in project (last N) -3. if query provided: semantic search scoped to project -4. merge and deduplicate results ranked by recency + similarity -5. update `projects.last_active_at` -6. return context block ready for injection - ---- - -### 12. `set_active_project` - -**Input** - -* `project string` (name or ID) - -**Flow** - -1. resolve project -2. store project ID in server session context (in-memory, per connection) -3. return confirmation - ---- - -### 13. `get_active_project` - -**Input** - -* none - -**Flow** - -1. return current session active project or null - ---- - -### 14. `summarize_thoughts` - -**Input** - -* `query string` (optional topic focus) -* `project string` (optional) -* `days int` (optional time window) -* `limit int` - -**Flow** - -1. fetch matching thoughts via search or filter -2. format as context -3. call AI provider to produce prose summary -4. return summary text - ---- - -### 15. `recall_context` - -**Input** - -* `query string` -* `project string` (optional) -* `limit int` - -**Flow** - -1. semantic search with optional project filter -2. recency boost: merge with most recent N thoughts from project -3. deduplicate and rank -4. return formatted context block suitable for pasting into a new conversation - ---- - -### 16. 
`link_thoughts` - -**Input** - -* `from_id string` -* `to_id string` -* `relation string` (e.g. `follows_up`, `contradicts`, `references`, `blocks`) - -**Flow** - -1. validate both IDs exist -2. insert into `thought_links` -3. return confirmation - ---- - -### 17. `related_thoughts` - -**Input** - -* `id string` -* `include_semantic bool` (default true) - -**Flow** - -1. fetch explicit links from `thought_links` for this ID -2. if `include_semantic`: also fetch nearest semantic neighbours -3. merge, deduplicate, return with relation type or similarity score - ---- - -## Tool package plan - -### `internal/tools/capture.go` - -Responsibilities: - -* input validation -* parallel embed + metadata extraction -* normalization -* write to store - -### `internal/tools/search.go` - -Responsibilities: - -* input validation -* embed query -* vector search -* output formatting - -### `internal/tools/list.go` - -Responsibilities: - -* filter normalization -* DB read -* output formatting - -### `internal/tools/stats.go` - -Responsibilities: - -* fetch/aggregate stats -* output shaping - -### `internal/tools/get.go` - -Responsibilities: - -* UUID validation -* single thought retrieval - -### `internal/tools/update.go` - -Responsibilities: - -* partial content/metadata update -* conditional re-embed if content changed -* metadata merge - -### `internal/tools/delete.go` - -Responsibilities: - -* UUID validation -* hard delete - -### `internal/tools/archive.go` - -Responsibilities: - -* UUID validation -* set `archived_at` - -### `internal/tools/projects.go` - -Responsibilities: - -* `create_project`, `list_projects` -* `set_active_project`, `get_active_project` (session context) - -### `internal/tools/context.go` - -Responsibilities: - -* `get_project_context`: resolve project, combine recency + semantic search, return context block -* update `last_active_at` on access - -### `internal/tools/summarize.go` - -Responsibilities: - -* filter/search thoughts -* format as prompt context 
-* call AI provider for prose summary - -### `internal/tools/recall.go` - -Responsibilities: - -* `recall_context`: semantic search + recency boost + project filter -* output formatted context block - -### `internal/tools/links.go` - -Responsibilities: - -* `link_thoughts`: validate both IDs, insert link -* `related_thoughts`: fetch explicit links + optional semantic neighbours, merge and return - ---- - -## Startup sequence - -1. parse CLI args -2. load config file -3. apply env overrides -4. validate config -5. initialize logger -6. create DB pool -7. verify DB requirements -8. create AI provider -9. create store -10. create tool handlers -11. register MCP tools -12. start HTTP server - ---- - -## Error handling policy - -### Fail fast on startup errors - -* invalid config -* DB unavailable -* missing required API keys -* invalid MCP config -* provider initialization failure - -### Retry policy for provider calls - -Retry on: - -* `429` -* `500` -* `502` -* `503` -* timeout -* connection reset - -Do not retry on: - -* malformed request -* auth failure -* invalid model name -* invalid response shape after repeated attempts - -Use: - -* exponential backoff -* capped retries -* context-aware cancellation - ---- - -## Observability - -### Logging - -Use `log/slog` in JSON mode. - -Include: - -* request ID -* route -* tool name -* key ID -* provider name -* DB latency -* upstream latency -* error class - -### Metrics - -Track: - -* request count by tool -* request duration -* provider call duration -* DB query duration -* auth failures -* provider failures -* insert/search counts - -### Health checks - -#### `/healthz` - -Returns OK if process is running. 
- -#### `/readyz` - -Returns OK only if: - -* DB is reachable -* provider config is valid -* optional provider probe passes - ---- - -## Security plan - -### Secrets handling - -* keep secrets in config files only for local/dev use -* never commit real secrets -* use mounted secret files or env overrides in production - -### API key policy - -* support multiple keys -* identify keys by ID -* allow key rotation by config update + restart -* log only key ID, never raw value - -### Transport - -* run behind TLS terminator in production -* disable query-param auth in production -* avoid logging full URLs when query-param auth is enabled - ---- - -## Testing plan - -## Unit tests - -### Config - -* valid config loads -* invalid config fails -* env overrides apply correctly - -### Auth - -* header auth success -* query auth success -* invalid key rejected -* disabled query auth rejected - -### Metadata - -* normalization works -* invalid types default correctly -* empty metadata falls back safely - -### AI provider parsing - -* LiteLLM embeddings parse correctly -* LiteLLM chat completions parse correctly -* provider errors classified correctly - -### Store - -* filter builders generate expected SQL fragments -* JSONB metadata handling is stable - ---- - -## Integration tests - -Run against local Postgres with pgvector. - -Test: - -* migrations apply cleanly -* insert thought -* search thought -* list thoughts with filters -* stats aggregation -* auth-protected MCP route -* LiteLLM mock/proxy compatibility - ---- - -## Manual acceptance tests - -1. start local Postgres + pgvector -2. start LiteLLM -3. configure LiteLLM to route embeddings to OpenRouter -4. start Go server -5. connect MCP client -6. call `capture_thought` -7. call `search_thoughts` -8. call `list_thoughts` -9. call `thought_stats` -10. 
rotate API key and verify restart behavior - ---- - -## Milestones - -## Milestone 1 — foundation - -Deliver: - -* repo skeleton -* config loader -* config validation -* logger -* DB connection -* migrations - -Exit criteria: - -* app starts -* DB connection verified -* config-driven startup works - ---- - -## Milestone 2 — AI provider layer - -Deliver: - -* provider interface -* LiteLLM implementation -* OpenRouter fallback implementation -* metadata prompt -* normalization - -Exit criteria: - -* successful embedding call through LiteLLM -* successful metadata extraction through LiteLLM -* vector length validation works - ---- - -## Milestone 3 — capture and search - -Deliver: - -* `capture_thought` -* `search_thoughts` -* store methods for insert and vector search - -Exit criteria: - -* thoughts can be captured end-to-end -* semantic search returns results - ---- - -## Milestone 4 — remaining tools - -Deliver: - -* `list_thoughts` -* `thought_stats` - -Exit criteria: - -* all four tools function through MCP - ---- - -## Milestone 5 — extended memory and project tools - -Deliver: - -* `get_thought`, `update_thought`, `delete_thought`, `archive_thought` -* `create_project`, `list_projects`, `set_active_project`, `get_active_project` -* `get_project_context`, `recall_context` -* `summarize_thoughts` -* `link_thoughts`, `related_thoughts` -* migrations 003 and 004 (projects + links tables) - -Exit criteria: - -* thoughts can be retrieved, patched, deleted, archived -* projects can be created and listed -* `get_project_context` returns a usable context block -* `summarize_thoughts` produces a prose summary via the AI provider -* thought links can be created and retrieved with semantic neighbours - ---- - -## Milestone 6 — HTTP and auth hardening - -Deliver: - -* auth middleware -* health endpoints -* structured logs -* retries -* timeouts - -Exit criteria: - -* endpoint protected -* logs useful -* service stable under expected failures - ---- - -## Milestone 7 — 
production readiness - -Deliver: - -* metrics -* readiness checks -* key rotation workflow -* deployment docs - -Exit criteria: - -* production deployment is straightforward -* operational playbook exists - ---- - -## Implementation order - -Build in this order: - -1. config -2. DB + migrations (001, 002, 003, 004) -3. LiteLLM client -4. metadata normalization -5. `capture_thought` -6. `search_thoughts` -7. MCP HTTP server -8. auth middleware -9. `list_thoughts` -10. `thought_stats` -11. `get_thought`, `update_thought`, `delete_thought`, `archive_thought` -12. `create_project`, `list_projects`, `set_active_project`, `get_active_project` -13. `get_project_context`, `recall_context` -14. `summarize_thoughts` -15. `link_thoughts`, `related_thoughts` -16. logs/metrics/health - -This gives usable value early and builds the project/memory layer on a solid foundation. - ---- - -## Recommended local development stack - -### Services - -* Postgres with pgvector -* LiteLLM proxy -* optional OpenRouter upstream -* Go service - -### Example shape - -```text -docker-compose: - postgres - litellm - ob1-go -``` - ---- - -## Recommended production deployment - -### Preferred architecture - -* Go service on Fly.io / Cloud Run / Render -* LiteLLM as separate service -* Postgres managed externally -* TLS terminator in front - -### Why not Edge Functions - -The original repo uses Deno Edge Functions because of its chosen deployment environment, but the app behavior is better suited to a normal long-running Go service for maintainability and observability. 
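The `docker-compose` sketch above can be fleshed out minimally. Everything in this fragment — image tags, ports, env var names, file paths — is illustrative and should be checked against the respective projects' docs rather than copied as-is:

```yaml
services:
  postgres:
    image: pgvector/pgvector:pg16        # any Postgres image bundling pgvector
    environment:
      POSTGRES_PASSWORD: dev-only-password
    ports:
      - "5432:5432"
  litellm:
    image: ghcr.io/berriai/litellm:main-latest   # illustrative tag
    volumes:
      - ./litellm-config.yaml:/app/config.yaml:ro
    ports:
      - "4000:4000"
  amcs:
    build: .
    depends_on:
      - postgres
      - litellm
    ports:
      - "8080:8080"
```

This is a local/dev convenience only; the production guidance above still applies.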
- ---- - -## Risks and decisions - -### Risk: embedding dimension mismatch - -Mitigation: - -* validate config vs DB on startup - -### Risk: LiteLLM model alias drift - -Mitigation: - -* add readiness probe for configured models - -### Risk: metadata extraction instability - -Mitigation: - -* strong normalization + safe defaults - -### Risk: single global auth model - -Mitigation: - -* acceptable for v1 -* redesign for multi-tenant later - -### Risk: stats scaling poorly - -Mitigation: - -* start with in-memory aggregation -* move to SQL aggregation if needed - ---- - -## Definition of done for v1 - -The project is done when: - -* service starts from YAML config -* LiteLLM is the primary AI provider -* OpenRouter can be used behind LiteLLM -* direct OpenRouter mode still works -* MCP endpoint is authenticated -* `capture_thought` stores content, embedding, metadata -* `search_thoughts` performs semantic search -* `list_thoughts` supports filtering -* `thought_stats` returns useful summaries -* thoughts can be retrieved, updated, deleted, and archived -* projects can be created, listed, and used to scope captures and searches -* `get_project_context` returns a ready-to-use context block -* `recall_context` returns a semantically relevant + recent context block -* `summarize_thoughts` produces prose summaries via the AI provider -* thought links can be created and traversed with semantic fallback -* logs and health checks exist -* key rotation works via config + restart - ---- - -## Recommendation - -Build this as a **boring Go service**: - -* stdlib HTTP -* thin MCP server layer -* thin provider layer -* thin store layer -* YAML config -* explicit interfaces - -Do not over-abstract it. The product shape is simple. The goal is a reliable, understandable service with LiteLLM as the stable provider boundary. 
- ---- - -## Next implementation artifact - -The next concrete deliverable should be a **starter Go repo skeleton** containing: - -* `go.mod` -* folder structure -* `main.go` -* config loader -* example config -* migration files (001–006) -* provider interface -* LiteLLM client skeleton -* store interfaces (`ThoughtStore`, `ProjectStore`, `LinkStore`) -* domain types including `Project`, `ThoughtLink`, `ThoughtPatch` -* MCP tool registration stubs for all 17 tools +If needed, recover the older version from git history. diff --git a/llm/todo.md b/llm/todo.md index 235f515..13a0d28 100644 --- a/llm/todo.md +++ b/llm/todo.md @@ -1,450 +1,126 @@ # AMCS TODO -## Auto Embedding Backfill Tool +## Embedding Backfill and Text-Search Fallback Audit -## Objective +This file originally described the planned `backfill_embeddings` work and semantic-to-text fallback behavior. Most of that work is now implemented. This document now tracks what landed, what still needs verification, and what follow-up work remains. -Add an MCP tool that automatically backfills missing embeddings for existing thoughts so semantic search keeps working after: - -* embedding model changes -* earlier capture or update failures -* import or migration of raw thoughts without vectors - -The tool should be safe to run repeatedly, should not duplicate work, and should make it easy to restore semantic coverage without rewriting existing thoughts. +For current operator-facing behavior, prefer `README.md`. 
--- -## Desired outcome +## Status summary -After this work: +### Implemented -* raw thought text remains the source of truth -* embeddings are treated as derived data per model -* search continues to query only embeddings from the active embedding model -* when no embeddings exist for the active model and scope, search falls back to Postgres text search -* operators or MCP clients can trigger a backfill for the current model -* AMCS can optionally auto-run a limited backfill pass on startup or on a schedule later +The main work described in this file is already present in the repo: + +- `backfill_embeddings` MCP tool exists +- missing-embedding selection helpers exist in the store layer +- embedding upsert helpers exist in the store layer +- semantic retrieval falls back to Postgres full-text search when the active model has no embeddings in scope +- fallback behavior is wired into the main query-driven tools +- a full-text index migration exists +- optional automatic backfill runner exists in config/startup flow +- retry and reparse maintenance tooling also exists around metadata quality + +### Still worth checking or improving + +The broad feature is done, but some implementation-depth items are still worth tracking: + +- test coverage around fallback/backfill behavior +- whether configured backfill batching is used consistently end-to-end +- observability depth beyond logs +- response visibility into which retrieval mode was used --- -## Why this is needed +## What is already implemented -Current search behavior is model-specific: +### Backfill tool -* query text is embedded with the configured provider model -* results are filtered by `embeddings.model` -* thoughts with no embedding for that model are invisible to semantic search +Implemented: -This means a model switch leaves old thoughts searchable only by listing and metadata filters until new embeddings are generated. 
+- `backfill_embeddings` +- project scoping +- archived-thought filtering +- age filtering +- dry-run mode +- bounded concurrency +- best-effort per-item failure handling +- idempotent embedding upsert behavior -To avoid that dead zone, AMCS should also support a lexical fallback path backed by native Postgres text-search indexing. +### Search fallback + +Implemented: + +- full-text fallback when no embeddings exist for the active model in scope +- fallback helper shared by query-based tools +- full-text index migration on thought content + +### Tools using fallback + +Implemented fallback coverage for: + +- `search_thoughts` +- `recall_context` +- `get_project_context` when a query is provided +- `summarize_thoughts` when a query is provided +- semantic neighbors in `related_thoughts` + +### Optional automatic behavior + +Implemented: + +- config-gated startup backfill pass +- config-gated periodic backfill loop --- -## Tool proposal +## Remaining follow-ups -### New MCP tool +### 1. Expose retrieval mode in responses -`backfill_embeddings` +Still outstanding. -Purpose: +Why it matters: +- callers currently benefit from fallback automatically +- but debugging is easier if responses explicitly say whether retrieval was `semantic` or `text` -* find thoughts missing an embedding for the active model -* generate embeddings in batches -* write embeddings with upsert semantics -* report counts for scanned, embedded, skipped, and failed thoughts +Suggested shape: +- add a machine-readable field such as `retrieval_mode: semantic|text` +- keep it consistent across all query-based tools that use shared retrieval logic -### Input +### 2. Verify and improve tests -```json -{ - "project": "optional project name or id", - "limit": 100, - "batch_size": 20, - "include_archived": false, - "older_than_days": 0, - "dry_run": false -} -``` +Still worth auditing. 
-Notes: +Recommended checks: +- no-embedding scope falls back to text search +- project-scoped fallback only searches within project scope +- archived thoughts remain excluded by default +- `related_thoughts` falls back correctly when semantic vectors are unavailable +- backfill creates embeddings that later restore semantic search -* `project` scopes the backfill to a project when desired -* `limit` caps total thoughts processed in one tool call -* `batch_size` controls provider load -* `include_archived` defaults to `false` -* `older_than_days` is optional and mainly useful to avoid racing with fresh writes -* `dry_run` returns counts and sample IDs without calling the embedding provider +### 3. Re-embedding / migration ergonomics -### Output +Still optional future work. -```json -{ - "model": "openai/text-embedding-3-small", - "scanned": 100, - "embedded": 87, - "skipped": 13, - "failed": 0, - "dry_run": false, - "failures": [] -} -``` - -Optional: - -* include a short `next_cursor` later if we add cursor-based paging +Potential additions: +- count missing embeddings by project +- add `missing_embeddings` stats to `thought_stats` +- add a controlled re-embed or reindex flow for model migrations --- -## Backfill behavior +## Notes for maintainers -### Core rules +Do not read this file as an untouched future roadmap item anymore. The repo has already implemented the core work described here. -* Backfill only when a thought is missing an embedding row for the active model. -* Do not recompute embeddings that already exist for that model unless an explicit future `force` flag is added. -* Keep embeddings per model side by side in the existing `embeddings` table. -* Use `insert ... on conflict (thought_id, model) do update` so retries stay idempotent. - -### Selection query - -Add a store query that returns thoughts where no embedding exists for the requested model. 
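Flattened into SQL, the selection query just described is a single anti-join. Table and column names here follow this plan's sketches (`thoughts`, `embeddings`, `guid`) and are not verified against the real migrations:

```sql
-- Thoughts lacking an embedding for the requested model (anti-join).
SELECT t.*
FROM thoughts t
LEFT JOIN embeddings e
  ON e.thought_id = t.guid AND e.model = $1
WHERE e.id IS NULL
  AND t.archived_at IS NULL   -- honor include_archived=false
ORDER BY t.created_at ASC     -- oldest first: restore long-tail recall
LIMIT $2;
```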
- -Shape: - -* from `thoughts t` -* left join `embeddings e on e.thought_id = t.guid and e.model = $model` -* filter `e.id is null` -* optional filters for project, archived state, age -* order by `t.created_at asc` -* limit by requested batch - -Ordering oldest first is useful because it steadily restores long-tail recall instead of repeatedly revisiting recent writes. - -### Processing loop - -For each selected thought: - -1. read `content` -2. call `provider.Embed(content)` -3. upsert embedding row for `thought_id + model` -4. continue on per-item failure and collect errors - -Use bounded concurrency instead of fully serial processing so large backfills complete in reasonable time without overwhelming the provider. - -Recommended first pass: - -* one tool invocation handles batches internally -* concurrency defaults to a small fixed number like `4` -* `batch_size` and concurrency are kept server-side defaults at first, even if only `limit` is exposed in MCP input +If more backfill/fallback work is planned, append it as concrete follow-ups against the current codebase rather than preserving the old speculative rollout order. --- -## Search fallback behavior +## Historical note -### Goal +The original long-form proposal was replaced during the repo audit because it described work that is now largely complete and was causing issue/document drift. -If semantic retrieval cannot run because no embeddings exist for the active model in the selected scope, AMCS should fall back to Postgres text search instead of returning empty semantic results by default. - -### Fallback rules - -* If embeddings exist for the active model, keep using vector search as the primary path. -* If no embeddings exist for the active model in scope, run Postgres text search against raw thought content. 
-* Fallback should apply to: - - * `search_thoughts` - * `recall_context` - * `get_project_context` when `query` is provided - * `summarize_thoughts` when `query` is provided - * semantic neighbors in `related_thoughts` - -* Fallback should not mutate data. It is retrieval-only. -* Backfill remains the long-term fix; text search is the immediate safety net. - -### Postgres search approach - -Add a native full-text index on thought content and query it with a matching text-search configuration. - -Recommended first pass: - -* add a migration creating a GIN index on `to_tsvector('simple', content)` -* use `websearch_to_tsquery('simple', $query)` for user-entered text -* rank results with `ts_rank_cd(...)` -* continue excluding archived thoughts by default -* continue honoring project scope - -Using the `simple` configuration is a safer default for mixed prose, identifiers, and code-ish text than a language-specific stemmer. - -### Store additions for fallback - -Add store methods such as: - -* `HasEmbeddingsForModel(ctx, model string, projectID *uuid.UUID) (bool, error)` -* `SearchThoughtsText(ctx, query string, limit int, projectID *uuid.UUID, excludeID *uuid.UUID) ([]SearchResult, error)` - -These should be used by a shared retrieval helper in `internal/tools` so semantic callers degrade consistently. - -### Notes on ranking - -Text-search scores will not be directly comparable to vector similarity scores. - -That is acceptable in v1 because: - -* each request will use one retrieval mode at a time -* fallback is only used when semantic search is unavailable -* response payloads can continue to return `similarity` as a generic relevance score - ---- - -## Auto behavior - -The user asked for an auto backfill tool, so define two layers: - -### Layer 1: explicit MCP tool - -Ship `backfill_embeddings` first. 
- -This is the lowest-risk path because: - -* it is observable -* it is rate-limited by the caller -* it avoids surprise provider cost on startup - -### Layer 2: optional automatic runner - -Add a config-gated background runner after the tool exists and is proven stable. - -Config sketch: - -```yaml -backfill: - enabled: false - run_on_startup: false - interval: "15m" - batch_size: 20 - max_per_run: 100 - include_archived: false -``` - -Behavior: - -* on startup, if enabled and `run_on_startup=true`, run a small bounded backfill pass -* if `interval` is set, periodically backfill missing embeddings for the active configured model -* log counts and failures, but never block server startup on backfill failure - -This keeps the first implementation simple while still giving us a clean path to true automation. - ---- - -## Store changes - -Add store methods focused on missing-model coverage. - -### New methods - -* `ListThoughtsMissingEmbedding(ctx, model string, limit int, projectID *uuid.UUID, includeArchived bool, olderThanDays int) ([]Thought, error)` -* `UpsertEmbedding(ctx, thoughtID uuid.UUID, model string, embedding []float32) error` - -### Optional later methods - -* `CountThoughtsMissingEmbedding(ctx, model string, projectID *uuid.UUID, includeArchived bool) (int, error)` -* `ListThoughtIDsMissingEmbeddingPage(...)` for cursor-based paging on large datasets - -### Why separate `UpsertEmbedding` - -`InsertThought` and `UpdateThought` already contain embedding upsert logic, but a dedicated helper will: - -* reduce duplication -* let backfill avoid full thought updates -* make future re-embedding jobs cleaner - ---- - -## Tooling changes - -### New file - -`internal/tools/backfill.go` - -Responsibilities: - -* parse input -* resolve project if provided -* select missing thoughts -* run bounded embedding generation -* record per-item failures without aborting the whole batch -* return summary counts - -### MCP registration - -Add the tool to: - -* 
`internal/mcpserver/server.go` -* `internal/mcpserver/schema.go` and tests if needed -* `internal/app/app.go` wiring - -Suggested tool description: - -* `Generate missing embeddings for stored thoughts using the active embedding model.` - ---- - -## Config changes - -No config is required for the first manual tool beyond the existing embedding provider settings. - -For the later automatic runner, add: - -* `backfill.enabled` -* `backfill.run_on_startup` -* `backfill.interval` -* `backfill.batch_size` -* `backfill.max_per_run` -* `backfill.include_archived` - -Validation rules: - -* `batch_size > 0` -* `max_per_run >= batch_size` -* `interval` must parse when provided - ---- - -## Failure handling - -The backfill tool should be best-effort, not all-or-nothing. - -Rules: - -* one thought failure does not abort the full run -* provider errors are captured and counted -* database upsert failures are captured and counted -* final tool response includes truncated failure details -* full details go to logs - -Failure payloads should avoid returning raw thought content to the caller if that would create noisy or sensitive responses. Prefer thought IDs plus short error strings. - ---- - -## Observability - -Add structured logs for: - -* selected model -* project scope -* scan count -* success count -* failure count -* duration - -Later, metrics can include: - -* `amcs_backfill_runs_total` -* `amcs_backfill_embeddings_total` -* `amcs_backfill_failures_total` -* `amcs_thoughts_missing_embeddings` - ---- - -## Concurrency and rate limiting - -Keep the first version conservative. - -Plan: - -* use a worker pool with a small fixed concurrency -* keep batch sizes small by default -* stop fetching new work once `limit` is reached -* respect `ctx` cancellation so long backfills can be interrupted cleanly - -Do not add provider-specific rate-limit logic in v1 unless real failures show it is needed. - ---- - -## Security and safety - -* Reuse existing MCP auth. 
-* Do not expose a broad `force=true` option in v1. -* Default to non-archived thoughts only. -* Do not mutate raw thought text or metadata during backfill. -* Treat embeddings as derived data that may be regenerated safely. - ---- - -## Testing plan - -### Store tests - -Add tests for: - -* listing thoughts missing embeddings for a model -* project-scoped missing-embedding queries -* archived thought filtering -* idempotent upsert behavior - -### Tool tests - -Add tests for: - -* dry-run mode -* successful batch embedding -* partial provider failures -* empty result set -* project resolution -* context cancellation - -### Integration tests - -Add a flow covering: - -1. create thoughts without embeddings for a target model -2. run `backfill_embeddings` -3. confirm rows exist in `embeddings` -4. confirm `search_thoughts` can now retrieve them when using that model - -### Fallback search tests - -Add coverage for: - -* no embeddings for model -> `search_thoughts` uses Postgres text search -* project-scoped queries only search matching project thoughts -* archived thoughts stay excluded by default -* `related_thoughts` falls back to text search neighbors when semantic vectors are unavailable -* once embeddings exist, semantic search remains the primary path - ---- - -## Rollout order - -1. Add store helpers for missing-embedding selection and embedding upsert. -2. Add Postgres full-text index migration and text-search store helpers. -3. Add shared semantic-or-text fallback retrieval logic for query-based tools. -4. Add `backfill_embeddings` MCP tool and wire it into the server. -5. Add unit and integration tests. -6. Document usage in `README.md`. -7. Add optional background auto-runner behind config. -8. Consider a future `force` or `reindex_model` path only after v1 is stable. - ---- - -## Open questions - -* Should the tool expose `batch_size` to clients, or should batching stay internal? 
-* Should the first version support only the active model, or allow a `model` override for admins? -* Should archived thoughts be backfilled by default during startup jobs but not MCP calls? -* Do we want a separate CLI/admin command for large one-time reindex jobs outside MCP? - -Recommended answers for v1: - -* keep batching mostly internal -* use only the active configured model -* exclude archived thoughts by default everywhere -* postpone a dedicated CLI until volume justifies it - ---- - -## Nice follow-ups - -* add a `missing_embeddings` stat to `thought_stats` -* expose a read-only tool for counting missing embeddings by project -* add a re-embed path for migrating from one model to another in controlled waves -* add metadata extraction backfill as a separate job if imported content often lacks metadata -* expose the retrieval mode in responses for easier debugging of semantic vs text fallback +If needed, recover the older version from git history.