
Avalon Memory Crystal Server (amcs)

OB1 in Go — LiteLLM-First Implementation Plan

Based on the Open Brain project; reference it for details: https://github.com/NateBJones-Projects/OB1

Objective

Build a Go implementation of the OB1 project with:

  • LiteLLM as the primary AI provider

  • OpenRouter as the default upstream behind LiteLLM

  • config-file-based keys and auth tokens

  • MCP over Streamable HTTP

  • Postgres with pgvector

  • parity with the current OB1 toolset:

    • search_thoughts
    • list_thoughts
    • thought_stats
    • capture_thought
  • extended toolset for memory and project management:

    • get_thought
    • update_thought
    • delete_thought
    • archive_thought
    • create_project
    • list_projects
    • get_project_context
    • set_active_project
    • get_active_project
    • summarize_thoughts
    • recall_context
    • link_thoughts
    • related_thoughts

The current OB1 reference implementation is a small MCP server backed by a thoughts table in Postgres, a match_thoughts(...) vector-search function, and OpenRouter calls for embeddings plus metadata extraction.


Why LiteLLM should be the primary provider

LiteLLM is the right primary abstraction because it gives one stable OpenAI-compatible API surface while allowing routing to multiple upstream providers, including OpenRouter. LiteLLM documents OpenAI-compatible proxy endpoints, including /v1/embeddings, and explicitly supports OpenRouter-backed models.

That gives us:

  • one provider contract in the Go app
  • centralized key management
  • easier model swaps
  • support for multiple upstreams later
  • simpler production operations

Provider strategy

Primary runtime mode

  • App -> LiteLLM Proxy -> OpenRouter / other providers

Fallback mode

  • App -> OpenRouter directly

We will support both, but the codebase will be designed LiteLLM-first.


Scope of the Go service

The Go service will provide:

  1. MCP server over Streamable HTTP
  2. API-key authentication
  3. thought capture
  4. semantic search
  5. thought listing
  6. thought statistics
  7. thought lifecycle management (get, update, delete, archive)
  8. project grouping and context (create, list, get context, active project)
  9. memory summarization and context recall
  10. thought linking and relationship traversal
  11. provider abstraction for embeddings + metadata extraction
  12. config-file-driven startup

Functional requirements

Required v1 features

  • start from YAML config

  • connect to Postgres

  • use pgvector for embeddings

  • call LiteLLM for:

    • embeddings
    • metadata extraction
  • expose MCP tools over HTTP

  • protect the MCP endpoint with configured API keys

  • preserve OB1-compatible tool semantics

  • store thoughts with metadata and embeddings

  • search via match_thoughts(...)

Deferred features

  • Slack ingestion
  • webhook ingestion
  • async metadata extraction
  • per-user tenancy
  • admin UI
  • background enrichment jobs
  • multi-provider routing policy inside the app

Reference behavior to preserve

The current OB1 server is small and direct:

  • it stores thoughts in a thoughts table
  • it uses vector similarity for semantic search
  • it exposes four MCP tools
  • it uses an access key for auth
  • it generates embeddings and extracts metadata via OpenRouter

The Go version should preserve those behaviors first, then improve structure and operability.


Architecture

                +----------------------+
                | MCP Client / AI App  |
                +----------+-----------+
                           |
                           | Streamable HTTP
                           v
                +----------------------+
                | Go OB1 Server        |
                | auth + MCP tools     |
                +----+-----------+-----+
                     |           |
                     |           |
                     v           v
          +----------------+   +----------------------+
          | LiteLLM Proxy  |   | Postgres + pgvector  |
          | embeddings     |   | thoughts + pgvector  |
          | metadata       |   | RPC/search SQL       |
          +--------+-------+   +----------------------+
                   |
                   v
            +-------------+
            | OpenRouter  |
            | or others   |
            +-------------+

High-level design

Core components

  1. Config subsystem

    • load YAML
    • apply env overrides
    • validate required fields
  2. Auth subsystem

    • API-key validation
    • header-based auth
    • optional query-param auth
  3. AI provider subsystem

    • provider interface
    • LiteLLM implementation
    • optional OpenRouter direct implementation
  4. Store subsystem

    • Postgres connection pool
    • insert/search/list/stats operations
    • pgvector support
  5. MCP subsystem

    • MCP server
    • tool registration
    • HTTP transport
  6. Observability subsystem

    • structured logs
    • metrics
    • health checks

Project layout

ob1-go/
  cmd/
    ob1-server/
      main.go

  internal/
    app/
      app.go

    config/
      config.go
      loader.go
      validate.go

    auth/
      middleware.go
      keyring.go

    ai/
      provider.go
      factory.go
      prompts.go
      types.go

      litellm/
        client.go
        embeddings.go
        metadata.go

      openrouter/
        client.go
        embeddings.go
        metadata.go

    mcpserver/
      server.go
      transport.go

    tools/
      search.go
      list.go
      stats.go
      capture.go
      get.go
      update.go
      delete.go
      archive.go
      projects.go
      context.go
      summarize.go
      recall.go
      links.go

    store/
      db.go
      thoughts.go
      stats.go
      projects.go
      links.go

    metadata/
      schema.go
      normalize.go
      validate.go

    types/
      thought.go
      filters.go

    observability/
      logger.go
      metrics.go
      tracing.go

  migrations/
    001_enable_vector.sql
    002_create_thoughts.sql
    003_add_projects.sql
    004_create_thought_links.sql
    005_create_match_thoughts.sql
    006_rls_and_grants.sql

  configs/
    config.example.yaml
    dev.yaml

  scripts/
    run-local.sh
    migrate.sh

  go.mod
  README.md

Dependencies

Required Go packages

  • github.com/modelcontextprotocol/go-sdk
  • github.com/jackc/pgx/v5
  • github.com/pgvector/pgvector-go
  • gopkg.in/yaml.v3
  • github.com/go-playground/validator/v10
  • github.com/google/uuid

Standard library usage

  • net/http
  • context
  • log/slog
  • time
  • encoding/json

The Go MCP SDK is the right fit for implementing an MCP server in Go, and pgvector-go is the expected library for Go integration with pgvector-backed Postgres columns.


Config model

Config files are the primary source of truth.

Rules

  • use YAML config files
  • allow environment overrides
  • do not commit real secrets
  • commit only config.example.yaml
  • keep local secrets in ignored files
  • in production, mount config files as secrets or use env overrides for sensitive values

Example config

server:
  host: "0.0.0.0"
  port: 8080
  read_timeout: "15s"
  write_timeout: "30s"
  idle_timeout: "60s"
  allowed_origins:
    - "*"

mcp:
  path: "/mcp"
  server_name: "open-brain"
  version: "1.0.0"
  transport: "streamable_http"

auth:
  mode: "api_keys"
  header_name: "x-brain-key"
  query_param: "key"
  allow_query_param: false
  keys:
    - id: "local-client"
      value: "replace-me"
      description: "main local client key"

database:
  url: "postgres://user:pass@localhost:5432/ob1?sslmode=disable"
  max_conns: 10
  min_conns: 2
  max_conn_lifetime: "30m"
  max_conn_idle_time: "10m"

ai:
  provider: "litellm"

  embeddings:
    model: "openai/text-embedding-3-small"
    dimensions: 1536

  metadata:
    model: "gpt-4o-mini"
    temperature: 0.1

  litellm:
    base_url: "http://localhost:4000/v1"
    api_key: "replace-me"
    use_responses_api: false
    request_headers: {}
    embedding_model: "openrouter/openai/text-embedding-3-small"
    metadata_model: "gpt-4o-mini"

  openrouter:
    base_url: "https://openrouter.ai/api/v1"
    api_key: ""
    app_name: "ob1-go"
    site_url: ""
    extra_headers: {}

capture:
  source: "mcp"
  metadata_defaults:
    type: "observation"
    topic_fallback: "uncategorized"

search:
  default_limit: 10
  default_threshold: 0.5
  max_limit: 50

logging:
  level: "info"
  format: "json"

observability:
  metrics_enabled: true
  pprof_enabled: false

Config structs

type Config struct {
    Server        ServerConfig        `yaml:"server"`
    MCP           MCPConfig           `yaml:"mcp"`
    Auth          AuthConfig          `yaml:"auth"`
    Database      DatabaseConfig      `yaml:"database"`
    AI            AIConfig            `yaml:"ai"`
    Capture       CaptureConfig       `yaml:"capture"`
    Search        SearchConfig        `yaml:"search"`
    Logging       LoggingConfig       `yaml:"logging"`
    Observability ObservabilityConfig `yaml:"observability"`
}

type ServerConfig struct {
    Host           string        `yaml:"host"`
    Port           int           `yaml:"port"`
    ReadTimeout    time.Duration `yaml:"read_timeout"`
    WriteTimeout   time.Duration `yaml:"write_timeout"`
    IdleTimeout    time.Duration `yaml:"idle_timeout"`
    AllowedOrigins []string      `yaml:"allowed_origins"`
}

type MCPConfig struct {
    Path       string `yaml:"path"`
    ServerName string `yaml:"server_name"`
    Version    string `yaml:"version"`
    Transport  string `yaml:"transport"`
}

type AuthConfig struct {
    Mode            string   `yaml:"mode"`
    HeaderName      string   `yaml:"header_name"`
    QueryParam      string   `yaml:"query_param"`
    AllowQueryParam bool     `yaml:"allow_query_param"`
    Keys            []APIKey `yaml:"keys"`
}

type APIKey struct {
    ID          string `yaml:"id"`
    Value       string `yaml:"value"`
    Description string `yaml:"description"`
}

type DatabaseConfig struct {
    URL             string        `yaml:"url"`
    MaxConns        int32         `yaml:"max_conns"`
    MinConns        int32         `yaml:"min_conns"`
    MaxConnLifetime time.Duration `yaml:"max_conn_lifetime"`
    MaxConnIdleTime time.Duration `yaml:"max_conn_idle_time"`
}

type AIConfig struct {
    Provider   string              `yaml:"provider"` // litellm | openrouter
    Embeddings AIEmbeddingConfig   `yaml:"embeddings"`
    Metadata   AIMetadataConfig    `yaml:"metadata"`
    LiteLLM    LiteLLMConfig       `yaml:"litellm"`
    OpenRouter OpenRouterAIConfig  `yaml:"openrouter"`
}

type AIEmbeddingConfig struct {
    Model      string `yaml:"model"`
    Dimensions int    `yaml:"dimensions"`
}

type AIMetadataConfig struct {
    Model       string  `yaml:"model"`
    Temperature float64 `yaml:"temperature"`
}

type LiteLLMConfig struct {
    BaseURL         string            `yaml:"base_url"`
    APIKey          string            `yaml:"api_key"`
    UseResponsesAPI bool              `yaml:"use_responses_api"`
    RequestHeaders  map[string]string `yaml:"request_headers"`
    EmbeddingModel  string            `yaml:"embedding_model"`
    MetadataModel   string            `yaml:"metadata_model"`
}

type OpenRouterAIConfig struct {
    BaseURL      string            `yaml:"base_url"`
    APIKey       string            `yaml:"api_key"`
    AppName      string            `yaml:"app_name"`
    SiteURL      string            `yaml:"site_url"`
    ExtraHeaders map[string]string `yaml:"extra_headers"`
}

Config precedence

Order

  1. --config /path/to/file.yaml
  2. OB1_CONFIG
  3. default ./configs/dev.yaml
  4. environment overrides for specific fields

Suggested env overrides

  • OB1_DATABASE_URL
  • OB1_LITELLM_API_KEY
  • OB1_OPENROUTER_API_KEY
  • OB1_SERVER_PORT

Validation rules

At startup, fail fast if:

  • database.url is empty
  • auth.keys is empty
  • mcp.path is empty
  • ai.provider is unsupported
  • ai.embeddings.dimensions <= 0
  • provider-specific base URL or API key is missing
  • the DB vector dimension does not match configured embedding dimensions
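The first five checks can be sketched as one fail-fast function (field names are illustrative; the DB-vs-config dimension check runs later, after the pool is created, since it needs a live connection):

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed view of the config fields the startup checks need.
type Config struct {
	DatabaseURL  string
	AuthKeys     []string
	MCPPath      string
	AIProvider   string
	EmbeddingDim int
}

// validate applies the fail-fast rules; the caller aborts startup
// when it returns a non-nil error.
func validate(cfg Config) error {
	var errs []error
	if cfg.DatabaseURL == "" {
		errs = append(errs, errors.New("database.url is empty"))
	}
	if len(cfg.AuthKeys) == 0 {
		errs = append(errs, errors.New("auth.keys is empty"))
	}
	if cfg.MCPPath == "" {
		errs = append(errs, errors.New("mcp.path is empty"))
	}
	switch cfg.AIProvider {
	case "litellm", "openrouter":
	default:
		errs = append(errs, fmt.Errorf("unsupported ai.provider: %q", cfg.AIProvider))
	}
	if cfg.EmbeddingDim <= 0 {
		errs = append(errs, errors.New("ai.embeddings.dimensions must be > 0"))
	}
	return errors.Join(errs...) // nil when every check passed
}

func main() {
	err := validate(Config{
		DatabaseURL: "postgres://localhost/ob1",
		AuthKeys:    []string{"k"},
		MCPPath:     "/mcp",
		AIProvider:  "litellm",
		EmbeddingDim: 1536,
	})
	fmt.Println(err) // <nil>
}
```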

AI provider design

Provider interface

type Provider interface {
    Embed(ctx context.Context, input string) ([]float32, error)
    ExtractMetadata(ctx context.Context, input string) (ThoughtMetadata, error)
    Name() string
}

Factory

func NewProvider(cfg AIConfig, httpClient *http.Client, log *slog.Logger) (Provider, error) {
    switch cfg.Provider {
    case "litellm":
        return litellm.New(cfg, httpClient, log)
    case "openrouter":
        return openrouter.New(cfg, httpClient, log)
    default:
        return nil, fmt.Errorf("unsupported ai.provider: %s", cfg.Provider)
    }
}

LiteLLM-first behavior

Embeddings

The app will call LiteLLM at:

  • POST /v1/embeddings

using an OpenAI-compatible request payload and Bearer auth. LiteLLM documents its proxy embeddings support through OpenAI-compatible endpoints.

Metadata extraction

The app will call LiteLLM at:

  • POST /v1/chat/completions

using:

  • configured metadata model
  • system prompt
  • user message
  • JSON-oriented response handling

LiteLLM's proxy accepts OpenAI-style chat completion requests.

Model routing

In config, use:

  • litellm.embedding_model
  • litellm.metadata_model

These may be:

  • direct model names
  • LiteLLM aliases
  • OpenRouter-backed model identifiers

Example:

litellm:
  embedding_model: "openrouter/openai/text-embedding-3-small"
  metadata_model: "gpt-4o-mini"

LiteLLM documents OpenRouter provider usage and OpenRouter-backed model naming.


OpenRouter fallback mode

If ai.provider: openrouter, the app will directly call:

  • POST /api/v1/embeddings
  • POST /api/v1/chat/completions

with Bearer auth.

OpenRouter documents the embeddings endpoint and its authentication model.

This mode is mainly for:

  • local simplicity
  • debugging provider issues
  • deployments without LiteLLM

Metadata schema

Use one stable metadata schema regardless of provider.

type ThoughtMetadata struct {
    People         []string `json:"people"`
    ActionItems    []string `json:"action_items"`
    DatesMentioned []string `json:"dates_mentioned"`
    Topics         []string `json:"topics"`
    Type           string   `json:"type"`
    Source         string   `json:"source"`
}

Accepted type values

  • observation
  • task
  • idea
  • reference
  • person_note

Normalization rules

  • trim all strings
  • deduplicate arrays
  • drop empty values
  • cap topics count if needed
  • default invalid type to observation
  • set source: "mcp" for MCP-captured thoughts
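The list and type rules above are small pure functions, which keeps them easy to unit test. A sketch (helper names are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// Accepted metadata types from the schema section.
var validTypes = map[string]bool{
	"observation": true, "task": true, "idea": true,
	"reference": true, "person_note": true,
}

// normalizeList trims entries, drops empties, and deduplicates
// while preserving first-seen order.
func normalizeList(in []string) []string {
	seen := map[string]bool{}
	out := []string{}
	for _, s := range in {
		s = strings.TrimSpace(s)
		if s == "" || seen[s] {
			continue
		}
		seen[s] = true
		out = append(out, s)
	}
	return out
}

// normalizeType defaults anything outside the accepted set to
// "observation".
func normalizeType(t string) string {
	t = strings.TrimSpace(t)
	if !validTypes[t] {
		return "observation"
	}
	return t
}

func main() {
	fmt.Println(normalizeList([]string{" alice ", "bob", "alice", ""})) // [alice bob]
	fmt.Println(normalizeType("rant"))                                  // observation
}
```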

Fallback defaults

If metadata extraction fails:

{
  "people": [],
  "action_items": [],
  "dates_mentioned": [],
  "topics": ["uncategorized"],
  "type": "observation",
  "source": "mcp"
}

Database design

The DB contract should match the current OB1 structure as closely as possible:

  • thoughts table
  • embedding vector(1536)
  • HNSW index
  • metadata JSONB
  • match_thoughts(...) function

Migrations

001_enable_vector.sql

create extension if not exists vector;

002_create_thoughts.sql

create table if not exists thoughts (
  id uuid default gen_random_uuid() primary key,
  content text not null,
  embedding vector(1536),
  metadata jsonb default '{}'::jsonb,
  created_at timestamptz default now(),
  updated_at timestamptz default now()
);

create index if not exists thoughts_embedding_hnsw_idx
  on thoughts using hnsw (embedding vector_cosine_ops);

create index if not exists thoughts_metadata_gin_idx
  on thoughts using gin (metadata);

create index if not exists thoughts_created_at_idx
  on thoughts (created_at desc);

003_add_projects.sql

create table if not exists projects (
  id          uuid default gen_random_uuid() primary key,
  name        text not null unique,
  description text,
  created_at  timestamptz default now(),
  last_active_at timestamptz default now()
);

alter table thoughts add column if not exists project_id uuid references projects(id);
alter table thoughts add column if not exists archived_at timestamptz;

create index if not exists thoughts_project_id_idx on thoughts (project_id);
create index if not exists thoughts_archived_at_idx on thoughts (archived_at);

004_create_thought_links.sql

create table if not exists thought_links (
  from_id    uuid references thoughts(id) on delete cascade,
  to_id      uuid references thoughts(id) on delete cascade,
  relation   text not null,
  created_at timestamptz default now(),
  primary key (from_id, to_id, relation)
);

create index if not exists thought_links_from_idx on thought_links (from_id);
create index if not exists thought_links_to_idx   on thought_links (to_id);

005_create_match_thoughts.sql

create or replace function match_thoughts(
  query_embedding vector(1536),
  match_threshold float default 0.7,
  match_count int default 10,
  filter jsonb default '{}'::jsonb
)
returns table (
  id uuid,
  content text,
  metadata jsonb,
  similarity float,
  created_at timestamptz
)
language plpgsql
as $$
begin
  return query
  select
    t.id,
    t.content,
    t.metadata,
    1 - (t.embedding <=> query_embedding) as similarity,
    t.created_at
  from thoughts t
  where 1 - (t.embedding <=> query_embedding) > match_threshold
    and (filter = '{}'::jsonb or t.metadata @> filter)
  order by t.embedding <=> query_embedding
  limit match_count;
end;
$$;

006_rls_and_grants.sql

-- Grant full access to the application database user configured in database.url.
-- Replace 'ob1_user' with the actual role name used in your database.url.
grant select, insert, update, delete on table public.thoughts to ob1_user;
grant select, insert, update, delete on table public.projects to ob1_user;
grant select, insert, update, delete on table public.thought_links to ob1_user;

Store layer

Interfaces

type ThoughtStore interface {
    InsertThought(ctx context.Context, thought Thought) error
    GetThought(ctx context.Context, id uuid.UUID) (Thought, error)
    UpdateThought(ctx context.Context, id uuid.UUID, patch ThoughtPatch) (Thought, error)
    DeleteThought(ctx context.Context, id uuid.UUID) error
    ArchiveThought(ctx context.Context, id uuid.UUID) error
    SearchThoughts(ctx context.Context, embedding []float32, threshold float64, limit int, filter map[string]any) ([]SearchResult, error)
    ListThoughts(ctx context.Context, filter ListFilter) ([]Thought, error)
    Stats(ctx context.Context) (ThoughtStats, error)
}

type ProjectStore interface {
    InsertProject(ctx context.Context, project Project) error
    GetProject(ctx context.Context, nameOrID string) (Project, error)
    ListProjects(ctx context.Context) ([]ProjectSummary, error)
    TouchProject(ctx context.Context, id uuid.UUID) error
}

type LinkStore interface {
    InsertLink(ctx context.Context, link ThoughtLink) error
    GetLinks(ctx context.Context, thoughtID uuid.UUID) ([]ThoughtLink, error)
}

DB implementation notes

Use pgxpool.Pool.

On startup:

  • parse DB config
  • create pool
  • register pgvector support
  • ping DB
  • verify required function exists
  • verify vector extension exists

Domain types

type Thought struct {
    ID        uuid.UUID
    Content   string
    Embedding []float32
    Metadata  ThoughtMetadata
    ProjectID *uuid.UUID
    ArchivedAt *time.Time
    CreatedAt time.Time
    UpdatedAt time.Time
}

type SearchResult struct {
    ID         uuid.UUID
    Content    string
    Metadata   ThoughtMetadata
    Similarity float64
    CreatedAt  time.Time
}

type ListFilter struct {
    Limit           int
    Type            string
    Topic           string
    Person          string
    Days            int
    ProjectID       *uuid.UUID
    IncludeArchived bool
}

type ThoughtStats struct {
    TotalCount int
    TypeCounts map[string]int
    TopTopics  []KeyCount
    TopPeople  []KeyCount
}

type ThoughtPatch struct {
    Content  *string
    Metadata *ThoughtMetadata
}

type Project struct {
    ID           uuid.UUID
    Name         string
    Description  string
    CreatedAt    time.Time
    LastActiveAt time.Time
}

type ProjectSummary struct {
    Project
    ThoughtCount int
}

type ThoughtLink struct {
    FromID    uuid.UUID
    ToID      uuid.UUID
    Relation  string
    CreatedAt time.Time
}

Auth design

The reference OB1 implementation uses a configured access key and accepts it via header or query param.

We will keep compatibility but make it cleaner.

Auth behavior

  • primary auth via header, default: x-brain-key
  • optional query param fallback
  • support multiple keys in config
  • attach key ID to request context for auditing

Middleware flow

  1. read configured header
  2. if missing and allowed, read query param
  3. compare against in-memory keyring
  4. if matched, attach key ID to request context
  5. else return 401 Unauthorized

Recommendation

Set:

auth:
  allow_query_param: false

for production.


MCP server design

Expose MCP over Streamable HTTP. The MCP spec defines Streamable HTTP as the remote transport that replaces the older HTTP+SSE approach.

HTTP routes

  • POST /mcp
  • GET /healthz
  • GET /readyz

Middleware stack

  • request ID
  • panic recovery
  • structured logging
  • auth
  • timeout
  • optional CORS

MCP tools

1. capture_thought

Input

  • content string

Flow

  1. validate content

  2. concurrently:

    • call provider Embed
    • call provider ExtractMetadata
  3. normalize metadata

  4. set source = "mcp"

  5. insert into thoughts

  6. return success payload

2. search_thoughts

Input

  • query string
  • limit int
  • threshold float

Flow

  1. embed query
  2. call match_thoughts(...)
  3. format ranked results
  4. return results

3. list_thoughts

Input

  • limit
  • type
  • topic
  • person
  • days

Flow

  1. build SQL filters
  2. query thoughts
  3. order by created_at desc
  4. return summaries

4. thought_stats

Input

  • none

Flow

  1. count rows
  2. aggregate metadata usage
  3. return totals and top buckets

5. get_thought

Input

  • id string

Flow

  1. validate UUID
  2. query thoughts by ID
  3. return full record or not-found error

6. update_thought

Input

  • id string
  • content string (optional)
  • metadata map (optional, merged not replaced)

Flow

  1. validate inputs
  2. if content provided: re-embed and re-extract metadata
  3. merge metadata patch
  4. update row, set updated_at
  5. return updated record

7. delete_thought

Input

  • id string

Flow

  1. validate UUID
  2. hard-delete row (cascades to thought_links)
  3. return confirmation

8. archive_thought

Input

  • id string

Flow

  1. validate UUID
  2. set archived_at = now()
  3. return confirmation

Note: archived thoughts are excluded from search and list results by default unless include_archived: true is passed.


9. create_project

Input

  • name string
  • description string (optional)

Flow

  1. validate name uniqueness
  2. insert into projects
  3. return project record

10. list_projects

Input

  • none

Flow

  1. query projects ordered by last_active_at desc
  2. join thought counts per project
  3. return summaries

11. get_project_context

Input

  • project string (name or ID)
  • query string (optional, semantic focus)
  • limit int

Flow

  1. resolve project
  2. fetch recent thoughts in project (last N)
  3. if query provided: semantic search scoped to project
  4. merge and deduplicate results ranked by recency + similarity
  5. update projects.last_active_at
  6. return context block ready for injection

12. set_active_project

Input

  • project string (name or ID)

Flow

  1. resolve project
  2. store project ID in server session context (in-memory, per connection)
  3. return confirmation

13. get_active_project

Input

  • none

Flow

  1. return current session active project or null

14. summarize_thoughts

Input

  • query string (optional topic focus)
  • project string (optional)
  • days int (optional time window)
  • limit int

Flow

  1. fetch matching thoughts via search or filter
  2. format as context
  3. call AI provider to produce prose summary
  4. return summary text

15. recall_context

Input

  • query string
  • project string (optional)
  • limit int

Flow

  1. semantic search with optional project filter
  2. recency boost: merge with most recent N thoughts from project
  3. deduplicate and rank
  4. return formatted context block suitable for pasting into a new conversation

16. link_thoughts

Input

  • from_id string
  • to_id string
  • relation string (e.g. follows_up, contradicts, references, blocks)

Flow

  1. validate both IDs exist
  2. insert into thought_links
  3. return confirmation

17. related_thoughts

Input

  • id string
  • include_semantic bool (default true)

Flow

  1. fetch explicit links from thought_links for this ID
  2. if include_semantic: also fetch nearest semantic neighbours
  3. merge, deduplicate, return with relation type or similarity score

Tool package plan

internal/tools/capture.go

Responsibilities:

  • input validation
  • parallel embed + metadata extraction
  • normalization
  • write to store

internal/tools/search.go

Responsibilities:

  • input validation
  • embed query
  • vector search
  • output formatting

internal/tools/list.go

Responsibilities:

  • filter normalization
  • DB read
  • output formatting

internal/tools/stats.go

Responsibilities:

  • fetch/aggregate stats
  • output shaping

internal/tools/get.go

Responsibilities:

  • UUID validation
  • single thought retrieval

internal/tools/update.go

Responsibilities:

  • partial content/metadata update
  • conditional re-embed if content changed
  • metadata merge

internal/tools/delete.go

Responsibilities:

  • UUID validation
  • hard delete

internal/tools/archive.go

Responsibilities:

  • UUID validation
  • set archived_at

internal/tools/projects.go

Responsibilities:

  • create_project, list_projects
  • set_active_project, get_active_project (session context)

internal/tools/context.go

Responsibilities:

  • get_project_context: resolve project, combine recency + semantic search, return context block
  • update last_active_at on access

internal/tools/summarize.go

Responsibilities:

  • filter/search thoughts
  • format as prompt context
  • call AI provider for prose summary

internal/tools/recall.go

Responsibilities:

  • recall_context: semantic search + recency boost + project filter
  • output formatted context block

internal/tools/links.go

Responsibilities:

  • link_thoughts: validate both IDs, insert link
  • related_thoughts: fetch explicit links + optional semantic neighbours, merge and return

Startup sequence

  1. parse CLI args
  2. load config file
  3. apply env overrides
  4. validate config
  5. initialize logger
  6. create DB pool
  7. verify DB requirements
  8. create AI provider
  9. create store
  10. create tool handlers
  11. register MCP tools
  12. start HTTP server

Error handling policy

Fail fast on startup errors

  • invalid config
  • DB unavailable
  • missing required API keys
  • invalid MCP config
  • provider initialization failure

Retry policy for provider calls

Retry on:

  • 429
  • 500
  • 502
  • 503
  • timeout
  • connection reset

Do not retry on:

  • malformed request
  • auth failure
  • invalid model name
  • invalid response shape after repeated attempts

Use:

  • exponential backoff
  • capped retries
  • context-aware cancellation

Observability

Logging

Use log/slog in JSON mode.

Include:

  • request ID
  • route
  • tool name
  • key ID
  • provider name
  • DB latency
  • upstream latency
  • error class

Metrics

Track:

  • request count by tool
  • request duration
  • provider call duration
  • DB query duration
  • auth failures
  • provider failures
  • insert/search counts

Health checks

/healthz

Returns OK if process is running.

/readyz

Returns OK only if:

  • DB is reachable
  • provider config is valid
  • optional provider probe passes

Security plan

Secrets handling

  • keep secrets in config files only for local/dev use
  • never commit real secrets
  • use mounted secret files or env overrides in production

API key policy

  • support multiple keys
  • identify keys by ID
  • allow key rotation by config update + restart
  • log only key ID, never raw value

Transport

  • run behind TLS terminator in production
  • disable query-param auth in production
  • avoid logging full URLs when query-param auth is enabled

Testing plan

Unit tests

Config

  • valid config loads
  • invalid config fails
  • env overrides apply correctly

Auth

  • header auth success
  • query auth success
  • invalid key rejected
  • disabled query auth rejected

Metadata

  • normalization works
  • invalid types default correctly
  • empty metadata falls back safely

AI provider parsing

  • LiteLLM embeddings parse correctly
  • LiteLLM chat completions parse correctly
  • provider errors classified correctly

Store

  • filter builders generate expected SQL fragments
  • JSONB metadata handling is stable

Integration tests

Run against local Postgres with pgvector.

Test:

  • migrations apply cleanly
  • insert thought
  • search thought
  • list thoughts with filters
  • stats aggregation
  • auth-protected MCP route
  • LiteLLM mock/proxy compatibility

Manual acceptance tests

  1. start local Postgres + pgvector
  2. start LiteLLM
  3. configure LiteLLM to route embeddings to OpenRouter
  4. start Go server
  5. connect MCP client
  6. call capture_thought
  7. call search_thoughts
  8. call list_thoughts
  9. call thought_stats
  10. rotate API key and verify restart behavior

Milestones

Milestone 1 — foundation

Deliver:

  • repo skeleton
  • config loader
  • config validation
  • logger
  • DB connection
  • migrations

Exit criteria:

  • app starts
  • DB connection verified
  • config-driven startup works

Milestone 2 — AI provider layer

Deliver:

  • provider interface
  • LiteLLM implementation
  • OpenRouter fallback implementation
  • metadata prompt
  • normalization

Exit criteria:

  • successful embedding call through LiteLLM
  • successful metadata extraction through LiteLLM
  • vector length validation works

Milestone 3 — core capture and search

Deliver:

  • capture_thought
  • search_thoughts
  • store methods for insert and vector search

Exit criteria:

  • thoughts can be captured end-to-end
  • semantic search returns results

Milestone 4 — remaining tools

Deliver:

  • list_thoughts
  • thought_stats

Exit criteria:

  • all four tools function through MCP

Milestone 5 — extended memory and project tools

Deliver:

  • get_thought, update_thought, delete_thought, archive_thought
  • create_project, list_projects, set_active_project, get_active_project
  • get_project_context, recall_context
  • summarize_thoughts
  • link_thoughts, related_thoughts
  • migrations 003 and 004 (projects + links tables)

Exit criteria:

  • thoughts can be retrieved, patched, deleted, archived
  • projects can be created and listed
  • get_project_context returns a usable context block
  • summarize_thoughts produces a prose summary via the AI provider
  • thought links can be created and retrieved with semantic neighbours

Milestone 6 — HTTP and auth hardening

Deliver:

  • auth middleware
  • health endpoints
  • structured logs
  • retries
  • timeouts

Exit criteria:

  • endpoint protected
  • logs useful
  • service stable under expected failures

Milestone 7 — production readiness

Deliver:

  • metrics
  • readiness checks
  • key rotation workflow
  • deployment docs

Exit criteria:

  • production deployment is straightforward
  • operational playbook exists

Implementation order

Build in this order:

  1. config
  2. DB + migrations (001, 002, 003, 004)
  3. LiteLLM client
  4. metadata normalization
  5. capture_thought
  6. search_thoughts
  7. MCP HTTP server
  8. auth middleware
  9. list_thoughts
  10. thought_stats
  11. get_thought, update_thought, delete_thought, archive_thought
  12. create_project, list_projects, set_active_project, get_active_project
  13. get_project_context, recall_context
  14. summarize_thoughts
  15. link_thoughts, related_thoughts
  16. logs/metrics/health

This gives usable value early and builds the project/memory layer on a solid foundation.


Deployment

Services

  • Postgres with pgvector
  • LiteLLM proxy
  • optional OpenRouter upstream
  • Go service

Example shape

docker-compose:
  postgres
  litellm
  ob1-go

Preferred architecture

  • Go service on Fly.io / Cloud Run / Render
  • LiteLLM as separate service
  • Postgres managed externally
  • TLS terminator in front

Why not Edge Functions

The original repo uses Deno Edge Functions because of its chosen deployment environment, but this app's behavior is better served by a normal long-running Go service, which is easier to maintain and observe.


Risks and decisions

Risk: embedding dimension mismatch

Mitigation:

  • validate config vs DB on startup

Risk: LiteLLM model alias drift

Mitigation:

  • add readiness probe for configured models

Risk: metadata extraction instability

Mitigation:

  • strong normalization + safe defaults

Risk: single global auth model

Mitigation:

  • acceptable for v1
  • redesign for multi-tenant later

Risk: stats scaling poorly

Mitigation:

  • start with in-memory aggregation
  • move to SQL aggregation if needed

Definition of done for v1

The project is done when:

  • service starts from YAML config
  • LiteLLM is the primary AI provider
  • OpenRouter can be used behind LiteLLM
  • direct OpenRouter mode still works
  • MCP endpoint is authenticated
  • capture_thought stores content, embedding, metadata
  • search_thoughts performs semantic search
  • list_thoughts supports filtering
  • thought_stats returns useful summaries
  • thoughts can be retrieved, updated, deleted, and archived
  • projects can be created, listed, and used to scope captures and searches
  • get_project_context returns a ready-to-use context block
  • recall_context returns a semantically relevant + recent context block
  • summarize_thoughts produces prose summaries via the AI provider
  • thought links can be created and traversed with semantic fallback
  • logs and health checks exist
  • key rotation works via config + restart

Recommendation

Build this as a boring Go service:

  • stdlib HTTP
  • thin MCP server layer
  • thin provider layer
  • thin store layer
  • YAML config
  • explicit interfaces

Do not over-abstract it. The product shape is simple. The goal is a reliable, understandable service with LiteLLM as the stable provider boundary.


Next implementation artifact

The next concrete deliverable should be a starter Go repo skeleton containing:

  • go.mod
  • folder structure
  • main.go
  • config loader
  • example config
  • migration files (001-006)
  • provider interface
  • LiteLLM client skeleton
  • store interfaces (ThoughtStore, ProjectStore, LinkStore)
  • domain types including Project, ThoughtLink, ThoughtPatch
  • MCP tool registration stubs for all 17 tools