# Avalon Memory Crystal Server (amcs)

![Avalon Memory Crystal](assets/avelonmemorycrystal.jpg)

A Go MCP server for capturing and retrieving thoughts, memory, and project context. Exposes tools over Streamable HTTP, backed by Postgres with pgvector for semantic search.

## What it does

- **Capture** thoughts with automatic embedding and metadata extraction
- **Search** thoughts semantically via vector similarity
- **Organise** thoughts into projects and retrieve full project context
- **Summarise** and recall memory across topics and time windows
- **Link** related thoughts and traverse relationships

## Stack

- Go — MCP server over Streamable HTTP
- Postgres + pgvector — storage and vector search
- LiteLLM — primary hosted AI provider (embeddings + metadata extraction)
- OpenRouter — default upstream behind LiteLLM
- Ollama — supported local or self-hosted OpenAI-compatible provider

## Tools

| Tool | Purpose |
|---|---|
| `capture_thought` | Store a thought with embedding and metadata |
| `search_thoughts` | Semantic similarity search |
| `list_thoughts` | Filter thoughts by type, topic, person, date |
| `thought_stats` | Counts and top topics/people |
| `get_thought` | Retrieve a thought by ID |
| `update_thought` | Patch content or metadata |
| `delete_thought` | Hard delete |
| `archive_thought` | Soft delete |
| `create_project` | Register a named project |
| `list_projects` | List projects with thought counts |
| `get_project_context` | Recent + semantic context for a project |
| `set_active_project` | Set session project scope |
| `get_active_project` | Get current session project |
| `summarize_thoughts` | LLM prose summary over a filtered set |
| `recall_context` | Semantic + recency context block for injection |
| `link_thoughts` | Create a typed relationship between thoughts |
| `related_thoughts` | Explicit links + semantic neighbours |
| `save_file` | Store a file (base64 or by resource URI) and optionally link it to a thought |
| `load_file` | Retrieve a stored file by ID; returns metadata, base64 content, and an embedded MCP binary resource |
| `list_files` | Browse stored files by thought, project, or kind |
| `backfill_embeddings` | Generate missing embeddings for stored thoughts |
| `reparse_thought_metadata` | Re-extract and normalize metadata for stored thoughts |
| `retry_failed_metadata` | Retry metadata extraction for thoughts still pending or failed |
| `add_skill` | Store a reusable agent skill (behavioural instruction or capability prompt) |
| `remove_skill` | Delete an agent skill by id |
| `list_skills` | List all agent skills, optionally filtered by tag |
| `add_guardrail` | Store a reusable agent guardrail (constraint or safety rule) |
| `remove_guardrail` | Delete an agent guardrail by id |
| `list_guardrails` | List all agent guardrails, optionally filtered by tag or severity |
| `add_project_skill` | Link an agent skill to a project |
| `remove_project_skill` | Unlink an agent skill from a project |
| `list_project_skills` | List all skills linked to a project |
| `add_project_guardrail` | Link an agent guardrail to a project |
| `remove_project_guardrail` | Unlink an agent guardrail from a project |
| `list_project_guardrails` | List all guardrails linked to a project |

## Agent Skills and Guardrails

Skills and guardrails are reusable agent behaviour instructions and constraints that can be attached to projects.

**At the start of every project session, always call `list_project_skills` and `list_project_guardrails` first.** Use the returned skills and guardrails to guide agent behaviour for that project. Only generate or create new skills/guardrails if none are returned.

### Skills

A skill is a reusable behavioural instruction or capability prompt — for example, "always respond in structured markdown" or "break complex tasks into numbered steps before starting".

```json
{
  "name": "structured-output",
  "description": "Enforce markdown output format",
  "content": "Always structure responses using markdown headers and bullet points.",
  "tags": ["formatting"]
}
```

### Guardrails

A guardrail is a constraint or safety rule — for example, "never delete files without explicit confirmation" or "do not expose secrets in output".

```json
{
  "name": "no-silent-deletes",
  "description": "Require confirmation before deletes",
  "content": "Never delete, drop, or truncate data without first confirming with the user.",
  "severity": "high",
  "tags": ["safety"]
}
```

Severity levels: `low`, `medium`, `high`, `critical`.

### Project linking

Link existing skills and guardrails to a project so they are automatically available when that project is active:

```json
{ "project": "my-project", "skill_id": "" }
{ "project": "my-project", "guardrail_id": "" }
```

## Configuration

Config is YAML-driven. Copy `configs/config.example.yaml` and set:

- `database.url` — Postgres connection string
- `auth.mode` — `api_keys` or `oauth_client_credentials`
- `auth.keys` — API keys for MCP access via `x-brain-key` or `Authorization: Bearer ` when `auth.mode=api_keys`
- `auth.oauth.clients` — client registry when `auth.mode=oauth_client_credentials`

**OAuth Client Credentials flow** (`auth.mode=oauth_client_credentials`):

1. Obtain a token — `POST /oauth/token` (public, no auth required):

   ```
   POST /oauth/token
   Content-Type: application/x-www-form-urlencoded
   Authorization: Basic base64(client_id:client_secret)

   grant_type=client_credentials
   ```

   Returns: `{"access_token": "...", "token_type": "bearer", "expires_in": 3600}`

2. Use the token on the MCP endpoint:

   ```
   Authorization: Bearer
   ```

Alternatively, pass `client_id` and `client_secret` as body parameters instead of `Authorization: Basic`. Direct `Authorization: Basic` credential validation on the MCP endpoint is also supported as a fallback (no token required).

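
The token exchange described above can be sketched from the client side in Go. This is a minimal illustration, not code from the AMCS repository: `newTokenRequest`, `parseTokenResponse`, the `tokenResponse` type, and the `my-client` / `my-secret` credentials are hypothetical names, and the base URL assumes the server listens on `http://localhost:8080` as in the other examples. The sketch only builds the request and decodes a sample body shaped like the documented response; it does not contact a server.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// tokenResponse mirrors the JSON fields shown in the documented token response.
type tokenResponse struct {
	AccessToken string `json:"access_token"`
	TokenType   string `json:"token_type"`
	ExpiresIn   int    `json:"expires_in"`
}

// newTokenRequest builds (but does not send) the POST /oauth/token request,
// using HTTP Basic credentials and a form-encoded grant_type.
func newTokenRequest(baseURL, clientID, clientSecret string) (*http.Request, error) {
	form := url.Values{"grant_type": {"client_credentials"}}
	endpoint := strings.TrimRight(baseURL, "/") + "/oauth/token"
	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	req.SetBasicAuth(clientID, clientSecret)
	return req, nil
}

// parseTokenResponse decodes the token endpoint's JSON body and rejects
// responses that carry no access token.
func parseTokenResponse(body []byte) (tokenResponse, error) {
	var tok tokenResponse
	if err := json.Unmarshal(body, &tok); err != nil {
		return tokenResponse{}, err
	}
	if tok.AccessToken == "" {
		return tokenResponse{}, fmt.Errorf("token response has no access_token")
	}
	return tok, nil
}

func main() {
	// Hypothetical credentials; real ones come from auth.oauth.clients.
	req, err := newTokenRequest("http://localhost:8080", "my-client", "my-secret")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())

	// Sample body in the documented shape; a real one comes from the server.
	sample := []byte(`{"access_token":"abc","token_type":"bearer","expires_in":3600}`)
	tok, err := parseTokenResponse(sample)
	if err != nil {
		panic(err)
	}
	// Header to attach to subsequent MCP requests.
	fmt.Println("Authorization: Bearer " + tok.AccessToken)
}
```

To perform the exchange for real, the request would be passed to `http.DefaultClient.Do` and the response body handed to `parseTokenResponse`.
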
- `ai.litellm.base_url` and `ai.litellm.api_key` — LiteLLM proxy
- `ai.ollama.base_url` and `ai.ollama.api_key` — Ollama local or remote server

See `llm/plan.md` for full architecture and implementation plan.

## Backfill

Run `backfill_embeddings` after switching embedding models or importing thoughts without vectors.

```json
{
  "project": "optional-project-name",
  "limit": 100,
  "include_archived": false,
  "older_than_days": 0,
  "dry_run": false
}
```

- `dry_run: true` — report counts without calling the embedding provider
- `limit` — max thoughts per call (default 100)
- Embeddings are generated in parallel (4 workers) and upserted; one failure does not abort the run

## Metadata Reparse

Run `reparse_thought_metadata` to fix stale or inconsistent metadata by re-extracting it from thought content.

```json
{
  "project": "optional-project-name",
  "limit": 100,
  "include_archived": false,
  "older_than_days": 0,
  "dry_run": false
}
```

- `dry_run: true` scans only and does not call metadata extraction or write updates
- If extraction fails for a thought, existing metadata is normalized and written only if it changes
- Metadata reparse runs in parallel (4 workers); one failure does not abort the run

## Failed Metadata Retry

`capture_thought` now stores the thought even when metadata extraction times out or fails. Those thoughts are marked with `metadata_status: "pending"` and retried in the background. Use `retry_failed_metadata` to sweep any thoughts still marked `pending` or `failed`.

```json
{
  "project": "optional-project-name",
  "limit": 100,
  "include_archived": false,
  "older_than_days": 1,
  "dry_run": false
}
```

- `dry_run: true` scans only and does not call metadata extraction or write updates
- successful retries mark the thought metadata as `complete` and clear the last error
- failed retries update the retry markers so the daily sweep can pick them up again later

## File Storage

Files can optionally be linked to a thought by passing `thought_id`, which also adds an attachment reference to that thought's metadata. AI clients should prefer `save_file` when the goal is to retain the artifact itself, rather than reading or summarizing the file first. Stored files and attachment metadata are not forwarded to the metadata extraction client.

### MCP tools

**Save via base64** (small files or when HTTP is not available):

```json
{
  "name": "meeting-notes.pdf",
  "media_type": "application/pdf",
  "kind": "document",
  "thought_id": "optional-thought-uuid",
  "content_base64": ""
}
```

**Save via resource URI** (preferred for binary; avoids base64 overhead):

Upload the file binary via HTTP first (see below), then pass the returned URI to `save_file`:

```json
{
  "name": "meeting-notes.pdf",
  "thought_id": "optional-thought-uuid",
  "content_uri": "amcs://files/"
}
```

`content_base64` and `content_uri` are mutually exclusive.

**Load a file** — returns metadata, base64 content, and an embedded MCP binary resource (`amcs://files/{id}`):

```json
{ "id": "stored-file-uuid" }
```

**List files** for a thought or project:

```json
{
  "thought_id": "optional-thought-uuid",
  "project": "optional-project-name",
  "kind": "optional-image-document-audio-file",
  "limit": 20
}
```

### MCP resources

Stored files are also exposed as MCP resources at `amcs://files/{id}`. MCP clients can read raw binary content directly via `resources/read` without going through `load_file`.

### HTTP upload and download

Direct HTTP access avoids base64 encoding entirely.
The Go server caps `/files` uploads at 100 MB per request. Large uploads are also subject to available memory, Postgres limits, and any reverse proxy or load balancer in front of AMCS.

Multipart upload:

```bash
curl -X POST http://localhost:8080/files \
  -H "x-brain-key: " \
  -F "file=@./diagram.png" \
  -F "project=amcs" \
  -F "kind=image"
```

Raw body upload:

```bash
curl -X POST "http://localhost:8080/files?project=amcs&name=meeting-notes.pdf" \
  -H "x-brain-key: " \
  -H "Content-Type: application/pdf" \
  --data-binary @./meeting-notes.pdf
```

Binary download:

```bash
curl http://localhost:8080/files/ \
  -H "x-brain-key: " \
  -o meeting-notes.pdf
```

**Automatic backfill** (optional, config-gated):

```yaml
backfill:
  enabled: true
  run_on_startup: true   # run once on server start
  interval: "15m"        # repeat every 15 minutes
  batch_size: 20
  max_per_run: 100
  include_archived: false
```

```yaml
metadata_retry:
  enabled: true
  run_on_startup: true   # retry failed metadata once on server start
  interval: "24h"        # retry pending/failed metadata daily
  max_per_run: 100
  include_archived: false
```

**Search fallback**: when no embeddings exist for the active model in scope, `search_thoughts`, `recall_context`, `get_project_context`, `summarize_thoughts`, and `related_thoughts` automatically fall back to Postgres full-text search so results are never silently empty.

## Client Setup

### Claude Code

```bash
# API key auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "x-brain-key: "

# Bearer token auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "Authorization: Bearer "
```

### OpenAI Codex

Add to `~/.codex/config.toml`:

```toml
[[mcp_servers]]
name = "amcs"
url = "http://localhost:8080/mcp"

[mcp_servers.headers]
x-brain-key = ""
```

### OpenCode

```bash
# API key auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "x-brain-key="

# Bearer token auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "Authorization=Bearer "
```

Or add directly to `opencode.json` / `~/.config/opencode/config.json`:

```json
{
  "mcp": {
    "amcs": {
      "type": "remote",
      "url": "http://localhost:8080/mcp",
      "headers": {
        "x-brain-key": ""
      }
    }
  }
}
```

## Apache Proxy

If AMCS is deployed behind Apache HTTP Server, configure the proxy explicitly for larger uploads and longer-running requests.

Example virtual host settings for the current AMCS defaults:

```apache
ServerName amcs.example.com
ProxyPreserveHost On

LimitRequestBody 104857600
RequestReadTimeout handshake=0 header=20-40,MinRate=500 body=600,MinRate=500
Timeout 600
ProxyTimeout 600

ProxyPass /mcp http://127.0.0.1:8080/mcp connectiontimeout=30 timeout=600
ProxyPassReverse /mcp http://127.0.0.1:8080/mcp

ProxyPass /files http://127.0.0.1:8080/files connectiontimeout=30 timeout=600
ProxyPassReverse /files http://127.0.0.1:8080/files
```

Recommended Apache settings:

- `LimitRequestBody 104857600` matches AMCS's 100 MB `/files` upload cap.
- `RequestReadTimeout ... body=600` gives clients up to 10 minutes to send larger request bodies.
- `ProxyTimeout 600` and `ProxyPass ... timeout=600` give Apache enough time to wait for the Go backend.
- If another proxy or load balancer sits in front of Apache, align its size and timeout settings too.

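
The value `104857600` is 100 × 1024 × 1024 bytes. As a rough illustration of how a Go handler can enforce a body cap of that size, the sketch below uses the standard library's `http.MaxBytesReader`. This shows the general technique only and is an assumption, not AMCS's actual upload handler: `cappedHandler`, `statusFor`, and the small limits used in `main` are hypothetical, chosen so the behaviour is visible without sending 100 MB.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxUploadBytes is 100 MiB, the same number used for LimitRequestBody.
const maxUploadBytes = 100 << 20 // 104857600

// cappedHandler drains the request body but fails with 413 once more than
// limit bytes have been read.
func cappedHandler(limit int64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, limit)
		n, err := io.Copy(io.Discard, r.Body)
		if err != nil {
			http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
			return
		}
		fmt.Fprintf(w, "received %d bytes", n)
	})
}

// statusFor sends an in-memory request of bodySize bytes through the handler
// and returns the HTTP status code it produced.
func statusFor(limit int64, bodySize int) int {
	body := strings.NewReader(strings.Repeat("x", bodySize))
	req := httptest.NewRequest(http.MethodPost, "/files", body)
	rec := httptest.NewRecorder()
	cappedHandler(limit).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(maxUploadBytes)    // the Apache LimitRequestBody value
	fmt.Println(statusFor(16, 8))  // body under the limit
	fmt.Println(statusFor(16, 32)) // body over the limit
}
```

Whichever layer has the smallest limit rejects the request first. Keeping the proxy and backend limits equal makes oversized uploads fail at the proxy before they reach the backend.
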
## Development

Run the SQL migrations against a local database with:

`DATABASE_URL=postgres://... make migrate`

LLM integration instructions are served at `/llm`.

## Containers

The repo now includes a `Dockerfile` and Compose files for running the app with Postgres + pgvector.

1. Set a real LiteLLM key in your shell:
   `export AMCS_LITELLM_API_KEY=your-key`
2. Start the stack with your runtime:
   `docker compose -f docker-compose.yml -f docker-compose.docker.yml up --build`
   `podman compose -f docker-compose.yml up --build`
3. Call the service on `http://localhost:8080`

Notes:

- The app uses `configs/docker.yaml` inside the container.
- The local `./configs` directory is mounted into `/app/configs`, so config edits apply without rebuilding the image.
- `AMCS_LITELLM_BASE_URL` overrides the LiteLLM endpoint, so you can retarget it without editing YAML.
- `AMCS_OLLAMA_BASE_URL` overrides the Ollama endpoint for local or remote servers.
- The Compose stack uses a default bridge network named `amcs`.
- The base Compose file uses `host.containers.internal`, which is Podman-friendly.
- The Docker override file adds `host-gateway` aliases so Docker can resolve the same host endpoint.
- Database migrations `001` through `005` run automatically when the Postgres volume is created for the first time.
- `migrations/006_rls_and_grants.sql` is intentionally skipped during container bootstrap because it contains deployment-specific grants for a role named `amcs_user`.

## Ollama

Set `ai.provider: "ollama"` to use a local or self-hosted Ollama server through its OpenAI-compatible API.

Example:

```yaml
ai:
  provider: "ollama"
  embeddings:
    model: "nomic-embed-text"
    dimensions: 768
  metadata:
    model: "llama3.2"
    temperature: 0.1
  ollama:
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"
    request_headers: {}
```

Notes:

- For remote Ollama servers, point `ai.ollama.base_url` at the remote `/v1` endpoint.
- The client always sends Bearer auth; Ollama ignores it locally, so `api_key: "ollama"` is a safe default.
- `ai.embeddings.dimensions` must match the embedding model you actually use, or startup will fail the database vector-dimension check.
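
One way to catch a dimension mismatch before starting AMCS is to request a single embedding from the OpenAI-compatible endpoint and compare its length with the configured value. The sketch below is a client-side pre-check under stated assumptions, not the startup check AMCS itself performs: `newEmbeddingRequest` and `checkDimensions` are hypothetical helpers, and the request and response shapes follow the OpenAI embeddings format (`model` and `input` in, `data[0].embedding` out). It builds the request and validates a sample response without contacting a server.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// embeddingResponse holds the part of an OpenAI-style embeddings response
// needed to count dimensions.
type embeddingResponse struct {
	Data []struct {
		Embedding []float64 `json:"embedding"`
	} `json:"data"`
}

// newEmbeddingRequest builds (but does not send) a POST to {baseURL}/embeddings
// with Bearer auth, as the AMCS client is described as always sending.
func newEmbeddingRequest(baseURL, apiKey, model, input string) (*http.Request, error) {
	payload, err := json.Marshal(map[string]string{"model": model, "input": input})
	if err != nil {
		return nil, err
	}
	endpoint := strings.TrimRight(baseURL, "/") + "/embeddings"
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

// checkDimensions returns an error if the first embedding in body does not
// have exactly the configured number of dimensions.
func checkDimensions(configured int, body []byte) error {
	var resp embeddingResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return err
	}
	if len(resp.Data) == 0 {
		return fmt.Errorf("embedding response contains no data")
	}
	if got := len(resp.Data[0].Embedding); got != configured {
		return fmt.Errorf("model returned %d dimensions, config expects %d", got, configured)
	}
	return nil
}

func main() {
	req, err := newEmbeddingRequest("http://localhost:11434/v1", "ollama", "nomic-embed-text", "hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())

	// Tiny 3-dimensional sample; a real response would come from the server.
	sample := []byte(`{"data":[{"embedding":[0.1,0.2,0.3]}]}`)
	fmt.Println(checkDimensions(3, sample))   // matches
	fmt.Println(checkDimensions(768, sample)) // mismatch
}
```
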