# Avalon Memory Crystal Server (amcs)
A Go MCP server for capturing and retrieving thoughts, memory, and project context. Exposes tools over Streamable HTTP, backed by Postgres with pgvector for semantic search.
## What it does
- Capture thoughts with automatic embedding and metadata extraction
- Search thoughts semantically via vector similarity
- Organise thoughts into projects and retrieve full project context
- Summarise and recall memory across topics and time windows
- Link related thoughts and traverse relationships
## Stack
- Go — MCP server over Streamable HTTP
- Postgres + pgvector — storage and vector search
- LiteLLM — primary hosted AI provider (embeddings + metadata extraction)
- OpenRouter — default upstream behind LiteLLM
- Ollama — supported local or self-hosted OpenAI-compatible provider
## Tools

| Tool | Purpose |
|---|---|
| `capture_thought` | Store a thought with embedding and metadata |
| `search_thoughts` | Semantic similarity search |
| `list_thoughts` | Filter thoughts by type, topic, person, date |
| `thought_stats` | Counts and top topics/people |
| `get_thought` | Retrieve a thought by ID |
| `update_thought` | Patch content or metadata |
| `delete_thought` | Hard delete |
| `archive_thought` | Soft delete |
| `create_project` | Register a named project |
| `list_projects` | List projects with thought counts |
| `get_project_context` | Recent + semantic context for a project |
| `set_active_project` | Set session project scope |
| `get_active_project` | Get current session project |
| `summarize_thoughts` | LLM prose summary over a filtered set |
| `recall_context` | Semantic + recency context block for injection |
| `link_thoughts` | Create a typed relationship between thoughts |
| `related_thoughts` | Explicit links + semantic neighbours |
| `save_file` | Store a base64-encoded image, document, audio file, or other binary and optionally link it to a thought |
| `load_file` | Retrieve a stored file by ID as base64 plus metadata |
| `list_files` | Browse stored files by thought, project, or kind |
| `backfill_embeddings` | Generate missing embeddings for stored thoughts |
| `reparse_thought_metadata` | Re-extract and normalize metadata for stored thoughts |
| `retry_failed_metadata` | Retry metadata extraction for thoughts still pending or failed |
## Configuration
Config is YAML-driven. Copy `configs/config.example.yaml` and set:
- `database.url` — Postgres connection string
- `auth.mode` — `api_keys` or `oauth_client_credentials`
- `auth.keys` — API keys for MCP access via `x-brain-key` or `Authorization: Bearer <key>` when `auth.mode=api_keys`
- `auth.oauth.clients` — client registry when `auth.mode=oauth_client_credentials`
OAuth Client Credentials flow (`auth.mode=oauth_client_credentials`):

1. Obtain a token via `POST /oauth/token` (public, no auth required):

   ```
   POST /oauth/token
   Content-Type: application/x-www-form-urlencoded
   Authorization: Basic base64(client_id:client_secret)

   grant_type=client_credentials
   ```

   Returns:

   ```json
   {"access_token": "...", "token_type": "bearer", "expires_in": 3600}
   ```

2. Use the token on the MCP endpoint:

   ```
   Authorization: Bearer <access_token>
   ```

Alternatively, pass `client_id` and `client_secret` as body parameters instead of `Authorization: Basic`. Direct `Authorization: Basic` credential validation on the MCP endpoint is also supported as a fallback (no token required).
- `ai.litellm.base_url` and `ai.litellm.api_key` — LiteLLM proxy
- `ai.ollama.base_url` and `ai.ollama.api_key` — Ollama local or remote server
See `llm/plan.md` for the full architecture and implementation plan.
## Backfill
Run `backfill_embeddings` after switching embedding models or importing thoughts without vectors.
```json
{
  "project": "optional-project-name",
  "limit": 100,
  "include_archived": false,
  "older_than_days": 0,
  "dry_run": false
}
```
- `dry_run: true` — report counts without calling the embedding provider
- `limit` — max thoughts per call (default 100)
- Embeddings are generated in parallel (4 workers) and upserted; one failure does not abort the run
## Metadata Reparse
Run `reparse_thought_metadata` to fix stale or inconsistent metadata by re-extracting it from thought content.
```json
{
  "project": "optional-project-name",
  "limit": 100,
  "include_archived": false,
  "older_than_days": 0,
  "dry_run": false
}
```
- `dry_run: true` scans only and does not call metadata extraction or write updates
- If extraction fails for a thought, existing metadata is normalized and written only if it changes
- Metadata reparse runs in parallel (4 workers); one failure does not abort the run
## Failed Metadata Retry
`capture_thought` now stores the thought even when metadata extraction times out or fails. Those thoughts are marked with `metadata_status: "pending"` and retried in the background. Use `retry_failed_metadata` to sweep any thoughts still marked pending or failed.
```json
{
  "project": "optional-project-name",
  "limit": 100,
  "include_archived": false,
  "older_than_days": 1,
  "dry_run": false
}
```
- `dry_run: true` scans only and does not call metadata extraction or write updates
- Successful retries mark the thought metadata as `complete` and clear the last error
- Failed retries update the retry markers so the daily sweep can pick them up again later
## File Storage
Use `save_file` to persist binary files as base64. Files can optionally be linked to a thought by passing `thought_id`, which also adds an attachment reference to that thought's metadata. AI clients should prefer `save_file` when the goal is to retain the artifact itself, rather than reading or summarizing the file first. Stored files and attachment metadata are not forwarded to the metadata extraction client.
```json
{
  "name": "meeting-notes.pdf",
  "media_type": "application/pdf",
  "kind": "document",
  "thought_id": "optional-thought-uuid",
  "content_base64": "<base64-payload>"
}
```
Load a stored file again with:
```json
{
  "id": "stored-file-uuid"
}
```
List files for a thought or project with:
```json
{
  "thought_id": "optional-thought-uuid",
  "project": "optional-project-name",
  "kind": "optional-image-document-audio-file",
  "limit": 20
}
```
AMCS also supports direct authenticated HTTP uploads to `/files` for clients that want to stream file bodies instead of base64-encoding them into an MCP tool call.
Multipart upload:
```sh
curl -X POST http://localhost:8080/files \
  -H "x-brain-key: <key>" \
  -F "file=@./diagram.png" \
  -F "project=amcs" \
  -F "kind=image"
```
Raw body upload:
```sh
curl -X POST "http://localhost:8080/files?project=amcs&name=meeting-notes.pdf" \
  -H "x-brain-key: <key>" \
  -H "Content-Type: application/pdf" \
  --data-binary @./meeting-notes.pdf
```
Automatic backfill (optional, config-gated):
```yaml
backfill:
  enabled: true
  run_on_startup: true    # run once on server start
  interval: "15m"         # repeat every 15 minutes
  batch_size: 20
  max_per_run: 100
  include_archived: false

metadata_retry:
  enabled: true
  run_on_startup: true    # retry failed metadata once on server start
  interval: "24h"         # retry pending/failed metadata daily
  max_per_run: 100
  include_archived: false
```
Search fallback: when no embeddings exist for the active model in scope, `search_thoughts`, `recall_context`, `get_project_context`, `summarize_thoughts`, and `related_thoughts` automatically fall back to Postgres full-text search so results are never silently empty.
## Client Setup
### Claude Code
```sh
# API key auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "x-brain-key: <key>"

# Bearer token auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "Authorization: Bearer <token>"
```
### OpenAI Codex
Add to `~/.codex/config.toml`:
```toml
[[mcp_servers]]
name = "amcs"
url = "http://localhost:8080/mcp"

[mcp_servers.headers]
x-brain-key = "<key>"
```
### OpenCode
```sh
# API key auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "x-brain-key=<key>"

# Bearer token auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "Authorization=Bearer <token>"
```
Or add directly to `opencode.json` / `~/.config/opencode/config.json`:
```json
{
  "mcp": {
    "amcs": {
      "type": "remote",
      "url": "http://localhost:8080/mcp",
      "headers": {
        "x-brain-key": "<key>"
      }
    }
  }
}
```
## Development
Run the SQL migrations against a local database with:
```sh
DATABASE_URL=postgres://... make migrate
```
LLM integration instructions are served at `/llm`.
## Containers
The repo now includes a Dockerfile and Compose files for running the app with Postgres + pgvector.
- Set a real LiteLLM key in your shell:

  ```sh
  export AMCS_LITELLM_API_KEY=your-key
  ```

- Start the stack with your runtime:

  ```sh
  # Docker
  docker compose -f docker-compose.yml -f docker-compose.docker.yml up --build

  # Podman
  podman compose -f docker-compose.yml up --build
  ```

- Call the service on `http://localhost:8080`.
Notes:
- The app uses `configs/docker.yaml` inside the container.
- The local `./configs` directory is mounted into `/app/configs`, so config edits apply without rebuilding the image.
- `AMCS_LITELLM_BASE_URL` overrides the LiteLLM endpoint, so you can retarget it without editing YAML.
- `AMCS_OLLAMA_BASE_URL` overrides the Ollama endpoint for local or remote servers.
- The Compose stack uses a default bridge network named `amcs`.
- The base Compose file uses `host.containers.internal`, which is Podman-friendly.
- The Docker override file adds `host-gateway` aliases so Docker can resolve the same host endpoint.
- Database migrations `001` through `005` run automatically when the Postgres volume is created for the first time.
- `migrations/006_rls_and_grants.sql` is intentionally skipped during container bootstrap because it contains deployment-specific grants for a role named `amcs_user`.
## Ollama
Set `ai.provider: "ollama"` to use a local or self-hosted Ollama server through its OpenAI-compatible API.
Example:
```yaml
ai:
  provider: "ollama"
  embeddings:
    model: "nomic-embed-text"
    dimensions: 768
  metadata:
    model: "llama3.2"
    temperature: 0.1
  ollama:
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"
    request_headers: {}
```
Notes:
- For remote Ollama servers, point `ai.ollama.base_url` at the remote `/v1` endpoint.
- The client always sends Bearer auth; Ollama ignores it locally, so `api_key: "ollama"` is a safe default.
- `ai.embeddings.dimensions` must match the embedding model you actually use, or startup will fail the database vector-dimension check.
