# Avalon Memory Crystal Server  (amcs)

![Avalon Memory Crystal](assets/avelonmemorycrystal.jpg)

A Go MCP server for capturing and retrieving thoughts, memory, and project context. Exposes tools over Streamable HTTP, backed by Postgres with pgvector for semantic search.

## What it does

- **Capture** thoughts with automatic embedding and metadata extraction
- **Search** thoughts semantically via vector similarity
- **Organise** thoughts into projects and retrieve full project context
- **Summarise** and recall memory across topics and time windows
- **Link** related thoughts and traverse relationships

## Stack

- Go — MCP server over Streamable HTTP
- Postgres + pgvector — storage and vector search
- LiteLLM — primary hosted AI provider (embeddings + metadata extraction)
- OpenRouter — default upstream behind LiteLLM
- Ollama — supported local or self-hosted OpenAI-compatible provider

## Tools

| Tool | Purpose |
|---|---|
| `capture_thought` | Store a thought with embedding and metadata |
| `search_thoughts` | Semantic similarity search |
| `list_thoughts` | Filter thoughts by type, topic, person, date |
| `thought_stats` | Counts and top topics/people |
| `get_thought` | Retrieve a thought by ID |
| `update_thought` | Patch content or metadata |
| `delete_thought` | Hard delete |
| `archive_thought` | Soft delete |
| `create_project` | Register a named project |
| `list_projects` | List projects with thought counts |
| `get_project_context` | Recent + semantic context for a project |
| `set_active_project` | Set session project scope |
| `get_active_project` | Get current session project |
| `summarize_thoughts` | LLM prose summary over a filtered set |
| `recall_context` | Semantic + recency context block for injection |
| `link_thoughts` | Create a typed relationship between thoughts |
| `related_thoughts` | Explicit links + semantic neighbours |
| `backfill_embeddings` | Generate missing embeddings for stored thoughts |
| `reparse_thought_metadata` | Re-extract and normalize metadata for stored thoughts |

## Configuration

Config is YAML-driven. Copy `configs/config.example.yaml` and set:

- `database.url` — Postgres connection string
- `auth.mode` — `api_keys` or `oauth_client_credentials`
- `auth.keys` — API keys for MCP access via `x-brain-key` or `Authorization: Bearer <key>` when `auth.mode=api_keys`
- `auth.oauth.clients` — client registry when `auth.mode=oauth_client_credentials`

**OAuth Client Credentials flow** (`auth.mode=oauth_client_credentials`):

1. Obtain a token — `POST /oauth/token` (public, no auth required):
   ```
   POST /oauth/token
   Content-Type: application/x-www-form-urlencoded
   Authorization: Basic base64(client_id:client_secret)

   grant_type=client_credentials
   ```
   Returns: `{"access_token": "...", "token_type": "bearer", "expires_in": 3600}`

2. Use the token on the MCP endpoint:
   ```
   Authorization: Bearer <access_token>
   ```

Alternatively, pass `client_id` and `client_secret` as body parameters instead of `Authorization: Basic`. Direct `Authorization: Basic` credential validation on the MCP endpoint is also supported as a fallback (no token required).
- `ai.litellm.base_url` and `ai.litellm.api_key` — LiteLLM proxy
- `ai.ollama.base_url` and `ai.ollama.api_key` — Ollama local or remote server

See `llm/plan.md` for full architecture and implementation plan.

## Backfill

Run `backfill_embeddings` after switching embedding models or importing thoughts without vectors.

```json
{
  "project": "optional-project-name",
  "limit": 100,
  "include_archived": false,
  "older_than_days": 0,
  "dry_run": false
}
```

- `dry_run: true` — report counts without calling the embedding provider
- `limit` — max thoughts per call (default 100)
- Embeddings are generated in parallel (4 workers) and upserted; one failure does not abort the run

## Metadata Reparse

Run `reparse_thought_metadata` to fix stale or inconsistent metadata by re-extracting it from thought content.

```json
{
  "project": "optional-project-name",
  "limit": 100,
  "include_archived": false,
  "older_than_days": 0,
  "dry_run": false
}
```

- `dry_run: true` scans only and does not call metadata extraction or write updates
- If extraction fails for a thought, existing metadata is normalized and written only if it changes
- Metadata reparse runs in parallel (4 workers); one failure does not abort the run

**Automatic backfill** (optional, config-gated):

```yaml
backfill:
  enabled: true
  run_on_startup: true   # run once on server start
  interval: "15m"        # repeat every 15 minutes
  batch_size: 20
  max_per_run: 100
  include_archived: false
```

**Search fallback**: when no embeddings exist for the active model in scope, `search_thoughts`, `recall_context`, `get_project_context`, `summarize_thoughts`, and `related_thoughts` automatically fall back to Postgres full-text search so results are never silently empty.

## Client Setup

### Claude Code

```bash
# API key auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "x-brain-key: <key>"

# Bearer token auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "Authorization: Bearer <token>"
```

### OpenAI Codex

Add to `~/.codex/config.toml`:

```toml
[[mcp_servers]]
name = "amcs"
url  = "http://localhost:8080/mcp"

[mcp_servers.headers]
x-brain-key = "<key>"
```

### OpenCode

```bash
# API key auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "x-brain-key=<key>"

# Bearer token auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "Authorization=Bearer <token>"
```

Or add directly to `opencode.json` / `~/.config/opencode/config.json`:

```json
{
  "mcp": {
    "amcs": {
      "type": "remote",
      "url": "http://localhost:8080/mcp",
      "headers": {
        "x-brain-key": "<key>"
      }
    }
  }
}
```

## Development

Run the SQL migrations against a local database with:

`DATABASE_URL=postgres://... make migrate`

LLM integration instructions are served at `/llm`.

## Containers

The repo now includes a `Dockerfile` and Compose files for running the app with Postgres + pgvector.

1. Set a real LiteLLM key in your shell:
   `export AMCS_LITELLM_API_KEY=your-key`
2. Start the stack with your runtime:
   `docker compose -f docker-compose.yml -f docker-compose.docker.yml up --build`
   `podman compose -f docker-compose.yml up --build`
3. Call the service on `http://localhost:8080`

Notes:

- The app uses `configs/docker.yaml` inside the container.
- The local `./configs` directory is mounted into `/app/configs`, so config edits apply without rebuilding the image.
- `AMCS_LITELLM_BASE_URL` overrides the LiteLLM endpoint, so you can retarget it without editing YAML.
- `AMCS_OLLAMA_BASE_URL` overrides the Ollama endpoint for local or remote servers.
- The Compose stack uses a default bridge network named `amcs`.
- The base Compose file uses `host.containers.internal`, which is Podman-friendly.
- The Docker override file adds `host-gateway` aliases so Docker can resolve the same host endpoint.
- Database migrations `001` through `005` run automatically when the Postgres volume is created for the first time.
- `migrations/006_rls_and_grants.sql` is intentionally skipped during container bootstrap because it contains deployment-specific grants for a role named `amcs_user`.

## Ollama

Set `ai.provider: "ollama"` to use a local or self-hosted Ollama server through its OpenAI-compatible API.

Example:

```yaml
ai:
  provider: "ollama"
  embeddings:
    model: "nomic-embed-text"
    dimensions: 768
  metadata:
    model: "llama3.2"
    temperature: 0.1
  ollama:
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"
    request_headers: {}
```

Notes:

- For remote Ollama servers, point `ai.ollama.base_url` at the remote `/v1` endpoint.
- The client always sends Bearer auth; Ollama ignores it locally, so `api_key: "ollama"` is a safe default.
- `ai.embeddings.dimensions` must match the embedding model you actually use, or startup will fail the database vector-dimension check.