# Avalon Memory Crystal Server (amcs)
![Avalon Memory Crystal](assets/avelonmemorycrystal.jpg)
A Go MCP server for capturing and retrieving thoughts, memory, and project context. Exposes tools over Streamable HTTP, backed by Postgres with pgvector for semantic search.
## What it does
- **Capture** thoughts with automatic embedding and metadata extraction
- **Search** thoughts semantically via vector similarity
- **Organise** thoughts into projects and retrieve full project context
- **Summarise** and recall memory across topics and time windows
- **Link** related thoughts and traverse relationships
## Stack
- Go — MCP server over Streamable HTTP
- Postgres + pgvector — storage and vector search
- LiteLLM — primary hosted AI provider (embeddings + metadata extraction)
- OpenRouter — default upstream behind LiteLLM
- Ollama — supported local or self-hosted OpenAI-compatible provider
## Tools
| Tool | Purpose |
|---|---|
| `capture_thought` | Store a thought with embedding and metadata |
| `search_thoughts` | Semantic similarity search |
| `list_thoughts` | Filter thoughts by type, topic, person, date |
| `thought_stats` | Counts and top topics/people |
| `get_thought` | Retrieve a thought by ID |
| `update_thought` | Patch content or metadata |
| `delete_thought` | Hard delete |
| `archive_thought` | Soft delete |
| `create_project` | Register a named project |
| `list_projects` | List projects with thought counts |
| `get_project_context` | Recent + semantic context for a project |
| `set_active_project` | Set session project scope |
| `get_active_project` | Get current session project |
| `summarize_thoughts` | LLM prose summary over a filtered set |
| `recall_context` | Semantic + recency context block for injection |
| `link_thoughts` | Create a typed relationship between thoughts |
| `related_thoughts` | Explicit links + semantic neighbours |
| `save_file` | Store a base64-encoded image, document, audio file, or other binary and optionally link it to a thought |
| `load_file` | Retrieve a stored file by ID as base64 plus metadata |
| `list_files` | Browse stored files by thought, project, or kind |
| `backfill_embeddings` | Generate missing embeddings for stored thoughts |
| `reparse_thought_metadata` | Re-extract and normalize metadata for stored thoughts |
| `retry_failed_metadata` | Retry metadata extraction for thoughts still pending or failed |
| `add_skill` | Store a reusable agent skill (behavioural instruction or capability prompt) |
| `remove_skill` | Delete an agent skill by id |
| `list_skills` | List all agent skills, optionally filtered by tag |
| `add_guardrail` | Store a reusable agent guardrail (constraint or safety rule) |
| `remove_guardrail` | Delete an agent guardrail by id |
| `list_guardrails` | List all agent guardrails, optionally filtered by tag or severity |
| `add_project_skill` | Link an agent skill to a project |
| `remove_project_skill` | Unlink an agent skill from a project |
| `list_project_skills` | List all skills linked to a project |
| `add_project_guardrail` | Link an agent guardrail to a project |
| `remove_project_guardrail` | Unlink an agent guardrail from a project |
| `list_project_guardrails` | List all guardrails linked to a project |
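As a sketch, a minimal `capture_thought` call might look like the following; the field names here are illustrative, so check the tool's input schema for the exact parameters:
```json
{
  "content": "Chose pgvector over a separate vector store to keep the stack simple.",
  "project": "amcs"
}
```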
## Agent Skills and Guardrails
Skills and guardrails are reusable agent behaviour instructions and constraints that can be attached to projects.
**At the start of every project session, always call `list_project_skills` and `list_project_guardrails` first.** Use the returned skills and guardrails to guide agent behaviour for that project. Only generate or create new skills/guardrails if none are returned.
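For example (assuming both listing tools take the same `project` parameter used by the linking payloads below):
```json
{ "project": "my-project" }
```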
### Skills
A skill is a reusable behavioural instruction or capability prompt — for example, "always respond in structured markdown" or "break complex tasks into numbered steps before starting".
```json
{ "name": "structured-output", "description": "Enforce markdown output format", "content": "Always structure responses using markdown headers and bullet points.", "tags": ["formatting"] }
```
### Guardrails
A guardrail is a constraint or safety rule — for example, "never delete files without explicit confirmation" or "do not expose secrets in output".
```json
{ "name": "no-silent-deletes", "description": "Require confirmation before deletes", "content": "Never delete, drop, or truncate data without first confirming with the user.", "severity": "high", "tags": ["safety"] }
```
Severity levels: `low`, `medium`, `high`, `critical`.
### Project linking
Link existing skills and guardrails to a project so they are automatically available when that project is active:
```json
{ "project": "my-project", "skill_id": "<uuid>" }
{ "project": "my-project", "guardrail_id": "<uuid>" }
```
## Configuration
Config is YAML-driven. Copy `configs/config.example.yaml` and set:
- `database.url` — Postgres connection string
- `auth.mode` — `api_keys` or `oauth_client_credentials`
- `auth.keys` — API keys for MCP access via `x-brain-key` or `Authorization: Bearer <key>` when `auth.mode=api_keys`
- `auth.oauth.clients` — client registry when `auth.mode=oauth_client_credentials`
**OAuth Client Credentials flow** (`auth.mode=oauth_client_credentials`):
1. Obtain a token — `POST /oauth/token` (public, no auth required):
```
POST /oauth/token
Content-Type: application/x-www-form-urlencoded
Authorization: Basic base64(client_id:client_secret)

grant_type=client_credentials
```
Returns: `{"access_token": "...", "token_type": "bearer", "expires_in": 3600}`
2. Use the token on the MCP endpoint:
```
Authorization: Bearer <access_token>
```
Alternatively, pass `client_id` and `client_secret` as body parameters instead of `Authorization: Basic`. Direct `Authorization: Basic` credential validation on the MCP endpoint is also supported as a fallback (no token required).
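A concrete sketch of the token request with curl (client credentials are placeholders):
```bash
# Exchange client credentials for a bearer token; returns access_token as JSON
curl -s -X POST http://localhost:8080/oauth/token \
  -u "<client_id>:<client_secret>" \
  -d "grant_type=client_credentials"
```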
- `ai.litellm.base_url` and `ai.litellm.api_key` — LiteLLM proxy
- `ai.ollama.base_url` and `ai.ollama.api_key` — Ollama local or remote server
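A minimal YAML sketch of these settings, assuming the nesting implied by the dotted paths above (`configs/config.example.yaml` is the authoritative shape):
```yaml
database:
  url: "postgres://user:pass@localhost:5432/amcs"
auth:
  mode: "api_keys"                    # or "oauth_client_credentials"
  keys:
    - "replace-with-a-long-random-key"
ai:
  litellm:
    base_url: "http://localhost:4000" # placeholder proxy address
    api_key: "<litellm-key>"
```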
See `llm/plan.md` for the full architecture and implementation plan.
## Backfill
Run `backfill_embeddings` after switching embedding models or importing thoughts without vectors.
```json
{
  "project": "optional-project-name",
  "limit": 100,
  "include_archived": false,
  "older_than_days": 0,
  "dry_run": false
}
```
- `dry_run: true` — report counts without calling the embedding provider
- `limit` — max thoughts per call (default 100)
- Embeddings are generated in parallel (4 workers) and upserted; one failure does not abort the run
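A dry run is a cheap way to check the backlog first; it reports counts without touching the embedding provider:
```json
{
  "project": "optional-project-name",
  "dry_run": true
}
```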
## Metadata Reparse
Run `reparse_thought_metadata` to fix stale or inconsistent metadata by re-extracting it from thought content.
```json
{
  "project": "optional-project-name",
  "limit": 100,
  "include_archived": false,
  "older_than_days": 0,
  "dry_run": false
}
```
- `dry_run: true` scans only and does not call metadata extraction or write updates
- If extraction fails for a thought, existing metadata is normalized and written only if it changes
- Metadata reparse runs in parallel (4 workers); one failure does not abort the run
## Failed Metadata Retry
`capture_thought` now stores the thought even when metadata extraction times out or fails. Those thoughts are marked with `metadata_status: "pending"` and retried in the background. Use `retry_failed_metadata` to sweep any thoughts still marked `pending` or `failed`.
```json
{
  "project": "optional-project-name",
  "limit": 100,
  "include_archived": false,
  "older_than_days": 1,
  "dry_run": false
}
```
- `dry_run: true` scans only and does not call metadata extraction or write updates
- Successful retries mark the thought's metadata as `complete` and clear the last error
- Failed retries update the retry markers so the daily sweep can pick them up again later
## File Storage
Use `save_file` to persist binary files as base64. Files can optionally be linked to a memory by passing `thought_id`, which also adds an attachment reference to that thought's metadata. AI clients should prefer `save_file` when the goal is to retain the artifact itself, rather than reading or summarizing the file first. Stored files and attachment metadata are not forwarded to the metadata extraction client.
```json
{
  "name": "meeting-notes.pdf",
  "media_type": "application/pdf",
  "kind": "document",
  "thought_id": "optional-thought-uuid",
  "content_base64": "<base64-payload>"
}
```
Load a stored file again with:
```json
{
  "id": "stored-file-uuid"
}
```
List files for a thought or project with:
```json
{
  "thought_id": "optional-thought-uuid",
  "project": "optional-project-name",
  "kind": "optional-image-document-audio-file",
  "limit": 20
}
```
AMCS also supports direct authenticated HTTP uploads to `/files` for clients that want to stream file bodies instead of base64-encoding them into an MCP tool call.
The Go server caps `/files` uploads at 100 MB per request. Large uploads remain subject to available memory, Postgres limits, and any reverse proxy or load balancer limits in front of AMCS.
Multipart upload:
```bash
curl -X POST http://localhost:8080/files \
  -H "x-brain-key: <key>" \
  -F "file=@./diagram.png" \
  -F "project=amcs" \
  -F "kind=image"
```
Raw body upload:
```bash
curl -X POST "http://localhost:8080/files?project=amcs&name=meeting-notes.pdf" \
-H "x-brain-key: <key>" \
-H "Content-Type: application/pdf" \
--data-binary @./meeting-notes.pdf
```
**Automatic backfill** (optional, config-gated):
```yaml
backfill:
  enabled: true
  run_on_startup: true    # run once on server start
  interval: "15m"         # repeat every 15 minutes
  batch_size: 20
  max_per_run: 100
  include_archived: false
```
**Automatic metadata retry** (optional, config-gated):
```yaml
metadata_retry:
  enabled: true
  run_on_startup: true    # retry failed metadata once on server start
  interval: "24h"         # retry pending/failed metadata daily
  max_per_run: 100
  include_archived: false
```
**Search fallback**: when no embeddings exist for the active model in scope, `search_thoughts`, `recall_context`, `get_project_context`, `summarize_thoughts`, and `related_thoughts` automatically fall back to Postgres full-text search so results are never silently empty.
## Client Setup
### Claude Code
```bash
# API key auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "x-brain-key: <key>"
# Bearer token auth
claude mcp add --transport http amcs http://localhost:8080/mcp --header "Authorization: Bearer <token>"
```
### OpenAI Codex
Add to `~/.codex/config.toml`:
```toml
[[mcp_servers]]
name = "amcs"
url = "http://localhost:8080/mcp"
[mcp_servers.headers]
x-brain-key = "<key>"
```
### OpenCode
```bash
# API key auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "x-brain-key=<key>"
# Bearer token auth
opencode mcp add --name amcs --type remote --url http://localhost:8080/mcp --header "Authorization=Bearer <token>"
```
Or add directly to `opencode.json` / `~/.config/opencode/config.json`:
```json
{
  "mcp": {
    "amcs": {
      "type": "remote",
      "url": "http://localhost:8080/mcp",
      "headers": {
        "x-brain-key": "<key>"
      }
    }
  }
}
```
## Apache Proxy
If AMCS is deployed behind Apache HTTP Server, configure the proxy explicitly for larger uploads and longer-running requests.
Example virtual host settings for the current AMCS defaults:
```apache
<VirtualHost *:443>
  ServerName amcs.example.com
  ProxyPreserveHost On

  LimitRequestBody 104857600
  RequestReadTimeout handshake=0 header=20-40,MinRate=500 body=600,MinRate=500
  Timeout 600
  ProxyTimeout 600

  ProxyPass /mcp http://127.0.0.1:8080/mcp connectiontimeout=30 timeout=600
  ProxyPassReverse /mcp http://127.0.0.1:8080/mcp
  ProxyPass /files http://127.0.0.1:8080/files connectiontimeout=30 timeout=600
  ProxyPassReverse /files http://127.0.0.1:8080/files
</VirtualHost>
```
Recommended Apache settings:
- `LimitRequestBody 104857600` matches AMCS's 100 MB `/files` upload cap.
- `RequestReadTimeout ... body=600` gives clients up to 10 minutes to send larger request bodies.
- `ProxyTimeout 600` and `ProxyPass ... timeout=600` give Apache enough time to wait for the Go backend.
- If another proxy or load balancer sits in front of Apache, align its size and timeout settings too.
## Development
Run the SQL migrations against a local database with:
`DATABASE_URL=postgres://... make migrate`
LLM integration instructions are served at `/llm`.
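For example, to fetch them from a running server (add an auth header if your deployment requires one):
```bash
curl -s http://localhost:8080/llm
```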
## Containers
The repo now includes a `Dockerfile` and Compose files for running the app with Postgres + pgvector.
1. Set a real LiteLLM key in your shell:
`export AMCS_LITELLM_API_KEY=your-key`
2. Start the stack with your runtime:
`docker compose -f docker-compose.yml -f docker-compose.docker.yml up --build`
`podman compose -f docker-compose.yml up --build`
3. Call the service on `http://localhost:8080`
Notes:
- The app uses `configs/docker.yaml` inside the container.
- The local `./configs` directory is mounted into `/app/configs`, so config edits apply without rebuilding the image.
- `AMCS_LITELLM_BASE_URL` overrides the LiteLLM endpoint, so you can retarget it without editing YAML.
- `AMCS_OLLAMA_BASE_URL` overrides the Ollama endpoint for local or remote servers.
- The Compose stack uses a default bridge network named `amcs`.
- The base Compose file uses `host.containers.internal`, which is Podman-friendly.
- The Docker override file adds `host-gateway` aliases so Docker can resolve the same host endpoint.
- Database migrations `001` through `005` run automatically when the Postgres volume is created for the first time.
- `migrations/006_rls_and_grants.sql` is intentionally skipped during container bootstrap because it contains deployment-specific grants for a role named `amcs_user`.
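If your deployment does use an `amcs_user` role, one way to apply that migration manually is with `psql` (a sketch; adjust the connection string to your environment):
```bash
# Run once after the amcs_user role exists
psql "$DATABASE_URL" -f migrations/006_rls_and_grants.sql
```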
## Ollama
Set `ai.provider: "ollama"` to use a local or self-hosted Ollama server through its OpenAI-compatible API.
Example:
```yaml
ai:
  provider: "ollama"
  embeddings:
    model: "nomic-embed-text"
    dimensions: 768
  metadata:
    model: "llama3.2"
    temperature: 0.1
  ollama:
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"
    request_headers: {}
```
Notes:
- For remote Ollama servers, point `ai.ollama.base_url` at the remote `/v1` endpoint.
- The client always sends Bearer auth; Ollama ignores it locally, so `api_key: "ollama"` is a safe default.
- `ai.embeddings.dimensions` must match the embedding model you actually use, or startup will fail the database vector-dimension check.
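If the example models are not yet present on the Ollama server, pull them first:
```bash
ollama pull nomic-embed-text
ollama pull llama3.2
```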