feat(server): add support for extra maps in adapter configuration

* Introduced ExtraMapConfig to allow multiple adapter configurations.
* Updated server and handler to utilize extra maps for routing.
* Added dashboard handler for metrics visualization.
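
For reviewers, the fallback rule described in the README change (unset fields of an `extra_maps` entry fall back to the global `adapter` values) can be sketched roughly as follows. This is an illustration only, not the code in this commit: `resolve_adapter` is a hypothetical helper, and treating `None` or an empty string as "unset" is an assumption based on the `forward_target` description.

```python
from typing import Any, Dict

Config = Dict[str, Any]


def resolve_adapter(global_adapter: Config, extra_maps: Dict[str, Config], name: str) -> Config:
    """Hypothetical helper (not vecna's code): effective adapter for one map."""
    if name not in extra_maps:
        raise KeyError(f"unknown mapping: {name!r}")
    merged = dict(global_adapter)  # start from the global adapter values
    for field, value in extra_maps[name].items():
        # Assumption: None or "" means "unset", so the global value is kept.
        if value is None or value == "":
            continue
        merged[field] = value
    return merged


# Values copied from the README example in this diff.
global_adapter = {"type": "truncate", "source_dim": 1024, "target_dim": 1536}
extra_maps = {
    "512": {"target_dim": 512},
    "256": {"target_dim": 256, "type": "random", "seed": 42},
    "openai-alt": {"target_dim": 1536, "forward_target": "openai"},
}

for name in extra_maps:
    print(f"/map/{name}/v1/embeddings ->", resolve_adapter(global_adapter, extra_maps, name))
```

Under these assumptions, the `512` map keeps the global `type` and `source_dim` and only overrides `target_dim`, which matches the routing table added to the README.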
2026-05-05 01:26:58 +00:00 · 2026-04-11 21:43:14 +02:00
parent c12e16c9f7
commit c7a3fed6e1
10 changed files with 461 additions and 37 deletions
--- a/README.md
+++ b/README.md
@@ -136,6 +136,11 @@ Override with `--config path/to/file.yaml` or env vars prefixed `VECNA_`.
    "truncate_mode": "from_end",
    "pad_mode": "at_end"
  },
+  "extra_maps": {
+    "512":  { "target_dim": 512 },
+    "256":  { "target_dim": 256, "type": "random", "seed": 42 },
+    "fast": { "target_dim": 768, "forward_target": "small-model" }
+  },
  "metrics": {
    "enabled": true,
    "path": "/metrics",
@@ -184,6 +189,43 @@ There is no partial migration path — a mixed index produces degraded or incorr

 ---

+## Extra maps
+
+`extra_maps` lets you expose multiple adapter configurations on a single vecna instance. Each entry is a named `AdapterConfig` whose unset fields fall back to the global `adapter` values.
+
+```json
+"adapter": { "type": "truncate", "source_dim": 1024, "target_dim": 1536 },
+"extra_maps": {
+  "512":        { "target_dim": 512 },
+  "256":        { "target_dim": 256, "type": "random", "seed": 42 },
+  "openai-alt": { "target_dim": 1536, "forward_target": "openai" }
+}
+```
+
+| Route | Forwarder | Adapter |
+|-------|-----------|---------|
+| `POST /v1/embeddings` | global default | global `adapter` |
+| `POST /map/512/v1/embeddings` | global default | `extra_maps["512"]` — target 512, rest from global |
+| `POST /map/256/v1/embeddings` | global default | `extra_maps["256"]` — random projection to 256 |
+| `POST /map/openai-alt/v1/embeddings` | `openai` target | `extra_maps["openai-alt"]` adapter |
+
+All fields are overridable per map entry:
+
+| Field | Description |
+|-------|-------------|
+| `forward_target` | Named target from `forward.targets`; empty = global default |
+| `type` | `truncate` / `random` / `projection` |
+| `source_dim` | Source dimension; falls back to global `adapter.source_dim` |
+| `target_dim` | Target dimension |
+| `truncate_mode` | `from_end` / `from_start` |
+| `pad_mode` | `at_end` / `at_start` |
+| `seed` | Seed for random projection |
+| `matrix_file` | Path to projection matrix JSON |
+
+> The same re-embedding warning applies per map — changing any setting for an `extra_maps` entry requires re-embedding all vectors indexed through that endpoint.
+
+---
+
 ## Truncation and padding modes

 ### `truncate_mode` — which part of the vector is kept when downscaling
@@ -242,6 +284,18 @@ POST /v1/models/{model}:embedContent
 POST /v1/models/{model}:batchEmbedContents
 ```

+### Extra-map routes
+
+Serve the same backing model with a different adapter per endpoint. The `{mapping}` segment matches a key in `extra_maps`.
+
+```
+POST /map/{mapping}/v1/embeddings
+POST /map/{mapping}/v1/models/{model}:embedContent
+POST /map/{mapping}/v1/models/{model}:batchEmbedContents
+```
+
+All extra-map routes require the same authentication as the standard API routes.
+
 ### OpenAPI spec and docs

 ```
@@ -263,7 +317,7 @@ GET /docs

 ## Prometheus metrics

-Enable in config: `metrics.enabled: true`. Scrape at `GET /metrics`.
+Enable in config: `metrics.enabled: true`. Scrape at `GET /metrics`. Human-readable dashboard at `GET /dashboard`.

 | Metric | Type | Description |
 |--------|------|-------------|
@@ -276,6 +330,12 @@ Enable in config: `metrics.enabled: true`. Scrape at `GET /metrics`.
 | `vecna_endpoint_errors_total` | counter | Forwarding failures by error type |
 | `vecna_tokens_total` | counter | Tokens consumed, by target, model, and type (`prompt`/`total`) |

+### Dashboard
+
+`GET /dashboard` renders a live HTML view of all metrics. Counters show request counts with status-code badges, histograms show p50/p95/p99 latencies, and gauges show the current endpoint priority and in-flight counts.
+
+Auth follows the same rules as `/metrics`: server `api_keys` apply, and `metrics.api_key` adds a second layer if set.
+
 ---

 ## Development
@@ -315,7 +375,7 @@ Starts vecna and an Ollama instance. The `vecna_config` named volume persists th
 ### Onboard (interactive setup)

 ```sh
-docker compose run --rm -it vecna onboard --config /config/vecna.json
+docker compose run --rm -it vecna onboard
 ```

 Ollama is reachable by hostname on the Docker network — the scanner will find it automatically. After onboarding, restart the proxy:
@@ -327,17 +387,17 @@ docker compose restart vecna
 ### Query

 ```sh
-docker compose run --rm vecna query --compact "hello world" --config /config/vecna.json
+docker compose run --rm vecna query --compact "hello world"
 ```

 ### Test endpoints

 ```sh
 # report latency and dims
-docker compose run --rm vecna test --config /config/vecna.json
+docker compose run --rm vecna test

 # test and remove failing endpoints
-docker compose run --rm vecna test --config /config/vecna.json --remove-broken
+docker compose run --rm vecna test --remove-broken
 ```

 ### Edit config manually