feat(server): add support for extra maps in adapter configuration

* Introduced ExtraMapConfig to allow multiple adapter configurations.
* Updated server and handler to utilize extra maps for routing.
* Added dashboard handler for metrics visualization.
2026-04-11 21:43:14 +02:00
parent c12e16c9f7
commit c7a3fed6e1
10 changed files with 461 additions and 37 deletions


@@ -136,6 +136,11 @@ Override with `--config path/to/file.yaml` or env vars prefixed `VECNA_`.
"truncate_mode": "from_end",
"pad_mode": "at_end"
},
"extra_maps": {
"512": { "target_dim": 512 },
"256": { "target_dim": 256, "type": "random", "seed": 42 },
"fast": { "target_dim": 768, "forward_target": "small-model" }
},
"metrics": {
"enabled": true,
"path": "/metrics",
@@ -184,6 +189,43 @@ There is no partial migration path — a mixed index produces degraded or incorr
---
## Extra maps
`extra_maps` lets you expose multiple adapter configurations on a single vecna instance. Each entry is a named `AdapterConfig` whose unset fields fall back to the global `adapter` values.
```json
"adapter": { "type": "truncate", "source_dim": 1024, "target_dim": 1536 },
"extra_maps": {
"512": { "target_dim": 512 },
"256": { "target_dim": 256, "type": "random", "seed": 42 },
"openai-alt": { "target_dim": 1536, "forward_target": "openai" }
}
```
| Route | Forwarder | Adapter |
|-------|-----------|---------|
| `POST /v1/embeddings` | global default | global `adapter` |
| `POST /map/512/v1/embeddings` | global default | `extra_maps["512"]` — target 512, rest from global |
| `POST /map/256/v1/embeddings` | global default | `extra_maps["256"]` — random projection to 256 |
| `POST /map/openai-alt/v1/embeddings` | `openai` target | `extra_maps["openai-alt"]` adapter |
All fields are overridable per map entry:
| Field | Description |
|-------|-------------|
| `forward_target` | Named target from `forward.targets`; empty = global default |
| `type` | `truncate` / `random` / `projection` |
| `source_dim` | Source dimension; falls back to global `adapter.source_dim` |
| `target_dim` | Target dimension |
| `truncate_mode` | `from_end` / `from_start` |
| `pad_mode` | `at_end` / `at_start` |
| `seed` | Seed for random projection |
| `matrix_file` | Path to projection matrix JSON |
> The same re-embedding warning applies per map — changing any setting for an `extra_maps` entry requires re-embedding all vectors indexed through that endpoint.
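
The fallback rule described above can be sketched in a few lines of Python. This is only an illustration of the merge semantics, not vecna's actual implementation: the helper name `resolve_map` is hypothetical, and it assumes "unset" means a key that is absent, `None`, or an empty string. The field names come from the override table above.

```python
from typing import Any, Dict

# Field names as listed in the per-map override table.
ADAPTER_FIELDS = (
    "forward_target", "type", "source_dim", "target_dim",
    "truncate_mode", "pad_mode", "seed", "matrix_file",
)


def resolve_map(global_adapter: Dict[str, Any],
                extra_maps: Dict[str, Dict[str, Any]],
                name: str) -> Dict[str, Any]:
    """Return the effective adapter settings for one extra_maps entry.

    Hypothetical helper: start from the global adapter values and
    overlay every field the entry sets explicitly. A missing key,
    None, or "" is treated as unset (an assumption).
    """
    entry = extra_maps[name]  # raises KeyError for an unknown map name
    resolved = dict(global_adapter)
    for field in ADAPTER_FIELDS:
        value = entry.get(field)
        if value is not None and value != "":
            resolved[field] = value
    return resolved


global_adapter = {"type": "truncate", "source_dim": 1024, "target_dim": 1536}
extra_maps = {
    "512": {"target_dim": 512},
    "256": {"target_dim": 256, "type": "random", "seed": 42},
    "openai-alt": {"target_dim": 1536, "forward_target": "openai"},
}

for name in extra_maps:
    print(name, resolve_map(global_adapter, extra_maps, name))
```

With the example config, the `"512"` entry keeps `type` and `source_dim` from the global adapter and changes only `target_dim`, matching the route table above.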
---
## Truncation and padding modes
### `truncate_mode` — which part of the vector is kept when downscaling
@@ -242,6 +284,18 @@ POST /v1/models/{model}:embedContent
POST /v1/models/{model}:batchEmbedContents
```
### Extra-map routes
Serve embeddings through a different adapter, and optionally a different forward target, per endpoint. The `{mapping}` segment matches a key in `extra_maps`.
```
POST /map/{mapping}/v1/embeddings
POST /map/{mapping}/v1/models/{model}:embedContent
POST /map/{mapping}/v1/models/{model}:batchEmbedContents
```
All extra-map routes require the same authentication as the standard API routes.
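
As a rough illustration of how the `/map/{mapping}` prefix relates to the standard routes, the sketch below splits a request path into the map key and the remaining standard route. The function `split_map_route` is hypothetical and does not reflect how vecna's router is actually written; it only shows the path convention listed above.

```python
from typing import Optional, Tuple


def split_map_route(path: str) -> Tuple[Optional[str], str]:
    """Split '/map/{mapping}/rest' into (mapping, '/rest').

    Paths without the '/map/' prefix are returned unchanged with a
    mapping of None, meaning the global adapter applies.
    """
    prefix = "/map/"
    if not path.startswith(prefix):
        return None, path
    mapping, sep, tail = path[len(prefix):].partition("/")
    if not mapping or not sep or not tail:
        raise ValueError(f"malformed extra-map path: {path!r}")
    return mapping, "/" + tail


for p in ("/v1/embeddings",
          "/map/512/v1/embeddings",
          "/map/openai-alt/v1/models/my-model:embedContent"):
    print(p, "->", split_map_route(p))
```

The returned key would then be looked up in `extra_maps`; a key that is not configured should be rejected rather than silently falling back to the global adapter, since vectors from different maps are not interchangeable.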
### OpenAPI spec and docs
```
@@ -263,7 +317,7 @@ GET /docs
## Prometheus metrics
Enable in config: `metrics.enabled: true`. Scrape at `GET /metrics`.
Enable in config: `metrics.enabled: true`. Scrape at `GET /metrics`. Human-readable dashboard at `GET /dashboard`.
| Metric | Type | Description |
|--------|------|-------------|
@@ -276,6 +330,12 @@ Enable in config: `metrics.enabled: true`. Scrape at `GET /metrics`.
| `vecna_endpoint_errors_total` | counter | Forwarding failures by error type |
| `vecna_tokens_total` | counter | Tokens consumed, by target, model, and type (`prompt`/`total`) |
### Dashboard
`GET /dashboard` renders a live HTML view of all metrics: counters show request counts with status-code badges, histograms show p50/p95/p99 latencies, and gauges show current endpoint priority and inflight counts.
Auth follows the same rules as `/metrics`: server `api_keys` apply, and `metrics.api_key` adds a second layer if set.
---
## Development
@@ -315,7 +375,7 @@ Starts vecna and an Ollama instance. The `vecna_config` named volume persists th
### Onboard (interactive setup)
```sh
docker compose run --rm -it vecna onboard --config /config/vecna.json
docker compose run --rm -it vecna onboard
```
Ollama is reachable by hostname on the Docker network — the scanner will find it automatically. After onboarding, restart the proxy:
@@ -327,17 +387,17 @@ docker compose restart vecna
### Query
```sh
docker compose run --rm vecna query --compact "hello world" --config /config/vecna.json
docker compose run --rm vecna query --compact "hello world"
```
### Test endpoints
```sh
# report latency and dims
docker compose run --rm vecna test --config /config/vecna.json
docker compose run --rm vecna test
# test and remove failing endpoints
docker compose run --rm vecna test --config /config/vecna.json --remove-broken
docker compose run --rm vecna test --remove-broken
```
### Edit config manually