mirror of
https://github.com/Warky-Devs/vecna.git
synced 2026-05-05 01:26:58 +00:00
feat(server): add support for extra maps in adapter configuration
* Introduced ExtraMapConfig to allow multiple adapter configurations.
* Updated server and handler to utilize extra maps for routing.
* Added dashboard handler for metrics visualization.
70
README.md
@@ -136,6 +136,11 @@ Override with `--config path/to/file.yaml` or env vars prefixed `VECNA_`.
     "truncate_mode": "from_end",
     "pad_mode": "at_end"
   },
+  "extra_maps": {
+    "512": { "target_dim": 512 },
+    "256": { "target_dim": 256, "type": "random", "seed": 42 },
+    "fast": { "target_dim": 768, "forward_target": "small-model" }
+  },
   "metrics": {
     "enabled": true,
     "path": "/metrics",
@@ -184,6 +189,43 @@ There is no partial migration path — a mixed index produces degraded or incorr
 
 ---
 
+## Extra maps
+
+`extra_maps` lets you expose multiple adapter configurations on a single vecna instance. Each entry is a named `AdapterConfig` whose unset fields fall back to the global `adapter` values.
+
+```json
+"adapter": { "type": "truncate", "source_dim": 1024, "target_dim": 1536 },
+"extra_maps": {
+  "512": { "target_dim": 512 },
+  "256": { "target_dim": 256, "type": "random", "seed": 42 },
+  "openai-alt": { "target_dim": 1536, "forward_target": "openai" }
+}
+```
+
+| Route | Forwarder | Adapter |
+|-------|-----------|---------|
+| `POST /v1/embeddings` | global default | global `adapter` |
+| `POST /map/512/v1/embeddings` | global default | `extra_maps["512"]` — target 512, rest from global |
+| `POST /map/256/v1/embeddings` | global default | `extra_maps["256"]` — random projection to 256 |
+| `POST /map/openai-alt/v1/embeddings` | `openai` target | `extra_maps["openai-alt"]` adapter |
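
The `extra_maps["256"]` entry above uses `"type": "random"` with a fixed `seed`. As a rough illustration of why the seed matters, here is a minimal Python sketch of a seeded random projection. It is not vecna's implementation: the Gaussian matrix, the `1/sqrt(target_dim)` scaling, and the helper names `projection_matrix` and `project` are all assumptions made for this example.

```python
import math
import random


def projection_matrix(source_dim, target_dim, seed):
    """Build a target_dim x source_dim Gaussian matrix from a fixed seed.

    Hypothetical helper: vecna's real matrix construction may differ.
    """
    rng = random.Random(seed)  # private generator, so the seed alone determines the matrix
    scale = 1.0 / math.sqrt(target_dim)
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]


def project(vector, matrix):
    """Multiply the matrix by the vector, yielding len(matrix) components."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Small dimensions keep the example fast; the README example maps 1024 -> 256.
same_a = projection_matrix(source_dim=8, target_dim=4, seed=42)
same_b = projection_matrix(source_dim=8, target_dim=4, seed=42)
other = projection_matrix(source_dim=8, target_dim=4, seed=43)

vec = [1.0] * 8
print(len(project(vec, same_a)))                     # number of output components
print(project(vec, same_a) == project(vec, same_b))  # same seed: identical output
print(project(vec, same_a) == project(vec, other))   # different seed: different output
```

Under these assumptions, a different `seed` yields a different matrix, so vectors produced before and after a seed change would not be comparable.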
+
+All fields are overridable per map entry:
+
+| Field | Description |
+|-------|-------------|
+| `forward_target` | Named target from `forward.targets`; empty = global default |
+| `type` | `truncate` / `random` / `projection` |
+| `source_dim` | Source dimension; falls back to global `adapter.source_dim` |
+| `target_dim` | Target dimension |
+| `truncate_mode` | `from_end` / `from_start` |
+| `pad_mode` | `at_end` / `at_start` |
+| `seed` | Seed for random projection |
+| `matrix_file` | Path to projection matrix JSON |
+
+> The same re-embedding warning applies per map — changing any setting for an `extra_maps` entry requires re-embedding all vectors indexed through that endpoint.
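
The fallback rule described in this section (unset fields in a map entry inherit the global `adapter` values) can be sketched in Python as below. The field names come from the table above; the helper `resolve_map`, the choice to treat a missing key as "unset", and the use of `None` for the default forwarder are assumptions for illustration, not the actual `ExtraMapConfig` logic.

```python
# Field names are taken from the README table; the merge logic is an assumption.
ADAPTER_FIELDS = (
    "type", "source_dim", "target_dim",
    "truncate_mode", "pad_mode", "seed", "matrix_file",
)


def resolve_map(config, name):
    """Return the effective settings for extra_maps[name] (hypothetical helper)."""
    global_adapter = config.get("adapter", {})
    entry = config["extra_maps"][name]  # KeyError if the map is not configured

    resolved = {}
    for field in ADAPTER_FIELDS:
        # Prefer the per-map value; otherwise inherit from the global adapter.
        value = entry.get(field, global_adapter.get(field))
        if value is not None:
            resolved[field] = value
    # An empty or missing forward_target means "use the global default forwarder".
    resolved["forward_target"] = entry.get("forward_target") or None
    return resolved


config = {
    "adapter": {"type": "truncate", "source_dim": 1024, "target_dim": 1536},
    "extra_maps": {
        "512": {"target_dim": 512},
        "256": {"target_dim": 256, "type": "random", "seed": 42},
        "openai-alt": {"target_dim": 1536, "forward_target": "openai"},
    },
}

for name in config["extra_maps"]:
    print(name, resolve_map(config, name))
```

With the README's example config, the `"512"` entry keeps the global `type` and `source_dim` and overrides only `target_dim`, which matches the "target 512, rest from global" row in the route table.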
+
+---
+
 ## Truncation and padding modes
 
 ### `truncate_mode` — which part of the vector is kept when downscaling
@@ -242,6 +284,18 @@ POST /v1/models/{model}:embedContent
 POST /v1/models/{model}:batchEmbedContents
 ```
 
+### Extra-map routes
+
+Serve the same backing model with a different adapter per endpoint. The `{mapping}` segment matches a key in `extra_maps`.
+
+```
+POST /map/{mapping}/v1/embeddings
+POST /map/{mapping}/v1/models/{model}:embedContent
+POST /map/{mapping}/v1/models/{model}:batchEmbedContents
+```
+
+All extra-map routes require the same authentication as the standard API routes.
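
To make the `/map/{mapping}/...` pattern concrete, the following Python sketch splits a request path into the mapping key and the remaining standard route. The helper `split_map_route` is hypothetical and says nothing about how vecna's router is written or how it responds to unknown mapping names.

```python
def split_map_route(path):
    """Split '/map/{mapping}/rest' into (mapping, '/rest').

    Returns (None, path) for paths outside the /map/ prefix, which the
    README says are served with the global adapter. Hypothetical helper.
    """
    prefix = "/map/"
    if not path.startswith(prefix):
        return None, path
    mapping, sep, rest = path[len(prefix):].partition("/")
    if not mapping or not sep:
        return None, path  # malformed: no mapping name or no downstream route
    return mapping, "/" + rest


for example in (
    "/v1/embeddings",
    "/map/512/v1/embeddings",
    "/map/openai-alt/v1/models/some-model:embedContent",
):
    print(example, "->", split_map_route(example))
```

The returned mapping name would then be looked up as a key in `extra_maps`.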
+
 ### OpenAPI spec and docs
 
 ```
@@ -263,7 +317,7 @@ GET /docs
 
 ## Prometheus metrics
 
-Enable in config: `metrics.enabled: true`. Scrape at `GET /metrics`.
+Enable in config: `metrics.enabled: true`. Scrape at `GET /metrics`. Human-readable dashboard at `GET /dashboard`.
 
 | Metric | Type | Description |
 |--------|------|-------------|
@@ -276,6 +330,12 @@ Enable in config: `metrics.enabled: true`. Scrape at `GET /metrics`.
 | `vecna_endpoint_errors_total` | counter | Forwarding failures by error type |
 | `vecna_tokens_total` | counter | Tokens consumed, by target, model, and type (`prompt`/`total`) |
 
+### Dashboard
+
+`GET /dashboard` renders a live HTML view of all metrics. Counters show request counts with status-code badges, histograms show p50/p95/p99 latencies, gauges show current endpoint priority and inflight counts.
+
+Auth follows the same rules as `/metrics`: server `api_keys` apply, and `metrics.api_key` adds a second layer if set.
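
The commit does not show how the dashboard derives p50/p95/p99 from histogram data. One common approach, sketched here in Python, is the linear interpolation over cumulative buckets that PromQL's `histogram_quantile` uses. The function below and the sample bucket values are illustrative assumptions, not the dashboard handler's code.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with math.inf,
    mirroring Prometheus `le` buckets. Requires 0 < q <= 1.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must end with a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                return prev_bound  # fall back to the highest finite bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up latency buckets in seconds: 50 requests <= 0.1s, 90 <= 0.5s, and so on.
sample = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
for q in (0.5, 0.95, 0.99):
    print(f"p{round(q * 100)} ~ {histogram_quantile(q, sample):.3f}s")
```

The estimate is only as fine-grained as the bucket boundaries, so values within one bucket are assumed to be evenly spread.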
+
 ---
 
 ## Development
@@ -315,7 +375,7 @@ Starts vecna and an Ollama instance. The `vecna_config` named volume persists th
 ### Onboard (interactive setup)
 
 ```sh
-docker compose run --rm -it vecna onboard --config /config/vecna.json
+docker compose run --rm -it vecna onboard
 ```
 
 Ollama is reachable by hostname on the Docker network — the scanner will find it automatically. After onboarding, restart the proxy:
@@ -327,17 +387,17 @@ docker compose restart vecna
 ### Query
 
 ```sh
-docker compose run --rm vecna query --compact "hello world" --config /config/vecna.json
+docker compose run --rm vecna query --compact "hello world"
 ```
 
 ### Test endpoints
 
 ```sh
 # report latency and dims
-docker compose run --rm vecna test --config /config/vecna.json
+docker compose run --rm vecna test
 
 # test and remove failing endpoints
-docker compose run --rm vecna test --config /config/vecna.json --remove-broken
+docker compose run --rm vecna test --remove-broken
 ```
 
 ### Edit config manually