Embeddings pipeline

Module: packages/knowledge/src/embeddings/
Queue: embeddings (BullMQ)
Default model: text-embedding-3-small (1536d, OpenAI)
Privacy guard: OPENAI_ZDR_CONFIRMED=true required in production (runbook)

Lifecycle

[ MCP store_knowledge ]──INSERT─▶[ knowledge_entries (embedding=NULL) ]
            │
            └─enqueueEmbedding({ knowledgeId, orgId })──▶[ BullMQ embeddings queue ]
                                                                     │
                                                                     ▼
                                       [ worker: processEmbeddingJob ]
                                          1. SELECT row by id
                                          2. buildEmbeddingInput({ title, body, tags })
                                          3. provider.embed([input]) — OpenAI / mock / Ollama
                                          4. UPDATE embedding + embedding_model
                                          5. Redis counters (best-effort)
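
The five worker steps in the diagram can be sketched as a single function. This is a minimal illustration, not the module's actual code: the `KnowledgeRow` shape, the in-memory `table`, the `recordMetrics` callback, and the "skip if the row is gone" behaviour are all assumptions made so the example runs on its own.

```typescript
// Hypothetical shapes for illustration; the real module's types may differ.
interface KnowledgeRow {
  id: string;
  title: string;
  tags: string[];
  body: string;
  embedding: number[] | null;
  embeddingModel: string | null;
}

interface EmbeddingProvider {
  name: string;
  dims: number;
  embed(texts: string[]): Promise<number[][]>;
}

// In-memory stand-in for the knowledge_entries table.
const table = new Map<string, KnowledgeRow>();

async function processEmbeddingJob(
  job: { knowledgeId: string; orgId: string },
  provider: EmbeddingProvider,
  recordMetrics: (chars: number) => Promise<void>,
): Promise<"embedded" | "skipped"> {
  // 1. SELECT row by id. Skipping a missing row is an assumed behaviour.
  const row = table.get(job.knowledgeId);
  if (!row) return "skipped";

  // 2. Build the text to embed (title, tags, body).
  const input = `${row.title}\n\n${row.tags.join(" ")}\n\n${row.body}`;

  // 3. Ask the provider for one vector; errors propagate so the queue can retry.
  const [vector] = await provider.embed([input]);
  if (!vector || vector.length !== provider.dims) {
    throw new Error(`provider returned a vector that is not ${provider.dims} dims`);
  }

  // 4. UPDATE embedding + embedding_model.
  row.embedding = vector;
  row.embeddingModel = provider.name;

  // 5. Metrics are best-effort: a failure here is logged and swallowed.
  try {
    await recordMetrics(input.length);
  } catch (err) {
    console.warn("embedding metrics write failed", err);
  }
  return "embedded";
}
```

Only step 5 is wrapped in `try/catch`; a failure in steps 1 to 4 rejects the promise, which is what lets the queue's retry policy take over.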

Provider interface

provider.ts defines:

interface EmbeddingProvider {
  name: string; // persisted on the row as embedding_model
  dims: number; // must match knowledge_entries.embedding column type
  embed(texts: string[]): Promise<number[][]>;
}

Two implementations ship today; a third (Ollama) is planned:

name	use	failure modes
`openAIEmbeddingProvider()`	production (default)	rate limits (429), 5xx, ZDR not confirmed
`createMockEmbeddingProvider()`	tests; deterministic SHA-256 → unit vector	`failTimes`, `alwaysFail` knobs
_(Ollama, future)_	Pro tier "data stays in EU"	not implemented in Phase 0

Each row stores the provider's name in embedding_model, so stale embeddings can be found with a simple query after a model migration (e.g. rows still tagged embedding_model='text-embedding-3-small' after a move to 'v2').
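
To make the "deterministic SHA-256 → unit vector" idea concrete, here is one way a mock provider could be built against the interface above. The names `hashToUnitVector` and `createMockProvider`, the block-counter expansion scheme, and the provider name `"mock"` are assumptions for illustration; the real `createMockEmbeddingProvider()` may work differently.

```typescript
import { createHash } from "node:crypto";

interface EmbeddingProvider {
  name: string;
  dims: number;
  embed(texts: string[]): Promise<number[][]>;
}

// Hypothetical helper: expands SHA-256 digests into `dims` values, then
// L2-normalises them so the result is a unit vector.
function hashToUnitVector(text: string, dims: number): number[] {
  const values: number[] = [];
  for (let block = 0; values.length < dims; block++) {
    // One digest gives 32 bytes; the block counter makes each digest distinct.
    const digest = createHash("sha256").update(`${block}:${text}`).digest();
    for (let i = 0; i < digest.length && values.length < dims; i++) {
      values.push(digest[i] / 127.5 - 1); // map 0..255 onto -1..1
    }
  }
  const norm = Math.sqrt(values.reduce((sum, v) => sum + v * v, 0));
  return values.map((v) => v / norm);
}

// failTimes: how many initial embed() calls reject, to exercise queue retries.
function createMockProvider(
  opts: { dims?: number; failTimes?: number } = {},
): EmbeddingProvider {
  const dims = opts.dims ?? 1536;
  let failuresLeft = opts.failTimes ?? 0;
  return {
    name: "mock", // assumed value; this is what would land in embedding_model
    dims,
    async embed(texts: string[]): Promise<number[][]> {
      if (failuresLeft > 0) {
        failuresLeft--;
        throw new Error("mock provider: simulated failure");
      }
      return texts.map((t) => hashToUnitVector(t, dims));
    },
  };
}
```

Because the vector depends only on the input text, two runs of the same test see identical embeddings, and `failTimes` lets a test assert that a job succeeds on a later attempt.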

Queue defaults

setting	value	reason
`attempts`	3	OpenAI 429s are typically transient
`backoff`	exponential, base 10 s	first retry after 10 s, second after 20 s (3 attempts allow only two retries)
`jobId`	`knowledgeId`	idempotency — a repeat enqueue coalesces
`removeOnComplete`	`{ count: 500 }`	keep recent history for forensics
`removeOnFail`	`{ age: 7 * 24 * 3600 }`	7 days of failures for triage
`concurrency` (worker)	4	comfortably within OpenAI rate limits; tune later if needed
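
As a rough sketch, the defaults in the table could be expressed as plain objects shaped like BullMQ job and worker options, together with a small helper that spells out the retry schedule. The object and function names here are hypothetical, the wiring into an actual queue is not shown, and the delay formula assumes exponential backoff doubles from the base delay on each retry.

```typescript
// Plain objects mirroring the table; in BullMQ these would be passed as job
// options when adding a job and as worker options (wiring not shown).
const embeddingJobDefaults = {
  attempts: 3,
  backoff: { type: "exponential", delay: 10_000 }, // milliseconds
  removeOnComplete: { count: 500 },
  removeOnFail: { age: 7 * 24 * 3600 }, // seconds
};

const workerDefaults = { concurrency: 4 };

// jobId = knowledgeId, so enqueuing the same entry twice coalesces.
function jobOptionsFor(knowledgeId: string) {
  return { ...embeddingJobDefaults, jobId: knowledgeId };
}

// Delay before each retry: base * 2^(retry - 1). `attempts` includes the
// first try, so there are attempts - 1 retries.
function retryDelaysMs(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < attempts; retry++) {
    delays.push(baseDelayMs * 2 ** (retry - 1));
  }
  return delays;
}
```

With `attempts: 3` and a 10 s base, `retryDelaysMs(3, 10_000)` yields two delays, 10 s and 20 s; a 40 s delay would only appear if a fourth attempt were allowed.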

Cost / volume metrics

After every successful job the worker increments:

key	meaning
`embeddings:generated:total`	Lifetime count of vectors written.
`embeddings:tokens:month:YYYY-MM`	Approximate tokens billed this calendar month (estimated as chars / 4).

Errors during the metric write are warning-level only — they never fail the job.
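
The two counters and the best-effort rule might look like the following. The `CounterClient` interface is a hypothetical stand-in for the Redis client, and both the round-up in the token estimate and the use of UTC for the month boundary are assumptions rather than documented behaviour.

```typescript
// Hypothetical minimal counter client; the real worker talks to Redis.
interface CounterClient {
  incrBy(key: string, amount: number): Promise<number>;
}

// chars / 4 approximation from the table above; rounding up is an assumption.
function approxTokens(input: string): number {
  return Math.ceil(input.length / 4);
}

// Builds embeddings:tokens:month:YYYY-MM. Using UTC is an assumption.
function monthlyTokensKey(now: Date): string {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth() + 1;
  return `embeddings:tokens:month:${year}-${month < 10 ? "0" : ""}${month}`;
}

// Returns false instead of throwing, so a metrics outage never fails the job.
async function recordEmbeddingMetrics(
  client: CounterClient,
  input: string,
  now: Date = new Date(),
): Promise<boolean> {
  try {
    await client.incrBy("embeddings:generated:total", 1);
    await client.incrBy(monthlyTokensKey(now), approxTokens(input));
    return true;
  } catch (err) {
    console.warn("embedding metrics write failed", err);
    return false;
  }
}
```

Keying the token counter by month means a monthly cost estimate is a single read, and old months can be left in place or expired without touching the lifetime total.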

Input shape + truncation

buildEmbeddingInput({ title, body, tags }) returns:

${title}\n\n${tags.join(' ')}\n\n${body}

Trimmed to MAX_INPUT_CHARS = 24 000, which is roughly 6 000 tokens at the chars/4 estimate and so well inside the 8 192-token OpenAI cap. A character-based cap is fine for English/Czech; if we ever store CJK-heavy content we'll swap to a real tokeniser.
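
A minimal sketch of the input builder, assuming truncation is a plain cut at the character cap (the real function may trim differently, for example at a word boundary):

```typescript
const MAX_INPUT_CHARS = 24_000;

function buildEmbeddingInput(entry: {
  title: string;
  body: string;
  tags: string[];
}): string {
  // Title, space-joined tags, and body, separated by blank lines.
  const text = `${entry.title}\n\n${entry.tags.join(" ")}\n\n${entry.body}`;
  // slice() counts UTF-16 code units, so this is a character cap, not a token cap.
  return text.slice(0, MAX_INPUT_CHARS);
}
```

Because the title and tags come first, a very long body is what gets cut, and the most descriptive fields always survive truncation.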

Boot guard

assertOpenAIZdrConfirmed() throws when NODE_ENV=production unless OPENAI_ZDR_CONFIRMED=true. The MCP entry point calls it before constructing the OpenAI provider. logOpenAIBootStatus() prints a one-line confirmation or rejection so ops can grep the boot log.

When OPENAI_EUDR_ENABLED=true, the OpenAI client points at https://eu.api.openai.com/v1. This endpoint is available only on the OpenAI Enterprise tier and is not enabled in Phase 0.
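
The guard and the endpoint switch could be sketched as below. Passing the environment in as an argument is an assumption made to keep the example testable (the real functions are described as taking no arguments), and `resolveOpenAIBaseUrl` is a hypothetical helper name.

```typescript
type Env = Record<string, string | undefined>;

// Throws in production unless zero-data-retention has been explicitly confirmed.
function assertOpenAIZdrConfirmed(env: Env): void {
  if (env.NODE_ENV === "production" && env.OPENAI_ZDR_CONFIRMED !== "true") {
    throw new Error("OPENAI_ZDR_CONFIRMED=true is required in production");
  }
}

// Hypothetical helper: picks the EU endpoint only when explicitly enabled.
function resolveOpenAIBaseUrl(env: Env): string {
  return env.OPENAI_EUDR_ENABLED === "true"
    ? "https://eu.api.openai.com/v1"
    : "https://api.openai.com/v1"; // standard OpenAI endpoint
}
```

Comparing against the exact string `"true"` means values such as `"1"` or `"yes"` do not pass the guard, which errs on the side of refusing to boot.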

What this module does NOT do

Search: task 20 layers pgvector cosine + tsvector hybrid search on top.
Re-embedding on model change: there's a runbook for it (Phase 9). Today it is a manual job: bump EMBEDDING_MODEL, then run a sweep script that re-enqueues every row.
DLQ wiring: BullMQ already retains failures for 7 days; an explicit DLQ queue lands with the verification engine (task 32), where retry semantics are richer.
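
For the manual re-embedding job, a sweep could be sketched as follows. The source describes re-enqueuing every row; this sketch instead narrows the sweep to rows whose embedding_model differs from the target model, which is an assumption about one reasonable variant. The `EntryRef` shape and the `enqueue` callback are hypothetical stand-ins for a database query and `enqueueEmbedding`.

```typescript
// Hypothetical projection of a knowledge_entries row.
interface EntryRef {
  id: string;
  orgId: string;
  embeddingModel: string | null; // null means never embedded
}

// Rows that were never embedded or were embedded with a different model.
function selectStale(rows: EntryRef[], currentModel: string): EntryRef[] {
  return rows.filter((row) => row.embeddingModel !== currentModel);
}

// Enqueues one job per stale row and returns how many were enqueued.
function sweep(
  rows: EntryRef[],
  currentModel: string,
  enqueue: (job: { knowledgeId: string; orgId: string }) => void,
): number {
  const stale = selectStale(rows, currentModel);
  for (const row of stale) {
    enqueue({ knowledgeId: row.id, orgId: row.orgId });
  }
  return stale.length;
}
```

Filtering on embedding_model makes the sweep safe to re-run after an interruption, since rows already migrated to the new model are skipped on the next pass.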