RAG
Vector retrieval framework for retrieval-augmented generation. Built-in drivers for pgvector (PostgreSQL), in-memory (testing), and null (disabled). Custom drivers can be added via extend().
The retrievable() mixin integrates vector search directly into your ORM models — chunk, embed, and store on save; remove on delete; semantic search with a single static method.
Uses @strav/brain for embedding generation and @strav/database for pgvector storage.
Installation
bun add @strav/rag
bun strav install rag
The install command copies config/rag.ts into your project. The file is yours to edit.
Setup
1. Register RagManager
Using a service provider (recommended)
import { RagProvider } from '@strav/rag'
app.use(new RagProvider())
The RagProvider registers RagManager as a singleton. It depends on the config provider. Make sure BrainProvider and DatabaseProvider are also registered (for embeddings and pgvector storage respectively).
Manual setup
import RagManager from '@strav/rag'
app.singleton(RagManager)
app.resolve(RagManager)
2. Configure
Edit config/rag.ts:
import { env } from '@strav/kernel'
export default {
default: env('RAG_DRIVER', 'pgvector'),
prefix: env('RAG_PREFIX', ''),
embedding: {
provider: env('RAG_EMBEDDING_PROVIDER', 'openai'),
model: env('RAG_EMBEDDING_MODEL', 'text-embedding-3-small'),
dimension: 1536,
},
chunking: {
strategy: 'recursive',
chunkSize: 512,
overlap: 64,
},
stores: {
pgvector: {
driver: 'pgvector',
},
memory: {
driver: 'memory',
},
null: {
driver: 'null',
},
},
}
3. Set environment variables
RAG_DRIVER=pgvector
RAG_EMBEDDING_PROVIDER=openai
RAG_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...
The embedding provider must be configured in config/ai.ts (via @strav/brain). The RAG package calls brain.embed() under the hood.
4. Enable pgvector (if using PostgreSQL driver)
The pgvector extension must be available in your PostgreSQL installation. The driver creates the extension and table automatically on first use.
-- If not already enabled at the database level:
CREATE EXTENSION IF NOT EXISTS vector;
Retrievable mixin
Add vector search to any model with the retrievable() mixin:
import { BaseModel } from '@strav/database'
import { retrievable } from '@strav/rag'
class Article extends retrievable(BaseModel) {
declare id: number
declare title: string
declare body: string
declare status: string
static tableName = 'articles'
static retrievableAs() {
return 'articles'
}
toRetrievableContent() {
return `${this.title}\n\n${this.body}`
}
toRetrievableMetadata() {
return { source: 'articles', authority: 0.8 }
}
shouldBeRetrievable() {
return this.status === 'published'
}
}
Works with compose() for multiple mixins:
import { compose } from '@strav/kernel'
import { searchable } from '@strav/search'
class Article extends compose(BaseModel, searchable, retrievable) {
// Full-text search AND vector retrieval on the same model
}
retrievableAs
Returns the collection name. Defaults to the model's tableName. Override to customize:
static retrievableAs() {
return 'knowledge_base'
}
toRetrievableContent
Returns the text content to embed. This is the text that gets chunked and turned into vectors. By default, it concatenates all own string properties that don't start with _. Override to control what gets embedded:
toRetrievableContent() {
return `${this.title}\n\n${this.body}`
}
toRetrievableMetadata
Returns extra metadata stored alongside each vector. This metadata can be used for filtering during retrieval and for reranking (e.g., authority and createdAt):
toRetrievableMetadata() {
return {
source: 'regulations',
authority: 0.95,
domain: 'legal',
createdAt: this.createdAt.toISOString(),
}
}
shouldBeRetrievable
Controls whether a specific instance should be vectorized. Defaults to true. Override to conditionally exclude records:
shouldBeRetrievable() {
return this.status === 'published'
}
Chunking
Content is split into chunks before embedding. Each chunk becomes a separate vector in the store. The RAG package ships two chunking strategies:
Recursive (default)
Splits by separators ("\n\n", "\n", ". ", " ") recursively until each piece fits within chunkSize. This preserves paragraph and sentence boundaries when possible, and falls back to character splitting for very long unbroken text.
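The recursive strategy can be sketched roughly like this — a simplified illustration (the package's actual implementation may merge pieces differently):

```typescript
// Minimal sketch of recursive splitting: try each separator in order,
// recursing into pieces that are still too large; fall back to raw
// character splitting when no separator applies.
function recursiveSplit(
  text: string,
  chunkSize: number,
  separators: string[] = ['\n\n', '\n', '. ', ' '],
): string[] {
  if (text.length <= chunkSize) return [text]
  const [sep, ...rest] = separators
  if (sep === undefined) {
    // Unbroken text: split by raw character count.
    const out: string[] = []
    for (let i = 0; i < text.length; i += chunkSize) out.push(text.slice(i, i + chunkSize))
    return out
  }
  const parts = text.split(sep)
  // Separator not present: move on to the next, finer-grained separator.
  if (parts.length === 1) return recursiveSplit(text, chunkSize, rest)
  return parts.flatMap((p) => recursiveSplit(p, chunkSize, rest))
}
```

Paragraph breaks are tried first, so a text like 'aaaa\n\nbbbb' with chunkSize 5 splits cleanly into 'aaaa' and 'bbbb' rather than cutting mid-word.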
Fixed-size
Splits by raw character count with overlap. Simpler and more predictable, but doesn't respect content structure.
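Fixed-size splitting amounts to a sliding window over the text. A minimal sketch, assuming the Chunk shape from the Chunker interface below:

```typescript
// Fixed-size chunking with overlap: each window advances by
// chunkSize - overlap characters. Illustrative, not the package's code.
interface Chunk {
  content: string
  index: number
  startOffset: number
  endOffset: number
}

function fixedChunks(content: string, chunkSize: number, overlap: number): Chunk[] {
  const step = chunkSize - overlap
  if (step <= 0) throw new Error('overlap must be smaller than chunkSize')
  const chunks: Chunk[] = []
  for (let start = 0, i = 0; start < content.length; start += step, i++) {
    const end = Math.min(start + chunkSize, content.length)
    chunks.push({ content: content.slice(start, end), index: i, startOffset: start, endOffset: end })
    if (end === content.length) break
  }
  return chunks
}
```

For example, fixedChunks('abcdefghij', 4, 1) yields 'abcd', 'defg', 'ghij' — each chunk repeating the last character of the previous one.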
Configure in config/rag.ts:
chunking: {
strategy: 'recursive', // 'recursive' or 'fixed'
chunkSize: 512, // max characters per chunk
overlap: 64, // characters of overlap between chunks
}
Custom chunker
Implement the Chunker interface:
import type { Chunker, Chunk } from '@strav/rag'
class SentenceChunker implements Chunker {
chunk(content: string): Chunk[] {
const sentences = content.match(/[^.!?]+[.!?]+/g) ?? [content]
let offset = 0
return sentences.map((s, i) => {
const chunk = {
content: s.trim(),
index: i,
startOffset: offset,
endOffset: offset + s.length,
}
offset += s.length
return chunk
})
}
}
Vectorizing
Instance methods
const article = await Article.find(1)
// Chunk, embed, and store in the vector store
await article.vectorize()
// Remove all chunks for this instance from the vector store
await article.vectorRemove()
vectorize() first removes any existing chunks for the model instance (via deleteBySource), then creates fresh chunks. This makes it safe to call on updates — it's a full re-vectorization.
Bulk import
Import all records from the database into the vector store:
const count = await Article.importAll() // default batch size: 100
const count = await Article.importAll(500) // custom batch size
This fetches rows from the database in batches, calls vectorize() on each, and returns the total number of records processed.
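The batching loop behind importAll() can be sketched as follows — fetchBatch and the row's vectorize() here are stand-ins, not real @strav/rag APIs:

```typescript
// Illustrative batching loop: fetch pages of rows, vectorize each,
// stop on the first empty or partial batch.
type Vectorizable = { vectorize(): Promise<void> }

async function importAllSketch(
  fetchBatch: (offset: number, limit: number) => Promise<Vectorizable[]>,
  batchSize = 100,
): Promise<number> {
  let total = 0
  for (let offset = 0; ; offset += batchSize) {
    const rows = await fetchBatch(offset, batchSize)
    if (rows.length === 0) break
    for (const row of rows) await row.vectorize()
    total += rows.length
    if (rows.length < batchSize) break // last, partial batch
  }
  return total
}
```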
Collection management
// Create the vector collection (with configured dimension)
await Article.createVectorCollection()
// Flush all vectors (remove documents, keep collection)
await Article.flushVectors()
Retrieving
From a model
const result = await Article.retrieve('What is dependency injection?')
The result object:
{
matches: [
{
id: '42_0',
content: 'Dependency injection is a design pattern...',
score: 0.92,
similarity: 0.92,
metadata: { source: 'articles', authority: 0.8, chunkIndex: 0 },
},
// ...
],
query: 'What is dependency injection?',
processingTimeMs: 45,
}
Retrieve options
const result = await Article.retrieve('PSD3 compliance requirements', {
topK: 10,
threshold: 0.7,
filter: { domain: 'legal' },
rerank: {
similarityWeight: 0.6,
authorityWeight: 0.2,
recencyWeight: 0.2,
},
})
Reranking
When rerank options are provided, the pipeline computes a composite score:
finalScore = similarity × similarityWeight
+ authority × authorityWeight
+ recency × recencyWeight
- similarity — cosine similarity from the vector store (0–1)
- authority — metadata.authority value (0–1), set via toRetrievableMetadata()
- recency — time decay: 1 / (1 + ageDays / 30), using metadata.createdAt
Results are re-sorted by composite score. This is designed for the Knowledge Currency Validator, which needs to rank grounding sources by both relevance and freshness.
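Putting the formula together, the composite score for a single match can be computed like this (the Match and Weights shapes are illustrative; the real pipeline lives inside @strav/rag):

```typescript
// Composite rerank score: weighted sum of similarity, authority, and
// a 30-day half-ish time decay, per the formula above.
interface Match { similarity: number; metadata: { authority?: number; createdAt?: string } }
interface Weights { similarityWeight: number; authorityWeight: number; recencyWeight: number }

function compositeScore(m: Match, w: Weights, now = Date.now()): number {
  const authority = m.metadata.authority ?? 0
  const ageDays = m.metadata.createdAt
    ? (now - Date.parse(m.metadata.createdAt)) / 86_400_000 // ms per day
    : Infinity // no timestamp: recency decays to 0
  const recency = 1 / (1 + ageDays / 30)
  return (
    m.similarity * w.similarityWeight +
    authority * w.authorityWeight +
    recency * w.recencyWeight
  )
}
```

A brand-new document (ageDays ≈ 0) gets a recency of 1; a 30-day-old one gets 0.5; matches without createdAt contribute nothing on the recency axis.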
Auto-vectorizing
Register event listeners so models are automatically vectorized on create/update and removed on delete:
Article.bootRetrieval('article')
This hooks into events emitted by generated services (article.created, article.updated, article.synced, article.deleted). Call bootRetrieval() once during app bootstrap.
Vectorization failures are silently caught — they should not break the event pipeline.
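The "silently caught" behavior amounts to wrapping each handler so a failure never escapes into the event pipeline — a hypothetical sketch, not the package's listener code:

```typescript
// Wrap an async handler so that vectorization errors are swallowed
// instead of propagating to the event emitter.
function safeListener<T>(handler: (payload: T) => Promise<void>) {
  return async (payload: T): Promise<void> => {
    try {
      await handler(payload)
    } catch {
      // Swallow the error: a failed vectorization must not break
      // the rest of the event pipeline.
    }
  }
}
```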
rag helper
The rag helper provides a standalone API for working with vector stores directly, without going through a model:
import { rag } from '@strav/rag'
Ingesting content
Chunk, embed, and store content in one call:
const ids = await rag.ingest('knowledge_base', longDocument, {
metadata: { source: 'wiki', authority: 0.9, createdAt: new Date().toISOString() },
sourceId: 'doc-42',
})
// Returns array of generated chunk IDs
Per-call chunking overrides:
const ids = await rag.ingest('regulations', document, {
chunkSize: 256,
overlap: 32,
strategy: 'fixed',
metadata: { domain: 'regulatory', authority: 0.95 },
})
Retrieving content
const result = await rag.retrieve('What changed in PSD3?', {
collection: 'regulations',
topK: 5,
threshold: 0.7,
filter: { domain: 'regulatory' },
rerank: {
similarityWeight: 0.5,
authorityWeight: 0.3,
recencyWeight: 0.2,
},
})
Removing content
// Remove specific chunks by ID
await rag.delete('knowledge_base', ['id-1', 'id-2'])
// Remove all chunks from a source (e.g., all chunks from document 42)
await rag.deleteBySource('knowledge_base', 'doc-42')
// Clear an entire collection
await rag.flush('knowledge_base')
Using the store directly
const store = rag.store() // default store
const store = rag.store('memory') // named store
// Low-level operations
await store.createCollection('my_index', 1536)
await store.upsert('my_index', documents)
const result = await store.query('my_index', queryVector, { topK: 5 })
await store.deleteCollection('my_index')
Collection prefix
The prefix config option prepends a string to all collection names. Useful for multi-environment setups:
RAG_PREFIX=dev_
With this config, Article.retrievableAs() returning 'articles' resolves to the collection dev_articles.
Multiple stores
You can configure and use multiple stores simultaneously:
// Use a named store
const memStore = RagManager.store('memory')
// Or via the helper
rag.store('pgvector')
Custom driver
Register a custom vector store driver with extend():
import { rag } from '@strav/rag'
import type { VectorStore } from '@strav/rag'
rag.extend('pinecone', (config) => {
return new PineconeDriver(config)
})
The factory receives the driver's config object from config/rag.ts. The returned object must implement the VectorStore interface:
interface VectorStore {
readonly name: string
createCollection(collection: string, dimension: number): Promise<void>
deleteCollection(collection: string): Promise<void>
upsert(collection: string, documents: VectorDocument[]): Promise<void>
delete(collection: string, ids: (string | number)[]): Promise<void>
deleteBySource(collection: string, sourceId: string | number): Promise<void>
flush(collection: string): Promise<void>
query(collection: string, vector: number[], options?: QueryOptions): Promise<QueryResult>
}
Then set it as the driver in your config:
// config/rag.ts
export default {
default: 'pinecone',
stores: {
pinecone: {
driver: 'pinecone',
apiKey: env('PINECONE_API_KEY', ''),
environment: env('PINECONE_ENV', 'us-east-1-aws'),
},
},
}
CLI commands
The rag package provides two CLI commands (auto-discovered by the framework):
rag:ingest
Vectorize all records for a model into the vector store:
bun strav rag:ingest app/models/article.ts
bun strav rag:ingest app/models/article.ts --chunk 50
Options:
--chunk — records per batch (default: 100).
rag:flush
Remove all vectors from a model's collection:
bun strav rag:flush app/models/article.ts
Testing
Use the NullDriver to disable RAG in tests:
In .env.test:
RAG_DRIVER=null
Or use the MemoryDriver for in-memory vector search (useful for integration tests that need actual similarity results):
RAG_DRIVER=memory
Swap stores at runtime:
import RagManager, { MemoryDriver } from '@strav/rag'
RagManager.useStore(new MemoryDriver())
Call RagManager.reset() in test teardown to clear cached stores.
Drivers
pgvector
The default driver. Uses PostgreSQL with the pgvector extension. Stores all collections in a single _strav_vectors table with a collection discriminator column.
Table schema (created automatically on first createCollection() call):
CREATE TABLE _strav_vectors (
id BIGSERIAL PRIMARY KEY,
collection VARCHAR(255) NOT NULL,
source_id VARCHAR(255),
content TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
embedding vector(1536),
created_at TIMESTAMPTZ DEFAULT NOW()
);
Similarity is computed via the <=> operator (cosine distance). Per-collection HNSW indexes are created for fast approximate nearest neighbor search.
Requires: PostgreSQL with pgvector extension installed.
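For illustration, the similarity query such a driver might issue can be sketched as a parameterized template — hypothetical SQL; the driver's actual statements may differ:

```typescript
// Hypothetical pgvector similarity query. Cosine distance (<=>) is
// converted to similarity as 1 - distance, and results are ordered by
// distance so the closest vectors come first.
function similaritySql(topK: number): string {
  return `
    SELECT id, content, metadata, 1 - (embedding <=> $1) AS similarity
    FROM _strav_vectors
    WHERE collection = $2
    ORDER BY embedding <=> $1
    LIMIT ${topK}
  `
}
```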
Memory
In-memory vector store that computes cosine similarity in pure JavaScript. Documents are stored in a Map. Ideal for testing — no external dependencies required.
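"Cosine similarity in pure JavaScript" boils down to a dot product over the vector norms — a minimal sketch, not the driver's actual implementation:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Assumes equal-length,
// non-zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

Identical vectors score 1, orthogonal vectors score 0 — which is why the retrieve examples above treat scores near 0.9 as strong matches.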
The MemoryDriver exposes a getCollection() method for test assertions:
const driver = new MemoryDriver()
await driver.upsert('test', documents)
const stored = driver.getCollection('test')
expect(stored.length).toBe(3)
Null
No-op driver. All writes are discarded, queries return empty results. Useful for disabling RAG in specific environments without code changes.
Architecture
The package follows the same Manager + Driver + Mixin + Helper architecture as @strav/search:
RagProvider → registers RagManager singleton
RagManager → config loading, driver resolution, collection naming
VectorStore → interface implemented by drivers
rag helper → convenience API (ingest, retrieve, delete, flush)
retrievable() mixin → ORM integration (vectorize, vectorRemove, importAll)
1:N chunk handling
Unlike searchable() (1 model = 1 search document), retrievable() produces 1 model = N chunks = N vectors. The deleteBySource() method on VectorStore handles cleanup — it removes all chunks sharing the same sourceId (the model's primary key).
When vectorize() is called:
1. Existing chunks for the instance are removed via deleteBySource()
2. The content from toRetrievableContent() is split into chunks
3. Each chunk is embedded via brain.embed()
4. The chunks are stored with sourceId set to the model's primary key
This makes re-vectorization on update idempotent and clean.