RAG
Vector retrieval framework for retrieval-augmented generation. Built-in drivers for pgvector (PostgreSQL), in-memory (testing), and null (disabled). Custom drivers can be added via extend().
The retrievable() mixin integrates vector search directly into your ORM models — chunk, embed, and store on save; remove on delete; semantic search with a single static method.
Uses @strav/brain for embedding generation and @strav/database for pgvector storage.
Installation
bun add @strav/rag
bun strav install rag
The install command copies config/rag.ts into your project. The file is yours to edit.
Setup
1. Register RagManager
Using a service provider (recommended)
import { RagProvider } from '@strav/rag'
app.use(new RagProvider())
The RagProvider registers RagManager as a singleton. It depends on the config provider. Make sure BrainProvider and DatabaseProvider are also registered (for embeddings and pgvector storage respectively).
Manual setup
import RagManager from '@strav/rag'
app.singleton(RagManager)
app.resolve(RagManager)
2. Configure
Edit config/rag.ts:
import { env } from '@strav/kernel'
export default {
default: env('RAG_DRIVER', 'pgvector'),
prefix: env('RAG_PREFIX', ''),
embedding: {
provider: env('RAG_EMBEDDING_PROVIDER', 'openai'),
model: env('RAG_EMBEDDING_MODEL', 'text-embedding-3-small'),
dimension: 1536,
},
chunking: {
strategy: 'recursive',
chunkSize: 512,
overlap: 64,
},
stores: {
pgvector: {
driver: 'pgvector',
},
memory: {
driver: 'memory',
},
null: {
driver: 'null',
},
},
}
3. Set environment variables
RAG_DRIVER=pgvector
RAG_EMBEDDING_PROVIDER=openai
RAG_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=sk-...
The embedding provider must be configured in config/ai.ts (via @strav/brain). The RAG package calls brain.embed() under the hood.
4. Enable pgvector (if using PostgreSQL driver)
The pgvector extension must be available in your PostgreSQL installation. The driver creates the extension and table automatically on first use.
-- If not already enabled at the database level:
CREATE EXTENSION IF NOT EXISTS vector;
Retrievable mixin
Add vector search to any model with the retrievable() mixin:
import { BaseModel } from '@strav/database'
import { retrievable } from '@strav/rag'
class Article extends retrievable(BaseModel) {
declare id: number
declare title: string
declare body: string
declare status: string
static tableName = 'articles'
static retrievableAs() {
return 'articles'
}
toRetrievableContent() {
return `${this.title}\n\n${this.body}`
}
toRetrievableMetadata() {
return { source: 'articles', authority: 0.8 }
}
shouldBeRetrievable() {
return this.status === 'published'
}
}
Works with compose() for multiple mixins:
import { compose } from '@strav/kernel'
import { searchable } from '@strav/search'
class Article extends compose(BaseModel, searchable, retrievable) {
// Full-text search AND vector retrieval on the same model
}
retrievableAs
Returns the collection name. Defaults to the model's tableName. Override to customize:
static retrievableAs() {
return 'knowledge_base'
}
toRetrievableContent
Returns the text content to embed. This is the text that gets chunked and turned into vectors. By default, it concatenates all own string properties that don't start with _. Override to control what gets embedded:
toRetrievableContent() {
return `${this.title}\n\n${this.body}`
}
toRetrievableMetadata
Returns extra metadata stored alongside each vector. This metadata can be used for filtering during retrieval and for reranking (e.g., authority and createdAt):
toRetrievableMetadata() {
return {
source: 'regulations',
authority: 0.95,
domain: 'legal',
createdAt: this.createdAt.toISOString(),
}
}
shouldBeRetrievable
Controls whether a specific instance should be vectorized. Defaults to true. Override to conditionally exclude records:
shouldBeRetrievable() {
return this.status === 'published'
}
Chunking
Content is split into chunks before embedding. Each chunk becomes a separate vector in the store. The RAG package ships two chunking strategies:
Recursive (default)
Splits by separators ("\n\n", "\n", ". ", " ") recursively until each piece fits within chunkSize. This preserves paragraph and sentence boundaries when possible, and falls back to character splitting for very long unbroken text.
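The recursive strategy can be sketched roughly like this — a simplified illustration (the package's actual implementation may merge pieces differently):

```typescript
// Minimal sketch of recursive splitting: try each separator in order,
// recursing into pieces that are still too large; fall back to raw
// character splitting when no separator applies.
function recursiveSplit(
  text: string,
  chunkSize: number,
  separators: string[] = ['\n\n', '\n', '. ', ' '],
): string[] {
  if (text.length <= chunkSize) return [text]
  const [sep, ...rest] = separators
  if (sep === undefined) {
    // Unbroken text: split by raw character count.
    const out: string[] = []
    for (let i = 0; i < text.length; i += chunkSize) out.push(text.slice(i, i + chunkSize))
    return out
  }
  const parts = text.split(sep)
  // Separator not present: move on to the next, finer-grained separator.
  if (parts.length === 1) return recursiveSplit(text, chunkSize, rest)
  return parts.flatMap((p) => recursiveSplit(p, chunkSize, rest))
}
```

Paragraph breaks are tried first, so a text like 'aaaa\n\nbbbb' with chunkSize 5 splits cleanly into 'aaaa' and 'bbbb' rather than cutting mid-word.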
Fixed-size
Splits by raw character count with overlap. Simpler and more predictable, but doesn't respect content structure.
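Fixed-size splitting amounts to a sliding window over the text. A minimal sketch, assuming the Chunk shape from the Chunker interface below:

```typescript
// Fixed-size chunking with overlap: each window advances by
// chunkSize - overlap characters. Illustrative, not the package's code.
interface Chunk {
  content: string
  index: number
  startOffset: number
  endOffset: number
}

function fixedChunks(content: string, chunkSize: number, overlap: number): Chunk[] {
  const step = chunkSize - overlap
  if (step <= 0) throw new Error('overlap must be smaller than chunkSize')
  const chunks: Chunk[] = []
  for (let start = 0, i = 0; start < content.length; start += step, i++) {
    const end = Math.min(start + chunkSize, content.length)
    chunks.push({ content: content.slice(start, end), index: i, startOffset: start, endOffset: end })
    if (end === content.length) break
  }
  return chunks
}
```

For example, fixedChunks('abcdefghij', 4, 1) yields 'abcd', 'defg', 'ghij' — each chunk repeating the last character of the previous one.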
Configure in config/rag.ts:
chunking: {
strategy: 'recursive', // 'recursive' or 'fixed'
chunkSize: 512, // max characters per chunk
overlap: 64, // characters of overlap between chunks
}
Custom chunker
Implement the Chunker interface:
import type { Chunker, Chunk } from '@strav/rag'
class SentenceChunker implements Chunker {
chunk(content: string): Chunk[] {
const sentences = content.match(/[^.!?]+[.!?]+/g) ?? [content]
let offset = 0
return sentences.map((s, i) => {
const chunk = {
content: s.trim(),
index: i,
startOffset: offset,
endOffset: offset + s.length,
}
offset += s.length
return chunk
})
}
}
Vectorizing
Instance methods
const article = await Article.find(1)
// Chunk, embed, and store in the vector store
await article.vectorize()
// Remove all chunks for this instance from the vector store
await article.vectorRemove()
vectorize() first removes any existing chunks for the model instance (via deleteBySource), then creates fresh chunks. This makes it safe to call on updates — it's a full re-vectorization.
Bulk import
Import all records from the database into the vector store:
const count = await Article.importAll() // default batch size: 100
const count = await Article.importAll(500) // custom batch size
This fetches rows from the database in batches, calls vectorize() on each, and returns the total number of records processed.
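The batching loop behind importAll() can be sketched as follows — fetchBatch and the row's vectorize() here are stand-ins, not real @strav/rag APIs:

```typescript
// Illustrative batching loop: fetch pages of rows, vectorize each,
// stop on the first empty or partial batch.
type Vectorizable = { vectorize(): Promise<void> }

async function importAllSketch(
  fetchBatch: (offset: number, limit: number) => Promise<Vectorizable[]>,
  batchSize = 100,
): Promise<number> {
  let total = 0
  for (let offset = 0; ; offset += batchSize) {
    const rows = await fetchBatch(offset, batchSize)
    if (rows.length === 0) break
    for (const row of rows) await row.vectorize()
    total += rows.length
    if (rows.length < batchSize) break // last, partial batch
  }
  return total
}
```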
Collection management
// Create the vector collection (with configured dimension)
await Article.createVectorCollection()
// Flush all vectors (remove documents, keep collection)
await Article.flushVectors()
Retrieving
From a model
const result = await Article.retrieve('What is dependency injection?')
The result object:
{
matches: [
{
id: '42_0',
content: 'Dependency injection is a design pattern...',
score: 0.92,
similarity: 0.92,
metadata: { source: 'articles', authority: 0.8, chunkIndex: 0 },
},
// ...
],
query: 'What is dependency injection?',
processingTimeMs: 45,
}
Retrieve options
const result = await Article.retrieve('PSD3 compliance requirements', {
topK: 10,
threshold: 0.7,
filter: { domain: 'legal' },
rerank: {
similarityWeight: 0.6,
authorityWeight: 0.2,
recencyWeight: 0.2,
},
})
Reranking
When rerank options are provided, the pipeline computes a composite score:
finalScore = similarity × similarityWeight
+ authority × authorityWeight
+ recency × recencyWeight
- similarity — cosine similarity from the vector store (0–1)
- authority — metadata.authority value (0–1), set via toRetrievableMetadata()
- recency — time decay: 1 / (1 + ageDays / 30), using metadata.createdAt
Results are re-sorted by composite score. This is designed for the Knowledge Currency Validator, which needs to rank grounding sources by both relevance and freshness.
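Putting the formula together, the composite score for a single match can be computed like this (the Match and Weights shapes are illustrative; the real pipeline lives inside @strav/rag):

```typescript
// Composite rerank score: weighted sum of similarity, authority, and
// a 30-day half-ish time decay, per the formula above.
interface Match { similarity: number; metadata: { authority?: number; createdAt?: string } }
interface Weights { similarityWeight: number; authorityWeight: number; recencyWeight: number }

function compositeScore(m: Match, w: Weights, now = Date.now()): number {
  const authority = m.metadata.authority ?? 0
  const ageDays = m.metadata.createdAt
    ? (now - Date.parse(m.metadata.createdAt)) / 86_400_000 // ms per day
    : Infinity // no timestamp: recency decays to 0
  const recency = 1 / (1 + ageDays / 30)
  return (
    m.similarity * w.similarityWeight +
    authority * w.authorityWeight +
    recency * w.recencyWeight
  )
}
```

A brand-new document (ageDays ≈ 0) gets a recency of 1; a 30-day-old one gets 0.5; matches without createdAt contribute nothing on the recency axis.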
Auto-vectorizing
Register event listeners so models are automatically vectorized on create/update and removed on delete:
Article.bootRetrieval('article')
This hooks into events emitted by generated services (article.created, article.updated, article.synced, article.deleted). Call bootRetrieval() once during app bootstrap.
Vectorization failures are silently caught — they should not break the event pipeline.
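The "silently caught" behavior amounts to wrapping each handler so a failure never escapes into the event pipeline — a hypothetical sketch, not the package's listener code:

```typescript
// Wrap an async handler so that vectorization errors are swallowed
// instead of propagating to the event emitter.
function safeListener<T>(handler: (payload: T) => Promise<void>) {
  return async (payload: T): Promise<void> => {
    try {
      await handler(payload)
    } catch {
      // Swallow the error: a failed vectorization must not break
      // the rest of the event pipeline.
    }
  }
}
```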
rag helper
The rag helper provides a standalone API for working with vector stores directly, without going through a model:
import { rag } from '@strav/rag'
Ingesting content
Chunk, embed, and store content in one call:
const ids = await rag.ingest('knowledge_base', longDocument, {
metadata: { source: 'wiki', authority: 0.9, createdAt: new Date().toISOString() },
sourceId: 'doc-42',
})
// Returns array of generated chunk IDs
Per-call chunking overrides:
const ids = await rag.ingest('regulations', document, {
chunkSize: 256,
overlap: 32,
strategy: 'fixed',
metadata: { domain: 'regulatory', authority: 0.95 },
})
Retrieving content
const result = await rag.retrieve('What changed in PSD3?', {
collection: 'regulations',
topK: 5,
threshold: 0.7,
filter: { domain: 'regulatory' },
rerank: {
similarityWeight: 0.5,
authorityWeight: 0.3,
recencyWeight: 0.2,
},
})
Removing content
// Remove specific chunks by ID
await rag.delete('knowledge_base', ['id-1', 'id-2'])
// Remove all chunks from a source (e.g., all chunks from document 42)
await rag.deleteBySource('knowledge_base', 'doc-42')
// Clear an entire collection
await rag.flush('knowledge_base')
Using the store directly
const store = rag.store() // default store
const store = rag.store('memory') // named store
// Low-level operations
await store.createCollection('my_index', 1536)
await store.upsert('my_index', documents)
const result = await store.query('my_index', queryVector, { topK: 5 })
await store.deleteCollection('my_index')
Collection prefix
The prefix config option prepends a string to all collection names. Useful for multi-environment setups:
RAG_PREFIX=dev_
With this config, Article.retrievableAs() returning 'articles' resolves to the collection dev_articles.
Multiple stores
You can configure and use multiple stores simultaneously:
// Use a named store
const memStore = RagManager.store('memory')
// Or via the helper
rag.store('pgvector')
Custom driver
Register a custom vector store driver with extend():
import { rag } from '@strav/rag'
import type { VectorStore } from '@strav/rag'
rag.extend('pinecone', (config) => {
return new PineconeDriver(config)
})
The factory receives the driver's config object from config/rag.ts. The returned object must implement the VectorStore interface:
interface VectorStore {
readonly name: string
createCollection(collection: string, dimension: number): Promise<void>
deleteCollection(collection: string): Promise<void>
upsert(collection: string, documents: VectorDocument[]): Promise<void>
delete(collection: string, ids: (string | number)[]): Promise<void>
deleteBySource(collection: string, sourceId: string | number): Promise<void>
flush(collection: string): Promise<void>
query(collection: string, vector: number[], options?: QueryOptions): Promise<QueryResult>
}
Then set it as the driver in your config:
// config/rag.ts
export default {
default: 'pinecone',
stores: {
pinecone: {
driver: 'pinecone',
apiKey: env('PINECONE_API_KEY', ''),
environment: env('PINECONE_ENV', 'us-east-1-aws'),
},
},
}
CLI commands
The rag package provides two CLI commands (auto-discovered by the framework):
rag:ingest
Vectorize all records for a model into the vector store:
bun strav rag:ingest app/models/article.ts
bun strav rag:ingest app/models/article.ts --chunk 50
Options:
--chunk — records per batch (default: 100).
rag:flush
Remove all vectors from a model's collection:
bun strav rag:flush app/models/article.ts
Testing
Use the NullDriver to disable RAG in tests:
In .env.test:
RAG_DRIVER=null
Or use the MemoryDriver for in-memory vector search (useful for integration tests that need actual similarity results):
RAG_DRIVER=memory
Swap stores at runtime:
import RagManager, { MemoryDriver } from '@strav/rag'
RagManager.useStore(new MemoryDriver())
Call RagManager.reset() in test teardown to clear cached stores.
Drivers
pgvector
The default driver. Uses PostgreSQL with the pgvector extension. Stores all collections in a single _strav_vectors table with a collection discriminator column.
Table schema (created automatically on first createCollection() call):
CREATE TABLE _strav_vectors (
id BIGSERIAL PRIMARY KEY,
collection VARCHAR(255) NOT NULL,
source_id VARCHAR(255),
content TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
embedding vector(1536),
created_at TIMESTAMPTZ DEFAULT NOW()
);
Similarity is computed via the <=> operator (cosine distance). Per-collection HNSW indexes are created for fast approximate nearest neighbor search.
Requires: PostgreSQL with pgvector extension installed.
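For illustration, the similarity query such a driver might issue can be sketched as a parameterized template — hypothetical SQL; the driver's actual statements may differ:

```typescript
// Hypothetical pgvector similarity query. Cosine distance (<=>) is
// converted to similarity as 1 - distance, and results are ordered by
// distance so the closest vectors come first.
function similaritySql(topK: number): string {
  return `
    SELECT id, content, metadata, 1 - (embedding <=> $1) AS similarity
    FROM _strav_vectors
    WHERE collection = $2
    ORDER BY embedding <=> $1
    LIMIT ${topK}
  `
}
```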
Memory
In-memory vector store that computes cosine similarity in pure JavaScript. Documents are stored in a Map. Ideal for testing — no external dependencies required.
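"Cosine similarity in pure JavaScript" boils down to a dot product over the vector norms — a minimal sketch, not the driver's actual implementation:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Assumes equal-length,
// non-zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

Identical vectors score 1, orthogonal vectors score 0 — which is why the retrieve examples above treat scores near 0.9 as strong matches.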
The MemoryDriver exposes a getCollection() method for test assertions:
const driver = new MemoryDriver()
await driver.upsert('test', documents)
const stored = driver.getCollection('test')
expect(stored.length).toBe(3)
Null
No-op driver. All writes are discarded, queries return empty results. Useful for disabling RAG in specific environments without code changes.
Architecture
The package follows the same Manager + Driver + Mixin + Helper architecture as @strav/search:
RagProvider → registers RagManager singleton
RagManager → config loading, driver resolution, collection naming
VectorStore → interface implemented by drivers
rag helper → convenience API (ingest, retrieve, delete, flush)
retrievable() mixin → ORM integration (vectorize, vectorRemove, importAll)
1:N chunk handling
Unlike searchable() (1 model = 1 search document), retrievable() produces 1 model = N chunks = N vectors. The deleteBySource() method on VectorStore handles cleanup — it removes all chunks sharing the same sourceId (the model's primary key).
When vectorize() is called:
1. Existing chunks for the instance are removed via deleteBySource()
2. The content from toRetrievableContent() is split into chunks
3. Each chunk is embedded via brain.embed()
4. The chunks are stored with sourceId set to the model's primary key
This makes re-vectorization on update idempotent and clean.