pgEdge RAG Server

The pgEdge RAG server is a simple API server that performs Retrieval-Augmented Generation (RAG) over text stored in a PostgreSQL database, using the pgvector extension for similarity search.

What is RAG?

Retrieval-Augmented Generation combines information retrieval with generative AI to produce accurate, grounded responses. Instead of relying solely on an LLM's training data, RAG:

retrieves relevant documents from a knowledge base.
provides those documents as context to the LLM.
generates an answer based on the retrieved information.
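The three steps above can be sketched as a tiny, self-contained loop. This is an illustration, not the server's implementation: a bag-of-words vector stands in for a real embedding model, and `generate` is a hypothetical stub that returns the prompt instead of calling an LLM. The function names and sample documents are assumptions made for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: term-frequency counts (assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Step 1: rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 2: supply the retrieved documents to the LLM as context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def generate(prompt: str) -> str:
    """Step 3: hypothetical placeholder for a completion call; echoes the prompt."""
    return prompt


docs = [
    "pgvector adds vector similarity search to Postgres",
    "BM25 is a keyword ranking function",
    "Ollama runs language models locally",
]
question = "how does Postgres do vector similarity search"
print(generate(build_prompt(question, retrieve(question, docs))))
```

Grounding the prompt in retrieved text is what lets the model answer from your data rather than from its training set alone.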

This approach reduces hallucinations and keeps responses current with your data.

The RAG server supports the following providers:

Provider	Embedding Support	Completion Support
`openai`	Yes	Yes
`anthropic`	No*	Yes
`voyage`	Yes	No
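`ollama`	Yes	Yes

* Anthropic does not provide an embeddings API; use another provider, such as `voyage`, for embeddings.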
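Because embedding and completion support differ by provider, a pipeline's choices only work in certain combinations. The sketch below encodes the provider table as data and checks a pairing, assuming a pipeline can name separate embedding and completion providers (as the per-pipeline embedding model and LLM settings suggest). `PROVIDERS` and `check_pipeline` are hypothetical names, not part of the server's configuration.

```python
# Capabilities transcribed from the provider table above.
PROVIDERS = {
    "openai":    {"embedding": True,  "completion": True},
    "anthropic": {"embedding": False, "completion": True},
    "voyage":    {"embedding": True,  "completion": False},
    "ollama":    {"embedding": True,  "completion": True},
}


def check_pipeline(embedding_provider: str, completion_provider: str) -> list[str]:
    """Return a list of problems with a hypothetical pipeline's provider pairing."""
    problems = []
    if not PROVIDERS.get(embedding_provider, {}).get("embedding", False):
        problems.append(f"{embedding_provider} cannot generate embeddings")
    if not PROVIDERS.get(completion_provider, {}).get("completion", False):
        problems.append(f"{completion_provider} cannot generate completions")
    return problems


print(check_pipeline("voyage", "anthropic"))  # []
print(check_pipeline("anthropic", "voyage"))
```

Pairing `voyage` for embeddings with `anthropic` for completions passes, while the reverse fails on both counts.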

A RAG server is ideal when you have a well-defined use case with predictable query patterns. Consider using a RAG server when:

users ask predictable questions about your products, documentation, or support knowledge base.
you need to maintain strict control over the data that users can access. With RAG, you define both the searchable corpus and the retrieval logic.
performance and cost are critical. A RAG system can be heavily optimised for specific query patterns, with caching, pre-computed embeddings, and finely tuned retrieval algorithms.
your application's queries frequently reference unstructured data like documents, articles, or support tickets.

Features

Multiple Pipelines - Configure separate RAG pipelines for different data sources, each with its own database, embedding model, and LLM.
Hybrid Search - Combines vector similarity (semantic) and BM25 (keyword) search, merging the two rankings with Reciprocal Rank Fusion (RRF).
Multiple Model Providers - Support for OpenAI, Anthropic, Voyage, and Ollama embedding and completion models.
Token Budget Management - Automatically manages context size to control LLM costs.
Streaming Responses - Optional real-time streaming via Server-Sent Events.
TLS Support - Built-in HTTPS support for production deployments.
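Reciprocal Rank Fusion merges ranked lists without needing their raw scores to be comparable: each document scores the sum of 1 / (k + rank) over every list it appears in, with 1-based ranks. The sketch below is a generic RRF implementation, not the server's code; the constant k = 60 is a commonly used default assumed here, and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) across the lists it appears in,
    where rank is 1-based. k=60 is an assumed, commonly used default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["doc_a", "doc_b", "doc_c"]  # semantic (vector similarity) ranking
bm25_hits = ["doc_c", "doc_a", "doc_d"]    # keyword (BM25) ranking
for doc_id, score in reciprocal_rank_fusion([vector_hits, bm25_hits]):
    print(f"{doc_id}: {score:.5f}")
```

Documents ranked well by both searches (`doc_a`, `doc_c`) rise above those found by only one, which is the point of fusing semantic and keyword results.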
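Token budget management amounts to deciding which retrieved chunks fit into the prompt. One simple policy is greedy packing: walk the chunks best-first and keep each one that still fits. This is a sketch under stated assumptions, not the server's algorithm; token counts are approximated by whitespace-separated words, whereas real tokenizers count differently, and `pack_context` is a hypothetical name.

```python
def approx_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (assumption; real tokenizers differ)."""
    return len(text.split())


def pack_context(chunks: list[str], budget: int) -> list[str]:
    """Greedily keep the highest-ranked chunks whose combined size fits the budget.

    `chunks` is assumed to be sorted best-first. A chunk that would overflow the
    budget is skipped, so smaller lower-ranked chunks may still be added.
    """
    selected, used = [], 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
    return selected


ranked = ["one two three", "four five six seven eight", "nine ten"]
print(pack_context(ranked, budget=6))
```

Skipping an oversized chunk instead of stopping lets the budget fill more completely, at the cost of occasionally including a lower-ranked chunk ahead of a better one that did not fit.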
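A client consuming a streamed response has to split the Server-Sent Events byte stream into events. The sketch below parses the basic SSE wire format (`field: value` lines, events terminated by a blank line, multiple `data:` lines joined with newlines, lines starting with `:` ignored as comments). It makes no assumption about what the RAG server puts inside each event; the sample stream and the `done` event name are hypothetical.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per Server-Sent Event from an iterable of text lines."""
    event: dict[str, str] = {}
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event, but only if it carried data.
            if data_lines:
                event["data"] = "\n".join(data_lines)
                yield event
            event, data_lines = {}, []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip a single leading space after the colon
        if field == "data":
            data_lines.append(value)
        else:
            event[field] = value
    # An unterminated trailing event is discarded, as in the SSE format.


stream = [
    "data: Hello\n",
    "data: world\n",
    "\n",
    ": keep-alive\n",
    "event: done\n",
    "data: end\n",
    "\n",
]
for evt in parse_sse(stream):
    print(evt)
```

In practice the lines would come from an HTTP response read incrementally, so partial answers can be displayed as they arrive.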