Tutorial · 8 min read

How Alee Answers: RAG, Embeddings & the Knowledge Brain

A plain-English guide to how Alee uses RAG, embeddings, and pgvector retrieval to chunk your content, ground its answers, cache repeat replies, and self-check before responding.

When a visitor asks your Alee bot a question, it doesn't make up an answer from general knowledge — it looks the answer up in your content first. That whole pipeline has a name: Retrieval-Augmented Generation, or RAG. This guide walks through every stage of how Alee turns your website, PDFs, and FAQs into a searchable "knowledge brain," then retrieves the right pieces and writes a grounded, sourced reply.

The big picture: what RAG actually means

Most chatbots either know nothing about your business or hallucinate confidently. RAG fixes that by splitting the job in two:

Retrieval — find the most relevant snippets from your content for the visitor's exact question.
Generation — hand those snippets to a language model and ask it to write an answer using only what was retrieved.

Alee runs this loop on every message. The model never answers from thin air; it answers from the chunks Alee pulled out of your knowledge brain. If nothing relevant comes back, the bot says it doesn't know instead of inventing something. That single design choice is what makes the answers trustworthy enough to put on a live site.

Step 1: You add knowledge sources

Everything starts with what you feed the bot. In your bot's Sources area, you add one or more knowledge sources:

A website URL — Alee crawls that page and pulls the text.
A whole sitemap or many pages — point Alee at a sitemap and it ingests the pages in one go.
PDFs and documents — upload brochures, manuals, policy docs, price lists.
YouTube videos — Alee uses the video transcript, so spoken content becomes searchable text.
Raw text or FAQ — paste answers directly, perfect for things that live in your head and nowhere else.

You can mix sources freely and add more any time. Re-crawl a page after you update it, drop in a new PDF, paste a fresh FAQ — the brain grows with your business. See the full list on the features page.

Step 2: Chunking — slicing content into bite-size pieces

A 20-page PDF is too big to hand a model whole, and too coarse to search well. So Alee splits each source into chunks — short, self-contained passages of a few sentences each.

Good chunking matters because retrieval works at the chunk level. If a chunk is too big, it mixes several topics and muddies the match. If it's too small, it loses context. Alee handles the splitting for you, but you can help it along:

Use clear headings and short paragraphs in your source pages — natural structure makes for cleaner chunks.
Keep one idea per FAQ entry rather than cramming five questions into one block.
For pricing or specs, lay information out plainly (a line per item) so each fact lands in a clean, focused chunk.
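Alee's exact chunking rules aren't described here, but a minimal sketch shows why headings and short paragraphs help. This hypothetical `chunk_text` function packs whole sentences into chunks of up to `max_chars` characters (300 is an arbitrary assumption) and never lets a chunk cross a blank-line paragraph break, so one idea per paragraph becomes one idea per chunk.

```python
import re

def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Split text into chunks of whole sentences without crossing paragraph breaks."""
    chunks: list[str] = []
    # Blank lines separate paragraphs; each paragraph is chunked on its own
    # so two unrelated ideas never share a chunk.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
        current = ""
        for sentence in sentences:
            if not sentence:
                continue
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                # Adding this sentence would overflow the budget: close the chunk.
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks

doc = (
    "We return your money within 7 days. Contact support to start a refund.\n\n"
    "The gym is open 6 AM to 10 PM daily."
)
for chunk in chunk_text(doc):
    print(chunk)
```

A single sentence longer than `max_chars` still becomes its own oversized chunk in this sketch; real chunkers usually handle that case too, and many add a small overlap between neighbouring chunks to preserve context.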

Step 3: Embeddings — turning words into coordinates

Here's the part that sounds like magic but isn't. Each chunk is passed through an embedding model that converts its meaning into a long list of numbers — a vector. Think of it as plotting every chunk as a point in a giant "meaning map." Chunks about refund policy land near each other; chunks about gym timings cluster somewhere else entirely.

The key property: similarity is about meaning, not exact words. A chunk that says "we'll return your money within 7 days" sits close to the question "can I get a refund?" even though they share almost no words. That's why an Alee bot can answer a question phrased in Hindi-English, casual slang, or a way you never wrote down — as long as the meaning matches something in your content.
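The "meaning map" is just geometry, and closeness between two points is often measured with cosine similarity. The sketch below uses made-up three-number vectors in place of real embeddings (a real model outputs hundreds or thousands of numbers, and which model Alee uses isn't stated here) to show that the refund question points in nearly the same direction as the refund chunk.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made 3-D vectors for illustration only. They are NOT output from any
# real embedding model, which would produce hundreds of dimensions.
refund_chunk = [0.9, 0.1, 0.0]   # "we'll return your money within 7 days"
gym_chunk = [0.0, 0.2, 0.95]     # "open 6 AM to 10 PM daily"
question = [0.85, 0.2, 0.05]     # "can I get a refund?"

print("refund chunk:", round(cosine_similarity(question, refund_chunk), 3))
print("gym chunk:   ", round(cosine_similarity(question, gym_chunk), 3))
```

With these toy values the question scores about 0.99 against the refund chunk and about 0.10 against the gym chunk, which is the ranking retrieval relies on.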

Step 4: pgvector — the knowledge brain that stores it all

All those vectors need a home that can search them fast. Alee stores them in PostgreSQL with pgvector, an extension that adds vector storage and nearest-neighbour search to Postgres. This is the literal "knowledge brain": every chunk's embedding, its original text, and a pointer back to its source, all indexed so Alee can find the nearest points to any new question in milliseconds.

Because the brain is scoped per bot, each bot only ever searches its own content. An Agency plan reseller running ten client bots keeps ten separate brains, with no cross-contamination between a gym's bot and an ecommerce store's bot.
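Per-bot isolation can be pictured as every stored row carrying a bot id and every search filtering on it first. The minimal in-memory sketch below assumes that layout; `BrainStore`, `ChunkRecord`, and `nearest` are hypothetical names, not Alee's code. In PostgreSQL with pgvector, a comparable query against a hypothetical `chunks` table might read `SELECT text, source FROM chunks WHERE bot_id = $1 ORDER BY embedding <=> $2 LIMIT 1`, where `<=>` is pgvector's cosine-distance operator.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

def _cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

@dataclass
class ChunkRecord:
    """One row of a knowledge brain: the vector, the original text, and where it came from."""
    embedding: list[float]
    text: str
    source: str

@dataclass
class BrainStore:
    """Hypothetical in-memory stand-in for a pgvector table, partitioned by bot id."""
    rows: dict[str, list[ChunkRecord]] = field(default_factory=dict)

    def add(self, bot_id: str, record: ChunkRecord) -> None:
        self.rows.setdefault(bot_id, []).append(record)

    def nearest(self, bot_id: str, query: list[float]) -> Optional[ChunkRecord]:
        # Only this bot's rows are scanned, so one bot can never match another bot's chunks.
        candidates = self.rows.get(bot_id, [])
        if not candidates:
            return None
        return max(candidates, key=lambda r: _cosine(query, r.embedding))

store = BrainStore()
store.add("gym-bot", ChunkRecord([0.0, 1.0], "Open 6 AM to 10 PM daily.", "timings page"))
store.add("shop-bot", ChunkRecord([1.0, 0.0], "Refunds within 7 days.", "policy.pdf"))

hit = store.nearest("gym-bot", [1.0, 0.0])
print(hit.text, "| source:", hit.source)
```

Even though the query vector points straight at the shop's refund chunk, the gym bot can only ever return its own timings chunk.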

Step 5: Retrieval — finding the right chunks at question time

Now the live moment. A visitor types a question. Here's what happens in order:

Alee embeds the question with the same embedding model used for your chunks, so it lands in the same meaning map.
It searches the pgvector index for the chunks whose vectors sit closest to the question's vector.
It pulls back the top handful of matches — the passages most likely to contain the answer.

This is semantic search, not keyword search. The visitor doesn't have to use your exact wording. They ask "how much for the annual plan?" and Alee retrieves your pricing chunk even if that page says "yearly subscription."
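Those three retrieval steps amount to a nearest-neighbour ranking. This sketch does an exact, brute-force scan in plain Python and assumes the question has already been embedded with the same model as the chunks; `top_k`, the toy 2-D vectors, and the default `k=3` are illustrative assumptions rather than Alee's settings.

```python
import heapq
import math

def cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def top_k(query: list[float], index: list[tuple[str, list[float]]], k: int = 3) -> list[tuple[float, str]]:
    """Exact nearest-neighbour search: score every chunk, keep the k best (highest first)."""
    scored = ((cosine(query, vector), text) for text, vector in index)
    return heapq.nlargest(k, scored)

# Toy 2-D index; real chunk vectors come from an embedding model.
index = [
    ("Yearly subscription pricing", [0.9, 0.1]),
    ("Open 6 AM to 10 PM daily", [0.1, 0.9]),
    ("Refunds within 7 days", [0.6, 0.6]),
]
question_vec = [1.0, 0.0]  # stand-in for "how much for the annual plan?"

for score, text in top_k(question_vec, index, k=2):
    print(f"{score:.2f}  {text}")
```

Scanning every row is fine for a small brain; at larger scale pgvector also offers approximate indexes (HNSW and IVFFlat) that trade a little recall for much faster lookups.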

Step 6: Grounding — writing the answer from your content only

The retrieved chunks are handed to the model along with strict instructions: answer only from these passages, and if the answer isn't there, say so. This is grounding — tethering the reply to real, retrieved facts.

Two things follow from it:

Sources. Because Alee knows which chunks fed the answer, it can show where the information came from, so visitors (and you) can verify it.
Honest "I don't know." If retrieval comes back empty or weak, the bot tells the visitor it doesn't have that information rather than guessing. No invented prices, no made-up policies.
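Grounding comes down to what goes into the model's prompt and what happens when retrieval is weak. The sketch below assumes each hit arrives as a (score, text, source) tuple; the `min_score` cutoff of 0.75, the prompt wording, and the `FALLBACK` message are hypothetical choices, not Alee's actual instructions. It only builds the prompt and keeps the source list for citation; the model call itself is left out.

```python
from typing import Optional

FALLBACK = "Sorry, I don't have that information yet."

def build_grounded_prompt(
    question: str,
    hits: list[tuple[float, str, str]],  # (similarity score, chunk text, source name)
    min_score: float = 0.75,
) -> Optional[tuple[str, list[str]]]:
    """Return (prompt, sources), or None when no chunk is relevant enough to answer from."""
    strong = [hit for hit in hits if hit[0] >= min_score]
    if not strong:
        return None  # caller should reply with FALLBACK instead of calling the model
    passages = "\n".join(
        f"[{i}] ({source}) {text}" for i, (_, text, source) in enumerate(strong, start=1)
    )
    prompt = (
        "Answer the visitor's question using ONLY the passages below. "
        f'If they do not contain the answer, reply exactly: "{FALLBACK}"\n\n'
        f"Passages:\n{passages}\n\nQuestion: {question}"
    )
    return prompt, [source for _, _, source in strong]

result = build_grounded_prompt(
    "Do you have early morning classes on weekdays?",
    [(0.89, "Morning batches: 6 AM and 7 AM, Monday to Friday.", "timetable.pdf"),
     (0.41, "Parking is available behind the studio.", "faq")],
)
if result is None:
    print(FALLBACK)
else:
    prompt, sources = result
    print(prompt)
    print("Sources:", sources)
```

Returning `None` instead of a prompt is what turns weak retrieval into an honest "I don't know": the model is never asked to answer when nothing relevant was found.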

A quick worked example

Say you run a yoga studio and your timetable PDF says: "Morning batches: 6 AM and 7 AM, Monday to Friday."

That line becomes a chunk, gets embedded, and lands in your pgvector brain.
A visitor asks: "Do you have early morning classes on weekdays?"
Alee embeds the question and finds that timetable chunk as the nearest match.
The model reads the chunk and replies: "Yes — morning batches run at 6 AM and 7 AM, Monday to Friday," with the timetable shown as the source.

Notice the visitor never said "batch," "6 AM," or "PDF." Meaning-based retrieval bridged the gap.

Step 7: Caching — instant answers for repeat questions

Most real chat traffic is repetitive. "What are your timings?" gets asked fifty times a day. Re-running retrieval and generation for every identical question wastes time and your message quota.

So Alee caches answers. When a question matches one it has already answered (or a very similar one), it serves the stored reply instantly — no full round-trip needed. For your visitors that means snappier responses; for you it means your monthly messages stretch further. The cache refreshes naturally as your content and questions evolve.
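One common way to serve "a very similar" question from cache is to key cached answers by the question's embedding and reuse an answer when a new question is close enough. This is a minimal sketch under that assumption; the `SemanticCache` class, its 0.95 threshold, and the `clear()` invalidation hook are hypothetical, since the exact matching and refresh rules Alee uses aren't described here.

```python
import math
from typing import Optional

def _cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

class SemanticCache:
    """Hypothetical answer cache keyed by question embeddings rather than exact strings."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold  # how similar a new question must be to reuse an answer
        self.entries: list[tuple[list[float], str]] = []

    def get(self, question_vec: list[float]) -> Optional[str]:
        best_score, best_answer = 0.0, None
        for cached_vec, answer in self.entries:
            score = _cosine(question_vec, cached_vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, question_vec: list[float], answer: str) -> None:
        self.entries.append((question_vec, answer))

    def clear(self) -> None:
        # Assumed invalidation hook: drop cached answers when a source is re-crawled or edited.
        self.entries.clear()

cache = SemanticCache()
cache.put([1.0, 0.0], "We're open 6 AM to 10 PM daily.")
print(cache.get([0.99, 0.05]))  # near-identical question: returns the cached answer
print(cache.get([0.0, 1.0]))    # unrelated question: returns None (cache miss)
```

The linear scan keeps the example short; a busier cache would find neighbours through a vector index, the same way retrieval does. Setting the threshold too low risks serving a cached answer to a question that only looks similar.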

Step 8: Self-check — guarding against drift

Before an answer goes out, Alee runs a self-check: it verifies the reply is actually supported by the retrieved chunks. This is the last line of defence against an answer that drifts away from your source material. If the answer can't be backed by what was retrieved, the bot falls back to a safe "I don't know" rather than shipping something shaky.
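How Alee performs its self-check isn't specified; in practice such checks are often done by asking a second model whether each claim is supported. As a deliberately crude, runnable stand-in, the sketch below uses word overlap instead: every number in the answer must appear in the retrieved chunks, and at least 60% of the answer's words (an arbitrary assumption) must too. `is_supported`, `finalize`, and the `FALLBACK` text are hypothetical names.

```python
import re

FALLBACK = "Sorry, I don't have that information yet."

def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def is_supported(answer: str, chunks: list[str], min_overlap: float = 0.6) -> bool:
    """Crude grounding check: numbers must appear in the sources and most words must overlap."""
    source_tokens = set().union(*(_tokens(chunk) for chunk in chunks))
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return False
    # Any number in the answer (price, time, day count) must be present in a retrieved chunk.
    numbers = {t for t in answer_tokens if t.isdigit()}
    if not numbers <= source_tokens:
        return False
    overlap = len(answer_tokens & source_tokens) / len(answer_tokens)
    return overlap >= min_overlap

def finalize(answer: str, chunks: list[str]) -> str:
    """Ship the answer only if it passes the check; otherwise fall back safely."""
    return answer if is_supported(answer, chunks) else FALLBACK

chunks = ["Morning batches: 6 AM and 7 AM, Monday to Friday."]
print(finalize("Yes, morning batches run at 6 AM and 7 AM, Monday to Friday.", chunks))
print(finalize("Yes, batches start at 5 AM on weekends.", chunks))
```

The second answer invents a 5 AM weekend batch that no chunk mentions, so `finalize` swaps it for the fallback. A lexical check like this misses paraphrased contradictions, which is one reason model-based checks are popular.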

Grounding, sources, honest fallbacks, and the self-check stack together, and that combination is why you can embed an Alee bot on a live storefront without babysitting every reply.

How to make your knowledge brain answer better

The quality of answers tracks the quality of what you feed in. To get sharper replies:

Add the obvious sources first — your main site, pricing or services page, and a plain FAQ covering your ten most common questions.
Use the question inbox. Alee's analytics show a Top Questions list and a triage inbox. Mark questions as important or as FAQ, and where the bot fumbled, teach it a better answer; that becomes new knowledge.
Re-crawl after changes. Updated your prices or hours? Re-crawl that page so the brain reflects reality.
Cover gaps with pasted text. If something only lives in your head, paste it as a raw-text source. The brain can only retrieve what you've added.
Keep sources focused. A tight, well-structured set of pages beats dumping a hundred half-relevant URLs.

Want to set this up end to end? Start free and add your first source in minutes, or browse more guides for customization, leads, and embedding.

Frequently asked questions

Does Alee make up answers if it can't find them in my content?

No. Alee only answers from the chunks it retrieves from your knowledge brain, and it self-checks each reply for grounding before sending. If the answer isn't in your content, the bot says it doesn't know rather than guessing.

Do I need to understand embeddings or pgvector to use Alee?

Not at all. Chunking, embedding, indexing, retrieval, and caching all happen automatically the moment you add a source. Your only job is to give the bot good content and, optionally, teach better answers from the question inbox.

How fast does the bot pick up new content I add?

Once you add or re-crawl a source, Alee chunks and embeds it into the brain so it's available for retrieval shortly after. Repeat or similar questions are then served from cache for near-instant replies.

Ready to give your content a brain of its own? [Start free with Alee](/signup) and watch your first bot answer from your own words.
