Tutorial · 7 min read

How to Train Your Chatbot on a Website URL

Train your Alee chatbot on a single website URL: how the crawl works, what gets indexed, and how to fix thin or blocked pages.

Adding a single website URL is the fastest way to give your Alee chatbot real knowledge. You paste one link, Alee crawls that page, turns it into a searchable knowledge brain, and your bot can start answering questions grounded in that content within a minute or two. This guide walks through exactly how the crawl works, what gets indexed, and how to fix the two problems that trip people up most: thin pages and blocked pages.

Before you start

You need an Alee account and at least one bot created. If you do not have one yet, start free — the Free plan gives you one bot and 200 messages a month, which is plenty to test a single URL.

Have the exact page address ready. A few things to check first:

  • Use the live, public URL — the same one you would open in a normal browser window, not a staging or password-protected version.
  • Pick the page that actually holds the answers. If your pricing lives on /pricing and your FAQ lives on /faq, those are better single-URL targets than your homepage, which is often mostly images and headlines.
  • Confirm the page loads without a login. Alee crawls what an anonymous visitor sees. If a human needs to sign in to read it, the crawler cannot reach it either.

Step 1: Open the bot's Sources tab and add a source

  1. From your dashboard, open the bot you want to train.
  2. Go to the bot's Sources area (where all knowledge sources for this bot live).
  3. Choose to add a source and pick the website / URL option (as opposed to sitemap, PDF, YouTube, or pasted text).
  4. Paste the full URL, including https://. For example: https://yourgym.com/membership-plans.
  5. Confirm to start the crawl.
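
Before pasting, it can help to sanity-check the address against the rules above. The following is a minimal sketch using only Python's standard library. The function name `check_source_url` and the specific checks are illustrative assumptions, not part of Alee; it only inspects the shape of the URL and does not fetch anything.

```python
from urllib.parse import urlparse

def check_source_url(url: str) -> list[str]:
    """Return a list of problems with a URL; an empty list means it looks fine."""
    problems = []
    parsed = urlparse(url.strip())
    # The tutorial asks for the full URL, including https://
    if parsed.scheme not in ("http", "https"):
        problems.append("missing or unsupported scheme (expected https://)")
    if not parsed.netloc:
        problems.append("missing host name")
    elif parsed.hostname in ("localhost", "127.0.0.1"):
        # A local address is not a live, public page
        problems.append("host is local, not public")
    return problems

print(check_source_url("https://yourgym.com/membership-plans"))
print(check_source_url("yourgym.com/membership-plans"))
```

An empty list for the first call and two problems for the second reflects why a bare `yourgym.com/...` address is worth fixing before you submit it.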

That is the whole action. Alee does the rest.

Step 2: What happens during the crawl

When you submit a single URL, Alee fetches that one page the way a browser would, then reads the page's main content — headings, paragraphs, list items, tables, and link text. It strips out the parts that are not useful to a chatbot: navigation menus, footers, cookie banners, and styling.
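
To make the idea of "main content" concrete, here is a rough sketch of that kind of stripping using Python's built-in `html.parser`. It is an illustration under assumptions, not Alee's actual extractor: the `SKIP_TAGS` set and the class and function names are hypothetical, and cookie banners are left out because there is no single tag that identifies them.

```python
from html.parser import HTMLParser

# Assumption: these tags hold content a chatbot does not need.
SKIP_TAGS = {"nav", "footer", "script", "style"}

class MainTextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.parts.append(text)

def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)

page = (
    "<html><body><nav><a href='/'>Home</a></nav>"
    "<h1>Membership plans</h1><p>Monthly plan 1499</p>"
    "<footer>Your Gym</footer></body></html>"
)
print(extract_main_text(page))
```

Only the heading and the paragraph survive; the menu link and the footer text are dropped.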

What is left — the actual words on the page — is then split into small overlapping chunks. Each chunk is converted into a vector embedding (a numeric fingerprint of its meaning) and stored in your bot's pgvector index, the "knowledge brain." This is the Advanced RAG method that powers every answer: when a visitor asks something, Alee embeds their question, finds the closest chunks, and the model writes an answer grounded only in what it retrieved, with sources attached. If the answer is not in your content, the bot says it does not know rather than making something up.
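
The chunk, embed, and retrieve steps can be sketched in a few lines of Python. This is a toy model for intuition only. The chunk size, the overlap, and the word-count "embedding" are assumptions chosen to keep the example self-contained; a production system would use a learned embedding model and a vector index such as pgvector rather than an in-memory sort.

```python
import math
from collections import Counter

def chunk_words(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; neighbours share `overlap` words."""
    words = text.split()
    step = size - overlap  # requires size > overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(w.strip(".,?!:;\"'()").lower() for w in text.split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "The Monthly plan costs 1499 rupees.",
    "We offer a free trial week.",
    "Opening hours are 6am to 10pm.",
]
print(top_chunks("How much is the monthly plan?", chunks, k=1))
```

The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk. Word counts cannot capture meaning the way a real embedding does, so treat the ranking here as a demonstration of the mechanics, not of answer quality.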

A few useful details:

  • Only the page you submit gets crawled. A single URL means one page. Alee does not automatically follow links to crawl your whole site. If you want many pages at once, use the sitemap source type instead — that is the right tool for "index everything."
  • The crawl is usually quick — most single pages finish in seconds to a minute, depending on page size.
  • You will see a status. Watch for the source to move to a ready/indexed state. If it errors or comes back empty, that is your signal to troubleshoot (see below).

Step 3: Test the bot before you trust it

Do not assume the crawl worked just because it finished. Open the bot preview and ask three or four real questions whose answers live on that page.

Worked example. Say you crawled https://yourgym.com/membership-plans, which lists a Monthly plan at ₹1,499 and a Quarterly plan at ₹3,999 with a free trial week.

  • Ask: "How much is the monthly membership?" — a good answer cites ₹1,499 and points to the source.
  • Ask: "Is there a free trial?" — it should mention the trial week.
  • Ask something not on the page: "Do you have a swimming pool?" — a healthy bot says it does not have that information, instead of inventing a pool.

If the bot answers the first two correctly and declines the third, your single-URL training is working. After that, repeated or similar questions are served from cache, so they come back instantly.

What gets indexed (and what does not)

It helps to know the crawler's blind spots up front:

  • Visible text is indexed. Body copy, headings, lists, table cells, and FAQ entries all make it in.
  • Text inside images is not. A pricing table saved as a JPG, or an infographic, is invisible to the crawler. Put that information in real text on the page, or paste it as a text source.
  • Content loaded after heavy interaction may be missed. Accordions, tabs, "load more" buttons, and content that only appears after a click can be hit or miss. If your key content is hidden behind a tab, consider linking to a plain version of it.
  • Video and audio are not transcribed from a web page. For a YouTube video, use the dedicated YouTube source type, which reads the transcript.
  • PDFs linked on the page are not pulled in. Add those separately using the PDF / documents source type.

Troubleshooting thin or blocked pages

Two failure modes cover almost every problem.

Thin pages (the crawl works but the bot knows nothing)

A "thin" page returns a success but barely any usable text. Common causes and fixes:

  • The page is mostly a hero image and a headline. There is little for the model to retrieve. Fix: point at a content-rich page instead (your FAQ, docs, or a long pricing page), or paste the key facts as a text/FAQ source.
  • The real content is an image or embed. Re-type that information as actual text on the page, then re-crawl.
  • The content is rendered by heavy scripts and never appears as plain text. Try a more static version of the page, or paste the important answers directly as text.
  • You picked a shallow landing page. Landing pages sell; they rarely explain. Choose the page that answers questions.

After fixing the page, re-crawl the source so the brain picks up the new text. You can re-crawl or add sources any time, and the brain grows with each one.
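
If you want a quick way to judge whether a page is thin before you crawl it, you can copy its visible text and count the words. The sketch below does that. The 150-word threshold and the function name `is_thin` are arbitrary assumptions for illustration; Alee does not publish a cutoff, so adjust the number to your own content.

```python
def is_thin(text: str, min_words: int = 150) -> bool:
    """Flag text that has fewer than `min_words` words (threshold is an assumption)."""
    return len(text.split()) < min_words

hero_only = "Welcome to Your Gym. Get fit today."
print(len(hero_only.split()), is_thin(hero_only))
```

A landing page that is just a headline and a tagline will be flagged, while a long FAQ or pricing page will usually pass.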

Blocked pages (the crawl fails or returns nothing)

If Alee cannot fetch the page at all, check these in order:

  1. Login or paywall. If the page needs a sign-in, the crawler cannot see it. Use a public URL, or paste the content as text.
  2. Wrong or redirecting URL. Confirm the link opens cleanly in a private browser window with no redirect chain. Copy the final address from the address bar.
  3. The page blocks bots. Some sites and CDNs block automated requests or hide content behind an "are you human" challenge. If you control the site, allow the crawler or, again, paste the content as a text source.
  4. `noindex` or robots rules. Pages set to discourage indexing may be skipped. If it is your site, you can adjust those rules; if not, fall back to pasting the text.
  5. Server errors or very slow pages. A page that times out or returns a 404/500 cannot be indexed. Confirm it loads fast and returns a normal page.
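
Two items on the checklist above, robots rules and HTTP status codes, can be checked mechanically. The following is a hedged sketch using Python's standard `urllib.robotparser`. The helper names are hypothetical, the status-to-cause mapping is a simplification, and the default user agent `*` is an assumption because the crawler's real user-agent string is not stated here.

```python
from urllib.robotparser import RobotFileParser

def robots_allows(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against the text of a robots.txt file, without any network call."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

def diagnose_status(status: int) -> str:
    """Map an HTTP status code to the most likely cause from the checklist."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect: use the final address"
    if status in (401, 403):
        return "login required or bots blocked"
    if status == 404:
        return "page not found"
    if status >= 500:
        return "server error"
    return "other client error"

robots = "User-agent: *\nDisallow: /members/\n"
print(robots_allows(robots, "https://yourgym.com/members/plans"))
print(robots_allows(robots, "https://yourgym.com/membership-plans"))
print(diagnose_status(403))
```

Paste in the contents of your own site's `/robots.txt` to see whether the page you want is disallowed. A `noindex` meta tag on the page itself is a separate signal that this sketch does not inspect.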

When in doubt, the universal fallback is the paste text source: copy the page's content into Alee directly. It always works because it skips fetching entirely, and it indexes exactly like a crawled page.

After it works: keep the brain fresh

A single URL is a starting point, not a finish line.

  • Re-crawl after you update the page so the bot reflects new prices, hours, or policies.
  • Add more sources — another URL, your sitemap, a PDF, or pasted FAQs — to widen what the bot can answer. Each source stacks into the same brain.
  • Use the question-triage inbox. Mark questions important, FAQ, or answered, and teach a better answer where the bot fell short. The Top Questions list shows you exactly which pages to add next.

Want to see everything a trained bot can do? Browse the features overview, check the pricing tiers when you are ready to scale to more bots, or read more guides on sitemaps, leads, and embedding. If you are comparing tools, here is Alee vs SiteGPT.

Frequently asked questions

Does adding one URL crawl my whole website?

No. A single URL crawls exactly one page. To index many pages at once, use the sitemap source type, which lets Alee pull in your full site (or a large section of it) in one go.

Why does my bot say "I don't know" after I added a page?

Either the page was thin (little real text), the content was locked inside images or scripts, or the crawl was blocked. Test with questions you know are answered on that page, and if it still draws a blank, re-crawl a content-rich version of the page or paste the key facts as a text source.

How do I update the bot when the page changes?

Open the source and re-crawl it. Alee re-fetches the live page and refreshes the chunks in your knowledge brain, so the bot starts answering from the updated content. Cached answers based on the old content give way to new ones.

Ready to put this into practice? [Start free with Alee](/signup) and train your first bot on a website URL in the next five minutes.
