Using a Chatbot REST API: A Developer Guide

A practical developer guide to the chatbot REST API: endpoints, auth, streaming, webhooks, error handling, and shipping a production integration.

The embeddable widget in the bottom-right corner of a website is the part of a chatbot most people see. The part developers actually live in is the chatbot API behind it — the HTTP contract that lets your code send a question, get a grounded answer, and do something useful with the result. Once you reach for a chatbot REST API, you stop being limited to "bubble on a marketing site" and start treating the bot as a service: a Slack command, a mobile app screen, a backend that pre-answers support tickets, a voice agent, an internal tool that queries your own knowledge base. This guide is for the engineer who has to make that integration work, ship it, and keep it from paging them at 2 a.m.

We'll stay concrete. You'll see what a typical request and response look like, how authentication and streaming work, how to handle conversation state, what the common failure modes are, and how to move from a quick prototype to something you'd put in front of real users. The examples are vendor-neutral where possible, and where it helps to be specific we'll reference how a platform like Alee exposes these capabilities, since it's built around training a bot on your own content and answering through both a widget and an API.

What a chatbot REST API actually gives you

A REST API for a chatbot is just a set of HTTP endpoints that wrap the same retrieval-and-generation pipeline the widget uses. Strip away the framing and almost every chatbot API does the same handful of things:

Send a message to a specific bot and get back a generated answer.
Ground that answer in your indexed content rather than the model's general training, which is the whole point of a retrieval-augmented generation (RAG) chatbot.
Return citations or source references so you can show where an answer came from.
Maintain conversation context across multiple turns via a session or conversation ID.
Capture structured data such as a lead's contact details, an email address, or a chosen intent.
Fire events (webhooks) when something happens — a new conversation, a captured lead, an unanswered question.

If you've worked with any modern LLM API, the shape will feel familiar: JSON in, JSON out, bearer-token auth, predictable HTTP status codes. The difference is that a chatbot REST API sits one layer up. You're not managing prompts, embeddings, vector search, and re-ranking yourself — the platform does that and hands you a clean endpoint. That's the trade you're making: less control over the internals, far less code to maintain.

REST, streaming, and the two response styles

Most chatbot APIs offer two ways to receive an answer, and you'll pick based on UX:

Blocking (synchronous) responses. You POST a message, the connection holds, and you get the full answer in one JSON body. Simplest to code. Best for backend jobs, batch processing, or anywhere a human isn't watching a cursor blink. The downside is latency: the user waits for the entire generation before seeing a single word.
Streaming responses. The server pushes tokens as they're generated, usually over Server-Sent Events (SSE) or a chunked HTTP stream. This is what makes a chat UI feel alive — words appearing as the model "types." It's more work to parse on the client, but for any interactive surface it's worth it.

A good rule: stream when a person is waiting and reading; block when a machine is waiting and parsing.

Authentication and your first request

Before any real work, you need credentials. Nearly every chatbot REST API authenticates with an API key passed as a bearer token in the Authorization header. Some also scope keys to a specific bot or project so a leaked key can't touch your whole account.

A few non-negotiables that apply regardless of vendor:

Never ship a secret API key to the browser. A key embedded in client-side JavaScript is a key you've published. If you need the bot to run client-side, use the platform's public widget or embed token (designed to be exposed), and keep secret keys on your server.
Store keys in environment variables or a secrets manager, never in source control. Rotate them if they leak.
Use separate keys per environment. A dev key and a production key let you revoke one without taking down the other.
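
As a rough illustration of these rules, a server might resolve its key from the environment once at startup and refuse to boot without it. This is a minimal sketch: `CHATBOT_API_KEY` matches the variable used in this guide's examples, while `loadApiKey` and `authHeaders` are hypothetical helper names, not part of any vendor SDK.

```javascript
// Resolve the secret key from the environment once, at startup.
// loadApiKey and authHeaders are illustrative names for this sketch.
function loadApiKey(env = process.env) {
  const key = env.CHATBOT_API_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    // Fail fast so a missing secret shows up at deploy time,
    // not on the first user request.
    throw new Error("CHATBOT_API_KEY is not set for this environment");
  }
  return key.trim();
}

// Build request headers from the key; never log the returned object.
function authHeaders(key) {
  return {
    Authorization: `Bearer ${key}`,
    "Content-Type": "application/json",
  };
}
```

Because each environment sets its own value for the same variable, revoking the dev key never touches production.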

A minimal request

Here's the canonical first call against a chatbot REST API — sending a single message to a bot and reading the answer. The exact field names vary by provider, but the structure is consistent:

```bash
curl https://api.example.com/v1/chat \
  -H "Authorization: Bearer $CHATBOT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "bot_id": "bot8f2a",
    "message": "What is your refund window?",
    "conversation_id": null
  }'
```

A typical JSON response:

```json
{
  "conversation_id": "conv3b91",
  "message": {
    "role": "assistant",
    "content": "Refunds are available within 30 days of purchase, provided the item is unused."
  },
  "sources": [
    { "title": "Returns & Refunds Policy", "url": "https://yoursite.com/refunds" }
  ],
  "finish_reason": "stop"
}
```

Three things to notice. First, the answer is grounded — that refund window came from indexed content, not invented. Second, you get a conversation_id back even though you passed null; you'll reuse it on the next turn. Third, sources lets you render citations, which is one of the most underrated trust features in a support context.
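
To make the citation point concrete, here is a small sketch that turns a response shaped like the sample above into display text with numbered sources. `formatAnswer` is a hypothetical helper, and the field names simply follow the sample; your provider's response may differ.

```javascript
// Turn a chat response into display text with numbered citations.
// Field names (message.content, sources[].title/url) follow the sample response.
function formatAnswer(response) {
  const text = response?.message?.content ?? "";
  const sources = Array.isArray(response?.sources) ? response.sources : [];
  if (sources.length === 0) return text; // nothing to cite
  const citations = sources
    .map((s, i) => `[${i + 1}] ${s.title} (${s.url})`)
    .join("\n");
  return `${text}\n\nSources:\n${citations}`;
}
```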

The same call in JavaScript

For a Node backend (note this runs server-side, where the key is safe):

```javascript
const res = await fetch("https://api.example.com/v1/chat", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.CHATBOT_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    bot_id: "bot8f2a",
    message: "What is your refund window?",
    conversation_id: existingConversationId ?? null,
  }),
});

if (!res.ok) {
  throw new Error(`Chatbot API error: ${res.status}`);
}

const data = await res.json();
console.log(data.message.content);
```

That's the whole loop for a non-streaming integration. Everything else is refinement.

Managing conversation state

A single question is rarely the whole job. Real conversations have follow-ups — "what about international orders?" only makes sense if the bot remembers you were just asking about refunds. There are two patterns for carrying that context, and a chatbot REST API will use one of them.

Server-managed conversations (the common case)

The platform stores conversation history and you reference it with a conversation_id:

First message: send with conversation_id: null. The API creates a conversation and returns its ID.
Every following message: include that same conversation_id. The server stitches in prior turns automatically.
To start fresh: send null again, or call a dedicated "new conversation" endpoint.

This is the easiest to work with. You hold one string per active chat and let the platform do the memory management. Alee and most hosted platforms default to this model because it keeps your client thin.
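
The three steps above can be sketched as a tiny wrapper that remembers one ID per chat. This assumes the request and response fields from the earlier examples; `createConversation` and the injected `send` function (which would POST the body and return parsed JSON) are hypothetical names for illustration.

```javascript
// Minimal sketch of a client that carries one conversation_id across turns.
// `send` is an injected async function that POSTs the body to the chat
// endpoint and resolves to the parsed JSON response.
function createConversation(send, botId) {
  let conversationId = null; // null asks the API to start a new conversation

  return {
    async ask(message) {
      const data = await send({
        bot_id: botId,
        message,
        conversation_id: conversationId,
      });
      // Reuse whatever ID the server hands back on later turns.
      conversationId = data.conversation_id ?? conversationId;
      return data.message.content;
    },
    reset() {
      conversationId = null; // the next ask() starts fresh
    },
  };
}
```

Injecting `send` keeps the sketch independent of any one HTTP client and makes it easy to exercise without a network.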

Client-managed history (more control, more responsibility)

Some APIs let you pass the full message array on every request:

```json
{
  "bot_id": "bot8f2a",
  "messages": [
    { "role": "user", "content": "What's your refund window?" },
    { "role": "assistant", "content": "Refunds are available within 30 days." },
    { "role": "user", "content": "What about international orders?" }
  ]
}
```

You own the transcript. This gives you flexibility — you can trim, summarize, or edit history before sending — but you also pay for it in tokens and complexity, and you have to decide how much history to keep. For most teams, server-managed conversations are the right default; reach for client-managed only when you genuinely need to manipulate the transcript.
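
If you do manage history yourself, trimming is usually the first thing you need. The sketch below keeps the newest messages that fit a rough character budget. The budget is only a stand-in for a real token count, which depends on the provider, and `trimHistory` is a hypothetical helper.

```javascript
// Keep the most recent messages that fit a rough character budget.
// Characters are a crude proxy for tokens; tune or replace for your provider.
function trimHistory(messages, maxChars = 4000) {
  const kept = [];
  let used = 0;
  // Walk backwards so the newest turns survive.
  for (let i = messages.length - 1; i >= 0; i--) {
    const size = messages[i].content.length;
    // Always keep the latest message, even if it alone exceeds the budget.
    if (kept.length > 0 && used + size > maxChars) break;
    kept.unshift(messages[i]);
    used += size;
  }
  return kept;
}
```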

A note on session identity

Whichever pattern you use, tie conversations to a stable identifier for the end user (a hashed user ID, a session cookie, an anonymous visitor ID). It makes analytics coherent, lets you resume conversations across page loads, and is essential if you want to attribute captured leads to a real person later. If you care about measuring this stuff (and you should), our guide to chatbot analytics and the metrics that matter covers what to track.
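
One way to get a hashed user ID is a keyed hash of your internal ID, so the same user always maps to the same value without the raw ID leaving your system. This is a sketch using Node's built-in `crypto` module; the `u_` prefix, the 32-character truncation, and the function name are arbitrary choices for illustration.

```javascript
const { createHmac } = require("node:crypto");

// Derive a stable, non-reversible identifier from an internal user ID.
// `secret` should come from your secrets manager; the "u_" prefix is arbitrary.
function sessionKeyFor(userId, secret) {
  const digest = createHmac("sha256", secret)
    .update(String(userId))
    .digest("hex");
  return `u_${digest.slice(0, 32)}`;
}
```

Using an HMAC rather than a bare hash means someone who sees the identifier cannot confirm a guessed user ID without also knowing the secret.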

Handling streaming responses

For interactive UIs, streaming is what separates a snappy assistant from a laggy one. Here's how it works in practice with Server-Sent Events.

You send the same request with a streaming flag (often "stream": true or a dedicated endpoint), and instead of one JSON body you receive a sequence of events:

```
data: {"delta": "Refunds "}
data: {"delta": "are available "}
data: {"delta": "within 30 days."}
data: {"event": "done", "conversation_id": "conv3b91"}
```

A minimal browser-safe consumer using the Fetch API and a stream reader:

```javascript
const res = await fetch("/api/chat-proxy", { // your server proxies the real key
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ message, conversationId }),
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = ""; // holds a partial line until its newline arrives
let answer = "";

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // the last piece may be incomplete; keep it for the next read
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const payload = JSON.parse(line.slice(6));
    if (payload.delta) {
      answer += payload.delta;
      renderPartial(answer); // update the UI live
    }
  }
}
```

Practical streaming tips that save real debugging time:

Always proxy streaming through your own server when the bot is user-facing, so the secret key never reaches the browser. Your server holds the key, calls the chatbot REST API, and re-streams the bytes to the client.
Handle partial lines. Network chunks don't respect message boundaries — a data: line can be split across two reads. Buffer until you hit a newline.
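Plan for mid-stream errors. A stream can fail after it's begun. Have the server (or your proxy) emit a terminal event, like the done event in the sample above, that the client can check for. If the stream ends without it, surface a clean "something went wrong, try again" rather than a half-rendered sentence.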
Set sensible timeouts. Streams hold connections open; make sure your proxy and load balancer won't kill a legitimately long generation.
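
The mid-stream error tip can be sketched as a small function that folds already-parsed events into a final result and flags a stream that ended without its terminal event. The event shapes follow the sample stream above (with the ID field assumed to be `conversation_id`); `finalizeStream` and the fallback wording are hypothetical.

```javascript
// Fold parsed stream events into a final result.
// A stream with no "done" event is treated as truncated.
function finalizeStream(events) {
  let answer = "";
  let conversationId = null;
  let completed = false;

  for (const ev of events) {
    if (typeof ev.delta === "string") answer += ev.delta;
    if (ev.event === "done") {
      completed = true;
      conversationId = ev.conversation_id ?? null;
    }
  }

  if (!completed) {
    // Hide the partial text and show a clean retry message instead.
    return {
      ok: false,
      answer: "Something went wrong, please try again.",
      conversationId,
    };
  }
  return { ok: true, answer, conversationId };
}
```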

Errors, rate limits, and retries

The difference between a demo and a production integration is almost entirely in how you handle the unhappy path. A chatbot API will return standard HTTP status codes, and you should map each to a deliberate behavior.

`400 Bad Request` — malformed body, missing bot_id, message too long. Don't retry; fix the request. Log the payload so you can debug.
`401 / 403` — bad or unauthorized key. Don't retry. Alert yourself; this usually means a rotated or misconfigured secret.
`404` — wrong bot or conversation ID. Don't retry blindly; verify the ID.
`429 Too Many Requests` — you've hit a rate limit. Retry, but with backoff (see below).
`5xx` — transient server-side problem. Retry with backoff a few times, then fail gracefully.
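
The mapping above can be captured in one small function so every call site makes the same decision. The action labels here are hypothetical names for this sketch, not values any API returns.

```javascript
// Map an HTTP status code to the behavior described in the list above.
function actionForStatus(status) {
  if (status >= 200 && status < 300) return "ok";
  if (status === 429 || status >= 500) return "retry_with_backoff";
  if (status === 401 || status === 403) return "alert_no_retry";
  if (status === 404) return "verify_ids";
  return "fix_request"; // 400 and other client errors: fix the request, do not retry
}
```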

Exponential backoff with jitter

For 429 and 5xx, naive immediate retries make things worse — every client retrying in lockstep creates a thundering herd. Back off exponentially and add randomness:

```javascript
async function callWithRetry(fn, max = 4) {
  for (let attempt = 0; attempt < max; attempt++) {
    const res = await fn();
    if (res.ok) return res;
    if (res.status !== 429 && res.status < 500) return res; // non-retryable
    const base = 2 ** attempt * 500; // 0.5s, 1s, 2s, 4s
    const jitter = Math.random() * 300;
    await new Promise((r) => setTimeout(r, base + jitter));
  }
  throw new Error("Chatbot API: retries exhausted");
}
```

If the API sends a Retry-After header on a 429, honor it — it's the server telling you exactly how long to wait.
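
A Retry-After value can be either a number of seconds or an HTTP date, so it helps to normalize it before sleeping. The sketch below returns a wait in milliseconds, or `null` when the header is missing or unparseable so the caller can fall back to its own backoff. `retryAfterMs` is a hypothetical helper name.

```javascript
// Convert a Retry-After header value into a wait in milliseconds.
// Returns null when there is nothing usable, so the caller can use backoff instead.
function retryAfterMs(headerValue, now = Date.now()) {
  if (headerValue == null) return null;
  const value = String(headerValue).trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000; // delay given in seconds
  const date = Date.parse(value); // otherwise try the HTTP-date form
  if (Number.isNaN(date)) return null;
  return Math.max(0, date - now); // a date in the past means "retry now"
}
```

With `fetch`, the raw value would typically come from `res.headers.get("retry-after")`.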

Graceful degradation

When the bot genuinely can't answer — API down, retries exhausted, or the question is outside its knowledge — don't show a stack trace. Show a fallback: a contact form, a link to your help center, or a handoff to a human. A bot that says "I'm not sure, but you can reach our team here" beats a spinner that never resolves. This is doubly true for support use cases; our AI customer service guide goes deeper on designing these handoff moments.
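
In code, graceful degradation often comes down to one wrapper that turns any failure into the same friendly fallback. This is a sketch only: `ask` is an injected async function that calls the API and returns parsed JSON, and the fallback copy and contact URL are placeholders you would replace.

```javascript
// Fallback shown when the bot cannot answer. Text and link are placeholders.
const FALLBACK = {
  answered: false,
  text: "I'm not sure, but you can reach our team here.",
  link: "https://yoursite.com/contact",
};

// Wrap any "ask the bot" function so failures never reach the user as errors.
async function askWithFallback(ask, message) {
  try {
    const data = await ask(message);
    const text = data?.message?.content?.trim();
    if (!text) return FALLBACK; // an empty answer counts as unanswered
    return { answered: true, text, sources: data.sources ?? [] };
  } catch (err) {
    // Log for operators; show the user the fallback, not the stack trace.
    console.error("chatbot call failed:", err.message);
    return FALLBACK;
  }
}
```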

Webhooks: letting the bot push to you

Polling an API to ask "anything new?" is wasteful. Webhooks invert it — the platform calls your URL when an event happens. This is how you wire a chatbot into the rest of your stack without writing a polling loop.

Common events a chatbot REST API will emit:

`conversation.started` — a new chat began. Useful for real-time dashboards.
`lead.captured` — the bot collected a name, email, or phone number. Push it straight into your CRM.
`message.unanswered` — the bot couldn't confidently answer. Gold for finding content gaps.
`conversation.handoff` — a visitor asked for a human. Route it to your support queue or Slack.

To consume them, expose an HTTPS endpoint and register its URL in your dashboard:

```javascript
// express.raw keeps the body as a Buffer so the signature can be checked against
// the exact bytes received; most platforms sign the raw payload, but confirm with yours.
app.post("/webhooks/chatbot", express.raw({ type: "application/json" }), (req, res) => {
  // 1. Verify the signature BEFORE trusting the body
  const signature = req.headers["x-signature"];
  if (!verifySignature(req.body, signature, process.env.WEBHOOK_SECRET)) {
    return res.status(401).end();
  }

  // 2. Acknowledge fast, process async
  res.status(200).end();

  // 3. Do the real work off the request path
  const { event, data } = JSON.parse(req.body.toString("utf8"));
  if (event === "lead.captured") {
    enqueue(() => syncToCrm(data));
  }
});
```

Three webhook rules that matter:

Verify signatures. Webhook endpoints are public URLs; without signature verification, anyone can POST fake events. Use the shared secret the platform gives you.
Acknowledge fast, work later. Return 200 immediately and do heavy lifting (CRM sync, emails) in a background job. Slow webhook handlers get retried or marked failed.
Be idempotent. Networks cause duplicate deliveries. Key off an event ID so processing the same lead.captured twice doesn't create two CRM records.
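
Here is one possible shape for the signature check and the duplicate guard. It assumes the platform signs the raw request body (a string or Buffer, not a parsed object) with HMAC-SHA256 and sends the hex digest in a header. That scheme is an assumption, as are the function names, so check your provider's documentation for the real one.

```javascript
const { createHmac, timingSafeEqual } = require("node:crypto");

// Assumption: signature = hex(HMAC-SHA256(secret, rawBody)).
function verifySignature(rawBody, signature, secret) {
  if (typeof signature !== "string") return false;
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Idempotency guard: remember event IDs that were already handled.
// An in-memory Set is only a stand-in; production code would use a
// database or cache so the record survives restarts.
const seenEventIds = new Set();
function isDuplicate(eventId) {
  if (seenEventIds.has(eventId)) return true;
  seenEventIds.add(eventId);
  return false;
}
```

The constant-time comparison matters because a plain string comparison can leak, through timing, how much of a forged signature was correct.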

Webhooks are where a chatbot stops being a silo. Lead capture flowing automatically into your CRM is one of the highest-leverage integrations you can build; if that's your goal, see how we think about lead generation with chatbots.

Going to production

A working call in Postman is not a production integration. Here's the checklist that separates the two.

Security and key hygiene

Secret keys live server-side only. Audit your client bundle to be sure none leaked.
Use per-environment keys and rotate on a schedule.
Put the chatbot API behind your own thin proxy so you control rate limiting, logging, and auth at your edge — and so you can swap providers without touching client code.

Performance and cost

Cache where it's safe. Identical, non-personalized questions ("what are your hours?") can be cached briefly to cut latency and cost. Don't cache anything personalized or conversation-specific.
Set request timeouts on every call. A hung upstream request shouldn't hang your whole endpoint.
Watch your token and request budgets. Long conversation histories cost more each turn. Server-managed conversations help here because the platform optimizes context for you.
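
For the caching point, a short-lived in-memory cache keyed on the normalized question is often enough to start with. This sketch is illustrative: the 60-second default, the normalization rule, and the `createAnswerCache` name are all arbitrary choices, and it should only ever be used for non-personalized questions.

```javascript
// Tiny TTL cache for non-personalized questions.
// `now` is injectable so expiry can be tested without waiting.
function createAnswerCache(ttlMs = 60_000, now = () => Date.now()) {
  const entries = new Map();
  // Treat questions that differ only in case or spacing as the same key.
  const normalize = (q) => q.trim().toLowerCase().replace(/\s+/g, " ");

  return {
    get(question) {
      const key = normalize(question);
      const hit = entries.get(key);
      if (!hit) return undefined;
      if (now() - hit.storedAt > ttlMs) {
        entries.delete(key); // expired
        return undefined;
      }
      return hit.answer;
    },
    set(question, answer) {
      entries.set(normalize(question), { answer, storedAt: now() });
    },
  };
}
```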

Observability

Log every request with a correlation ID, the bot ID, latency, and status — but redact PII from message bodies before logging.
Track the same metrics in production you'd track in the dashboard: resolution rate, unanswered questions, handoff rate. Unanswered questions are a direct to-do list for improving your content.
Alert on error-rate spikes and on auth failures specifically; a sudden 401 wall usually means a rotated key didn't propagate.
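
As a starting point for the logging rule, the sketch below builds one structured log entry per call and masks obvious emails and phone numbers first. The patterns are deliberately simple and will miss some formats, and the field names are hypothetical, so treat this as a first pass rather than a complete PII filter.

```javascript
const { randomUUID } = require("node:crypto");

// Rough patterns for emails and phone-like digit runs.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

// Mask obvious PII before the text is written anywhere.
function redact(text) {
  return text.replace(EMAIL, "[email]").replace(PHONE, "[phone]");
}

// Build one structured log entry per API call.
function buildLogEntry({ botId, status, latencyMs, message, correlationId = randomUUID() }) {
  return { correlationId, botId, status, latencyMs, message: redact(message) };
}
```

Passing the same `correlationId` through your proxy, the API call, and any webhook handling lets you follow a single request across all three.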

A note for regulated industries

If you're integrating a chatbot for a bank, insurer, clinic, or legal or financial services business, scope it deliberately. A bot built on your content is excellent at logistics and FAQs — hours, locations, how to start a claim, what documents to bring, how to book an appointment, where to find a form. It should not be positioned as a source of medical, legal, or financial advice, and it shouldn't improvise answers to questions that carry real consequences. Make human handoff a first-class path: when a conversation crosses into advice territory or a regulated decision, the bot's job is to collect the right context and route the person to a qualified human, quickly and clearly. Treat the API's conversation.handoff event and a clean fallback as required features, not nice-to-haves, in these settings.

Choosing how to integrate: API, widget, or both

Not every integration needs raw API calls. It's worth being honest about the trade-offs.

Use the embeddable widget when you want a chat experience on a website with near-zero engineering. Paste a snippet, done. Most teams start here, and for many that's the whole project. If that's you, our walkthrough on embedding an AI chatbot on your website is the faster path.
Use the chatbot REST API when you need the bot somewhere the widget can't go: a native mobile app, a Slack or Discord bot, a backend that drafts ticket replies, a custom-designed chat UI, or an automated pipeline that queries your knowledge base programmatically.
Use both — and most mature setups do. The widget handles the website; the API powers the integrations around it. Because they're backed by the same trained bot and the same content, answers stay consistent everywhere.

Platforms differ in how much API surface they expose. Some chat tools are widget-first with a thin API; developer-focused options give you fuller programmatic control over conversations, sources, and events. Tools like Intercom and Drift lean heavily toward their own ecosystems, which is great if you live there and more friction if you don't. Alee is designed so the same bot you train on your content is reachable through both the widget and a REST API, which keeps the "answer once, deploy everywhere" promise intact. Whatever you choose, weigh it against how much you actually need to customize — if you want a deeper comparison, we maintain a roundup of SiteGPT alternatives.

A pragmatic build order

If you're starting today, this sequence gets you to a solid integration without backtracking:

Index your content first. The API is only as good as what the bot was trained on. Get your real docs, help center, and policies in before you write a line of integration code.
Prototype with a blocking call. Prove the request/response loop in a script. No streaming, no UI — just confirm you get grounded answers.
Add conversation state. Wire up conversation_id so follow-ups work.
Layer in streaming if you have an interactive UI, proxied through your server.
Harden the edges. Retries with backoff, graceful fallback, timeouts, logging.
Wire webhooks for leads and handoffs so the bot feeds the rest of your stack.
Instrument and iterate. Watch unanswered questions; feed the gaps back into your content.

Each step is independently shippable, which means you're never far from something that works.

Frequently asked questions

What is the difference between a chatbot API and a raw LLM API?

A raw LLM API gives you a model and nothing else — you supply the prompt, manage embeddings, run retrieval, and assemble context yourself. A chatbot REST API sits on top of that and hands you a finished pipeline: it already retrieves from your indexed content, grounds the answer, manages conversation memory, and returns citations. You trade some low-level control for dramatically less code to build and maintain.

Do I need to handle vector search and embeddings myself?

No. That's the main reason to use a chatbot REST API instead of wiring up a model directly. The platform indexes your content, generates embeddings, runs retrieval, and re-ranks results behind the endpoint. You send a question and get a grounded answer back. If you want to understand what's happening under the hood, our explainer on what RAG is covers the mechanics.

How do I keep my API key safe in a web app?

Never put a secret key in client-side code — anything in the browser is public. Keep secret keys on your server and have the browser call your own backend, which then calls the chatbot API. For client-side chat experiences, use the platform's public widget or a purpose-built embed token designed to be exposed, and reserve secret keys strictly for server-to-server calls.

Can I use a chatbot API for regulated industries like finance or healthcare?

Yes, for the right scope. A content-trained bot is well suited to logistics and FAQs — hours, locations, document checklists, how to start a process. It should not give medical, legal, or financial advice, and you should make human handoff a built-in path for anything that crosses into a regulated decision. Treat the handoff event and a clean fallback as required parts of the integration, not optional extras.

What should I do when the bot can't answer a question?

Degrade gracefully. Catch the unanswered case and offer a fallback — a contact form, a help-center link, or a handoff to a human — rather than a dead end or an error. Also log those unanswered questions: they're the most direct signal of where your content has gaps, and closing them is the single best way to raise your bot's resolution rate over time.

Should I use streaming or blocking responses?

Stream when a person is reading the answer in real time — it makes the experience feel fast and responsive. Use blocking responses when a machine is consuming the output, like a backend job, a batch process, or anywhere you just need the final JSON. Streaming is more work to parse on the client, so don't reach for it unless there's a human watching.

Ready to put this into practice? Train a bot on your own content, drop it on your site with a snippet, and reach it from anywhere through a clean REST API — all without standing up your own RAG pipeline. Start free with Alee, ship your first integration this afternoon, and let the same trained bot answer visitors on your website and power every integration around it.
