Text to Markdown: The Complete Conversion Guide
Learn how to convert plain text to markdown precisely — formatting rules, tools, automation tips, and when a markdown converter saves hours.
Plain text is everywhere — pasted from a PDF, copied out of a Word doc, exported from a CMS, scraped from a webpage, or typed into a Notes app. Markdown is what modern publishing, documentation, and AI systems actually want. Knowing how to handle text to markdown conversion cleanly — by hand, with a tool, or via automation — is one of those skills that looks small until the day you need it at scale.
This guide covers the full picture: the syntax you'll use most of the time, the tools worth trusting, a decision framework for choosing between them, common mistakes people make, and how platforms like Alee use markdown internally when they process your content sources.
What "text to markdown" actually means
"Text to markdown" means one of two things depending on context:
- Plain text → markdown: you have unformatted prose and want to add structural markup so it renders with headings, bold, lists, and links.
- Rich text / HTML → markdown: you have content from a web page, Word document, or Google Doc and want to strip out the HTML or proprietary formatting in favor of clean, portable markdown.
Both are legitimate conversions. The underlying goal is the same: get content into a format that's human-readable as raw text, renders beautifully in markdown-aware tools, and plays well with version control, static site generators, documentation platforms, and AI ingestion pipelines.
Markdown was designed by John Gruber in 2004 to be readable even without rendering. That constraint shapes everything about it — the syntax uses symbols that look like what they mean (a # really does look like a heading, **bold** looks emphatic, - item looks like a list bullet).
---
Core markdown syntax you need to know
If you're doing manual conversion, this is your practical reference. These cover the vast majority of real documents.
Headings
```
# Heading 1
## Heading 2
### Heading 3
```
Use # for the page title (H1), ## for major sections, ### for subsections. Don't skip levels — jumping from ## to #### confuses both readers and accessibility tools.
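If you're converting a long document, a short script can flag skipped levels before you publish. Here's a minimal Python sketch. It assumes ATX-style headings that start at the beginning of the line and code fenced with triple backticks, and `find_skipped_levels` is a hypothetical helper name, not part of any library.

```python
import re

FENCE = "`" * 3
ATX_HEADING = re.compile(r"^(#{1,6})(?:\s|$)")  # "# Title", "## Section", ...

def find_skipped_levels(markdown: str) -> list[tuple[int, int, int]]:
    """Return (line number, previous level, level) for every heading that
    jumps down more than one level, such as ## followed directly by ####."""
    problems = []
    previous = 0
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # '#' lines inside code are comments, not headings
            continue
        match = None if in_fence else ATX_HEADING.match(line)
        if match:
            level = len(match.group(1))
            if previous and level > previous + 1:
                problems.append((number, previous, level))
            previous = level
    return problems

print(find_skipped_levels("# Title\n\n## Section\n\n#### Too deep\n"))  # [(5, 2, 4)]
```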
Emphasis and strong
| Intent | Markdown | Output |
|---|---|---|
| Italic | `*word*` or `_word_` | *word* |
| Bold | `**word**` or `__word__` | **word** |
| Bold + italic | `***word***` | ***word*** |
| Strikethrough (GFM) | `~~word~~` | ~~word~~ |
A common mistake: using underscores for emphasis inside a word (like `un_believ_able`). CommonMark and GFM deliberately ignore intraword underscores, so it won't render, while `un*believ*able` does. It also looks wrong in raw form, so asterisks are the safer default everywhere.
Lists
Unordered list — use -, *, or + (pick one and be consistent):
```
- First item
- Second item
  - Nested item (two spaces or a tab before the dash)
```
Ordered list — just use numbers followed by a period:
```
1. Step one
2. Step two
3. Step three
```
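One gotcha: a blank line between items turns a tight list into a loose one, where each item is wrapped in its own paragraph and the spacing grows. Switching bullet characters partway through (from - to *, say) starts a new list in CommonMark, and inconsistent indentation can flatten or break the nesting.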
Links and images
```
[Link text](https://example.com)
![Alt text](https://example.com/image.png)
[Link with title](https://example.com "Hover tooltip")
```
When converting from HTML or Word, a frequent failure is a URL that has lost its https:// scheme: `[docs](example.com/docs)` is treated as a relative path to a file called example.com/docs, not as an external link. Relative links (/about) work fine in static sites but won't resolve in standalone markdown renderers.
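To catch scheme-less links in bulk, a regex pass over the converted file is usually enough. This is a heuristic sketch under stated assumptions: it only checks inline `[text](url)` links (images included), and the short list of top-level domains it treats as "looks external" is an illustrative choice you'd extend for your own content.

```python
import re

# Inline links and images: [text](target) or ![alt](target); captures the target.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)")
# Assumption: a target "looks external" if its host ends in one of these TLDs.
BARE_DOMAIN = re.compile(
    r"^(?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|org|net|io|dev|co)(?:[/?#]|$)",
    re.IGNORECASE,
)

def links_missing_scheme(markdown: str) -> list[str]:
    """Return link targets that look like external domains but lack a scheme."""
    suspects = []
    for target in LINK.findall(markdown):
        if "://" in target or target.startswith(("/", "#", "mailto:")):
            continue  # full URL, site-relative path, in-page anchor, or email link
        if BARE_DOMAIN.match(target):
            suspects.append(target)
    return suspects

print(links_missing_scheme("See [docs](example.com/docs) and [home](https://example.com)."))
# ['example.com/docs']
```

Anything it reports usually just needs https:// prepended; local files like notes.md are left alone because .md isn't in the TLD list.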
Code — inline and block
For inline code (commands, filenames, variable names):
```
Use `npm install` to install dependencies.
```
For fenced code blocks, use three backticks and optionally name the language for syntax highlighting:
````
```javascript
const name = "Alee";
console.log(`Hello from ${name}`);
```
````
The language hint after the opening backticks (javascript, python, bash, json, etc.) is optional but worth adding — it enables syntax highlighting in GitHub, VS Code, and documentation platforms.
Blockquotes and horizontal rules
```
> This is a blockquote. Add a > at the start of each line.
> It can span multiple lines.

---
```
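Three hyphens (---), asterisks (***), or underscores (___) on their own line produce a horizontal rule. Leave a blank line above it: placed directly under a line of paragraph text, --- turns that text into an H2 heading (setext syntax). Three hyphens are also the frontmatter delimiter in many static site generators (Hugo, Jekyll, Astro, Next.js with content layer), which is how this very article is structured.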
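Because only a --- on the very first line opens frontmatter, a script that separates metadata from content has to check position, not just the delimiter. A minimal sketch, assuming YAML-style --- delimiters; `split_frontmatter` is a hypothetical name.

```python
from typing import Optional, Tuple

def split_frontmatter(text: str) -> Tuple[Optional[str], str]:
    """Return (frontmatter, body). Frontmatter exists only if line 1 is '---'."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].rstrip("\r\n") != "---":
        return None, text  # any later '---' is an ordinary horizontal rule
    for index in range(1, len(lines)):
        if lines[index].rstrip("\r\n") == "---":  # closing delimiter
            return "".join(lines[1:index]), "".join(lines[index + 1:])
    return None, text  # never closed: treat the whole file as body

meta, body = split_frontmatter("---\ntitle: Hello\n---\n# Hi\n\n---\n")
print(repr(meta))  # 'title: Hello\n'
```

The trailing --- in the body survives untouched, since it's a horizontal rule rather than a delimiter.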
---
Markdown tables: the format that trips people up most
A markdown table uses pipes (|) and hyphens (-). Here's the template:
```
| Column A | Column B | Column C |
|----------|----------|----------|
| Row 1 | Data | Data |
| Row 2 | Data | Data |
```
The alignment row (second line) is required. You can optionally control column alignment:
| Syntax | Effect |
|---|---|
| `---` | Default (usually rendered left) |
| `:---` | Explicit left |
| `:---:` | Center |
| `---:` | Right |
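Practical tip: don't try to keep column widths visually aligned in raw markdown unless your editor auto-formats tables. It's nice when someone else does it, but it's irrelevant to the rendered output. What matters is the cell count: in GFM, the header row and the alignment row must have the same number of cells or the table isn't recognized at all, and body rows with missing or extra cells get silently padded or truncated.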
Converting a table from Word or a spreadsheet manually is tedious. This is the one case where a table-specific converter (or a VS Code extension like Markdown Table Formatter) saves real time. For a quick one-off, paste the data into any markdown table generator and copy the result.
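If you'd rather script it than paste into a generator, Python's standard csv module gets you most of the way. This is a sketch, not a full generator: it assumes the first row is the header, escapes literal pipes, and pads short rows so every line has the same number of cells. Cells copied straight out of a spreadsheet usually arrive tab-separated, so pass delimiter="\t" for those.

```python
import csv
import io

def csv_to_markdown_table(text: str, delimiter: str = ",") -> str:
    """Convert delimited text (first row = header) into a GFM pipe table."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def cell(value: str) -> str:
        # A bare | would end the cell early, and a newline would end the row.
        return value.replace("|", "\\|").replace("\n", " ").strip()

    def line(row: list[str]) -> str:
        padded = row + [""] * (width - len(row))  # same cell count on every row
        return "| " + " | ".join(cell(v) for v in padded) + " |"

    header, *body = rows
    return "\n".join([line(header), "|" + "---|" * width] + [line(r) for r in body])

print(csv_to_markdown_table("Name,Role\nAda,Engineer\nLin,Writer\n"))
```

For that input the output is a three-line-plus-header table whose alignment row is |---|---|, ready to paste.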
---
Five conversion approaches compared
Here's a decision framework. The right approach depends on volume, source format, and how often you need to do this.
| Approach | Best for | Pros | Cons |
|---|---|---|---|
| Manual typing | One-off, small docs, learning | Full control, no tools needed | Slow at scale |
| Browser-based converters | HTML/Word/Google Docs paste | Fast, no install, handles rich text | Requires copy-paste workflow, no batch |
| CLI tools (pandoc) | Bulk file conversion, scripting | Powerful, supports dozens of formats | Requires installation, steeper CLI learning curve |
| VS Code / editor plugins | Developers, daily writing | Integrated into existing workflow | Only helps if you live in that editor |
| AI-assisted conversion | Complex, semi-structured content | Handles messy formatting intelligently | Requires review, occasional hallucination of structure |
Browser-based tools
For most people, pasting rich text into a converter and getting clean markdown is the fastest path. A few worth knowing:
- Turndown (turndown.js demo page) — converts HTML to markdown, open-source, runs in the browser
- Dillinger.io — live preview editor; paste text on the left, see rendered output on the right
- StackEdit — full-featured online markdown editor with import from Google Drive and Dropbox
- Markdown Table Generator — specific to tables, handy for the most painful part of formatting work
None of these require an account for basic use. If you're converting a one-page document, you'll be done in under two minutes.
Pandoc: the professional-grade converter
Pandoc is the Swiss Army knife of document conversion. If you need to convert documents at scale — or you're moving files from Word (.docx), HTML, LaTeX, reStructuredText, or EPUB — pandoc handles it cleanly.
Basic install on macOS:
```bash
brew install pandoc
```
Convert a Word document to markdown:
```bash
pandoc document.docx -o document.md
```
Convert an HTML file to markdown:
```bash
pandoc page.html -t markdown -o page.md
```
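Strip images from the output (common when converting web pages where you don't have the image assets). A short Lua filter does this: save `function Image(el) return {} end` as remove-images.lua (the file name is just an example) and pass it with --lua-filter. Returning an empty list from an Image function tells pandoc to delete that element.
```bash
pandoc page.html -t markdown --lua-filter=remove-images.lua -o page.md
```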
Pandoc's default markdown output is Pandoc's Markdown, its own extended dialect, not strict CommonMark (use -t commonmark if that's what you need). If you need GitHub Flavored Markdown (GFM), which adds task lists, tables, and strikethrough, add -t gfm:
```bash
pandoc document.docx -t gfm -o document.md
```
The main limitation: pandoc is not perfect with complex multi-column layouts or heavily styled Word documents. It gets the content right but may need manual cleanup for formatting anomalies.
---
Rich text to markdown: the HTML source approach
When you're converting content from a webpage or a web-based tool, going via the HTML source often gives the cleanest result. Here's a reliable workflow:
- Open the source page in your browser.
- View Page Source (Ctrl+U on Windows; Cmd+Option+U in Chrome on a Mac, Cmd+U in Firefox) or copy the element's outer HTML from DevTools.
- Paste the HTML into a tool like Turndown or pipe it through pandoc.
- Review the output, especially headings, links, and any tables.
- Strip any conversion artifacts (empty lines between list items, stray `\` characters, redundant `<br>` conversions).
This is how content migration usually works in practice: a developer pulls the HTML of each page via a script, runs it through pandoc or a custom Turndown-based script, and writes the output to .md files. It's not glamorous, but it's reliable.
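The cleanup step in that workflow is easy to script as a post-processing pass. The sketch below is one set of heuristics, assuming output that looks like typical Turndown or pandoc results; the specific patterns are assumptions rather than a standard, so diff the result before committing it.

```python
import re

def clean_conversion_artifacts(markdown: str) -> str:
    """Heuristic cleanup for common HTML-to-markdown leftovers."""
    text = markdown.replace("\r\n", "\n")
    # 1. A literal <br> followed by more text becomes a CommonMark hard break
    #    (backslash + newline); a <br> at the end of a paragraph is just dropped.
    text = re.sub(r"<br\s*/?>\n?(?=[^\n])", "\\\\\n", text, flags=re.IGNORECASE)
    text = re.sub(r"<br\s*/?>", "", text, flags=re.IGNORECASE)
    # 2. Remove blank lines between consecutive bullet items (loose -> tight list).
    text = re.sub(r"(?m)^([ \t]*[-*+] .*)\n\n(?=[ \t]*[-*+] )", r"\1\n", text)
    # 3. Drop unnecessary escapes inside words, e.g. "step\-by\-step", "file\.txt".
    #    Escapes after digits stay, so "1\." at a line start doesn't become a list.
    text = re.sub(r"(?<=[A-Za-z])\\([-.])", r"\1", text)
    # 4. Collapse three or more consecutive newlines into one blank line.
    return re.sub(r"\n{3,}", "\n\n", text)
```

Running it as the last step of a migration script keeps the manual review focused on real problems (headings, links, tables) instead of whitespace noise.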
For Google Docs specifically, the cleanest path is File → Download → Markdown (.md); Google added native markdown export in 2024. It's not perfect on complex docs, but it's dramatically better than copying and pasting formatted text.
---
Common mistakes in text to markdown conversion
These are the problems that come up repeatedly, especially for people new to markdown or doing bulk conversions.
1. Missing blank lines between block elements
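Markdown is whitespace-sensitive in a specific way. Put a blank line between a paragraph and a list, around headings, and between other block elements. CommonMark parsers tolerate some of these cases (a bullet list is allowed to interrupt a paragraph), but older parsers such as the original Markdown.pl and Python-Markdown fold an unspaced list into the paragraph above it, so the blank line is cheap insurance.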
Wrong:
```
## My section
This paragraph immediately follows.
- Item one
- Item two
```
Right:
```
## My section

This paragraph is separated by blank lines.

- Item one
- Item two
```
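For bulk conversions, the blank-line fix can be automated. A minimal Python sketch, assuming ATX headings, -, *, + or numbered list markers, and triple-backtick fences; `add_block_spacing` is a hypothetical helper, and it treats indented lines as part of the list above so nested items and continuations aren't split apart.

```python
import re

FENCE = "`" * 3
HEADING = re.compile(r"^#{1,6}\s")
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s")  # "- x", "  * y", "1. z"

def add_block_spacing(markdown: str) -> str:
    """Insert blank lines around headings and before a list that directly
    follows a paragraph line. Lines inside fenced code are left untouched."""
    out = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        elif not in_fence and line.strip() and out and out[-1].strip():
            prev = out[-1]
            # Indented lines count as part of the list above (nested items, continuations).
            prev_in_list = bool(LIST_ITEM.match(prev)) or prev[:1] in (" ", "\t")
            if (HEADING.match(line) or HEADING.match(prev)
                    or (LIST_ITEM.match(line) and not prev_in_list)):
                out.append("")
        out.append(line)
    return "\n".join(out) + "\n"
```

Fed the "wrong" example above, it produces the "right" one; already-spaced documents pass through unchanged.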
2. Escaping characters that should be literal
If your text contains characters that markdown treats as special (`*`, `_`, `#`, `` ` ``, `[`, `]`, `(`, `)`, `\`), escape them with a backslash wherever you want them to render literally.
Example: if you're writing "enter 1*2*3 in the field", the *2* would become italic without escaping. Write it as 1\*2\*3 or wrap the whole expression in backticks.
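When you're escaping text programmatically, one regex pass over the characters listed above is enough. A sketch with a hypothetical function name; over-escaping is safe, because CommonMark allows a backslash before any ASCII punctuation character, so escaping a # mid-sentence does no harm.

```python
import re

# The characters listed above: backslash, backtick, * _ # [ ] ( )
SPECIAL = re.compile(r"([\\`*_#\[\]()])")

def escape_markdown(text: str) -> str:
    """Backslash-escape markdown-special characters so they render literally."""
    return SPECIAL.sub(r"\\\1", text)

print(escape_markdown("enter 1*2*3 in the field"))  # enter 1\*2\*3 in the field
```

Backslashes are matched in the same single pass as everything else, so the escapes the function adds are never escaped a second time.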
3. Inconsistent list indentation
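Nested lists require consistent indentation. In CommonMark, a nested item has to be indented at least as far as the text of its parent item: two spaces under "- ", three under "1. ". Python-Markdown (used by MkDocs) expects four spaces. Pick one width that works with your parser and stick to it throughout the document; mixing two-space and four-space indentation within the same list is a common source of rendering bugs.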
4. Using H1 more than once
By convention (markdownlint flags violations as rule MD025), a markdown document has exactly one H1 (#). In practice, especially when converting web content, you'll often encounter pages where the title is in the H1 and every section heading is also H1. Normalize these to ## for sections and ### for subsections.
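That normalization can be scripted: keep the first H1 as the title and shift every later heading down one level, but only when the page actually has more than one H1. A minimal sketch assuming ATX headings and triple-backtick fences; the function name is hypothetical, and H6 headings stay at H6 because markdown has no deeper level.

```python
import re

FENCE = "`" * 3
HEADING = re.compile(r"^(#{1,6})(?:\s|$)")

def demote_extra_h1s(markdown: str) -> str:
    """If a page has several H1s, keep the first as the title and push every
    later heading down one level (H1 -> H2, H2 -> H3, ...)."""
    lines = markdown.splitlines()
    headings = []  # (line index, level) for headings outside fenced code
    in_fence = False
    for index, line in enumerate(lines):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        elif not in_fence and (match := HEADING.match(line)):
            headings.append((index, len(match.group(1))))
    h1_rows = [index for index, level in headings if level == 1]
    if len(h1_rows) <= 1:
        return markdown  # already a single title: leave the structure alone
    for index, level in headings:
        if index > h1_rows[0] and level < 6:  # H6 has nowhere deeper to go
            lines[index] = "#" + lines[index]
    return "\n".join(lines) + "\n"
```

Shifting every later heading (not just the extra H1s) preserves the relative structure, so former H2 subsections become H3 under their new H2 sections.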
5. Line breaks vs paragraph breaks
A single line break in markdown source does NOT produce a line break in the rendered output. You need one of:
- A blank line (for a new paragraph)
- Two trailing spaces at the end of the line (for a `<br>` break within a paragraph; fragile, since many editors strip trailing whitespace, so avoid it where possible)
- A backslash (`\`) at the end of the line (CommonMark syntax for a hard line break)
When converting poetry, lyrics, or code comments that rely on single-line breaks, you'll need to explicitly handle this.
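One way to handle a whole file at once is to append the CommonMark backslash hard break to every line that's followed by another non-blank line. A sketch with a hypothetical function name, assuming blank lines separate stanzas or paragraphs in the source:

```python
def preserve_line_breaks(text: str) -> str:
    """Turn each single line break inside a stanza into a CommonMark hard
    break (trailing backslash). Blank lines still separate paragraphs."""
    lines = text.splitlines()
    out = []
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        # Only mark the break when the next line continues the same stanza;
        # a backslash before a blank line would render as a literal "\".
        if line.strip() and next_line.strip():
            line = line.rstrip() + "\\"
        out.append(line)
    return "\n".join(out)

print(preserve_line_breaks("Roses are red\nViolets are blue\n\nSugar is sweet"))
```

The backslash form is used instead of two trailing spaces because it survives editors and linters that strip trailing whitespace.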
---
Markdown flavors: which one are you targeting?
"Markdown" isn't one spec — it's a family of related formats. When doing your conversion, knowing your target flavor avoids wasted effort.
| Flavor | Used by | Key extras |
|---|---|---|
| CommonMark | Most modern tools, pandoc -t commonmark | Fully spec'd, predictable |
| GitHub Flavored Markdown (GFM) | GitHub, many dev tools | Tables, task lists, ~~strikethrough~~, autolinks |
| MultiMarkdown (MMD) | macOS apps, academic writing | Footnotes, citations, cross-references |
| kramdown | Jekyll (and therefore GitHub Pages) | Attribute lists, footnotes |
| MDX | Next.js, Astro, Docusaurus | JSX components inside markdown |
For general content — blog posts, documentation, chatbot knowledge bases — GFM is the safest default. It's a strict superset of CommonMark with the practical extras (tables, task lists) that most people need.
For developer docs published via a static site generator, check what your generator expects. Hugo supports GFM via Goldmark; Astro and Docusaurus support MDX if you need components; Jekyll's default is kramdown. Using the wrong flavor can cause subtle rendering bugs with tables or frontmatter.
---
Automating text to markdown at scale
If you're doing this once, pick any tool. If you're doing it repeatedly — content migrations, weekly doc imports, CMS exports, ingesting training data — automation is worth the setup time.
A Node.js pipeline using Turndown
Turndown is the most widely used JavaScript library for HTML-to-markdown conversion. A simple script:
```javascript
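// Run as an ES module (e.g. save as convert.mjs) after:
// npm install turndown turndown-plugin-gfm
import TurndownService from 'turndown';
import { gfm } from 'turndown-plugin-gfm';
import fs from 'fs';
const td = new TurndownService({ headingStyle: 'atx', codeBlockStyle: 'fenced' });
td.use(gfm); // adds GFM tables, strikethrough, and task lists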
const html = fs.readFileSync('input.html', 'utf8');
const markdown = td.turndown(html);
fs.writeFileSync('output.md', markdown);
```
The gfm plugin from turndown-plugin-gfm adds table conversion — without it, Turndown will convert HTML tables to text without any table formatting.
Pandoc via a shell script for bulk conversion
```bash
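#!/bin/bash
# Convert all .html files in the current directory to GFM .md files
shopt -s nullglob  # with no .html files, skip the loop instead of passing a literal *.html
for file in *.html; do
  pandoc "$file" -t gfm -o "${file%.html}.md"
  echo "Converted: $file"
done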
```
Add this to a Makefile or a CI pipeline step if you're ingesting external content on a schedule.
Python with markdownify
For Python-based pipelines, markdownify is the cleanest option:
```python
from markdownify import markdownify as md
html = open("page.html").read()
markdownoutput = md(html, headingstyle="ATX", bullets="-")
open("page.md", "w").write(markdown_output)
```
---
How Alee uses markdown in your knowledge brain
When you add content to Alee — whether it's a website URL, a PDF, a YouTube transcript, or pasted text — the platform processes it through an ingestion pipeline that normalizes formatting before chunking and embedding. You don't need to pre-format your content before uploading it. Alee handles the conversion internally using an LLM-backed pipeline.
That said, there are cases where uploading markdown-formatted content gives you better results:
- Structured FAQs: if your content is already in `## Question` / answer format, Alee's chunker can split on headings, keeping each Q&A pair as a coherent chunk rather than splitting mid-paragraph.
- Code documentation: fenced code blocks are preserved in the embedding, which helps the chatbot give accurate syntax-specific answers.
- Numbered steps: an ordered list in markdown survives chunking as a unit better than a block of prose that just happens to mention "first," "second," "third."
If you're migrating a documentation site into Alee and want the highest-quality knowledge brain, running your HTML content through pandoc to produce clean GFM before uploading is worth the extra step. Explore Alee's features to see how the ingestion pipeline and knowledge brain work together.
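To see why heading structure helps retrieval, here's a simple heading-based splitter. To be clear, this is a hypothetical illustration of the general technique, not Alee's actual chunker: it splits on ## headings, keeps any intro text as its own chunk, and doesn't account for # characters inside code fences.

```python
import re

def split_on_headings(markdown: str, level: int = 2) -> list[str]:
    """Split markdown into chunks that each begin with a heading of `level`."""
    marker = re.compile("^" + "#" * level + r"\s", re.MULTILINE)
    starts = [match.start() for match in marker.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep intro text before the first heading as a chunk
    bounds = starts + [len(markdown)]
    chunks = (markdown[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [chunk for chunk in chunks if chunk]

faq = "# FAQ\n\n## How do I reset my password?\nUse the link.\n\n## Can I export data?\nYes, as CSV.\n"
for chunk in split_on_headings(faq):
    print(repr(chunk))
```

Each question lands in the same chunk as its answer, which is exactly what a prose page without headings can't guarantee.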
---
Choosing the right tool: a quick decision guide
Use this checklist to pick the right approach for your situation:
- One document, rich text source (Word/Docs/Notion): export to markdown directly if available (Google Docs, Notion both support this); otherwise paste into Dillinger or StackEdit.
- One document, HTML source: paste into a Turndown demo or use pandoc -t gfm.
- Bulk HTML-to-markdown (content migration): write a Turndown or markdownify script, or use pandoc in a shell loop.
- Complex layouts: pandoc first, then manual cleanup. PDFs need a different route because pandoc can't read PDF input: extract the text first (for example with pdftotext), then rebuild headings, lists, and tables by hand. No tool handles heavily formatted PDFs without errors.
- Ongoing/automated ingestion: build it into your pipeline with Turndown (Node.js) or markdownify (Python); add a lint step with markdownlint.
- Just learning markdown syntax: write it by hand in VS Code with its built-in Markdown preview open. There's no substitute for building the muscle memory.
For teams maintaining documentation at scale, check out the tutorials section for workflows that combine markdown linting, automated conversion, and version control. You can also compare Alee vs SiteGPT to see how structured markdown content affects chatbot quality across platforms.
---
Validating and linting your markdown output
Converting is only half the job. The other half is making sure the output is clean and consistent. A few tools that help:
- markdownlint (`npm install -g markdownlint-cli`) catches common issues: inconsistent heading levels, trailing spaces, missing blank lines, hard tabs, improper list indentation. Run it as a pre-commit hook or in CI.
- remark: a pluggable markdown processor; use it to reformat, lint, and programmatically transform markdown files.
- VS Code Markdown extension — built-in, gives you live preview, outline view, and link checking.
- mdformat (Python) — opinionated formatter that normalizes markdown to a consistent style, similar to Black for Python.
A simple markdownlint check:
```bash
markdownlint "**/*.md" --ignore node_modules
```
Add a .markdownlint.json to your project to configure rules — for example, disabling the rule that limits line length (MD013) if you write long prose paragraphs:
```json
{
"MD013": false
}
```
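A missed closing fence is worth checking for directly, since it swallows everything after it. A minimal pre-check sketch with a hypothetical function name, assuming triple-backtick fences only: it treats every line that starts with three backticks as a fence toggle and ignores ~~~ fences.

```python
from typing import Optional

FENCE = "`" * 3

def find_unclosed_fence(markdown: str) -> Optional[int]:
    """Return the 1-based line number of a code fence that is never closed, or None."""
    open_line: Optional[int] = None
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            # An opening fence records its line; the next fence closes it.
            open_line = number if open_line is None else None
    return open_line

print(find_unclosed_fence("intro\n" + FENCE + "python\nprint(1)\n"))  # 2
```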
---
Key takeaways
- Markdown is a lightweight markup language: plain text with a few extra characters that render as formatted output.
- Most text-to-markdown conversions take under a minute manually if you know the core syntax; tools handle bulk or complex cases.
- Pasting rich text (HTML, Word, Google Docs) into a markdown converter is faster than formatting by hand.
- Tables, code blocks, and nested lists are the three spots where manual conversion most often goes wrong.
- Alee accepts your content in plain text, markdown, PDFs, and URLs — it normalizes everything into its knowledge brain automatically.
- Validate your markdown before publishing; a missed backtick can break an entire code block.
- GFM (GitHub Flavored Markdown) is the right default for most modern docs, static sites, and chatbot knowledge bases.
---
Frequently asked questions
What's the fastest way to convert a Word document to markdown?
Open the document in Google Docs (upload via Drive), then use File → Download → Markdown (.md). This preserves headings, bold, italic, lists, and basic tables. For more complex documents or batch conversion, pandoc is more reliable: pandoc document.docx -t gfm -o document.md.
Why does my markdown not render correctly after converting from HTML?
The most common causes are: missing blank lines between block elements, incorrect list indentation, or special characters (like * and _) that markdown is interpreting as formatting. Run your output through markdownlint to catch systematic issues, and check the blank-line rules manually in any sections that look broken.
Is there a difference between plain text to markdown and rich text to markdown?
Yes. Plain text conversion is mostly about adding markup — inserting ## before headings, - before list items, ** around bold phrases. Rich text conversion (from HTML, Word, or Docs) is about stripping out one formatting system and replacing it with markdown syntax. Rich text conversion is usually easier to automate because the source already encodes structure; you're just translating it.
Can I use markdown conversion to feed content into a chatbot or AI tool?
Absolutely — and it's one of the most practical uses. Platforms like Alee ingest documents and convert them into a searchable knowledge brain. Uploading clean, well-structured markdown (rather than raw HTML or poorly formatted text) gives the chunking and embedding pipeline cleaner inputs, which typically produces better retrieval accuracy and more precise answers. See Alee's pricing for plans that include unlimited document ingestion.
Which markdown flavor should I use for documentation?
For most modern documentation, GitHub Flavored Markdown (GFM) is the right default. It's supported by GitHub, GitLab, most static site generators, and documentation platforms like Docusaurus, GitBook, and ReadMe. If you're using a specific generator (Hugo, Jekyll, Astro), check its docs — some have minor parser differences around tables and footnotes that matter in edge cases.
---
Ready to put your markdown content to work as a knowledge-powered chatbot? [Start free](/signup) — Alee turns your docs, website, and PDFs into an AI assistant your visitors can actually talk to.