

An AI chatbot is only as good as the content you train it on. This sounds obvious — but almost every ecommerce store that deploys a chatbot and then complains it "gives wrong answers" or "sounds robotic" has the same root problem: the content fed into the knowledge base was wrong, incomplete, or formatted in a way the AI could not read properly.
The AI did not fail. The content did.
Most guides on this topic give you a generic list: "upload your FAQs, product pages, and return policy." That is true as far as it goes, but it leaves out five critical details: which content types matter most, what format each one needs to be in, how to write content that an AI can actually interpret correctly, what ecommerce-specific content most stores forget entirely, and what happens when each content type is done wrong.
This guide covers all of it — specifically for ecommerce stores, with Shopify context throughout. By the end, you will have a complete picture of what your chatbot training content should contain and exactly how to structure it so your AI chatbot gives accurate, helpful answers from day one.
Why Content Quality Is the Only Variable That Matters
Before the seven content types, it helps to understand why content — not the AI model — is what separates chatbots that genuinely help customers from ones that frustrate them.
Modern AI chatbots for ecommerce work through a method called retrieval-augmented generation (RAG). When a customer asks a question, the AI does not guess the answer from general internet knowledge — it searches through the content you have uploaded, finds the most relevant passages, and generates an answer based on what it found. This means:
If the content does not contain the answer, the AI cannot give it accurately
If the content is ambiguous, the AI will give an ambiguous answer
If the content contradicts itself across different documents, the AI may give different answers to the same question on different occasions
If the content is formatted in a way the AI cannot parse correctly — dense PDFs, heavily formatted tables, nested navigation menus — the retrieval fails silently and the AI falls back on generalities
This is why two stores using the identical chatbot platform can have completely different results. The platform is not the variable. The content is.
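To make the retrieval step concrete, here is a minimal sketch in Python. It is a toy, not how any particular platform works: real RAG systems typically match on embeddings rather than shared words, and the function names, the two sample passages, and the min_overlap threshold are all made up for illustration. What it shows is the behaviour described above: when no uploaded passage covers the question, there is nothing accurate to retrieve.

```python
import re
from typing import Optional


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[str], min_overlap: int = 2) -> Optional[str]:
    """Return the passage sharing the most words with the question.

    Returns None when no passage reaches min_overlap shared words
    (a hypothetical threshold chosen for this sketch).
    """
    question_tokens = tokenize(question)
    best, best_score = None, 0
    for passage in passages:
        score = len(question_tokens & tokenize(passage))
        if score > best_score:
            best, best_score = passage, score
    return best if best_score >= min_overlap else None


# Hypothetical knowledge base content, for illustration only.
knowledge_base = [
    "Standard shipping takes 3 to 5 business days within the UK.",
    "Items can be returned within 30 days of delivery for a full refund.",
]

answerable = retrieve("How many days does standard shipping take?", knowledge_base)
unanswerable = retrieve("Do you offer gift wrapping?", knowledge_base)
```

The first question finds the shipping passage. The second finds nothing, because no content about gift wrapping was ever uploaded.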
The good news: content quality is entirely within your control. Here is what that content should look like across the seven most important types.
1. Product Catalogue and Variant Data
This is the single most impactful content type for ecommerce chatbot performance, and the one most stores get wrong in the same way.
What it is
Your product catalogue includes everything your AI needs to answer product questions: product names, descriptions, variants (sizes, colours, materials, scents, flavours), specifications, compatibility information, use cases, care instructions, and in-stock status.
Why it matters more than any other content type
Across all categories of customer message, product questions are among the highest-volume and highest-intent contacts a store receives. A customer asking "Does this come in a size 14?" or "Is this compatible with iPhone 15?" is a customer who is ready to buy — if they get an immediate, accurate answer. If they do not, they leave.
Product question abandonment (customers who had a question, did not get an answer, and left without purchasing) is a recognised source of lost ecommerce revenue. A chatbot trained on complete, accurate product data can recover part of that revenue by answering instantly, at any hour.
What most stores do wrong
The most common mistake is uploading the raw Shopify product export or the default product page HTML without any editing. Shopify product exports are optimised for import, not for AI reading — they contain HTML tags, Liquid variables, broken formatting, and truncated descriptions that AI cannot parse accurately.
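As a rough illustration of the clean-up involved, the sketch below strips Liquid tags, HTML tags, and HTML entities from a description using only the Python standard library. The sample string and the function name are invented for this example. Regex-based stripping is deliberately crude, so for a real catalogue you would probably want a proper HTML parser and a manual review of the output.

```python
import html
import re


def clean_description(raw: str) -> str:
    """Strip Liquid tags, HTML tags and extra whitespace from a description."""
    # Remove Liquid output tags ({{ ... }}) and logic tags ({% ... %}).
    text = re.sub(r"\{\{.*?\}\}|\{%.*?%\}", " ", raw, flags=re.DOTALL)
    # Remove HTML tags, leaving their inner text in place.
    text = re.sub(r"<[^>]+>", " ", text)
    # Convert entities such as &amp; back into plain characters.
    text = html.unescape(text)
    # Collapse runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical fragment of a product description, for illustration only.
raw = (
    "<p>Soft <strong>merino</strong> jumper &amp; scarf set.</p>"
    "{% if product.available %}<span>{{ product.price }}</span>{% endif %}"
)
cleaned = clean_description(raw)
```

Here cleaned holds only the plain sentence a customer would read, with none of the markup.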
The second most common mistake is incomplete variant data. If a product comes in six sizes and four colours, your training content needs to include meaningful information about each variant — not just "available in multiple sizes." When a customer asks "Does the black one run small?" the AI needs to know whether the black variant has different sizing from the others, not just that a black option exists.
How to format it correctly
For each product, create a clean text document structured as follows:
Product name and category (as customers describe it, not internal SKU codes)
What it is and who it is for (2–3 plain English sentences)
Key specifications (dimensions, materials, weight, capacity — whatever is relevant to your category)
Variants and what distinguishes them (size guide if relevant, colour differences if any affect function or appearance)
Compatibility or use case notes ("works with", "suitable for", "not recommended for")
Care and usage instructions
Stock status format — include a note on how to interpret in-stock vs. pre-order vs. out-of-stock language consistently
For stores with large catalogues, this does not mean writing a document for every single product. Group products by category and variant structure, and write shared context at the category level with product-specific details at the individual level.
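One way to keep every product document in the same shape is to generate it from a small template. The sketch below follows the structure listed above. The ProductDoc class, its field names, and the sample product are all hypothetical, and your chatbot platform may expect a different layout.

```python
from dataclasses import dataclass, field


@dataclass
class ProductDoc:
    """Plain-text product document following the structure described above."""
    name: str
    category: str
    summary: str
    specs: dict[str, str] = field(default_factory=dict)
    variants: dict[str, str] = field(default_factory=dict)
    compatibility: list[str] = field(default_factory=list)
    care: str = ""
    stock_note: str = ""

    def render(self) -> str:
        """Render the document as clean text with one fact per line."""
        lines = [
            f"Product: {self.name} ({self.category})",
            f"What it is: {self.summary}",
        ]
        lines += [f"Specification - {key}: {value}" for key, value in self.specs.items()]
        lines += [f"Variant - {key}: {value}" for key, value in self.variants.items()]
        lines += [f"Compatibility: {note}" for note in self.compatibility]
        if self.care:
            lines.append(f"Care: {self.care}")
        if self.stock_note:
            lines.append(f"Stock: {self.stock_note}")
        return "\n".join(lines)


# Hypothetical product, for illustration only.
doc = ProductDoc(
    name="Merino Crew Jumper",
    category="Knitwear",
    summary="A midweight wool jumper for everyday wear in cool weather.",
    specs={"Material": "100% merino wool"},
    variants={"Black": "Same cut as other colours; fits true to size."},
    compatibility=["Suitable for layering under a coat."],
    care="Hand wash cold and dry flat.",
    stock_note="'Pre-order' means the item ships when new stock arrives.",
)
text = doc.render()
```

Because each variant gets its own line, a question like "Does the black one run small?" has a specific passage to match against.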
The better your product content, the better your chatbot handles the pre-purchase questions that make or break conversions. This connects directly to automating pre-sales questions — one of the highest-ROI applications of a well-trained chatbot.
2. Structured FAQ Content
FAQ content is the backbone of most chatbot knowledge bases — and the type most stores write in a format that actively undermines AI performance.
What good FAQ content looks like for AI
There is a specific structural difference between FAQ content written for humans reading a webpage and FAQ content optimised for AI retrieval. Human FAQ pages are often written with short, telegraphic answers that assume the reader sees surrounding context. AI reads each question-answer pair in isolation — which means every answer must be self-contained.
Compare these two answers to "Can I return a sale item?":
Typical FAQ page answer (written for humans):
"Sale items are final sale unless faulty."
AI-optimised FAQ answer:
"Items purchased during a sale or at a discounted price are final sale and cannot be returned or exchanged unless the item arrives damaged or faulty. If a sale item arrives damaged, please contact us within 7 days of delivery with a photo and your order number, and we will arrange a replacement or refund."
The human version relies on the reader already understanding the return policy context. The AI version contains everything needed to give a complete, accurate answer regardless of what else it has or has not read.
The question-phrasing problem
Most FAQ pages list questions the way businesses think customers phrase them: "What is your return policy?" But customers contact your chatbot the way they actually speak: "Can I send this back?", "I want to return it", "How do I get a refund?", "The size is wrong, what do I do?"
Your FAQ content should include multiple phrasings for each question — not just the formal version. Add a "Customers may also ask this as:" section to each FAQ entry that lists three to five alternative phrasings. This dramatically improves how often the AI matches an incoming message to the correct FAQ response.
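A sketch of what one such entry could look like as data, using the alternative phrasings quoted above. The FaqEntry class and its problems check are hypothetical helpers, not part of any chatbot platform, and the 15-word minimum for a self-contained answer is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FaqEntry:
    question: str
    answer: str
    alternative_phrasings: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the entry as one self-contained block of plain text."""
        lines = [f"Question: {self.question}", "Customers may also ask this as:"]
        lines += [f"- {phrasing}" for phrasing in self.alternative_phrasings]
        lines.append(f"Answer: {self.answer}")
        return "\n".join(lines)

    def problems(self) -> list[str]:
        """List ways this entry falls short of the guidance above."""
        found = []
        if len(self.alternative_phrasings) < 3:
            found.append("fewer than three alternative phrasings")
        if len(self.answer.split()) < 15:  # assumed threshold for a thin answer
            found.append("answer may be too short to be self-contained")
        return found


entry = FaqEntry(
    question="What is your return policy?",
    answer=(
        "You can return most items within 30 days of delivery for a full refund "
        "to your original payment method. Items must be unused and in their "
        "original packaging."
    ),
    alternative_phrasings=[
        "Can I send this back?",
        "I want to return it",
        "How do I get a refund?",
        "The size is wrong, what do I do?",
    ],
)
thin_entry = FaqEntry(question="Can I return a sale item?",
                      answer="Sale items are final sale unless faulty.")
```

The full entry passes both checks, while the thin one is flagged on both counts.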
How many FAQs do you need?
Start by auditing your last 90 days of customer messages and extracting the top 30–40 question categories. Write a self-contained FAQ entry for each one. Then add question entries for the top 10 product-specific questions for your best-selling products. A knowledge base with 50–80 high-quality, correctly formatted FAQ entries consistently outperforms one with 200 thin, human-written FAQ page exports.
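The audit itself can start as a simple tally. The sketch below assigns each message to a category by keyword and counts the results. The categories, keywords, and sample messages are invented for illustration, and keyword matching will misfile some messages, so treat the output as a starting point for manual review.

```python
from collections import Counter

# Hypothetical categories and keywords; build your own from real messages.
CATEGORY_KEYWORDS = {
    "shipping": ["arrive", "delivery", "tracking", "shipped"],
    "returns": ["return", "refund", "send this back"],
    "sizing": ["size", "fit", "run small"],
}


def categorise(message: str) -> str:
    """Return the first category whose keywords appear in the message."""
    text = message.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "uncategorised"


def top_categories(messages: list[str], n: int) -> list[tuple[str, int]]:
    """Count messages per category and return the n most common."""
    return Counter(categorise(message) for message in messages).most_common(n)


messages = [
    "Where is my order? Tracking has not updated",
    "When will it arrive?",
    "Can I send this back?",
    "How do I get a refund?",
    "Has my parcel shipped yet?",
    "Does the black one run small?",
]
top = top_categories(messages, 2)
```

With these six sample messages, shipping comes out on top, followed by returns.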
For the complete approach to building a knowledge base your chatbot can actually use, read our guide on knowledge base chatbots.
3. Store Policies — Written for AI, Not for Lawyers
Every ecommerce store has policies: returns and exchanges, shipping times, damaged goods, cancellations, discount terms, and more. Most stores write these policies in dense, legalistic language designed to protect the business from edge cases. That language is the worst possible format for AI training.
Why legal-language policies break chatbots
Legal policy language is deliberately vague: phrases like "at our sole discretion," "subject to conditions," and "may vary based on circumstances" protect the business by leaving room for interpretation. When an AI is trained on this language, it generates equally vague answers. A customer asking "Can I cancel my order?" gets a response like "Order cancellation may be possible subject to fulfillment status," which is technically accurate but completely unhelpful, and will likely prompt the same customer to contact you again for clarification.
The solution: policy translation documents
Keep your legal policies for your website Terms and Conditions page. For your chatbot knowledge base, create a separate set of "policy translation" documents — plain-language versions of each policy written as direct answers to the questions customers actually ask.
A policy translation document for returns might read:
Can I return an item?
Yes. You can return most items within 30 days of delivery for a full refund to your original payment method. Items must be unused and in their original packaging.
What cannot be returned?
Sale items, personalised products, and opened hygiene products (such as underwear or earrings) cannot be returned unless they are faulty.
How do I start a return?
Email [support email] with your order number and the reason for your return. We will send you a prepaid return label within 24 hours.
When will I receive my refund?
Refunds are processed within 3–5 business days of us receiving the returned item. You will receive an email confirmation when the refund has been issued.
This format — direct Q&A, plain language, specific timeframes — is what allows an AI to give a complete, accurate answer to any policy question on the first attempt.
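Before uploading a policy document, you can scan it for the hedging phrases that produce vague answers. This is a minimal sketch: the phrase list is drawn from the examples in this section and is far from complete, and the function name is made up.

```python
# Hedging phrases taken from the examples in this guide; extend as needed.
VAGUE_PHRASES = [
    "at our sole discretion",
    "at our discretion",
    "subject to conditions",
    "may vary",
    "may be possible",
]


def find_vague_phrases(text: str) -> list[str]:
    """Return the hedging phrases that appear in the text, ignoring case."""
    lowered = text.lower()
    return [phrase for phrase in VAGUE_PHRASES if phrase in lowered]


legal = "Order cancellation may be possible subject to fulfillment status."
plain = "Refunds are processed within 3 to 5 business days of us receiving the returned item."

legal_hits = find_vague_phrases(legal)
plain_hits = find_vague_phrases(plain)
```

The legal sentence is flagged for "may be possible", and the plain-language sentence comes back clean.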
Clear policies are also a critical part of reducing refunds with a chatbot, because many refund requests stem from customers who did not understand the policy, not customers who genuinely wanted to return an item.
4. Historical Chat Transcripts — Your Most Underused Asset
This is the content type that almost no competitor article mentions, and it is one of the most powerful inputs you can give an ecommerce AI chatbot. Your historical chat transcripts — the real conversations your support team has had with real customers — contain two things no other content type can provide: the actual language customers use, and the actual answers that successfully resolved their questions.
Why transcripts are uniquely valuable
Product descriptions are written by your marketing team. FAQs are written by your support manager. Policies are written by your legal team. None of these people writes the way your customers think and speak.
Chat transcripts capture exactly how customers phrase questions — the informal language, the abbreviations, the "I ordered the thing last Tuesday and it still hasn't arrived" messages that no FAQ page ever anticipates. When an AI is trained on transcripts alongside your formal content, it gets dramatically better at matching incoming messages to the right answer because it has seen real examples of how real customers express real needs.
Transcripts also show you which answers actually worked. If a support agent's response to a specific type of question consistently resolved it without follow-up, that response is a template for what your AI should say. If a response consistently generated follow-up questions, the AI should not replicate it.
How to prepare transcripts for training
Raw transcripts need curation before they go into your knowledge base. The process:
Export 3–6 months of resolved conversations from your helpdesk or live chat tool.
Filter for resolution quality. Include conversations where the customer's question was fully resolved without escalation or follow-up. Exclude conversations involving refunds, escalated complaints, or situations that required judgment calls — these are not appropriate for AI to replicate.
Group by topic. Cluster transcripts by the type of question: shipping, returns, product questions, discount codes, etc. This makes it easier to review quality and easier for the AI to retrieve relevant examples.
Redact personal information. Remove customer names, email addresses, order numbers (or replace with placeholders), and any personally identifiable information before uploading. This is a compliance requirement, not optional.
Convert to question-answer pairs. Do not upload raw transcripts. Extract the core question and the core successful answer and reformat as a clean Q&A pair. The surrounding chat noise — greetings, pleasantries, "let me check on that for you" — adds no training value and can confuse retrieval.
A library of 200–300 curated transcript-derived Q&A pairs, updated quarterly as new conversation patterns emerge, is one of the most reliable ways to keep your chatbot's answer quality high as your store evolves.
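The redaction and reformatting steps can be partly automated. The sketch below replaces email addresses and order numbers with placeholders and stores the result as a question-answer pair. The order number format (a # followed by four or more digits) is an assumption, so adjust it to match your store. Customer names are not caught by patterns like these and still need a manual pass or a dedicated redaction tool.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
ORDER_NUMBER = re.compile(r"#\d{4,}")  # assumed format: '#' plus 4 or more digits


def redact(text: str) -> str:
    """Replace email addresses and order numbers with placeholders."""
    text = EMAIL.sub("[email]", text)
    return ORDER_NUMBER.sub("[order number]", text)


def to_qa_pair(question: str, answer: str) -> dict[str, str]:
    """Build a clean, redacted Q&A pair from the core of a transcript."""
    return {"question": redact(question.strip()), "answer": redact(answer.strip())}


# Hypothetical transcript excerpt, for illustration only.
pair = to_qa_pair(
    "I ordered the thing last Tuesday and it still hasn't arrived. "
    "Order #10482, email jo@example.com",
    "Your parcel left our warehouse on Wednesday and is due within 2 business days.",
)
```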
5. Shipping and Delivery Information
Shipping questions are the single highest-volume category of customer contact for most ecommerce stores — accounting for 35–45% of all inbound messages at scale. Yet shipping information is consistently the most poorly formatted content in chatbot knowledge bases.
What your shipping content must include
Most stores upload a single "Shipping Policy" page. That covers the basics but misses the specific, situational questions that generate the most support contacts. Your shipping content needs to be broken into distinct, separately retrievable sections:
Domestic shipping options and timelines — list each shipping method you offer (standard, express, next-day), the realistic delivery window for each (not the carrier's stated window — the actual window your customers experience), and the cutoff time for same-day dispatch. Be specific: "Orders placed before 2pm Monday to Friday are dispatched the same day. Orders placed after 2pm or on weekends are dispatched the next business day."
International shipping — if you ship internationally, create a separate section for each region or country group you ship to. Include realistic delivery windows, customs information for customers (who pays duties, how to track through customs), and any restrictions on what can be shipped to specific countries.
Carrier-specific tracking information — many customers ask "Why does my tracking say it's in [city] when I'm in [different city]?" The AI needs to know how to explain carrier routing — that packages often travel through distribution hubs before reaching the delivery address, and that this is normal. Without this content, the AI either cannot answer the question or gives an unhelpful "please check your tracking link" response.
Delay scenarios — what the AI should say when a package is delayed. This needs to be explicit: how many days constitutes a delay worth investigating, what the customer should do, and what the store will do. Vague content here produces vague answers that generate follow-up contacts.
Lost and damaged parcels — the exact process for reporting a lost parcel, the timeline for investigation, and when the store will issue a replacement or refund. This is one of the highest-stakes scenarios in post-purchase support — the AI's answer needs to be reassuring, specific, and action-oriented.
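A dispatch rule written this specifically is precise enough to express as code, which is a good test of whether it is precise enough for an AI. The sketch below implements the example rule quoted above (same-day dispatch for weekday orders before 2pm). It treats an order placed at exactly 2pm as after the cutoff and ignores public holidays; both are assumptions, as is the function name.

```python
from datetime import date, datetime, timedelta

CUTOFF_HOUR = 14  # 2pm, from the example dispatch rule


def dispatch_date(order_time: datetime) -> date:
    """Return the dispatch date for an order under the example rule.

    Weekday orders before 2pm ship the same day; all other orders
    ship on the next business day. Public holidays are ignored.
    """
    day = order_time.date()
    is_weekday = day.weekday() < 5  # Monday is 0, Sunday is 6
    if is_weekday and order_time.hour < CUTOFF_HOUR:
        return day
    day += timedelta(days=1)
    while day.weekday() >= 5:  # skip Saturday and Sunday
        day += timedelta(days=1)
    return day


monday_morning = dispatch_date(datetime(2024, 1, 8, 10, 0))    # Monday 10:00
friday_afternoon = dispatch_date(datetime(2024, 1, 12, 15, 0))  # Friday 15:00
```

If you cannot write your own rule this way without guessing at edge cases, the policy text probably needs another sentence.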
For the full picture on automating the post-purchase experience, see our guide on automating shipping notifications.
The out-of-date content problem
Shipping information changes. Carrier partnerships change. Shipping rates change. Estimated delivery windows change seasonally. If your shipping content in the chatbot knowledge base is six months out of date, your chatbot is giving customers incorrect information — which is worse than no answer, because it creates false expectations that become complaints when reality does not match.
Build a quarterly review of all shipping content into your operational calendar. Treat it as maintenance, not a one-time task.
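If you record a last-reviewed date for each document, flagging overdue content takes a few lines. This sketch assumes a quarter is roughly 90 days; the document names and dates are invented.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=90)  # roughly one quarter (an assumption)


def overdue_documents(last_reviewed: dict[str, date], today: date) -> list[str]:
    """Return the names of documents not reviewed within the interval."""
    return sorted(
        name
        for name, reviewed in last_reviewed.items()
        if today - reviewed > REVIEW_INTERVAL
    )


# Hypothetical review log, for illustration only.
review_log = {
    "domestic-shipping": date(2024, 5, 1),
    "international-shipping": date(2023, 11, 15),
    "lost-parcels": date(2024, 1, 10),
}
overdue = overdue_documents(review_log, today=date(2024, 6, 1))
```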
6. Brand Voice and Tone Guidelines
This content type does not answer customer questions — it controls how every answer sounds. And it is the most commonly skipped content type in ecommerce chatbot setups, which is why so many AI chatbots sound generic, robotic, or tonally inconsistent with the brand they represent.
Why tone matters more than you think
A customer who contacts a premium skincare brand and receives a response that sounds like a generic help desk comes away with a worse impression of the brand, even if the answer was technically correct. Brand voice is part of customer experience. An AI chatbot that consistently sounds like your brand (warm, expert, playful, or whatever your brand voice actually is) builds trust in a way that a tonally neutral bot never can.
Research on customer trust in AI interactions consistently shows that tone-consistent responses are rated more highly for helpfulness and trustworthiness than tonally neutral ones, even when the information content is identical.
What brand voice guidelines for AI training should include
Tone descriptors with examples — do not just say "we are friendly and approachable." Give examples. "We say 'Hi [name]!' not 'Dear Customer.' We say 'Got it — let me check on that for you' not 'Your inquiry has been received.'" Concrete examples are what the AI can actually learn from.
Words and phrases to use — a positive vocabulary list of words and sentence starters that reflect your brand. For a casual DTC brand this might include contractions, conversational phrasing, and occasional warmth. For a luxury brand it might mean precise, elevated language without exclamation marks.
Words and phrases to avoid — equally important. If your brand never uses corporate jargon like "please be advised" or "per your request," the AI needs to know. If you avoid using the word "unfortunately" because it sounds apologetic in a way that does not fit your brand, say so explicitly.
Response length norms — should the AI give short, punchy answers or more detailed explanations? Different brands have different norms. A technical electronics store may expect detailed, thorough answers. A fashion brand may want brevity with personality. Specify the expected response length range so the AI calibrates correctly.
Escalation language — what exact phrasing should the AI use when handing off to a human agent? "Let me connect you with someone from our team" reads differently than "I'll transfer you to a specialist now." Specify the handoff language so the transition from AI to human is smooth and on-brand.
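Guidelines like these are easier to enforce when part of them is machine-checkable. The sketch below tests a draft reply against an avoid list and a length norm. The VoiceGuide class and the 80-word limit are hypothetical, and the avoided phrases are the examples from this section. Tone itself still needs a human reader.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceGuide:
    avoid_phrases: list[str] = field(default_factory=list)
    max_words: int = 80  # hypothetical response length norm


def voice_issues(reply: str, guide: VoiceGuide) -> list[str]:
    """Return the ways a draft reply breaks the voice guide."""
    issues = []
    lowered = reply.lower()
    for phrase in guide.avoid_phrases:
        if phrase.lower() in lowered:
            issues.append(f"uses avoided phrase: {phrase}")
    if len(reply.split()) > guide.max_words:
        issues.append(f"longer than {guide.max_words} words")
    return issues


guide = VoiceGuide(avoid_phrases=["please be advised", "per your request", "unfortunately"])

off_brand = voice_issues("Please be advised that your inquiry has been received.", guide)
on_brand = voice_issues("Got it, let me check on that for you.", guide)
```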
Brand voice guidelines are what separate a chatbot that extends your brand from one that undermines it. If you have gone to the effort of building a recognisable brand voice across your website, emails, and social media, your chatbot should carry that voice too.
7. Escalation and Edge Case Rules
The final content type is the one that protects your customers from bad AI experiences when the AI reaches the limits of what it should handle alone. This is not content the AI uses to answer questions — it is content that tells the AI when not to answer, and what to do instead.
Why escalation rules are training content
Many store owners think of escalation as a platform feature — a button that routes to a human agent. But escalation quality is a training quality problem. An AI that has been given clear rules about which situations warrant escalation will escalate smoothly and appropriately. An AI without those rules will either attempt to resolve everything (including situations it should not) or escalate too frequently (making the AI layer pointless).
What your escalation rules content should define
Mandatory escalation triggers — list the specific situations where the AI must always hand off to a human, with no attempt at resolution:
Customer reports a lost parcel (after a threshold number of days)
Customer explicitly requests to speak to a human
Customer expresses significant distress, anger, or uses language indicating an urgent problem
Requests for exceptions to policy (refunds outside the return window, custom arrangements)
Any situation involving a potential legal claim, media threat, or public complaint
Payment disputes or chargeback queries
Graceful fallback language — what the AI should say when it does not know the answer and needs to escalate or acknowledge limitation. "I don't have that information right now — let me get someone from our team to help you directly" is far better than "I'm sorry, I don't understand your question" or, worse, an invented answer.
After-hours escalation handling — what should the AI say when a customer needs human help but the team is offline? Specify the exact response: acknowledge the issue, set an expectation for when they will be contacted, and offer any self-service options that might help in the meantime. This prevents customers from feeling abandoned when an after-hours conversation cannot be resolved immediately.
Sensitive topic handling: if your products or customer base mean you sometimes receive messages on sensitive topics (health-related questions if you sell wellness products, allergy queries if you sell food, safety questions for children's products), define explicit rules for how the AI should handle these. The general rule: when in doubt, the AI should acknowledge, not advise, and escalate to a human.
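Written down as rules, escalation triggers and fallback language might look like the sketch below. It is illustrative only: the keyword lists are invented, keyword matching will miss distress and anger that a real platform would need a better signal for, and the function names are not any product's API. The fallback wording is adapted from the example above.

```python
from typing import Optional

# Hypothetical trigger keywords; a real setup needs broader coverage.
ESCALATION_KEYWORDS = {
    "human request": ["speak to a human", "real person", "talk to someone"],
    "payment dispute": ["chargeback", "dispute the charge"],
    "legal or public complaint": ["lawyer", "legal action"],
}

FALLBACK = (
    "I don't have that information right now. "
    "Let me get someone from our team to help you directly."
)


def escalation_reason(message: str) -> Optional[str]:
    """Return the matching mandatory trigger, or None if none applies."""
    text = message.lower()
    for reason, keywords in ESCALATION_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return reason
    return None


def respond(message: str, draft_answer: Optional[str]) -> str:
    """Hand off on a trigger or a missing answer; otherwise use the draft."""
    if escalation_reason(message) is not None or draft_answer is None:
        return FALLBACK
    return draft_answer


handoff = respond("I want to speak to a human please", "Standard delivery takes 3 to 5 days.")
no_answer = respond("What is the delivery time?", None)
answered = respond("What is the delivery time?", "Standard delivery takes 3 to 5 days.")
```

The point of the design is that the AI never invents an answer: it either has grounded content to use or it hands off.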
Getting escalation rules right is part of the broader design of AI vs human support — knowing which situations belong to which tier is what makes the combined system work.
How AeroChat Uses These 7 Content Types Together
AeroChat is built around the principle that chatbot quality is a content quality problem — which is why the platform is designed to ingest, organise, and use all seven content types described above, rather than relying on a single uploaded document.
When you connect AeroChat to your Shopify store, it automatically pulls your product catalogue and keeps it current as your inventory changes — solving the variant data problem without manual updates. It then combines that live product data with the policy documents, FAQ content, shipping information, brand voice guidelines, and escalation rules you provide to build a layered knowledge base that covers both structured information and brand personality.
The result is a chatbot that does not just technically answer questions — it answers them in the right voice, with the right level of detail, and with the right judgment about when to resolve and when to escalate. For stores with large product catalogues and high message volume, this combination of live Shopify data and structured training content is what separates genuine resolution from generic deflection.
To see how this works in practice, read our guide on getting your chatbot to answer any question. For the broader landscape, see our guide to the best AI chatbot for Shopify.
Content Quality Checklist Before You Train
Before uploading any content to your chatbot knowledge base, run through this checklist:
Product data
All variants documented with meaningful distinguishing information (not just "available in multiple sizes")
Descriptions written in plain English — no HTML tags, no Liquid variables, no internal codes
Compatibility and "not suitable for" information included where relevant
Care instructions included for relevant product categories
FAQ content
Every answer is self-contained — it makes sense without surrounding context
Each FAQ entry includes at least three alternative phrasings of the question
Answers include specific details (timeframes, amounts, steps), not vague generalities
Top 30–40 customer question categories covered, based on a real message audit
Policies
Plain-language translation documents created — separate from legal policy pages
Every policy written as direct Q&A, not paragraphs of prose
Specific timelines and conditions stated explicitly (not "may vary" or "at our discretion")
Chat transcripts
Only resolved, high-quality conversations included
All personal data redacted
Reformatted as clean Q&A pairs — no raw transcripts
Grouped by topic for easier review and retrieval
Shipping information
Domestic shipping options, timelines, and dispatch cutoffs documented
International shipping covered by region
Carrier routing explained for common tracking questions
Delay and lost parcel processes defined with specific steps and timelines
Last reviewed date noted — schedule quarterly update
Brand voice
Tone descriptors backed by concrete examples (not just adjectives)
Positive vocabulary list provided
Words and phrases to avoid listed explicitly
Response length norms specified
Escalation handoff language defined
Escalation rules
Mandatory escalation triggers listed specifically
Graceful fallback language written out
After-hours handling defined
Sensitive topic rules specified for your product category
The Takeaway
The gap between a chatbot that frustrates customers and one that genuinely helps them is almost never the AI model. It is the content. Stores that invest time in structuring these seven content types correctly — in the right format, at the right depth, with the right maintenance cadence — get dramatically better results from the same chatbot platform as stores that upload a quick FAQ export and hope for the best.
Start with your product data and your policy translation documents — these two content types have the highest impact on answer accuracy. Add structured FAQ content from a real message audit. Then layer in transcripts, shipping content, brand voice, and escalation rules as you refine the system over time.
The chatbot gets better as the content gets better. Both are entirely within your control. For the next step — deploying all of this in a chatbot that connects live to your Shopify data — read how AeroChat works for Shopify stores and how it handles repetitive customer questions automatically once the knowledge base is in place.