Training
This feature lets you train your chatbot on text from your website or from PDF and text documents. Once training completes successfully, the chatbot can answer questions about your content.
Initiating Chatbot Training
- Input Content URLs:
- Go to Settings > Artificial Intelligence > OpenAI - Training Sources.
- Enter URLs for websites, text or PDF files, or XML sitemaps.
- PDF and text files can also be uploaded directly from Settings > Artificial Intelligence > OpenAI - Training Sources - PDF and Text Files.
- Start Training:
- Once the sources are set, click the Train your chatbot button.
- Wait for the training process to complete.
Managing Training Content
- Personalized Q&A: Add and manage custom questions and answers from Settings > Artificial Intelligence > OpenAI - Questions and Answers or from the chatbot training window.
- File Formats: Only PDF and TXT formats are supported for uploads.
- Website Crawling: Provide a website URL to crawl that page and all of its child URLs. For large websites, using an XML sitemap is more efficient and less prone to errors.
- Create an XML sitemap with a service like xml-sitemaps.com.
- Edit the sitemap to include only specific pages if needed.
- Upload the sitemap to your server or an external online location, then add the sitemap URL in Settings > Artificial Intelligence > OpenAI - Training Sources.
- Use services like tmpfiles.org to upload large files and sitemaps.
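The sitemap-editing step above can also be scripted instead of done by hand. The following is a minimal sketch, assuming the sitemap follows the standard sitemaps.org format; the function name `filter_sitemap` and the example URLs are hypothetical and not part of the product.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org XML sitemaps.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def filter_sitemap(xml_text: str, keep_prefixes: list[str]) -> str:
    """Return the sitemap with only the <url> entries whose <loc>
    starts with one of the given prefixes (hypothetical helper)."""
    # Serialize with a default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(xml_text)
    prefixes = tuple(keep_prefixes)
    # Iterate over a copy because entries are removed from the tree.
    for url in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        if not loc.startswith(prefixes):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/docs/setup</loc></url>"
        "<url><loc>https://example.com/blog/news</loc></url>"
        "</urlset>"
    )
    # Keep only the documentation pages.
    print(filter_sitemap(sample, ["https://example.com/docs/"]))
```

Save the filtered output as a new XML file, upload it, and enter its URL as the training source.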
Multilingual Training
- Language-Specific Training: When training on a multilingual website, you can restrict the chatbot to answers from pages in the user's language by enabling Settings > Artificial Intelligence > OpenAI > Multilingual Training Sources. Ensure the <html> tag of each page contains the lang attribute so that the page language can be detected.
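Before enabling this option, it can help to confirm that your pages actually declare a language. The following is a rough sketch using only the Python standard library; the helper name `html_lang` is hypothetical, and how the product itself reads the attribute is not documented here.

```python
from html.parser import HTMLParser
from typing import Optional


class _LangFinder(HTMLParser):
    """Records the lang attribute of the first <html> tag encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self.seen_html = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.lang = dict(attrs).get("lang")


def html_lang(page_source: str) -> Optional[str]:
    """Return the lang value of the <html> tag, or None if it is
    missing or empty (hypothetical helper)."""
    finder = _LangFinder()
    finder.feed(page_source)
    return finder.lang or None


if __name__ == "__main__":
    page = '<!DOCTYPE html><html lang="en"><head></head><body></body></html>'
    print(html_lang(page))
```

A page for which this returns None has no usable lang attribute and should be fixed before training.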
Additional Information
- Training Files: Uploaded files are removed after training. To retrain the chatbot, upload all necessary documents again.
- Adding New Sources: To add new sources, retrain the chatbot with them. Previous training data is not lost.
- Automatic Sources: AIX articles and conversations are used as training sources automatically. Training occurs every 24 hours via a cron job, using only user and agent messages, not chatbot messages.
- Character Limits: Refer to your plan for character limits. Generally, the free plan allows up to 10,000 characters and the Starter plan allows up to 500,000 characters.
- Embedding Model: The text-embedding-3-small model is used for training and handling user messages. This model cannot be disabled or changed. Pricing information is available on OpenAI's pricing page.
- Response Links: OpenAI responses can include links to the corresponding website pages where the answers were sourced.
- Deleting Training Data: Click Delete training to remove all previous training data for the chatbot.
- Embeddings Storage: Embeddings are stored as JSON files in the AIX uploads folder and are secured using a password-by-filename approach.
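To estimate whether your text sources fit within the character limits mentioned above, you can count characters locally before uploading. This is only a sketch: the limits are the general figures quoted in this section, the names `PLAN_LIMITS` and `fits_plan` are hypothetical, and the product may count characters differently (for example, after extracting text from PDFs or web pages).

```python
# General limits quoted in this section; check your plan for actual values.
PLAN_LIMITS = {"free": 10_000, "starter": 500_000}


def total_characters(texts: list[str]) -> int:
    """Sum the character counts of all training texts."""
    return sum(len(text) for text in texts)


def fits_plan(texts: list[str], plan: str) -> bool:
    """Return True if the combined texts stay within the plan's limit
    (hypothetical helper, assumes a plain character count)."""
    return total_characters(texts) <= PLAN_LIMITS[plan]


if __name__ == "__main__":
    documents = ["First training document.", "Second training document."]
    print(total_characters(documents), fits_plan(documents, "free"))
```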