Uploading Files
Upload PDF, DOCX, TXT, Markdown, and CSV files to your knowledge base
Instead of copying and pasting content, you can upload files directly to your knowledge base.
Supported File Types
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text is extracted; images/scans not supported |
| Word | .docx | Modern Word format (not .doc) |
| Text | .txt | Plain text files |
| Markdown | .md | Plain text with Markdown formatting |
| CSV | .csv | Each row becomes a separate document |
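If you are preparing many files, it can help to filter out unsupported ones before uploading. The following is a minimal sketch that checks file names against the extensions in the table above. The function names are hypothetical, and the case-insensitive comparison is an assumption about how the uploader treats names like `FAQ.PDF`.

```python
from pathlib import Path

# Extensions taken from the Supported File Types table.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv"}


def is_supported(filename: str) -> bool:
    """Return True if the file name ends in a supported extension.

    The comparison ignores case (an assumption, not documented behavior).
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def unsupported_files(filenames: list[str]) -> list[str]:
    """Return the names that would likely be rejected as unsupported."""
    return [name for name in filenames if not is_supported(name)]
```

For example, `unsupported_files(["handbook.pdf", "legacy.doc"])` flags only `legacy.doc`, which would need to be re-saved as `.docx` first.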
How to Upload Files
- Navigate to Training in the sidebar
- Click the + Add Content button
- Select the File tab in the dialog
- Click to browse or drag and drop files
- Files will upload and begin processing
Processing Files
After upload, files go through these stages:
- Uploading - File is transferred to Goldilocks
- Extracting - Text content is extracted from the file
- Processing - Content is chunked and embedded
- Active - Ready to use for AI responses
Processing time depends on file size:
- Small files (< 10 pages): A few seconds
- Medium files (10-50 pages): 10-30 seconds
- Large files (50+ pages): 1-2 minutes
File-Specific Notes
PDF Files
- Text-based PDFs work best
- Scanned documents (images of text) are not supported
- Tables may not extract perfectly - consider reformatting them
- Headers and footers are included in extraction
If your PDF has formatting issues after upload, try copying the text and creating a manual document instead.
Word Documents
- Formatting (bold, italic) is stripped - only text is kept
- Tables are converted to plain text
- Images are not extracted
- Use .docx format (not the older .doc)
CSV Files
CSV files are treated specially:
- Each row becomes a separate document
- First row should be headers
- Use for bulk importing FAQs or structured data
Example CSV structure:
title,content
"Return Policy","Our return policy allows returns within 30 days..."
"Shipping Info","We ship to all 50 states..."
Text Files
- Simplest format - content is used as-is
- UTF-8 encoding recommended
- Line breaks are preserved
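Since UTF-8 is recommended for text files, you may want to confirm a file's encoding before uploading it. The sketch below checks whether raw bytes decode as UTF-8 and, if not, re-encodes them. The helper names are hypothetical, and the `cp1252` default is only an assumption about where the file came from (a common legacy Windows encoding); substitute the file's actual encoding.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from source_encoding to UTF-8.

    source_encoding is an assumption; a wrong guess produces wrong characters.
    """
    return data.decode(source_encoding).encode("utf-8")
```

In practice you would read the file with `open(path, "rb").read()`, call `is_valid_utf8`, and write the result of `to_utf8` to a new file only when the check fails.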
File and Content Limits
- Maximum file size: 5MB per file
- Maximum extracted content: 150,000 characters (~25,000 words) per document
We extract text only from your files; images increase file size but are not stored. Large image-heavy PDFs often exceed 5MB without adding useful content.
Content size limit
If a document extracts to more than 150,000 characters, you'll be asked to split it into smaller, topic-focused documents. This improves search quality and AI retrieval. For example, split a long handbook into separate documents per policy or product section.
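Both limits can be checked locally before uploading. This is a rough sketch, not the service's own validation: the function and constant names are hypothetical, "5MB" is assumed to mean 5 × 1024 × 1024 bytes, and the character count is taken from text you have already extracted yourself, which may differ slightly from what the uploader extracts.

```python
MAX_FILE_BYTES = 5 * 1024 * 1024  # assumption: "5MB" means 5 MiB
MAX_CONTENT_CHARS = 150_000       # extracted-content limit per document


def check_limits(file_size_bytes: int, extracted_text: str) -> list[str]:
    """Return the limit errors this file would likely trigger, if any."""
    problems = []
    if file_size_bytes > MAX_FILE_BYTES:
        problems.append("File too large")
    if len(extracted_text) > MAX_CONTENT_CHARS:
        problems.append("Content too long")
    return problems
```

An empty list means the file is within both limits under these assumptions. You can get the file size with `os.path.getsize(path)`.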
Bulk Upload Tips
When uploading many files:
- Organize first - Use clear file names
- Check formatting - Preview how the text extracts in a separate tool before uploading
- Start small - Upload a few files and verify quality
- Use CSV - For structured data, CSV is more reliable
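For the CSV route, generating the file with a CSV library avoids quoting mistakes when titles or content contain commas or quotation marks. Here is a minimal sketch using Python's standard `csv` module. The `build_faq_csv` name is hypothetical, and the `title,content` header follows the example CSV structure shown earlier.

```python
import csv
import io


def build_faq_csv(rows: list[tuple[str, str]]) -> str:
    """Build CSV text with a title,content header and one row per document."""
    buffer = io.StringIO(newline="")
    # Default quoting wraps a field in quotes only when it needs them,
    # for example when it contains a comma, a quote, or a line break.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "content"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Write the returned string to a `.csv` file with `encoding="utf-8"`. Fields without special characters are left unquoted, which is still valid CSV.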
After Upload
Once files are processed:
- Review the extracted content for accuracy
- Edit titles if auto-generated names aren't clear
- Remove any documents that didn't extract well
- Test with the Search feature
Updating Documents
To replace a document's content with a new file, open the document detail page and click Replace with file. Upload a new file; it will replace the existing content and re-process automatically.
Troubleshooting
"File type not supported"
Check that your file has the correct extension (.pdf, .docx, .txt, .csv, .md).
"File too large"
Maximum file size is 5MB. We extract text only, so large images inflate file size without adding content. Try:
- Splitting into multiple files
- Removing images/graphics from Word docs
- Compressing PDFs
"Content too long"
Documents may not exceed 150,000 characters of extracted text. Split long content into topic-focused documents (e.g. one per policy or product section) for better search results.
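Splitting by topic is best done by hand, but when a long text has no obvious section boundaries, a mechanical fallback can at least bring each piece under the limit. The sketch below groups paragraphs (separated by blank lines) into parts of at most `max_chars` characters. The function name is hypothetical, and this is not how the service itself splits or chunks content.

```python
def split_by_paragraph(text: str, max_chars: int = 150_000) -> list[str]:
    """Group blank-line-separated paragraphs into parts of at most max_chars.

    A single paragraph longer than max_chars is kept whole and will still
    exceed the limit; it needs to be split manually.
    """
    parts: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                parts.append(current)
            current = paragraph
    if current:
        parts.append(current)
    return parts
```

Each returned part can then be saved as its own `.txt` file and given a descriptive title after upload.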
Poor text extraction
If extracted text is garbled or missing:
- PDF may be scanned (image-based) - not supported
- Try a different export format
- Copy/paste content manually instead