Uploading Files

Instead of copying and pasting content, you can upload files directly to your knowledge base.

Supported File Types

Format	Extension	Notes
PDF	`.pdf`	Text is extracted; images/scans not supported
Word	`.docx`	Modern Word format (not `.doc`)
Text	`.txt`	Plain text files
Markdown	`.md`	Plain text with Markdown formatting
CSV	`.csv`	Each row becomes a separate document
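
Before a large upload, it can help to screen a folder for files that will not be accepted. The following is a minimal sketch, not part of Goldilocks: the function name `unsupported_files` is hypothetical, and treating uppercase extensions such as `.PDF` the same as lowercase ones is an assumption.

```python
from pathlib import Path

# Extensions taken from the supported file types table above.
SUPPORTED = {".pdf", ".docx", ".txt", ".md", ".csv"}

def unsupported_files(paths):
    """Return the paths whose extension is not in the supported list."""
    # suffix.lower() makes the comparison case-insensitive (an assumption).
    return [p for p in paths if Path(p).suffix.lower() not in SUPPORTED]

if __name__ == "__main__":
    print(unsupported_files(["handbook.pdf", "policy.doc", "faq.csv"]))
```

Older `.doc` files are flagged by this check, which matches the note that only `.docx` is supported.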

How to Upload Files

Navigate to Training in the sidebar
Click the + Add Content button
Select the File tab in the dialog
Click to browse or drag and drop files
Files will upload and begin processing

Processing Files

After upload, files go through these stages:

Uploading - File is transferred to Goldilocks
Extracting - Text content is extracted from the file
Processing - Content is chunked and embedded
Active - Ready to use for AI responses

Processing time depends on file size:

Small files (< 10 pages): A few seconds
Medium files (10-50 pages): 10-30 seconds
Large files (50+ pages): 1-2 minutes

File-Specific Notes

PDF Files

Text-based PDFs work best
Scanned documents (images of text) are not supported
Tables may not extract perfectly; consider reformatting them
Headers and footers are included in extraction

If your PDF has formatting issues after upload, try copying the text and creating a manual document instead.

Word Documents

Formatting (bold, italic) is stripped; only text is kept
Tables are converted to plain text
Images are not extracted
Use .docx format (not older .doc)

CSV Files

CSV files are treated specially:

Each row becomes a separate document
First row should be headers
Use for bulk importing FAQs or structured data

Example CSV structure:

title,content
"Return Policy","Our return policy allows returns within 30 days..."
"Shipping Info","We ship to all 50 states..."
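
A file in this shape can be generated with a script instead of by hand. The sketch below uses Python's standard `csv` module; the function name `write_faq_csv`, the output file name, and the sample rows are illustrative assumptions.

```python
import csv

def write_faq_csv(path, rows):
    """Write (title, content) pairs to a CSV file with a header row."""
    # newline="" lets the csv module manage line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "content"])  # first row holds the headers
        # Fields containing commas, quotes, or line breaks are quoted automatically.
        writer.writerows(rows)

if __name__ == "__main__":
    faqs = [
        ("Return Policy", "Our return policy allows returns within 30 days..."),
        ("Shipping Info", "We ship to all 50 states..."),
    ]
    write_faq_csv("faqs.csv", faqs)
```

Letting the `csv` module handle quoting avoids the most common hand-editing mistake, an unquoted comma inside the content column that shifts the remaining fields.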

Text Files

Simplest format: content is used as-is
UTF-8 encoding recommended
Line breaks are preserved
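
If a text file was saved in another encoding, it can be re-saved as UTF-8 before upload. This is a sketch under one assumption: you already know the file's original encoding. The `cp1252` default and the function name `to_utf8` are only illustrative.

```python
def to_utf8(src, dst, source_encoding="cp1252"):
    """Re-save a text file as UTF-8 without changing its line breaks."""
    # newline="" turns off newline translation for both reading and writing,
    # so the original line endings are carried over unchanged.
    with open(src, encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

For example, `to_utf8("notes.txt", "notes-utf8.txt")` writes a UTF-8 copy and leaves the original file untouched.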

File and Content Limits

Maximum file size: 5MB per file
Maximum extracted content: 150,000 characters (~25,000 words) per document

We extract only the text from your files; images increase file size but are not stored. Large image-heavy PDFs often exceed 5MB without adding useful content.

If a document extracts to more than 150,000 characters, you'll be asked to split it into smaller, topic-focused documents. This improves search quality and AI retrieval. For example, split a long handbook into separate documents per policy or product section.
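
Both limits can be checked locally for plain text and Markdown files before uploading. The sketch below is an approximation under stated assumptions: it treats 5MB as 5 × 1024 × 1024 bytes, it counts the characters in the file as a stand-in for the extracted text, and the name `check_limits` is hypothetical. It does not apply to PDF or Word files, whose extracted text differs from their raw contents.

```python
import os

MAX_FILE_BYTES = 5 * 1024 * 1024  # 5MB; the exact byte definition is an assumption
MAX_CHARS = 150_000               # extracted-content limit per document

def check_limits(path):
    """Return a list of limit problems for a plain-text or Markdown file."""
    problems = []
    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        problems.append(f"file is {size} bytes, over the 5MB limit")
    # errors="replace" keeps the count going if the file is not valid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as f:
        chars = len(f.read())
    if chars > MAX_CHARS:
        problems.append(f"content is {chars} characters, over the 150,000 limit")
    return problems
```

An empty list means the file is within both limits as far as this local check can tell.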

Bulk Upload Tips

When uploading many files:

Organize first - Use clear file names
Check formatting - Preview the text extraction in a separate tool before uploading
Start small - Upload a few files and verify quality
Use CSV - For structured data, CSV is more reliable

After Upload

Once files are processed:

Review the extracted content for accuracy
Edit titles if auto-generated names aren't clear
Remove any documents that didn't extract well
Test with the Search feature

Troubleshooting

File too large

Files over 5MB cannot be uploaded. Try:

Splitting into multiple files
Removing images/graphics from Word docs
Compressing PDFs

"Content too long"

Documents may not exceed 150,000 characters of extracted text. Split long content into topic-focused documents (e.g. one per policy or product section) for better search results.
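
For content that already has headings, the split can be scripted. The following is a minimal sketch, assuming the source is Markdown and that each `## ` heading marks a new topic; the function names `split_by_heading` and `oversized` are hypothetical.

```python
MAX_CHARS = 150_000  # per-document limit on extracted text

def split_by_heading(text, prefix="## "):
    """Split Markdown text into sections, one per heading that starts with prefix."""
    sections = []
    current = []
    for line in text.splitlines(keepends=True):
        # A heading closes the section in progress and starts a new one.
        if line.startswith(prefix) and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))
    return sections

def oversized(sections, limit=MAX_CHARS):
    """Return the indexes of sections that are still over the limit."""
    return [i for i, s in enumerate(sections) if len(s) > limit]
```

Any text before the first heading becomes its own section. Sections reported by `oversized` need a further split, for example at the next heading level down.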

Poor text extraction

If extracted text is garbled or missing:

The PDF may be scanned (image-based), which is not supported
Try a different export format
Copy/paste content manually instead