Uploading Files

Upload PDF, DOCX, TXT, Markdown, and CSV files to your knowledge base

Instead of copying and pasting content, you can upload files directly to your knowledge base.

Supported File Types

Format    Extension  Notes
PDF       .pdf       Text is extracted; images/scans not supported
Word      .docx      Modern Word format (not .doc)
Text      .txt       Plain text files
Markdown  .md        Plain text with Markdown formatting
CSV       .csv       Each row becomes a separate document

How to Upload Files

  1. Navigate to Training in the sidebar
  2. Click the + Add Content button
  3. Select the File tab in the dialog
  4. Click to browse or drag and drop files
  5. Files will upload and begin processing

Processing Files

After upload, files go through these stages:

  1. Uploading - File is transferred to Goldilocks
  2. Extracting - Text content is extracted from the file
  3. Processing - Content is chunked and embedded
  4. Active - Ready to use for AI responses

Processing time depends on file size:

  • Small files (< 10 pages): A few seconds
  • Medium files (10-50 pages): 10-30 seconds
  • Large files (50+ pages): 1-2 minutes

File-Specific Notes

PDF Files

  • Text-based PDFs work best
  • Scanned documents (images of text) are not supported
  • Tables may not extract perfectly; consider reformatting them
  • Headers and footers are included in extraction

If your PDF has formatting issues after upload, try copying the text and creating a manual document instead.

Word Documents

  • Formatting (bold, italic) is stripped; only text is kept
  • Tables are converted to plain text
  • Images are not extracted
  • Use .docx format (not older .doc)

CSV Files

CSV files are treated specially:

  • Each row becomes a separate document
  • The first row should contain the column headers
  • Use for bulk importing FAQs or structured data

Example CSV structure:

title,content
"Return Policy","Our return policy allows returns within 30 days..."
"Shipping Info","We ship to all 50 states..."
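
If your FAQs already live in a script or database, a CSV in this shape can be generated rather than typed by hand. The following is a minimal sketch using Python's standard csv module. The function name faq_csv_text and the sample rows are illustrative, and the title/content columns simply mirror the example above; they are not a confirmed requirement of Goldilocks.

```python
import csv
import io

# Illustrative rows; the column names mirror the example structure above.
rows = [
    {"title": "Return Policy", "content": "Our return policy allows returns within 30 days..."},
    {"title": "Shipping Info", "content": "We ship to all 50 states..."},
]

def faq_csv_text(rows):
    """Return CSV text with a header row followed by one line per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "content"], lineterminator="\n")
    writer.writeheader()
    # Fields containing commas, quotes, or line breaks are quoted automatically.
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = faq_csv_text(rows)
print(csv_text)
```

To save the result, write csv_text to a .csv file with UTF-8 encoding and upload that file as usual.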

Text Files

  • Simplest format; content is used as-is
  • UTF-8 encoding recommended
  • Line breaks are preserved
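
If a text file was saved in an older encoding, it can be re-saved as UTF-8 before upload. The sketch below is one way to do that in Python. The function name ensure_utf8 is hypothetical, and the cp1252 fallback is only an assumption about where the file came from (it is common for files created on Windows); adjust it to match your source.

```python
def ensure_utf8(data, fallback_encoding="cp1252"):
    """Return the given file bytes re-encoded as UTF-8.

    If the bytes already decode as UTF-8 they are returned unchanged in
    content. Otherwise they are decoded with fallback_encoding, which is
    an assumption about the original file, not something detected.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = data.decode(fallback_encoding)
    return text.encode("utf-8")

# Example: bytes as they would appear in a Windows-1252 file.
legacy_bytes = "café\nmenu".encode("cp1252")
utf8_bytes = ensure_utf8(legacy_bytes)
print(utf8_bytes.decode("utf-8"))
```

Working on raw bytes rather than text mode leaves the file's line breaks as they were. In practice, read the bytes with pathlib.Path(...).read_bytes() and save the result with write_bytes().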

File and Content Limits

  • Maximum file size: 5MB per file
  • Maximum extracted content: 150,000 characters (~25,000 words) per document

We extract text only from your files; images increase file size but are not stored. Large image-heavy PDFs often exceed 5MB without adding useful content.
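
When preparing many files, it can help to check them against the supported extensions and the 5MB limit before uploading. The following is a rough local pre-check in Python, not part of Goldilocks itself. The function name check_upload is hypothetical, and two details are assumptions: that "5MB" means 5 × 1024 × 1024 bytes, and that extensions are matched case-insensitively.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv"}
MAX_FILE_BYTES = 5 * 1024 * 1024  # assumption: "5MB" interpreted as 5 MiB

def check_upload(name, size_bytes):
    """Return a list of problems for a file; an empty list means none were found."""
    problems = []
    extension = Path(name).suffix.lower()  # assumption: case-insensitive match
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems

print(check_upload("handbook.pdf", 120_000))
print(check_upload("legacy-notes.doc", 120_000))
print(check_upload("scanned-catalog.pdf", 8 * 1024 * 1024))
```

For a real file, pass path.name and path.stat().st_size for a pathlib.Path. This check cannot tell whether a PDF is scanned or how many characters it will extract to.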

Content size limit

If a document extracts to more than 150,000 characters, you'll be asked to split it into smaller, topic-focused documents. This improves search quality and AI retrieval. For example, split a long handbook into separate documents per policy or product section.
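
Splitting by topic is best done by hand, but a size-based split can serve as a starting point for very long plain-text content. The sketch below breaks text at blank lines so that each part stays within the 150,000-character limit. The function name split_document is hypothetical, and the character count of your source text may differ somewhat from what Goldilocks counts after extraction.

```python
MAX_CHARS = 150_000  # extracted-content limit per document, as stated above

def split_document(text, max_chars=MAX_CHARS):
    """Split text into parts of at most max_chars, breaking at blank lines where possible."""
    parts = []
    current = ""
    for paragraph in text.split("\n\n"):
        # A single paragraph longer than the limit is hard-cut into slices.
        while len(paragraph) > max_chars:
            if current:
                parts.append(current)
                current = ""
            parts.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this paragraph would exceed the limit, so start a new part.
            parts.append(current)
            current = paragraph
    if current:
        parts.append(current)
    return parts

# Small demonstration with a tiny limit so the behavior is easy to see.
sample = "\n\n".join(["x" * 40] * 10)
demo_parts = split_document(sample, max_chars=100)
print([len(part) for part in demo_parts])
```

Treat the resulting parts as a draft: moving the boundaries so each part covers one policy or product section should give better search results than a purely size-based cut.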

Bulk Upload Tips

When uploading many files:

  1. Organize first - Use clear file names
  2. Check formatting - Preview how the text extracts in a separate tool (for example, a PDF-to-text converter) before uploading
  3. Start small - Upload a few files and verify quality
  4. Use CSV - For structured data, CSV is more reliable

After Upload

Once files are processed:

  1. Review the extracted content for accuracy
  2. Edit titles if auto-generated names aren't clear
  3. Remove any documents that didn't extract well
  4. Test with the Search feature

Updating Documents

To replace a document's content with a new file, open the document detail page and click Replace with file. Upload a new file; it will replace the existing content and re-process automatically.

Troubleshooting

"File type not supported"

Check that your file has the correct extension (.pdf, .docx, .txt, .csv, .md).

"File too large"

Maximum file size is 5MB. We extract text only, so large images inflate file size without adding content. Try:

  • Splitting into multiple files
  • Removing images/graphics from Word docs
  • Compressing PDFs

"Content too long"

Documents may not exceed 150,000 characters of extracted text. Split long content into topic-focused documents (e.g. one per policy or product section) for better search results.

Poor text extraction

If extracted text is garbled or missing:

  • The PDF may be scanned (image-based), which is not supported
  • Try a different export format
  • Copy/paste content manually instead