
Custom Storage

Custom Storage lets you connect your own vector database, such as Pinecone, to handle document embeddings outside of SmythOS. This gives you full control over storage, scaling, and retrieval logic, which is especially useful in enterprise or multi-agent environments.

Why use Custom Storage?

Use Custom Storage if you need external access to your vector data, want to scale beyond the default embedding limits, or plan to share the same data across multiple agents or systems.

How It Works

When Custom Storage is configured, all indexing operations from your Data Spaces write directly to the external vector store. During Retrieval-Augmented Generation (RAG), SmythOS queries that store for relevant context.

This setup is managed from the Data Pool and supports both the built-in RAG components and direct API access.

Working with Pinecone in SmythOS

There are two ways to integrate Pinecone with SmythOS:

1. Use Built-in RAG Components

For most workflows, use these drag-and-drop components in Studio:

  • RAG Remember to write data into Pinecone
  • RAG Search to retrieve context during queries
  • RAG Forget to remove data from the index

This approach gives you a clean, user-friendly interface for managing vector data.
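To make the three operations concrete, here is a toy in-memory sketch of what "remember", "search", and "forget" mean for a vector index. It is an illustration only: the class and method names are hypothetical, it does not call SmythOS or Pinecone, and a real index uses far larger vectors and approximate search.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for an external vector index."""

    def __init__(self) -> None:
        # Maps a document id to its (embedding, original text) pair.
        self._items: Dict[str, Tuple[List[float], str]] = {}

    def remember(self, doc_id: str, vector: List[float], text: str) -> None:
        """Write (or overwrite) an entry, analogous to RAG Remember."""
        self._items[doc_id] = (vector, text)

    def search(self, query: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
        """Return the top_k most similar ids with scores, analogous to RAG Search."""
        scored = [
            (doc_id, cosine(query, vec))
            for doc_id, (vec, _text) in self._items.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

    def forget(self, doc_id: str) -> bool:
        """Remove an entry, analogous to RAG Forget. Returns True if it existed."""
        return self._items.pop(doc_id, None) is not None


store = ToyVectorStore()
store.remember("a", [1.0, 0.0], "alpha")
store.remember("b", [0.0, 1.0], "beta")
best_id, _score = store.search([0.9, 0.1], top_k=1)[0]
```

In this sketch the query vector points mostly along the first axis, so the closest entry is the one stored under "a". Calling forget on an id removes it from later searches.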

2. Use Direct API Access

Advanced users can use the API Call component to communicate directly with Pinecone's REST API. This enables custom operations such as batch inserts, metadata filtering, or advanced querying.

To configure:

  • Add your Pinecone API key from the Vault
  • Use endpoint URLs from Pinecone's API reference
  • Pass headers, payloads, and parameters manually

For components that consume credits:

  • An icon appears next to the component to indicate that cost information is available
  • Clicking the icon displays the estimated cost in the details panel
  • This lets you monitor resource usage in real time during development

Vector dimension must be 1536

SmythOS requires a vector dimension of 1536 to match its embedding model. Other dimensions are not supported.
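As a rough sketch of the direct API approach, the helper below assembles the URL, headers, and JSON body for a Pinecone query request and enforces the 1536-dimension requirement before anything is sent. The function name, the host value, and the key placeholder are hypothetical. The `/query` path, the `Api-Key` header, and the `topK`, `includeMetadata`, and `filter` fields reflect Pinecone's REST API as commonly documented, so check them against the current API reference before relying on them. The same values would go into the API Call component's URL, headers, and body fields.

```python
import json
from typing import Any, Dict, List, Optional

EXPECTED_DIMENSION = 1536  # dimension SmythOS requires, per the note above


def build_query_request(
    index_host: str,
    api_key: str,
    vector: List[float],
    top_k: int = 5,
    metadata_filter: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble (but do not send) a Pinecone query request.

    index_host is the Host URL copied from the Pinecone dashboard, without
    the scheme. api_key should come from the Vault, never from source code.
    """
    if len(vector) != EXPECTED_DIMENSION:
        raise ValueError(
            f"expected a {EXPECTED_DIMENSION}-dimensional vector, got {len(vector)}"
        )

    payload: Dict[str, Any] = {
        "vector": vector,
        "topK": top_k,
        "includeMetadata": True,
    }
    if metadata_filter:
        # Only include a filter when one is supplied.
        payload["filter"] = metadata_filter

    return {
        "method": "POST",
        "url": f"https://{index_host}/query",
        "headers": {"Api-Key": api_key, "Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


request = build_query_request(
    index_host="my-index.example.invalid",  # placeholder host, not a real index
    api_key="VAULT_KEY_PLACEHOLDER",        # placeholder, resolve from the Vault
    vector=[0.0] * EXPECTED_DIMENSION,
    top_k=3,
    metadata_filter={"category": {"$eq": "faq"}},
)
```

Building the request as plain data keeps the dimension check separate from the network call, which makes mistakes easier to catch before they reach Pinecone.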

Best Practices for Data Organization

Organizing your data into multiple Data Spaces can improve both relevance and performance during semantic search.

Use Multiple Data Spaces for Targeted Queries

Rather than placing all your content in a single index, create separate Data Spaces for different types of content:

  • contacts-data for contact records
  • product-specs for product metadata
  • support-docs for documentation and FAQs

This separation ensures that each agent or component searches only within the appropriate context.

Contextual Search with RAG Components

  • Configure the RAG Search component to point to the specific Data Space required for the task
  • Tailor the query to match the content type and format stored within that space
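One way to picture this targeting is a small lookup from task type to Data Space. The sketch below is purely illustrative: the task type names and the function are hypothetical, and only the Data Space names come from the examples above. In Studio you would make the same choice by pointing each RAG Search component at the right Data Space.

```python
from typing import Dict

# Hypothetical task types mapped to the example Data Spaces named above.
DATA_SPACE_BY_TASK: Dict[str, str] = {
    "contact_lookup": "contacts-data",
    "product_question": "product-specs",
    "support_request": "support-docs",
}


def pick_data_space(task_type: str) -> str:
    """Return the Data Space a query of this type should search."""
    try:
        return DATA_SPACE_BY_TASK[task_type]
    except KeyError:
        # Fail loudly rather than falling back to a catch-all index.
        raise ValueError(
            f"No Data Space configured for task type: {task_type!r}"
        ) from None


target = pick_data_space("support_request")
```

Raising on an unknown task type, instead of defaulting to one large index, keeps each search confined to the content it was meant for.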

Benefits of This Approach

  • Higher precision by avoiding irrelevant results
  • Faster searches due to smaller index sizes
  • Simplified maintenance by updating only relevant sections
  • Smarter agents that specialize in specific domains

Why Data Spaces matter

Files within a Data Space are converted into vector embeddings. While you cannot target individual files, you can isolate content types by creating multiple, purpose-specific Data Spaces.

Pinecone Setup Steps

  1. Create an account at pinecone.io
  2. Set up an index with:
    • Dimension: 1536
    • Environment: e.g., us-west4-gcp
  3. From your Pinecone dashboard, copy:
    • API Key
    • Index Name
    • Host URL
    • Environment
  4. In SmythOS Studio:
    • Open Data Pool
    • Click Customize Storage
    • Paste the credentials into the configuration form
    • Save and activate

Figure: Custom Storage configuration form in SmythOS. Paste your Pinecone credentials and save to activate external storage.
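Before pasting the values into the form, it can help to sanity-check them. The sketch below groups the four credentials from step 3 with the required dimension and reports obvious problems. The class name, field names, and checks are assumptions made for illustration; they are not part of SmythOS or the Pinecone client.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PineconeStorageConfig:
    """Hypothetical container for the values copied from the Pinecone dashboard."""

    api_key: str
    index_name: str
    host_url: str
    environment: str
    dimension: int = 1536

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means no obvious problems."""
        issues: List[str] = []
        for name in ("api_key", "index_name", "host_url", "environment"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        if self.host_url and not self.host_url.startswith("https://"):
            issues.append("host_url should start with https://")
        if self.dimension != 1536:
            issues.append("dimension must be 1536")
        return issues


config = PineconeStorageConfig(
    api_key="VAULT_KEY_PLACEHOLDER",              # placeholder value
    index_name="support-docs",
    host_url="https://my-index.example.invalid",  # placeholder host
    environment="us-west4-gcp",
)
issues = config.problems()
```

An empty list here only means the values look well-formed. It does not confirm that the key is valid or that the index exists.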

When to Use Internal vs Custom Storage

Use Case | Internal Storage | Custom Storage (Pinecone)
Quick testing and prototyping | Yes | -
Large-scale production use | - | Yes
Full control over indexing and cost | - | Yes
Shared access across multiple agents | - | Yes
Fastest performance inside SmythOS | Yes | -

Start simple, scale when needed

If you're unsure whether Custom Storage is necessary, start with the default internal setup. You can always switch to Pinecone later.
