Custom Storage
Custom Storage lets you connect your own vector database, such as Pinecone or Milvus, to handle document embeddings outside of SmythOS. This gives you full control over storage, scaling, and retrieval logic, which is especially useful in enterprise or multi-agent environments.
How It Works
When you connect a Custom Storage provider to a Data Space, all indexing operations write directly to your external vector database. During Retrieval-Augmented Generation (RAG), SmythOS queries that provider for relevant context.
This setup is managed from the Data Pool when creating or updating Data Spaces, and supports both built-in RAG tools and direct API-based approaches.
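The write-then-query flow described above can be pictured with a toy in-memory store. This is only a conceptual sketch: `ToyVectorStore`, `upsert`, and `query` are hypothetical names, the three-dimensional vectors stand in for real embeddings, and a real provider such as Pinecone or Milvus does this work behind its own API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical stand-in for an external vector database."""

    def __init__(self):
        self._rows = {}  # id -> (vector, text)

    def upsert(self, doc_id, vector, text):
        # Indexing: write the embedding together with its source text.
        self._rows[doc_id] = (vector, text)

    def query(self, vector, top_k=2):
        # Retrieval: rank stored rows by similarity to the query vector.
        scored = [(cosine(vector, v), text) for v, text in self._rows.values()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]


store = ToyVectorStore()
store.upsert("a", [1.0, 0.0, 0.0], "refund policy")
store.upsert("b", [0.0, 1.0, 0.0], "shipping times")
store.upsert("c", [0.9, 0.1, 0.0], "return window")

context = store.query([1.0, 0.0, 0.0], top_k=2)
print(context)  # ['refund policy', 'return window']
```

The returned texts are what would be handed to the model as context during RAG.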
Supported Vector Database Providers
SmythOS Managed Pinecone (Default)
- Best for: Quick setups, testing, and standard RAG workflows
- Setup required: None
- Cost: Included with SmythOS plan
- Scaling: Managed by SmythOS
Pinecone (Your Own Account)
- Best for: Enterprise deployments, full control, existing Pinecone investments
- Setup required: API key, index name
- Cost: Pinecone usage fees
- Scaling: Managed by you in Pinecone
Milvus
- Best for: Self-hosted infrastructure, open-source deployments, on-premises requirements
- Setup required: Server address, token, configuration
- Cost: Self-hosted infrastructure costs
- Scaling: Managed by you or your Milvus deployment
Setting Up Custom Providers
Creating a Provider Connection
Provider connections are created when you set up a new Data Space:
- Open the Data Pool
- Click Add Data Space
- In the Vector Database Provider dropdown, click Create New Connection
- Select your provider type: Pinecone or Milvus
- Enter your provider credentials (see below)
- Click Create Connection
The connection is now saved and available for current and future Data Spaces.
Pinecone Connection Details
To connect your own Pinecone account:
- Create an account or log in at pinecone.io
- Create an index with:
- Dimension: Must match your embedding model (typically 1536)
- Metric: cosine (recommended for semantic search)
- From your Pinecone dashboard, copy:
- API Key – Found in your project settings
- Index Name – Name of your created index
- In SmythOS, when creating a provider connection, enter:
- Connection Name – A label for this connection (e.g., "Production Pinecone")
- API Key – Your Pinecone API key
- Index Name – Your Pinecone index name
- Click Create Connection
Milvus Connection Details
To connect your Milvus instance:
- Set up or access your Milvus deployment (cloud or self-hosted)
- From your Milvus dashboard, gather:
- Server Address – The endpoint URL (e.g., https://milvus.example.com:19530)
- Token – Your authentication token (if enabled)
- In SmythOS, when creating a provider connection, enter:
- Connection Name – A label for this connection (e.g., "Self-Hosted Milvus")
- Address – Your Milvus server address
- Token – Your Milvus authentication token
- Click Create Connection
Managing Multiple Provider Connections
You can create and organize multiple provider connections from the same service or account, for example one per environment:
- Development Environment: connection name "Dev Pinecone", Pinecone index smythos-dev
- Staging Environment: connection name "Staging Pinecone", Pinecone index smythos-staging
- Production Environment: connection name "Prod Pinecone", Pinecone index smythos-prod
This allows you to:
- Keep environments isolated
- Test changes in lower environments before production
- Scale each environment independently
- Manage costs by environment
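If you also script against these environments outside SmythOS, a small lookup table can keep the naming consistent. The sketch below mirrors the example structure above; the `CONNECTIONS` table, the `connection_for` helper, and the `APP_ENV` variable name are all assumptions made for illustration.

```python
import os

# Hypothetical mapping that mirrors the example environments above.
CONNECTIONS = {
    "dev": {"connection_name": "Dev Pinecone", "index": "smythos-dev"},
    "staging": {"connection_name": "Staging Pinecone", "index": "smythos-staging"},
    "prod": {"connection_name": "Prod Pinecone", "index": "smythos-prod"},
}


def connection_for(env=None):
    """Return the connection settings for an environment.

    Falls back to the APP_ENV variable (an assumed name), then to "dev".
    """
    env = (env or os.environ.get("APP_ENV", "dev")).lower()
    if env not in CONNECTIONS:
        raise ValueError(f"Unknown environment: {env!r}")
    return CONNECTIONS[env]


print(connection_for("prod")["index"])  # smythos-prod
```

Raising on an unknown environment name, rather than silently defaulting, helps avoid writing test data into a production index.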
Working with Vector Data in SmythOS
There are two ways to interact with your custom storage:
1. Use Built-in RAG Components (Recommended)
For most workflows, use these drag-and-drop components in Studio:
- RAG Remember – Write data into your vector database
- RAG Search – Retrieve context during queries
- RAG Forget – Remove data from the index
This approach gives you a clean, user-friendly interface for managing vector data without worrying about API details.
2. Use Direct API Access (Advanced)
Advanced users can use the API Call component to communicate directly with your provider's REST API. This enables custom operations such as:
- Batch inserts or updates
- Metadata filtering
- Advanced querying
- Direct index management
To configure:
- Add your provider credentials from the Vault
- Use endpoint URLs from your provider's API reference
- Pass headers, payloads, and parameters manually
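To make the shape of such a call concrete, here is a sketch of a Pinecone similarity query with a metadata filter, built with Python's standard library. The endpoint path, the `Api-Key` header, and the body fields follow Pinecone's public REST API as commonly documented, but check them against the current API reference before relying on them. The host, key, and three-value vector are placeholders, and `build_pinecone_query` is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_pinecone_query(index_host, api_key, vector, top_k=3, metadata_filter=None):
    """Build (but do not send) a POST request to a Pinecone index's /query endpoint."""
    body = {"vector": vector, "topK": top_k, "includeMetadata": True}
    if metadata_filter:
        body["filter"] = metadata_filter
    return urllib.request.Request(
        url=f"https://{index_host}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_pinecone_query(
    index_host="example-index.svc.example.pinecone.io",  # placeholder host
    api_key="YOUR_API_KEY",  # in SmythOS this value would come from the Vault
    vector=[0.1, 0.2, 0.3],  # illustrative; real vectors match the index dimension
    metadata_filter={"category": {"$eq": "support"}},
)
print(req.get_method(), req.full_url)
```

In the API Call component, the same URL, headers, and JSON body would be entered as the request's endpoint, headers, and payload.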
Best Practices for Data Organization
Organizing your data into multiple Data Spaces can improve both relevance and performance during semantic search.
Use Multiple Data Spaces for Targeted Queries
Rather than placing all your content in a single index, create separate Data Spaces for different types of content:
- contacts-data – Contact records and customer information
- product-specs – Product metadata and specifications
- support-docs – Documentation and FAQs
- legal-documents – Contracts and compliance materials
This separation ensures that each agent or component searches only within the appropriate context.
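One way to picture this routing is a simple keyword table that picks a Data Space before searching. This is a deliberately naive sketch: the `ROUTES` table, its trigger words, and `pick_data_space` are hypothetical, and in SmythOS the choice is normally made by pointing each component at a specific Data Space.

```python
# Hypothetical keyword routing table: Data Space name -> trigger words.
ROUTES = {
    "contacts-data": {"contact", "customer", "email", "phone"},
    "product-specs": {"spec", "dimension", "model", "sku"},
    "support-docs": {"how", "error", "install", "faq"},
    "legal-documents": {"contract", "compliance", "clause", "nda"},
}


def pick_data_space(query, default="support-docs"):
    """Choose the Data Space whose trigger words overlap the query most."""
    words = set(query.lower().split())
    best, best_hits = default, 0
    for space, triggers in ROUTES.items():
        hits = len(words & triggers)
        if hits > best_hits:
            best, best_hits = space, hits
    return best


print(pick_data_space("find the compliance clause in the vendor contract"))
# legal-documents
```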
Contextual Search with RAG Components
- Configure the RAG Search component to point to the specific Data Space required for the task
- Tailor the query to match the content type and format stored within that space
- Use consistent naming conventions for easy identification
Benefits of This Approach
- Higher precision by avoiding irrelevant results
- Faster searches due to smaller, focused indexes
- Simplified maintenance by updating only relevant sections
- Smarter agents that specialize in specific domains
- Better cost control by scaling individual spaces independently
Comparing Storage Options
| Aspect | SmythOS Managed | Your Pinecone | Milvus |
|---|---|---|---|
| Setup required | None | API key, index | Address, token |
| Hosting | SmythOS | Pinecone | Self-hosted or managed |
| Cost model | Included | Per-usage | Infrastructure |
| Full control | Limited | Full | Full |
| Shared access | Within SmythOS | Across tools | Across tools |
| Enterprise readiness | Yes | Yes | Yes |
| Best for | Testing, quick start | Production, scalability | Self-hosted, compliance |
Migration Between Providers
If you need to switch providers:
- Create a new Data Space with the target provider
- Re-upload your sources to the new space (or use batch import if available)
- Update agent connections to use the new Data Space
- Verify retrieval works as expected
- Delete the old Data Space once confirmed
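For the verification step, one option is to run the same test queries against both Data Spaces and compare the top results before deleting anything. The sketch below assumes you can obtain ranked document IDs from each space. `overlap_at_k`, `verify_migration`, the stub search functions, and the 0.66 threshold are all illustrative choices, not SmythOS features.

```python
def overlap_at_k(old_results, new_results, k=3):
    """Fraction of the old top-k results that also appear in the new top-k."""
    old_top, new_top = old_results[:k], new_results[:k]
    if not old_top:
        return 1.0
    return len(set(old_top) & set(new_top)) / len(old_top)


def verify_migration(queries, search_old, search_new, k=3, threshold=0.66):
    """Return (query, score) pairs whose results diverge too much after migration."""
    failures = []
    for q in queries:
        score = overlap_at_k(search_old(q), search_new(q), k)
        if score < threshold:
            failures.append((q, score))
    return failures


# Stub results standing in for searches against the old and new Data Spaces.
old = {"refunds": ["doc1", "doc2", "doc3"], "shipping": ["doc4", "doc5", "doc6"]}
new = {"refunds": ["doc1", "doc3", "doc2"], "shipping": ["doc9", "doc4", "doc8"]}

failures = verify_migration(old.keys(), old.get, new.get)
print([q for q, _ in failures])  # ['shipping']
```

Order within the top results is ignored on purpose, since small ranking differences between providers are expected.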
Troubleshooting Custom Storage
Connection fails
- Pinecone: Verify API key is valid and index exists with correct name
- Milvus: Verify address is reachable and token is valid
- Check network/firewall settings if accessing remote instances
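A quick way to rule out network problems is a plain TCP connection test from the machine that needs access. The sketch below uses only Python's standard library. `is_reachable` is a hypothetical helper, and the fallback ports (443 for `https`, otherwise 19530 as in the Milvus example address) are assumptions. It checks reachability only and says nothing about whether the token or API key is valid.

```python
import socket
from urllib.parse import urlparse


def is_reachable(address, timeout=3.0):
    """Return True if a TCP connection to the address's host and port succeeds."""
    # Prefix bare "host:port" values so urlparse treats them as a network location.
    parsed = urlparse(address if "://" in address else f"//{address}")
    host = parsed.hostname
    if not host:
        raise ValueError(f"Could not parse a host from {address!r}")
    # Assumed defaults when no port is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 19530)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_reachable("https://milvus.example.com:19530")` returns `False` when a firewall blocks the port or the host name does not resolve.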
Indexing fails after connection
- Verify vector dimensions match between Data Space and your provider's index
- Check that credentials have appropriate permissions
- Review provider logs for specific error messages
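A dimension mismatch is easy to confirm before sending anything to the provider. The sketch below assumes you have the embedding vectors at hand as plain lists. `check_dimensions` is a hypothetical helper, and 1536 is only the typical value mentioned earlier.

```python
def check_dimensions(vectors, index_dimension):
    """Return the positions of vectors whose length differs from the index dimension."""
    return [i for i, v in enumerate(vectors) if len(v) != index_dimension]


INDEX_DIMENSION = 1536  # typical value; use your index's actual setting

sample = [[0.0] * 1536, [0.0] * 768, [0.0] * 1536]
bad = check_dimensions(sample, INDEX_DIMENSION)
print(bad)  # [1]
```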
Slow retrieval performance
- Consider reducing chunk size for faster, more granular searches
- Verify your provider has adequate resources
- Use multiple Data Spaces to reduce search scope
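If you pre-process sources yourself before uploading, chunk size is simple to experiment with. The sketch below shows generic word-based chunking with overlap. It does not describe how SmythOS chunks documents internally, and `chunk_words` and its defaults are assumptions for illustration.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into chunks of at most chunk_size words, sharing `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(chunk_words(text, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Smaller chunks give more granular matches but produce more vectors to store, so the right size is a trade-off to measure on your own data.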
Cost concerns
- Review indexing frequency and data volume in your provider dashboard
- Consider consolidating less-used Data Spaces
- Use batch operations when possible to reduce API calls
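When writing through a provider's API directly, grouping records into batches is the usual way to cut the number of calls. The sketch below shows only the grouping step. `batched` is a hypothetical helper, and the batch size of 100 is an arbitrary example, so check your provider's documented limits.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


records = [{"id": f"doc-{i}"} for i in range(250)]
batches = list(batched(records, 100))
print(len(batches), [len(b) for b in batches])  # 3 [100, 100, 50]
```

Here 250 records become three write calls instead of 250.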