ScrapingBee Integration with SmythOS
Need to extract data from any website? Connect ScrapingBee to SmythOS and let your agents handle complex web scraping tasks, including JavaScript rendering and proxy rotation, automatically.
List of ScrapingBee Components
Quickly compare ScrapingBee components by what they do and their key I/O. Click any component name to jump directly to its detailed guide.
Component | Action | What it Does | Inputs | Key Outputs | Use Case |
---|---|---|---|---|---|
Scrape Web Page (Basic) | Scrape | Extracts the raw HTML from a simple web page. | required url | Response | Fetching content from static websites. |
Scrape Web Page (Advanced) | Scrape | Scrapes complex sites with JS rendering, proxy, and other options. | required url optional country_code | Response | Extracting data from dynamic, single-page applications. |
Google Search Scrape | Scrape | Fetches structured Google Search results for a query. | required search optional language | organic_results | Monitoring SERP positions and competitor ads. |
Google News Scrape | Scrape | Fetches structured Google News results for a query. | required search optional language | news_results | Tracking brand mentions in the news. |
Google Maps Scrape | Scrape | Fetches structured local business data from Google Maps. | required search optional language | maps_results | Collecting lead data or business information. |
Prerequisites
Before you begin, please ensure you have the following:
- An active SmythOS account. (Sign up here).
- A ScrapingBee account.
- Your ScrapingBee API Key.
Getting Started With ScrapingBee
The connection between SmythOS and ScrapingBee is configured using a secure API key.
Step 1: Get Your ScrapingBee API Key
- Log in to your ScrapingBee Dashboard.
- Your API Key is displayed on the main dashboard page.
- Copy the API key to your clipboard.
Step 2: Store Your API Key in SmythOS Vault
Your API Key is a sensitive credential. Use the SmythOS Vault
to store it securely.
- In your SmythOS dashboard, navigate to the Vault.
- Create a new secret and paste your ScrapingBee API Key as the value. Give it a memorable name, like
scrapingbee_api_key
. - For more details, see the Vault Documentation.
Step 3: Configure a ScrapingBee Component
- In your SmythOS agent graph, drag and drop any ScrapingBee component.
- Click the component to open its Settings panel.
- In the
API Key
field, select the secret you saved in the Vault (e.g.,scrapingbee_api_key
). - Your connection is now configured for that component.
Which ScrapingBee Component Should I Use?
If you need to… | Target | Use this Component | Why this one? |
---|---|---|---|
Scrape a simple, static website | A URL | Scrape Web Page (Basic) | The most straightforward component for basic HTML extraction. |
Scrape a modern JavaScript-heavy site | A URL of a single-page app | Scrape Web Page (Advanced) | Offers essential features like JavaScript rendering and proxy rotation. |
Get structured Google search results | A search keyword | Google Search Scrape | Parses the SERP and returns clean data like organic results and ads. |
Gather local business information | A business name or category | Google Maps Scrape | Specifically designed to extract structured data from Google Maps listings. |
Track news articles about a topic | A search keyword | Google News Scrape | Focuses only on Google News and returns structured article data. |
Component Details
This section provides detailed information for each ScrapingBee component.
Basic: Scrape Web Page
Extracts the HTML content from a web page. Ideal for simple, static sites that do not require JavaScript rendering.
Inputs
Field | Type | Required | Notes |
---|---|---|---|
url | string | Yes | The full URL of the web page to scrape. |
Outputs
Field | Type | Description |
---|---|---|
Response | string | The raw HTML content of the scraped page. |
Headers | object | The HTTP headers from the API response. |
{
"component": "scrapingbee.basicScrapeWebPage",
"url": "[https://www.example.com/sitemap.xml](https://www.example.com/sitemap.xml)"
}
Advanced: Scrape Web Page
Scrapes web pages with advanced options, including JavaScript rendering, proxy management, and screenshot capabilities.
Component-Specific Settings
In the component's settings panel, you can configure several advanced options:
- Render JS:
true
/false
. Set totrue
for single-page applications. - Block Resources:
true
/false
. Block images/CSS to speed up requests (only applies when JS is rendered). - Premium Proxy:
true
/false
. Use for sites that are difficult to scrape. - Stealth Proxy:
true
/false
. Use for websites with the most advanced anti-bot measures. - Screenshots:
true
/false
. Capture a screenshot of the page. - Full-Page Screenshots:
true
/false
. Capture a screenshot of the entire page length.
Inputs
Field | Type | Required | Notes |
---|---|---|---|
url | string | Yes | The URL of the dynamic web page to scrape. |
country_code | string | Optional | Two-letter country code for a geographically specific proxy. Default: us . |
Outputs
Field | Type | Description |
---|---|---|
Response | string | The HTML content of the page after JavaScript has been rendered. |
Headers | object | The HTTP headers from the API response. |
{
"component": "scrapingbee.advancedScrapeWebPage",
"url": "[https://www.amazon.com/dp/B08P2H5L72](https://www.amazon.com/dp/B08P2H5L72)",
"country_code": "us"
}
Google Search Scrape
Performs a Google search and returns structured data from the Search Engine Results Page (SERP).
Inputs
Field | Type | Required | Notes |
---|---|---|---|
search | string | Yes | The keyword or phrase to search for. |
nb_results | integer | Optional | Number of results to return. Default: 100 . |
page | integer | Optional | Page number of search results. Default: 1 . |
language | string | Optional | Two-letter language code. Default: en . |
country_code | string | Optional | Two-letter country code for localized results. Default: us . |
Outputs
Field | Type | Description |
---|---|---|
organic_results | array | An array of objects, each representing an organic search result. |
meta_data | object | Contains metadata about the search, such as the full query URL. |
Response | object | The full, raw JSON response from the ScrapingBee API. |
Headers | object | The HTTP headers from the API response. |
{
"component": "scrapingbee.googleSearchScrape",
"search": "no-code agent builder",
"country_code": "gb"
}
Google News Scrape
Performs a search on Google News and returns structured article data.
Inputs
Field | Type | Required | Notes |
---|---|---|---|
search | string | Yes | The topic or keyword to search for in the news. |
nb_results | integer | Optional | Number of results to return. Default: 100 . |
page | integer | Optional | Page number of news results. Default: 1 . |
language | string | Optional | Two-letter language code. Default: en . |
country_code | string | Optional | Two-letter country code for localized news. Default: us . |
Outputs
Field | Type | Description |
---|---|---|
news_results | array | An array of objects, each representing a news article result. |
meta_data | object | Contains metadata about the search. |
Response | object | The full, raw JSON response from the ScrapingBee API. |
Headers | object | The HTTP headers from the API response. |
{
"component": "scrapingbee.googleNewsScrape",
"search": "SmythOS",
"language": "en"
}
Google Maps Scrape
Performs a search on Google Maps and returns structured data about local businesses and places.
Inputs
Field | Type | Required | Notes |
---|---|---|---|
search | string | Yes | The search query (e.g., "pizza in new york", "plumbers near me"). |
nb_results | integer | Optional | Number of results to return. Default: 100 . |
page | integer | Optional | Page number of map results. Default: 1 . |
language | string | Optional | Two-letter language code. Default: en . |
country_code | string | Optional | Two-letter country code for localized map results. Default: us . |
Outputs
Field | Type | Description |
---|---|---|
maps_results | array | An array of objects, each representing a local business listing. |
meta_data | object | Contains metadata about the search. |
Response | object | The full, raw JSON response from the ScrapingBee API. |
Headers | object | The HTTP headers from the API response. |
{
"component": "scrapingbee.googleMapsScrape",
"search": "HVAC services in Austin, TX",
"nb_results": 50
}
Best Practices & Advanced Tips
- Secure Your API Key: Always store your ScrapingBee API key in the SmythOS
Vault
. - Choose the Right Tool: Use the specialized Google scraping components when you need structured data from a SERP. Use the general "Scrape Web Page" components for all other websites.
- Enable JavaScript Rendering: For modern websites (SPAs), you must enable JavaScript rendering in the "Advanced: Scrape Web Page" component settings to get the full page content.
- Use Country Codes: To get geographically relevant results, especially for Google Search and Maps, always specify a
country_code
.
Troubleshooting Common Issues
-
Error:
401 Unauthorized
- Cause: The API key is missing, invalid, or disabled.
- Solution: Verify that the API key in your SmythOS Vault is correct and matches the one in your ScrapingBee dashboard.
-
Error:
402 Payment Required
- Cause: You have run out of API credits, or your subscription does not cover the feature you are trying to use (e.g., premium proxies).
- Solution: Check your credit balance and subscription plan in your ScrapingBee dashboard.
-
Empty Response or Blocked Message
- Cause: The target website is successfully blocking the request.
- Solution: For the "Advanced: Scrape Web Page" component, try enabling Premium Proxies or Stealth Proxies in the settings. For Google scrapes, this is handled automatically but can still occasionally fail.
-
Incomplete Data from a Dynamic Site
- Cause: JavaScript rendering was not enabled for a site that requires it.
- Solution: Use the "Advanced: Scrape Web Page" component and ensure the Render JS setting is toggled to
true
.
What's Next?
You're now ready to build powerful data extraction agents with the SmythOS ScrapingBee Integration!
Consider these ideas:
-
Build an Agent That...
- Scrapes product information from a list of e-commerce URLs and logs the name, price, and availability to a Google Sheet for price comparison.
- Performs a Google search for a topic, scrapes the top 5 organic results, and feeds the content into an AI component to generate a summary report.
- Scrapes Google Maps for a list of businesses in a specific category and location to build a lead list for your sales team.
-
Explore Other Integrations:
- Combine ScrapingBee with the Google Sheets Integration to store your scraped data in a structured format.
- Use scraped data to fuel workflows with AI components for analysis, summarization, or classification.
- Trigger Gmail or Slack alerts based on changes detected in the data you scrape (e.g., a price drop).