Skip to main content

ScrapingBee Integration with SmythOS

Need to extract data from any website? Connect ScrapingBee to SmythOS and let your agents handle complex web scraping tasks, including JavaScript rendering and proxy rotation, automatically.

TL;DR

Securely link your ScrapingBee account to SmythOS using an API key. Then, use our suite of scraping components to extract data from standard web pages, Google Search, Google News, and Google Maps.

List of ScrapingBee Components

Quickly compare ScrapingBee components by what they do and their key I/O. Click any component name to jump directly to its detailed guide.

ComponentActionWhat it DoesInputsKey OutputsUse Case
Scrape Web Page (Basic)ScrapeExtracts the raw HTML from a simple web page.required urlResponseFetching content from static websites.
Scrape Web Page (Advanced)ScrapeScrapes complex sites with JS rendering, proxy, and other options.required url
optional country_code
ResponseExtracting data from dynamic, single-page applications.
Google Search ScrapeScrapeFetches structured Google Search results for a query.required search
optional language
organic_resultsMonitoring SERP positions and competitor ads.
Google News ScrapeScrapeFetches structured Google News results for a query.required search
optional language
news_resultsTracking brand mentions in the news.
Google Maps ScrapeScrapeFetches structured local business data from Google Maps.required search
optional language
maps_resultsCollecting lead data or business information.
INFO
Why Integrate ScrapingBee with Your Agent?

ScrapingBee is a robust web scraping API that handles the complexities of modern web environments. Integrating it with SmythOS allows you to build powerful agents that can reliably extract data from virtually any source.

  • Automate Data Collection: Create agents that automatically extract product prices, stock levels, or real estate listings from e-commerce sites and marketplaces.
  • Power Market Research: Programmatically gather data from competitor websites, news outlets, or forums to analyze market trends and sentiment.
  • Enrich Your Datasets: Use agents to visit a list of URLs and scrape specific information (like contact details or social media links) to enhance your existing contact lists or databases.
  • Bypass Blocks: Leverage ScrapingBee's large proxy pool and advanced anti-bot detection systems to ensure your agents can access the data they need without being blocked.

Prerequisites

Before you begin, please ensure you have the following:

  • An active SmythOS account. (Sign up here).
  • A ScrapingBee account.
  • Your ScrapingBee API Key.

Getting Started With ScrapingBee

The connection between SmythOS and ScrapingBee is configured using a secure API key.

Step 1: Get Your ScrapingBee API Key

  1. Log in to your ScrapingBee Dashboard.
  2. Your API Key is displayed on the main dashboard page.
  3. Copy the API key to your clipboard.

Step 2: Store Your API Key in SmythOS Vault

Your API Key is a sensitive credential. Use the SmythOS Vault to store it securely.

  1. In your SmythOS dashboard, navigate to the Vault.
  2. Create a new secret and paste your ScrapingBee API Key as the value. Give it a memorable name, like scrapingbee_api_key.
  3. For more details, see the Vault Documentation.

Step 3: Configure a ScrapingBee Component

  1. In your SmythOS agent graph, drag and drop any ScrapingBee component.
  2. Click the component to open its Settings panel.
  3. In the API Key field, select the secret you saved in the Vault (e.g., scrapingbee_api_key).
  4. Your connection is now configured for that component.
Heads-up
You must add the API Key from the Vault to each ScrapingBee component you use. This ensures all connections are secure and properly authenticated.

Which ScrapingBee Component Should I Use?

If you need to…TargetUse this ComponentWhy this one?
Scrape a simple, static websiteA URLScrape Web Page (Basic)The most straightforward component for basic HTML extraction.
Scrape a modern JavaScript-heavy siteA URL of a single-page appScrape Web Page (Advanced)Offers essential features like JavaScript rendering and proxy rotation.
Get structured Google search resultsA search keywordGoogle Search ScrapeParses the SERP and returns clean data like organic results and ads.
Gather local business informationA business name or categoryGoogle Maps ScrapeSpecifically designed to extract structured data from Google Maps listings.
Track news articles about a topicA search keywordGoogle News ScrapeFocuses only on Google News and returns structured article data.

Component Details

This section provides detailed information for each ScrapingBee component.

Basic: Scrape Web Page

Extracts the HTML content from a web page. Ideal for simple, static sites that do not require JavaScript rendering.

INFO
This component requires an API Key for authentication, as detailed in the Getting Started section.

Inputs

FieldTypeRequiredNotes
urlstringYesThe full URL of the web page to scrape.

Outputs

FieldTypeDescription
ResponsestringThe raw HTML content of the scraped page.
HeadersobjectThe HTTP headers from the API response.
Use Case

An agent scrapes the sitemap of a website to get a list of all page URLs, which can then be fed into other components for further processing.

{
"component": "scrapingbee.basicScrapeWebPage",
"url": "[https://www.example.com/sitemap.xml](https://www.example.com/sitemap.xml)"
}
Limitations

This component does not render JavaScript. For dynamic websites (React, Vue, Angular), use the "Advanced: Scrape Web Page" component instead.

Advanced: Scrape Web Page

Scrapes web pages with advanced options, including JavaScript rendering, proxy management, and screenshot capabilities.

INFO
This component requires an API Key for authentication, as detailed in the Getting Started section.

Component-Specific Settings

In the component's settings panel, you can configure several advanced options:

  • Render JS: true/false. Set to true for single-page applications.
  • Block Resources: true/false. Block images/CSS to speed up requests (only applies when JS is rendered).
  • Premium Proxy: true/false. Use for sites that are difficult to scrape.
  • Stealth Proxy: true/false. Use for websites with the most advanced anti-bot measures.
  • Screenshots: true/false. Capture a screenshot of the page.
  • Full-Page Screenshots: true/false. Capture a screenshot of the entire page length.

Inputs

FieldTypeRequiredNotes
urlstringYesThe URL of the dynamic web page to scrape.
country_codestringOptionalTwo-letter country code for a geographically specific proxy. Default: us.

Outputs

FieldTypeDescription
ResponsestringThe HTML content of the page after JavaScript has been rendered.
HeadersobjectThe HTTP headers from the API response.
Use Case

Scrape product prices from an e-commerce site that loads prices dynamically with JavaScript.

{
"component": "scrapingbee.advancedScrapeWebPage",
"url": "[https://www.amazon.com/dp/B08P2H5L72](https://www.amazon.com/dp/B08P2H5L72)",
"country_code": "us"
}
API Credits

Using advanced features like JavaScript rendering or premium proxies will consume more API credits than a basic request.

Google Search Scrape

Performs a Google search and returns structured data from the Search Engine Results Page (SERP).

INFO
This component requires an API Key for authentication, as detailed in the Getting Started section.

Inputs

FieldTypeRequiredNotes
searchstringYesThe keyword or phrase to search for.
nb_resultsintegerOptionalNumber of results to return. Default: 100.
pageintegerOptionalPage number of search results. Default: 1.
languagestringOptionalTwo-letter language code. Default: en.
country_codestringOptionalTwo-letter country code for localized results. Default: us.

Outputs

FieldTypeDescription
organic_resultsarrayAn array of objects, each representing an organic search result.
meta_dataobjectContains metadata about the search, such as the full query URL.
ResponseobjectThe full, raw JSON response from the ScrapingBee API.
HeadersobjectThe HTTP headers from the API response.
Use Case

Track your website's daily ranking for a target keyword by searching for it and looping through the organic_results to find your domain.

{
"component": "scrapingbee.googleSearchScrape",
"search": "no-code agent builder",
"country_code": "gb"
}
Data Structure

The Response object contains more than just organic results, including ads, related questions, and other SERP features. Parse it directly for more in-depth analysis.

Google News Scrape

Performs a search on Google News and returns structured article data.

INFO
This component requires an API Key for authentication, as detailed in the Getting Started section.

Inputs

FieldTypeRequiredNotes
searchstringYesThe topic or keyword to search for in the news.
nb_resultsintegerOptionalNumber of results to return. Default: 100.
pageintegerOptionalPage number of news results. Default: 1.
languagestringOptionalTwo-letter language code. Default: en.
country_codestringOptionalTwo-letter country code for localized news. Default: us.

Outputs

FieldTypeDescription
news_resultsarrayAn array of objects, each representing a news article result.
meta_dataobjectContains metadata about the search.
ResponseobjectThe full, raw JSON response from the ScrapingBee API.
HeadersobjectThe HTTP headers from the API response.
Use Case

Set up a media monitoring agent to search for your brand's name every day and send a summary of new articles from the news_results to a Slack channel.

{
"component": "scrapingbee.googleNewsScrape",
"search": "SmythOS",
"language": "en"
}
Timeliness

This component scrapes live Google News results, making it ideal for tracking breaking stories and recent brand mentions.

Google Maps Scrape

Performs a search on Google Maps and returns structured data about local businesses and places.

INFO
This component requires an API Key for authentication, as detailed in the Getting Started section.

Inputs

FieldTypeRequiredNotes
searchstringYesThe search query (e.g., "pizza in new york", "plumbers near me").
nb_resultsintegerOptionalNumber of results to return. Default: 100.
pageintegerOptionalPage number of map results. Default: 1.
languagestringOptionalTwo-letter language code. Default: en.
country_codestringOptionalTwo-letter country code for localized map results. Default: us.

Outputs

FieldTypeDescription
maps_resultsarrayAn array of objects, each representing a local business listing.
meta_dataobjectContains metadata about the search.
ResponseobjectThe full, raw JSON response from the ScrapingBee API.
HeadersobjectThe HTTP headers from the API response.
Use Case

Build a lead generation agent that searches for a specific business type in a list of cities, extracts the business name and website from the maps_results, and logs them to a Google Sheet.

{
"component": "scrapingbee.googleMapsScrape",
"search": "HVAC services in Austin, TX",
"nb_results": 50
}
Data Richness

The maps_results output can contain rich data including address, phone number, rating, and reviews count, making it extremely valuable for local market research.

Best Practices & Advanced Tips

  • Secure Your API Key: Always store your ScrapingBee API key in the SmythOS Vault.
  • Choose the Right Tool: Use the specialized Google scraping components when you need structured data from a SERP. Use the general "Scrape Web Page" components for all other websites.
  • Enable JavaScript Rendering: For modern websites (SPAs), you must enable JavaScript rendering in the "Advanced: Scrape Web Page" component settings to get the full page content.
  • Use Country Codes: To get geographically relevant results, especially for Google Search and Maps, always specify a country_code.

Troubleshooting Common Issues

  • Error: 401 Unauthorized

    • Cause: The API key is missing, invalid, or disabled.
    • Solution: Verify that the API key in your SmythOS Vault is correct and matches the one in your ScrapingBee dashboard.
  • Error: 402 Payment Required

    • Cause: You have run out of API credits, or your subscription does not cover the feature you are trying to use (e.g., premium proxies).
    • Solution: Check your credit balance and subscription plan in your ScrapingBee dashboard.
  • Empty Response or Blocked Message

    • Cause: The target website is successfully blocking the request.
    • Solution: For the "Advanced: Scrape Web Page" component, try enabling Premium Proxies or Stealth Proxies in the settings. For Google scrapes, this is handled automatically but can still occasionally fail.
  • Incomplete Data from a Dynamic Site

    • Cause: JavaScript rendering was not enabled for a site that requires it.
    • Solution: Use the "Advanced: Scrape Web Page" component and ensure the Render JS setting is toggled to true.

What's Next?

You're now ready to build powerful data extraction agents with the SmythOS ScrapingBee Integration!

Consider these ideas:

  • Build an Agent That...

    • Scrapes product information from a list of e-commerce URLs and logs the name, price, and availability to a Google Sheet for price comparison.
    • Performs a Google search for a topic, scrapes the top 5 organic results, and feeds the content into an AI component to generate a summary report.
    • Scrapes Google Maps for a list of businesses in a specific category and location to build a lead list for your sales team.
  • Explore Other Integrations:

    • Combine ScrapingBee with the Google Sheets Integration to store your scraped data in a structured format.
    • Use scraped data to fuel workflows with AI components for analysis, summarization, or classification.
    • Trigger Gmail or Slack alerts based on changes detected in the data you scrape (e.g., a price drop).