Tutorial: Azure Vision OCR Agent

This agent recognizes printed text in images, extracts it, and converts it into a machine-usable character stream.
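To illustrate what a "machine-usable character stream" can look like, here is a minimal sketch that flattens an OCR response into plain text. It assumes the response follows the regions, lines, and words JSON layout used by the Azure Computer Vision OCR operation; this tutorial does not show the agent's actual output format, and the function name and sample payload are hypothetical.

```python
from typing import Any


def ocr_to_text(response: dict[str, Any]) -> str:
    """Join recognized words into lines of plain text.

    Assumes a regions -> lines -> words layout. Missing keys are
    treated as empty so a partial response does not raise.
    """
    lines: list[str] = []
    for region in response.get("regions", []):
        for line in region.get("lines", []):
            words = [word.get("text", "") for word in line.get("words", [])]
            lines.append(" ".join(w for w in words if w))
    return "\n".join(lines)


# Made-up sample payload for illustration only, not real API output.
sample = {
    "language": "en",
    "regions": [
        {
            "lines": [
                {"words": [{"text": "Hello"}, {"text": "world"}]},
                {"words": [{"text": "OCR"}, {"text": "demo"}]},
            ]
        }
    ],
}

print(ocr_to_text(sample))
```

Each recognized line becomes one line of output, so downstream code can work with ordinary strings.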

RapidAPI Registration

RapidAPI hosts API endpoints for many purposes. This agent uses the Microsoft Computer Vision endpoint.

Follow these steps to get your RapidAPI key:

  1. Go to https://rapidapi.com/
  2. If you do not have an account, create one by following the registration process.
  3. After you complete the sign-up process, you are assigned a default API key that you can access here.
  4. Copy your API key using the copy icon. You can also click the eye icon to reveal your API key.
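
If you also want to call the endpoint directly from a script, the key from the steps above is sent in RapidAPI's `X-RapidAPI-Key` header. The sketch below only builds the request and does not send it. The host name, the `/ocr` path, and the `RAPIDAPI_KEY` environment variable are assumptions for illustration; copy the real values from the endpoint's page on RapidAPI.

```python
import json
import os
import urllib.request

# Assumed values: confirm the host and path on the endpoint's RapidAPI page.
RAPIDAPI_HOST = "microsoft-computer-vision3.p.rapidapi.com"
OCR_URL = f"https://{RAPIDAPI_HOST}/ocr"


def build_ocr_request(image_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for OCR on image_url."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-RapidAPI-Key": api_key,
        "X-RapidAPI-Host": RAPIDAPI_HOST,
    }
    return urllib.request.Request(OCR_URL, data=body, headers=headers, method="POST")


# Read the key from a hypothetical environment variable instead of hard-coding it.
api_key = os.environ.get("RAPIDAPI_KEY", "your-rapidapi-key")
request = build_ocr_request("https://www.example.com/image.png", api_key)
print(request.get_method(), request.full_url)
```

To send the request, pass `request` to `urllib.request.urlopen`. Keeping the key in an environment variable avoids committing it to source control.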

Microsoft Computer Vision Subscription

  1. Navigate to this link and go to the pricing tab.
  2. Select a subscription plan from the available options: Basic (free), Pro, or Ultra. The Basic plan offers 5,000 requests/month.
  3. Click the Subscribe button.
  4. Your subscription is now created.
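
To get a feel for the Basic plan's limit, the arithmetic below spreads 5,000 requests/month evenly across the days of a month. It is a rough budgeting sketch; the helper name is hypothetical, and how RapidAPI actually resets or enforces the quota is not covered here.

```python
import calendar

BASIC_MONTHLY_REQUESTS = 5_000  # Basic (free) plan limit from the steps above


def daily_budget(year: int, month: int, monthly_limit: int = BASIC_MONTHLY_REQUESTS) -> int:
    """Return how many requests per day fit evenly into the monthly limit."""
    days_in_month = calendar.monthrange(year, month)[1]
    return monthly_limit // days_in_month


print(daily_budget(2024, 1))  # 31-day month
print(daily_budget(2024, 2))  # 29-day month (leap year)
```

On the Basic plan this works out to roughly 160 to 170 image analyses per day, depending on the length of the month.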

Azure Vision OCR Agent Setup

  1. In SmythOS, go to the Templates tab.
  2. Find the Azure OCR Agent, hover over it, click the Remix button, and wait for the template to initialize and configure.

LLM Prompt Setup

  1. In SmythOS, locate the LLM Prompt component, then click the gear icon to open its settings.
  2. Enter your RapidAPI key in the Prompt section, then click the check mark icon to confirm and save your configuration.
  3. You’re all set!

Test the Agent

Follow these steps to test the Azure Vision OCR Agent.

  1. Open the ChatBot embodiment and click the chat icon.
  2. Ask the agent to analyze your image URL.
    1. Supported extensions: .png, .jpeg, .jpg, or .webp
    2. URL format: https://www.example.com/image.png
  3. As an example, use the following image.
    1. URL: https://i.stack.imgur.com/sFPWe.png
  4. Ask the agent to analyze the image.
    1. Prompt: Analyze this image https://i.stack.imgur.com/sFPWe.png
  5. The agent replies with the text it recognized in the image.
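
Before sending a prompt, it can help to check that the image URL matches the supported extensions and URL format listed in the steps above. The following is a minimal sketch with hypothetical helper names. It checks only the URL's shape, not whether the image actually exists, and accepting plain `http` alongside `https` is an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions listed in the steps above.
SUPPORTED_EXTENSIONS = {".png", ".jpeg", ".jpg", ".webp"}


def is_supported_image_url(url: str) -> bool:
    """Return True if url is http(s) and its path ends in a supported extension."""
    parsed = urlparse(url)
    # Assumption: plain http is accepted as well as https.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    extension = PurePosixPath(parsed.path).suffix.lower()
    return extension in SUPPORTED_EXTENSIONS


def build_prompt(url: str) -> str:
    """Build the chat prompt in the same form as the example above."""
    if not is_supported_image_url(url):
        raise ValueError(f"Unsupported image URL: {url}")
    return f"Analyze this image {url}"


print(build_prompt("https://i.stack.imgur.com/sFPWe.png"))
```

The extension check is case-insensitive and ignores any query string, because only the path component of the URL is inspected.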