Calling Azure AI Document Intelligence Using the REST API

Category : Azure AI & Azure Open AI
10-Jan-2024

Azure AI Document Intelligence can interpret photos and PDF scans of forms, extracting data for later use in data solutions. While several language SDKs are available, the service can also be called directly through its REST API. This post walks you through that process.

Azure AI Document Intelligence (previously known as Form Recognizer) is a service that reads documents and forms. It uses machine learning to analyze documents stored in various formats, such as JPEG and PDF, and extracts structured data from them.

To access these back-end services, we usually use one of the available high-level language SDKs (C#, Java, Python, or JavaScript). However, the services can also be called directly through the underlying REST API.

This post is an end-to-end tutorial for using the REST API, including what I believe is the most confusing part of the process: encoding the source documents and embedding them in JSON payloads.

Source Document

The following is an example of the documents used in this post:

Web Requests

Using Document Intelligence involves two steps:

POST submission: A document is sent to Azure AI in an HTTP POST request. The request carries the document as Base64-encoded content in its body and specifies the document model to use, along with the API key (which authorizes the request and records the transaction for billing purposes).
GET results: A submitted document is queued for asynchronous processing. Processing is typically fast (seconds) but not immediate, so the client must wait for the analysis to complete, checking its status with GET requests, before retrieving the results.

POST Request

The key components of a POST request are:

The Azure API Endpoint: the endpoint assigned when we create an Azure AI service resource through the portal or the az command-line interface. Each resource has its own globally unique endpoint.

The command: appended to the URL, the command specifies the action we want Document Intelligence to perform. In this scenario, the command is analyze.

The API Version: as Azure REST endpoints evolve, new versions may introduce breaking changes or subtle changes in behavior. By including an API version in the URL, we can keep using an earlier version (for a limited period) and get consistent back-end responses over time.

The document model name: Azure AI analyzes documents using a trained model, which the request identifies by name. Many pre-trained models are available for common cases, but we often train a custom model for the specific kinds of forms used in an organization. In this example, we use the pre-trained model prebuilt-tax.us.w2.

Ocp-Apim-Subscription-Key: as with other Azure services, a key authorizes requests to a specific API endpoint. The key is sent in this HTTP header.
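The components above combine into a request URL plus headers. The Python sketch below assumes the v3.1 GA REST route (/formrecognizer/documentModels/{modelId}:analyze) and api-version 2023-07-31; newer API versions (such as 2024-11-30) use a /documentintelligence/ route instead, so check the version your resource supports. The helper names and the endpoint are hypothetical.

```python
from urllib.parse import quote, urlencode

# Assumed values: the v3.1 GA route and api-version. Newer API versions
# use a /documentintelligence/ path segment instead of /formrecognizer/.
API_VERSION = "2023-07-31"
MODEL_ID = "prebuilt-tax.us.w2"


def build_analyze_url(endpoint: str, model_id: str = MODEL_ID,
                      api_version: str = API_VERSION) -> str:
    """Combine endpoint, model name, the :analyze command, and api-version."""
    base = endpoint.rstrip("/")  # tolerate a trailing slash on the endpoint
    query = urlencode({"api-version": api_version})
    return f"{base}/formrecognizer/documentModels/{quote(model_id)}:analyze?{query}"


def build_headers(api_key: str) -> dict:
    """The key travels in a request header, never in the URL."""
    return {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    print(build_analyze_url("https://my-docintel.cognitiveservices.azure.com/"))
```

Keeping the api-version as an explicit parameter makes it easy to pin a version and move to a newer one deliberately.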

POST request example

Here's an example of a valid POST request in generic HTTP format:
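The same request can be assembled in code. This standard-library Python sketch builds the POST request without sending it and renders it as generic HTTP text. It assumes the v3.1 GA route, api-version 2023-07-31, and a JSON body whose base64Source property holds the encoded document; the endpoint, key, document bytes, and helper names are placeholders.

```python
import base64
import json
import urllib.request

# Assumed v3.1 GA api-version.
API_VERSION = "2023-07-31"


def build_analyze_request(endpoint: str, api_key: str, model_id: str,
                          document: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits a document."""
    url = (f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
           f"{model_id}:analyze?api-version={API_VERSION}")
    # The binary document is Base64-encoded and embedded in a JSON body.
    body = json.dumps({"base64Source": base64.b64encode(document).decode("ascii")})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def to_http_text(req: urllib.request.Request, max_body: int = 60) -> str:
    """Render the request in generic HTTP form, truncating the long body."""
    lines = [f"{req.get_method()} {req.full_url}"]
    lines += [f"{name}: {value}" for name, value in req.header_items()]
    body = req.data.decode("utf-8")
    lines += ["", body if len(body) <= max_body else body[:max_body] + "..."]
    return "\n".join(lines)


if __name__ == "__main__":
    req = build_analyze_request(
        "https://my-docintel.cognitiveservices.azure.com",  # hypothetical
        "<your-key>", "prebuilt-tax.us.w2", b"%PDF-1.7 ...")
    print(to_http_text(req))
    # Sending it would be urllib.request.urlopen(req); success is HTTP 202.
```

urllib normalizes header names (for example to Ocp-apim-subscription-key). HTTP header names are case-insensitive, so this does not change the request.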

POST Response

When a valid request reaches Azure, the service returns a response (HTTP 202 Accepted) confirming that the request was accepted and queued. For example:

Next, we'll use a GET request to check the status of the data extraction, which will hopefully have succeeded. To make the GET request, we must record the request id returned in the apim-request-id response header.
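Capturing the request id can be done with a small helper. This sketch assumes the response is a 202 Accepted whose headers include apim-request-id and, in current API versions, an Operation-Location header holding the full URL for the follow-up GET. The function name is hypothetical.

```python
from typing import Mapping, Optional, Tuple


def read_accepted_response(status: int,
                           headers: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (request_id, operation_location) from a POST analyze response."""
    if status != 202:
        raise RuntimeError(f"Expected 202 Accepted, got {status}")
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    request_id = lowered.get("apim-request-id")
    if request_id is None:
        raise KeyError("apim-request-id header missing")
    # Operation-Location (if present) is the complete URL for the GET.
    return request_id, lowered.get("operation-location")
```

With urllib, the inputs would come from resp = urllib.request.urlopen(req), passing resp.status and resp.headers.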

Getting Results

To obtain the status and results of the analysis, we send a GET request. The GET request requires the following:

The Azure API Endpoint: the same endpoint assigned when we create an Azure AI service in the portal or via the az CLI.
The API Version: the same api-version query parameter used for the POST request.
The API Key: the GET request also requires the key in the Ocp-Apim-Subscription-Key request header.
The request id: the request id returned in the POST response is passed to the GET request in the URL.
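These pieces combine into the GET request. In the v3.1 route assumed here, the model name also appears in the path (documentModels/{modelId}/analyzeResults/{requestId}). If the POST response included an Operation-Location header, that URL can be used as-is instead. The function names are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed v3.1 GA api-version; it should match the version used for the POST.
API_VERSION = "2023-07-31"


def build_result_url(endpoint: str, model_id: str, request_id: str,
                     api_version: str = API_VERSION) -> str:
    """URL that returns the status and results of an earlier analyze request."""
    query = urlencode({"api-version": api_version})
    return (f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
            f"{model_id}/analyzeResults/{request_id}?{query}")


def build_result_request(url: str, api_key: str) -> urllib.request.Request:
    """GET request authorized with the same key header as the POST."""
    return urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": api_key}, method="GET")
```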

When the analysis has finished successfully, the GET response includes a success status, information about the model used, the times the request was received and completed, and the analysis results.

Response Example

If the request has failed, or is still queued or running when the status is requested, the status reported in the payload reflects that.
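Because processing is asynchronous, a client typically polls until the status is final. This sketch assumes the v3.x status values notStarted, running, succeeded, and failed. The fetch callable, which would send the GET request and return the parsed JSON payload, is injected so the loop does not depend on any particular HTTP library. All names are hypothetical, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable, Dict


def wait_for_result(fetch: Callable[[], Dict], interval: float = 1.0,
                    timeout: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call fetch() until the payload reports success, failure, or timeout."""
    waited = 0.0
    while True:
        payload = fetch()
        status = payload.get("status")
        if status == "succeeded":
            return payload
        if status == "failed":
            raise RuntimeError(f"Analysis failed: {payload.get('error')}")
        # notStarted / running: keep waiting until the time budget runs out.
        if waited >= timeout:
            raise TimeoutError(f"Still '{status}' after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

For example, fetch could be lambda: json.load(urllib.request.urlopen(req)), where req is the GET request. Injecting sleep also makes the loop easy to test without real delays.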

Success Payload Example

If the request succeeds, the payload contains the data extracted from the form(s) in the uploaded image.

An example of a W-2 response is shown below.

Note that each field found in the form includes not only its value, but also its data type, a confidence score, and the coordinates of the polygon where the field was found.
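Those per-field properties can be pulled out of a successful payload. This sketch assumes the v3.x result shape: analyzeResult.documents[].fields, where each field carries type, confidence, content, boundingRegions (each with a flat polygon list), and a typed value under a key such as valueString or valueNumber. Nested object and array fields are returned as-is rather than flattened, and only the first document is summarized. The function name is hypothetical.

```python
from typing import Any, Dict, List


def summarize_fields(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the top-level fields of the first document in a GET payload."""
    documents = result.get("analyzeResult", {}).get("documents", [])
    if not documents:
        return []
    rows = []
    for name, field in documents[0].get("fields", {}).items():
        field_type = field.get("type", "")
        # Typed values live under "value" + capitalized type (valueNumber, ...);
        # fall back to the raw text in "content" when no typed value exists.
        typed_key = "value" + field_type[:1].upper() + field_type[1:]
        regions = field.get("boundingRegions", [])
        rows.append({
            "name": name,
            "type": field.get("type"),
            "value": field.get(typed_key, field.get("content")),
            "confidence": field.get("confidence"),
            "polygon": regions[0]["polygon"] if regions else None,
        })
    return rows
```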

In an interactive application, the polygon coordinates can be used to draw bounding boxes around data elements found in the image.
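For drawing, a polygon can be reduced to an axis-aligned rectangle. This sketch assumes the v3.x format, where polygon is a flat list [x1, y1, x2, y2, ...]. Coordinates are in the unit reported for the page in the result (inches for PDFs, pixels for images), so they need scaling to screen space before drawing. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def polygon_to_box(polygon: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert a flat [x1, y1, x2, y2, ...] polygon to (left, top, width, height)."""
    if len(polygon) < 6 or len(polygon) % 2:
        raise ValueError("expected an even number of coordinates (at least 3 points)")
    xs, ys = polygon[0::2], polygon[1::2]  # split interleaved x/y values
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left, max(ys) - top
```

Using min/max rather than assuming a point order keeps the box correct even if the polygon is slightly rotated.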

Base64 Encoding

When sending POST requests to the back-end Azure AI services, the document content must be encoded from binary to Base64 ASCII text.
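In Python, the standard base64 module performs this conversion. The sketch below reads a document in binary mode, encodes it on a single line, and wraps it in a JSON body. The base64Source property name assumes the v3.x request format, and the function names are hypothetical. Base64 output is about 4/3 the size of the original file, which matters for large scans.

```python
import base64
import json


def encode_document(data: bytes) -> str:
    """Binary document bytes -> single-line Base64 ASCII text."""
    return base64.b64encode(data).decode("ascii")


def encode_file(path: str) -> str:
    """Read a file in binary mode and Base64-encode its contents."""
    with open(path, "rb") as f:
        return encode_document(f.read())


def analyze_body(encoded: str) -> str:
    """JSON body for the analyze POST (v3.x property name assumed)."""
    return json.dumps({"base64Source": encoded})


if __name__ == "__main__":
    print(analyze_body(encode_document(b"Man")))  # {"base64Source": "TWFu"}
```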

Alternatively, the request can include a publicly accessible URL for the document. If this meets your requirements, it removes the need to load, encode, and send the file payload; Azure retrieves the file contents from the URL you provide.
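When the document is reachable by URL, the body carries the URL instead of the encoded bytes. This sketch assumes the v3.x urlSource and base64Source property names and enforces that exactly one source is supplied. The function name is hypothetical.

```python
import json
from typing import Optional


def analyze_source_body(*, data_b64: Optional[str] = None,
                        url: Optional[str] = None) -> str:
    """Build the analyze JSON body from either encoded bytes or a public URL."""
    # Exactly one source: the service either decodes inline Base64 content
    # or downloads the document from the URL itself.
    if (data_b64 is None) == (url is None):
        raise ValueError("provide exactly one of data_b64 or url")
    if url is not None:
        return json.dumps({"urlSource": url})
    return json.dumps({"base64Source": data_b64})
```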

Within a program (such as a web or desktop app), the language's standard library or SDK usually provides a way to Base64-encode a byte array to a string. API test tools such as Postman may also offer built-in Base64 encoding.

If you need to encode files yourself, a convenient approach is to encode the file with the OpenSSL CLI and then copy the result to the clipboard (macOS example here):

Encoding the File

Encode the file with the OpenSSL CLI

Once encoded, open the file and paste the value into your request. Alternatively, if using macOS, use the built-in pbcopy terminal command to copy the file contents to the clipboard.
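By default, openssl base64 wraps its output every 64 characters, and a raw line break inside a JSON string is invalid. Passing -A (for example openssl base64 -A -in w2.png -out w2.b64, with hypothetical file names) writes a single line. Alternatively, the Python sketch below joins wrapped output and checks that it is valid Base64 before use. The function name is hypothetical.

```python
import base64
import binascii


def clean_base64(text: str) -> str:
    """Join wrapped Base64 text (e.g. openssl output without -A) into one line."""
    joined = "".join(text.split())  # drop newlines and any stray whitespace
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        base64.b64decode(joined, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid Base64: {exc}") from exc
    return joined
```

For example, clean_base64(open("w2.b64").read()) would produce a value that can be pasted into the base64Source property.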