Azure AI Document Intelligence can interpret photos and PDF scans of forms, extracting data for subsequent usage in data solutions. While several language SDKs are available, these services can also be accessed directly via the REST API. This lesson leads you through the REST API process.
Azure AI Document Intelligence (previously known as Form Recognizer) is a service that reads documents and forms. It uses machine learning to analyze documents stored in various formats, such as JPEG and PDF, and extracts structured data from them.
To access these back-end services, we often utilize one of the available high-level language SDKs (C#, Java, Python, or JavaScript). However, the services can also be accessed directly using the underlying REST API.
This page gives an end-to-end tutorial for utilizing the REST API, including what I believe to be the most confusing/complex portion of the process: encoding supplied documents and embedding them in JSON payloads.
The following is an example of the documents used in this post:
Using Document Intelligence involves two steps:
The key components of a POST request are:
The Azure API Endpoint: the endpoint that is assigned when we create an Azure AI service through the portal or the az command line interface. Each Azure service has its own globally unique endpoint.
The command attached to the URL specifies the action we want Document Intelligence to do. In this scenario, the command we'll use is analyze.
The API Version: As Azure REST endpoints evolve, they may introduce breaking changes or subtly change behavior. By including an API Version in the URL, we can continue to use earlier API versions (for a limited period) and get a consistent back-end response over time.
The document model name: Azure AI analyzes documents using a trained model. There are many pre-trained models available for common cases, but we frequently train a custom model for the specific kinds of forms used in the workplace. In this example, we will use the pre-trained model prebuilt-tax.us.w2.
Ocp-Apim-Subscription-Key: As with other Azure services, a key can be used to authorize requests to a specific API endpoint. This key is sent in an HTTP request header.
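The components listed above can be assembled into a request URL and header set. The following is a minimal Python sketch, not a definitive implementation: the resource name, the documentintelligence path segment, the api-version value, and the key are placeholder assumptions that should be checked against your own resource and the REST reference for the API version you target.

```python
from urllib.parse import quote, urlencode


def build_analyze_request(endpoint, path_prefix, model_id, api_version, key):
    """Return the (url, headers) pair for an analyze POST request."""
    # Endpoint + service path + model name + the ":analyze" command.
    base = f"{endpoint.rstrip('/')}/{path_prefix.strip('/')}"
    url = (
        f"{base}/documentModels/{quote(model_id, safe='')}:analyze"
        f"?{urlencode({'api-version': api_version})}"
    )
    headers = {
        "Content-Type": "application/json",
        # The key authorizes the request to this specific endpoint.
        "Ocp-Apim-Subscription-Key": key,
    }
    return url, headers


# Placeholder values (assumptions, not taken from a real resource).
url, headers = build_analyze_request(
    endpoint="https://my-resource.cognitiveservices.azure.com/",
    path_prefix="documentintelligence",
    model_id="prebuilt-tax.us.w2",
    api_version="2024-11-30",
    key="<your-key>",
)
print(url)
```

Keeping the path prefix and API version as parameters makes it easy to pin an earlier API version without touching the rest of the code.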
Here's an example of a valid POST in generic HTTP format:
When a valid request is delivered to Azure, a response confirming that the request was accepted and queued is returned. For example:
Next, we'll utilize the GET request to get the status of the data extraction from Document Intelligence, which hopefully was successful. To make the GET request, we must record the request id received via the apim-request-id response header.
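Capturing the request id and turning it into the GET URL might look like the sketch below. The analyzeResults path segment, the path prefix, the API version, and the all-zero id are assumptions used for illustration. Only the apim-request-id header name comes from the text above.

```python
def result_url(endpoint, path_prefix, model_id, api_version, response_headers):
    """Build the GET URL for the analysis result from the POST response headers."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    request_id = lowered.get("apim-request-id")
    if not request_id:
        raise KeyError("apim-request-id header missing from the POST response")
    base = f"{endpoint.rstrip('/')}/{path_prefix.strip('/')}"
    return (
        f"{base}/documentModels/{model_id}/analyzeResults/{request_id}"
        f"?api-version={api_version}"
    )


# Hypothetical headers, shaped like those returned with the accepted POST.
post_headers = {"Apim-Request-Id": "00000000-0000-0000-0000-000000000000"}
get_url = result_url(
    "https://my-resource.cognitiveservices.azure.com",
    "documentintelligence",
    "prebuilt-tax.us.w2",
    "2024-11-30",
    post_headers,
)
print(get_url)
```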
To obtain the status and outcomes of the document intelligence process, we can utilize a GET request. The GET's prerequisites are as follows:
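The Azure API endpoint, assigned when the service was created through the portal or the az CLI.
The request id returned in the apim-request-id response header of the POST.
The key, supplied in the Ocp-Apim-Subscription-Key request header.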
When the request has finished successfully, the response to the GET request will include a success status, information about the model used, when the request was received and completed, and the results of the analysis.
If the request failed or is still queued when the status is requested, the payload will reflect these scenarios.
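Because the request may still be queued when we first ask, the GET is usually repeated until a final status arrives. Below is a hedged sketch of such a polling loop. The status strings ("succeeded", "failed") and the error key are assumptions about the payload shape, and fetch_status is a hypothetical callable standing in for whatever code performs the GET and decodes the JSON.

```python
import time


def wait_for_result(fetch_status, interval_seconds=2.0, max_attempts=30):
    """Call fetch_status() until the analysis finishes or attempts run out.

    fetch_status is any callable that performs the GET and returns the
    decoded JSON payload as a dict.
    """
    for _ in range(max_attempts):
        payload = fetch_status()
        status = payload.get("status")
        if status == "succeeded":
            return payload
        if status == "failed":
            raise RuntimeError(f"analysis failed: {payload.get('error')}")
        # Any other status (for example still queued or running): wait, retry.
        time.sleep(interval_seconds)
    raise TimeoutError("analysis did not finish in time")


# Simulated responses so the sketch runs without calling Azure.
fake_responses = iter([
    {"status": "notStarted"},
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {}},
])
result = wait_for_result(lambda: next(fake_responses), interval_seconds=0)
print(result["status"])
```

Passing the fetch function in keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned payloads.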
Success Payload Example
If the request is successful, the content will comprise the data extracted from the form(s) contained in the uploaded image.
An example of a W-2 response is shown below.
Note that each field found in the form contains not only the value, but also the data type, confidence, and coordinates of the polygon where the field was found.
In an interactive application, the polygon coordinates can be used to draw bounding boxes around data elements found in the image.
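To make the field structure concrete, here is a small sketch that walks a success payload and turns each polygon into a rectangle suitable for drawing. The sample payload is invented and heavily trimmed. The key names (analyzeResult, documents, fields, boundingRegions, polygon) and the flat x, y coordinate layout are assumptions that should be compared with a real response.

```python
def bounding_box(polygon):
    """Convert a flat [x1, y1, x2, y2, ...] polygon to (left, top, right, bottom)."""
    xs, ys = polygon[0::2], polygon[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def summarize_fields(payload):
    """Yield (name, text, confidence, box) for each top-level field."""
    for document in payload["analyzeResult"]["documents"]:
        for name, field in document["fields"].items():
            regions = field.get("boundingRegions", [])
            box = bounding_box(regions[0]["polygon"]) if regions else None
            yield name, field.get("content"), field.get("confidence"), box


# Invented sample data for illustration only, not output from the service.
sample = {
    "status": "succeeded",
    "analyzeResult": {
        "documents": [{
            "fields": {
                "TaxYear": {
                    "type": "string",
                    "content": "2018",
                    "confidence": 0.99,
                    "boundingRegions": [{
                        "pageNumber": 1,
                        "polygon": [1.0, 2.0, 3.0, 2.0, 3.0, 2.5, 1.0, 2.5],
                    }],
                }
            }
        }]
    },
}
rows = list(summarize_fields(sample))
print(rows)
```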
When sending POST requests to the backend Azure AI Services, the document content is expected to be encoded from binary into Base64 ASCII text.
It is also feasible to include a public URL with the document for analysis. If this scenario satisfies your requirements, it eliminates the need to load, encode, and send the file payload. Azure would retrieve the file contents from the URL you provided.
Inside a program (such as a web or desktop app), libraries for Base64-encoding byte arrays into strings are usually available. When using an API test tool (for example, Postman), you may have access to automatic Base64 encoding helpers.
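In Python, for instance, the standard library covers both the encoding and the JSON embedding. The sketch below shows the two payload options discussed above. The JSON key names base64Source and urlSource are assumptions to confirm against the REST reference for your API version, and the byte string is a stand-in for real file contents.

```python
import base64
import json


def base64_payload(document_bytes):
    """Return the JSON body for a document sent inline as Base64 text."""
    # b64encode returns bytes; decode to ASCII so it can sit in a JSON string.
    encoded = base64.b64encode(document_bytes).decode("ascii")
    return json.dumps({"base64Source": encoded})


def url_payload(document_url):
    """Return the JSON body for a document Azure fetches from a public URL."""
    return json.dumps({"urlSource": document_url})


# Stand-in bytes; in practice these would come from reading the PDF or image.
body = base64_payload(b"%PDF-1.7 example bytes")
print(body)
```

For a real file, the bytes can be read with pathlib.Path(...).read_bytes() and passed to base64_payload unchanged.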
If you need to encode your own files, a convenient way to do it is to use the OpenSSL CLI to encode a file and then copy it to the clipboard (macOS example here):
Encode the file with the OpenSSL CLI
Once encoded, open the file and paste the value into your request. Alternatively, if using macOS, use the built-in pbcopy terminal command to copy the file contents to the clipboard.