Create Generative AI Apps with Azure AI Search Integrated Vector Embeddings

This post covers Azure AI Search's integrated vector embeddings for Retrieval Augmented Generation (RAG) with a large language model in generative AI Q&A solutions, and shows how to create a vector index with Azure AI Search.

When applying the Retrieval Augmented Generation (RAG) pattern to generative AI systems, the grounding data is searched to find the information most relevant to the user's query. The most relevant pieces of text are then added to the user prompt, serving as a source of knowledge for the LLM to draw from when generating a response.

Retrieval Augmented Generation

Keyword searches can be useful, but they may not capture the linguistic meaning of the user's question. Vector embeddings are used to match user prompts with indexed content based on meaning rather than on exact words.
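To make "matching on meaning" concrete, here is a toy sketch of how similarity between embedding vectors can be scored. The vectors and names below are made up for illustration; Azure AI Search performs its own similarity scoring and approximate nearest-neighbor search internally.

import math

def cosine_similarity(a, b):
    """Score how similar two embedding vectors are (1.0 = pointing in the same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions.
prompt_vector = [0.9, 0.1, 0.2]      # "how much vacation do I get?"
similar_chunk = [0.8, 0.2, 0.1]      # a chunk about PTO policy
unrelated_chunk = [0.1, 0.9, 0.7]    # a chunk about expense reports

print(cosine_similarity(prompt_vector, similar_chunk))    # ~0.99 - close in meaning
print(cosine_similarity(prompt_vector, unrelated_chunk))  # ~0.30 - not a good match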

Azure AI Search Integrated Embeddings

The integrated embeddings feature incorporates the following two operations into Azure AI Search as skills, using built-in skills provided by Microsoft:

  • Text Splitting - the process of breaking a large document (e.g. a PDF) into smaller chunks, each small enough to be represented by a single fixed-size vector (1,536 dimensions for OpenAI's text-embedding-ada-002).
  • Embedding - the mathematical process of analyzing a chunk of text and converting it to a vector (array) of floating-point numbers.

By integrating text splitting and vector calculations into AI Search, we can use the index configuration (rather than code) to implement a powerful and high-performance vector search data source to use with generative AI solutions.

While using integrated vector embedding support will meet the needs of many applications and use cases, there will still be some scenarios where building embeddings with code may be needed.

Service Limitations

Azure AI Search integrated vector embeddings will meet a significant number of RAG requirements; however, there are certain restrictions to be mindful of.

  • At this writing (November 2023), only text embeddings are supported by the integrated embedding skills. Other media types (e.g. images and video) will need a different embedding solution, but the resulting vectors could still be added to Azure AI Search.
  • The documents to be indexed must be in formats supported by the Azure AI Search indexers. Common file types such as PDF, Word, and plain text are supported.
  • Azure AI Search indexes and indexers have file size and vector size limits that depend on the service tier; the limits and sizing maximums are documented per tier.

Building a Solution

This section walks through the steps of creating a vector embedding index and then connecting it to a question/answer application that uses Azure OpenAI GPT-3.5 Turbo as the backend large language model (LLM).

I'll create the indexes using the Azure CLI and REST interfaces. If you prefer, you may complete all of these configurations through the Azure Portal web UI.

Create the Azure AI Search Service

To create an index that contains vectors, first we need to create a search service in Azure.  AI Search Service is available in multiple tiers--from Free to very large.  For this demo, we can use the Free tier since our capacity requirements are low.

Let's create the service using the Azure CLI:
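A minimal sketch of the CLI commands, assuming a hypothetical resource group named rg-ai-search-demo and a service named my-vector-search-svc (substitute your own names and location):

az group create --name rg-ai-search-demo --location eastus

az search service create \
    --name my-vector-search-svc \
    --resource-group rg-ai-search-demo \
    --sku Free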

To create the index and related objects in the search service, we'll need an admin key from the new service.  We'll use the CLI to fetch the key:
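A sketch using the same placeholder names as above:

az search admin-key show \
    --service-name my-vector-search-svc \
    --resource-group rg-ai-search-demo \
    --query primaryKey --output tsv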

The output from the admin-key show command will be a hexadecimal string, which is added to the HTTP headers of the REST calls in the sections that follow.

Create the Data Source

To index files using an Azure AI Search Indexer, we must first construct a data source within the search service. The data source is just a mechanism to associate a name for external data with a connection string that allows the indexer to access the data.

In this case, the external data source is an Azure BLOB container containing a number of PDF files.
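A sketch of the REST call that creates the data source. The data source name, container name, storage connection string, and API version shown here are assumptions to adapt to your environment; the admin key retrieved earlier goes in the api-key header:

POST https://my-vector-search-svc.search.windows.net/datasources?api-version=2023-10-01-Preview
Content-Type: application/json
api-key: <admin key>

{
  "name": "pdf-blob-datasource",
  "type": "azureblob",
  "credentials": { "connectionString": "<storage account connection string>" },
  "container": { "name": "pdf-docs" }
}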

Create the Index

An index can be thought of as a database table, and in the case of a vector index, the columns include:

  • Chunk - as a PDF file is split into chunks, each chunk's raw text is stored in the index.  Since we won't search the index with text, the chunk text becomes the result of the index search.
  • Vector - Azure AI Search will use OpenAI's text-embedding-ada-002 model to create embeddings on our behalf, and the resulting vectors are stored in the index.
  • Chunk Source - we store the source file for each chunk. This allows apps using the index to tell the user which document was the original source of the knowledge used to create the LLM response.

To create the index, we make the following REST call, with a JSON body describing the index details.
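A simplified sketch of an index definition with these fields follows. The field names, vector search settings, and Azure OpenAI resource details are illustrative assumptions, and 2023-10-01-Preview is assumed as the API version supporting integrated vectorization:

POST https://my-vector-search-svc.search.windows.net/indexes?api-version=2023-10-01-Preview
Content-Type: application/json
api-key: <admin key>

{
  "name": "pdf-vector-index",
  "fields": [
    { "name": "chunk_id", "type": "Edm.String", "key": true, "analyzer": "keyword" },
    { "name": "chunk", "type": "Edm.String", "searchable": true },
    { "name": "source_file", "type": "Edm.String", "filterable": true },
    { "name": "vector", "type": "Collection(Edm.Single)", "searchable": true,
      "dimensions": 1536, "vectorSearchProfile": "vector-profile" }
  ],
  "vectorSearch": {
    "algorithms": [ { "name": "hnsw-config", "kind": "hnsw" } ],
    "profiles": [ { "name": "vector-profile", "algorithm": "hnsw-config",
                    "vectorizer": "openai-vectorizer" } ],
    "vectorizers": [
      { "name": "openai-vectorizer", "kind": "azureOpenAI",
        "azureOpenAIParameters": {
          "resourceUri": "https://<your-openai-resource>.openai.azure.com",
          "deploymentId": "text-embedding-ada-002",
          "apiKey": "<azure openai key>"
        } }
    ]
  }
}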

Create the Skillset

The indexer crawls our PDF files, splits them into chunks, and generates a vector embedding for each chunk using sets of instructions known as skills. An indexer can use several skills, which are collectively referred to as a skillset.

We create the skillset using REST. This skillset combines the skill that splits files into chunks with the skill that creates a vector embedding for each chunk extracted from the file.

Each embedding is saved in the index, together with the chunk that produced it. When users search the index, the same embedding logic is applied to the search prompt, and the index then selects the chunks that are closest in meaning to that prompt. In the end, it's all math.
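A sketch of the skillset definition, pairing the built-in Text Split skill with the Azure OpenAI Embedding skill. Names, chunk sizes, and Azure OpenAI details are assumptions, and the index projection settings that map each chunk to its own index entry are omitted for brevity:

POST https://my-vector-search-svc.search.windows.net/skillsets?api-version=2023-10-01-Preview
Content-Type: application/json
api-key: <admin key>

{
  "name": "pdf-chunking-skillset",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "context": "/document",
      "textSplitMode": "pages",
      "maximumPageLength": 2000,
      "inputs":  [ { "name": "text", "source": "/document/content" } ],
      "outputs": [ { "name": "textItems", "targetName": "chunks" } ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "context": "/document/chunks/*",
      "resourceUri": "https://<your-openai-resource>.openai.azure.com",
      "deploymentId": "text-embedding-ada-002",
      "apiKey": "<azure openai key>",
      "inputs":  [ { "name": "text", "source": "/document/chunks/*" } ],
      "outputs": [ { "name": "embedding", "targetName": "vector" } ]
    }
  ]
}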

Create the Indexer

Creating the indexer is the final step in the vector index creation. An indexer functions similarly to a web crawler in a web search engine, reading each file in the source knowledge base, processing the data, and adding items to the index.

When using a vector embedding index, each file goes through the following steps:

  1. Read a source file (PDF)
  2. Split the PDF into chunks of some maximum size (the size depends on how much text the embedding model can process into a single fixed-size embedding).
  3. Create a vector for each chunk
  4. Create an entry in the index for each chunk/vector pair
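A sketch of the indexer definition that ties the data source, skillset, and index together, using the same placeholder names as in the earlier steps:

POST https://my-vector-search-svc.search.windows.net/indexers?api-version=2023-10-01-Preview
Content-Type: application/json
api-key: <admin key>

{
  "name": "pdf-vector-indexer",
  "dataSourceName": "pdf-blob-datasource",
  "skillsetName": "pdf-chunking-skillset",
  "targetIndexName": "pdf-vector-index",
  "parameters": {
    "configuration": { "dataToExtract": "contentAndMetadata", "parsingMode": "default" }
  }
}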

Examine the Vector Index

Before moving on to using the index in our generative AI application, let's look at the contents of the index to reinforce understanding of just what a vector index is.

We can query the vector index for some text by sending a POST /indexes/{index name}/docs/search REST call to the search service with a payload such as this:
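For example, a payload along these lines asks the service to vectorize the query text with the index's own vectorizer and return the three chunks closest in meaning (the question text and field names are placeholders matching the index sketch above):

{
  "select": "chunk, source_file",
  "vectorQueries": [
    { "kind": "text", "text": "<your question here>", "fields": "vector", "k": 3 }
  ]
}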

The response contains the index entries whose vectors are most semantically similar to the query's input text. The following object illustrates a single match from the set of highest-ranking results.
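The exact fields depend on the index definition; with the index sketched above, a single match would look roughly like this (values are placeholders):

{
  "@search.score": 0.84,
  "chunk": "<the raw text of the matching chunk>",
  "source_file": "<name of the source PDF>"
}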

Run the Demo App

To test the entire approach, we developed a Streamlit web app in Python. The app allows users to enter prompt text and receive a response from the OpenAI LLM.

The end-to-end flow that supports the user experience is described below; a minimal code sketch of the app follows the list:

  1. User enters text prompt and presses the "Ask" button
  2. The Streamlit app sends the user prompt to the Azure OpenAI service, along with a request that the service use the index we created above to search for PDF chunks that contain knowledge to answer the user prompt.
  3. Azure OpenAI service uses the embedding model to create a vector embedding of the user prompt.
  4. The Azure OpenAI service queries the Azure AI Search index for chunks of text close in meaning to the user prompt (this is a vector search that produces text as an output).
  5. Azure OpenAI Service adds the chunk text to the prompt ("using this information, answer this user prompt").
  6. The Azure OpenAI service returns the LLM response to the application, which displays it for the user.
  7. The application also receives the filename of the file(s) that were used to answer the question, and presents a button the user can press to open the original file.
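The sketch below outlines this flow in a minimal Streamlit app. It assumes the Azure OpenAI "on your data" chat extensions endpoint and preview API version available at the time of writing; the environment variable names, deployment name, search service, and index name are placeholders, and citation handling is only hinted at in a comment:

import os
import requests
import streamlit as st

# Placeholder configuration -- substitute your own resource names, keys, and deployments.
AOAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]   # e.g. https://<resource>.openai.azure.com
AOAI_KEY = os.environ["AZURE_OPENAI_KEY"]
AOAI_DEPLOYMENT = "gpt-35-turbo"                       # GPT-3.5 Turbo deployment name
SEARCH_ENDPOINT = "https://my-vector-search-svc.search.windows.net"
SEARCH_KEY = os.environ["AZURE_SEARCH_KEY"]
SEARCH_INDEX = "pdf-vector-index"

def ask(prompt: str) -> dict:
    """Send the user prompt to Azure OpenAI, grounded by the AI Search vector index."""
    url = (f"{AOAI_ENDPOINT}/openai/deployments/{AOAI_DEPLOYMENT}"
           "/extensions/chat/completions?api-version=2023-08-01-preview")
    body = {
        "dataSources": [{
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": SEARCH_ENDPOINT,
                "key": SEARCH_KEY,
                "indexName": SEARCH_INDEX,
            },
        }],
        "messages": [{"role": "user", "content": prompt}],
    }
    response = requests.post(url, json=body, headers={"api-key": AOAI_KEY}, timeout=60)
    response.raise_for_status()
    return response.json()

st.title("Ask the PDF knowledge base")
question = st.text_input("Your question")
if st.button("Ask") and question:
    result = ask(question)
    message = result["choices"][0]["message"]
    st.write(message["content"])  # the grounded LLM answer
    # The response also includes citation/context entries naming the source chunks
    # and files, which the app surfaces as a button to open the original document.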