This post covers Azure AI Search's integrated vector embeddings for Retrieval Augmented Generation (RAG) when using a large language model in a generative AI Q&A solution, and walks through how to create a vector index with Azure AI Search.
When applying the Retrieval Augmented Generation (RAG) pattern to generative AI systems, the grounding data is searched to determine which information is most relevant to the user's query. The most relevant pieces of text are then appended to the user prompt, serving as a source of knowledge for the LLM to draw from when generating a response.
Keyword searches can be useful, but they may not capture the semantic meaning of the user's query. Vector embeddings match user prompts with indexed content based on meaning rather than exact wording.
Azure AI Search Integrated Embeddings
The integrated embeddings feature builds two vector embedding capabilities into Azure AI Search as skills provided by Microsoft: splitting source text into chunks, and calculating a vector embedding for each chunk.
By integrating text splitting and vector calculations into AI Search, we can use the index configuration (rather than code) to implement a powerful and high-performance vector search data source to use with generative AI solutions.
While using integrated vector embedding support will meet the needs of many applications and use cases, there will still be some scenarios where building embeddings with code may be needed.
Azure AI Search integrated vector embeddings will meet a significant number of RAG requirements; however, there are certain restrictions to be mindful of.
This section will lead you through the steps of creating a vector embedding index and then connecting it to a Question/Answer application that uses Azure OpenAI GPT-3.5 Turbo as the backend Large Language Model (LLM).
I'll create the indexes using the Azure CLI and REST interfaces. If you prefer, you may complete all of these configurations through the Azure Portal web UI.
To create an index that contains vectors, first we need to create a search service in Azure. AI Search Service is available in multiple tiers--from Free to very large. For this demo, we can use the Free tier since our capacity requirements are low.
Let's create the service using the Azure CLI:
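A minimal sketch of the CLI commands (the resource group, service name, and region are placeholders; substitute your own):

```bash
# Placeholder resource group and region
az group create --name my-rag-rg --location eastus

# Create a Free-tier Azure AI Search service
az search service create \
  --name my-vector-search \
  --resource-group my-rag-rg \
  --location eastus \
  --sku free
```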
To create the index and related objects in the search service, we'll need an admin key from the new service. We'll use the CLI to fetch the key:
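Assuming the placeholder names from the previous step, the call looks like this:

```bash
# Fetch the primary admin key for the service created above
az search admin-key show \
  --service-name my-vector-search \
  --resource-group my-rag-rg \
  --query primaryKey \
  --output tsv
```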
The output from the admin-key show command is a hexadecimal string, which we'll pass in the api-key HTTP header on the REST calls in the sections that follow.
To index files using an Azure AI Search Indexer, we must first construct a data source within the search service. The data source is just a mechanism to associate a name for external data with a connection string that allows the indexer to access the data.
In this case, the external data source is an Azure BLOB container containing a number of PDF files.
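A sketch of the data source creation call (the service name, data source name, container name, and connection string are placeholders; the azureblob type is the standard type for blob containers):

```bash
# Create a blob data source in the search service.
# $ADMIN_KEY holds the admin key retrieved earlier.
curl -X POST "https://my-vector-search.search.windows.net/datasources?api-version=2024-07-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $ADMIN_KEY" \
  -d '{
    "name": "pdf-blob-datasource",
    "type": "azureblob",
    "credentials": { "connectionString": "<storage-connection-string>" },
    "container": { "name": "pdf-docs" }
  }'
```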
An index can be thought of as a database table; in the case of a vector index, the columns include a key for each chunk, the chunk text itself, and the vector embedding calculated from that chunk. We use the Azure OpenAI text-embedding-ada-002 model to create embeddings on our behalf, and the resulting vectors are stored in the index.
To create the index, we call the following REST call, with a JSON body describing the index details.
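A trimmed sketch of that call (field names, the 1536-dimension size of text-embedding-ada-002 vectors, and the Azure OpenAI endpoint and key are assumptions; the vectorizer lets the service embed query text at search time with the same model):

```bash
# Create the vector index
curl -X POST "https://my-vector-search.search.windows.net/indexes?api-version=2024-07-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $ADMIN_KEY" \
  -d '{
    "name": "pdf-vector-index",
    "fields": [
      { "name": "chunk_id", "type": "Edm.String", "key": true,
        "analyzer": "keyword" },
      { "name": "parent_id", "type": "Edm.String", "filterable": true },
      { "name": "chunk", "type": "Edm.String", "searchable": true },
      { "name": "vector", "type": "Collection(Edm.Single)",
        "searchable": true, "dimensions": 1536,
        "vectorSearchProfile": "vector-profile" }
    ],
    "vectorSearch": {
      "algorithms": [ { "name": "hnsw-config", "kind": "hnsw" } ],
      "profiles": [ { "name": "vector-profile",
                      "algorithm": "hnsw-config",
                      "vectorizer": "openai-vectorizer" } ],
      "vectorizers": [ {
        "name": "openai-vectorizer",
        "kind": "azureOpenAI",
        "azureOpenAIParameters": {
          "resourceUri": "https://<your-openai>.openai.azure.com",
          "deploymentId": "text-embedding-ada-002",
          "modelName": "text-embedding-ada-002",
          "apiKey": "<azure-openai-key>"
        }
      } ]
    }
  }'
```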
The indexer crawls our PDF files, divides them into chunks, and generates vector embeddings for each chunk using sets of instructions known as skills. An indexer can use a number of skills, which are collectively referred to as a skillset.
We create the skillset using REST. This skillset combines the skill that splits files into chunks with the skill that creates a vector embedding for each chunk extracted from the file.
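A sketch of the skillset definition, assuming the built-in Text Split skill and the Azure OpenAI embedding skill (skillset name, chunk sizes, and the Azure OpenAI endpoint are placeholders; the index projection writes one search document per chunk):

```bash
# Create a skillset that chunks each document and embeds each chunk
curl -X POST "https://my-vector-search.search.windows.net/skillsets?api-version=2024-07-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $ADMIN_KEY" \
  -d '{
    "name": "pdf-vector-skillset",
    "skills": [
      {
        "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
        "context": "/document",
        "textSplitMode": "pages",
        "maximumPageLength": 2000,
        "pageOverlapLength": 200,
        "inputs":  [ { "name": "text", "source": "/document/content" } ],
        "outputs": [ { "name": "textItems", "targetName": "chunks" } ]
      },
      {
        "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
        "context": "/document/chunks/*",
        "resourceUri": "https://<your-openai>.openai.azure.com",
        "deploymentId": "text-embedding-ada-002",
        "modelName": "text-embedding-ada-002",
        "apiKey": "<azure-openai-key>",
        "inputs":  [ { "name": "text", "source": "/document/chunks/*" } ],
        "outputs": [ { "name": "embedding", "targetName": "vector" } ]
      }
    ],
    "indexProjections": {
      "selectors": [ {
        "targetIndexName": "pdf-vector-index",
        "parentKeyFieldName": "parent_id",
        "sourceContext": "/document/chunks/*",
        "mappings": [
          { "name": "chunk",  "source": "/document/chunks/*" },
          { "name": "vector", "source": "/document/chunks/*/vector" }
        ]
      } ],
      "parameters": { "projectionMode": "skipIndexingParentDocuments" }
    }
  }'
```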
Each embedding is saved in the index alongside the chunk that produced it. When a user searches the index, the same embedding logic is applied to the user's prompt, and the index then selects the chunks closest in meaning to that prompt. In the end, it's all math.
Creating the indexer is the final step in building the vector index. An indexer functions much like a web crawler in a web search engine: it reads each file in the source knowledge base, processes the data, and adds items to the index.
When using a vector embedding index, each file goes through the following steps (a sketch of the indexer creation call follows this list):

1. The text content is extracted from the file.
2. The text is split into chunks.
3. A vector embedding is calculated for each chunk.
4. Each chunk, together with its embedding, is written to the index.
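This call ties together the data source, skillset, and index sketched above (names are the same placeholders):

```bash
# Create and run the indexer: data source -> skillset -> index
curl -X POST "https://my-vector-search.search.windows.net/indexers?api-version=2024-07-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $ADMIN_KEY" \
  -d '{
    "name": "pdf-vector-indexer",
    "dataSourceName": "pdf-blob-datasource",
    "skillsetName": "pdf-vector-skillset",
    "targetIndexName": "pdf-vector-index",
    "parameters": {
      "configuration": { "dataToExtract": "contentAndMetadata" }
    }
  }'
```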
Before moving on to using the index in our Generative AI application, let's look at the contents of the index to reinforce understanding of just what a vector index is:
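One way to peek at the indexed documents (field and index names follow the earlier sketches) is an empty search that returns a couple of entries, omitting the large vector field:

```bash
# Return two indexed chunks
curl -X POST "https://my-vector-search.search.windows.net/indexes/pdf-vector-index/docs/search?api-version=2024-07-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $ADMIN_KEY" \
  -d '{ "search": "*", "select": "chunk_id,chunk", "top": 2 }'
```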
We can query the vector index for some text by sending a POST /indexes/{index name}/docs/search REST call to the search service with a payload such as this:
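Because the index has an integrated vectorizer, we can hand the service raw query text and let it calculate the embedding for us. A sketch of the call (query text, index, and field names are assumptions from the earlier examples):

```bash
# Vector query: the service embeds the query text with the same model
curl -X POST "https://my-vector-search.search.windows.net/indexes/pdf-vector-index/docs/search?api-version=2024-07-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $ADMIN_KEY" \
  -d '{
    "select": "chunk_id,chunk",
    "vectorQueries": [ {
      "kind": "text",
      "text": "How do I configure network access?",
      "fields": "vector",
      "k": 3
    } ]
  }'
```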
The response contains a list of chunks that are semantically related to the query text, ranked by similarity. The following object illustrates a single match from the set of highest-ranking index matches.
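(Field names follow the earlier sketches; the values here are illustrative placeholders, not actual output.)

```json
{
  "@search.score": 0.86234,
  "chunk_id": "aHR0cHM6...0",
  "chunk": "To configure network access for the service, open the ..."
}
```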
To test the entire approach, we developed a Streamlit web app in Python. The app allows users to enter prompt text and receive a response from the OpenAI LLM.
The whole procedure that supports the user experience is described below: