Azure Applied AI Services: Computer Vision and NLP Workloads on Azure

Computer Vision on Azure uses Azure AI Services and associated tools to analyze and interpret visual content such as photographs and videos. The goal is for computers to extract useful information from visual data.

Azure AI Vision

  • Azure AI Vision (formerly Computer Vision) analyzes and interprets images and videos.
  • It provides features like object detection, image recognition, image tagging, and optical character recognition (OCR).
  • Developers can extract insights from visual data and perform tasks like detecting objects and extracting text from images.
  • It simplifies the development of applications with advanced visual analysis capabilities on Azure.

Custom Vision

  • An Azure AI service for creating custom image classification and object detection models.

  • It offers a user-friendly interface for training and deploying models tailored to specific domains or tasks.

  • Users can train models using their own labeled datasets and easily integrate them into their applications.


Azure AI Document Intelligence


  • Azure service for extracting text, key/value pairs, and table data from structured documents.

  • It is specifically designed for forms, invoices, receipts, and other structured documents.

  • Azure AI Document Intelligence utilizes machine learning algorithms to automatically analyze and extract structured data from scanned documents.

  • It simplifies data extraction processes and enables automation in document processing workflows.

Face Service

  • An Azure AI service for facial detection, recognition, and analysis.
  • The Face service provides features like facial recognition, emotion detection, and face verification.

  • Developers can utilize the Face service to build applications that analyze and understand facial characteristics.


Natural Language Processing (NLP) on Azure uses Azure AI Services and related tools to analyze, comprehend, and generate natural language content. NLP aims to help computers understand and process human language in a meaningful way.

Azure Bot Service

  • Platform for building, deploying, and managing chatbots.

  • Supports the development of intelligent chatbots that understand natural language and engage in dialogues.

  • Integrates with NLP services to enable natural language understanding and dialog management.

  • Connects to messaging platforms, allowing chatbots to be deployed across various channels.


Language Understanding (LUIS)


  • Enables developers to incorporate language understanding capabilities into their applications.

  • Allows developers to create custom models for intent recognition and entity extraction.

  • This helps interpret user input and intents, improving the application’s ability to process and respond to natural language queries.


Text Analytics


  • An Azure service that utilizes natural language processing techniques.

  • It allows developers to extract valuable insights from text data.

  • It helps you gain a deeper understanding of textual data and automate text processing tasks for efficient analysis.

  • Text Analytics capabilities

    • Entity recognition – Identifies and classifies named entities in text.

    • Key phrase extraction – Identifies and extracts the most important and relevant words and phrases that summarize the content of the text.

    • Sentiment analysis – Determines the emotional tone and attitudes expressed in a piece of text, using a pre-trained model.

      • Sentiment analysis and opinion mining are two ways of detecting positive and negative sentiment.

      • Using sentiment analysis, you can get sentiment labels (such as “negative”, “neutral”, and “positive”) and confidence scores at the sentence and document level.

      • Opinion mining provides granular information about the opinions related to words (such as the attributes of products or services) in the text.

    • Language detection – Identifies the language in which a piece of text is written.
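
Each capability above corresponds to one kind of analysis request. As a minimal sketch, the Python helper below builds a JSON request body in the style of the Azure AI Language analyze-text REST operation. The `kind` values, the `opinionMining` parameter, and the document fields are assumptions about that API's request shape and should be checked against the current REST reference. `TASK_KINDS` and `build_request` are hypothetical names used only for illustration, and no network call is made.

```python
import json

# Assumed "kind" values for the analyze-text operation (verify against the docs).
TASK_KINDS = {
    "entities": "EntityRecognition",
    "key_phrases": "KeyPhraseExtraction",
    "sentiment": "SentimentAnalysis",
    "language": "LanguageDetection",
}


def build_request(task, texts, language="en", opinion_mining=False):
    """Return a JSON-serializable request body for one text analysis task."""
    if task not in TASK_KINDS:
        raise ValueError(f"unknown task: {task}")
    documents = []
    for i, text in enumerate(texts, start=1):
        doc = {"id": str(i), "text": text}
        # Language detection must not be told the language in advance.
        if task != "language":
            doc["language"] = language
        documents.append(doc)
    body = {"kind": TASK_KINDS[task], "analysisInput": {"documents": documents}}
    # Opinion mining is requested as an option of sentiment analysis.
    if task == "sentiment":
        body["parameters"] = {"opinionMining": opinion_mining}
    return body


if __name__ == "__main__":
    reviews = ["The room was great, but the staff was rude."]
    print(json.dumps(build_request("sentiment", reviews, opinion_mining=True), indent=2))
```

Sending the body is left out on purpose, because the endpoint, key, and API version depend on your own Azure resource.
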

Speech

  • Speech Recognition

      • Technology that converts spoken language into written text. It’s used in voice-activated virtual assistants, voice-to-text processing software, in-car systems, and more.

  • Speech Synthesis

      • Also known as text-to-speech, this is the process of creating artificial speech from text. It’s used in a variety of applications, including reading for the visually impaired, voiceovers in videos, and as the “voice” of AI personal assistants like Alexa.

      • Speech Synthesis Markup Language (SSML) is an XML-based markup language that you can use to fine-tune text-to-speech output attributes such as pitch, pronunciation, speaking rate, and volume. It gives you more control and flexibility than plain text input.

        • The Audio Content Creation tool lets you author plain text and SSML in Speech Studio.

        • The Batch synthesis API accepts SSML via the inputs property.

        • The Speech CLI accepts SSML via the spx synthesize --ssml SSML command line argument.

        • The Speech SDK accepts SSML via the “speak” SSML method across the different supported languages.
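
To make the SSML description concrete, here is a minimal sketch in Python that builds an SSML document with the standard library's `xml.etree.ElementTree`. The `speak`, `voice`, and `prosody` elements and the namespace follow the W3C SSML vocabulary. The voice name `en-US-JennyNeural` is only an example value and is an assumption, as is the hypothetical helper name `build_ssml`. Nothing here calls the Speech service.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US",
               rate="medium", pitch="medium", volume="medium"):
    """Return an SSML string that wraps `text` in voice and prosody settings."""
    # Serialize the SSML namespace as the default xmlns (global ElementTree setting).
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(f"{{{SSML_NS}}}speak",
                       {"version": "1.0", f"{{{XML_NS}}}lang": lang})
    voice_el = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice})
    prosody = ET.SubElement(voice_el, f"{{{SSML_NS}}}prosody",
                            {"rate": rate, "pitch": pitch, "volume": volume})
    # ElementTree escapes characters such as & and < when serializing.
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Your order has shipped.", rate="slow", pitch="high"))
```

The resulting string is the kind of input the tools listed above accept, for example the SSML speak method of the Speech SDK. Building the document with an XML library rather than string concatenation keeps special characters in the text from breaking the markup.
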


Common Artificial Intelligence Methods

  • Decision Trees

    • Classify items into pre-defined categories based on rules learned from labeled training data.
  • Prediction

    • The process of using a trained model to make a prediction about unseen or future data based on the patterns the model learned from the training data.
  • Clustering

    • Groups items with similar traits and assigns them to clusters.
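
The three methods above can be illustrated with a small, self-contained Python sketch. It is a toy under stated assumptions, not how any Azure service implements them: the "tree" is a one-level decision tree (a decision stump) over a single numeric feature, and the clustering is a basic one-dimensional k-means. The function names and the sample data are hypothetical.

```python
def fit_stump(xs, labels):
    """Learn one rule from labeled data: x <= threshold -> left label, else right."""
    values = sorted(set(xs))
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [l for x, l in zip(xs, labels) if x <= threshold]
        right = [l for x, l in zip(xs, labels) if x > threshold]
        # Majority label on each side (sorted for deterministic tie-breaking).
        left_label = max(sorted(set(left)), key=left.count)
        right_label = max(sorted(set(right)), key=right.count)
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]


def predict(stump, x):
    """Apply the learned rule to an unseen value."""
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


def kmeans_1d(points, centers, iterations=10):
    """Group unlabeled points around the nearest center, then move the centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


if __name__ == "__main__":
    sizes = [1, 2, 3, 8, 9, 10]
    stump = fit_stump(sizes, ["small"] * 3 + ["large"] * 3)
    print(stump, predict(stump, 4), predict(stump, 7))
    print(kmeans_1d(sizes, centers=[0, 5]))
```

The stump needs labels to learn its rule, while k-means receives only the raw values. That difference is the practical distinction between classification and clustering in the list above.
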