Voice Recognition Systems have been imagined and worked on for decades, and they have become popular over the past few years. From individuals to organizations, the technology is widely used for the advantages it provides. In this post we will discuss what a Voice Recognition System is, how it works, its types, architecture, applications, advantages and disadvantages.

What is Voice Recognition System

Voice Recognition Technology is the task of converting what a speaker utters into text. The utterance can be an isolated word, a sentence, or even a paragraph. An algorithm, implemented as a computer program, converts the speech signal into a sequence of words.

Fig. 1 – Introduction to Voice Recognition system

Digital Assistants such as Amazon’s Alexa, Google’s Google Assistant, Apple’s Siri and Microsoft’s Cortana are making a huge difference in daily life by changing the way people interact with their devices, homes, cars, and jobs. These technologies allow us to talk to a computer or device that interprets what we’re saying and responds to our question or command.

Fig. 2 shows a typical block diagram of a Voice Recognition System. The input speech first undergoes Acoustic Modeling, where it is transformed into a statistical representation of vectors computed from the voice signal. The utterance (word or sentence) is then searched for and matched against the data stored in the system, which outputs the Recognized Utterance.

Fig. 2 – Typical Block Diagram of Speech (Voice) Recognition System

Types of Voice Recognition System

Voice Recognition Systems that identify the speaker are of two types:

  • Text Dependent Voice Recognition System
  • Text Independent Voice Recognition System

Text Dependent Voice Recognition System

These systems require the speaker to say a predetermined word or phrase (known as a “Pass Phrase”). The spoken Pass Phrase is then compared to a previously captured sample.

Text Independent Voice Recognition System

These systems are trained to recognize a person without a Pass Phrase, but they require longer speech input from the speaker in order to identify vocal characteristics.

Architecture of Voice Recognition System

The architecture of the system consists of the following modules:

  • Speech Capturing Device
  • Digital Signal Processor Module
  • Pre-processed Signal Storage
  • Reference Speech Patterns
  • Pattern Matching Algorithm

Fig. 3 – Architecture of Voice Recognition System

Speech Capturing Device

The Speech Capturing Device consists of a microphone, which converts sound waves into electrical signals, and an Analog to Digital Converter (ADC), which digitizes those analog signals into data that the computer can process.
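
The two steps an ADC performs, sampling and quantization, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_and_quantize`, the 440 Hz test tone, the 8 kHz sample rate and the 8-bit depth are all chosen for the example and do not come from any particular device.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz, bits):
    """Sample a continuous signal (a function of time returning values in
    [-1.0, 1.0]) and quantize each sample to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                  # sampling: pick discrete time points
        value = max(-1.0, min(1.0, signal(t)))  # clip to the converter's input range
        codes.append(round(value * max_code))   # quantization: snap to nearest level
    return codes

def tone(t):
    """Hypothetical analog input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)

samples = sample_and_quantize(tone, duration_s=0.01, sample_rate_hz=8000, bits=8)
print(len(samples), "samples, first few:", samples[:5])
```

A higher sample rate captures higher frequencies, and a larger bit depth gives finer amplitude steps; both increase the amount of data the later modules have to process.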

Digital Signal Processor Module

This module processes the raw speech signal, for example by converting it to the frequency domain and retaining only the required information.
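
Frequency domain conversion can be illustrated with a naive Discrete Fourier Transform. The sketch below is written for readability rather than speed (real systems would normally use an FFT routine), and the function name `dft_magnitudes`, the 1000 Hz tone and the 64-sample frame are assumptions made for the example.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the magnitude of each frequency bin from 0 up to half the
    sample rate, computed with a direct (slow) DFT."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        magnitudes.append(abs(acc))
    return magnitudes

sample_rate_hz = 8000
n = 64
tone_hz = 1000  # hypothetical input tone
frame = [math.sin(2 * math.pi * tone_hz * t / sample_rate_hz) for t in range(n)]

mags = dft_magnitudes(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sample_rate_hz / n   # convert bin index to frequency
print(peak_hz)  # 1000.0
```

The time-domain frame is just a list of amplitudes, while the frequency-domain view shows directly which frequencies carry the energy, which is the information later stages rely on.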

Pre-processed Signal Storage

This module stores the pre-processed voice signal.

Reference Speech Patterns

The system holds predefined voice samples that are used as references for matching.

Pattern Matching Algorithm

The unknown speech signal is compared with the Reference Speech Patterns to determine the actual words or the pattern of words.
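
One classic way to compare an unknown signal with stored references is Dynamic Time Warping (DTW), which tolerates differences in speaking speed. The text above does not name a specific algorithm, so DTW is used here purely as an illustrative choice. The reference patterns, the one-number-per-frame features and the function names are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of numbers.
    Smaller means more similar, even if one sequence is stretched in time."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignments.
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[rows][cols]

def recognize(unknown, references):
    """Return the label of the reference pattern closest to the unknown input."""
    return min(references, key=lambda word: dtw_distance(unknown, references[word]))

# Hypothetical reference patterns: one feature value per frame.
references = {
    "yes": [1.0, 3.0, 4.0, 3.0, 1.0],
    "no":  [4.0, 2.0, 1.0, 1.0, 2.0],
}
unknown = [1.0, 1.0, 3.0, 3.0, 4.0, 4.0, 3.0, 1.0]  # "yes" spoken more slowly
best = recognize(unknown, references)
print(best)  # yes
```

In practice each frame would be described by a vector of features rather than a single number, and the per-frame distance would be computed between vectors.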

How does Voice Recognition System Work

The system works by recording a sample of a person’s speech through a Speech Capturing Device such as a microphone. The voice is an analog signal that has passed through a noisy communication channel. The Analog to Digital Converter (ADC) converts this analog signal into digital data through sampling and digitization.

The system then filters out unwanted noise, divides the signal into different frequency bands and normalizes the sound. Normalization is needed because users do not always speak at the same speed and volume, so the sound has to be adjusted to match the templates that are pre-stored in the system’s database.
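
Volume normalization is the simplest of these adjustments to show in code. The sketch below scales a recording so that its loudest sample reaches a target level, assuming samples are floats in the range -1.0 to 1.0. The function name `normalize_peak` and the sample values are made up for the example, and it covers volume only, not speaking speed.

```python
def normalize_peak(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    if not samples:
        return []
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]

# The same hypothetical utterance recorded quietly and loudly.
quiet = [0.02, -0.05, 0.1, -0.04]
loud = [0.2, -0.5, 1.0, -0.4]

quiet_norm = normalize_peak(quiet)
loud_norm = normalize_peak(loud)
print(quiet_norm)
print(loud_norm)
```

After normalization the quiet and loud recordings have nearly identical values, so they can be compared against the same stored template.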

Fig. 4 – Working of Speech (Voice) Recognition System

For large-vocabulary Speech Recognition, such as long sentences, the utterance is decomposed into a sequence of sub-word units. This process is called Segmentation. The signal is divided into segments, which are then processed by the Signal-Processing module to extract Feature Vectors. These extracted vectors form the input to the Decoder.
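
Dividing the signal into segments and extracting one Feature Vector per segment can be sketched as follows. The two features used here, log energy and zero-crossing rate, are deliberately simple stand-ins chosen for the example; production systems commonly use richer features such as MFCCs. The frame length, hop size and function names are likewise assumptions.

```python
import math

def split_into_frames(samples, frame_len, hop):
    """Cut the signal into overlapping fixed-length frames."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]

def feature_vector(frame):
    """Toy two-element feature vector: [log energy, zero-crossing rate]."""
    energy = sum(s * s for s in frame)
    log_energy = math.log(energy + 1e-10)  # small offset avoids log(0) on silence
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [log_energy, crossings / (len(frame) - 1)]

sample_rate_hz = 8000
# Hypothetical input: 50 ms of a 200 Hz tone.
signal = [math.sin(2 * math.pi * 200 * t / sample_rate_hz) for t in range(400)]

# 25 ms frames (200 samples) taken every 10 ms (80 samples).
frames = split_into_frames(signal, frame_len=200, hop=80)
features = [feature_vector(f) for f in frames]
print(len(frames), "frames,", len(features[0]), "features per frame")
```

The list `features` plays the role of the sequence of Feature Vectors that is handed to the Decoder.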

The Decoder uses the Acoustic Model, Pronunciation Model and Language Model to generate the word sequence that best matches the input Feature Vectors. Voice Recognition Systems rely on statistical models, which use probability and mathematical functions to determine the most likely outcome.

The Speech Decoder decodes the acoustic signal X into a word sequence W*, which should be as close as possible to the original word sequence W. This is expressed by the equation of statistical Speech Recognition, given by:

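$$W^{*} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \frac{P(X \mid W)\, P(W)}{P(X)}$$

where:

  • X is the acoustic signal, represented by the sequence of Feature Vectors
  • W is a candidate word sequence and W* is the word sequence chosen by the Decoder
  • P(X | W) is the likelihood of the acoustic signal given the word sequence, supplied by the Acoustic Model
  • P(W) is the prior probability of the word sequence, supplied by the Language Model
  • P(X) is the probability of the acoustic signal, which does not depend on W and can be ignored in the maximization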
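
The decoding step can be sketched as choosing, among candidate word sequences, the one with the highest combined score from the Acoustic Model and the Language Model. The sketch assumes the two scores are combined as a product of probabilities (a sum in log space). The candidate sentences and all probability values are invented for illustration, and a real Decoder searches a far larger space than two hypotheses.

```python
import math

def decode(acoustic_log_probs, language_log_probs):
    """Return the hypothesis W that maximizes log P(X|W) + log P(W)."""
    return max(acoustic_log_probs,
               key=lambda w: acoustic_log_probs[w] + language_log_probs[w])

# Hypothetical Acoustic Model scores P(X|W): both candidates sound alike.
acoustic = {
    "recognize speech":   math.log(0.40),
    "wreck a nice beach": math.log(0.45),
}
# Hypothetical Language Model scores P(W): one phrase is far more plausible.
language = {
    "recognize speech":   math.log(0.010),
    "wreck a nice beach": math.log(0.0001),
}

best = decode(acoustic, language)
print(best)  # recognize speech
```

Working in log probabilities avoids numerical underflow when many small probabilities are multiplied together, which is why the scores are added rather than multiplied.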

Applications of Voice Recognition System

The applications of Voice Recognition Technology include:

Workplace

Applications of Speech Recognition System in the workplace include:

  • Search for documents or reports on your computer
  • Create tables or graphs using data
  • Print documents on request
  • Start video conferences
  • Schedule meetings
  • Make travel arrangements

Banking

Applications of Speech Recognition system in banking include:

  • Fetch information about your transactions and balance without having to open your cell phone
  • Make payments
  • Receive information about your transaction history

Marketing

Voice Recognition Systems have the potential to give marketers a new way to reach their consumers. With speech recognition, a new type of data becomes available for marketers to analyze.

Healthcare

Applications of Speech Recognition System in healthcare include:

  • Quickly finding information in medical records
  • Reminding workers of instructions or processes
  • Letting patients ask questions about a disease from home
  • Spending less time inputting data
  • Improving workflows

Internet of Things (IoT)

One of the most important applications of Voice Recognition Systems in the Internet of Things is in cars. Examples of what digital assistants can do in a car are:

  • Listen to messages hands-free
  • Control your radio
  • Assist with guidance and navigation
  • Respond to voice commands

Advantages of Voice Recognition System

The advantages of Voice Recognition Technology include:

  • Speech Recognition Technology allows people with disabilities to type and operate computers.
  • It is easy and fast.
  • The system is easy to use over the phone or other voice-enabled devices.
  • Speech Recognition Systems are reasonably priced.
  • Accidents caused by texting while driving are very common. With Speech Recognition Technology, people can write texts and compose emails without taking their eyes off the road, which improves automobile safety.

Disadvantages of Voice Recognition System

The disadvantages of Voice Recognition Technology include:

  • Lack of Accuracy and Misinterpretation: While Voice Recognition Technology recognizes most words in the English language, it still struggles to recognize names and slang words. It also has trouble differentiating between homophones such as “their” and “there”.
  • Time Costs and Productivity: Technology can speed up a process, but with a Voice Recognition System the user may have to invest more time than expected. Users have to review and edit the output to correct errors. Some programs adapt to your voice and speech patterns over time, which may slow down your workflow until the program is up to speed. You’ll also have to learn how to use the system.