Types of Voice Recognition

Explaining the Different Types of Voice Recognition

Voice recognition can be separated into types based on how capable it is.  It can also be divided into types based on how the voice recognition is applied.  We will discuss both.

Types of Voice Recognition by Application:

Command Based Voice Recognition

Voice recognition used to give spoken commands to a computing system is the simplest application of voice recognition.  It also requires the least computing power, because it only has to recognize a small, fixed vocabulary.

In command based voice recognition, the software understands a limited number of command words, usually a few dozen or so.  Speaking a specific command word activates the corresponding function.  Commands can be combined with identifiers that are added to the dictionary from a file structure, an address book, or the like, which allows the speaker to command specific computing assets.

For example, a command may be to “OPEN” a “PROGRAM”.  The command word is then used with a given identifier, such as the program name, to open a specific program.  Or, when used with an address book, the command may be “CALL” and the identifier may be the person’s “NAME”.  Both would be spoken aloud for hands free calling.
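
The command-plus-identifier idea can be sketched in a few lines of Python.  This is only an illustration that works on words the recognizer has already produced; the vocabulary, the identifier lists, and the function name parse_command are all made up for the example and do not come from any particular product.

```python
# Hypothetical vocabulary: each command word maps to the identifiers it accepts.
# In a real system the identifiers might be loaded from a file structure
# (program names) or an address book (contact names).
IDENTIFIERS = {
    "OPEN": {"CALCULATOR", "BROWSER"},
    "CALL": {"ALICE", "BOB"},
}


def parse_command(words):
    """Return (command, identifier) if the words form a known command, else None."""
    if len(words) != 2:
        return None
    command, identifier = (word.upper() for word in words)
    # Unknown command words map to an empty set, so they are always rejected.
    if identifier in IDENTIFIERS.get(command, set()):
        return command, identifier
    return None


print(parse_command(["open", "calculator"]))  # ('OPEN', 'CALCULATOR')
print(parse_command(["open", "alice"]))       # None
```

Because the vocabulary is so small, matching is little more than a dictionary lookup, which is part of why this kind of system needs so little computing power.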

Discrete Voice Recognition/Isolated Word Voice Recognition

Discrete voice recognition is a voice recognition process where a pause must separate each word spoken.  Voice recognition programs often have difficulty determining when a word starts and when it stops.  The pause, or lack of sound, provides that cue to the software.
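
To see how a pause can serve as a cue, here is a minimal Python sketch that splits a stream of loudness values into words wherever the sound stays quiet for long enough.  It assumes the audio has already been reduced to one loudness number per short frame; the function name split_on_pauses, the threshold, and the minimum pause length are assumptions chosen for illustration, not values from a real product.

```python
def split_on_pauses(energies, threshold=0.1, min_pause=3):
    """Return (start, end) frame indexes for each stretch of speech.

    energies:  one loudness value per frame (assumed to be precomputed).
    threshold: frames at or below this value count as silence (assumed).
    min_pause: consecutive silent frames needed to end a word (assumed).
    The end index is exclusive.
    """
    segments = []
    start = None     # frame where the current word began, None while silent
    silent_run = 0   # silent frames seen since the last loud frame
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause:
                # The word ended right after the last loud frame.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # The audio ended before a full pause was heard.
        segments.append((start, len(energies) - silent_run))
    return segments


frames = [0, 0, 0.5, 0.6, 0.4, 0, 0, 0, 0.7, 0.8, 0, 0]
print(split_on_pauses(frames))  # [(2, 5), (8, 10)]
```

A gap shorter than min_pause does not split a word, which is why the speaker has to leave a clear pause between words.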

Discrete voice recognition is one of the simpler applications of voice recognition, as it only performs voice to text conversion one word at a time.  At least that is how it started out.  More advanced versions of discrete voice recognition can convert a phrase or “sample” containing multiple words.  This is usually one sentence at a time, with a pause in between.

Discrete voice recognition is typically used for dictation applications, as it converts the words you say to text.  It serves as an accurate stenography service at the touch of a button.  For those of you who don’t remember stenography, dictation used to be recorded on tape and sent to someone who would manually transcribe it onto paper (usually with a typewriter).

Discrete voice recognition is also referred to as isolated word voice recognition, though that may be a misnomer for the more advanced versions that handle whole phrases.

Command based voice recognition is typically a form of discrete voice recognition as well, since it requires a pause between each command or identifier.

Continuous Voice Recognition/Connected Words Voice Recognition

Continuous voice recognition is a type of voice recognition that does not suffer from the start/stop issue that discrete voice recognition does.  Continuous voice recognition can determine the beginning and ending of a word without the need for an audible break.

Continuous voice recognition is a complex form of voice recognition and requires much more processing power than discrete voice recognition.

Continuous voice recognition is also referred to as connected words voice recognition as it does not require any substantial pause between words.  Instead the speaker can speak more naturally and the voice recognition software will still understand where a word begins and where it ends.

Spontaneous Speech Voice Recognition/Natural Speech Voice Recognition

Spontaneous speech voice recognition is a more advanced form of continuous voice recognition with the added ability to understand non-words.  All of those ums and ahs that people utter while they are mumbling through a spontaneous or non-rehearsed sentence can be understood by the spontaneous speech voice recognition software for exactly what they are, non-words, and they can be omitted from the speech to text conversion or command line.
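
The effect of dropping non-words can be shown with a small Python sketch.  It is a simplification: it filters text that has already been transcribed, while real spontaneous speech software typically deals with fillers during recognition itself.  The filler list and the function name strip_fillers are assumptions made for the example.

```python
# Hypothetical list of non-words to drop; a real system would know many more.
FILLERS = {"um", "uh", "er", "ah", "hmm"}


def strip_fillers(tokens):
    """Return the tokens with filler non-words removed."""
    # Lowercase and trim trailing punctuation so "Um," still matches "um".
    return [t for t in tokens if t.lower().strip(",.") not in FILLERS]


spoken = ["Um,", "open", "uh", "the", "browser"]
print(strip_fillers(spoken))  # ['open', 'the', 'browser']
```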

The benefit of spontaneous speech voice recognition is that you do not have to change the way you talk to interface with the computer.  Instead, it understands you just the way you are.

Spontaneous speech voice recognition is also referred to as natural speech voice recognition as it supports the natural way you speak.

Speaker Dependent Voice Recognition

Speaker dependent voice recognition is a type of voice recognition that is dependent on the person speaking.  It requires training to become more accurate at speech to text conversion.  Training is often done through a series of speech to text conversion samples that are then corrected by the speaker.  The result is a voice recognition system that understands your specific accent and voice.
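
The train-then-correct loop can be illustrated with a toy Python sketch.  It only remembers which words the speaker corrected and reuses the most frequent correction; real speaker dependent systems adapt their underlying speech models, which is far more involved.  The class name SpeakerProfile and its methods are hypothetical and exist only for this example.

```python
from collections import Counter, defaultdict


class SpeakerProfile:
    """Toy per-speaker memory of corrections (hypothetical, for illustration)."""

    def __init__(self):
        # Maps a misrecognized word to a count of the corrections given for it.
        self._corrections = defaultdict(Counter)

    def train(self, recognized, corrected):
        """Learn from one sample; assumes both word lists line up one to one."""
        for heard, meant in zip(recognized, corrected):
            if heard != meant:
                self._corrections[heard][meant] += 1

    def apply(self, recognized):
        """Replace each word with its most frequent correction, if it has one."""
        result = []
        for word in recognized:
            if word in self._corrections:
                result.append(self._corrections[word].most_common(1)[0][0])
            else:
                result.append(word)
        return result


profile = SpeakerProfile()
profile.train(["call", "bop"], ["call", "bob"])
print(profile.apply(["call", "bop"]))  # ['call', 'bob']
```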

Speaker dependent voice recognition systems may be discrete or continuous voice recognition systems.

Speaker Independent Voice Recognition 

Speaker independent voice recognition is a type of voice recognition that is the opposite of speaker dependent voice recognition.  It does not require training by the speaker and it can understand speech from a wide variety of speakers.

Speaker independent voice recognition systems require more processing capability than speaker dependent systems and may not be as accurate with speech to text conversion as a well trained speaker dependent voice recognition system is.

Natural Language Voice Recognition

Natural language voice recognition is actually a misnomer.  It is not a separate kind of recognition but an add on to continuous or spontaneous speech voice recognition that deals with the response.  Natural language refers to the ability of the computer to understand a question or command said in a natural way, not the structured way a command based voice recognition system may require.  It usually also refers to the ability of the computer to respond to the speaker with natural language, almost as if you are carrying on a conversation.

Natural language voice recognition is what powers modern virtual assistants like Siri or Alexa.