An Introduction to Continuous Speech Recognition Software

Stated in the simplest terms, continuous speech recognition (SR) software takes voice input through the user's microphone and uses this input to "type" words into a document displayed on the computer screen. (This document may be saved and used just like any other file.) It is called "continuous" because the user is expected to dictate in a conversational manner, uttering entire phrases with brief pauses between them. This is different from "discrete" speech recognition software, where words must be uttered individually.

The primary use of SR is word processing. Producers of SR make varying claims of compatibility of SR with other types of applications, including popular productivity application suites, web browsers and email clients. Users report varying degrees of success when using SR with other applications, finding that certain SR products handle certain situations in compatible applications better than others.

Most SR packages are intended to run on Microsoft's Windows operating system and include their own "native" word processing application. This application resembles Microsoft's WordPad (included with Windows), with extra menus and toolbar buttons added for accessing the unique features and functions of the SR application. SR packages are also available for Mac and Linux operating systems.

Users must go through a process of "enrollment" before being able to dictate a document in the word processor. This process "trains" the SR software to recognize the user's voice and speech. The SR software displays several paragraphs of text on the screen, and the user must read them aloud with the microphone on. This process may take 15-30 minutes or more. Because the SR software is recording the sounds users make when pronouncing words during the enrollment process, care should be taken to enunciate all words precisely. Once the enrollment process is complete, the user may begin dictation sessions.

The microphone is part of a headset. A flexible, adjustable band goes over the top of the head and supports a boom that holds the microphone. The boom is usually flexible so that the microphone may be properly positioned close to the mouth but to one side so that inadvertent noises, such as the "pop" when using words with the letter P, are reduced. The headset keeps the microphone in the proper position while allowing the user some degree of movement.

The functions of SR software may be divided into three broad areas: dictation, correction and editing, and command and control. Some SR products perform tasks within these areas better than others, and the extent to which users can rely on voice alone within each area varies from one SR product to another. All require some keyboard or mouse input to accomplish certain tasks; none are 100% hands-free.

During a dictation session, users simply say what they wish to appear in the document. Inadvertent utterances are to be avoided, as the SR software will attempt to interpret these and insert them into the document. It cannot "guess" what the user meant to say and meant not to say. Recognition errors will occur. Users play an extremely important role in preventing misrecognitions by enunciating each word with precision. This doesn't mean that users must dictate more slowly, loudly, or one word at a time. In fact, any of these would be counterproductive. What it does mean is that users will get better recognition accuracy if they concentrate on pronouncing every syllable of every word clearly, without slurring.

Correction is the process of replacing a misrecognized word with the intended word. Each SR product has a specific procedure for accomplishing this task. In contrast, when users change their mind about which words they would like to use in their document, replacing existing words with new words is editing. Each SR product also has a specific procedure for editing.
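
The distinction between correction and editing can be made concrete with a short sketch. This is only an illustration of the definitions above, not the procedure of any particular SR product; the function name classify_change and its three parameters are hypothetical, and the comparison is deliberately simple (exact, case-sensitive string matching).

```python
def classify_change(intended: str, recognized: str, replacement: str) -> str:
    """Label a word replacement using the definitions in the text.

    intended    -- the word the user actually said (hypothetical input)
    recognized  -- the word the SR software typed into the document
    replacement -- the word the user puts in its place
    """
    if replacement == recognized:
        # Nothing in the document actually changes.
        return "no change"
    if recognized != intended and replacement == intended:
        # The software misheard the word and the user restores what was said.
        return "correction"
    # The user is choosing a different word than the one that was dictated.
    return "editing"


if __name__ == "__main__":
    print(classify_change("role", "roll", "role"))    # misrecognition fixed
    print(classify_change("large", "large", "huge"))  # user changed their mind
```

Under these assumptions, replacing a misrecognized "roll" with the intended "role" counts as correction, while swapping a correctly recognized "large" for "huge" counts as editing.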

Command and control includes using menus, changing settings in dialog boxes and similar tasks. SR products vary in their ability to accomplish these tasks with voice input. Keystrokes or mouse movements and clicks may be required.

There are some issues that are unique to the use of SR software. Although the vocabularies of SR applications are very large, an unusual proper noun may require the user to take the extra step of adding it to the vocabulary for better future recognition. Homonyms and acronyms may also present unique challenges.

SR software should not be installed on marginal systems. A minimum system for satisfactory use with the native word processing application would have a processor of at least 600 MHz, at least 256 MB of RAM, and high-quality audio components. If the intended use includes using SR within other applications, a more powerful computer will be required. Given good hardware, users who develop good enunciation habits should have satisfactory experiences with SR software.
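
As a rough illustration, the processor and RAM minimums quoted above can be written as a simple check. This is a minimal sketch: the names SystemSpec and shortfalls are hypothetical, the figures are supplied by hand rather than read from the machine, and audio quality is not covered because the text gives no measurable threshold for it.

```python
from dataclasses import dataclass
from typing import List

# Minimums quoted in the text for use with the native word processor.
MIN_CPU_MHZ = 600
MIN_RAM_MB = 256


@dataclass
class SystemSpec:
    """Hand-entered description of a computer (hypothetical structure)."""
    cpu_mhz: int
    ram_mb: int


def shortfalls(spec: SystemSpec) -> List[str]:
    """Return the reasons a system falls below the quoted minimums."""
    problems = []
    if spec.cpu_mhz < MIN_CPU_MHZ:
        problems.append(
            f"processor {spec.cpu_mhz} MHz is below {MIN_CPU_MHZ} MHz"
        )
    if spec.ram_mb < MIN_RAM_MB:
        problems.append(f"RAM {spec.ram_mb} MB is below {MIN_RAM_MB} MB")
    return problems


if __name__ == "__main__":
    for reason in shortfalls(SystemSpec(cpu_mhz=500, ram_mb=128)):
        print(reason)
```

An empty list means the system meets both minimums. It does not mean the system is adequate for using SR within other applications, which the text says requires a more powerful computer.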