The following tips and tricks can increase your productivity and simplify common voice-application development tasks. They are listed in alphabetical order.
Barge-in is turned on by default. You may want to turn it off when you do not want the caller to interrupt a prompt.
It is sometimes helpful to turn barge-in off for the first prompt of an application, so the speech recognition system can adjust to background noise levels without a false barge-in interrupting the prompt. If your voice application is used in a noisy environment, it may also help to turn off barge-in for the entire application.
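If your platform generates standard VoiceXML, barge-in can be controlled either document-wide with the bargein property or per prompt with the bargein attribute. A minimal sketch (the form and prompt text are illustrative, not taken from a real application):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Turn barge-in off for every prompt in this document -->
  <property name="bargein" value="false"/>
  <form id="welcome">
    <block>
      <!-- Or disable it for a single prompt only -->
      <prompt bargein="false">
        Welcome. Please listen to the following options carefully.
      </prompt>
    </block>
  </form>
</vxml>
```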
The Listen element lets you repeat a caller's input back to them. The input can then be converted to a SpeakAs data type. Note the difference between the SpeakAs types Digit and Number:
Example:
■ Digit:
9488 = nine - four - eight - eight
■ Number:
9488 = nine thousand four hundred and eighty-eight
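The same distinction exists in standard VoiceXML/SSML prompts via the say-as element. The exact interpret-as values vary by platform; "digits" and "cardinal" shown below are common but not universal:

```xml
<!-- Read 9488 digit by digit: "nine four eight eight" -->
<prompt>
  Your extension is <say-as interpret-as="digits">9488</say-as>
</prompt>

<!-- Read 9488 as a quantity: "nine thousand four hundred eighty-eight" -->
<prompt>
  The total is <say-as interpret-as="cardinal">9488</say-as>
</prompt>
```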
There can only be one handler for an event in each scope. The compiler will halt with a fatal error if it finds more than one handler in the same scope. It is allowed (and often useful) to have handlers in different scopes for the same event.
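In VoiceXML terms (assuming your platform follows standard catch-element scoping), this means you may declare a handler for the same event at both the document level and the field level; the innermost handler wins. The event names and prompt text below are illustrative:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: applies everywhere unless a narrower scope overrides it -->
  <catch event="noinput">
    <prompt>Sorry, I did not hear anything.</prompt>
  </catch>
  <form id="mainMenu">
    <field name="choice">
      <prompt>Say sales or support.</prompt>
      <!-- Field scope: a second handler for the same event is legal here
           because it lives in a different, narrower scope -->
      <catch event="noinput">
        <prompt>Please say sales or support.</prompt>
      </catch>
    </field>
  </form>
</vxml>
```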
Here are some suggestions for reading back information to the caller:
· Use SpeakAs libraries for most common information types
· When reading back data that uses numbers, you generally need to cover three conditions:
○ No objects:
You don’t have any emails
○ One object:
You have one email
○ Everything else:
You have <number> emails
You can use the condition attribute on Speak Items to achieve this, or use branching logic with the Route object.
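The three-condition pattern above can be sketched in standard VoiceXML branching, if your platform emits it. The emailCount variable is hypothetical, standing in for whatever data item holds the caller's total:

```xml
<form id="readback">
  <block>
    <!-- Hypothetical variable holding the number of emails -->
    <var name="emailCount" expr="3"/>
    <if cond="emailCount == 0">
      <prompt>You don't have any emails</prompt>
    <elseif cond="emailCount == 1"/>
      <prompt>You have one email</prompt>
    <else/>
      <prompt>You have <value expr="emailCount"/> emails</prompt>
    </if>
  </block>
</form>
```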
Phrase length is a factor when writing grammars for speech-recognition applications. Longer phrases tend to be easier for a recognizer to handle than shorter phrases.
Example:
■ Easier to recognize:
more options
■ Harder to recognize:
help
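One way to apply this when writing a standard SRGS XML grammar (assuming your platform accepts SRGS) is to favor longer carrier phrases over bare one-word commands; padding a short command into a phrase gives the recognizer more acoustic material to work with. The phrases below are illustrative:

```xml
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         root="menu" mode="voice" xml:lang="en-US">
  <rule id="menu">
    <one-of>
      <!-- Longer phrase: easier for the recognizer -->
      <item>more options</item>
      <!-- A short command padded into a phrase -->
      <item>I need help</item>
    </one-of>
  </rule>
</grammar>
```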
When writing grammars for speech-recognition applications, or when choosing prompts and other alternatives during voice-application design, consider how the words sound. Some sounds are very difficult for recognizers to distinguish:
● The sounds s and f are two that are hard to tell apart on the phone.
Phone lines pass only sound between 400 and 3400 Hz. This makes some speech sounds hard to distinguish, because part of their acoustic information is lost on the telephone line.
● The letters of the alphabet (that is, spelling words by saying their letters aloud) are harder for a recognizer to handle than spoken words.
Many letters of the alphabet are spoken with the same vowel sound and rhyme: b, c, d, e, g, t, and v all sound similar to a recognizer for this reason.
Most phone connections today are digitized and sampled at 8 kHz, which results in lower-quality sound. This does not necessarily apply to pure VoIP (Voice over IP) calls, but any time a call interfaces with a wired or wireless telephony network or traditional telephones, you will encounter this limitation. Skype, for example, provides better-quality sound by using a much higher sampling rate.