Whether we care to admit it or not, we all have an accent of some form. Vocal coaches tend to highlight the pronunciation and length of vowels as the key differentiator between accents, although education and ethnicity may also affect the enunciation of certain sounds. As a consequence, no two voices will ever sound identical, a fact that may pose a problem for chatbots.
Voice-based audiovisual and wearable technology is supposed to seamlessly understand what people are saying, whether the speaker is a Texan pensioner or a Nebraskan teenager. One oft-quoted report has identified 24 American English dialects, from San Francisco Urban to Pennsylvanian German-English. Then there are the equally diverse dialects of British English, Canadian English’s Scottish-tinged vowels, and a plethora of dialects courtesy of people speaking English as a foreign language.
Voice to voiceless
So how can the developers of chatbots ensure that their algorithmic software understands local accents? Clearly, no algorithm can accurately cater to every voice in America, let alone those of tourists and immigrants. The big tech firms have all developed their own conversational platforms to try to optimize natural-language processing – IBM’s Watson, Microsoft’s LUIS, and so forth. These commercially available tools harness immense processing power to interpret what users are asking for in natural language, and they are backed by machine learning updates and ongoing research. Yet none is entirely foolproof, and harnessing them as a third party quickly becomes expensive.
If you’re looking to develop your own web speech interface for chatbots, the following stages should maximize accessibility:
1. Restrict user inputs.
Remove freeform input fields like “tell us why you’re calling today”. Names are notoriously difficult to pronounce and spell, so identify callers using account or cell phone numbers instead. The tongue positions and lip rounding used to say “yes” and “no” are very distinct among all English speakers, so restrict responses to these unambiguous inputs wherever possible.
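As a rough illustration of restricted inputs, the sketch below maps a transcript onto a yes/no answer or an account number and rejects everything else. It assumes the speech recognizer has already produced a text transcript with spoken digits rendered as numerals; the word lists, the function names, and the eight-digit account length are all hypothetical choices, not part of any particular platform.

```python
import re
from typing import Optional

# Hypothetical vocabularies; extend them with the variants your callers use.
YES_WORDS = {"yes", "yeah", "yep", "yup", "correct"}
NO_WORDS = {"no", "nope", "nah"}


def parse_yes_no(transcript: str) -> Optional[bool]:
    """Return True for yes, False for no, None if unclear or contradictory."""
    words = re.findall(r"[a-z']+", transcript.lower())
    has_yes = any(word in YES_WORDS for word in words)
    has_no = any(word in NO_WORDS for word in words)
    if has_yes == has_no:
        # Neither word was heard, or both were: treat as a failed attempt.
        return None
    return has_yes


def parse_account_number(transcript: str, length: int = 8) -> Optional[str]:
    """Keep only the digits and accept them if the count matches.

    The eight-digit default is an assumption made for this example.
    """
    digits = re.sub(r"\D", "", transcript)
    return digits if len(digits) == length else None
```

Anything that comes back as `None` can be treated as a failed attempt and handed to whatever retry logic the chatbot uses, rather than being guessed at.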
2. Develop an extensive audio sample sound base for transcription.
Acquire accents from across America (and beyond), repeating common phrases like “I need to cancel my booking”. These should be expressed both colloquially and formally, since many people will use slang and jargon – “booking” rather than “reservation” or “appointment”, for instance. Make sure that your sound base isn’t skewed towards wealthier or more educated individuals since a chatbot ought to be universally accessible.
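One way to check that a sound base isn't skewed is to label each recording with the speaker group it represents and flag any group that falls below a minimum share. The sketch below assumes such labels exist; the group names, the `underrepresented_groups` helper, and the 5% default threshold are illustrative assumptions only.

```python
from collections import Counter
from typing import Iterable, List


def underrepresented_groups(
    sample_groups: Iterable[str],
    expected_groups: Iterable[str],
    min_share: float = 0.05,
) -> List[str]:
    """List the expected groups whose share of the samples is below min_share.

    sample_groups holds one label per recording (a hypothetical metadata field).
    """
    counts = Counter(sample_groups)
    total = sum(counts.values())
    flagged = []
    for group in expected_groups:
        share = counts[group] / total if total else 0.0
        if share < min_share:
            flagged.append(group)
    return flagged
```

For example, with 18 recordings labeled "texan", one "nebraskan", and one "scottish", a 10% threshold would flag "nebraskan" and "scottish", along with any expected group that has no recordings at all.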
3. Focus beta testing on the opening sentence.
The first input field is the most significant in terms of successfully negotiating any interaction. Certain trigger words are so important that they deserve particular focus when beta testing the system. If an algorithm detects the word “complain”, for example, it should immediately redirect the inquiry to a human operator. Nothing prepares chatbots for real-world deployment more effectively than testing among colleagues, friends, and relatives.
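The trigger-word rule can be sketched as a simple check on the transcript of the opening sentence. This is a minimal example that assumes the recognizer returns plain text; the word list and the route names "human_operator" and "chatbot" are hypothetical placeholders.

```python
import re

# Hypothetical trigger words; "complain" comes from the example above,
# the other forms are added so that common variants are also caught.
ESCALATION_WORDS = {"complain", "complaint", "complaints", "complaining"}


def route_opening_sentence(transcript: str) -> str:
    """Send the caller to a human if the opening sentence has a trigger word."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    if words & ESCALATION_WORDS:
        return "human_operator"
    return "chatbot"
```

Matching on whole words rather than substrings keeps the rule predictable, which makes it easier to cover during beta testing.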
4. Incorporate instructions which ask users to speak more slowly after two failed attempts.
People will probably be frustrated by this point, causing them to enunciate each word with deliberate articulacy. This simplifies the chatbot’s job, ensuring syllables and words don’t run into each other. After a third failed attempt, ensure the system redirects to an operator or advisor.
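The escalation policy described in this step fits in a few lines. The sketch below assumes the chatbot counts consecutive failed recognition attempts and calls the function after each failure; the action names are hypothetical labels for whatever prompts your system plays.

```python
def next_action(failed_attempts: int) -> str:
    """Choose the next step from the number of failed attempts so far."""
    if failed_attempts >= 3:
        # Third failure (or later): hand over to an operator or advisor.
        return "redirect_to_operator"
    if failed_attempts == 2:
        # Second failure: ask the user to speak more slowly.
        return "ask_to_speak_slowly"
    # First failure: simply repeat the question.
    return "reprompt"
```

Keeping the thresholds in one place makes it straightforward to adjust them if testing shows that users give up sooner.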