eightolives' Trains Speech Recognition Experiment

The Speech Recognition Experiment is trying to ascertain whether voice commands can be usefully applied to the eightolives Trains tool. The tool's display of schedules, status, alerts, maps, menus and touch-sensitive areas may be too busy for many people. Speech synthesis, already applied in Audio Mode, uses a floating five-button interface to simplify use and aid the sight-impaired.

Speech Recognition has the potential to further simplify Audio Mode's menu navigation and search capabilities. A user yelling at a phone may not be appreciated in "quiet cars" or on buses, but a richer spoken command set can potentially achieve objectives quickly.

The state of the art

Eightolives Trains is classified as a Progressive Web App (PWA), an interactive web page that can run in any browser. Speech Recognition for web browsers has been standardized (the Web Speech API), but it is not yet implemented in all browsers. The standard is mostly implemented in the Chrome browser and on the iOS platform (iOS 14+). It does not work when the platform is offline, because speech data is sent to Google or Apple for processing.
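Because standard support varies by browser, a PWA typically feature-detects the recognizer before using it. The sketch below is illustrative, not the Trains tool's actual code; the function names are assumptions, and the global object is passed as a parameter so the check can be exercised outside a browser. Chrome exposes the constructor under a webkit prefix.

```javascript
// Hedged sketch: feature-detect the standard Web Speech API recognizer.
// In a browser, pass `window`; the parameter exists so the check is testable.
function getSpeechRecognitionCtor(globalObj) {
  // Chrome (and WebKit) historically expose the prefixed name.
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// An app can use this to choose between the standard recognizer and a
// bundled offline fallback such as PocketSphinx.
function speechRecognitionAvailable(globalObj) {
  return getSpeechRecognitionCtor(globalObj) !== null;
}
```

When the check returns false (or the device is offline), the app would fall back to the bundled all-JavaScript recognizer instead.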

To allow use of other browsers and to permit offline operation, a second Speech Recognition implementation, PocketSphinx, an open-source, all-JavaScript recognizer, is included in this experiment. This tool is considered mature, and our use of it is as-is.

Concept of Operation

The objective is to have Speech Recognition extend Audio Mode. This would have the user speak a command and the Audio Mode synthesis/state machine respond.

Normal Audio Mode displays five floating buttons - one in each corner and one at the center of the right side:

- Upper right: initially enables Audio Mode, then functions as the Status/Yes button.
- Upper left: the List/No/Next button.
- Lower left: the Quiet/Reset button.
- Center right: the Menu/Help/Back button.
- Lower right: the catch-all, multi-function ALT/Search/Monitor button.

Audio Mode functions are executed by pressing those buttons and getting voiced responses.

You can start Speech Recognition from the quiet/reset state by pressing the Menu button once, then pressing the ALT button. The tri-bar (hamburger) menu also has a checkbox to enable Speech Recognition, and the Preferences menu has a checkbox that controls whether the standard recognizer is used when available.

With Speech Recognition, the lower-right multi-function button acts mostly as a "push to talk" button; the other buttons retain their Audio Mode functions. Pressing the multi-function button enables the microphone for 1.5 seconds to capture the spoken command. While the microphone is active, the button has a red background. The timed enable window reduces power consumption and minimizes noise in the processing.
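The timed push-to-talk window can be sketched as below. This is a minimal illustration, not the tool's actual handler: the 1.5-second constant comes from the text, but the function and button names are assumptions, and the timer is injectable so the sequence can be verified without a browser. The recognizer is assumed to have start()/stop() methods, as the Web Speech API's SpeechRecognition does.

```javascript
// Hedged sketch of the 1.5-second push-to-talk window described above.
const LISTEN_MS = 1500; // microphone stays open this long per press

function onTalkButton(recognizer, button, setTimer = setTimeout) {
  button.style = button.style || {};
  button.style.background = 'red'; // visual cue: listening
  recognizer.start();              // open the microphone
  setTimer(() => {
    recognizer.stop();             // close the window to save power
    button.style.background = '';  // and cut off stray noise
  }, LISTEN_MS);
}
```

Stopping the recognizer on a timer, rather than waiting for silence detection, keeps behavior consistent across the standard recognizer and the PocketSphinx fallback.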

The basic commands perform the equivalent of pressing one of the five buttons:

- Status/Yes button: STATUS, STAT or YES.
- List/No/Next button: LIST, NO or NEXT.
- Quiet/Reset button: QUIET or RESET.
- Menu/Help/Back button: MENU, HELP or BACK.
- Alt/Search/Monitor button: ALT, SEARCH or MONITOR.

Other commands include NEARBY, PLATFORMS, ALERTS, CONNECTIONS, HERE and SELECT.
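The word-to-button mapping above amounts to a simple lookup table. The following sketch shows one way to express it; the button identifiers are illustrative, not the Trains tool's real element names, and the extra commands (NEARBY, PLATFORMS, etc.) are omitted since their handlers are not described here.

```javascript
// Hedged sketch: map recognized words to the five Audio Mode buttons.
// Button identifiers are assumptions for illustration only.
const COMMAND_MAP = {
  STATUS: 'status-yes',  STAT: 'status-yes',   YES: 'status-yes',
  LIST: 'list-no-next',  NO: 'list-no-next',   NEXT: 'list-no-next',
  QUIET: 'quiet-reset',  RESET: 'quiet-reset',
  MENU: 'menu-help-back', HELP: 'menu-help-back', BACK: 'menu-help-back',
  ALT: 'alt-search-monitor', SEARCH: 'alt-search-monitor', MONITOR: 'alt-search-monitor',
};

// Returns the button a spoken word maps to, or null if unrecognized.
// Normalizing case and whitespace tolerates recognizer output variations.
function resolveCommand(word) {
  return COMMAND_MAP[word.trim().toUpperCase()] || null;
}
```

Returning null for unrecognized words lets the Audio Mode state machine voice an error prompt instead of acting on a misheard command.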

Test Platforms

In developing the code, the test platforms are Firefox and Chrome on a Linux laptop, and Safari and Chrome on an iPhone running iOS 14.

Early Observations

Chrome on Linux performed best. Firefox on Linux works, but the available synthesis voice was not good. Chrome on iOS did not work. Safari on iOS 14 works.

Mapping a voice command to one of the floating buttons turns out to be less advantageous than expected: you must press the ALT (push to talk) button, say the command, wait for the processing result, and check whether it is valid, versus simply pushing the right button. The advantage comes when selecting a menu command, which would normally require many button presses.

The PocketSphinx recognizer was more error-prone but does work.