Technical report | A Spoken Dialogue System for Command and Control


A speaker-independent Spoken Dialogue System (SDS) is proposed as a more natural interface for human–computer interaction than the traditional point-and-click method. This report describes the objectives, development and initial implementation of an SDS within a prototype command environment. It includes design decisions and solutions to problems encountered during development, as well as comments on ongoing and planned future research that has emerged from these activities.

Executive Summary

We report on activities undertaken in the research program 'Smart And Rapid Information Access for Joint Command and Control (C2)', one of whose goals was to enhance the spoken natural language (SNL) control of technologies and information for Deployable Joint Forces Headquarters (now Headquarters 1st Division) staff. The exploitation of SNL for C2 has multiple benefits, including improved 'naturalness' of interaction between headquarters staff and automated systems, such as those presented within a Livespace. Livespaces are dynamic, highly configurable, context-aware, smart-room environments designed for synchronous distributed collaboration [1]. Within such an environment, a Spoken Dialogue System (SDS) can enhance human–computer interaction by adding voice-operated control of systems. However, this integration with 'traditional' interaction modalities must be done in a manner that is appropriate to the task and that enhances, rather than hinders, a human's control of systems, access to information and workflow. Further, natural language-aware systems arguably remain under-utilised, in part because of the complexity of modelling and processing contextualised SNL.

This report describes our approach to, and actual development of, a speaker-independent SDS for the control of devices and the presentation of automated briefs in the Livespace. Here we summarise the engineering and development methodologies. These include the requirements for implementing the SDS, such as SNL speech recognition and generation, as well as grammar and dialogue-management modules. We analyse the Natural Language Processing issues from an engineering and implementation perspective, because we believe that only a functioning prototype allows us to assess effectively its benefit to C2 and its potential contribution to the capabilities of our clients.

The SDS described here permits user-initiated voice control, and state-querying, of devices such as computers, displays, lights and data-projectors in a technology-enhanced setting, such as the Livespace command environment currently being developed within Command, Control, Communication & Intelligence Division (C3ID). Our SDS also enables staff to receive visual and audio briefs, delivered through synthesised speech, at a time of their choice, and to control them with their own SNL voice commands. The SDS integrates off-the-shelf Automatic Speech Recognition (ASR) input and Text-To-Speech (TTS) output with sophisticated hand-authored natural language grammars that interpret SNL commands and queries. The human–computer interface is steered by a dialogue-management system that generates and coordinates real-time synthesised speech and device-update responses.
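To make the flow from recognised speech to spoken and device-level responses concrete, the following is a minimal sketch under stated assumptions; it is not the report's implementation. The ASR and TTS stages are stubbed out: the input is assumed to be an ASR transcript, and the returned string is assumed to be handed to a TTS engine. The regular-expression rules merely stand in for the hand-authored natural language grammars, and every name here (`GRAMMAR`, `Intent`, `parse`, `DialogueManager`, the device names) is hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for a hand-authored grammar: each rule maps a regular
# expression over the (assumed) ASR transcript to an intent type.
GRAMMAR = [
    (re.compile(r"^(?:please )?turn (?P<state>on|off) the (?P<device>\w+)$"), "set"),
    (re.compile(r"^(?:is|are) the (?P<device>\w+) (?P<state>on|off)\??$"), "query"),
]


@dataclass
class Intent:
    kind: str    # "set" (voice command) or "query" (state-querying)
    device: str
    state: str


def parse(utterance: str) -> Optional[Intent]:
    """Interpret a transcript with the grammar rules; None if nothing matches."""
    text = utterance.strip().lower()
    for pattern, kind in GRAMMAR:
        match = pattern.match(text)
        if match:
            return Intent(kind, match.group("device"), match.group("state"))
    return None


@dataclass
class DialogueManager:
    """Tracks hypothetical device state and returns the text to be synthesised."""
    devices: dict = field(default_factory=lambda: {"lights": "off", "projector": "off"})

    def handle(self, utterance: str) -> str:
        intent = parse(utterance)
        if intent is None:
            return "Sorry, I did not understand that."
        if intent.device not in self.devices:
            return f"I do not know a device called {intent.device}."
        if intent.kind == "set":
            # Device-update response: change the (simulated) device state.
            self.devices[intent.device] = intent.state
            return f"Turning {intent.state} the {intent.device}."
        # Query: report whether the device is in the state the user asked about.
        current = self.devices[intent.device]
        answer = "Yes" if current == intent.state else "No"
        return f"{answer}. Current state of the {intent.device}: {current}."
```

For example, `DialogueManager().handle("Turn on the projector")` updates the simulated projector state and returns `"Turning on the projector."`, which in a full system would be passed to the TTS engine. Separating grammar-based interpretation (`parse`) from state tracking and response generation (`DialogueManager`) mirrors, in miniature, the division between grammar and dialogue-management modules described above.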

We have designed a modular, scalable, flexible and robust SNL-aware system. To this end, we exploit both commercial and open-source tools and, where necessary, develop hand-authored components. Our SDS is designed to become increasingly sophisticated and responsive to users, and to be adaptable to relevant advances in language technologies and software, thereby providing clients with a state-of-the-art system that is powerful, efficient, reliable and commensurate with advances in command-centre technologies around the world.

Key information

Author(s)

Adam Saulwick, Jason Littlefield and Michael Broughton

Publication number


Publication type

Technical report

Publish Date

October 2012

Classification

Unclassified - public release

Keywords

Computational Linguistics; Natural Language Processing; Speech Recognition; Spoken Dialogue Systems