I am currently working on two projects related to dialogue:
IM2.MDM
(NCCR Project)
IM2.MDM (Interactive Multimodal Interface
Management, Multimodal Dialogue Management) is an Individual Project
within the (IM)2 National Centre of
Competence in Research
(Switzerland).
The goal of the MDM project (IP) is to define a framework, both
theoretically grounded and efficiently implementable, for multimodal
dialogue understanding by a computer program. Our IP then aims
to integrate this computational dialogue model with lower-level
language processing tools, in order to provide a generic set of tools
for dialogue management by a computer agent.
INSPIRE
INSPIRE
(Infotainment management with speech interaction via remote-microphones
and telephone interfaces) http://www.inspire-project.org
Inspire is a 2½-year project contributing to the creation of
smart home environments. Its main objective is the integration of a
multilingual, interactive, natural, speech dialogue-based assistant for
wireless command and control of home appliances (e.g. consumer
electronics). Emphasis is placed on infotainment (information and
entertainment) equipment and services.
Spoken interfaces based upon VoiceXML prompt users with synthetic speech and understand simple words or phrases, using a defined dialog model specific to the application. As the technology improves, we can look forward to richer natural language conversations. There is now an emerging interest in combining speech interaction with other modes of interaction. Multimodal interaction will enable the user to speak, write and type, as well as hear and see, using a more natural user interface than today's single-mode browsers.
More in Multimodal Interaction Activity
The Wizard of Oz experiment is a method used to help developers verify their dialog models.
People using the system believe that they are interacting with a real system, while a human is actually controlling it. A text-to-speech synthesizer is used. Data is saved and analyzed later in order to revise the dialogue model.
The person who acts as the Wizard of Oz can do all of the system's work or only part of it. Doing all of it is difficult and of little use, as he has no time to react quickly while emulating the whole system.
We propose a Wizard of Oz setup to test the dialogue model and the grammars used. The models have to be implemented in VoiceXML. An extremely simple grammar also has to be written, but for the model to be usable it need only indicate the semantic pairs to be used. The person controlling the system, the wizard, does not need to control the model, only the results of the grammars.
The experiment begins here. No speech recognizer is used. The VoiceXML interpreter runs the model and, when speech input is needed, informs the wizard of all the active grammars and the semantic pairs available. The wizard listens to the user's speech and selects the response, which is sent back to the VoiceXML interpreter. This is useful for validating the model, and a better grammar can be built from the information saved.
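The first phase can be sketched in a few lines of Python. This is only an illustration and not the code in the downloadable implementation: the names Grammar, WozSession, options and select are hypothetical, and a grammar is reduced here to the list of semantic pairs it can produce.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Grammar:
    """An active grammar, reduced to the semantic pairs it can produce."""
    name: str
    semantic_pairs: list  # list of (slot, value) tuples


@dataclass
class WozSession:
    """Records what the wizard chose at each turn, for later analysis."""
    log: list = field(default_factory=list)

    def options(self, active_grammars):
        # Flatten every active grammar into one menu shown to the wizard.
        return [(g.name, slot, value)
                for g in active_grammars
                for slot, value in g.semantic_pairs]

    def select(self, active_grammars, choice, heard=""):
        # The wizard picks one menu entry after listening to the user.
        grammar, slot, value = self.options(active_grammars)[choice]
        result = {"grammar": grammar, "slot": slot, "value": value}
        # Keep what the wizard heard so the grammars can be revised later.
        self.log.append({"heard": heard, "result": result})
        return result

    def dump(self):
        # Serialize the saved turns for offline analysis.
        return json.dumps(self.log, indent=2)
```

In such a setup the interpreter would call select() wherever it would otherwise call the recognizer, and the dumped log would be the raw material for writing a fuller grammar.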
In a second phase, the system uses the speech recognizer with the revised grammars and proposes the solution it found to the wizard, who can accept or modify it. The models and the grammars can be revised again until a good solution is found.
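A minimal sketch of this accept-or-modify step, again in Python and under assumptions: review and correction_rate are hypothetical names, a semantic result is modelled as a plain dictionary, and the correction rate is one possible measure (not one stated by the project) of whether the grammars still need revising.

```python
def review(hypothesis, correction=None):
    """Return the result sent to the interpreter and whether it was corrected.

    hypothesis: semantic result proposed by the recognizer (a dict).
    correction: dict supplied by the wizard, or None to accept as is.
    """
    if correction is None or correction == hypothesis:
        return hypothesis, False
    return correction, True


def correction_rate(turns):
    """Fraction of turns in which the wizard overrode the recognizer.

    turns: list of (hypothesis, correction) pairs from one session.
    """
    if not turns:
        return 0.0
    corrected = sum(1 for hyp, fix in turns if review(hyp, fix)[1])
    return corrected / len(turns)
```

A rate that stays high after a revision would suggest that the grammars, or the model itself, need another pass.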
After a solution is found, the Wizard of Oz experiment is disabled and the system is ready to run in production.
You can download the
working implementation of the VoiceXML Woz, voicexmlwoz2003-04-25.zip
This is work in progress; suggestions are welcome.
Here you can see a screenshot. The VoiceXML interpreter runs the model, and whenever speech input is needed, the Woz Server shows the active grammars and the wizard is asked to listen to the user and select the appropriate semantic result.
VoiceXML can be extended to use other input devices. For instance, in the SmartHome application, a pointer device could be used to choose a light and switch it on. The device sends some input to the interpreter and, with the help of a grammar, this information can be parsed to produce a semantic result.
On the other hand, passive devices can help with contextualization. If the user says "Switch on this light", and the following grammar is used:
public <main> = [<politeness>] <command> [<politeness>] <object> [<politeness>];
the interpreter needs to contextualize "this", and can ask a passive pointer device which light the user is pointing to.
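The contextualization step might look roughly like the following Python sketch. It is an assumption-laden illustration: contextualize, DEICTIC_WORDS, the object_id slot and the query_pointer callback are all hypothetical names, and the passive device is represented by a plain callable rather than a real service.

```python
DEICTIC_WORDS = {"this", "that"}


def contextualize(semantic_result, utterance, query_pointer):
    """Fill in the object id when the utterance used a deictic word.

    semantic_result: dict produced by the grammar (left unmodified).
    query_pointer: callable returning the id of the object currently
        pointed at, or None if the device has nothing to report.
    """
    words = set(utterance.lower().split())
    resolved = dict(semantic_result)
    # Only bother the passive device when the utterance needs it.
    if DEICTIC_WORDS & words:
        target = query_pointer()
        if target is not None:
            resolved["object_id"] = target
    return resolved


result = contextualize(
    {"command": "switch_on", "object": "light"},
    "Switch on this light",
    lambda: "light_kitchen",  # stand-in for the pointer device
)
```

The device is queried only on demand, which is what makes it passive: it never pushes input to the interpreter on its own.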
We have designed an input component for the VoiceXML interpreter based on Web Services. It takes input from the speech recognizer, from the keyboard and from all specified input devices. When input arrives from any of them, the component passes it to the interpreter. The interpreter then just uses the active grammars to match the input and produce a semantic result.
In this way, the input devices can give information that is semantically equivalent to a specific phrase spoken or typed by the user.
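The idea of one input component fed by several sources can be sketched as follows. This Python sketch leaves out the Web Services transport entirely; InputComponent, push, next_phrase and interpret are hypothetical names, and the active grammars are simplified to a dictionary from phrase to semantic result.

```python
import queue


class InputComponent:
    """Merges speech, keyboard and device input into one stream of phrases."""

    def __init__(self, device_phrases):
        # Maps a (source, event) pair to the phrase it is equivalent to.
        self.device_phrases = device_phrases
        self.events = queue.Queue()

    def push(self, source, payload):
        # Any source may deliver input at any time.
        self.events.put((source, payload))

    def next_phrase(self):
        source, payload = self.events.get_nowait()
        if source in ("speech", "keyboard"):
            return payload  # already a phrase
        # Device events are translated; an unknown event raises KeyError.
        return self.device_phrases[(source, payload)]


def interpret(phrase, active_grammars):
    """Match a phrase against the active grammars, or return None."""
    return active_grammars.get(phrase.lower().strip())


component = InputComponent({("pointer", "click:light1"): "switch on the light"})
component.push("pointer", "click:light1")
component.push("keyboard", "Switch on the light")
grammars = {"switch on the light": {"command": "switch_on", "object": "light"}}
from_pointer = interpret(component.next_phrase(), grammars)
from_keyboard = interpret(component.next_phrase(), grammars)
```

Here the pointer click and the typed sentence end up as the same semantic result, so the dialog model does not need to know which device the input came from.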
For the passive devices, the dialog model needs to call them in order to contextualize. Again, a solution based on Web Services has been implemented. The added input devices, such as the pointer devices, can implement a Web Service.
The dialog model, using the VoiceXML object tag, calls the specified Web Service to ask for some information and immediately receives the response, which is used to contextualize.
Thus, a web browser can also be an active device, calling the VoiceXML Web Service. At the same time, it can also be a passive device, implementing a Web Service (using an IndirectWebServiceProxy) that VoiceXML can call at any time to contextualize.

Two demos concerning the Inspire and
IM2.MDM projects are now available for download: dialogue2003-03-05.zip
Here you can see two screenshots:



The author is grateful to Pavel Cenek from the Laboratory of Speech and Dialogue at the Faculty of Informatics, MU Brno, for his support in using his VoiceXML 2.0 browser, now called OptimTalk.