(Reproduced from The Computer Bulletin, May 2003, pp24-25 by permission of the British Computer Society, for educational purposes only. Copies may not be made or distributed for direct commercial advantage. Copyright The British Computer Society, 2003.)
It pays to talk
Voice recognition systems can improve customer service and cut costs - as long as user needs are kept firmly in mind. Joe McCool reports on the latest technology developments and emerging applications.
FALSE PROMISES and over-selling have created difficulties for voice recognition systems, not least problems of perception. But things are now ripe for exploitation, according to a recent conference supported by the UK's Institute of Physics.
Systems have been installed by the likes of the UK government, to keep track of criminals, and by British Airways, Virgin Trains, British Telecom, Odeon Cinemas and financial services companies including American Express, Abbey National, Halifax and Lloyds TSB.
As with many IT developments, end-user issues and existing ways of doing things have affected the take-up of voice recognition.
For example, betting shops in Australia, the Far East and the USA have used voice recognition for years, according to Australian supplier VeCommerce. Punters in those countries have always had to complete a structured betting slip detailing the course name, the time of the race, the name of the horse and so on, but no such structured dialogue existed in the UK, and this has slowed acceptance.
However, VeCommerce has now secured contracts from Ladbrokes, Bet Direct and others in the UK. Bet Direct installed a voice recognition betting system for the Cheltenham meeting in March: operations executive Phil Morgan told the conference that when punters called they had the choice of dealing with a human operator or a voice activated computer.
'Our motivation is very simple: we want to increase revenue,' he said. 'Punters typically wait to the last minute, so there's a queue on the phone line.' The new system will greatly reduce queueing and increase the number of bets placed. The company has had no problems with accuracy, and studies have put customer satisfaction at 95%.
The system will be expanded to cater for all forms of gambling. It will be especially useful for prize and account enquiries, releasing busy phone lines for other uses.
Bet Direct's system is certainly taking users into account. Some systems have failed to attract users because they are based on menus, or lists of options; Bet Direct's system saves the users work. Instead of trying to guide a caller through a series of menus, it looks for key words. When it hears 'Epsom', for example, it focuses on that course. If it hears '2.30' it can identify the race, 'Lucky Lad' the horse, and so on.
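The keyword approach described above can be illustrated with a minimal sketch. It is not Bet Direct's or VeCommerce's implementation: it works on text that is assumed to have already come out of a speech recogniser, and the vocabularies, function names and slot names are all hypothetical.

```python
import re

# Hypothetical vocabularies; a real system would load the day's race card.
COURSES = ("epsom", "cheltenham", "ascot")
HORSES = ("lucky lad", "red rum")
# Race times spoken or transcribed as "2.30" or "2:30".
TIME_PATTERN = re.compile(r"\b(\d{1,2})[.:](\d{2})\b")


def first_keyword(text: str, keywords: tuple):
    """Return the first keyword that appears as a whole word or phrase."""
    for keyword in keywords:
        if re.search(rf"\b{re.escape(keyword)}\b", text):
            return keyword
    return None


def spot_bet_slots(utterance: str) -> dict:
    """Pick out course, race time and horse from a free-form utterance."""
    text = utterance.lower()
    slots = {}
    course = first_keyword(text, COURSES)
    if course:
        slots["course"] = course
    match = TIME_PATTERN.search(text)
    if match:
        slots["time"] = f"{match.group(1)}.{match.group(2)}"
    horse = first_keyword(text, HORSES)
    if horse:
        slots["horse"] = horse
    return slots


def missing_slots(slots: dict) -> list:
    """List the details the system would still have to ask the caller for."""
    return [name for name in ("course", "time", "horse") if name not in slots]
```

A caller can give the details in any order, for example `spot_bet_slots("Ten pounds on Lucky Lad in the 2.30 at Epsom")`, and `missing_slots` tells the dialogue which single question to ask next instead of walking the caller through a menu.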
Return on investment in integrated voice recognition systems can be high, Krystyna Hirsham from Philips told the conference. The company, a pioneer in the field, has recently created the world's biggest speech recognition company in a joint venture with Scansoft, and provided Sweden with its first voice recognition directory service. Initial system costs, modelled on 1,000 ports, 50% automated, with 300,000 30-second calls a day and a 200,000-word vocabulary, were put at $3.37m (£2.1m), with an annual cost of $843 (£527) per port over four years. Calls cost an estimated $0.02 each. This contrasts with human agent costs of $31,250 (£19,530) a year and call costs of $0.24.
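The arithmetic behind these figures can be checked with a short sketch. The input numbers are the ones quoted above; the interpretation is an assumption, not Philips's model: '50% automated' is read as half of all calls being handled by the system, the per-call costs are treated as all-inclusive, and the payback calculation ignores every other cost. The function names are illustrative only.

```python
import math

# Figures quoted in the article.
PORTS = 1_000
YEARS = 4
ANNUAL_COST_PER_PORT_USD = 843
INITIAL_COST_USD = 3_370_000
CALLS_PER_DAY = 300_000
AUTOMATED_CALL_CENTS = 2
HUMAN_CALL_CENTS = 24

# Assumption: "50% automated" means half of all calls are automated.
AUTOMATED_SHARE_PERCENT = 50


def port_cost_over_period() -> int:
    """Annual per-port cost multiplied out over all ports and years."""
    return PORTS * YEARS * ANNUAL_COST_PER_PORT_USD


def daily_saving_usd() -> int:
    """Saving per day from automated calls (integer cents avoid float error)."""
    automated_calls = CALLS_PER_DAY * AUTOMATED_SHARE_PERCENT // 100
    saving_cents = automated_calls * (HUMAN_CALL_CENTS - AUTOMATED_CALL_CENTS)
    return saving_cents // 100


def payback_days() -> int:
    """Whole days of savings needed to cover the initial cost."""
    return math.ceil(INITIAL_COST_USD / daily_saving_usd())
```

Under these assumptions `port_cost_over_period()` gives $3,372,000, which suggests the $843 per-port figure is simply the $3.37m initial cost spread over 1,000 ports and four years. The per-call difference of $0.22 on 150,000 automated calls gives a saving of $33,000 a day.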
Future enhancements will include personal address books, marketing campaigns, enhanced services such as 'where is the nearest florist' or 'I am looking for a garage in...', Yellow Pages, call completion, and the storing of user profiles.
In the UK the Gloucestershire Hospital National Health Service Trust, which has 7,000 staff, has cut lost calls by 80% after installing a voice recognition switchboard and receptionist system from UK company Telephonics. The Trust wanted to avoid touchtone telephones and multilayer menus. Manager Pat Mooney told the conference that all calls were now answered within government guidelines, 24 hours a day. The system integrates with voicemail, paging systems, call loggers and least-cost routing, and works with almost any master directory, such as Microsoft Exchange or Active Directory. It has replaced four operators and will pay for itself in a few months. Pat Mooney has now been asked to instigate an integrated voice recognition strategy for the whole county.
One impression from the conference is that the technology is still simplistic. There is no involvement of artificial intelligence or anything approaching the conversation of the HAL computer in the film 2001: A Space Odyssey. Scott McGlashan of the World Wide Web Consortium's Voice Browser Group said current technology was coming under pressure because of the need for shorter application cycles and the limitations of existing business systems. There was also a poor fit with the internet world of XML and HTTP.
Dr McGlashan highlighted the two technologies jostling for position: VoiceXML and Speech Application Language Tags (SALT).
SALT is oriented more to multiple modes: voice combined with the web. It targets speech applications across a whole range of devices, including phones, personal digital assistants, tablet computers and desktop PCs. Since many devices also contain displays, multimode interactions are a key focus.
VoiceXML is designed for telephone applications. It was developed to allow the specification of voice recognition applications in a mark-up language. It is getting a lot of backing from European radio stations, which hope to build the technology into their content and journalist management systems.
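To give a feel for what specifying a voice application in mark-up looks like, the sketch below uses Python's standard library to generate a very small VoiceXML 2.0 document: one form with one field that prompts the caller and echoes the answer. The element names come from the VoiceXML 2.0 specification, but the dialogue itself, the field name and the grammar file `courses.grxml` are hypothetical and not taken from any system mentioned in the article.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the VoiceXML 2.0 specification.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_course_form() -> str:
    """Return a one-field VoiceXML dialogue as a string."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "bet"})

    # A field collects one piece of information from the caller.
    field = ET.SubElement(form, "field", {"name": "course"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Which course?"

    # Hypothetical external grammar listing the words the field accepts.
    ET.SubElement(
        field,
        "grammar",
        {"src": "courses.grxml", "type": "application/srgs+xml"},
    )

    # The filled block runs once the recogniser has matched the grammar.
    filled = ET.SubElement(field, "filled")
    reply = ET.SubElement(filled, "prompt")
    reply.text = "You chose "
    ET.SubElement(reply, "value", {"expr": "course"})

    return ET.tostring(vxml, encoding="unicode")
```

Printing `build_course_form()` shows the mark-up a VoiceXML interpreter would fetch over HTTP, much as a web browser fetches HTML.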
Convergence of the two has been promised in the form of VoiceXML version 3.0. Work on this has begun and a first working draft is expected this year.
Guntbert Markefka of T-Mobile Deutschland was willing to be more adventurous. He predicted that thanks to the work of linguist Noam Chomsky automatic speech recognition and natural language understanding would start to make an impact soon. Noam Chomsky's Generative Transformational Grammar Paradigm had reduced 122,000 grammar points to around 10,000. Processing time had been cut dramatically.
Technology was already in place for speaker verification, in applications like managing accounts, money transfer and home shopping.
Guntbert Markefka predicted that by 2008 systems would be able to classify speakers by age and gender. On the way, systems will emerge which will be able to identify individuals by their voice. Current systems of this type are only effective among small groups of speakers, with fewer than 20 members.
Dr Markefka said the German automatic speech recognition and natural language understanding system for operator procedures was already much more effective than the old system based on touchtone phones. False inputs have been reduced from 19% to less than 1%.
T-Mobile's next aim is to combine live voice recognition with natural language understanding and touchtone phones. Dr Markefka did not underestimate the tasks ahead -- understanding irony, jokes and poetry would remain a challenge -- but he said, 'Automatic speech recognition and natural language understanding can be done. What we need is a paradigm shift. So let's get on with it.'
Voice systems in action
Speech recognition is being used by financial services group Lloyds TSB to operate an employee share scheme. The system from UK specialist SRC is being used by Lloyds TSB Registrars.
'Speech recognition is simple for users and much more cost-effective than paper systems,' says manager Chris Bush.
The service asks callers for their reference number, surname and national insurance details before providing options such as how much to invest or how to receive their dividends. SRC developed a unique grammar, with 'many thousands of words', to support the system, which took 10 weeks to complete. It is hosted remotely by SRC.
'Callers have been impressed by how clever the system is in recognising what they say,' Chris Bush says. 'Some staff say they didn't realise they were speaking to an automated system.'
Joe McCool, a BCS Graduate member and a Chartered Engineer, is a freelance consultant.
VoiceXML:
www.w3c.org/TR/voicexml20,
www.voicexml.org
SALT:
www.saltforum.org
VoiceXML and SALT are compared at
www.speechtechmag.com/issues/7-3/cover/742-1.html.
| Page maintained by: | Dr Peter Coxhead | |
| Content last updated: | 27 March 2004 | |
| Converted from: | NLPA-X-VoiceRecArticle |