Enhanced broadcast services for the deaf community
Promoting access to television programmes for deaf or hard of hearing people is an important objective of the Independent Television Commission. With annual targets set by the ITC, we have seen subtitling on Channel 4 reach 80% of total transmission time in 2001, with ITV following closely behind at over 75%. There are approximately 70,000 severely and profoundly deaf people in the UK who rely on sign language as their primary means of communication. Many deaf people, often those born deaf, find signing is the only language they can follow to keep abreast of programme content.
In recognition of the needs of these people, the Government has set a 10-year target of 5% of programmes on digital terrestrial television (DTT) services to include sign language presentation or interpretation.
At present, these services use an ‘open signing’ approach, where a sign language interpreter forms an integral part of the programme picture (Figure 1). The disadvantage of this approach is that viewers without hearing loss can find the interpreter distracting, and because of this broadcasters are often reluctant to transmit signing at peak viewing times. The ITC has set interim sign language targets working up to the 5% requirement. As these targets rise there is growing interest in also introducing a ‘closed signing’ approach, where the image of the sign-language interpreter can be turned on and off by the viewer (Figure 2). A disadvantage of this approach is that it requires the transmission of two programme feeds (one from the actual programme and a second for the signed commentary), requiring extra transmission capacity.
Recent advances in multimedia technology have created an opportunity to use a ‘virtual human’ sign language interpreter, in the form of an animated avatar (in computing, this term is used to mean a virtual reality icon representing a person). The advantage of this approach is that only the positioning information needed to activate the avatar in the receiver (face, body, hands) needs to be transmitted, reducing the required bandwidth by up to a factor of ten compared with a video approach. More significantly, such an approach promises to open up many more programmes (eventually all those that have been subtitled) for sign language access by use of automated translation from subtitles into sign language gestures and movements.
This article describes a virtual human signing system developed by the European collaborative project named Virtual Signing: Capture, Animation, Storage and Transmission (ViSiCAST) (www.visicast.co.uk). Led by the ITC under its technology research programme, this project has successfully developed an avatar-based signing system for broadcast, internet and ‘over the counter’ type applications.
Sign language and its users
People who are born deaf, or who become deaf before learning a spoken language, often find it very difficult to learn to speak, read and write. Sign language provides the only viable alternative; everything that can be expressed in spoken language can be expressed in sign language. Specific rules govern how a particular sign is performed, how signs are inflected, and how signs are combined to form full sentences.
There are a number of different variants of sign languages, ranging from natural languages (such as British Sign Language – BSL), which are the preferred languages of native (prelingually) deaf people, through to increasingly artificial forms of sign language (such as Sign Supported English – SSE). SSE has often found favour with those who were not prelingually deaf, and with less experienced hearing sign language interpreters in education.
The ViSiCAST system
A final goal for the broadcast application of the virtual human signing system is automated translation from text subtitles (which accompany a very high proportion of television programmes) into sign language, thus providing a wider choice of programmes for those who rely on this form of access. Techniques for real-time translation from English into natural forms of sign language (such as BSL) are, however, not yet fully mature.
In view of this, the ViSiCAST project has also implemented a simplified system which captures the movements and gestures of a human sign language interpreter and then codes these for low-bandwidth transmission and subsequent reconstruction, performed by a high-quality avatar in the receiver. A block diagram of the complete ViSiCAST approach, including the simplified system, is shown in Figure 3.
The simplified approach
In the simplified approach for broadcasting, a human sign language interpreter produces motion-captured sign sequences to accompany the broadcast TV programme, as in the process shown at the top half of Figure 3. This involves a few simple steps:
To provide the data needed to animate the virtual human in the receiver, the gesture movements of the human sign language interpreter are recorded in the form of motion capture.
Data is captured using individual sensors for the hands, body and face (Figure 4). This is because natural sign languages, such as BSL, communicate through hand shape, position and movement, body posture and facial expression, all of which we have found to be particularly important in the avatar representation of BSL.
Data-gloves, which have sensors to record finger and thumb positions, are used to record hand shapes. Magnetic sensors also record the wrist, upper arm, head and upper torso positions in three-dimensional space relative to a magnetic field source. A video face tracker, consisting of a helmet-mounted camera with infrared filters, surrounded by infrared light emitting diodes, records facial expression. Reflectors are positioned at regions of interest such as the mouth and eyebrows. The various sensors are sampled at between 30 and 60 Hz.
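As a rough illustration, the per-frame data produced by such a capture rig might be organised as below. All field names and channel groupings here are assumptions made for illustration, not the actual ViSiCAST capture format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical per-frame record combining the three sensor groups
# described in the text: data-gloves, magnetic body sensors and the
# video face tracker. Field names and sizes are illustrative only.

@dataclass
class CaptureFrame:
    timestamp: float                                   # seconds since start of take
    finger_bends: List[float]                          # data-glove sensors, both hands
    body_positions: List[Tuple[float, float, float]]   # wrist, arm, head, torso (x, y, z)
    face_markers: List[Tuple[float, float]]            # reflector positions in camera image

def frame_times(duration_s: float, rate_hz: float) -> List[float]:
    """Timestamps for a take sampled at a fixed rate (30-60 Hz in ViSiCAST)."""
    n = int(duration_s * rate_hz)
    return [i / rate_hz for i in range(n)]

# A 2-second take at 60 Hz yields 120 frames.
print(len(frame_times(2.0, 60.0)))  # 120
```

At 60 Hz, even a short signed sequence quickly accumulates thousands of frames, which is why the recorded data is edited and sequenced rather than streamed raw.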
The sign language interpreter could in principle perform long sequences of signing for live broadcast. In practice, the movement data is usually recorded and edited, as a soundtrack would be. The real-time animation software allows edited sections to be sequenced without creating ‘jump cuts’.
The automated translation approach
In parallel with this motion capture system, the ViSiCAST project is also developing a more flexible sign generation system for the automatic translation of natural language into animation. This part of the system cannot yet operate reliably in the broadcast environment, but it can already translate some simple sentences successfully in real time.
The task of translating textual content into sign language is decomposed into the following sequence of transformations (as shown by the bottom half of Figure 3):
- The English text (perhaps from subtitles) is parsed to a semantic representation. This can be done automatically in software, with success depending on the complexity of the language. For example, ‘The boy bought the book’ may be signed as: BOY . BOUGHT . WHAT . BOOK
- As facial expression and body posture contribute to the meaning and ease with which sign language is understood, the notation needs to define all aspects of sign language gesture, manual and non-manual (i.e. including facial gesture). For example, BOY . BOUGHT . WHAT . BOOK has to be signed with a questioning facial expression up to the sign WHAT.
- This sequence is then translated to a computer-based gesture notation language. This computer language defines hand motions together with simultaneous changes of finger direction and any necessary facial expressions.
- The signing gesture notation can then be used to drive the virtual human (avatar) animation, as outlined in the next section.
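The stages above can be sketched in miniature. The toy Python example below uses a hard-coded lookup for the single sentence discussed in the text; the real system's parsing and gesture notation are far richer, and every function name and notation string here is an illustrative assumption, not the actual ViSiCAST interfaces.

```python
# Toy sketch of the translation pipeline: English text -> gloss sequence
# -> non-manual annotation -> gesture-notation tokens. Illustrative only.

def parse_to_semantics(text: str) -> list:
    """Stage 1: parse English text into an ordered gloss sequence (lookup here)."""
    glosses = {"the boy bought the book": ["BOY", "BOUGHT", "WHAT", "BOOK"]}
    return glosses[text.lower().rstrip(".")]

def add_nonmanual(glosses: list) -> list:
    """Stage 2: attach non-manual features, e.g. a questioning face up to WHAT."""
    annotated, questioning = [], True
    for g in glosses:
        annotated.append((g, "questioning" if questioning else "neutral"))
        if g == "WHAT":
            questioning = False
    return annotated

def to_gesture_notation(annotated: list) -> list:
    """Stage 3: map annotated glosses to gesture-notation tokens (assumed syntax)."""
    return [f"SIGN({g};face={face})" for g, face in annotated]

commands = to_gesture_notation(add_nonmanual(parse_to_semantics("The boy bought the book")))
print(commands[0])  # SIGN(BOY;face=questioning)
```

Stage 4, driving the avatar from these tokens, corresponds to the animation system described in the next section.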
Transmission of signing
Standard ways of representing the virtual human signing data and efficiently transferring this across different platforms are required.
ViSiCAST has developed two formats – the ‘bones animation format’ (baf) for the simplified system based on motion capture, and the Signing Gesture Markup Language (SiGML) for the transmission of synthetically derived animation based on ‘notation’.
The data streams drive a virtual human, which we have named Visia (Figure 6).
Visia can be driven by a stream of information about bone rotations, positions and lengths (although it is primarily the rotations that change from one frame to another). This information is similar, whether produced by motion capture or notation.
A ‘skeleton’ is wrapped in, and elastically attached to, a texture-mapped three-dimensional polygon mesh that is controlled by a separate thread that tracks the ‘skeleton’ (Figures 5 and 7). The visual quality of the avatar is improved to make it more ‘photo-real’: for example the opening image shows an avatar head developed in this way from two photographic images of the person who was the original model for the Visia character.
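A common way to attach a mesh elastically to a skeleton is linear blend skinning, where each mesh vertex follows a weighted mix of the bones it is bound to. The 2-D sketch below is a generic illustration of that idea under assumed weights and geometry, not the actual ViSiCAST binding.

```python
import math

# Minimal sketch of mesh-tracks-skeleton behaviour: each vertex blends
# the motion of the bones it is attached to (linear blend skinning).

def rotate(point, origin, angle_rad):
    """Rotate a 2-D point about a bone's origin."""
    px, py = point[0] - origin[0], point[1] - origin[1]
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (origin[0] + c * px - s * py, origin[1] + s * px + c * py)

def skin_vertex(rest_pos, bones, weights):
    """Blend each bone's transform of the rest position by its weight."""
    x = y = 0.0
    for (origin, angle), w in zip(bones, weights):
        tx, ty = rotate(rest_pos, origin, angle)
        x += w * tx
        y += w * ty
    return (x, y)

# A vertex halfway between two bones follows both as the second bone rotates,
# so the mesh deforms smoothly around the joint instead of tearing.
bones = [((0.0, 0.0), 0.0), ((1.0, 0.0), math.pi / 2)]
print(skin_vertex((1.5, 0.0), bones, [0.5, 0.5]))
```

Because only the per-frame bone rotations change, this scheme fits naturally with the low-rate data streams described next.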
The data rate needed for transmission of the signing information is approximately 50 kbit/s for the motion capture approach and 20 kbit/s for the notation-based approach. Both use significantly less capacity than even bandwidth-reduced video.
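For a sense of scale, those rates imply modest data volumes. The arithmetic below (a simple illustration, not part of the ViSiCAST system) converts each rate into megabytes per hour of programme.

```python
# Convert a signing-data rate in kbit/s into megabytes per hour of programme.

def hour_of_signing_mb(rate_kbit_s: float) -> float:
    bits = rate_kbit_s * 1000 * 3600   # bits transmitted in one hour
    return bits / 8 / 1e6              # megabytes

print(round(hour_of_signing_mb(50), 1))  # 22.5 (motion capture)
print(round(hour_of_signing_mb(20), 1))  # 9.0  (notation-based)
```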
Visia is shown signing a sample of broadcast material in Figure 8.
Applications and display
Television and broadcast transmission
BBC Research & Development has developed its first demonstrator for broadcast closed signing based on motion capture.
The signing possibilities using the ViSiCAST capture and broadcast technology were shown publicly at the International Broadcasting Convention in September 2002.
In co-operation with the UK Post Office (PO), ViSiCAST has also been exploring the possibilities of increasing access to customer services nationwide through signing.
The ViSiCAST avatar used in this application (named ‘TESSA’) won the British Computer Society IT Award and Gold Medal, and was successfully exhibited at the Science Museum, London in Summer 2001 (and at ACM One, the major conference of the Association for Computing Machinery, in San Jose). A second trial is now underway in five UK Post Offices. The TESSA application aims to aid communication between a Post Office counter clerk and a deaf person by translating the clerk’s speech to sign language and displaying it using an avatar. Speech recognition is used to work towards an ‘unconstrained’ system; the Post Office application now has over 350 different signs available for use.
Evaluations of this Post Office system by members of the Deaf community (in conjunction with the Royal National Institute for the Deaf) have shown that TESSA’s signing is easily understood by BSL users, who are enthusiastic about how useful TESSA may be in the future. Encouraging reports have been broadcast on national television – TESSA was featured very favourably on the BBC’s See Hear programme and the Children’s TV programme Blue Peter.
Signed weather forecast on the Internet
In collaboration with the Netherlands Dovenschap (deaf society), the ViSiCAST project has launched an Internet weather forecast application. Interest shown by deaf users has been very encouraging. Daily forecasts of Dutch weather can be accessed at http://www.dovenschap.org/weerbericht.html
Signs have been captured for Sign Language of the Netherlands, German Sign Language and British Sign Language, each with a native interpreter of that language.
The ViSiCAST weather forecast application allows sections of signing to be built up, including variables such as temperatures, weather types and wind directions. In this application, the avatar is truly three-dimensional, i.e. the user can enlarge, reduce and turn it. This is done with simple mouse movements, and can be done both when the avatar is moving and when she is standing still. This is an important advantage because forward movements in sign language are sometimes difficult to perceive when presented in two dimensions.
The ViSiCAST project is also developing a multimedia package to assist in the learning of sign language, where the user can build a phrase to be signed interactively by the avatar.
Conclusions and the future
The work described here has demonstrated that virtual humans can achieve acceptable signing for television, point of sale and Internet applications. Further work in the area of automatic language translation and understanding is required to achieve the project’s ultimate goal of generating high-quality BSL signing directly from subtitles. When this is achieved it will greatly increase access to television programmes for the profoundly deaf.
Gary Tonge FREng
Director of Technology, Independent Television Commission

Michele Wakefield
Project Manager for New Media, Independent Television Commission
Gary Tonge is the ITC’s Director of Technology, having previously been Director of Engineering and Controller of Engineering from the inception of the ITC in 1990. His current priorities include facilitating the switchover to digital TV, and helping build the ITC’s expertise in new media developments. Gary has a PhD in applied mathematics and a BSc in electronics from Southampton University and is a Fellow of The Royal Academy of Engineering, the Royal Television Society and of the Institution of Electrical Engineers.
Michele Wakefield manages the delivery work of the nine EU partners in ViSiCAST. As Project Manager for New Media, Michele contributes to the ITC’s strategic understanding of changing technologies across the converging markets of broadcast, computing and telecommunications. Previously, Michele worked for six years at BT, leading the e-commerce division’s strategy and planning, alongside project-managing e-business solutions to meet the needs of commercial banks and retailers.