4:05 pm Feb 20 - by Tom Thoren, Technograph writer
Thomas Huang, professor of electrical and computer engineering, demonstrates facial recognition software.
University researchers are exploring innovative ways for people to interact with computers, such as reading and responding to users’ emotions, automatically identifying the elements of multimedia and reducing the amount of data transfer in video communications.
Thomas Huang, professor of electrical and computer engineering, heads the human-computer interaction branch of research at Beckman Institute with about 20 graduate students. The research includes four major areas: biometrics, video event detection, human-computer interfaces and multimedia searches on the Web.
Much of the group’s research involves “soft biometrics,” Huang said, which identifies a person’s gender, age and other characteristics.
The software compares the user’s face to an image database that includes men and women of all ages and emotions. Certain signs tip off the software, such as wrinkles on an older person’s face. While the computer can’t say with 100 percent certainty whether a person is young or old, happy or sad, the extensive database lets it make a more confident judgment.
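The article doesn’t show the group’s code, but the idea of comparing a face to a labeled database and reporting a confidence rather than a hard yes/no can be sketched as a simple nearest-neighbor vote. The feature vectors, labels and function names below are illustrative assumptions, not Huang’s actual system.

```python
# Hypothetical sketch of "soft biometrics": classify a face's feature vector
# by comparing it to a labeled database, returning a label plus a confidence.
# A real system would extract features (e.g., wrinkle/texture measures) from
# face images; the toy vectors here only illustrate the principle.
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def soft_classify(query, database, k=3):
    """Return (label, confidence) from the k nearest labeled examples."""
    neighbors = sorted(database, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    label, count = votes.most_common(1)[0]
    return label, count / k

# Toy database of (feature vector, age-group label) pairs.
db = [
    ((0.90, 0.80), "old"),   ((0.85, 0.90), "old"),   ((0.80, 0.70), "old"),
    ((0.10, 0.20), "young"), ((0.20, 0.10), "young"), ((0.15, 0.25), "young"),
]
print(soft_classify((0.12, 0.18), db))  # ('young', 1.0)
```

The confidence is just the fraction of neighbors that agree, which mirrors the article’s point: the larger and more varied the database, the surer the system can be.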
Other algorithms focus on scanning the contents of videos and images. Video event detection can analyze a video and automatically identify behavior such as consumers perusing or picking up certain items in a store. The technology could also benefit surveillance by flagging suspicious activity.
“We’re engineers, so we’re always driven by the applications,” Huang said. “But of course, we are also interested in the theoretical aspects. By studying the applications, we are often led to some interesting theoretical questions, which we will study.”
This scanning software is also part of the multimedia database research, which focuses on how to process, store and make use of the vast amount of information on the Internet, said Vuong Le, one of Huang’s graduate student researchers. The current standard is to search with keywords, but Huang’s group is exploring ways to use different query techniques to both search and label multimedia information.
For example, if you wanted to search the Internet for an image of an apple, you would search “apple.” But that could mean the fruit or the technology giant. By presenting the text along with an image of the fruit, however, the search engine would know which other images to scan for.
The group’s research also covers how search engines can label and store those images. A person may label an image of the fruit as an “apple,” but that method relies on somebody labeling every piece of multimedia correctly and completely for all of it to be searchable. With better querying techniques, software can apply labels automatically, with no need for human labeling. This could also be beneficial for surveillance and homeland security.
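One way to picture the apple example is a query that combines a keyword with an example image, so visually similar results outrank results that merely share the word. The scoring scheme, tags and toy feature vectors below are my own illustrative assumptions, not the group’s method.

```python
# Hypothetical multimodal search sketch: rank images by a keyword match plus
# visual similarity to a query image, so "apple" the fruit can outrank
# "apple" the technology company.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(keyword, query_features, index):
    """Rank indexed images by keyword match plus visual similarity."""
    results = []
    for item in index:
        text_score = 1.0 if keyword in item["tags"] else 0.0
        visual_score = cosine(query_features, item["features"])
        results.append((text_score + visual_score, item["name"]))
    return [name for _, name in sorted(results, reverse=True)]

# Toy index; features stand in for image descriptors a real system would use.
index = [
    {"name": "red_fruit.jpg", "tags": {"apple", "fruit"},  "features": (0.9, 0.1)},
    {"name": "iphone.jpg",    "tags": {"apple", "phone"},  "features": (0.1, 0.9)},
    {"name": "orange.jpg",    "tags": {"orange", "fruit"}, "features": (0.8, 0.2)},
]
# Query: the word "apple" plus features resembling the fruit photo.
print(search("apple", (0.95, 0.05), index))
```

Both photos tagged “apple” match the keyword, but only the fruit photo also matches the example image, so it ranks first.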
Human-computer interface covers ways to communicate with computers and “virtual agents,” or avatars, through means other than the traditional mouse and keyboard, Huang said.
“We want to complement that with voice and also maybe gesture to command the computer,” he said.
This includes the computer sensing the user’s mood and reacting accordingly in order to provide a better user experience. As with the gender and age scanning, the computer can pick up on key signatures of various moods. Huang has worked with psychologists to improve the facial recognition software by looking for the six universal emotions: happiness, surprise, anger, disgust, fear and sadness. Huang said that once his teams get these expressions nailed down, they plan to move on to cognitive states such as boredom, distraction and exhaustion.
“When you have a transaction between person and the computer, or virtual agent, how the computer should respond should depend on the state of the human’s emotional cognitive state,” Huang said. “If the person is not happy, the computer should try to do something.”
Analyzing a user’s facial expressions is one thing, while taking that information and recreating it for another user is an entirely different task, Huang said.
The software can also analyze the two-dimensional image of one user and create a three-dimensional model that the user can send to someone else’s computer or mobile device. But this is not like sending a large video file. Instead, Huang’s group is researching ways to send far less information by relying on key points on a person’s face that connect to form a three-dimensional image.
By scanning key points on a user’s face, the computer can analyze for certain gestures and moods, again by comparing to a database of similar facial movements. It reads movements from thousands of key locations on the face while interpolating movements on the points in between.
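The interpolation step described above can be sketched in a few lines: track a sparse set of key points and estimate the position of points in between. Real systems track thousands of points with far richer motion models; the two-point linear version below is only an illustration of the principle.

```python
# Minimal sketch of interpolating facial motion between tracked key points.
def interpolate(p1, p2, t):
    """Linearly interpolate between two 2-D key points, with 0 <= t <= 1."""
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))

# Two tracked key points, e.g. a mouth corner and the lip center (illustrative).
corner, center = (10.0, 20.0), (30.0, 24.0)
midpoint = interpolate(corner, center, 0.5)
print(midpoint)  # (20.0, 22.0)
```

Because in-between points can be reconstructed this way on the receiving end, only the tracked key points need to be transmitted.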
Because the voice audio is sent as well, the computer can transmit less of the facial movement information and instead let the audio drive how the lips move. This is one of the many ways the program can save on data transfer, which could be useful for devices with less computing power, such as cellphones. The computer that tracks and renders the face must still have strong computing capabilities, however.
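A back-of-the-envelope comparison shows why sending key points beats sending raw video. The numbers below are my own illustrative assumptions (the article gives no figures): an uncompressed 24-bit frame at 640x480 versus roughly a thousand (x, y) key points stored as 32-bit floats.

```python
# Illustrative arithmetic only: raw frame size vs. key-point payload.
frame = 640 * 480 * 3      # bytes per uncompressed 24-bit 640x480 frame
keypoints = 1000 * 2 * 4   # ~1000 (x, y) points as 32-bit floats
print(frame, keypoints, frame // keypoints)  # 921600 8000 115
```

Even before compression, the key-point payload is two orders of magnitude smaller per frame, which is the kind of saving that matters on a cellphone connection.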
Because the system is not sending a video of the first user, it can, for example, assign the movement data to someone else’s face. Huang and Le have a demonstration that does this very thing with President Obama’s face.
While that is more of a gimmick, the group is focused on making advances that lead directly to practical uses.
“We want to look into applications which are hopefully good for the society,” Huang said. “So if you look in the key applications — health, education, entertainment and also commercial transactions — in almost all of these, we need human-computer interaction.”