TechWatch: Can Technology Capture a Meeting?

Did you ever walk away from a meeting wondering what was agreed to? Or how the group came to a particular conclusion? Or maybe you missed the meeting completely, and wondered how a basically intelligent group of people could reach such a terrible decision!

With the development and integration of many new technologies, these questions could be answered. Technologies are being integrated to allow the "capturing" of meetings: body language, facial expressions, what was said, how it was said, and who said it. Further, the technology will allow this content to be searched, so it would not be necessary to listen to and watch forty-five minutes of material to find the information of interest.

The technologists I talked with who are developing this capability were enthusiastic about the possibilities. Maybe we all should be. But I am reminded of a quote from Neil Postman, the sociologist/historian who has written so provocatively on issues of technology in society. He said, "If you add technology to a system, you don't get the old system plus technology, you get a new system." Applied to this scenario, the meeting you missed, which was recorded, is likely a different meeting from the one that would have taken place without the recording capability. We need to look at why this is, but first, let's look at the technological possibilities.

Many technologies come together to create the meeting capture capability. Effective recording of both audio and video is a necessary step. Very high-performance microphones (many of them, to pick up conversation in different corners of the room) are now available, avoiding the unnatural requirement that every person in the room speak into a particular microphone. Location devices, or even individual microphones for the participants, will help tie a voice to a particular person. So will voice recognition systems.

Since speech differs from written text, tools are also being developed to take out the "uh's," attach each speaker's identity to his or her words, and produce readable written material. The technology even enables accurate summarization of the material. More about this work can be seen at

Multiple cameras capturing video of the meeting are another ingredient. It is important to capture gestures, body language and facial expressions. These cameras must be placed in strategic locations around the room and managed to capture both the speakers and the reactions of others. The goal is to do all of this automatically, without the aid of an operator.

Capturing emotional responses in the video that correspond to emotion in the voice may make it possible to annotate the text transcript with such notes as "John Doe said angrily." All of this material can be time- and date-stamped to create an accurate record for someone to look at later.

Finally, there is the requirement to take all of this “data” and make it “searchable.” How did Bill react to the discussion on layoffs? Who raised the issue on product quality? Did Sally really know about the “off shore” accounts at this particular date?

One piece of technology now widely available is speech-to-text software. Converting speech to written text is getting better every day, and there are now tools (e.g., ViaVoice and Dragon) that are actually quite useful. Perfect? Of course not. But neither are humans perfect in their understanding of what another person has said.

One of the important keys to this technology is understanding context, and this requires much more than word-by-word transcription. The classic problem of distinguishing between "recognize speech" and "wreck a nice beach" cannot be solved without managing context, and even then it is difficult. We discussed this problem in our conversation with Dan Ling of Microsoft in Ethix 14. When the tape of that conversation was transcribed, the human transcriber stumbled over the same distinction.
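One common way recognizers bring context to bear is to rescore competing transcriptions with a statistical language model. The Python sketch below is a toy illustration of that idea, not how ViaVoice, Dragon, or any other particular system works: it trains an add-one-smoothed bigram model on a tiny hypothetical corpus (`CORPUS`) and picks whichever of two acoustically similar hypotheses best fits the preceding words. The corpus and function names are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real recognizer's language model is trained
# on vastly more text.
CORPUS = [
    "we want software to recognize speech in meetings",
    "the system can recognize speech from several speakers",
    "it is hard to recognize speech in a noisy room",
    "the storm could wreck a boat on the rocks",
    "we walked along a nice beach",
]

def train_bigram_model(sentences):
    """Count unigrams and bigrams, with <s> marking each sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(words, unigrams, bigrams):
    """Add-one (Laplace) smoothed bigram log-probability of a word sequence."""
    vocab_size = len(unigrams)
    total = 0.0
    for prev, word in zip(words, words[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

def pick_transcription(context, hypotheses, unigrams, bigrams):
    """Choose the hypothesis that scores best after the preceding context words."""
    prefix = ["<s>"] + context.split()
    return max(hypotheses, key=lambda h: log_prob(prefix + h.split(), unigrams, bigrams))

unigrams, bigrams = train_bigram_model(CORPUS)
# Two acoustically similar hypotheses for the same stretch of audio.
hypotheses = ["recognize speech", "wreck a nice beach"]
print(pick_transcription("it is hard to", hypotheses, unigrams, bigrams))  # → recognize speech
```

A bigram model like this one also tends to favor shorter hypotheses (fewer probability factors, each less than one), so real recognizers typically balance language-model scores against acoustic evidence rather than relying on either alone.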

Natural language processing researchers (linguists with deep computational knowledge) have been adding semantic context to computational linguistics for years, and the results are much improved today. Having a text stream from the meeting enables someone to search what was said, looking for "off shore accounts," for example. Such searches work much like an Internet search engine.
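A text search over a meeting transcript can be sketched with an inverted index, the basic data structure behind Internet search engines. The Python sketch below assumes a hypothetical record format (`Segment`, holding a time stamp, an identified speaker, and cleaned-up text) and sample lines invented for illustration; it answers a query by intersecting the sets of segments that contain each query word.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Segment:
    """One utterance from a captured meeting (hypothetical record format)."""
    start: str    # time stamp of the utterance, e.g. "00:12:05"
    speaker: str  # speaker as identified by microphones or voice recognition
    text: str     # cleaned-up transcript text

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(segments):
    """Inverted index: each word maps to the set of segment numbers containing it."""
    index = defaultdict(set)
    for i, seg in enumerate(segments):
        for word in tokenize(seg.text):
            index[word].add(i)
    return index

def search(index, segments, query):
    """Return the segments containing every word of the query (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [segments[i] for i in sorted(hits)]

# Invented sample transcript for illustration only.
meeting = [
    Segment("00:03:10", "Bill", "Let's start with the layoff numbers."),
    Segment("00:12:05", "Sally", "Who set up the off shore accounts?"),
    Segment("00:12:40", "John", "The off shore accounts were opened last spring."),
    Segment("00:20:15", "Sally", "Product quality needs its own review."),
]
index = build_index(meeting)
for seg in search(index, meeting, "off shore accounts"):
    print(f"[{seg.start}] {seg.speaker}: {seg.text}")
```

Running it prints the two segments that mention the off shore accounts, along with who spoke and when. A deployed system would add ranking, stemming, and phrase matching; this sketch only does exact-word Boolean matching.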

Work is also continuing on video search. The original motivation for this research was to enable TV news stations to search large archives of video for particular scenes or significant discussions. Again, the products are still fairly crude here, though strongly aided by the ability to search text streams by word or subject.

DARPA (the Defense Advanced Research Projects Agency) and NIST (the National Institute of Standards and Technology) are both funding research and standards work on the individual components, and on their integration into a deployable "meeting capture" system.

An obvious question concerns the accuracy of such systems. Could they greatly distort the results of the meeting through poor transcription, inaccurate summaries, or misidentification of speakers? Probably, but compared with what? How well do we really remember what was said, and by whom? How well do humans summarize a meeting? For one component of this problem, summarizing large volumes of text for human readers, tests showed slightly better than 90% accuracy, which may sound unacceptable. In the same limited test, however, humans achieved only about 80% accuracy. So improving accuracy is important, but a system does not need to reach 100% before it is useful.

Suppose all of this technology could be put together with acceptable levels of accuracy.

Is this the answer to our problem of capturing a meeting? Perhaps for certain types of meetings it is, but certainly not in general. One senior executive I talked with about this technological possibility said, "That's the last thing I would want." Was he afraid of trying something new, or did he have something more profound in mind? I think it is the latter, and here's why.

Imagine a meeting in the fall of 2001 between Enron and Andersen officials regarding off shore accounts. Would any official speak his or her mind knowing the meeting was being captured? Perhaps some would, as the Watergate tapes of Richard Nixon seemed to demonstrate. But most would change their behavior, which brings us back to Postman's comment at the beginning of this discussion: with the capture capability in place, this would be a different meeting than it would be without it.

The problem goes deeper than simply covering up possibly illegal activity. Consider an executive team engaged in a strategy session. Effective strategy must include scenario planning, and the best scenarios are ones that are "way out there." What would happen if we closed a major line of business? What would happen if we created all of our software in Russia? What are the potential safety problems with this design? What if we closed two major plants by the end of next year? Any of these scenarios, once captured, could be used out of context to create a significantly different meaning for those reviewing it later. One of the pitfalls of digital capture is the ease with which digital data can be edited, so not only the content but also its context must be preserved with integrity.
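One standard way to make edits to a digital record detectable is a hash chain, in which each entry's SHA-256 digest covers both its own content and the previous entry's digest. The Python sketch below is an illustration under assumptions (hypothetical record fields and invented sample lines), not a feature of any system mentioned here. It assumes the final hash is kept somewhere the editor cannot alter, for example signed or held by a third party; without that, anyone could recompute the whole chain after an edit, and records removed from the end would go unnoticed.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash, record):
    """Digest over the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def chain_records(records):
    """Link each record to its predecessor by hashing them together."""
    chained, prev_hash = [], GENESIS
    for rec in records:
        digest = entry_hash(prev_hash, rec)
        chained.append({"record": rec, "hash": digest})
        prev_hash = digest
    return chained

def verify_chain(chained):
    """Recompute every hash; an edited or removed interior entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chained:
        expected = entry_hash(prev_hash, entry["record"])
        if entry["hash"] != expected:
            return False
        prev_hash = expected
    return True

# Invented sample lines from a strategy session.
log = chain_records([
    {"time": "00:41:02", "speaker": "CFO",
     "text": "Scenario only: what if we closed two plants next year?"},
    {"time": "00:41:30", "speaker": "CEO",
     "text": "We are not proposing that; it is a stress test."},
])
print(verify_chain(log))  # → True

# Stripping the context from the first line is detected.
log[0]["record"]["text"] = "What if we closed two plants next year?"
print(verify_chain(log))  # → False
```

A hash chain only shows that a record has changed; it cannot restore the original wording, so the protected copy of the record still has to be kept elsewhere.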

Do all of these concerns suggest that meeting capture is not a good area of research, and that even if successful it may not be deployable? That would be too hasty a conclusion. Research motivated by one problem often becomes useful for another. The history of the VCR (invented in the early 1950s, but widely and successfully deployed only around 1980) shows this kind of shift. The VCR's original target market was the small number of large television stations that could use the technology in news production. No one envisioned the home movie market.

So in meeting capture technology we see once again some big lessons about technology that we need to keep learning.

The introduction of technology changes the system in which it is deployed.

There are often unintended consequences in creating new systems, and these consequences can make or break the market.

Thinking about the full scope of uses for a new technology may help narrow the focus of what is being considered, and offer insights into potential successes.

I think meeting capture technology will offer significant value, but it may not be for capturing all, or even most, meetings. It also may offer promise in areas no one now envisions.


Al Erisman is executive editor of Ethix, which he co-founded in 1998.
He spent 32 years at The Boeing Company, the last 11 as director of technology.
He was selected as a senior technical fellow of The Boeing Company in 1990,
and received his Ph.D. in applied mathematics from Iowa State University.