Too Much Data?

We have unprecedented access to data today, yet we are polarized over many very important issues, and data is at the heart of each side’s argument. How can this be?

Here is just a small sample of big issues that have become increasingly rancorous:

  • Food (Genetically modified seeds, local vs. global production, the carbon cost in food distribution.)
  • Health care (How do we provide needed health care in a fair, cost-effective way?)
  • Climate change (What is the truth about the reality of the problem and what should we do about it?)
  • The economy (Were the “bail outs” helpful or harmful, are we facing inflation or deflation ahead, is national debt necessary or deadly, is the recovery “V” shaped, “U” shaped, “W” shaped, or something else?)

Availability of Data

A recent article in The Economist (“The Data Deluge,” February 25, 2010) offered this attempt to quantify how much data is available to us:

“Everywhere you look, the quantity of information in the world is soaring. According to one estimate, mankind created 150 exabytes (billion gigabytes) of data in 2005. This year, it will create 1,200 exabytes [roughly the same as the number of grains of sand on the earth]. Merely keeping up with this flood, and storing the bits that might be useful, is difficult enough. Analysing it, to spot patterns and extract useful information, is harder still. Even so, the data deluge is already starting to transform business, government, science and everyday life. It has great potential for good — as long as consumers, companies and governments make the right choices about when to restrict the flow of data, and when to encourage it.”

The article went on to describe the challenges and opportunities presented to us by this amount of data:

  • Finding important data in this incredible “haystack.”
  • Protecting our own private information that becomes a part of this available data.
  • Gaining competitive advantage for business by effectively using the insight from this data.

But I want to consider a different issue. Why, with all of this data, can’t we find common ground on the crucial issues?

The Challenge

I admit I don’t know the answer to this question, but offer four suggestions. I invite readers to offer others.

We Can’t Find the Right Data

Many of us have witnessed, at least in a television version of a legal trial, a judge ordering a firm to provide certain data to one of the lawyers, and the firm responding by shipping a truckload of documents. The firm buries the other side in a fog of data. This is certainly possible. And with exabytes of data, we might expect it would be tough to find what we are looking for.

But with today’s technology, the ability to search for information that exists in digital form has made this much less of a problem, even with the growing volume of data.


A sister problem to being lost in large volumes of information arises from one of the coping skills people have acquired: look only at parts of that sea of information, specifically the parts that you believe you will agree with. It is common these days to find people who go only to the sources that fit their biases. I just read a posting referring to CNN as the “Communist News Network,” and have seen similarly biased references by liberals to Fox News Channel. As a result, we get part of the picture and are ignorant of another view that we might want to consider.

Andrew Shapiro wrote The Control Revolution more than 10 years ago. He argued that as information grew, individuals might limit what they looked at, becoming more and more ignorant.

In the newspaper era, a person interested in following the debate on health care would scan the paper, and in the process might read a totally unrelated article about an important local issue related to governance in their own community. In this way, they would become broadly aware of other issues. When they use a search engine to read about only health care, the same breadth drops away. When they read only sources they know in advance share their own point of view, they lose the broad perspective even on health care. And so these rancorous discussions take place between people who have no common data between them.

Shooting the Messenger

Related to bias, some people will not listen to a message from someone they don’t agree with, regardless of what is said. I have heard ideas from Sarah Palin dismissed out of hand because of the source. Recently, someone told me they rejected the ideas about global warming because Al Gore talked about them, and Gore is a hypocrite. This may be the easy way out, but we can learn something even from those we don’t agree with. I sense the same kind of thing is happening in the political discussions, where people will just tune out when the other side is speaking.

I recently had coffee with two former students on different days. Each told me that what they valued from my class, more than any content, was the challenge to “read from both sides.” They have been pleasantly surprised to find that the only way to create a third way to look at an issue is to look carefully at both sides that are being presented. I agree with Greg Page, CEO of Cargill and the subject of the current Ethix Conversation, when he says that we can even learn from the shrill voices that force us to think more carefully about our own suppositions.

I recently heard the president of the Republican Party in Washington state respond to an accusation that the party was sitting on the sidelines and not offering any alternative solutions (in this case to the problem of dealing with the state’s multibillion-dollar debt). He said they were actively engaged, pounding the table and standing in the path of the oncoming train. We need to understand this is different from creating a solution. A similar charge has been made that the Democrats are forcing a solution to health care without considering the issues raised by the Republicans.


Unfortunately, we can’t solve the problems just by adopting the improved approaches to dialogue outlined above. For most of the really tough issues mentioned at the beginning of this discussion, the answers are not directly contained in the exabytes of data out there. Take the economic issues, for example. We don’t have the luxury of trying a “bail out” and, if that doesn’t work, doing a rewind and eliminating the bailout. Nor do we have the luxury of ignoring global warming and later saying, “Whoops, let’s back up to 2010 and try a different strategy.”

So in all of these cases we use computer-based models to predict as well as we can. We also use history to find a situation like the one we are dealing with and try the approach that seems closest to the current situation. History is also a form of a model. Computer models are never perfect. History doesn’t contain a world as interconnected or fast moving as the one we live in today.

If we could talk with each other, we could then move the discussion to developing the parameters of the model, or discussing the similarities and differences between historical scenarios. It is here that data is available in the exabyte haystack.

But there is a problem with this as well. The models themselves are very complex, and generally it takes a well-trained expert to understand the questions that are used to probe the assumptions of the model. This means people without the training would need to trust those with the training, and this is very difficult.

I will illustrate this with an example from the food issue that Michael Pollan explores in The Omnivore’s Dilemma. Talking about the concern for animal suffering in industrialized food, he asks, “So is it possible to slaughter animals on an industrial scale without causing them to suffer? In the end each of us has to decide for himself … . For my part, I can’t be sure, because I haven’t been able to see for myself,” [p. 330]. Yet later in the book he describes what he considers a better practice, when he goes hunting wild animals for his meat. Rather than an experienced expert in the slaughterhouse shooting the animal from a distance of 7 feet, he takes a single lesson and shoots a running pig from a much greater distance. “The pig thrashed briefly, attempting to lift her head…” [p. 352].

That he doesn’t even notice the irony in the two passages is not surprising. Something we do ourselves seems somehow better than what is unknown and out of our control. This is why so many people feel safer driving a car than riding in an airplane, in spite of the data that demonstrates flying is vastly safer.

Fundamental complexity combined with a lack of personal involvement in the solution creates a level of distrust that is difficult to overcome. Perhaps this, too, is part of the problem in communicating about the major issues that face us. Just as the data about flying won’t convince someone who fears it, information about these issues in our complex world won’t be convincing either.


Simply having more data, even being able to find our way through it, is not enough to answer the complex questions of our day. It reminds me of the line from Samuel Taylor Coleridge’s poem “The Rime of the Ancient Mariner,” written more than 200 years ago: “Water, water, everywhere, nor any drop to drink.”

Rather, there needs to be a record of trust and a broad awareness of key issues from multiple sources. Our technology seems to be working against us here.


Al Erisman is executive editor of Ethix, which he co-founded in 1998.
He spent 32 years at The Boeing Company, the last 11 as director of technology.
He was selected as a senior technical fellow of The Boeing Company in 1990,
and received his Ph.D. in applied mathematics from Iowa State University.
