The Difference Between Information and Data (and Why It Matters)
Data is just the geek word for information, right?
If I were to describe the room in which I’m writing this, I might say that it’s 10 feet by 8 feet with a 12-foot ceiling. You’d gather that it’s a comfortable but not overly large space. To put that information into a database, you would use software to enter each dimension into the appropriate field and save it to your device’s hard drive.
Although this description seems straightforward, the information I just conveyed to you and the corresponding data in a database differ in important respects. Without understanding the distinction, we will always struggle to think accurately about data ownership, privacy, and even cybersecurity.
Identify the Processor
There was a time when data was merely information. As a matter of etymology, “datum,” borrowed into English from Latin in the 1640s, meant “a fact given or granted” (literally, “a thing given”). Well into the 1900s, the plural, “data,” denoted facts collected for future reference. This history partly explains today’s confusion.
With the advent of computers, however, a linguistic shift occurred. Data now refers specifically to the stuff our PCs, smartphones, and other devices create, process, copy, store, and sometimes delete (if only accidentally, and when least advantageous to a project you’re working on). Information and data differ based on who or what processes and stores them:
- Information is perceived and processed by human minds.
- Data is a digitized version of such human-usable information for use by devices.
To become data, text, sound, imagery, and anything else you can name must be encoded as binary patterns of ones and zeroes. Those patterns then exist as physical entities: the microscopic pits on a DVD, say, or the radio waves carrying data from your phone to the nearest cell tower.
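To make the encoding step concrete, here is a minimal Python sketch (the choice of language is ours, not something this essay prescribes). It assumes UTF-8, one common text encoding among several, and turns a description of my room into bytes and then into ones and zeroes, and back again. The function names `to_bits` and `from_bits` are hypothetical, invented purely for this illustration.

```python
# Minimal sketch: human-readable text -> bytes -> a string of ones and zeroes.
# Assumption: UTF-8 is the encoding; to_bits/from_bits are hypothetical names.

def to_bits(text: str) -> str:
    """Encode text as UTF-8 and render each byte as eight binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse to_bits: parse each 8-digit group back into a byte, then decode."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


room = "10 ft x 8 ft, 12 ft ceiling"
encoded = to_bits(room)
print(encoded[:17])  # the characters "1" and "0" -> 00110001 00110000

# Decoding recovers the original text only because we know the encoding used.
assert from_bits(encoded) == room
```

The bit pattern by itself means nothing to a human reader; it becomes information again only when software that knows the encoding turns it back into text.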
Data objects may be “quantum small,” but they’re nonetheless entirely physical: as physical as a brick, a lamp, or an old-school vinyl record.
Of course, when a computer presents data to us in some human-accessible form, as when we listen to a Spotify playlist, our minds process it as information. But the data, whether stored bits or the signal oscillations representing J.S. Bach’s Suite for Solo Cello No. 2 in D Minor, remain physical things. The light waves transmitted over fiber-optic cable and the copies of the binary code on Spotify’s servers are separate from the music we hear.
Thinking Clearly About Data Ownership
I’ve made a big deal about the habit of conflating the words “data” and “information” because it is fundamental to our misunderstandings about data ownership. Powerful interests are leveraging the confusion over these terms, whether knowingly or not, to promulgate the misperception that our personal data will never be truly private or really ours to control.
That’s because taming information is a fool’s errand. Tell me a juicy bit of gossip and swear me to secrecy, and I still might whisper it to a friend over coffee tomorrow. That friend might share it with a spouse, who relates it to a reporter, and suddenly the amusing story is front-page news.
Our human mores—the rules and customs by which we live—attempt to disincentivize me from sharing private information in this manner. If I’m discovered, I may be considered untrustworthy and cut off from future secrets, for instance. But information itself has no inherent properties that prevent it from spreading uncontrollably. Horses and barn doors and all that.
Physical stuff is a much different animal or, well, thing. Physical entities are predictable and controllable. Toss a ball and it will go from here to there, not around the galaxy. Heat water to 100 degrees Celsius at sea level and it will boil. Cool that same water to 0 degrees Celsius and it will freeze solid. In other words, science.
Data objects are physical things, created and controlled by software. Software dictates whether, when, where, how, and for how long these data objects can be accessed, changed, copied, or transmitted. In fact, you can’t perform any of those operations except through software. You might destroy the data on a hard drive by exposing it to a strong magnet, but that only underscores data’s physical nature.
The problem is that most software doesn’t take advantage of the available physical controls to restrict data access, prevent data copying, and so on. Instead, the tech giants wooing us with their visions of the digital future make promises about safeguarding personal information (an inevitably flawed endeavor) while saying nothing about the personal data they vacuum up. Thought leaders, company CEOs, and policymakers lean on rules, processes, laws, certifications, and other fallible elements of the human realm to assure us they’re doing something about privacy and cybersecurity. In essence, they’re asking everyone to stop at various societally agreed-on “red lights” for collecting data, using data, and so on, with predictable results: some obey the rules, some bend them, and some ignore them entirely.
The good news is that by leveraging the physicality of data, we can do more than ask that our preferences about how our personal data is used be followed. Software developers can enforce those preferences, by default and by design. We simply need to demand that they do so.