This article is an excerpt from the Shortform book guide to "Everybody Lies" by Seth Stephens-Davidowitz.
Why are words important? How does data use words?
Words have always been used as data, but big data allows researchers to study more words, especially on search engines. Without words, we wouldn’t be able to use search engines to discover the truth about people.
Learn why words are important, according to Seth Stephens-Davidowitz in his book Everybody Lies.
Words, Words, Words
There’s nothing new about using words as data. As Stephens-Davidowitz points out, linguists, social scientists, historians, and others have studied words and word usage for a long time. But big data dramatically enhances the type and volume of words researchers can study.
Stephens-Davidowitz further explains why words are important in relation to data findings. He bases most of his insights and arguments on search terms used in Google and other search engines. Search engines, and the databases of information they compile as people use them, are a relatively recent invention and represent a new source (or variety) of data for researchers and analysts.
Stephens-Davidowitz points out that computers also make it easy to analyze large volumes of text and speech that would be difficult or impossible to handle manually. For example, he cites a study of word frequency in Facebook status updates and shows how word usage breaks down along gender and age lines. Perhaps unsurprisingly, college-age people post about “studying” during the “semester,” whereas 20-somethings drink “beer” when they aren’t “at_work.” Similarly, he explains that researchers can use sentiment analysis to determine the overall emotional tone of a body of text.
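To make the two techniques concrete, here is a minimal Python sketch of word-frequency counting by group and a very simple lexicon-based sentiment score. It is an illustration only, not the method used in the studies Stephens-Davidowitz cites: the posts, the group labels, the word lists, and the function names are all invented for this example, and real sentiment analysis relies on much larger lexicons or trained models.

```python
import re
from collections import Counter

# Hypothetical toy posts, invented for illustration (not data from the book).
POSTS = [
    ("college", "studying all night this semester"),
    ("college", "semester finals mean more studying"),
    ("twenties", "beer with friends after a long day at work"),
    ("twenties", "great beer tonight"),
]

# Tiny hand-made sentiment lexicon. This is an assumption for the sketch;
# real tools use far larger word lists or statistical models.
POSITIVE = {"great", "love", "happy"}
NEGATIVE = {"awful", "hate", "sad"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts_by_group(posts):
    """Count how often each word appears within each group of posts."""
    counts = {}
    for group, text in posts:
        counts.setdefault(group, Counter()).update(tokenize(text))
    return counts


def sentiment_score(text):
    """Return (positive words - negative words) / total words, or 0.0 if empty."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    return (positive - negative) / len(tokens)


if __name__ == "__main__":
    for group, counter in word_counts_by_group(POSTS).items():
        print(group, counter.most_common(3))
    print(sentiment_score("great beer tonight"))
```

Comparing the most common words in each group shows how usage differs between them, while the sentiment score gives a rough sense of whether a text leans positive (above zero) or negative (below zero).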
(Shortform note: While Stephens-Davidowitz is excited about how academic researchers can use text analysis—for example, he cites studies that use sentiment analysis to map narrative trajectories in works of fiction—most of the practical application of these techniques seems to take place in the business world. For example, businesses use text analysis and sentiment analysis to gauge customer interest and reactions, detect problems early, and improve customer service.)
A Brief History of Web Search
Just how new is search data? The first search engine, Archie, was launched in 1990 as a relatively simple tool for searching public file servers (before Archie, the internet was indexed by hand). Throughout the 1990s, several new search engines emerged that allowed users to search the whole web like we’re used to today. In 1998, Google introduced better search methods and other improvements and quickly became the predominant search engine worldwide—today, over 90% of web searches are run through Google.
It’s unclear whether earlier search engines compiled search data the way Google does, and early on, even Google didn’t publish its search data regularly. But in any case, it’s hard to imagine Stephens-Davidowitz’s research before Google. Even if earlier search engines had collected search data, that data would have been spread across a handful of companies instead of centralized in one place. Likewise, none of those prior search engines had anywhere near the user base Google has today, and it’s the sheer amount and currency (volume and velocity) of information that makes Google’s data useful for the kinds of studies in Everybody Lies.