In this episode of the Lex Fridman Podcast, Edward Gibson delves into the intricacies of language and grammar. He explores dependency grammar and its role in understanding the cognitive costs of language production and comprehension. Gibson explains how languages naturally have short dependencies to minimize these cognitive burdens.
The discussion also examines cultural differences between languages, highlighting how certain indigenous societies lack precise terms for quantities and colors – reflecting their unique perceptions of the world. Gibson contrasts this with the complexities of legalese, advocating for simpler legal writing. The episode further explores the capabilities and limitations of large language models in replicating the full nuances of human language and meaning.
1-Page Summary

Edward Gibson presents dependency grammar as a framework that reveals the connections between words within sentences, forming tree structures to represent these relationships. He appreciates this approach for its ability to illustrate cognitive processing costs, as longer dependencies within sentences increase the difficulty of language production and comprehension. Gibson supports his viewpoint with experimental evidence, highlighting that dependency lengths are critical factors in gauging the cognitive load of language usage.
Gibson proposes that languages inherently have short dependencies to lower the cognitive burdens associated with sentence production and comprehension. He notes that while languages could be structured to have even shorter dependencies, a balance is maintained to keep the language learnable and its rules consistent. Short dependencies result in simpler sentences, like those in Hemingway's writing, which are easier to produce and understand. The tendency toward short dependencies is seen as a near-universal feature of languages, one that minimizes cognitive costs.
Observing the differences in how cultures use language, particularly in terms of color and number terminology, Gibson examines the disparities between industrialized and non-industrialized societies. He discusses the Pirahã and Tsimane languages of the Amazon, which lack specific number words, relying instead on terms like 'few' and 'many' to represent quantities. Gibson posits that the absence of precise counting words can limit a society's capabilities and reflects the practical needs that shape language development. The difficulty of translating concepts across languages with these gaps reveals deep cultural variations in how different communities perceive and articulate the world.
Gibson critiques the complexity of legal language, pointing out that legalese is fraught with center embedding, which disrupts comprehension and recall, even among lawyers. He finds that a significant proportion of sentences in legal texts nest definitions within subject-verb constructions, contributing to the difficulty of understanding them. This stands in stark contrast to natural language tendencies, highlighting how legal documents, perhaps unintentionally, prioritize a certain performative complexity. Gibson advocates for simpler legal writing, arguing that it is both feasible and preferable.
The conversation with Lex Fridman reveals Gibson's skepticism that Large Language Models (LLMs) constitute comprehensive theories of language, given their enormous size and lack of conciseness. While acknowledging their syntactic capabilities, he contends that LLMs do not truly grasp meaning. He references the shortcomings of earlier AI research and emphasizes that understanding meaning remains a vital challenge for LLMs. Fridman and Gibson note that the complexity of human language, as evidenced by Noam Chomsky's observations, is not yet fully replicated by LLMs, which primarily capture the form of language rather than its semantic substance.
Dependency grammar

Edward Gibson sheds light on dependency grammar, emphasizing its effectiveness in illustrating the cognitive processing involved in language comprehension and production.
Edward Gibson explains dependency grammar as a system of connections between the words that make up sentences. Fridman clarifies the central concept, stating that in dependency grammar each word is linked to just one other word (its head), forming a tree structure. This structure elucidates the relationships and distances between words.
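To make the tree structure concrete, here is a minimal sketch in Python of one common way a dependency parse can be represented and measured; the sentence, head indices, and function name are illustrative assumptions, not anything presented in the episode.

```python
# Minimal sketch of a dependency parse. Each word stores the index of the
# single word it depends on (its head); -1 marks the root of the tree.
# The sentence and head indices are invented for illustration.
words = ["the", "dog", "chased", "the", "cat"]
heads = [1, 2, -1, 4, 2]  # "the"->"dog", "dog"->"chased" (root), etc.

def dependency_lengths(heads):
    """Distance, in words, between each word and its head (root excluded)."""
    return [abs(i - h) for i, h in enumerate(heads) if h != -1]

print(dependency_lengths(heads))       # [1, 1, 1, 2]
print(sum(dependency_lengths(heads)))  # total dependency length: 5
```

On Gibson's account, the sum of these head-to-dependent distances serves as a rough index of how hard a sentence is to produce and understand.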
Gibson is fond of dependency grammar because it transparently shows the lengths of dependencies between words. These lengths are key indicators of cognitive processing costs: the longer the dependency, the harder it is to produce and understand the sentence. He further explains that trees can represent the cognitive processing costs ...
Why languages have short dependencies

Edward Gibson explains that languages are structured to have short dependencies between words to minimize the difficulty in sentence production and comprehension.
Lex Fridman and Edward Gibson discuss the theory that most languages favor short dependencies to reduce cognitive processing costs. Gibson underscores that languages optimize for shorter dependency lengths compared to a control set constructed from random scramblings of sentence elements. Fridman and Gibson explore the impact of these short dependencies on the simplicity of sentence structure and ease of understanding.
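Below is a sketch of the kind of comparison Gibson describes, reusing the head-index representation from the earlier sketch: the total dependency length of the observed word order is compared against random re-orderings of the same words under the same tree. The baseline construction is a simplified assumption for illustration, not Gibson's exact methodology.

```python
import random

def total_dependency_length(heads, order):
    """Total head-dependent distance when word i occupies position order[i]."""
    return sum(abs(order[i] - order[h]) for i, h in enumerate(heads) if h != -1)

heads = [1, 2, -1, 4, 2]  # the same illustrative parse as before
observed = total_dependency_length(heads, list(range(len(heads))))

# Control set: the same words and tree, linearized in random orders.
random.seed(0)
baseline = []
for _ in range(1000):
    order = list(range(len(heads)))
    random.shuffle(order)
    baseline.append(total_dependency_length(heads, order))

print("observed order:", observed)                                    # 5
print("mean of random scramblings:", sum(baseline) / len(baseline))   # ~8
```

In line with the finding Gibson cites, the attested order comes out shorter than the average random scrambling.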
Gibson observes that while languages could potentially minimize dependency lengths even further, they also maintain regular rules to facilitate learnability. He notes that, although this is somewhat shakier territory, dependency grammar can explain the prevalence of short dependencies: languages aim for harmonic word orders that minimize production difficulty and potential comprehension confusion.
Gibson empha ...
Cultural Differences Between Languages

Edward Gibson’s observations focus on the variation in color and number terminology across cultures, revealing intriguing contrasts between industrialized societies and remote, non-industrialized communities.
Edward Gibson explores the concept of number representation in language and its variability among cultures.
Gibson shares insights into the languages of isolated communities, such as the Tsimane and Pirahã of the Amazon, which lack words for exact counting. The Pirahã language, for example, does not include words for 'one,' 'two,' 'three,' or 'four,' and instead uses quantifiers like few, some, and many. These quantifier words do not represent specific numbers; rather, they are used contextually to indicate approximate quantities.
Although people from these cultures can perform exact matching tasks with a small number of objects by sight, their ability to do so diminishes with larger quantities due to the lack of specific number words. For example, they can match quantities accurately up to about three or four, but can only estimate when dealing with larger numbers such as eight. This indicates that without number words to count with, precision in tasks is compromised.
Gibson suggests that the absence of words for exact counts can limit a society's capabilities. He hypothesizes that the invention of a counting system within a culture may emerge from practical needs, such as farming, where keeping track of a number of animals necessitates a counting system ...
Legalese as an exception

Edward Gibson delves into the intricate nature of legal language found in contracts and laws, often referred to as "legalese," and its impact on comprehension.
Edward Gibson discusses why legalese is notoriously difficult to understand compared to other types of professional texts, including academic texts. Gibson, along with Eric Martinez, evaluated various contracts and found that center embedding, or the placement of nested structures within a sentence, is rampant in legalese and contributes significantly to its complexity. Not only does this practice hinder comprehension, but it also negatively impacts recall.
Lawyers, who regularly deal with legalese, experience poor recall and understanding when reading sentences with center-embedded clauses. Interestingly, when presented with non-center-embedded versions of texts, both legal professionals and laypeople show a preference, suggesting that simplifying the structure could benefit all readers.
Gibson remarks on the high incidence of center embedding in legal texts, where clauses intervene between subjects and verbs, a practice much more common than in other texts. Approximately 70% of sentences in contracts and laws feature a center-embedded clause, which is significantly higher than the 20% rate found in other types of writings. Such a high prevalence of center embedding makes legal language uniquely challenging.
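As a toy illustration of why center embedding is costly on this view, the sketch below measures the word distance between a subject and its verb in an invented contract-style sentence, with and without a definition embedded in the middle. The sentences and the word-distance proxy are assumptions for illustration only, not examples from Gibson's study.

```python
# Invented, contract-style example: in the embedded version a definition
# intervenes between the subject ("party") and its verb ("shall"),
# stretching the subject-verb dependency; the flat version pulls it out.
embedded = ("The party , which for purposes of this agreement means the "
            "undersigned licensee , shall pay the fee .").split()
flat = "The party shall pay the fee .".split()

def subject_verb_distance(tokens, subject="party", verb="shall"):
    """Word distance between subject and verb, a crude proxy for the
    dependency length Gibson ties to processing cost."""
    return tokens.index(verb) - tokens.index(subject)

print(subject_verb_distance(embedded))  # 13: a whole clause intervenes
print(subject_verb_distance(flat))      # 1: subject and verb are adjacent
```

Pulling the definition out, as Gibson recommends, collapses the long subject-verb dependency without losing information.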
One specific issue Gibson criticizes is the insertion of definitions within a sentence, disrupting the syntactic flow between subject and verb. He acknowledges that simplifying legal material to avoid center embedding is quite feasible while still conveying the same information. By extracting definitions from within the sentence, legal texts could become more understandable ...
Large language models

Edward Gibson and Lex Fridman discuss the capabilities and limitations of Large Language Models (LLMs) in replicating human language and understanding.
During the conversation, Fridman and Gibson consider the proficiency of LLMs at handling the form of language and discuss how well these models imitate the structure and syntax of human language.
Gibson notes that, despite their ability to predict what counts as good or bad English, LLMs might not be great theories of language due to their size; he implies a preference for more concise theories. Moreover, both Fridman and Gibson touch on the idea that LLMs might use formalisms like dependency grammar to model the form of language; however, it is unclear to what extent they capture meaning or understand language.
Gibson shares that AI in the field of natural language during the '80s did not impress him, as it seemed more like a set of hacks rather than a real theory. He notes that syntax is a comparatively easier challenge than meaning, which LLMs still struggle to grasp. Gibson emphasizes that while LLMs handle form well, they fail to understand meaning, a sentiment echoed by Fridman as they discuss language models' limitations.
One sign of these limitations is that large language models can be tricked because they do not understand what is happening in a given interaction. Using the Monty Hall problem to illustrate, Gibson explains that LLMs can't remember or integrate specific knowledge, defaulting to the most familiar ...