I Can Hear You Whisper
Author: Lydia Denworth
The explosion in brain-imaging technology (the “supercool” MEGs and fMRIs and PETs) made this new era of granularity possible by allowing scientists to record the activity of tens of thousands of cells firing at the same time. Until the mid-1990s, most data on the brain came from injuries, deficits, lesions, and so on. Data from deficits, though, would never have given you what Poeppel and Walker produced in just five minutes of watching my brain on sound, or what they could get if we had carried on with speech sounds. “I can stick you in there right now and we can do ‘ah’s,’ ‘oo’s,’ and ‘ee’s’ all day long until we map your space,” says Poeppel. “We can map the timing, we can do the spatial analysis. We can do a 3-D reconstruction. Everyone's a little different, but there's a high degree of consistency. Any auditory stimulus will have this cascade of responses.”
In other words, everything from a beep to a recitation of Macbeth sends waves of electrical pulses rippling through the brain along complex though predictable routes on a schedule tracked in milliseconds. Along the way, the response to a beep gets less complicated because a beep is, well, less complicated than Macbeth's guilty conscience. But any word from “apple” to “zipper” takes less than half a second to visit all the lower and higher processing centers of the brain, and in that fraction of a second neuroscientists have a pretty good idea of the several stops the word makes and the work done by the brain along the way. I now understood how the P1 and N1 got their names.
Each point where the response is distinctive and concentrated has been labeled with an N or a P depending on whether the wave is usually negative-going or positive-going at that point (counterintuitively to me, convention dictates that negativities go up and positivities go down), and then the N or P is given a number to indicate either how many milliseconds were required to get there or how many major peaks have preceded it. Taken together, these responses have helped us understand far more about the steps in the intricate dance the brain performs as it converts sound to language.
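The labeling scheme is mechanical enough to sketch in a few lines of code. The snippet below is purely illustrative: the latencies are the approximate values mentioned in this chapter, and the split between naming-by-peak-order and naming-by-latency follows the convention described above rather than any particular dataset.

```python
# A toy illustration of the ERP naming convention described above: each
# distinctive deflection gets "P" or "N" for its polarity, plus a number that
# is either its ordinal position among the major peaks (P1, N1, P2) or its
# approximate latency in milliseconds (N400, P600).
# The latencies are the rough values cited in the text, not measurements.

peaks = [
    {"polarity": "positive", "latency_ms": 60,  "ordinal": 1},     # P1
    {"polarity": "negative", "latency_ms": 100, "ordinal": 1},     # N1
    {"polarity": "positive", "latency_ms": 200, "ordinal": 2},     # P2, also called P200
    {"polarity": "negative", "latency_ms": 400, "ordinal": None},  # N400, named for latency
    {"polarity": "positive", "latency_ms": 600, "ordinal": None},  # P600, named for latency
]

def erp_label(peak: dict) -> str:
    """Return a conventional component name for one peak."""
    letter = "P" if peak["polarity"] == "positive" else "N"
    # Early components are usually numbered by peak order; later ones by latency.
    number = peak["ordinal"] if peak["ordinal"] is not None else peak["latency_ms"]
    return f"{letter}{number}"

if __name__ == "__main__":
    for p in peaks:
        print(f'{erp_label(p):>4}: {p["polarity"]}-going wave near {p["latency_ms"]} ms')
```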
In a hearing adult, in the first fifteen milliseconds, the signal, or sound, is in the brain stem. In this early and relatively small section of the brain, the sound will be relayed through the cochlear nuclei, the superior olive, the lateral lemniscus, and the inferior colliculus in the midbrain, subdividing and branching each time onto one of several possible parallel pathways, until it reaches the early auditory cortex. “What has been accomplished so far?” asks Poeppel. “All these nuclei have done an enormous amount of sophisticated analysis and computation.” The superior olive, for instance, is the first place where cells are sensitive to information coming from both ears and begin to integrate and compare the two, the better to figure out where the sound originated. “Doing things like localizing . . . is not necessarily accomplished, but the critical agreements are already calculated for you at step two,” says Poeppel. “It's pretty impressive.” It was these early distinctive responses that Jessica O'Gara was looking for in Alex when she gave him an auditory brain stem response test, designed to see if the auditory path to the brain is intact.
If the route is clear, as it is in a typically hearing person, the sound reaches the primary auditory cortex in the temporal lobe somewhere around twenty milliseconds and spends another few milliseconds activating auditory areas such as Heschl's gyrus. The P1 is usually at about sixty milliseconds, and the N1, as Poeppel and Walker just recorded in me, peaks around one hundred milliseconds. It's still mainly an auditory response, as the brain continues to identify and analyze what it just heard, but the brain is also starting to take visual information into account. Already, at one hundred milliseconds, higher-order processing is under way. The more unpredictable the sound (a word you rarely hear, for instance), the bigger the amplitude of the N1, because you're working harder to make sense of it. Conversely, the response is more muted for words we hear all the time or sounds we expect, such as our own speech.
Around two hundred milliseconds, where the P2 or P200 occurs, the brain starts to look things up. It is beginning to compare the arriving sound with what it already knows by digging into stored memories and consulting its mental dictionary. As befits this more complex task, the signals are now firmly in the higher regions of the brain, and the neural activity is far more widespread. This is also the point where information arriving from other systems truly converges. However a word is perceived, whether you hear it, read it, see it, or touch it, you will begin processing it fully at this same point.
The ability to recognize words, to acknowledge the meaning found in the dictionary, is known as lexical processing and happens between two hundred and four hundred milliseconds, which is very late in brain time. As was true earlier, the amplitude of the N400 reflects the amount of work the brain is being asked to do: It is larger for more infrequent or unfamiliar words and for words that vary from many others by only one letter, such as “hat,” “hit,” “hot,” and “hut.” By six hundred milliseconds, the brain is processing entire sentences, and grammar kicks in. The P600 is thought to reflect what neuroscientists call “repair and reanalysis,” because it is elicited by grammatical errors and by “garden path sentences” (the kind that meander and dangle their modifiers: “The broker persuaded to sell the stock was tall”) and, in a non-linguistic example, by musical chords played out of key.
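Laid out as data, the cascade described over the last few paragraphs looks roughly like the sketch below. The stages and times are the approximate figures given in the text; any real recording would vary from person to person.

```python
# Approximate timeline of the sound-to-language cascade described above, for a
# typically hearing adult. The figures are the rough values given in the text,
# not measurements; individual brains vary.

CASCADE_MS = [
    ("0-15",    "Brain stem: cochlear nuclei, superior olive, lateral lemniscus, inferior colliculus"),
    ("~20",     "Primary auditory cortex in the temporal lobe, including Heschl's gyrus"),
    ("~60",     "P1: early cortical auditory response"),
    ("~100",    "N1: continued auditory analysis; visual information begins to be integrated"),
    ("~200",    "P2/P200: comparison with stored memories and the mental dictionary"),
    ("200-400", "N400: lexical processing, recognizing words and their meanings"),
    ("~600",    "P600: sentence-level grammar, 'repair and reanalysis'"),
]

if __name__ == "__main__":
    for window, stage in CASCADE_MS:
        print(f"{window:>8} ms  {stage}")
```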
But all of that is only half the story. The cascade of responses that begins with the ear and leads all the way to the ability to follow a poorly constructed sentence (to know that it's the broker who is tall) is known as bottom-up processing. It starts with the basic input to any sense, raw data, and ends with such higher-level skills as reasoning and judgment and critical thinking; in other words, our expectations and knowledge. Neuroscientists now believe that the process is also happening in reverse, that the cascade flows both ways, with information being prepared, treated, and converted in both directions simultaneously, from the bottom up and from the top down.
This idea amounts to a radical rethinking of the very nature of perception. “Historically, the way we intuitively think about all perception is that we're like a passive recording device with detectors that are specialized for certain things, like a retina for seeing, a cochlea for hearing, and so forth,” says Poeppel. “We're kind of a camera or microphone that gets encoded somehow and then magically makes contact with the stuff in your head.” At the same time, many of the big thinkers who pondered perception, beginning with Helmholtz (him again), knew that couldn't be quite right. If we reached for a glass or listened to a sentence, didn't it help to be able to anticipate what might come next? In the mid-to-late twentieth century, a handful of prominent researchers proposed models of perception that suggested instead that we engaged in ‘active sensing,’ seeking out what was possible as we went along. Most important among these was Alvin Liberman at Yale University, whose influential motor theory of speech perception fell into this category. He proposed that as we listen to speech, the brain essentially imagines producing the words itself. Liberman's elegant idea and other such ideas did not gain much traction until the past decade, when they suddenly became a hot topic of conversation in the study of cognition. What everyone is talking about today is the brain's power of prediction.
That power is not mystical but mathematical. It reflects the data-driven, statistical approach that informs contemporary cognitive science and defines the workings of the brain in two ways: representations and computations. Representations are the equivalent of a series of thumbnail images of the things and ideas we have experienced; everything in our mental hard drive, like the family photographs stored in your computer. How exactly they are stored remains an open question: probably not as pictures, though, because that would be too easy. Computations are what they sound like: the addition, subtraction, multiplication, and division we perform on the representations, as if the brain begins cropping, rotating, and eliminating red-eye. They are how we react to the world and, crucially, how we learn. “The statistical approach makes strong assumptions about what kinds of things a learner can take in, process, ‘chunk’ in the right way, and then use for counting or for deriving higher-order representations,” says Poeppel.
On one level, prediction is just common sense, which may be one reason it didn't get much scientific respect for so long. If you see your doctor in the doctor's office, you recognize her quickly. If you see her in the grocery store dressed in jeans, you'll be slower to realize you know her. Predictable events are easy for the brain; unpredictable events require more effort. “Our expectations for what we're going to perceive seem to be a critical part of the process,” says Greg Hickok, a neuroscientist who studies predictive coding among other things at the University of California, Irvine, and regularly collaborates with Poeppel. “It allows the system to make guesses as to what it might be seeing and to use computational shortcuts. Perception is very much a top-down process, a very active process of constructing a reality. A lot of that comes from prediction.”
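The logic Hickok describes can be caricatured in a few lines: keep a running guess about the next input, compare the guess with what actually arrives, and spend effort only on the mismatch. The signal and the learning rate below are invented for illustration; this is a toy sketch of the predictive-coding idea, not anyone's actual model.

```python
# A toy predictive-coding loop: predict the next input, measure the error,
# nudge the prediction toward the input. Predictable inputs produce small
# errors (little work); surprising inputs produce large errors (more work).
# The signal values and learning rate are made up for illustration.

def predictive_coding(inputs, learning_rate=0.5):
    prediction = 0.0
    for x in inputs:
        error = x - prediction                # "surprise": what the prediction missed
        prediction += learning_rate * error   # top-down model updated by the error
        yield x, prediction, error

if __name__ == "__main__":
    # A steady rhythm (predictable) followed by a sudden change (surprising).
    signal = [1.0, 1.0, 1.0, 1.0, 4.0, 4.0, 4.0]
    for x, pred, err in predictive_coding(signal):
        print(f"input={x:.1f}  prediction={pred:.2f}  error={err:+.2f}")
```

By the third or fourth “beat” the error has nearly vanished; the sudden jump to a new value is where the system has to work again.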
Predictive coding has real implications for Alex, Hickok points out. “Someone with a degraded input system has to rely a lot more on top-down information,” he says. “If you analyze sensory input roughly, you test against more information as it's coming in. Let me look and see if it matches.” Anyone who reads speech is using prediction, guessing at context from the roughly one-third of what is said that can be seen on the mouth and using any other visual cues he can find. Those who use hearing aids and implants still have to fill in gaps as well. No wonder so many deaf and hard-of-hearing children are exhausted at the end of the day.
But top-down processing can be simple, too. If a sound is uncomfortably loud, for instance, it is the cortex that registers that fact and sends a message all the way back to the cochlea to stiffen hair cells as a protective measure. The same is true of the retina, adjusting for the amount of light available. “It's not your eye doing that,” says Poeppel. “It's your brain.” Then he beats rhythmically on the desk with a pencil: tap, tap, tap, tap. “By beat three, you've anticipated the time. By beat four, we can show you neurophysiologically exactly how that prediction is encoded.”
“Helmholtz couldn't do that,” I point out.
“We have pretty good theories about each separate level,” Poeppel agrees, citing the details of each brain response as an excellent example of fine-grained knowledge at work, but “how is it that you go from a very elementary stimulation at the periphery to understanding in your head ‘cat’? We don't know. We're looking for the linking hypothesis.”
I realize it's the same question that Blair Simmons asked at the cochlear implant workshop in 1967: How do we make sense of what we hear? And I have to acknowledge that, as Eric Kandel said, mysteries remain.
• • •
What kind of information does a sound have to carry in order to set this auditory chain in motion? That was another question Simmons asked. With Alex in mind, I wonder what happens if Macbeth is too quiet or garbled or otherwise distorted. I put this basic question to Andrew Oxenham, an auditory scientist at the University of Minnesota, who studies the auditory system in people both with and without hearing loss.
“What cochlear implants have shown us,” he tells me, is just how well we can understand speech in “highly, highly degraded situations. Think about the normal ear with its ultrafine-frequency tuning and basilar membrane that [has] thousands of hair cells, and you think about replacing that with just six or even four electrodes, so we're going from hundreds, possibly thousands, of independent frequency channels down to four or six. You'd say: Wow, that's such a loss of information. How is anyone ever going to perceive anything with that? And yet people can understand speech. I guess that's shown us first of all how adaptable the brain is in terms of interpreting whatever information it can get hold of. And secondly, what a robust signal speech is that you can degrade it to that extent and people can still extract the meaning.”
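Oxenham's four-to-six-channel scenario is often simulated in the lab with a “noise vocoder,” which throws away the fine spectral detail of speech and keeps only the slow amplitude envelope of a handful of frequency bands. Here is a minimal sketch of that kind of simulation, assuming NumPy and SciPy are installed, a sample rate of at least 16 kHz, and a mono recording called speech.wav that you supply; the band edges and smoothing cutoff are illustrative choices, not parameters from any specific study.

```python
# A minimal noise-vocoder sketch: split speech into a few frequency bands, keep
# only each band's slow amplitude envelope, and use those envelopes to modulate
# band-limited noise. With four to six bands the result is crude but often still
# intelligible, illustrating Oxenham's point about the redundancy of speech.

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    # Butterworth band-pass filter applied forward and backward (zero phase).
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def noise_vocode(x, fs, n_bands=4, f_lo=100.0, f_hi=5000.0):
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)   # log-spaced band edges
    rng = np.random.default_rng(0)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass(x, lo, hi, fs)
        envelope = np.abs(hilbert(band))            # slow amplitude contour of this band
        b, a = butter(2, 50 / (fs / 2))             # smooth the envelope below ~50 Hz
        envelope = filtfilt(b, a, envelope)
        carrier = bandpass(rng.standard_normal(len(x)), lo, hi, fs)
        out += np.maximum(envelope, 0) * carrier    # noise shaped by the speech envelope
    return out / np.max(np.abs(out))                # normalize to avoid clipping

if __name__ == "__main__":
    fs, speech = wavfile.read("speech.wav")         # mono 16-bit PCM assumed
    speech = speech.astype(np.float64) / 32768.0
    degraded = noise_vocode(speech, fs, n_bands=4)
    wavfile.write("speech_4_channels.wav", fs, (degraded * 32767).astype(np.int16))
```

Listening to the four-channel output next to the original is a vivid way to hear how much of the message survives the loss of spectral detail.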
This makes sense, Oxenham adds, from an evolutionary point of view. “You want something that can survive even in very challenging acoustic environments. You want to be able to get your message across.” Engineers often think about redundancy, he points out, and like to build things with a belt-and-suspenders approach, the better to ensure success. “What we have learned is that speech has an incredible amount of redundancy,” says Oxenham. “You can distort it, you can damage it, you can take out parts of it, and yet a lot of the message survives.”
• • •
Just how true that is has been shown in a series of experiments designed to test exactly how much distortion and damage speech can sustain and still be intelligible. Back in David Poeppel's office at NYU, he walks me through a fun house of auditory perception to show me what we've learned about the minimal amount of auditory information necessary for comprehension.
He begins with sine waves, man-made narrow-band acoustic signals that lack the richness and texture of speech (beeps, really). On the whiteboard in his office, Poeppel sketches out a spectrogram, a graph depicting the range of frequencies contained in sound over time, with squiggly lines representing bands of energy beginning at 100, 900, 1,100, and 2,200 Hz and then shifting.
“This is me saying ‘ibex,’” he says.
“If you say so,” I answer. I wouldn't have known the details, but by now I know those bands of energy are the defining formants, the same features that Graeme Clark and his team used to design their first speech processing program.
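Poeppel's whiteboard sketch can be imitated in a few lines: replace each band of energy with a single pure tone that follows its frequency track. The starting frequencies below are the ones he draws (100, 900, 1,100, and 2,200 Hz); the glides and the duration are invented for illustration, since real formant tracks would come from analyzing an actual utterance.

```python
# Sine-wave "speech" in miniature: one pure tone per band of energy, each
# following a frequency track like the squiggly lines on the whiteboard.
# The start frequencies are the ones mentioned in the text; the glides and
# duration are made up for illustration.

import numpy as np
from scipy.io import wavfile

FS = 16000            # sample rate in Hz
DURATION = 0.6        # seconds
t = np.linspace(0, DURATION, int(FS * DURATION), endpoint=False)

# (start_hz, end_hz) for each band: a crude linear glide stands in for the
# real movement of the bands during a word.
tracks = [(100, 120), (900, 600), (1100, 1700), (2200, 2400)]

signal = np.zeros_like(t)
for f_start, f_end in tracks:
    freq = np.linspace(f_start, f_end, len(t))    # instantaneous frequency
    phase = 2 * np.pi * np.cumsum(freq) / FS      # integrate frequency to get phase
    signal += 0.25 * np.sin(phase)                # one sine per band of energy

signal /= np.max(np.abs(signal))
wavfile.write("sine_wave_sketch.wav", FS, (signal * 32767).astype(np.int16))
```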