The Bell Curve: Intelligence and Class Structure in American Life

Richard J. Herrnstein and Charles A. Murray

EXTERNAL EVIDENCE OF BIAS. Tests are used to predict things—most commonly, to predict performance in school or on the job. Chapter 3 discussed this issue in detail. You will recall that the ability of a test to predict is known as its validity. A test with high validity predicts accurately; a test with poor validity makes many mistakes. Now suppose that a test’s validity differs for the members of two groups. To use a concrete example: The SAT is used as a tool in college admissions because it has a certain validity in predicting college performance. If the SAT is biased against blacks, it will underpredict their college performance. If tests were biased in this way, blacks as a group would do better in college than the admissions office expected based just on their SATs. It would be as if the test underestimated the “true” SAT score of the blacks, so the natural remedy for this kind of bias would be to compensate the black applicants by, for example, adding the appropriate number of points onto their scores.

Predictive bias can work in another way, as when the test is simply less reliable—that is, less accurate—for blacks than for whites. Suppose a test used to select police sergeants is more accurate in predicting the performance of white candidates who become sergeants than in predicting the performance of black sergeants. It doesn’t underpredict for blacks, but rather fails to predict at all (or predicts less accurately). In these cases, the natural remedy would be to give less weight to the test scores of blacks than to those of whites.
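A minimal numerical sketch of these two checks, assuming a pooled linear regression and invented data (the variable names, group labels, and numbers below are hypothetical, not drawn from the studies cited in this chapter): underprediction shows up as a positive mean residual for a group, and differential reliability as a larger residual spread.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test scores and later performance for two groups, A and B.
scores_a = rng.normal(100, 15, 500)
scores_b = rng.normal(100, 15, 500)
perf_a = 0.5 * scores_a + rng.normal(0, 10, 500)
perf_b = 0.5 * scores_b + rng.normal(0, 10, 500)

# Fit one regression line to everyone (the pooled prediction rule).
scores = np.concatenate([scores_a, scores_b])
perf = np.concatenate([perf_a, perf_b])
slope, intercept = np.polyfit(scores, perf, 1)

def residuals(s, p):
    return p - (slope * s + intercept)

# Underprediction: a group whose actual performance sits above the pooled
# line has a positive mean residual.
print("mean residual A:", residuals(scores_a, perf_a).mean())
print("mean residual B:", residuals(scores_b, perf_b).mean())

# Differential reliability: a group for whom the test predicts less
# accurately has a larger residual spread.
print("residual SD A:", residuals(scores_a, perf_a).std())
print("residual SD B:", residuals(scores_b, perf_b).std())

In this toy data both groups are generated from the same model, so the check comes back clean; with real data, a biased test would show one group's residuals sitting systematically above the pooled line, or scattering more widely around it.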

The key concept for both types of bias is the same: A test biased against blacks does not predict black performance in the real world in the same way that it predicts white performance in the real world. The evidence of bias is external in the sense that it shows up in differing validities for blacks and whites. External evidence of bias has been sought in hundreds of studies. It has been evaluated relative to performance in elementary school, in secondary school, in the university, in the armed forces, in unskilled and skilled jobs, in the professions. Overwhelmingly, the evidence is that the major standardized tests used to help make school and job decisions[27] do not underpredict black performance, nor does the expert community find any other general or systematic difference in the predictive accuracy of tests for blacks and whites.[28]

INTERNAL EVIDENCE OF BIAS. Predictive validity is the ultimate criterion for bias, because it involves the proof of the pudding for any test. But although predictive validity is in a technical sense the decisive issue, our impression from talking about this issue with colleagues and friends is that other types of potential bias loom larger in their imaginations: the many things that are put under the umbrella label of “cultural bias.”

The most common charges of cultural bias involve the putative cultural loading of items in a test. Here is an SAT analogy item that has become famous as an example of cultural bias:

RUNNER:MARATHON

envoy:embassy

martyr:massacre

oarsman:regatta

referee:tournament

horse:stable

 

The answer is “oarsman:regatta”—fairly easy if you know what both a marathon and a regatta are, a matter of guesswork otherwise. How would a black youngster from the inner city ever have heard of a regatta? Many view such items as proof that the tests must be biased against people from disadvantaged backgrounds. “Clearly,” writes a critic of testing, citing this example, “this item does not measure students’ ‘aptitude’ or logical reasoning ability, but knowledge of upper-middle-class recreational activity.”[29] In the language of psychometrics, this is called internal evidence of bias, as contrasted with the external evidence of differential prediction.

The hypothesis of bias again lends itself to direct examination. In effect, the SAT critic is saying that culturally loaded items are producing at least some of the B/W difference. Get rid of such items, and the gap will narrow. Is he correct? When we look at the results for items that have answers such as “oarsman:regatta” and the results for items that seem to be empty of any cultural information (repeating a sequence of numbers, for example), are there any differences?[30] Are differences in group test scores concentrated among certain items?

The technical literature is again clear. In study after study of the leading tests, the hypothesis that the B/W difference is caused by questions with cultural content has been contradicted by the facts.[31] Items that the average white test taker finds easy relative to other items, the average black test taker does too; the same is true for items that the average white and black find difficult. Inasmuch as whites and blacks have different overall scores on the average, it follows that a smaller proportion of blacks get right answers for either easy or hard items, but the order of difficulty is virtually the same in each racial group. For groups that have special language considerations—Latinos and American Indians, for example—some internal evidence of bias has been found, unless English is their native language.[32]
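A minimal sketch of the kind of internal check described here, assuming hypothetical per-item pass rates for two groups (the data and variable names are invented for illustration): compute the proportion of each group answering each item correctly, then ask whether the items rank in the same order of difficulty for both groups.

import numpy as np
from scipy.stats import spearmanr

# Hypothetical 0/1 response matrices (examinees x items) for two groups.
rng = np.random.default_rng(1)
n_items = 40
difficulty = rng.uniform(0.2, 0.9, n_items)                 # shared item difficulty
responses_a = rng.random((500, n_items)) < difficulty
responses_b = rng.random((500, n_items)) < (difficulty - 0.1)  # lower overall scores

# Proportion passing each item ("p-values" in psychometric jargon).
p_a = responses_a.mean(axis=0)
p_b = responses_b.mean(axis=0)

# If the groups find the same items easy and the same items hard, the rank
# correlation of item difficulties is near 1 even when overall scores differ.
rho, _ = spearmanr(p_a, p_b)
print("rank correlation of item difficulties:", round(rho, 2))

A high rank correlation is what the studies summarized in the next paragraph report; the cultural-bias hypothesis predicts instead that the difficulty ordering would shift, with culturally loaded items standing out as disproportionately hard for one group.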

Studies comparing blacks and whites on various kinds of IQ tests find that the B/W difference is not created by items that ask about regattas or who wrote Hamlet, or any of the other similar examples cited in criticisms of tests. How can this be? The explanation is complicated and goes deep into the reasons why a test item is “good” or “bad” in measuring intelligence. Here, we restrict ourselves to the conclusion: The B/W difference is wider on items that appear to be culturally neutral than on items that appear to be culturally loaded. We italicize this point because it is so well established empirically and yet comes as such a surprise to most people who are new to this topic. We will elaborate on this finding later in the chapter. In any case, there is no longer an important technical debate over the conclusion that the cultural content of test items is not the cause of group differences in scores.

“MOTIVATION TO TRY.” Suppose that the nature of cultural bias does not lie in predictive validity or in the content of the items but in what might be called “test willingness.” A typical black youngster, it is hypothesized, comes to such tests with a mindset different from the white subject’s. He is less attuned to testing situations (from one point of view), or less inclined to put up with such nonsense (from another). Perhaps he just doesn’t give a damn, since he has no hopes of going to college or otherwise benefiting from a good test score. Perhaps he figures that the test is biased against him anyway, so what’s the point. Perhaps he consciously refuses to put out his best effort because of the peer pressures against “acting white” in some inner-city schools.

The studies that have attempted to measure motivation in such situations have generally found that blacks are at least as motivated as whites.[33] But these studies are not wholly convincing, for why shouldn’t the measures of motivation be just as inaccurate as the measures of cognitive ability are alleged to be? Analysis of internal characteristics of the tests once again offers the best leverage in examining this broad hypothesis. Two sets of data seem especially pertinent.

The first involves the digit span subtest, part of the widely used Wechsler intelligence tests. It has two forms: forward digit span, in which the subject tries to repeat a sequence of numbers in the order read to him, and backward digit span, in which the subject tries to repeat the sequence of numbers backward. The test is simple in concept, uses numbers that are familiar to everyone, and calls on no cultural information besides knowing numbers. The digit span is especially informative regarding test motivation not just because of the low cultural loading of the items but because the backward form is twice as g-loaded as the forward form—that is, the backward form is a much better measure of general intelligence. The reason is that reversing the numbers is mentally more demanding than repeating them in the order heard, as readers can determine for themselves by a little self-testing.

The two parts of the subtest have identical content. They occur at the same time during the test. Each subject does both. But in most studies the black-white difference is about twice as great on backward digits as on forward digits.[34] The question arises: How can lack of motivation (or test willingness or any other explanation of that type) explain the difference in performance on the two parts of the same subtest?[35]

A similar question arises from work on reaction time. Several psychometricians, led by Arthur Jensen, have been exploring the underlying nature of g by hypothesizing that neurologic processing speed is implicated, akin to the speed of the microprocessor in a computer. Smarter people process faster than less smart people. The strategy for testing the hypothesis is to give people extremely simple cognitive tasks—so simple that no conscious thought is involved—and to use precise timing methods to determine how fast different people perform these simple tasks. One commonly used apparatus involves a console with a semicircle of eight lights, each with a button next to it. In the middle of the console is the “home” button. At the beginning of each trial, the subject is depressing the home button with his finger. One of the lights in the semicircle goes on. The subject moves his finger to the button closest to the light, which turns it off. There are more complicated versions of the task (three lights go on, and the subject moves to the one that is farthest from the other two, for example), but none requires much thought, and everybody gets almost every trial “right.” The subject’s response speed is broken into two measurements: reaction time (RT), the time it takes the subject to lift his finger from the home button after a target light goes on, and movement time (MT), the time it takes to move the finger from just above the home button to the target button.[36]
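A toy illustration of how a single trial’s response is split into the two measurements just described (this is not Jensen’s actual apparatus software; the timestamps and class are hypothetical and given in seconds):

from dataclasses import dataclass

@dataclass
class Trial:
    light_on: float       # when the target light comes on
    home_release: float   # when the finger leaves the home button
    target_press: float   # when the target button is pressed

    @property
    def reaction_time(self) -> float:
        # RT: time spent deciding, before the hand starts to move
        return self.home_release - self.light_on

    @property
    def movement_time(self) -> float:
        # MT: time spent moving the finger from home button to target
        return self.target_press - self.home_release

trial = Trial(light_on=0.000, home_release=0.310, target_press=0.480)
print(trial.reaction_time, trial.movement_time)  # about 0.31 and 0.17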

Francis Galton in the nineteenth century believed that reaction time is associated with intelligence but could not prove it. He was on the right track after all. In modern studies, reaction time is correlated with the results from full-scale IQ tests; even more specifically, it is correlated with the g factor in IQ tests—in some studies, only with the g factor.[37] Movement time is much less correlated with IQ or with g.[38] This makes sense: Most of the cognitive processing has been completed by the time the finger leaves the home button; the rest is mostly a function of small motor skills.

Research on reaction time is doing much to advance our understanding of the biological basis of g. For our purposes here, however, it also offers a test of the motivation hypothesis: The consistent result of many studies is that white reaction time is faster than black reaction time, but black movement time is faster than white movement time.[39] One can imagine an unmotivated subject who thinks the reaction time test is a waste of time and does not try very hard. But the level of motivation, whatever it may be, seems likely to be the same for the measures of RT and MT. The question arises: How can one be unmotivated to do well during one split-second of a test but apparently motivated during the next split-second? Results of this sort argue against easy explanations that appeal to differences in motivation as explanatory of the B/W difference.

UNIFORM BACKGROUND BIAS. Other kinds of bias discussed in Appendix 5 include the possibility that blacks have less access to coaching than whites, less experience with tests (less “testwiseness”), poorer understanding of standard English, and that their performance is affected by white examiners. Each of these hypotheses has been investigated, for many tests, under many conditions. None has been sustained. In short, the testable hypotheses have led toward the conclusion that cognitive ability tests are not biased against blacks. This leaves one final hypothesis regarding cultural bias that does not lend itself to empirical evaluation, at least not directly.

Suppose our society is so steeped in the conditions that produce test bias that people in disadvantaged groups score below their true cognitive abilities on all the items on tests, thereby hiding the internal evidence of bias. At the same time and for the same reasons, they underperform in school and on the job in relation to their true abilities, thereby hiding the external evidence. In other words, the tests may be biased against disadvantaged groups, but the traces of bias are invisible because the bias permeates all areas of the group’s performance. Accordingly, it would be as useless to look for evidence of test bias as it would be for Einstein’s imaginary person traveling near the speed of light to try to determine whether time has slowed. Einstein’s traveler has no clock that exists independent of his space-time context. In assessing test bias, we would have no test or criterion measure that exists independent of this culture and its history. This form of bias would pervade everything.

To some readers, the hypothesis will seem so plausible that it is self-evidently correct. Before deciding that this must be the explanation for group differences in test scores, however, a few problems must be overcome. First, the comments about the digit span and reaction time results apply here as well. How can this uniform background bias suppress black reaction time but not the movement time? How can it suppress performance on backward digit span more than forward digit span? Second, the hypothesis implies that many of the performance yardsticks in the society at large are not only biased, they are all so similar in the degree to which they distort the truth—in every occupation, every type of educational institution, every achievement measure, every performance measure—that no differential distortion is picked up by the data. Is this plausible?
