Author: David Canter
You can evaluate a test’s validity in two broad ways. One is the simple process of seeing what it does, called face validity.
If the test asks questions that can be right or wrong, it’s measuring intelligence or some aspect of general knowledge. If it asks about your feelings towards religion, it’s measuring attitudes towards religion. If it asks about your drinking habits, it’s probably picking up something relevant to alcoholism.
But face validity can be misleading. Measuring instruments that look highly relevant to criminality can turn out to be quite invalid. For example, many people assume that a lack of sophistication in moral reasoning is the hallmark of a criminal, but until it’s tested this belief is merely a hypothesis. Many tests show that criminals can have their own moral perspective, which you may not share, but which isn’t necessarily less sophisticated than yours.
A second way to evaluate a test’s validity is therefore known as construct validity. What ideas or ‘constructs’ is the test claiming to measure? You can examine this by comparing its results with results from associated procedures that measure similar constructs. For example, intelligence tests are supposed to give some indication of how well a person does at school or college, and so their results can be compared with examination marks. A perfect relationship isn’t expected, because many things besides intelligence influence how well you do at school, but at least some reasonable relationship indicates whether the test does what it says on the tin. An IQ test wouldn’t be of academic interest if the scores people obtained on it didn’t relate reasonably closely to their educational achievements.
To take a more extreme example, if serial criminals didn’t on average have higher psychopathy scores than people who lead blameless lives, you wouldn’t take the measure of psychopathy (that I describe in Chapter 10) very seriously.
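The logic of checking construct validity against an associated outcome can be sketched numerically. The following is a minimal illustration with invented figures, not data from the book: a hand-rolled Pearson correlation between hypothetical test scores and hypothetical examination marks. A strong but imperfect positive correlation is what you’d hope to see.

```python
# Hypothetical illustration of construct validity: if an IQ-style test is
# valid, its scores should correlate reasonably (though not perfectly) with
# an associated outcome such as examination marks. All figures are invented.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

iq_scores  = [92, 105, 98, 120, 110, 85, 130, 101]   # invented test scores
exam_marks = [55, 62, 58, 78, 70, 50, 82, 60]        # invented exam marks

r = pearson_r(iq_scores, exam_marks)
print(round(r, 2))  # close to, but not exactly, 1.0
```

A correlation near zero would suggest the test isn’t measuring the construct it claims to; a perfect 1.0 would be suspicious, because examination results depend on more than intelligence.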
Measuring validity by comparison with other assessments is a bit of a chicken-and-egg problem. In the early stages of a test’s development, its relationship to other measures raises questions about what additional value it offers. Only over time, as the test becomes more widely used, does a history of associations build up to show its utility in a variety of situations.
The tests listed in the earlier Table 9-1 (with the exception of the peculiar Szondi test) have been used over many years in many different situations. Consequently, plenty of examples exist of how useful they’ve been as well as illustrations of what they assess beyond the face validity of the test items themselves.
Standing up over time: Test robustness
Although you don’t find test robustness listed in textbooks on psychological tests, I think that it’s the attribute that leads to tests being used instead of being left on the shelf. By test robustness, I mean how easy a test is to use and how difficult it is to misuse. Can it really stand up to being used in many different situations by many hundreds of different sorts of people without the results being compromised?
Although thousands of psychological tests have been developed over the last century or more, relatively few are in very wide use. These tests have demonstrated reliability, validity and robustness and are the ones that people have found most useful.
Achieving precision: The need for norms
Achieving precision in something as subjective and fluid as a person’s psychology is clearly problematic. With, for example, temperature, you can define fixed points for the benchmarks of measurement, such as when water freezes or boils. Variations have obvious meanings and have well understood implications. But how do you weigh how intelligent, extrovert or psychotic a person is? Faced with these questions, psychologists came up with a deceptively simple answer – compare the person’s results on the test with others in the relevant population.
The distribution of scores achieved on a test by the population of people who’ve taken it is called the norms for that test.
This process of comparing an individual’s scores with norms is what makes these measuring instruments different from the sorts of informal questionnaires found in magazines, where journalists create arbitrary score values and give interpretations. The use of norms also distinguishes these measuring instruments from public opinion polls in which the interest is solely in the proportion of a given population who agree with a specified opinion.
The determination of the norms for a test, and the establishment of how scores vary from the average for a particular population, is known as the standardisation of a test. I describe this aspect in more detail in the earlier section ‘Standardising psychological tests’, where I illustrate how IQ norms were used in the defence of Daryl Atkins. IQ measures are a good example of standardised psychological tests because they’re so highly developed and widely used. Indeed, many of the principles of their use, especially the calibration of scores by comparison with norms, are applied to many other forms of psychological measurement.
To understand the applicability and utility of any psychological measurement, therefore, you need to know what norms are being used to calibrate it. Unlike IQ measures, some tests aren’t calibrated against the average for a relevant population but by comparison with one or more subgroups. This comparison may be done, for example, by establishing the scores that people diagnosed with particular mental illnesses get, or people who’ve done well in particular jobs. Those comparison scores provide benchmarks for assessing other people.
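Calibration against norms boils down to a simple calculation: how far a raw score sits from the population average, measured in standard deviations. Here’s a minimal sketch using the conventional IQ norm figures (mean 100, standard deviation 15); the individual score is invented.

```python
import math

# Conventional IQ norm figures; other tests have their own norms.
NORM_MEAN = 100.0   # population average for the test
NORM_SD = 15.0      # population standard deviation

def z_score(score, mean=NORM_MEAN, sd=NORM_SD):
    """How many standard deviations a score lies from the population average."""
    return (score - mean) / sd

def percentile(z):
    """Approximate proportion of the norm population scoring below z,
    assuming scores are normally distributed (standard normal CDF)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

person = 122.5                  # invented raw score
z = z_score(person)
print(z)                        # 1.5: one and a half SDs above average
print(round(percentile(z), 2))  # roughly 0.93: above about 93% of the population
```

The same arithmetic underlies thresholds such as the IQ cut-off discussed in the Daryl Atkins case: a raw score only means something once you know which population’s norms it’s being compared against.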
The appropriateness of a given test’s norms, and how well their validity is established, is a crucial aspect of the test’s value. In particular, norms may not be appropriate in places different from where the test was originally developed. For instance, an indicator of psychopathy developed in the USA may have little value in countries with very different cultures, such as India, Nigeria, or Russia. Until the test is translated and standardised in those different contexts, its use may be counterproductive.
Creating and Giving Psychological Tests
Not just anyone can invent a psychological test or administer it. Creating such measuring instruments isn’t the same as a journalist thinking up questions for a magazine to indicate ‘how good you are in bed’! Nor are psychological tests like opinion poll surveys in which you’re asked a single question such as, ‘Would you vote for the president if he stood again?’ from which percentages across representative samples are used to test the public mood.
Anyone giving a psychological test has to know something about how it was developed and how the results can be interpreted. The test has to be given under special conditions that relate to its intended use and the background to the test. A major industry is involved in creating tests and standardising them, and then setting up training courses for people who want to use the tests.
Very broadly, three categories of test exist that determine who can administer them:
Tests that can be used by anyone with a little background knowledge, such as general attitude surveys.
Tests that require some university qualification in psychology, such as general personality measures.
Tests that require specific training in their use and application. All the tests listed in the earlier Table 9-1 are of this kind. Some tests may require intensive training over many months, whereas others may require only a few days’ training.