Manufacturing Depression
Author: Gary Greenberg
Frances Kelsey, an FDA physician-bureaucrat, was not so sure. In 1960, when Richardson-Merrell applied for a license to sell thalidomide in the United States, she asked the company to supply more information. She was concerned about reports of peripheral neuritis, irreversible nerve damage in patients taking thalidomide, and pointed out that there were contradictions in the safety data that might bear on this effect. She also noted that the company had not provided information on the drug’s effect on the developing fetus—not even on whether or not it crossed the placental barrier—a crucial absence given the fact that the company was pushing the drug as a remedy for women made jittery by their newly discovered pregnancies. Merrell was much slower to respond to Kelsey than they were to distribute 2.5 million doses of the drug to more than 1,200 doctors for them to use on a trial basis. Twenty thousand American patients received the drug.
Working for Merrell, as for most pharmaceutical companies at the time, was pure gravy. The drugs were free, the doctors didn’t have to report results if they didn’t want to, and even if they did, they wouldn’t have to go to all the bother of gathering data or writing up the results. Merrell’s medical director, like medical directors at most drug firms, was glad to provide them with completed manuscripts attesting to the drug’s effectiveness and ready for their signature and to send them on to the medical journals, where they would become part of the record establishing the value of the drug.
Even as Merrell was ramping up its marketing efforts, however, trouble was brewing. Doctors in Australia, England, and Germany were seeing not only peripheral neuritis, but something much more disturbing in the offspring of their thalidomide patients: a sudden increase in cases of phocomelia, a birth defect in which limbs fail to develop, and which leaves infants with hands and feet growing directly from their shoulders and hips. By the time epidemiological and animal studies, conducted over Grünenthal’s objections, had confirmed the link between thalidomide and phocomelia, thousands of European children had been born with massive deformities. In March 1962, Merrell, after two years of heated argument with Kelsey, finally withdrew its application for thalidomide.
The European tragedy might have passed unnoticed in the United States, where fewer than twenty thalidomide babies were born. But Kefauver’s staff recognized the opportunity in the debacle and, three days after his proposal hit the Senate floor in July 1962, they informed a Washington Post reporter about what had happened overseas. The story was reported on the front page. In short order legislators were falling over one another to do something about the drug industry, and there just happened to be a bill ready for their approval. The Kefauver-Harris Drug Amendments to the Pure Food and Drug Act, their efficacy clause intact, passed in 1962.
The new law had absolutely nothing to do with thalidomide. Even Roman Hruska—the Nebraska senator who had once defended a Richard Nixon Supreme Court nominee who had been called mediocre by insisting that there was a place for mediocrity in public life—could see that “thalidomide was already barred and the public was protected under the 1938 act.” But no matter. Kefauver had gotten his way. For the first time pharmaceutical companies were required to prove to the FDA that their drugs worked in order to get a license to sell them.
Turning this requirement to corporate advantage was easier than you might think, thanks in part to Justice Holmes. The new law had to address his original worry about congressional reaching into the realm of opinion. This meant that it wasn’t enough for Congress to say that science had made it possible to sort out fact from opinion; it had to specify how those facts would be established. The answer was that “substantial evidence…consisting of adequate and well-controlled investigations…by experts qualified by scientific training and experience” would establish the efficacy of a drug.
That seemingly innocuous phrase—“substantial evidence”—contained a huge break for drug companies. Lawmakers had considered a different standard—the preponderance of evidence. The difference, as one senator put it, was that to require only substantial proof meant that a drug could be deemed effective “even though there may be preponderant evidence to the contrary based upon equally reliable studies.” Especially after the FDA determined that two independent trials with statistically significant results in favor of the drug constituted substantial evidence, this meant that a drug up for approval could have as many do-overs as a drug company wanted to pay for. So long as the research eventually yielded evidence of efficacy, the failures would remain off the books. This is why antidepressants have been approved even though so many studies have shown them to be ineffective.
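The arithmetic behind those do-overs can be sketched in a few lines. The sketch below is an illustration, not anything drawn from the FDA's rules: it assumes a drug with no real effect, trials that are independent of one another, and a per-trial false-positive chance `alpha` of 0.05, which is a conventional but here hypothetical threshold. The function name `prob_at_least_two_hits` is invented for the example. It computes the chance that at least two of `n_trials` come out "significant" by luck alone.

```python
def prob_at_least_two_hits(n_trials: int, alpha: float = 0.05) -> float:
    """Chance that an ineffective drug still logs two or more 'significant' trials.

    Assumes independent trials, each with false-positive probability `alpha`
    (both are assumptions made for illustration).
    """
    if n_trials < 2:
        return 0.0  # two successes are impossible with fewer than two trials
    p_none = (1 - alpha) ** n_trials                             # no trial succeeds
    p_one = n_trials * alpha * (1 - alpha) ** (n_trials - 1)     # exactly one succeeds
    return 1 - p_none - p_one


if __name__ == "__main__":
    # The chance of collecting two lucky results grows with every extra trial.
    for n in (2, 5, 10, 20):
        print(n, round(prob_at_least_two_hits(n), 4))
```

Under these assumptions the probability climbs steadily as trials are added, which is the sense in which unreported failures make the two-trial standard easier to meet than it sounds.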
That wasn’t the only way that Kefauver-Harris turned into a sweet deal for the drug companies. They also had in mind a way to address the requirement for adequate and well-controlled investigations: the randomized clinical trial, the method used by my doctors at Mass General. This approach, as the industry soon figured out, could easily be made to say more than it really said and do something quite different from what it was intended to do. Both the RCT and the statistics used to assay its outcome are much better at telling scientists when a treatment doesn’t work than when it does, to disprove rather than to prove drug efficacy.
The eagerness among drug doctors to get more out of the RCT than it is equipped to provide features in the earliest attempts to sell it as the method for verifying drug efficacy. In explaining why he thought regulators should adopt the RCT, Louis Lasagna cited a momentous event in medical history. In 1747, Lasagna recalled, the ship’s doctor on the HMS Salisbury, James Lind, decided to check out an old and unproven theory that acids would cure scurvy. Since the Salisbury was returning to England after a long time at sea, it had no shortage of subjects. Lind divided a dozen scurvy sailors into six pairs. “Their cases were as similar as I could have them,” he later wrote. “They lay together in one place and had one diet common to all.” Lind randomly assigned each pair to one of six treatments, which included a dose of vinegar and a garlic concoction—and, fatefully, oranges and lemons. Most of the sailors stayed ill, but within a week, one of the citrus-eaters was so well that he “was appointed nurse to the rest of the sick,” and by June 16, when the Salisbury pulled into Plymouth, the other was fully recovered.
That was a clever experiment. But even if he had randomly chosen the sailors who received the fruit and tried to keep their other conditions equal, Lind’s trial was not really controlled. He did not account for a crucial variable—the possibility that the placebo effect had cured the sailors. He knew who was getting which treatment, and he had a stake in the outcome. Even if his reports were honest, his belief might have been contagious, his enthusiasm the cause of the cure’s success. He couldn’t say with certainty that something in the fruit had cured the sailors because he did not control for credulity—his or his patients’.
That might seem like an unfair criticism—after all, doctors of the time didn’t know that most of their medicines were placebos—but the confounding power of the placebo effect was understood by at least one eighteenth-century scientist. In 1784, Benjamin Franklin was living in Paris when Louis XVI tapped him to head a scientific commission investigating a claim that had all of Europe in a stir. Franz Anton Mesmer was telling people that he had discovered a force in the universe as real and important as gravity. He called it “animal magnetism,” and in parlors across the continent he was demonstrating how a physician could harness it in the service of healing. Patients swore that their rheumatism, skin ailments, asthma, and nervousness had been cured by Mesmer. The mesmerism craze alarmed the king, and he charged Franklin with the task of determining whether or not animal magnetism really existed.
Franklin and Mesmer had different ideas about how to answer this question. Mesmer suggested an experiment much like Lind’s: take two patients with the same disease, mesmerize one of them and not the other, make all other conditions equal, and see who fared better. But Franklin, the wily rationalist, understood that there was a bigger obstacle to the truth: self-interest, especially that of the doctor and his patient. He designed a test to eliminate human subjectivity from the experiment.
Franklin’s proposal also eliminated Mesmer, who, upon hearing of it, withdrew from the proceedings. He sent another mesmerist to Franklin’s house on the appointed night, willing patients in tow. In front of the commission, he focused the magnetism on parts of their bodies. Asked to locate where he was directing the energy, the patients—all women—responded accurately. Then they were blindfolded. When the mesmerist repeated the procedure, the women located the sensations, according to the commission’s report to the king, “at hazard, and in parts very distant from those which were the object of magnetism.” Other variations of the blindfold test yielded the same results. “It was natural to conclude,” Franklin told the king, “that these sensations, real or pretended, were determined by the imagination.”
Lind claimed to have proved that citrus cured scurvy, but Franklin seems to have understood that this was more than an experiment could say, and that there is an inexhaustible supply of variables, known unknowns and unknown unknowns alike, that might have been at work in mesmerism. He controlled for the one he deemed most likely—imagination—and when he did so, there was a difference in outcome. Or, to put it another way, he started with the idea that there would be no difference between a blindfolded and a non-blindfolded treatment—and disproved it. He didn’t prove anything; his conclusion from the proceedings might have been natural, but it was also inferential.
This may seem like a distinction without a difference, especially when you consider the different purposes of these experiments: Lind’s to ratify and Franklin’s to debunk. But what Franklin seems to have understood was that enthusiasm for a treatment wasn’t just another variable in the pursuit of scientific knowledge. It was the enemy. Self-interest, hope, the ineffable qualities of the doctor-patient relationship—in short, subjectivity—would always haunt our attempts to understand the world, and the role of the experiment was to rein in its effects, whether by tying on a blindfold or acknowledging the limits of a controlled experiment to establish the truth.
The modern RCT is much more sophisticated than Franklin’s experiment. But it begins with the same recognition of our limited ability to circumscribe credulity and follows the same logic; it starts with a null hypothesis—that the treatment won’t work—attempts to confirm it, draws inferences from the results, and then tries to strengthen those inferences by replicating the experiment. In citing Lind, Lasagna seems to have forgotten that an RCT is much more suited to disproving than to proving, that it can only give us probabilities, that its primary purpose is negative: to rain on an experimenter’s parade, to put the kibosh on therapeutic enthusiasm rather than to inflame it.
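A small worked example may make the null-hypothesis logic concrete. The sketch below uses an exact permutation test, a technique the text does not name and that is swapped in here only for illustration. The scores are invented, and the function name `permutation_p_value` is hypothetical. The test asks how often a gap at least as large as the one observed would turn up if the group labels were shuffled in every possible way, which is what we would expect to see if the treatment did nothing.

```python
from itertools import combinations
from statistics import fmean


def permutation_p_value(treated, control):
    """Share of all possible relabelings whose mean gap is at least the observed gap.

    A small value says the null hypothesis (labels don't matter) fits the data
    poorly; it does not prove that the treatment works.
    """
    pooled = list(treated) + list(control)
    k = len(treated)
    observed = fmean(treated) - fmean(control)
    indices = range(len(pooled))
    hits = 0
    total = 0
    for chosen in combinations(indices, k):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen_set]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point rounding in the comparison.
        if fmean(group_a) - fmean(group_b) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


if __name__ == "__main__":
    # Invented improvement scores for eight hypothetical patients.
    treated = [7, 9, 8, 10]
    control = [5, 6, 7, 4]
    print(permutation_p_value(treated, control))
```

The result is a probability, not a verdict. It measures how surprising the data would be if the treatment were inert, which is a narrower claim than saying the treatment has been shown to work.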
In 1928, twenty-five years before Lasagna touted the virtues of the blinded RCT, and just a year before the stock market crashed, nervous investors wondered if something was wrong in the house of Morgan. Anne Morgan, sister of J.P. and usually no less retiring than the rest of the family, had suddenly turned up as a paid spokesman for Old Gold cigarettes. She reported to newspaper readers that she had “taken the blindfold test, smoked four brands of cigarets [sic], and found that ‘the smoothness’ of one cigaret [sic] was ‘so obvious.’” Miss Morgan urged other smokers to repeat the experiment in the privacy of their own parlors. Advertisers had evidently figured out that by reassuring the consumer that his tastes were not a figment of his imagination, that his own ever-unreliable subjectivity had been neutralized, they could build brand loyalty. They had hit upon a way to use the blindfold test to stoke enthusiasm rather than to curb it. (Miss Morgan’s finances, as it turned out, were sound; she apparently donated her thousand-dollar honorarium to charity.)
A decade later, Cornell scientist Harry Gold (no relation to Old) was trying to figure out whether or not xanthines, a group of stimulants that included caffeine, really deserved their reputation as a remedy for angina. He realized that he couldn’t trust the data provided by doctors studying the question. They asked leading questions, assigned patients nonrandomly to get the drug or placebo, and interpreted ambiguous results in a way that favored the drugs. It wasn’t enough, Gold concluded, to control for patients’ credulity; doctors also had to be placed in the dark. Experiments had to be double-blind, Gold said. He cited the Old Gold campaign as the inspiration for the method’s name.
Gold was using the blindfold test as Franklin had intended—to impose restraint. He would probably be discomfited to see the ease with which his method has been used to create certainty rather than to limit uncertainty. But perhaps no one would be more upset at the way that RCTs have become one of the drug industry’s greatest marketing tools than the British geneticist who invented the mathematical language in which RCTs are reported—a language intended, like the experiments themselves, to rule out rather than to rule in.
Ronald Aylmer Fisher, who was so blind that he had to do his calculations in his head, developed modern statistics while working for an agricultural research institute just after the First World War. Fisher was trying to sort out fact from opinion when it came to crop yields. A farmer could plant two different varieties of grain and at harvesttime reap a much bigger crop from one of them, but most scientists understood that this didn’t necessarily mean one strain was more vigorous than the other. Maybe the soil varied from plot to plot, or the exposure to sunlight, or the population of varmints. Without isolating those variables, there was no way to know if the change was due to what the farmer did on purpose or to what just happened to occur.