From an entirely unexpected quarter comes a story of the shortcomings of computer modelling, in this case in the America's Cup. It is a great example of how models reflect the biases of their authors. In this case, the author assumed that the fastest upwind path was the shortest path (i.e. with the shallowest possible tacks). It turns out that with the changing technology of boats, particularly the hydrofoil, a longer but higher-velocity path was faster, but the model refused to consider that solution because it was programmed not to.
Want to Make Your Reputation in Academia? Here is an Important Class of Problem For Which We Have No Solution Approach
Here is the problem: There exists a highly dynamic, multi- multi- variable system. One input is changed. How much, and in what ways, did that change affect the system?
Here are two examples:
- The government makes a trillion dollars in deficit spending to try to boost the economy. Did it do so? By how much? (This Reason article got me thinking about it)
- Man's actions increase the amount of CO2 in the atmosphere. We are fairly confident that this has some warming effect, but how much? There are big policy differences between the response to a lot and a little.
The difficulty, of course, is that there is no way to do a controlled study, and while one's studied variable is changing, so are thousands, even millions of others. These two examples have a number of things in common:
- We know feedbacks play a large role in the answer, but the system is so hard to pin down that we are not even sure of the sign, much less the magnitude, of the feedback. Do positive feedbacks such as ice melting and cloud formation multiply CO2 warming many times, or is warming offset by negative feedback from things like cloud formation? Similarly in the economy, does deficit spending get multiplied many times as the money gets respent over and over, or is it offset by declines in other categories of spending like business investment?
- In both examples, we have recent cases where the system has not behaved as expected (at least by some). The economy remained at best flat after the recent stimulus. We have not seen global temperatures increase for 15-20 years despite a lot of CO2 production. Are these evidence that the hypothesized relationship between cause and effect does not exist (or is small), or simply evidence that other effects independently drove the system in the opposite direction such that, for example, the economy would have been even worse without the stimulus or the world would have cooled without CO2 additions?
- In both examples, we use computer models not only to predict the future, but to explain the past. When the government said that the stimulus had worked, they did so based on a computer model whose core assumptions were that stimulus works. In both fields, we get this sort of circular proof, with the output of computer models that assume a causal relationship being used to prove the causal relationship.
So, for those of you who may think that we are at the end of math (or science), here is a class of problem that is clearly, just from these two examples, enormously important. And we cannot solve it -- we can't even come close, despite the hubris of Paul Krugman or Michael Mann who may argue differently. We are explaining fire with Phlogiston.
I have no idea where the solution lies. Perhaps all we can hope for is a Goedel to tell us the problem is impossible to solve so stop trying. Perhaps the seeds of a solution exist but they are buried in another discipline (God knows the climate science field often lacks even the most basic connection to math and statistics knowledge).
Maybe I am missing something, but who is even working on this? By "working on it" I do not mean trying to build incrementally "better" economics or climate models. Plenty of folks doing that. But who is working on new approaches to tease out relationships in complex multi-variable systems?
After years of being demonized by friends and family for saying that the moon is not bigger when it is on the horizon, that it is just an optical illusion, I am happy to be vindicated.
I thought the various explanations were fascinating, though I think the commenter's suggestion that it is a glitch in the matrix is the most compelling.
Hat tip: Tom Kirkendall
A reader sends me a story of a global warming activist who clearly doesn't know even the most basic facts about global warming. Since this article is about avoiding appeals to authority, I hate to ask you to take my word for it, but it is simply impossible to immerse oneself in the science of global warming for any amount of time without being able to immediately rattle off the four major global temperature databases (or at least one of them!)
I don't typically find it very compelling to knock a particular point of view just because one of its defenders is a moron, unless that defender has been set up as a quasi-official representative of that point of view (e.g. Al Gore). After all, there are plenty of folks on my side of issues, including those who are voicing opinions skeptical of catastrophic global warming, who are making screwed up arguments.
However, I have found over time this to be an absolutely typical situation in the global warming advocacy world. Every single time I have publicly debated this issue, I have understood the opposing argument, ie the argument for catastrophic global warming, better than my opponent. In fact, I finally had to write a first chapter to my usual presentation. In this preamble, I outline the case and evidence for manmade global warming so the audience could understand it before I then set out to refute it.
The problem is that the global warming alarm movement has come to rely very heavily on appeals to authority and ad hominem attacks in making their case. What headlines do you see? 97% of scientists agree, the IPCC is 95% sure, etc. These "studies", which Lord Monckton (with whom I often disagree but who can be very clever) calls "no better than a show of hands", dominate the news. When have you ever seen a story in the media about the core issue of global warming, which is diagnosing whether positive feedbacks truly multiply small bits of manmade warming to catastrophic levels? The answer is never.
Global warming advocates thus have failed to learn how to really argue the science of their theory. In their echo chambers, they have all agreed that saying "the science is settled" over and over and then responding to criticism by saying "skeptics are just like tobacco lawyers and holocaust deniers and are paid off by oil companies" represents a sufficient argument.** Which means that in an actual debate, they can be surprisingly easy to rip to pieces. Which may be why most, taking Al Gore's lead, refuse to debate.
All of this is particularly ironic since it is the global warming alarmists who try to wrap themselves in the mantle of the defenders of science. Ironic because the scientific revolution began only when men and women were willing to reject appeals to authority and try to understand things for themselves.
** Another very typical tactic: They will present whole presentations without a single citation. But make one statement in your rebuttal as a skeptic that is not backed with a named, peer-reviewed study, and they will call you out on it. I remember in one presentation, I was presenting some material that was based on my own analysis. "But this is not peer-reviewed" said one participant, implying that it should therefore be ignored. I retorted that it was basic math, that the data sources were all cited, and they were my peers -- review it. Use your brains. Does it make sense? Is there a flaw? But they don't want to do that. Increasingly, oddly, science is about having officially licensed scientists deliver findings to them on a platter.
Former vice president Al Gore on Monday called for making climate change "denial" a taboo in society.
“Within the market system we have to put a price on carbon, and within the political system, we have to put a price on denial,” Gore said at the Social Good Summit New York City.
Incredibly, the suggestion of introducing taboos and penalties in a scientific debate is coming from the side that claims to be the great defenders of science.
Kevin Drum preaches against the evils of teen tanning, which he follows with a conclusion that obviously Republicans are evil for opposing a tanning tax:
Indoor tanning, on the other hand, is just plain horrifically bad. Aaron Carroll provides the basics: indoor tanning before age 25 increases the risk of skin cancer by 50-100 percent, and melanoma risk (the worst kind of skin cancer) increases by 1.8 percent with each additional tanning session per year. Despite this, the chart on the right shows the prevalence of indoor tanning among teenagers. It's high! Aaron is appalled:
This is so, so, so, so, so, so, so bad for you. Why don’t I see rage against this in my inbox like I do for diet soda? Why can’t people differentiate risk appropriately?
And who would fight a tax on this?
I am not going to get into the argument here (much) about individual choice and Pigovian taxes (by the way, check out the comments for a great example of what I call the Health Care Trojan Horse, the justifying of micro-regulation of our behavior because it might increase government health care costs).
I want to write about risk. Drum and Carroll are taking the high ground here, claiming they are truly the ones who understand risk and all us poor benighted folks do not. But Drum and Carroll repeat in this post the very mistake that is the main reason no one can parse risk.
A key reason people don't understand risk is that the media talks about large percent changes to a small risk, without ever telling us the underlying unadjusted base risk. A 100% increase in a risk may be trivial, or it might be bad. A 100% increase in risk of death in a car accident would be very bad. A 100% increase in the risk of getting hit by lightning would be trivial.
In this case, it's probably somewhere in between. The overall lifetime risk of melanoma is about 2%. This presumably includes those with bad behavior so the non-tanning number is likely lower, but we will use 2% as our base risk, understanding that it is likely high. The 5-year survival rate from these cancers (which by the way tend to show up after age 60) is 90+% if you are white -- if you are black it is much lower (I don't know if that is a socio-economic problem or some aspect of the biology of darker skin).
So a teenager has a lifetime chance of dying early from melanoma of about 0.2%. A 50% increase to this would raise this to 0.3%. An extra one in one thousand chance of dying early from something likely to show up in old age -- is that "so, so, so, so, so, so, so bad"? For some yes, for some no. That is what individual choice is all about.
But note the different impacts on perception.
- Statement 1: "Teen tanning increases dangerous melanoma skin cancer risk by 50%".
- Statement 2: "Teen tanning adds an additional 1 in 1000 chance of dying of skin cancer in old age."
Both are true. Both should likely be in any article on the topic. Only the first ever is included, though.
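The arithmetic behind the two statements can be sketched in a few lines of Python. This is only an illustration using the round numbers from this post (2% lifetime melanoma risk, roughly 10% of cases fatal, a 50% relative increase from tanning); the function name and its parameters are made up for the example.

```python
def death_risk(base_incidence, fatality_rate, relative_increase):
    """Turn a relative risk increase into absolute risks.

    Returns (baseline risk of dying, raised risk of dying, extra absolute risk).
    All inputs are fractions, e.g. 0.02 for 2%.
    """
    baseline = base_incidence * fatality_rate
    raised = baseline * (1 + relative_increase)
    return baseline, raised, raised - baseline


# Round numbers taken from the post, not from any medical source.
baseline, raised, extra = death_risk(
    base_incidence=0.02,     # ~2% lifetime melanoma risk
    fatality_rate=0.10,      # 90+% five-year survival, so ~10% fatal
    relative_increase=0.50,  # the "50%" headline number
)

print(f"baseline risk of dying: {baseline:.2%}")
print(f"risk with teen tanning: {raised:.2%}")
print(f"extra risk: about 1 in {round(1 / extra):,}")
```

The same 50% headline applied to a different base risk gives a completely different absolute number, which is the whole point: the relative change alone tells you very little.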
I spent years, before I burned out on the task, picking over bad climate studies, and at the time reached the conclusion that there was something about the climate science field that was anomalous, tolerating so much bad science, bad sampling methodology, and bad statistical approaches.
However, now I am coming to the conclusion that perhaps most studies in every field are dominated by this same crap. Here is an example, from the NTSB on busses.
I am happy to see the public school system coming in for much-deserved criticism. I don't have anything to add to this article that I have not already said about schools many times. But I want to make one complaint about a chart used in the blog post:
SAT scores are a terrible metric for measuring academic performance over time.
First, I am not at all convinced that the test scoring does not shift over time (no WAY my son had a higher score than me, LOL).
But perhaps the most important problem is that all students don't take the SAT -- it is a choice. If the mix of kids taking the test shifts -- for example, if over time more kids get interested in college so that more marginal academic kids take the test -- then the scores are going to move solely based on mix shifts. Making this more complicated, there is at least one competitive test (the ACT) which enjoys more popularity in some states than others, so the SAT will represent an incomplete and shifting geographic mix of the US. Finally, as students have gotten smarter about this whole process**, they gravitate to the ACT or the SAT based on differing capabilities, since they test in different ways.
To me, all this makes SAT scores barely more scientific than an Internet poll.
** If you have not had a college-bound student recently, you will have to trust me on this, but parents can spend an astounding amount of time trying to out-think this stuff. And that is here in flyover country. Apparently private school parents on the East Coast can be absurd (up to and including hiring consultants for 6 figures). A few years ago it was in vogue to try to find your kid a unique avocation. Violin was passe -- I knew kids playing xylophone and the bagpipes. A friend of mine at a high profile DC private school used to have fun with other parents telling them his son was a national champion at falconry, the craziest thing he could make up on the spur of the moment at a cocktail party. Other parents would sigh enviously, wishing they had thought of that one for their kid.
Glenn Reynolds linked this titillating headline:
NINE PERCENT OF YALE STUDENTS SURVEYED SAY THEY’VE ACCEPTED MONEY FOR SEX
Of course, when you read the article (of course I clicked through, I have no pride), you find that:
- The sample size is approximately 40
- The sample was from a group of people who self-selected to attend a seminar by the owner of a sex-toy business
The "3% who participated in bestiality" is actually 1 person out of 40 who have a self-selected interest in pushing sexual boundaries. With a little larger sample size, a bit poorer math, and a bit more work goal-seeking to a desired outcome, this might almost meet the standards of climate research.
Which is all a relief to me -- after 30+ years of being a Yale hater, I was afraid I might have to admit it was a more interesting place than I thought.
Anthony Watts has a nice catalog of past predictions of doom (e.g. running out of oil, food, climate issues, etc). It really would be funny if it were not such a serious and structural issue with the media. I would love to see someone like the NY Times have a sort of equivalent of their reader advocate whose job was to go through past predictions published in the paper and see how they matched up to reality. If I had more time, it is the blog I would like to start.
Update: One of his readers, Dennis Wingo, took the resource depletion table from Ehrlich's Limits to Growth and annotated it -- the numbers in red show the resources Ehrlich predicted we should already have run out of.
However, rather than ever, ever going back and visiting these forecasting failures and trying to understand the structural problem with them, the media still runs back to Ehrlich as an "expert".
We already have way too many time standards, including:
- TAI, time based on an atomic clock, which ignores all motion of the Earth
- UT0 and UT1, time based on precise measurement of the Earth’s rotation
- GPS, the time standard used by GPS satellites
- UTC, the standard used in computing, which is like TAI but with leap seconds to keep it in sync with Earth
- TDT, TBT, TCB, and TCG, which are all even worse
This leads to all kinds of little headaches, particularly for programmers. For example, the clock in your smartphone’s GPS is 16 seconds out of sync with the phone’s system clock. This is because the system clock uses Coordinated Universal Time (which has leap seconds), but GPS time doesn’t. They were in sync in January of 1980 and probably never will be again.
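For the programmers, here is a rough sketch of where the 16 seconds comes from. GPS time has ignored leap seconds since its January 1980 epoch while UTC keeps adding them, so the offset is just a count of the leap seconds inserted since then. The date table below is not from the quoted piece; it is transcribed from the published leap-second history and should be treated as an assumption to check against an official list. The function name is made up for the example.

```python
from datetime import date

# First UTC day on which each new GPS-UTC offset applied, i.e. the day after
# each leap second inserted since the GPS epoch (6 January 1980).
# Assumption: transcribed from the published leap-second history.
LEAP_SECOND_EFFECTIVE_DATES = [
    date(1981, 7, 1), date(1982, 7, 1), date(1983, 7, 1), date(1985, 7, 1),
    date(1988, 1, 1), date(1990, 1, 1), date(1991, 1, 1), date(1992, 7, 1),
    date(1993, 7, 1), date(1994, 7, 1), date(1996, 1, 1), date(1997, 7, 1),
    date(1999, 1, 1), date(2006, 1, 1), date(2009, 1, 1), date(2012, 7, 1),
    date(2015, 7, 1), date(2017, 1, 1),
]


def gps_minus_utc_seconds(day):
    """Whole seconds by which GPS time is ahead of UTC on the given date."""
    return sum(1 for effective in LEAP_SECOND_EFFECTIVE_DATES if effective <= day)


print(gps_minus_utc_seconds(date(1980, 1, 6)))   # at the GPS epoch
print(gps_minus_utc_seconds(date(2012, 10, 1)))  # around the time of the quote
```

Any leap second added after the last entry in the table would make the result stale, which is exactly the kind of headache the quote is complaining about.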
I have never been convinced that IQ tests have really distinguished core intelligence from education. I scored much better on IQ tests after I practiced and read about how to tackle certain types of problem.
It is for this reason that I have always assumed the Flynn effect to be due to education, not changes in native intelligence.
Soon to be the subject of a Michael Bay film, I am sure. Can you point a machine gun at the ground in order to fly?
Six Italian scientists and an ex-government official have been sentenced to six years in prison over the 2009 deadly earthquake in L'Aquila.
A regional court found them guilty of multiple manslaughter.
Prosecutors said the defendants gave a falsely reassuring statement before the quake, while the defence maintained there was no way to predict major quakes.
The 6.3 magnitude quake devastated the city and killed 309 people.
It took Judge Marco Billi slightly more than four hours to reach the verdict in the trial, which had begun in September 2011.
The seven - all members of the National Commission for the Forecast and Prevention of Major Risks - were accused of having provided "inexact, incomplete and contradictory" information about the danger of the tremors felt ahead of 6 April 2009 quake, Italian media report.
This is what I call the layman's "CSI" view of science, which assumes that certainty is possible in analyzing and forecasting complex systems. I am not going to blame the victim here, but I will note that scientists have to some extent made this situation far worse by insisting that they have levels of certainty they do not have, particularly in highly charged political debates (e.g. economics and climate).
Harvard physicist Luboš Motl argues it will give scientists roughly the same incentives doctors have in areas with lots of malpractice suits:
The verdict de facto lionizes crackpots who were screaming that there had to be a large earthquake and they just happened to be right in that case – while isomorphic and sometimes the very same crackpots are wrong in 99.9% of other cases in which they cry wolf – and it condemns the scientific method. They are wrong in 99.9% of cases because their predictive framework has nothing to do with science – it's all about a psychopathological paranoia – but even a broken clock is right twice a day.
The lesson for the scientists is clear: If you are a scientist who is qualified in a discipline that has implications for the safety of people, you must always recommend precautionary measures to be taken even if you conclude that the probability that something bad will happen is tiny. Italy may expect much more hysteria in various similar science-related situations than it has had so far because a court has declared a war on everyone who is honest and balanced.
Can you imagine that this sick logic would be applied e.g. to surgeons? Surgeons could spend 6 years in prison after every death of a patient whom they or others were optimistic about. It's just insane. People sometimes die, natural catastrophes sometimes occur, and it's just impossible to identify a human culprit in most cases. Only if a professional makes a mistake in which he or she has demonstrably violated some established and functional rules to reduce the risk – and whether or not this was the case may only be determined by another expert – he or she could be considered co-responsible for the deaths.
MSNBC has worked hard to be the official TV channel of the "reality-based community" which so often lectures us skeptics on how we are all anti-science and stuff. (source)
The author of XKCD has a site now that answers odd science questions. Here is mine: If, at a mass of over 200 pounds, Felix Baumgartner were indeed accelerated faster than light and pointed at the Earth, what would happen?
Isaac Asimov has a short story mystery something like this, with a pool ball accelerated to light speed.
This is pretty cool -- what look like rounded river rocks and sedimentary conglomerates on Mars.
Quick - in your last fill up, how much did you pay for gas? About how many gallons did you use?
If you are like most people, you can probably come pretty close to this. I paid somewhere just north of $4.00 for about 18 gallons.
OK, second set of questions: On your last electric bill, how much did you pay per kWh? How many kWh did it take to run your dishwasher last night?
Don't know? I don't think you are alone. I don't know the answers to the last questions. Part of the reason is that gas prices are posted on every corner, and we stare at a dial showing us fuel used every time we fill up. There is nothing comparable for electricity -- particularly for an electric car.
I understand some inherent appeals to electric cars. They are fun to drive, kind of quiet and stealthy like KITT from Knight Rider. They are really torquey and have nice acceleration. There is no transmission and gear changing. All cool and awesome reasons to buy an electric car.
However, my sense is that the main appeal of electric cars is that because we don't see the fuel price on the corner, and because we don't stare at a spinning dial as electrons are flowed into the car, we pretend it is not costing us anything to fill up. Out of sight is out of mind. Heck, even experienced car guys who should know better take this attitude. Popular Mechanics editor Jim Meigs wrote to Glenn Reynolds, re: the Volt:
Others might like the notion of going a month or two without filling the tank
This drives me crazy. Of COURSE you are filling the freaking tank. You are just filling the lead-acid (or lithium-ion) one with electrons rather than filling the hollow steel one with hydrocarbon molecules. The only difference is that you don't stand there watching the meter spin. But that should not mean that we pretend we are not filling the car and paying a cost to do so.
By the way, if you have read me before, you know I also have a problem with the EPA equivalent mileage standards for electric cars, which basically inflate the numbers by a factor of three by ignoring the second law of thermodynamics. This fraudulent mileage number, combined with the EPA's crazy-high new mileage standards, represents an implicit subsidy, almost a mandate, for electric cars that gets little attention. And that will have zero effect on energy usage because the numbers are gamed.
I used to scoff at how Ayn Rand turned the word "humanitarian" in The Fountainhead into a term of derision. I didn't think it was justified to assume anyone adopting the humanitarian title had to be evil. Surely, for example, Andrew Carnegie with his philanthropy and opposition to war could be considered a positive humanitarian?
But maybe she was on to something. At least as far as Greenpeace is concerned:
According to the World Health Organization between 250,000 to 500,000 children become blind every year due to vitamin A deficiency, half of whom die within a year of becoming blind. Millions of other people suffer from various debilitating conditions due to the lack of this essential nutrient.
Golden Rice is a genetically modified form of rice that, unlike conventional rice, contains beta-Carotene in the rice kernel. Beta-Carotene is converted to vitamin A in humans and is important for eyesight, the immune system, and general good health. Swiss scientist and humanitarian Dr. Ingo Potrykus and his colleagues developed Golden Rice in 1998. It has been demonstrated in numerous studies that golden rice can eliminate vitamin A deficiency.
Greenpeace and its allies have successfully blocked the introduction of golden rice for over a decade, claiming it may have “environmental and health risks” without ever elaborating on what those risks might be. After years of effort the Golden Rice Humanitarian Project, led by Dr. Potrykus, The Rockefeller Foundation and others were unable to break through the political opposition to golden rice that was generated directly by Greenpeace and its followers.
To their credit, Bill and Melinda Gates are giving it another try.
I suppose I should have guessed this, but it never occurred to me. There seems to be a problem with growing weed resistance to herbicides that is entirely parallel to growing antibiotic resistance of certain germs.
Most folks, and I would include myself in this, have terrible intuitions about probabilities and in particular the frequency and patterns of occurrence in the tail ends of the normal distribution, what we might call "abnormal" events. This strikes me as a particularly relevant topic as the severity of the current drought and high temperatures in the US is being used as absolute evidence of catastrophic global warming.
I am not going to get into the global warming bits in this post (though a longer post is coming). Suffice it to say that if it is hard to directly and accurately measure shifts in the mean of climate patterns given all the natural variability and noise in the weather system, it is virtually impossible to infer shifts in the mean from individual occurrences of unusual events. Events in the tails of the normal distribution are infrequent, but not impossible or even unexpected over enough samples.
What got me to thinking about this was the third perfect game pitched this year in the MLB. Until this year, only 20 perfect games had been pitched in over 130 years of history, meaning that one is expected every 7 years or so (we would actually expect them more frequently today given that there are more teams and more games, but even correcting for this we might have an expected value of one every 3-4 years). Yet three perfect games happened, without any evidence or even any theoretical basis for arguing that the mean is somehow shifting. In rigorous statistical parlance, sometimes shit happens. Were baseball more of a political issue, I have no doubt that writers from Paul Krugman on down would be writing about how three perfect games this year is such an unlikely statistical fluke that it can't be natural, and must have been caused by [fill in behavior of which author disapproves]. If only the Republican Congress had passed the second stimulus, we wouldn't be faced with all these perfect games....
Postscript: We like to think that perfect games are the ultimate measure of a great pitcher. This is half right. In fact, we should expect entirely average pitchers to get perfect games every so often. A perfect game is when the pitcher faces 27 hitters and none of them get on base. So let's take the average hitter facing the average pitcher. The league average on base percentage this year is about .320 or 32%. This means that for each average batter, there is a 68% chance for the average pitcher in any given at bat to keep the batter off the base. All the average pitcher has to do is roll these dice correctly 27 times in a row.
The odds of that are .68^27 or about one in 33,000. But this means that once in every 33,000 pitcher starts (there are two pitcher starts per game played in the MLB), the average pitcher should get a perfect game. Since there are about 4,860 regular season starts per year (30 teams x 162 games), the average pitcher should get a perfect game every 7 years or so. Through history, there have been about 364,000 starts in the MLB, so this would point to about 11 perfect games by average pitchers. About half the actual total.
Now, there is a powerful statistical argument for demonstrating that great pitchers should be over-weighted in perfect games stats: the probabilities are VERY sensitive to small changes in on-base percentage. Let's assume a really good pitcher has an on-base percentage against him that is 30 points less than the league average, and a bad pitcher has one 30 points worse. The better pitcher would then expect a perfect game every 10,000 starts, while the worse pitcher would expect a perfect game every 113,000 starts. I can't find the stats on individual pitchers, but my guess is that the spread between best and worst pitchers in on-base percentage against is more than 60 points, since the team batting average against stats (not individual but team averages, which should be less variable) have a 60 point spread from best to worst. [update: a reader points to this, which says there is actually a 125-point spread from best to worst. That is a difference in expected perfect games from one in 2,000 for Jered Weaver to one in 300,000 for Derek Lowe. Thanks Jonathan]
Update: There have been 278 no-hitters in MLB history, or 12 times the number of perfect games. The odds of getting through 27 batters based on a .320 on-base percentage are one in 33,000. The odds of getting through the same batters based on a .255 batting average (which counts hits but not other ways on base, exactly parallel with the definition of a no-hitter) are just one in 2,830. The difference between these odds is a ratio of 11.7 to one, nearly perfectly explaining the ratio of no-hitters to perfect games on pure stochastics.
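For anyone who wants to check the numbers, the whole calculation is one line of math: the chance of retiring 27 batters in a row is (1 - rate)^27, where rate is the chance a batter reaches base (or gets a hit, for the no-hitter case). A minimal Python sketch, using only the rates and start counts quoted in this post and treating every batter as an independent roll of the same dice (the helper name is made up):

```python
def starts_per_clean_game(reach_rate, batters=27):
    """Expected number of starts per game in which no batter succeeds.

    reach_rate is the per-batter chance of success (on-base percentage for a
    perfect game, batting average for a no-hitter), assumed independent.
    """
    return 1 / (1 - reach_rate) ** batters


average = starts_per_clean_game(0.320)    # league-average on-base percentage
good = starts_per_clean_game(0.290)       # 30 points better
bad = starts_per_clean_game(0.350)        # 30 points worse
no_hitter = starts_per_clean_game(0.255)  # league batting average

print(f"average pitcher: one perfect game per {average:,.0f} starts")
print(f"good pitcher:    one per {good:,.0f} starts")
print(f"bad pitcher:     one per {bad:,.0f} starts")
print(f"no-hitter:       one per {no_hitter:,.0f} starts")
print(f"no-hitter / perfect game ratio: {average / no_hitter:.1f}")
print(f"years between perfect games at 4,860 starts/year: {average / 4860:.1f}")
```

The results land close to the rounded figures above. Moving the rate by 30 points in either direction changes the odds by a factor of three or so, which is why the tails are so sensitive.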
Great idea, and consistent with my growing skepticism of all published research given a general bias towards positive results.
If you’re a psychologist, the news has to make you a little nervous—particularly if you’re a psychologist who published an article in 2008 in any of these three journals: Psychological Science, the Journal of Personality and Social Psychology, or the Journal of Experimental Psychology: Learning, Memory, and Cognition.
Because, if you did, someone is going to check your work. A group of researchers have already begun what they’ve dubbed the Reproducibility Project, which aims to replicate every study from those three journals for that one year. The project is part of Open Science Framework, a group interested in scientific values, and its stated mission is to “estimate the reproducibility of a sample of studies from the scientific literature.” This is a more polite way of saying “We want to see how much of what gets published turns out to be bunk.”
I have written a number of times before that having only a few page-limited scientific journals is creating a bias towards positive results that can't be replicated:
During a decade as head of global cancer research at Amgen, C. Glenn Begley identified 53 “landmark” publications — papers in top journals, from reputable labs — for his team to reproduce. Begley sought to double-check the findings before trying to build on them for drug development.
Result: 47 of the 53 could not be replicated. He described his findings in a commentary piece published on Wednesday in the journal Nature.
This is not really wildly surprising. Consider 20 causal relationships that don’t exist. Now consider 20 experiments to test for this relationship. Likely 1 in 20 will show a false positive at the 95% certainty level — that’s what 95% certainty means. All those 1 in 20 false positives get published, and the other studies get forgotten.
Actually, XKCD did a better job of making this point. It's a big image so I won't embed it but check it out.
Also, Kevin Drum links a related finding that journal retractions are on the rise (presumably from false positives that could not be replicated or were the results of bad process).
In 1890, there were technological and cost reasons why only a select few studies were culled into page-limited journals. But that is not the case today. Why do we still tie science to the outdated publication mechanism? Online publication would allow publication of both positive and negative results. It would also allow mechanisms for attaching critiques and defenses to the original study as well as replication results. Sure, this partially breaks the academic pay and incentive system, but I think most folks are ready to admit that it needs to be broken.
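The 1-in-20 argument above is easy to put numbers on. Here is a small Python sketch, assuming 20 independent tests of relationships that do not actually exist, each run at the 95% level (a 5% false-positive rate per test). The function name is hypothetical and this is back-of-envelope arithmetic, not a model of any real journal.

```python
def false_positive_summary(num_tests=20, alpha=0.05):
    """Expected false positives, and the chance of at least one, across
    num_tests independent tests of effects that are not real."""
    expected = num_tests * alpha
    at_least_one = 1 - (1 - alpha) ** num_tests
    return expected, at_least_one


expected, at_least_one = false_positive_summary()
print(f"expected false positives: {expected:.1f}")
print(f"chance of at least one:   {at_least_one:.0%}")
```

So with 20 tries at nothing, you expect one publishable "finding" on average and have roughly a two-in-three chance of getting at least one. If only the positives make it into print, the published record is made of exactly these.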
This is a pretty well-known non-secret among about anyone who does academic research, but Arnold Kling provides some confirmation that there seems to be a tremendous bias towards positive results. In short, most of these can't be replicated.
A former researcher at Amgen Inc has found that many basic studies on cancer -- a high proportion of them from university labs -- are unreliable, with grim consequences for producing new medicines in the future.
During a decade as head of global cancer research at Amgen, C. Glenn Begley identified 53 "landmark" publications -- papers in top journals, from reputable labs -- for his team to reproduce. Begley sought to double-check the findings before trying to build on them for drug development.
Result: 47 of the 53 could not be replicated. He described his findings in a commentary piece published on Wednesday in the journal Nature.
"It was shocking," said Begley, now senior vice president of privately held biotechnology company TetraLogic, which develops cancer drugs. "These are the studies the pharmaceutical industry relies on to identify new targets for drug development. But if you're going to place a $1 million or $2 million or $5 million bet on an observation, you need to be sure it's true. As we tried to reproduce these papers we became convinced you can't take anything at face value."...
Part way through his project to reproduce promising studies, Begley met for breakfast at a cancer conference with the lead scientist of one of the problematic studies.
"We went through the paper line by line, figure by figure," said Begley. "I explained that we re-did their experiment 50 times and never got their result. He said they'd done it six times and got this result once, but put it in the paper because it made the best story. It's very disillusioning."
This is not really wildly surprising. Consider 20 causal relationships that don't exist. Now consider 20 experiments to test for this relationship. Likely 1 in 20 will show a false positive at the 95% certainty level -- that's what 95% certainty means. All those 1 in 20 false positives get published, and the other studies get forgotten.
To some extent, this should be fixable now that we are not tied to page-limited journals. Simply requiring as a grant condition that all findings be published online, positive or negative, would be a good start.