In the most recent edition of PhysicsWorld, there are two articles that on the face of it have little to do with each other: one is about Jan Hendrik Schön, the physicist formerly famous for creating the first organic superconductor and the first single-molecule transistor, and now most famous for having simply made up all of those results out of thin air, in one of the greatest cases of scientific fraud in physics.
The other article is about how the internet is transforming scientific communications, looking at which new means of scientific communication failed (such as Physics Comments and scientists contributing to Wikipedia -- although Scholarpedia is taking off quickly at the moment, probably because its signed and peer-reviewed authorship model is more in line with academic customs than Wikipedia's semi-anarchistic one) and which succeeded (the arXiv) in making the dissemination of scientific results quicker and more transparent.
On closer inspection, however, the two topics are closely intertwined.
Schön's deception was only possible because the researchers who tried and failed to replicate his results didn't have access to his primary data. Once doubts had been raised over the appearance of two completely identical graphs supposedly representing two completely different sets of experimental data, Schön's primary data were subjected to close scrutiny and were found to be non-existent -- his lab books had been destroyed, and his samples were damaged beyond recovery.
This raises the question of whether it would have been possible to even contemplate such a fraud in an environment where scientists are genuinely expected to hide nothing, and in particular to make their primary data publicly available after publication.
The more radically open schemes, such as the open notebook science proposed and practiced by fellow scientific blogger Jean-Claude Bradley, where raw data are being made public before publication, are unlikely to take off largely because of concerns over the enormous plagiarism potential. But once results have been published and priority has thus been established by the original authors, there is no immediately obvious reason not to allow other researchers to perform their own analyses of the primary data, either to confirm (or possibly to refute) the original analysis, or to use their own methods to obtain results from the data that the original authors didn't (either because they weren't interested or because they didn't have the relevant analysis methods at their disposal). Some access controls are needed, of course, in order to ensure that the later researchers will duly acknowledge the use of the original group's datasets.
It is hard to see how a fraud like the Schön case could have occurred under a scheme like this; the groups who wasted years on trying to replicate his results to no avail would likely have realised the fraud if they had had access to Schön's lab books.
Just like with the arXiv (which started out as a specialised High Energy Physics preprint server and now has revolutionised publishing in most of physics and mathematics, and in some parts of computer science, biology and finance), particle physicists are pushing ahead with schemes to open access to raw data: in lattice QCD with dynamical fermions, the most computationally expensive step is generating the configurations of gauge fields that are then further analysed to obtain answers for the masses and other properties of hadrons.
Many groups that have very interesting ideas of what particles and phenomena to study with which new methods simply cannot afford to generate their own unquenched ensembles of gauge configurations (we are talking many teraflop-months here), and would be stuck with the quenched approximation (which amounts to ignoring the effects of dynamical quarks) if it weren't for the fact that an increasing number of collaborations make their ensembles available after performing their own initial analysis.
Configurations have been available for a while at The Gauge Connection (the name is a pun that only particle theorists will appreciate), and are now quickly beginning to be available on the International Lattice Data Grid (ILDG). This way the many CPU cycles that have been invested in generating these ensembles are put to even better use by enabling other groups to run their analyses on them.
Just like in the case of the arXiv, it may take a while for other disciplines to follow suit, but it appears likely that if and when more and more scientists choose to make their raw data public after publication (and those that don't therefore become increasingly subject to suspicion by their peers), a fraud case like that of Jan Hendrik Schön will become quite impossible at some point in the future.
Comments
Georg von Hippel | 05/18/09 | 03:35 AM
Well, it turns out there are two such sites, named exactly the same way. The one you were referring to, and the blog by Marco...
Cheers,
T.
Tommaso Dorigo | 05/18/09 | 04:32 AM
arXiv might be a good model for any site. Some people love it and others think the absolute rubbish thrown on there drags down its overall reputation. I think that a somewhat caveat emptor mentality is key, but for the most part a smart audience acts as a moderator of quality.
No tenure or grant committee looks down on a candidate because they put something on arXiv even though some real crap has been on there. Likewise, I think Schön would have been tripped up faster in a place like this than by what actually happened: a group of scientists who were looking for private-sector funding, and who used his work as an example of how cutting-edge their research was, had to actually document it (because capitalists don't get star-struck as easily as big-media peer-reviewed journals).
Hank Campbell | 05/18/09 | 10:27 AM
arXiv might be a good model for any site.
I'd be interested in knowing how some long-time arXiv users feel about how it's changed over the years - is there more crap there now? What about the sheer volume of material? How do you physicists who read arXiv select what to read?
In genomics, I have a hard time keeping up with all of the published stuff. It doesn't help that in genomics, the culture is frequently such that paper titles are overly general and abstracts too vague and detail-free, so it's hard to focus in on the relevant stuff by just looking at titles and abstracts.
I think the biomedical research community is larger than the physics community (at least by grant dollars, the NIH is bigger than NSF, NASA, and DOE combined, plus DOE funds a lot of genomics research) - do some of these open science models run into problems as the research community gets larger?
Michael White | 05/19/09 | 11:59 AM
The volume of rubbish has actually been steadily decreasing for a while now, because new submitters without an official academic affiliation need an endorsement from an established researcher in order to submit to the arXiv for the first time, and because there is a moderation system that will filter out clearly unsuitable contributions. Details can be found here and here.
As for the volume of stuff, that can be a problem, but that is generally solved by splitting the subject areas down further -- see the news near the top here. Being able to tell whether a paper is likely worth reading from the abstract is important, however.
Georg von Hippel | 06/01/09 | 06:57 AM
A model like this can work for physics at least. I would be suspicious of any scientist who was not willing to share their data after they have published their findings.
This can even work for the theoretical side of things; take, for example, the online journal "Living Reviews in Relativity". I like their policy/concept of a "living article". They say:
"The key feature of Living Reviews is the concept of a living article. This means that we will take advantage of the ease of editing electronic material to keep articles as up-to-date as we can."
They encourage updates and errata to be published along with the articles. None of that would be a bad idea.
The reason things like this don't catch on in physics is institutional inertia. Traditional thinking sees many online resources as unreliable just because they are online.
Hontas Farmer | 05/19/09 | 17:02 PM
I'm no particle theorist, but I appreciated Marco Frasca's pun the second I read his blog's title!
That said, I took a 120-hour QFT course 13 years ago, but I forgot most of it... Culture is what remains when one forgets things.
Cheers,
T.