By Tommaso Dorigo | June 7th 2009 11:32 AM

About Tommaso Dorigo

I am an experimental particle physicist working with the CMS experiment at CERN and the CDF experiment at Fermilab. In my spare time I play chess, abuse the piano, and aim my Dobson telescope at...

It is a well-known fact that it is much easier to measure a physical quantity than to correctly assess the magnitude of the uncertainty on the measurement: the uncertainty is everything!

A trivial demonstration of the above fact is the following. Suppose you are measuring the mass of the top quark (why, I know you do it at least once a week, just to keep mentally fit). You could say you have no idea whatsoever of what the top mass is, but you are capable of guessing, and your best guess is that the top mass is twice the mass of the W boson: after all, you have read somewhere that the top quark decays into a W boson plus other stuff, so a good first-order estimate is 2 x 80.4 = 160.8 GeV.

With your guess, you would have gotten quite close to the real value (which the PDG tells us is 173.1 GeV, give or take 1.5 GeV) by sheer luck. But since a measurement is not complete without a determination of its uncertainty, you are now in a bind. What is the uncertainty on your estimate? Of course you could guesstimate that too, for instance +-80.4 GeV: after all, the top mass could be three times the W mass, but not less than the W mass itself, so 2+-1 W masses sounds right.

Now, from a scientific standpoint, the central value of your "measurement" might well be legitimate (although nit-pickers would call it an indirect one), but the uncertainty is much less so. That is because the uncertainty is more important than the central value itself: we rely on the former to know how much we can trust the latter. The point is that even a guessed measurement is valuable if its uncertainty is correctly assessed, that is, if the uncertainty reflects the possible range of values that our measurement could have taken (and their relative probability), had we redone it, or done it with a different instrument. My bottom line is that guessing the uncertainty is more scientifically reproachable than guessing the central value.

With the above argument I may or may not have convinced the seven of you who were not sleeping. Not to worry: to make my point stronger on how the assessment of systematic uncertainties is crucial in scientific measurements, in this article I am going to reverse engineer the top quark mass measurements that the CDF and DZERO collaborations have produced since 1994.

A simple exercise, an intriguing result

We have the luxury of being able to look back on 15 years of determinations of the top quark mass: more than 30 independent or partly-dependent results have been produced on it. They are summarized in the top quark section of the Review of Particle Properties, the bible of particle physicists, a thick book which contains everything that has been measured on subatomic particles.

With so many measurements, performed in part at a time when the real value of the top quark mass was not as well known as it is now, and with the knowledge of the true top mass (the value is known to within a +-1.5 GeV interval, thanks to a careful average of the most precise new numbers published by the Tevatron experiments), we can assess whether the systematic uncertainties attached by CDF and by DZERO to their published top mass determinations were probably underestimated, overestimated, or roughly ok.

How can we do that? Simple. First of all, let us take from the PDG the results by CDF and DZERO which have not been used to compute the current World average. Here they are:



In the above list appear combinations and other review numbers. We need to take those out, since they are strongly correlated with other numbers in the list. A total of 11 CDF measurements and 6 DZERO measurements remain. That is our base of data.

What do we do with those numbers? Consider the second one, which quotes a central value of 177.1 GeV followed by two uncertainties: the first quoted uncertainty, 4.9 GeV, is statistical, and comes from the fitting method and the size of the dataset; it is usually correct, and not really hard to determine. The second one, instead, is systematic, and it is the one which the experiments had to fight hard to assess, thinking of all the possible sources of error and bias.

Now, from the World average we know that the real top mass is 173.1 GeV with a 1.5 GeV uncertainty: so we pick at random a "true" mass value from a Gaussian function of average 173.1 GeV and width 1.5 GeV, and similarly pick at random a "measured" mass value from another Gaussian function of average 177.1 GeV and width of 4.9 GeV. We thus leave out the systematic uncertainty part, which we want to study.

We then proceed to compare the two numbers: say we got a true value of 172.5 GeV, and a measured one of 178.5 GeV. We can divide the difference, 6.0 GeV, by the total uncertainty, which includes the systematic term and comes to about 7.0 GeV: we get a "normalized deviation" of +0.86 sigma. This means that in the considered case the "measurement" was in good agreement with the true value, once uncertainties are properly accounted for.
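The arithmetic of a single normalized deviation can be sketched in a few lines of Python. The function name `normalized_deviation` is made up for this illustration, and the 4.7 GeV systematic uncertainty is an assumption: it is chosen because, added in quadrature to the 4.9 GeV statistical term and to the 1.5 GeV uncertainty on the World average, it gives a total close to 7.0 GeV. Whether the total used in the post also includes the World average term is likewise an assumption.

```python
import math

def normalized_deviation(measured, true, stat, syst, wa_unc):
    """Difference between measured and true value, in units of the total uncertainty."""
    # Total uncertainty: quadrature sum of statistical, systematic,
    # and world-average terms (the last one is an assumption of this sketch).
    total = math.sqrt(stat**2 + syst**2 + wa_unc**2)
    return (measured - true) / total

# Numbers from the example in the text; syst=4.7 GeV is an assumed, illustrative value.
pull = normalized_deviation(measured=178.5, true=172.5, stat=4.9, syst=4.7, wa_unc=1.5)
print(f"{pull:+.2f} sigma")  # prints "+0.86 sigma"
```

A pull well below 1 in absolute value, as here, says that the difference is smaller than the quoted total uncertainty.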

One single run of the above exercise says next to nothing about the uncertainties. However, if we do the same thing many times, for all the measurements that the CDF or the DZERO experiments produced of the top quark mass, we get a long collection of "normalized deviations", which starts to acquire some meaning. In fact, if the systematic uncertainties of those measurements were assessed correctly, one would expect the resulting distribution of deviations to be a perfect Gaussian, with a width of 1.0.

Instead, in the case when one experiment tended to under-estimate its systematic uncertainties, one would find a distribution with a width larger than 1.0; in the opposite case, of an over-conservative experiment, the resulting distribution would be narrower than the standard Gaussian.
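The relation between the width of the distribution and the quality of the quoted systematics can be checked with a small toy. What follows is a simplified stand-in, not the code behind the plots in this post: the true mass is taken as exact, the systematic effect is modelled as a Gaussian shift that changes from one toy measurement to the next, and all the numerical values (4.0, 2.0, 8.0 GeV) as well as the function name `pull_width` are invented for the illustration.

```python
import math
import random
import statistics

def pull_width(stat, true_syst, quoted_syst, n_toys=20000, seed=42):
    """Width of the pull distribution for toy measurements.

    Each toy scatters around the true value with the real statistical and
    systematic spread, but its pull is computed with the quoted systematic.
    """
    rng = random.Random(seed)
    true_mass = 173.1  # GeV, treated as exactly known in this sketch
    quoted_total = math.sqrt(stat**2 + quoted_syst**2)
    pulls = []
    for _ in range(n_toys):
        measured = true_mass + rng.gauss(0.0, stat) + rng.gauss(0.0, true_syst)
        pulls.append((measured - true_mass) / quoted_total)
    return statistics.pstdev(pulls)

# Same real spread in all three cases; only the quoted systematic changes.
for label, quoted in [("correct", 4.0), ("underestimated", 2.0), ("overestimated", 8.0)]:
    width = pull_width(stat=4.0, true_syst=4.0, quoted_syst=quoted)
    print(f"{label:>14}: width = {width:.2f}")
```

With these inputs the expected widths are 1.0 for the correct case, sqrt(32/20), or about 1.26, for the underestimated one, and sqrt(32/80), or about 0.63, for the overestimated one; the toy should land close to those values, up to its own statistical fluctuation.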

Before going to the results, let me mention that this exercise is not too deep, and has several shortcomings. That is, I need to clarify here that all I am doing is to compare a bunch of results with the world average, and I do not claim that the very simple-minded procedure I have put together is too meaningful. I would be able to fill a few pages of text by listing the caveats (correlated systematics among different measurements, assumption of Gaussianicity, etcetera, etcetera, etcetera..., etcetera); however, the good thing about a blog is that one can be a bit more relaxed about what one publishes there. So here we go.

The Results

We first consider the DZERO experiment. We have only six top mass results with which to play, but with the toy Monte Carlo approach I have outlined above, we still get a rather smooth distribution of "pseudo-deviations". As you can see, the average is centred more or less at zero: this is not too meaningful by itself, since the six measurements will in general have an average that is above or below the world average mass of the top quark, and this will indeed be reflected in the average of the distribution you are looking at.

More interesting is to check the RMS, which is printed in the top right corner. We get a result of 1.136: larger than one. This means that on average, the six DZERO results underestimated their systematic uncertainties a bit. Not by too much, but significantly so: if we assume that statistical and systematic errors contribute equally to the width, the systematics must have been underestimated by 26% to give the result above. But do not forget the caveats I discussed previously...

Next, let us look at the CDF distribution: here, we get a slightly larger offset -0.33 standard deviations away from the World average; this is just a result of the way the 11 results used for this exercise were selected. Instead, the RMS of CDF is better: it is 1.064, definitely closer to 1.0. CDF, like DZERO, underestimates its systematic uncertainties on the top mass, but about half as much as DZERO.
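The step from an RMS to a percentage can be made explicit with a short calculation. The sketch below reads "contributing equally" as statistical and systematic terms each carrying half of the quoted total variance, and it ignores the uncertainty on the World average; both are assumptions, although this reading does reproduce the 26% figure. The function name `syst_scale_factor` is invented for the illustration.

```python
import math

def syst_scale_factor(pull_rms, syst_fraction=0.5):
    """Factor by which quoted systematics must grow to bring the pull width to 1.

    syst_fraction is the share of the quoted total variance due to systematics.
    """
    # Solve (1 - f) + f * k**2 = rms**2 for the scale factor k.
    k_squared = (pull_rms**2 - (1.0 - syst_fraction)) / syst_fraction
    return math.sqrt(k_squared)

for name, rms in [("DZERO", 1.136), ("CDF", 1.064)]:
    k = syst_scale_factor(rms)
    print(f"{name}: quoted systematics would need to grow by about {100 * (k - 1):.0f}%")
# DZERO: quoted systematics would need to grow by about 26%
# CDF: quoted systematics would need to grow by about 12%
```

Under the same assumptions the CDF figure comes out at roughly half the DZERO one, in line with the statement above.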

Ok, the exercise above is a bit too hand-waving to be really meaningful. All I meant to do with it today is to make the point that the scientific integrity of experiments does one day become apparent, to those who have eyes to see. From the a posteriori perspective, many things can be said of past measurements. In fact, the PDG already does something similar to what I have offered here today in its pages: a comparison of the measured value of several physical quantities is shown as a function of time, to highlight the biasing effect that the knowledge of past results has on the publication of new ones.

Take the example shown in the plot on the right as a telling case: it shows the mass of the Lambda hyperon as a function of time. The trend in the mass versus time does not mean that the Lambda fattened in the last forty years, but rather that past determinations clearly influenced the following ones... I can imagine somebody measuring the Lambda mass to be significantly higher than the previously known World average, and then saying: "It is too high! We certainly have got the scale of our magnetic fiend wrong..."

Plots such as the one above are a very clear warning to experimenters: trust your data, not external inputs!

Comments

logicman's picture
With the above argument I may or I may not have convinced the seven of you who were not sleeping.

Make that 7.5 Tommaso, I'm only half asleep. 

That's an interesting observation on the psychology of estimation.  Conformity is definitely a subtle influence in our lives.  Have you read about Solomon Asch's experiments?

dorigo's picture
Hello Patrick - no, I haven't, but will now look that up.
T.

dorigo's picture
YouTube is wonderful. In a second I found this video of Asch's experiment: http://www.youtube.com/watch?v=R6LH10-3H8k

T.

logicman's picture
Thanks, Tommaso!  I'd not seen that video before.  It brings those dry and dusty old papers to life.  Some time soon I hope to write about how language works through its rules of conformity.  The video will come in handy, and I'll thank you again in a footnote.  Kudos!

dorigo's picture
Sure, and thank you for having brought that up. I had heard of Asch's experiments, but then forgotten about them. Instead, they should be remembered, for the simplicity of the setup and the clarity of the interpretation.

Cheers,
T.

A plot of neutron lifetime vs. publication date is much more dramatic for demonstrating "trust your data, not external inputs".

dorigo's picture
Sure, that is another good example. There are several, in fact. I liked the Lambda mass one, but indeed the most striking one, in terms of the variation of the measured value, is the neutron lifetime.
Cheers,
T.

"It is too high! We certainly have got the scale of our magnetic fiend wrong..." Move over, Maxwell's demon!

logicman's picture
"It is too high! We certainly have got the scale of our magnetic fiend wrong..." Move over, Maxwell's demon!

Dotwatcher!   :)

dorigo's picture
Hah! I think I will leave the typo where it is...
Cheers,
T.

Interesting study! But it seems to me that it is a bit optimistic to divide by the total error when determining your pulls. Many of your systematics will be pretty correlated with the world average mass (e.g., if you are underestimating your top mass due to out-of-cone jet energy effects, then you will be doing so for both your world average mass and for each individual measurement). A more pessimistic but still simple approach would be to include only statistical uncertainties when determining the value to divide by, while I suspect the truth would be somewhere in between the two extremes.

Yes, I realize that this is one of the caveats you listed. I would be curious to know if you happened to try something like this more pessimistic approach in your trial runs.

dorigo's picture
Hi Ford,

well, I agree that there may be correlations between the WA and each measurement even if they are not used in the average, and the source of syst. you mention is a plausible one. However, dividing by the stat error alone would be tantamount to neglecting altogether the very systematic error bars that the experiments have used, which are the objective of the study. I believe we cannot go much farther than what I have put together above... Unless one takes a _much_ more careful approach, which involves studying in detail how each systematic in each measurement was determined.

Cheers,
T.

Hi Tommaso,

In recent top mass WA combinations the correlations between systematics play a pretty big role (latest: http://www-cdf.fnal.gov/physics/new/top/2009/mass/tevcombination_march/ ). I would guess they were accounted for in the older measurements that the PDG considers as well, but I could be wrong. If someone wanted to do this more precisely, it should be possible to break up the systematics for each of these measurements into the same error categories as are used in the combination note, and apply the agreed-upon correlation coefficients. Of course, that would take a lot of tedious work to look everything up, but maybe someone in the top group should try it at some point.

At any rate, my goal is not to nag. I agree that this is necessarily a somewhat hand-wavy study, and the results were interesting to see. I just thought I should point out that the correlations do play a pretty big role (at least for modern measurements) and I think neglecting them will bias you somewhat to overestimate the level of agreement with the world average number.
