The York Fit, MSWDs, and Geochronology

I am by no means a statistician. There are a variety of tests and techniques I use, most of which I mostly understand. This post is about one of those techniques that I became familiar with while working in 40Ar/39Ar labs, and trying to write some of my own data reduction software. One thing I noticed was that although this statistic is rather common in the earth sciences, there wasn’t a single place that explained the steps required to calculate or interpret the results. So, here it goes…

There are a variety of methods for calculating a best-fit line through a set of data.  The most common methods are the least squares and weighted least squares techniques. The least squares method allows for the calculation of a best fit line, but does not consider errors in the measurements, while the weighted least squares method considers only errors in one axis or measurement. For geochronologic applications, it is important to consider errors in measurements in both the x- and y-directions.  The following method employs a set of equations derived by Derek York, originally for use with Rb-Sr isochron diagrams, but later employed in a variety of geochronologic systems, including 40Ar/39Ar and U-Th-Pb.  The basic idea of the York fit is to produce a line that minimizes the distance of each data point from the line, while weighing the precision of each point, as shown below

mswd

The above diagram shows the basics of the York fit. First of all, the goal is to come up with the line that minimizes the distance from your data point to the line.  The average distance of each data point (xi,yi) to the best fit line (point Xi,Yi on line Y=aX+b) is the value that should be minimized.  In addition, more weight, or importance in the fit, is given to samples with higher precision.  This minimization parameter S is therefore defined by York (1969) as

1     (1)

where the weight of each data point is defined as

2     (2)

Calculation of the best-fit line using this method is an iterative process, where you start with a reasonable approximation for the slope (b) and y-intercept (a), as well as the correlation coefficient (r) for the data errors.  Reasonable starting values can be calculated from a least squares fit to the data, which does not take into account the errors in the x- and y-direction (Eq.15 and 16).  The correlation coefficient between the errors is given in equation 16.  Equations 3-8 can be used to calculate the slope (b), y-intercept (a), and x-intercept (c) from a given data set. Once the new value for b is calculated, it is reinserted into the equation, and the values are calculated again.  The answers will eventually converge, depending upon the desired precision, scatter in the data, and quality of the original estimate

3     (3)

4a , and   4b (4)

5     (5)

6     (6)

7     (7)

8   (8)

Uncertainties for the a and b values calculated above are given by McDougall and Harrison (1997) as

9_b     (9)

10_b     (10)

The Mean Square of the Weighted Deviates (MSWD) is the parameter used to assess the goodness of fit of the calculated line to the data.  It is the ratio of the minimization parameter calculated above (S) to the number of degrees of freedom in the system (in this case, n-2). The MSWD cannot be compared to other parameters, such as r2, that are typically used to describe the goodness of fit of a line.  The MSWD is unitless, and is normalized to the input values (this is not a measure of absolute error).  The standard deviation of the MSWD is given by equation 12.  Equation 13 is the target MSWD for a given data set. If the calculated MSWD is less than the value calculated in equation 13, then there is a > 5% probability that the data are adequately described by the line, and are generally accepted. The target MSWD is typically around 2, but varies with the number of data points. As the number of data points increases, the target MSWD approaches 1.  An MSWD << 1 indicates that the reported errors are most likely overestimations of the true error.

11     (11)

12     (12)

13     (13)

14     (14)

15     (15)

16     (16)

Eq.1-11 : McDougall I., and Harrison, T.M., 1999, Geochronology and Thermochronology by the 39Ar/40Ar Method, Oxford University Press, New York, 269 p.

Eq. 12-13 : Wendt, I. and Carl, C., 1991, The statistical distribution of the mean squared weighted deviation, Chemical Geology (Isotope Geoscience Section), v.86, pp.275-285

Eq. 14-16 : Taylor, J.R., 1997, An introduction to error analysis, the study of uncertanties in physical measurements, University Science Books, Sausalito, CA, 327 p.

Example: Sample data set and associated calculations from an 40Ar/39Ar analysis. Use the X and Y data and the associated uncertainties to calculate fits and statistics. Answers below!

X

X stdev

Y

Y stdev

2.7023E-03

2.7102E-05

3.0800E-03

2.7400E-05

3.5143E-03

4.6900E-05

3.0226E-03

4.6800E-05

1.1232E-02

5.7000E-05

1.8457E-03

2.0600E-05

1.7668E-02

7.0400E-05

1.0541E-03

1.3000E-05

2.3694E-02

2.8400E-04

1.3909E-04

1.3900E-04

1.8766E-02

5.8708E-05

8.8394E-04

1.7017E-05

2.1217E-02

6.8546E-05

5.7748E-04

1.4818E-05

2.5336E-02

1.2869E-04

2.4434E-06

2.5362E-05

1.8063E-02

4.8321E-05

9.8585E-04

1.5664E-05

1.9546E-02

5.5549E-05

8.0004E-04

1.5398E-05

1.7832E-02

6.9203E-05

1.0204E-03

1.9451E-05

1.2708E-02

4.1328E-05

1.7162E-03

2.0225E-05

1.2232E-02

3.9281E-05

1.8275E-03

1.9135E-05

1.1136E-02

2.8326E-05

1.9530E-03

1.3782E-05

1.1843E-02

2.1066E-05

1.8550E-03

7.7317E-06

1.3436E-02

1.5847E-05

1.6080E-03

6.0629E-06

r = 0.898

a = 3.45E-3 ± 1.48E-5

b = -0.1362 ± 1.01E-3

c = 0.02535

MSWD = 2.12

Target MSWD = 1.76

Bad Fit (P < 5%)

This entry was posted in dealing with data, geochronology, Primers, thermochronology, Uncategorized. Bookmark the permalink.

9 Responses to The York Fit, MSWDs, and Geochronology

  1. Ajoy Baksi says:

    There are a large number of publications looking at the York II fit (as we call it) to 40Ar/39Ar isochrons. Ramifications of MSWD values, slopes and intercepts, have been looked at in quite a few publications.

  2. F. Scott says:

    I think you may have forgotten to define the parameter f.

  3. Y. Zhang says:

    This is a very useful summary! Do you have the expressions for the case of zero intercept (that is, the intercept a = 0)? If so, I would appreciate these.

  4. Y. Zhang says:

    Continuing from my message above, I would appreciate the approximate expression for sigma_b (that is, eq. 9 in this post for the case of a=0).

  5. Thanks for the comments! I don’t have those expressions handy, but do you mean a case where the line is forced through the origin? If the calculated origin is 0, I don’t see why different equations would be needed, but if you are trying to force the line through the origin, or know if must pass through is a priori, I’m not sure how to deal with that!

  6. Y. Zhang says:

    Yes, I mean that the line is forced through the origin because it must pass through the origin. Hence, in eq. (1), a = 0. It is relatively easy to obtain the expression of b by minimizing S, but finding a good approximation of the error on b (analogy to eq. 9) is difficult.

  7. Cjay says:

    Very helpful. Thank you

  8. Before asking my question, I just would like to say your post is amazing and absolutely important. There is a huge number of professional who works with geochronology and have no idea about the calculations! The results are poor conclusions or even meaningless ages. Well, my question: what is the definition of rhoi, in equation 2? Where can I find it?

    Cheers,
    Felipe

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s