Mediant Fractions and Simpson's Paradox
Given two fractions a/b and c/d, the mediant of the two is defined as the fraction (a + c)/(b + d).
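As a minimal sketch in Python, the definition can be written down directly. The function name `mediant` and the representation of a fraction as a `(numerator, denominator)` pair are assumptions made for illustration. Pairs are used rather than `fractions.Fraction` because the mediant depends on the particular numerators and denominators, not only on the values of the fractions.

```python
def mediant(x, y):
    """Return the mediant of two fractions given as (numerator, denominator) pairs."""
    a, b = x
    c, d = y
    return (a + c, b + d)


# The mediant depends on the representation, not just on the value:
print(mediant((1, 2), (1, 3)))  # (2, 5)
print(mediant((2, 4), (1, 3)))  # (3, 7), although 2/4 equals 1/2
```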
Many would point to the mediant fraction as a dangerous concept that is bound to confuse students, who often quite innocently produce it when adding two fractions. However, the mediant has its uses and was in fact considered twice by Euclid, in Elements V.12 and VII.12, in the following context:
If a/b = c/d, then a/b = (a + c)/(b + d) = c/d
D. H. Fowler remarked in a 1991 article (quoted by Scott B. Guthery, p. 32) that
... the Greeks who reasoned in proportions, might have had difficulty reasoning in decimal notations; we, in turn, who are so thoroughly steeped in decimals, seem to have difficulty reasoning in proportions. This may be so ingrained that we may actually find some of that reasoning paradoxical.
(Fowler and Guthery referred to what in the 1950s emerged as Simpson's Paradox. I'll return to this shortly.)
Mediant fractions indeed have a long history, which has been lucidly and exhaustively described by S. B. Guthery. They appear most remarkably in the Farey series (probably better referred to as sequences, because each of them is a finite collection of fractions).
The Farey series FN is the set of all fractions in lowest terms between 0 and 1 whose denominators do not exceed N, arranged in order of their magnitudes. For example, F6 is
0/1, 1/6, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 5/6, 1/1
Farey was a British geologist who in 1816 published a statement to the effect that in the Farey series the middle of any three successive terms is the mediant of the other two. The first proof of this was supplied by Cauchy.
Just to verify the claim, observe, for example, that 1/4, 1/3, 2/5 are three successive terms of F6 and (1 + 2)/(4 + 5) = 3/9 = 1/3.
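The claim can also be checked mechanically. The Python sketch below builds a Farey sequence by brute force and tests every triple of successive terms. The function names `farey` and `middle_is_mediant` are hypothetical, and the brute-force construction is chosen for clarity rather than efficiency.

```python
from fractions import Fraction


def farey(n):
    """Return the Farey sequence F_n as a sorted list of Fractions in [0, 1]."""
    terms = {Fraction(p, q) for q in range(1, n + 1) for p in range(q + 1)}
    return sorted(terms)


def middle_is_mediant(seq):
    """Check that each interior term equals the mediant of its two neighbours."""
    return all(
        Fraction(lo.numerator + hi.numerator, lo.denominator + hi.denominator) == mid
        for lo, mid, hi in zip(seq, seq[1:], seq[2:])
    )


f6 = farey(6)
print(", ".join(f"{t.numerator}/{t.denominator}" for t in f6))
print(middle_is_mediant(f6))
```

Note that the mediant of the two neighbours need not come out in lowest terms; `Fraction` reduces it before the comparison.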
Mediant fractions furnish a mechanism to define the Stern-Brocot tree. They are used in rational approximation alongside continued fractions and for solving Diophantine equations. Some of the features of the Farey series relate to the Riemann Hypothesis.
Mediant fractions possess a fundamental property that also explains the term, to wit:
If a/b < c/d, then a/b < (a + c)/(b + d) < c/d
The property is easily verified. It has a simple geometric interpretation: the direction (slope) of the sum of two vectors is somewhere in-between the slopes of the two addends.
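The verification amounts to cross-multiplication: for positive b and d, a/b < c/d means ad < bc, from which a(b + d) < b(a + c) and (a + c)d < (b + d)c follow. As a sanity check, the Python sketch below tests the inequality on all small positive numerators and denominators; the helper name `between` and the range of values tried are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product


def between(a, b, c, d):
    """True if a/b < (a + c)/(b + d) < c/d, using exact rational arithmetic."""
    return Fraction(a, b) < Fraction(a + c, b + d) < Fraction(c, d)


# Exhaustive check over small positive numerators and denominators.
checked = 0
for a, b, c, d in product(range(1, 8), repeat=4):
    if a * d < b * c:  # equivalent to a/b < c/d when b, d > 0
        assert between(a, b, c, d)
        checked += 1
print(checked, "cases checked")
```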
Mediant fractions emerged in one of the problems offered at the 2009-2010 International Internet Mathematical Olympiad run by the Ariel University Center of Samaria (Israel):
Four glasses are given. The first one contains a certain amount of apple juice, the second peach juice, the third grapefruit juice and the fourth carrot juice. It is known that apple juice is sweeter (i.e., contains a higher concentration of sugar) than grapefruit juice, and peach juice is sweeter than carrot juice. Is it true that if we mix the contents of the first two glasses, the resulting mixture will be sweeter than the mixture of the contents of the third and the fourth glass?
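The answer is no, not necessarily so. To see why, let's introduce some symbols for the amounts of juices and their sugar contents. Let A and a be the amounts of apple juice and the sugar it contains. Similarly, let the pairs P and p, G and g, C and c stand for the amounts and sugar contents of the peach, grapefruit, and carrot juices. The conditions of the problem then read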
g/G < a/A and c/C < p/P.
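The mixture of apple and peach juices from the first two glasses will contain the quantity A + P of juice with a + p of sugar, so its concentration is the mediant (a + p)/(A + P). Likewise, the mixture from the third and fourth glasses has concentration (g + c)/(G + C).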
Now, remember that a fraction could be treated as a slope of a line through the origin. Any point on the line defines the same slope. In other words, a/A = (ka)/(kA) for any k > 0, whereas the mediant (ka + p)/(kA + P) does change with k.
Looking at the fractions as concentrations, we see that the concentration of a mixture may be anywhere between the concentrations of the two constituents.
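To illustrate, here is a small Python sketch with made-up numbers (the concentrations 1/4 and 1/2 and the helper name `mixture` are assumptions). Taking k times as much of the first juice leaves its concentration unchanged but drags the concentration of the mixture toward it.

```python
from fractions import Fraction


def mixture(sugar1, amount1, sugar2, amount2):
    """Concentration of the mix of two juices: the mediant of the two fractions."""
    return Fraction(sugar1 + sugar2, amount1 + amount2)


# Juice 1 has concentration 1/4, juice 2 has concentration 1/2.
# Taking k times as much of juice 1 keeps its concentration at k/(4k) = 1/4.
for k in (1, 2, 10, 100):
    print(k, mixture(k, 4 * k, 1, 2))
```

Every value printed lies strictly between 1/4 and 1/2, and it moves closer to 1/4 as k grows.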
To solve the problem, let's take four (simple) fractions of increasing magnitude, e.g., 1/5, 1/4, 1/3, 1/2. Observe that
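One possible way to finish the computation is sketched below in Python. The assignment of the four fractions to the juices and the amounts chosen are assumptions made for illustration, not necessarily the numbers originally intended. The idea is to take a lot of the less sweet juice in the first pair and a lot of the sweeter juice in the second.

```python
from fractions import Fraction

# (sugar, amount) pairs; all values are illustrative assumptions.
apple = (1, 2)         # concentration 1/2
peach = (10, 40)       # concentration 1/4
grapefruit = (10, 30)  # concentration 1/3
carrot = (1, 5)        # concentration 1/5


def concentration(*juices):
    """Sugar concentration of one juice or of a mixture of several."""
    sugar = sum(s for s, _ in juices)
    amount = sum(v for _, v in juices)
    return Fraction(sugar, amount)


# The hypotheses of the problem hold ...
assert concentration(apple) > concentration(grapefruit)
assert concentration(peach) > concentration(carrot)

# ... yet the apple-peach mixture is less sweet than the grapefruit-carrot one.
first = concentration(apple, peach)         # 11/42
second = concentration(grapefruit, carrot)  # 11/35
print(first, second, first < second)
```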
This is the essence of Simpson's Paradox that was first called to the attention of the scientific community by Karl Pearson, a statistician, in an 1899 paper [Guthery]. One of Pearson's co-workers, George Yule, wrote in 1903:
The fictitious association caused by mixing records finds its counterpart in the spurious correlation to which that same process may give rise in the case of continuous variables, a case to which attention was drawn and which was fully discussed by Professor Pearson in a recent memoir.
Both publications were largely overlooked until a 1951 paper by E. H. Simpson, who raised essentially the same concerns with no reference to either Pearson or Yule. But the real attention-getter was a 1972 paper, "On Simpson's Paradox and the Sure-Thing Principle", by Colin Blyth. The paper led to a surge in investigation and solidified the terminology: from then on the phenomenon became known as Simpson's Paradox, serving as another instance of the Law of Eponymy: no scientific discovery is ever named after its original discoverer.
The Greeks may not have thought of this as a paradox, but unless caution is exercised, statistical reasoning may lead to unexpected conclusions. There are many examples of that sort. I'll give another one from [Guthery].
The death rate of males in the Navy (3/5) is less than the death rate of males in the Army (8/13).
The death rate of females in the Navy (7/10) is less than the death rate of females in the Army (5/7).
The death rate of soldiers ((8 + 5)/(13 + 7) = 13/20) is less than the death rate of sailors ((3 + 7)/(5 + 10) = 2/3).
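The three comparisons can be checked with exact arithmetic. In the Python sketch below, the figures are those quoted above; reading each fraction as a (deaths, population) pair and the helper name `rate` are assumptions for illustration.

```python
from fractions import Fraction

# (deaths, population) pairs as quoted in the text.
navy = {"males": (3, 5), "females": (7, 10)}
army = {"males": (8, 13), "females": (5, 7)}


def rate(*groups):
    """Death rate of one group, or of several groups pooled together."""
    deaths = sum(d for d, _ in groups)
    total = sum(n for _, n in groups)
    return Fraction(deaths, total)


for group in ("males", "females"):
    # Within each group the Navy rate is the lower one.
    print(group, rate(navy[group]) < rate(army[group]))

# Pooled over both groups the comparison reverses.
print(rate(*army.values()), rate(*navy.values()))
print(rate(*army.values()) < rate(*navy.values()))
```

The pooled rates are exactly the mediants 13/20 and 2/3, which is why the reversal is possible even though each subgroup comparison favors the Navy.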
References
- D. H. Fowler, An Approximation Technique, and its Use by Wallis and Taylor, Archive for History of Exact Sciences, vol. 41, no. 3, pp. 189-233, September, 1991
- S. B. Guthery, A Motif of Mathematics, Docent Press, 2010