Problems
The Gwodyen DDJ

Here is a real-life actual genuine example of a statistical problem. The Dau/Dv Jing (DDJ) we know has 81 chapters. In 1993, three sets of extracts from the DDJ were recovered from a tomb at a place called Gwodyen. All DDJ chapters represented in those three extracts were from the range DDJ 2-66. At a conference held in 1998, in connection with the publication of the Gwodyen texts, several theories were presented about the nature of the DDJ from which the extracts were presumably taken. The two main theories were:

(1) The DDJ was written all at once (by Laudz in about 0500), and has always had 81 chapters, just like our present text. Therefore, the DDJ on which the Gwodyen florilegia drew must have had 81 chapters too.

(2) The DDJ was an accretional text, begun about 0350, and it was still growing when it was drawn on, about 0288, for the Gwodyen florilegia. It thus contained, at that time, fewer than the eventual 81 chapters. Previously published statements of Theory 2 came down to this: a DDJ in the year 0288 ought to be complete to DDJ 65, and might contain anything up to DDJ 70, but probably nothing beyond DDJ 70.

As things actually happened, the second theory was never discussed at the conference. But in this statistical context we are free to ask: What if it had been?

Problem as Given

Theory 2 obviously predicts an outcome very close to what the Gwodyen data show. It is thus a plausible theory. Theory 1, also obviously, fits the Gwodyen data less well. The point to submit to statistical analysis is whether the scenario of Theory 1 is also within the limits of chance variation. In that case Theory 1 may be less likely, but both remain well within the realm of the possible, and neither can be rejected.

Theory 1 has this scenario: The source text contained 81 chapters. The 33 extracts in the three florilegia, though with one exception they do not duplicate each other, happened never to pick any of the last 15 chapters, DDJ 67-81. Let us put this in a familiar form. It is the same kind of situation we find in the Hypergeometric or drawn ball example. The difference is merely in the details. In the "golf ball" version of Theory 1, we would have a total of 81 balls, 66 of them white (representing DDJ 1-66), and 15 of them red (representing DDJ 67-81). By Theory 1, we are to imagine the Gwodyen florilegia makers to have reached into the box 33 times, each time drawing one ball at random (without replacement), and each time drawing a white ball. No red ball is ever drawn. What are the odds that this could happen by chance? In ordinary practice, an event which could occur by chance only 5 times out of 100 (5% probability) is usually considered grounds for suspicion, and an event which could occur only once in 100 tries (1% probability) is considered strong grounds for suspicion.
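The drawn-ball setup just described can be checked in one step with the hypergeometric count. What follows is a minimal Python sketch, not part of the original analysis. The function name prob_all_white is ours, and the sketch assumes that every draw is without replacement (it ignores the one duplicated extract), so its result should be close to, but not identical with, the figure reached in the Analysis below.

```python
from math import comb

def prob_all_white(white: int, red: int, draws: int) -> float:
    """Chance that `draws` balls taken without replacement are all white."""
    total = white + red
    # Ways to pick only white balls, divided by ways to pick any balls at all.
    return comb(white, draws) / comb(total, draws)

# Theory 1: 66 white balls (DDJ 1-66), 15 red balls (DDJ 67-81), 33 draws.
# Assumption: strictly no replacement on any draw.
p = prob_all_white(white=66, red=15, draws=33)
print(f"P(no red in 33 draws) = {p:.7f}, about 1 in {1 / p:,.0f}")
```

The closed form and the draw-by-draw multiplication used in the Analysis are the same calculation; the closed form merely skips the intermediate steps.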

Analysis

What degree of suspicion, if any, attends the Theory 1 scenario? All we have to do is count it out.

The first draw from the box has 66 chances out of 81 to draw a white ball. That is a decimal probability of 66/81 = 0·815. Since that ball is not replaced (the Gwodyen extracts do not duplicate, so once chosen, a chapter cannot be chosen again), the situation at the second draw is that there are 65 white balls out of a total of 80. The chance of getting white is thus a little less: 65/80 = 0·813. And so it goes at each step. The total is reduced by 1, and the number of white balls is also reduced by 1, and so the fraction representing the chance of drawing a white ball at any given step goes steadily down.

With such a sequence of events, all of which must happen to reach the final result, we multiply the probabilities for the single events. Here are the first ten draws, showing the chance for a white ball on that turn, and also, in the bottom line, the cumulative chance that a white ball is drawn at each step:

Draw    1      2      3      4      5      6      7      8      9      10
White   66     65     64     63     62     61     60     59     58     57
Total   81     80     79     78     77     76     75     74     73     72
Chance  0·815  0·812  0·810  0·808  0·805  0·803  0·800  0·797  0·795  0·792
Cum     0·815  0·662  0·536  0·433  0·349  0·280  0·224  0·179  0·142  0·112
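The rows of the table above can be regenerated mechanically. The following Python sketch is offered only as an illustration; the function name white_run_table is ours. It uses exact fractions internally and converts to decimals only for display, so no rounding error accumulates from one draw to the next.

```python
from fractions import Fraction

def white_run_table(white: int, total: int, draws: int):
    """Return (draw, white, total, chance, cumulative) rows for an all-white run."""
    rows = []
    cumulative = Fraction(1)
    for draw in range(1, draws + 1):
        chance = Fraction(white, total)  # chance of white on this draw alone
        cumulative *= chance             # chance of white on this and every earlier draw
        rows.append((draw, white, total, float(chance), float(cumulative)))
        white -= 1                       # the drawn white ball is not replaced
        total -= 1
    return rows

# First ten draws under Theory 1: 66 white balls in a box of 81.
for draw, w, t, chance, cum in white_run_table(66, 81, 10):
    print(f"{draw:>2}  {w}/{t}  chance={chance:.3f}  cum={cum:.3f}")
```

Changing the last argument from 10 to a larger number extends the same table to later draws, as long as every draw is taken to be without replacement.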

We start off with the odds in favor of drawing a white ball; as noted, the probability is 0·815, or about 4 out of 5. But already at the second turn, the odds of having gotten two white balls in a row are down to 0·662, or about 2 out of 3. By the fourth turn, the odds of having drawn all white balls are down to less than half. That is, it would be just a hair more likely to have drawn at least 1 red ball somewhere among the first 4 draws. Still, there is nothing here yet to raise suspicion. If we persist to the 10th turn, at the end of the row, the probability of having drawn only white balls is down to 0·112, or a little better than 1 in 10. As noted in Lesson 1, that gives a little less than a 90% degree of certainty that something nonrandom is happening. It is not a decisive level of certainty. We may thus say that if the Gwodyen florilegia had contained only 10 DDJ extracts, all of them from the white or DDJ 1-66 part of the source text, there would be no grounds for rejecting Theory 1. The Gwodyen text profile is maybe a little bit odd, but not so odd as to raise statistically serious doubts about it.

So far so good for Theory 1. Here are the next 10 draws, each of them also producing a white ball:

Draw    11     12     13     14     15     16     17     18     19     20
White   56     55     54     53     52     51     50     49     48     47
Total   71     70     69     68     67     66     65     64     63     62
Chance  0·789  0·786  0·783  0·779  0·776  0·773  0·769  0·766  0·762  0·758
Cum     0·089  0·070  0·054  0·042  0·033  0·025  0·020  0·015  0·011  0·009

By the end of this series, we have reached the cumulative probability of about 1 in 100; that is, we are at the level of 99% confidence that the result in question could not have been produced by chance. That, in principle, is enough to suggest that we should reject Theory 1 as requiring too improbable an event.

But the Theory 1 scenario is not finished yet. It goes on to specify further events. Let us follow them. Here is the third set of ten draws:

Draw    21     22     23     24     25     26     27     28     29     30
White   46     45     44     43     42     41     40     39     38     37
Total   61     60     59     58     57     56     55     54     53     52
Chance  0·754  0·750  0·746  0·741  0·737  0·732  0·727  0·722  0·717  0·712
Cum     0·007  0·005  0·004  0·003  0·002  0·001  0·001  0·001  0·001  0·000

The cumulative probability for the 30th consecutive white draw is not really zero; it just doesn't show up in three decimal places. The actual number, to the limits of the calculator we are using, is 0·0003913, or about 1 in 2,600. That is to say, in this third series of draws yielding nothing but white balls, we have passed the 1 in 100 chance, and even passed the 1 in 1,000 chance, and have entered a realm where the chance of such a result is so small that it does not show up at all unless we give the probability to four decimal places.

But even now, Theory 1 is not finished. It predicts this final set of three draws:

Draw    31     32     33
White   36     35     34
Total   51     50     50
Chance  0·706  0·700  0·680
Cum     0·000  0·000  0·000

(Notice, in the interest of accuracy, that because there is one duplication in the Gwodyen florilegia, we must assume that one of the draws has been "with replacement." We have shown that by having a total ball count of 50 for both draw 32 and draw 33. It does not make all that much difference in the arithmetic.)

Up to the very end, as we see from the "chance" row of the table, the chance of drawing a white ball on that one draw is in the vicinity of 7 out of 10, a quite likely proposition. But the claim of Theory 1 is of getting a white ball on that and every preceding draw. The chance of that outcome has declined, by the 33rd draw, to the minuscule figure of 0·0001314. This amounts to one chance in 7,610.
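The whole chain of 33 multiplications, including the adjustment for the one duplicated extract, can be run in a few lines. This Python sketch is an illustration under the assumptions stated in the note above: draws 1 to 32 are without replacement, and draw 33 keeps the total at 50. The variable names are ours. Because it works in exact fractions, its result may differ in the last digit or two from the calculator figure quoted above.

```python
from fractions import Fraction

# (white, total) for each of the 33 draws, following the tables in the text.
draws = [(66 - k, 81 - k) for k in range(32)]  # draws 1-32: 66/81 down to 35/50
draws.append((34, 50))                         # draw 33: total held at 50 (assumed "with replacement")

p = Fraction(1)
for white, total in draws:
    p *= Fraction(white, total)  # multiply in the chance of white on this draw

print(f"P(33 white) = {float(p):.7f}, about 1 in {float(1 / p):,.0f}")
```

Replacing the last pair with (34, 49) gives the strictly non-replacing version, which shows how little the adjustment matters to the outcome.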

Formula

We have gone through these considerable multiplications in order to arrive at the following result, where W means "drawing a white ball" and 33W means "drawing 33 white balls in succession":

P(33W) = 0·0001314, or 1 chance in 7,610
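Written out as a single product, under the same assumptions as the tables above (draws 1 to 32 without replacement, draw 33 with the total held at 50), the result takes the following form. The index k is our own notation, counting the balls already removed from the box.

```latex
P(33W) \;=\; \left( \prod_{k=0}^{31} \frac{66-k}{81-k} \right) \times \frac{34}{50} \;\approx\; 0.00013
```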

Our degree of certainty that we have a nonrandom result is thus not 95%, the usual threshold of probable significance. It is not 99% (1 chance in 100), the highest level of certainty we had previously considered. It is not 99·9% (1 chance in 1,000). It is more precisely 99·98686% (1 chance in 7,610 that the result could have occurred by natural processes). Even the most skeptical factory manager would be convinced by it.

Outcome

The humanistic scholar should also be convinced by it, and the outcome is thus that we reject Theory 1. Theory 1 requires the assumption of an extraordinarily unlikely event: the consistent ignoring of a span of 15 DDJ chapters, in a supposedly random selection of 33 out of a total of 81 chapters.

Let us reflect a bit on the reality of this test of Theory 1. Is it reasonable to suppose, as we have, that the selection for the Gwodyen florilegia was in fact random? No, people in real life do not compile anthologies at random. It was probably purposive. Does it change the argument if we factor in that possibility? Yes, it makes the argument even stronger. It is well known that the DDJ chapters tend to be mystical toward the beginning (the so-called "Dau Jing"), and governmental toward the end (the "Dv Jing"). From evidence in the Gwodyen 1 tomb, which is the grandest in that particular assemblage, it is likely that the person buried there had been the Tutor to the Heir Apparent of Chu, the future ruler of that state. Given the grandiose tomb furnishings, he was definitely not a meditative hermit or recluse. The tutor of a future ruler might well, in his teaching materials, prefer DDJ quotations that had governmental relevance, and in fact the Gwodyen florilegia selection does lean more heavily on the higher numbered chapters, within its range of DDJ 2-66. Then if DDJ 67-81 had been present in the source text, there is, if anything, a better than average chance that they would have been drawn on for the Tutor's materials. The "random selection" assumption ignores this fact. To that extent, the above test is actually weighted in favor of Theory 1.

It didn't help.

So on all assumptions, Theory 1 is gone and only Theory 2 is left standing. The inference is that DDJ 67-81 were not included in the DDJ source text.

Envoi

For further considerations on the probable size of the omitted material, see the recent article in Warring States Papers. Other evidence suggests that, in fact, the missing material was not approximately, but exactly DDJ 67-81.
