2. Outline of the Procedure
In this section we describe the test in outline. In the Appendix, sufficient details are provided to enable the reader to repeat the computations precisely, and so to verify their correctness. The authors will provide, upon request, at cost, diskettes containing the program used and the texts G, I, R, T, U, V and W (see Section 3).
We test the significance of the phenomenon on samples of pairs of related words (such as hammer-anvil and Zedekia-Matanya). To do this we must do the following:
- (i) define the notion of "distance" between any two words, so as to lend meaning to the idea of words in "close proximity";
- (ii) define statistics that express how close, "on the whole," the words making up the sample pairs are to each other (some kind of average over the whole sample);
- (iii) choose a sample of pairs of related words on which to run the test;
- (iv) determine whether the statistics defined in (ii) are "unusually small" for the chosen sample.
Task (i) has several components. First, we must define the notion of "distance" between two given ELS's in a given array; for this we use a convenient variant of the ordinary Euclidean distance. Second, there are many ways of writing a text as a two-dimensional array, depending on the row length; we must select one or more of these arrays and somehow amalgamate the results (of course, the selection and/or amalgamation must be carried out according to clearly stated, systematic rules). Third, a given word may occur many times as an ELS in a text; here again, a selection and amalgamation process is called for. Fourth, we must correct for factors such as word length and composition. All this is done in detail in Sections A.1 and A.2 of the Appendix.
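As a rough illustration of the first two components only, the sketch below writes a text as a two-dimensional array of a given row length and measures the plain Euclidean distance between two letter positions. It is not the authors' definition: the "convenient variant" of the distance, the choice of row lengths, and the amalgamation rules are specified in Sections A.1 and A.2 and are not reproduced here. The function names `els_positions`, `to_grid` and `euclidean`, and the use of 0-based positions, are assumptions made for this example.

```python
import math


def els_positions(start, skip, length):
    """Text positions of an ELS: start, start + skip, ..., start + (length - 1) * skip."""
    return [start + k * skip for k in range(length)]


def to_grid(position, row_length):
    """(row, column) of a 0-based text position when the text is laid out
    in rows of `row_length` letters."""
    return divmod(position, row_length)


def euclidean(a, b):
    """Ordinary Euclidean distance between two (row, column) cells."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical ELS: 4 letters, starting at position 3, with skip 10.
positions = els_positions(start=3, skip=10, length=4)

# With a row length equal to the skip, the letters fall in a single column.
cells = [to_grid(p, row_length=10) for p in positions]
step = euclidean(cells[0], cells[1])
```

The example shows why the row length matters: the same ELS is compact in one array and spread out in another, which is why some rule for selecting and combining arrays is needed.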
We stress that our definition of distance is not unique. Although there are certain general principles (like minimizing the skip d), some of the details can be carried out in other ways. We feel that varying these details is unlikely to affect the results substantially. Be that as it may, we chose one particular definition and have used only it throughout; that is, the function c(w, w') described in Section A.2 of the Appendix had been defined before any sample was chosen, and it underwent no changes. [Similar remarks apply to choices made in carrying out task (ii).]
Next, we have task (ii), measuring the overall proximity of pairs of words in the sample as a whole. For this, we used two different statistics P1 and P2, which are defined and motivated in the Appendix (Section A.5). Intuitively, each measures overall proximity in a different way. In each case, a small value of Pi indicates that the words in the sample pairs are, on the whole, close to each other. No other statistics were ever calculated for the first, second or indeed any sample.
In task (iii), identifying an appropriate sample of word pairs, we strove for uniformity and objectivity with regard to the choice of pairs and to the relation between their elements. Accordingly, our sample was built from a list of personalities (p) and the dates (Hebrew day and month) (p') of their death or birth. The personalities were taken from the Encyclopedia of Great Men in Israel (MARGALIOTH, M., ed. (1961). Encyclopedia of Great Men in Israel; a Bibliographical Dictionary of Jewish Sages and Scholars from the 9th to the End of the 18th Century 1-4. Joshua Chachik, Tel Aviv).
At first, the criterion for inclusion of a personality in the sample was simply that his entry contain at least three columns of text and that a date of birth or death be specified. This yielded 34 personalities (the first list, Table 1). In order to avoid any conceivable appearance of having fitted the tests to the data, it was later decided to use a fresh sample, without changing anything else. This was done by considering all personalities whose entries contain between 1.5 and 3 columns of text in the Encyclopedia; it yielded 32 personalities (the second list, Table 2). The significance test was carried out on the second sample only.
Note that personality-date pairs (p, p') are not word pairs. The personalities each have several appellations, there are variations in spelling and there are different ways of designating dates. Thus each personality-date pair (p, p') corresponds to several word pairs (w, w'). The precise method used to generate a sample of word pairs from a list of personalities is explained in the Appendix (Section A.3).
The measures of proximity of word pairs (w, w') result in statistics P1 and P2. As explained in the Appendix (Section A.5), we also used a variant of this method, which generates a smaller sample of word pairs from the same list of personalities. We denote the statistics P1 and P2, when applied to this smaller sample, by P3 and P4.
Finally, we come to task (iv), the significance test itself. It is so simple and straightforward that we describe it in full immediately.
The second list contains 32 personalities. For each of the 32! permutations π of these personalities, we define the statistic P1^π obtained by permuting the personalities in accordance with π, so that Personality i is matched with the dates of Personality π(i). The 32! numbers P1^π are ordered, with possible ties, according to the usual order of the real numbers. If the phenomenon under study were due to chance, it would be just as likely that P1 occupies any one of the 32! places in this order as any other. Similarly for P2, P3 and P4. This is our null hypothesis.
To calculate significance levels, we chose 999,999 random permutations π of the 32 personalities; the precise way in which this was done is explained in the Appendix (Section A.6). Each of these permutations π determines a statistic P1^π; together with P1, we have thus 1,000,000 numbers. Define the rank order of P1 among these 1,000,000 numbers as the number of P1^π not exceeding P1; if P1 is tied with other P1^π, half of these others are considered to "exceed" P1. Let r1 be the rank order of P1, divided by 1,000,000; under the null hypothesis, r1 is the probability that P1 would rank as low as it does. Define r2, r3 and r4 similarly (using the same 999,999 permutations in each case).
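The ranking procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: the real statistic P1 is defined in Section A.5 and the real sampling of permutations in Section A.6, so `statistic` here is a toy stand-in (a sum of entries of a hypothetical matrix `dist`) and the permutations come from Python's standard shuffle. The sketch also assumes that the unpermuted value counts once toward its own rank, which is one reading of "the number not exceeding P1" among the full set of numbers.

```python
import random


def statistic(dist, perm):
    """Toy stand-in for P1: total 'distance' when personality i is matched
    with the dates of personality perm[i]. Smaller means closer."""
    return sum(dist[i][perm[i]] for i in range(len(perm)))


def rank_fraction(observed, permuted):
    """Rank order of `observed` among itself and the permuted values,
    divided by the total count. Strictly smaller values count fully,
    the observed value counts once, and each tie counts one half."""
    below = sum(1 for v in permuted if v < observed)
    ties = sum(1 for v in permuted if v == observed)
    return (1 + below + 0.5 * ties) / (len(permuted) + 1)


def permutation_test(dist, n_perm, seed=0):
    """Estimate the rank fraction of the unpermuted matching."""
    rng = random.Random(seed)
    identity = list(range(len(dist)))
    observed = statistic(dist, identity)
    permuted = []
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # uniform random permutation (assumption)
        permuted.append(statistic(dist, perm))
    return rank_fraction(observed, permuted)


# Hypothetical 4x4 matrix in which the true matching is the closest one.
toy = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
r = permutation_test(toy, n_perm=999)
```

With 999,999 permutations in place of 999, the denominator becomes 1,000,000, as in the text.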
After calculating the probabilities r1 through r4, we must make an overall decision to accept or reject the research hypothesis. In doing this, we should avoid selecting favorable evidence only. For example, suppose that r3 = 0.01, the other ri being higher. There is then the temptation to consider r3 only, and so to reject the null hypothesis at the level of 0.01. But this would be a mistake; with enough sufficiently diverse statistics, it is quite likely that just by chance, some one of them will be low. The correct question is, "Under the null hypothesis, what is the probability that at least one of the four ri would be less than or equal to 0.01?" Thus, denoting the event "ri <= 0.01" by Ei, we must find the probability not of E3, but of "E1 or E2 or E3 or E4." If the Ei were mutually exclusive, this probability would be 0.04; overlaps only decrease the total probability, so that it is in any case less than or equal to 0.04. Thus we can reject the null hypothesis at the level of 0.04, but not 0.01.
More generally, for any given d, the probability that at least one of the four numbers ri is less than or equal to d is at most 4d. This is known as the Bonferroni inequality. Thus the overall significance level (or p-value), using all four statistics, is r0 := 4 min ri.
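The arithmetic of this correction is small enough to show directly. The sketch below computes r0 from a list of rank fractions; the function name `bonferroni_overall` is hypothetical, and the input values are invented for illustration, apart from r3 = 0.01, which is the figure used in the example above.

```python
def bonferroni_overall(r_values):
    """Overall significance level: (number of statistics) * (smallest r).
    For four statistics this is 4 * min(r_i)."""
    return len(r_values) * min(r_values)


# Illustrative values only: r3 = 0.01 is the smallest of the four.
r0 = bonferroni_overall([0.20, 0.35, 0.01, 0.12])
```

Here r0 comes to 0.04, matching the conclusion that the null hypothesis can be rejected at the level of 0.04 but not 0.01.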