The paper “Good–Turing frequency estimation without tears” is scanned from pp. 217–37 of the Journal of Quantitative Linguistics, vol. 2, 1995. The paper has subsequently been reprinted as chapter 7 of Sampson, Empirical Linguistics, Continuum, 2001. I thank Benjamin Anderson of the University of Washington for spotting a mathematical misprint on p. 226 of the 1995 version (corrected here and in the 2001 reprint).
Note that the senior co-author (Gale) died in 2002, and that the contact information for the junior co-author (Sampson) shown at the foot of the first page is out of date (see his home page for current details).
[Scanned page images: pp. 218–237]
Source code implementing the Simple Good–Turing technique mentioned in note 13, and the SUSANNE Corpus mentioned in note 18, are now obtainable via Sampson’s Resources page.
The Gale & Church 1994 paper “What is wrong with adding one?”, listed in the References, has been reprinted in Geoffrey Sampson & Diana McCarthy, eds., Corpus Linguistics, Continuum, 2004.