Geoffrey Sampson

Geoffrey Sampson: Intellectual Contributions

Now that I am retired, this seems a suitable time to sum up the various contributions I have tried to make to human knowledge and understanding over my career. That does not mean that my future work will be limited to, as it were, just adding footnotes to what I did in the past. I have continued to strike out in new directions. I was already well into my sixties when I contributed to the pressure group which successfully challenged the massive but doomed recent programme of National Health Service computerization, and when I took up the cudgels to refute a controversial new theory about Odysseus’s homeland. But probably the bulk of what I am spared to do in the future will extend my established areas of activity, listed here, rather than open up new ones:

defining the structure of English
language and the human mind
frequency estimation
freedom and creativity
information technology and business
empirical linguistics
language complexity
writing systems
the beginnings of Chinese culture

Some of the headings below link to separate web pages discussing the topics in more detail.

I also offer a page with links to my latest publications. And, for the benefit of anyone chasing up the more obscure things I have written, there is a page which indexes a complete list of my academic writings.

Defining the Structure of English

When we think about a language – English or any other – everyone sees that it comprises a rich vocabulary of words, and people have been putting effort into registering the vocabulary of English in ever-more-precise detail for centuries. A famous milestone was Dr Johnson’s dictionary in the eighteenth century, and in our own time the most advanced lexicographical resources are surely the Oxford English Dictionary and its sister publications from the same stable (though of course there are many competitors). But a language is not just a set of words; it also comprises the grammatical structures into which the words are assembled to form meaningful discourse. Registering the range of structures in English has been a less well-developed enterprise. There is a long history of teaching schoolchildren to analyse sentences as simple, compound, or complex, with complex sentences containing subordinate clauses of various kinds. But that tradition really only covered the high points, as a small-scale map shows just the most basic features of a terrain. One could not take the conceptual apparatus of that tradition and use it to give an exhaustive account of the structural features of any arbitrary sample of real-life English usage, written or spoken, formal or spontaneous. Refining the traditional framework to the point where it could achieve that had to wait until we had electronic corpora, that is, large-scale samples of real-life language of various genres searchable by computer.

I got involved in corpus linguistics at an early stage, having the good fortune to spend much of my early career in the Lancaster University department where Geoffrey Leech created the first corpus of British English. Within Geoffrey’s group, I took on the task of developing a refined scheme of structural annotation by applying it to examples and in that way uncovering and progressively eliminating gaps and ambiguities in the definition of the scheme. Quite a few people in the 1980s, in the English-speaking world and elsewhere, began using computers to develop “treebanks” of structurally-annotated language samples (though I believe I may have been the first by a short head); but most of those who produced treebanks were chiefly interested in maximizing the quantity of annotated material, in order to generate language-usage statistics. My aim, by contrast, was to define the annotation scheme as comprehensively and rigorously as possible (so that any statistics would reliably count apples with apples and oranges with oranges).

Essentially, I was trying to do for the English language what Linnaeus in the eighteenth century did for botany.

This overall programme of work led to a book, English for the Computer, which defines what seems to be generally accepted as the most comprehensive and precise scheme extant for registering structural features in edited, written English, together with extensions of the scheme to deal with the distinctive structural features of spontaneous spoken English and of the unskilled writing of (for instance) children. Since an annotation scheme can only be refined through applying it to examples, the work has also led to the creation of annotated electronic corpora of these various English genres, as well as to examination of the theoretical problems that arise when one seeks to impose a rigorous structural analysis on a system as anarchic as the English language (e.g. Sampson 1998, 2000a, 2000b, Sampson and Babarczy 2003).

Trying to pin down the structure of English explicitly raises an obvious question about how precisely it is possible to pin it down. Together with my former researcher Anna Babarczy (now of McDaniel College, Budapest) I examined this question experimentally, with results discussed in our book Grammar Without Grammaticality, one chapter of which quantifies the definability of different aspects of English – it turns out that some humanly-important structural features are far more resistant to precise classification than other features. Linguists assume that English and other languages are defined by clearcut rules, though it is very difficult to specify these comprehensively. Anna and I argue that this is a false perception of language; the evidence suggests that fully defining the grammar of a human language is not just beyond the grasp of present-day science but a meaningless goal even in principle.

One reason for being as precise as one can about language structure is that computer applications involving human language tend to require the ability to parse – that is, automatically to infer the grammatical structure underlying a linear sequence of words. Parsing turns out to be a huge, unsolved challenge, and we need quantitative measures of success. The best-known metric for evaluating parsing accuracy was widely recognized as giving misleading results; Anna’s and my book defines a different metric, and demonstrates experimentally that it is a good measure. I wrote and published software to implement the metric, which has been used and further developed by the international research community.
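
To make the idea of a quantitative parsing measure concrete, here is a minimal Python sketch. The paragraph above does not say which metrics are meant, so the choice here is an assumption made purely for illustration: labelled-bracket precision and recall, one widely used family of measures, and not the metric defined in the book. The tree encoding (nested tuples) and both example parses are invented.

```python
def labelled_spans(tree, start=0):
    """Collect (label, start, end) for every constituent in a parse tree.

    A tree is a tuple (label, child, child, ...); a child is either a
    word (a plain string) or another tree. This encoding is hypothetical.
    Returns the set of spans and the position after the last word.
    """
    label, *children = tree
    spans = set()
    pos = start
    for child in children:
        if isinstance(child, str):
            pos += 1                      # a word occupies one position
        else:
            child_spans, pos = labelled_spans(child, pos)
            spans |= child_spans
    spans.add((label, start, pos))
    return spans, pos


def bracket_scores(reference, candidate):
    """Precision and recall of the candidate's labelled brackets."""
    ref_spans, _ = labelled_spans(reference)
    cand_spans, _ = labelled_spans(candidate)
    matched = len(ref_spans & cand_spans)
    precision = matched / len(cand_spans)
    recall = matched / len(ref_spans)
    return precision, recall


# Invented example: a reference analysis and a flawed automatic parse
# of the same five-word sentence.
reference = ("S", ("NP", "the", "dog"), ("VP", "saw", ("NP", "a", "cat")))
candidate = ("S", ("NP", "the"), ("VP", "dog", "saw", ("NP", "a", "cat")))

precision, recall = bracket_scores(reference, candidate)
print(precision, recall)
```

Here the candidate agrees with the reference on two of its four labelled constituents, so precision and recall both come out at one half. A single misplaced boundary has cost half the score, which gives some feel for why the choice of metric matters.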

(Remarkably, one recent application of the “PageRank” algorithm which Google uses to decide which Web pages to serve up in response to a query makes me the “most central” author in the field of computational linguistics worldwide – see Table 14 in Dragomir Radev et al., “A bibliometric and network analysis of the field of computational linguistics”, J. of the Assoc. for Info Sci. and Technology vol. 67, pp. 683–706, 2016. If I believed this, it would be highly gratifying! Alas, it is completely unbelievable – there must have been some error in Radev and his co-authors’ calculations.)

Information Technology and Business

My teaching, which in the years before I retired related mainly to information technology in a business context, led me to write survey books about Electronic Business and about Law for Computing Students. Many e-business textbooks focus on the technical details of the software systems used for business purposes. I prefer instead to bring the technology into relationship with concepts of economic theory and business studies, to help computing graduates (among others) to think about how their expertise can contribute to wider business strategies.

I have argued that the outcome of current commercial tussles between alternative types of enterprise-I.T. environment is likely to have large consequences for the future economic and even perhaps political complexion of society.

A number of commentators on e-business have used the ideas of the economics Nobel laureate Ronald Coase to predict that e-commerce is creating a business environment in which average company size will be much smaller than in the past. In another paper I argue that this is a misunderstanding of Coase’s ideas, and an unjustified conclusion.

(Both these latter papers are now incorporated in the Electronic Business book.)

Language and the Human Mind

During the last third of the twentieth century, ideas about how the human mind works were dominated by a remarkable new theory whose roots lay in linguistics, though its implications extended to most or all aspects of human cognition and human behaviour, and perhaps even to the nature of human societies. This theory, originally advanced by the American Noam Chomsky in the 1960s and 1970s, and revived after a period of neglect by the Canadian Steven Pinker in his 1994 book The Language Instinct, claims that the content of our minds is fixed by our genes, as tightly as (everyone agrees) the structure of our bodies is fixed. “We do not really learn language”, Chomsky has written; “rather, grammar grows in the mind.” And similarly our beliefs, our art, and in fact all the important features of human cognitive life are claimed to derive not from cultural creation and transmission but from genetic inheritance.

This point of view might seem both bizarre and unattractive. Believe it or not, by the end of the twentieth century it had become the default view among people professionally concerned with such issues. Noam Chomsky was frequently described as the world’s greatest living intellectual, and discussed in a manner that seemed to verge on hagiography rather than sober scholarship.

The trouble is that when one cuts through the impressive-sounding rhetoric, there is no actual evidence for this thesis about human cognition, and plenty of evidence against it. Chomsky and Pinker rely almost exclusively on evidence about language, but what they say about language just is not so. They grossly misrepresent the observable facts, though the facts they cite tend to be somewhat technical and hence hard for non-experts to assess.

One of my major areas of activity has been to marshal the case against Chomsky’s and Pinker’s “nativist” account of human cognition. For instance, Chomsky’s argument rests to quite a large extent on the claim that a certain kind of sentence is never used in English; Chomsky has written that “you can easily live your whole life without ever producing a relevant example … you can go over a vast amount of data of experience without ever finding such a case”. Chomsky quotes no evidence for this claim; I have studied samples of casual everyday speech, and these suggest that the sentences in question actually occur quite frequently. My various disagreements with linguistic nativism culminated in two books, The “Language Instinct” Debate, responding to Pinker’s The Language Instinct, and The Linguistics Delusion, which broadens the argument out beyond the specific points taken up by Pinker.

I am rather sorry if I have to count this area as my chief intellectual contribution, since it is essentially a negative one. Chomsky, Pinker, and their supporters have argued for a novel idea about human nature, and I am simply pointing out that the longstanding common-sense idea is the correct one, and Chomsky’s and Pinker’s arguments to the contrary are illogical and based on false premisses. However, negative or not, it is very clear (for instance from statistics of visits to my website) that this is the area of my work which has aroused most interest on the part of members of the public. That presumably reflects the extraordinary success which the nativists have achieved since the 1960s in converting people to their point of view: which in turn is a testimony to the power of public-relations techniques to overcome logic and common sense.

While Chomsky’s and Pinker’s nonsense was merely an abstract philosophical thesis, it did little harm. It is when it influences practical political behaviour that it becomes really offensive. One of my papers analyses how that thesis is now acting as an intellectual underpinning for the growth of a “New World Order”: Western nations are arrogating the right to override cultural differences between themselves and Third World countries, because they see Western civilization as the only “real” culture. If Chomsky and Pinker were correct about the human mind, there could not be significant cultural differences between peoples, because genetics would not allow it. But they are not correct.

Frequency Estimation

A very common problem in scientific and engineering contexts is inferring the frequency of a phenomenon from sample observations. This is a far trickier task than it might seem. If you observe birds visiting your garden and find that out of the first hundred you see, five are robins, then your best guess at the future frequency of robins will not be exactly five per cent! (The best guess will be slightly lower than that.) Suitable frequency-estimation techniques vary, depending on the nature of the phenomena sampled, but one method which gives good results over a wide range of cases was invented by the famous mathematician Alan Turing and his assistant I.J. Good during their work on wartime code-breaking at Bletchley Park. After the war, the Good–Turing technique was neglected for many years, because in its initial versions it was too mathematically cumbersome for most people to bother with. The late Bill Gale (of AT&T Research in the U.S.A.) and I defined a simpler version which is straightforward to understand and use (published originally in the Journal of Quantitative Linguistics vol. 2, 1995, and reprinted as chapter 7 of my book Empirical Linguistics). I was very much the junior partner in this collaboration – the original idea, and the expertise in statistical theory, all came from Bill’s side; my contributions lay in documenting the work to make it accessible to a wide audience, and developing computer software to implement the technique.
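
The robin example can be worked through in a few lines of Python. What follows is a deliberately simplified sketch in the Good–Turing spirit, not the published procedure: it reserves the share N1/N (the proportion of the sample made up of species seen exactly once) for species not yet seen, smooths the frequency-of-frequency counts with a straight-line fit in log-log space, and applies the smoothed adjustment to every count. The rule for deciding when unsmoothed estimates are preferable is omitted. The bird counts and the function name are invented for illustration.

```python
import math
from collections import Counter


def smoothed_good_turing(counts):
    """Simplified Good-Turing-style estimate (illustrative sketch only).

    counts maps each observed species to the number of times it was seen.
    Returns (probability reserved for unseen species, per-species estimates).
    """
    n = sum(counts.values())
    freq_of_freq = Counter(counts.values())   # N_r: how many species were seen r times
    rs = sorted(freq_of_freq)
    p_unseen = freq_of_freq.get(1, 0) / n     # Turing's estimate: N_1 / N

    # Spread each N_r over the gap to its neighbouring non-zero counts,
    # so that sparse high counts do not flatten the fitted line.
    xs, ys = [], []
    for i, r in enumerate(rs):
        q = rs[i - 1] if i > 0 else 0
        t = rs[i + 1] if i + 1 < len(rs) else 2 * r - q
        z = freq_of_freq[r] / (0.5 * (t - q))
        xs.append(math.log(r))
        ys.append(math.log(z))

    # Least-squares slope of log(z) against log(r); needs two distinct counts.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))

    # With N_r modelled as proportional to r**slope, the Turing adjustment
    # (r + 1) * N_(r+1) / N_r simplifies to r * (1 + 1/r) ** (slope + 1).
    adjusted = {r: r * (1 + 1 / r) ** (slope + 1) for r in rs}

    # Share the remaining probability among seen species in proportion.
    total = sum(adjusted[r] * freq_of_freq[r] for r in rs)
    probs = {s: (1 - p_unseen) * adjusted[r] / total for s, r in counts.items()}
    return p_unseen, probs


# Invented sample of 100 garden birds, five of them robins.
garden = {"sparrow": 40, "starling": 25, "blackbird": 15, "robin": 5,
          "wren": 5, "finch": 4, "dunnock": 2, "thrush": 2,
          "jay": 1, "magpie": 1}

p_unseen, probs = smoothed_good_turing(garden)
print(f"reserved for unseen species: {p_unseen:.3f}")
print(f"estimated robin frequency:   {probs['robin']:.4f}")
```

Because two species in this sample were seen only once, two per cent of the probability is held back for species that have not yet turned up, and the estimate for robins falls a little below the raw five per cent.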

What we produced is now a recognized standard technique, and the software has been further developed by other researchers world-wide. I have been delighted to see our work being used in many diverse fields – some examples are:

breast cancer diagnosis
testing the reliability of safety-critical software
analysing mutation distribution in the COVID-19 virus genome
forensic investigation of the causes of fires
decisions on discharging patients from intensive-care units
making online shopping more efficient.

My own interest in frequency estimation arose in connexion with research on language, and the Good–Turing work was one way in which I have tried to reintroduce statistical thinking into linguistics (a discipline which for some time had been hostile to quantitative methods); see my encyclopaedia article on “statistical linguistics”. Another area in which I made this link was by experimenting with so-called “stochastic optimization” techniques for automatic parsing, including investigation of an efficient parallel-processing algorithm for optimizing tree structures. However, after my team and I had put several years of effort into these experiments (recounted in my book Evolutionary Language Understanding) and obtained promising initial results, we were eventually forced to conclude that our particular approach was a blind alley. This happens in science: the point of doing experiments is that one does not know how well they will turn out! One just has to cut one’s losses and explore alternative avenues of progress.

Empirical Linguistics

One reason why some of the language-related areas listed above were in need of attention when I began to address them was that linguistics in recent decades has been oddly uninterested in empirical data. Linguistics likes to call itself a “science”, but many of its leading practitioners have openly urged that as speakers of languages we know how our languages work and do not need to undertake painstaking empirical observations – indeed that it can be positively misleading to do so. This has often led to the same wild pseudoscientific theorizing that a comparable attitude would produce in other domains of enquiry.

One of my concerns has been to explain why this idea is mistaken, and how standard scientific method applies to linguistics as it does to other phenomena – see for instance chapter 8 of my book Empirical Linguistics. In recent years the empirical approach has gained more traction within linguistics. I have set out to quantify that development in the discipline. A study which examined the literature up to 2002 found that while there had been a marked movement towards empirical methods, about the turn of the century that shift appeared to go into reverse, with a number of linguists explicitly arguing against it. However, when the study was repeated ten years later, this counter-revolution turned out to have been a brief blip: happily, linguistics is now as empirically-minded as it has ever been.

Those linguists who do aim to make their research accountable to objective data often use electronic corpora as data sources. With my colleague Diana McCarthy I edited an anthology of corpus linguistics papers, to give individuals who are drawn into this field a sense of its past achievements and future possibilities. And see my discussion of the status and future of “corpus linguistics”, a discipline which can succeed only by disappearing.

Freedom and Creativity

I have tried to contribute to the better understanding and appreciation of the fundamental political concept of freedom, and the related concept of human creativity. My book An End to Allegiance analysed the “New Right” or “classical liberal” movement for a freer society. It was described by David Friedman (a distinguished exponent of the anarcho-capitalist political ideal, and son of Milton Friedman) as “the best survey of the liberal movement I have yet seen”.

(Back when the Iron Curtain was in place, I was once told by one of my students who was a Pole that my name was on a list of authors whose works were forbidden to be imported into the People’s Republic of Poland. Quite an honour!)

In an earlier book, Liberty and Language, I examined the presuppositions about human cognition on which discourse relating to political freedom is based, and I argued that influential current theories about cognition discussed under “Language and the Human Mind”, above, implicitly deny the possibility of creative freedom while purporting to do the reverse. (A paper “Two ideas of creativity”, reprinted in my 2017 book The Linguistics Delusion, points out that the theories make themselves appear plausible only by radically redefining the concept of creativity.) More recently, my paper “Economic growth and linguistic theory” has pointed out that those theories are flatly inconsistent with some of the most successful new thinking in academic economics. The linguists and the economists cannot both be right about intellectual creativity. I take it to be the linguists who have got human nature wrong – a claim which I flesh out further in The Linguistics Delusion.

Although the flowering of New Right thought in the 1970s and 1980s emerged chiefly from groups interested in politics and in economics, among professional philosophers by far the best known representative of the movement was the late Robert Nozick, for his book Anarchy, State and Utopia (1974). To me this was an irony, because Nozick’s case seemed so illogical as to risk undermining the movement in general. I argued against Nozick in a paper in vol. 87 of Mind, 1978.

The politics of freedom is an area where my “professional” writings shade off into journalism, including articles which were naturally often more ephemeral in nature. I do not detail these here.

Language Complexity

I am interested in the development of structural complexity in language, spoken and written. I have used the structurally-annotated samples mentioned under “Defining the Structure of English” above to study issues such as how children’s spontaneous written usage develops with age away from the patterns of speech towards the more complex structures characteristic of skilled writing (the trajectory they follow is by no means what one might have expected), and how spontaneous speech varies in complexity (see chapter 5 of Empirical Linguistics) – again contrary to what I would confidently have predicted beforehand, it seems that people’s speech continues to grow in structural complexity not just during childhood but on through middle and old age. This latter finding, while statistically significant, is not yet as robust as it might be; but if it is indeed correct, it suggests interesting social implications for concepts such as “lifelong learning”.

For most of the twentieth century, academic linguistics embraced a dogma according to which the languages of the world are all alike in structural complexity and have never developed in complexity within recorded history. Languages were seen as alternative forms of clothing for underlying thought-processes which are constant across humanity. That always seemed to me an expression of pure ideology rather than an assertion founded on fact. In reality languages differ in complexity, and for an individual, or a society, to acquire a more complex language surely means learning to think in subtler ways. At about the turn of the century the dogma began to come under attack from a number of different intellectual directions. David Gil and I organized a workshop at Leipzig in 2007 which assembled almost all of the leading questioners of the invariant-complexity dogma, and with Peter Trudgill we edited the resulting discussions into a book, Language Complexity as an Evolving Variable. (My introductory survey chapter is available online.)

The language complexity heading is also the area of my largest professional failure. Some years ago I was entrusted with a unique and remarkable resource, comprising extensive samples transcribed into typescript of the spontaneous writing and speech of a cross-section of English schoolchildren from the 1960s. This would in principle enable us to study a wealth of issues that ought to be of great interest to the teaching profession: for instance, one would like to know more than any of us know today about exactly how children’s speech and writing skills develop with age, and whether the large changes in educational practice over the past forty years are reflected in any way in children’s patterns of literacy. What needed doing with the material went far beyond what one academic can achieve in the intervals between teaching and admin duties, so I made repeated attempts to raise funding from research sponsors either to analyse the material, or at least to publish it electronically so that anyone could analyse it, and to interest organizations representing the teaching profession in the material. Although previously I had enjoyed great success at winning funding for research on topics which seemed to me less socially significant than this one, here I drew a complete blank. After years of trying, I had to accept that I was too close to retirement to pursue the campaign further, and I handed the materials on to a group elsewhere with ideas of their own about how to exploit them.

This experience suggested worrying conclusions about present-day schoolteaching in Britain. The impression that came across from my vain attempts to make links with that profession was that it is nowadays so institutionally focused on the special problems of teaching ethnic minority children whose mother tongue is not English that it perhaps has little energy to spare for thinking about language and writing skills among the indigenous majority of children. (I also wondered, cynically, whether the profession preferred not to know what impact new modes of schooling might have had on children’s literacy skills. Impressionistically, the 1960s writing struck me as surprisingly rich and articulate for the children’s respective ages – but impressions are no use, one needs to inspect hard quantitative findings such as would have been produced by the research I planned.)

Writing Systems

For much of the twentieth century, academic linguistics focused on spoken language to the near-exclusion of writing, which was seen as an “artificial” language mode. In 1985 my book Writing Systems was perhaps the first fullscale attempt to integrate the study of scripts and their diversity into modern linguistics since I.J. Gelb’s Study of Writing (1952) – and Gelb’s book, which had many strengths, treated the global history of writing as a kind of conspiracy to evolve the alphabet, a view which just does not match the facts. A distinctive feature of my book is that it aims to pay due attention to what we know about the psychology of reading and writing, rather than limiting its scope to philological considerations. For instance, I use findings about the psychology of reading to argue that, for an advanced society, a highly irregular spelling system (such as that of English) is not the liability it seems, and may even be a net benefit.

There have been repeated claims by Western scholars that the world’s scripts are less diverse than is commonly supposed, and notably that Chinese writing is really much more like European writing systems than it seems. To my mind this is misguided. I answered the point, I believe adequately, when it was made in direct response to my book by John DeFrancis. Recently I co-authored with Zhiqun Chen of Monash University a refutation of an essentially similar argument by William Boltz, which at present seems surprisingly widely accepted in the West.

My book is cited as a standard reference in sources as diverse as the Unicode Standard, the British Museum series on Reading the Past, and the Encyclopædia Britannica, and I contributed an entry on “writing systems” in an encyclopaedia of cognitive science produced by the MIT Press.

Research, not just on the psychology of reading and writing but even on the ultimate origin of our own alphabet, has advanced since my first edition, and a new edition came out in 2015 bringing the story up to date.

The Beginnings of Chinese Culture

Although like any other educated person of the period I learned Latin in my schooldays, and have attempted to grapple with ancient Greek and Biblical Hebrew in adulthood, the only ancient language in which I can claim any special expertise is Chinese. During most of my career this knowledge was useful to me in assessing generalizations about language produced by linguists who considered mainly the familiar European languages, but I did not publish original work about Chinese. However, in 2020 I brought out a book which tries to make the earliest known stage of Chinese civilization more accessible to English-speaking readers, by translating an anthology of several hundred poems dating from the period 1000 to 600 B.C., one of the oldest monuments of Chinese literature and among the oldest works in any living language. The anthology has been translated before, but mainly by scholars more interested in technical philological problems than in the human interest of the poems – many of them are charming, for instance those written by women about their love problems. When I published a small selection of these translations in 2006, one Chinese reviewer was kind enough to write “This is the only readable translation I have found, certainly the only one that makes these ancient poems enjoyable to read.”

As well as translating the poems into English, my book spells them out alphabetically as they sounded to the original audiences – with rhymes and assonances that have been almost completely destroyed by changes in the Chinese language over the subsequent three thousand years. And a glossary lists each word used with its meaning. Because early Chinese was an unusually simple, grammar-free language, it is quite easy for an English-speaking reader to follow how the original wording of the poems means what the translation on the facing page says it means. This is a very rare, perhaps unique case where the non-expert can hear and understand people speaking to us, in their own voices, across several millennia.

Over and Out

Some of the areas listed above are ones where I have helped to find out new things. In other cases my role has been to draw together and present, clearly and systematically, findings by others which were scattered across specialist publications. As I see it, both kinds of activity are equally valuable components of scholarship; both of them are the sorts of thing that the general public expects its universities to achieve.

This is not, be it said, the way professional British academics are expected to work nowadays. The pressures that have been applied to us during the latter half of my career push us into avoiding “secondary” scholarship (systematizing others’ findings), and into focusing on original research in very narrow specialisms – “knowing more and more about less and less”. The only criterion of worth is fellow-specialists’ opinion.

When I was a young man in the 1960s, we and our older mentors used to laugh (justifiably in my view) at this sort of thing happening in American universities; senior British academics quite explicitly held a different model of what universities are for. Now that we are in a new century, I am sorry to say that the situation is worse in Britain than in America. The British academic profession has been appallingly bad at defending its proper values, once government began to challenge them. By now, governance structures in many British universities have been revised so that those in the driving seat are often people who lack even an awareness of academic values, let alone loyalty to them.

Happily for myself, by the time this kind of thing got going I was old and obstinate enough largely to ignore it. I have tried to make sure that whatever I did in my professional life was worth doing. Whether my work has been done well or not is for others to judge, but at least I believe I have not wasted my time on pseudoscience or trivia.

(If anyone thinks that I am exaggerating the change in public understanding of the point of universities in Britain, consider the implications of the fact that government responsibility for them, which for generations had resided in an education ministry, was transferred in 2009 to the Department for Business, Innovation, and Skills. University teaching always did include training for certain kinds of job, but until recently its purposes were, and were generally accepted as being, much broader than just that. The old cliché that higher education aimed to make people better citizens, more fulfilled individuals – and also more valuable professionals – by encouraging them to assess complex ideas critically and make wise decisions in contexts where there is no clear “right answer” seems dead in Britain today.)

It may seem pompous for an academic to publish a summary of his own intellectual contributions. Probably it is pompous, but I can offer two excuses. One is that some time ago someone who is a stranger to me chose to create an entry about me in the popular Wikipedia reference source; that entry, together with subsequent revisions, read to me as an absurd travesty of a forty-year career. What Wikipedia and its enthusiasts choose to write about people they don’t know is their own business, but it would be expecting a lot to suppose that the subject of such a write-up would let it stand as the accepted account by default. (Shortly after I posted this page on the Web, and perhaps in consequence, the Wikipedia entry was improved.)

Secondly, a feature of the academic régime of the last thirty years or so has been frequent demands for us to account in detail for our activities to various authorities within and outside the institutions that employ us. Commonly, these inquisitions seem to assess us by criteria that are not obviously valid or suitable. The people whose efforts have actually supported me throughout most of my working life have been the long-suffering British taxpayers, who never normally get a chance to ask what we have been doing with their money. So I am taking this opportunity to tell any of them who happen to be interested.

Geoffrey Sampson

last changed 30 Oct 2023