corpus linguistics


"Corpus linguistics is concerned not just with describing patterns of form," says Winnie Cheng, "but also with how form and meaning are inseparable" (Exploring Corpus Linguistics: Language in Action, 2012).


Corpus linguistics is the study of language based on large collections of "real life" language use stored in corpora (or corpuses), computerized databases created for linguistic research. Also known as corpus-based studies.

Corpus linguistics is viewed by some linguists as a research tool or methodology, and by others as a discipline or theory in its own right. Kuebler and Zinsmeister conclude that "the answer to the question whether corpus linguistics is a theory or a tool is simply that it can be both. It depends on how corpus linguistics is applied" (Corpus Linguistics and Linguistically Annotated Corpora, 2015).

Although the methods used in corpus linguistics were first adopted in the early 1960s, the term corpus linguistics didn't appear until the 1980s. 

See Examples and Observations below.


Examples and Observations

  • "[C]orpus linguistics is . . . a methodology, comprising a large number of related methods which can be used by scholars of many different theoretical leanings. On the other hand, it cannot be denied that corpus linguistics is also frequently associated with a certain outlook on language. At the centre of this outlook is that the rules of language are usage-based and that changes occur when speakers use language to communicate with each other. The argument is that if you are interested in the workings of a particular language, like English, it is a good idea to study language in use. One efficient way of doing this is to use corpus methodology . . .."  
    (Hans Lindquist, Corpus Linguistics and the Description of English, Edinburgh University Press, 2009)

  • "Corpus studies boomed from 1980 onwards, as corpora, techniques and new arguments in favour of the use of corpora became more apparent. Currently this boom continues--and both of the 'schools' of corpus linguistics are growing . . .. Corpus linguistics is maturing methodologically and the range of languages addressed by corpus linguists is growing annually."
    (Tony McEnery and Andrew Wilson, Corpus Linguistics, Edinburgh University Press, 2001)
  • Corpus Linguistics in the Classroom
    - "In the context of the classroom the methodology of corpus linguistics is congenial for students of all levels because it is a 'bottoms-up' study of the language requiring very little learned expertise to start with. Even the students that come to linguistic enquiry without a theoretical apparatus learn very quickly to advance their hypotheses on the basis of their observations rather than received knowledge, and test them against the evidence provided by the corpus."
    (Elena Tognini-Bonelli, Corpus Linguistics at Work, John Benjamins, 2001)

    - "To make good use of corpus resources a teacher needs a modest orientation to the routines involved in retrieving information from the corpus, and--most importantly--training and experience in how to evaluate that information."
    (John McHardy Sinclair, How to Use Corpora in Language Teaching, John Benjamins, 2004)
  • Quantitative and Qualitative Analyses
    - "Quantitative techniques are essential for corpus-based studies. For example, if you wanted to compare the language use of patterns for the words big and large, you would need to know how many times each word occurs in the corpus, how many different words co-occur with each of these adjectives (the collocations), and how common each of those collocations is. These are all quantitative measurements. . . .

    "A crucial part of the corpus-based approach is going beyond the quantitative patterns to propose functional interpretations explaining why the patterns exist. As a result, a large amount of effort in corpus-based studies is devoted to explaining and exemplifying quantitative patterns."
    (Douglas Biber, Susan Conrad, and Randi Reppen, Corpus Linguistics: Investigating Language Structure and Use, Cambridge University Press, 2004)

    - "[I]n corpus linguistics quantitative and qualitative methods are extensively used in combination. It is also characteristic of corpus linguistics to begin with quantitative findings, and work toward qualitative ones. But . . . the procedure may have cyclic elements. Generally it is desirable to subject quantitative results to qualitative scrutiny--attempting to explain why a particular frequency pattern occurs, for example. But on the other hand, qualitative analysis (making use of the investigator's ability to interpret samples of language in context) may be the means for classifying examples in a particular corpus by their meanings; and this qualitative analysis may then be the input to a further quantitative analysis, one based on meaning . . .."
    (Geoffrey Leech, Marianne Hundt, Christian Mair, and Nicholas Smith, Change in Contemporary English: A Grammatical Study, Cambridge University Press, 2012)
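The quantitative measurements Biber, Conrad, and Reppen list for big and large (how often each word occurs, how many different words co-occur with it, and how common each of those co-occurrences is) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure. The sample sentences are invented for demonstration, the tokenizer is deliberately naive, and "collocate" is narrowed to the words immediately to the right of the adjective (the hypothetical `right_span` parameter). Real corpus studies work with millions of words, usually use wider windows, and add statistical association measures.

```python
import re
from collections import Counter

# A tiny invented sample for illustration only (not real corpus data).
SAMPLE_CORPUS = """
A big house stood next to a large house.
She made a big mistake and a large number of small ones.
A large number of people saw the big house.
It was a big mistake to ignore the large amount of data.
"""


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens (naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def node_profile(tokens: list[str], node: str, right_span: int = 1) -> tuple[int, Counter]:
    """Return how often `node` occurs and a Counter of the words found within
    `right_span` tokens to its right (treated here, by assumption, as its collocates)."""
    frequency = 0
    collocates: Counter = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            frequency += 1
            collocates.update(tokens[i + 1 : i + 1 + right_span])
    return frequency, collocates


if __name__ == "__main__":
    tokens = tokenize(SAMPLE_CORPUS)
    for adjective in ("big", "large"):
        freq, colls = node_profile(tokens, adjective)
        # Frequency, number of distinct collocates, and the count of each collocation.
        print(f"{adjective}: {freq} occurrences, {len(colls)} distinct collocates")
        for word, count in colls.most_common():
            print(f"  {adjective} {word}: {count}")
```

On this invented sample, big occurs four times with two distinct right-hand collocates (house, mistake), while large occurs four times with three (number, house, amount). As the second Biber quotation stresses, numbers like these are only a starting point: the corpus-based approach then asks why the patterns exist.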