What is a Lemma?


In morphology and lexicology, a lemma is the form of a word that appears at the beginning of a dictionary or glossary entry: a headword.

The lemma, says David Crystal, is "essentially an abstract representation, subsuming all the formal lexical variations which may apply" (Dictionary of Linguistics and Phonetics, 2008).

Examples and Observations:

  • "The lemma is the base form under which the word is entered [in a dictionary] and assigned its place: typically, the 'stem,' or simplest form (singular noun, present/infinitive verb, etc.). Other forms may not be entered if they are predictable (such as the plural bears, not given here); but the irregular past forms of the verbs are given (irregular in the sense that they do not follow the default pattern of adding -ed) and there is also an indication under cut that the t must be doubled in the spelling of inflected forms like cutting. An irregular form may appear as a separate lemma, with cross reference. This dictionary [the two-volume New Shorter Oxford English Dictionary, 1993] has such an entry for borne v. pa. pple & ppl a. of BEAR v., indicating that borne is the past participle and participial adjective of the verb bear."
    (M. A. K. Halliday and Colin Yallop, Lexicology: A Short Introduction. Continuum, 2007)
  • Lemmas and Lexemes
    "The conventional term lemma is currently used in corpus research and psycholinguistic studies as quasi-synonymous with lexeme. But lemma cannot be confused with lexemes. For example, the editors of the British National Corpus warn users that items such as phrasal verbs, that is, verbs containing two or three parts like turn out, or look forward to, which lexicologists treat as lexical units, can only be accessed through separate lemmas. In the case of turn out, it contains two lemmas, and in that of look forward to, three. Also, homonymic distinction is not always established by the editors of lists containing lemmas (Leech, Rayson and Wilson 2001).

    "However, a lemma does resemble the lexeme concept in other ways. Linguistic corpora allow for two basic searches, one of which produces lemmatized word lists, that is word lists containing lemmas, and another one containing unlemmatized word lists, that is word lists containing word forms. . . .

    "Finally, dictionary headwords cannot always be identified with lexemes. For instance, the headword bubble, in a dictionary like the OALD [Oxford Advanced Learner's Dictionary] includes information about the noun bubble and the verb bubble within the same entry. For a lexicologist, these represent two different lexemes."
    (Miguel Fuster Márquez, "English Lexicology." Working with Words: An Introduction to English Linguistics, ed. by Miguel Fuster and Antonia Sánchez. Universitat de València, 2008)
  • The Morphological Status of Lemmas
    "What is the morphological status of lemmas? Several hypotheses have been set forth, for example:
    1) that every 'word' (free form), including inflectional forms and word-formations, has its own entry and corresponds to a lemma; a weaker one is
    2) that not all words have their own entry, i.e. 'regular' inflectional forms and perhaps word-formations make up a part of the entry of the base and are accessed via that base;
    3) that stems or roots, rather than free-standing forms, form the lemma, independently of whether other forms derived from these are 'regular' or not."
    (Amanda Pounder, Processes and Paradigms in Word Formation Morphology. Mouton de Gruyter, 2000)
  • Measuring Lemma Frequency
    "[T]here is a problem with word frequency in that it is unclear what the correct measure of frequency is. There exists a number of different ways of counting word frequency and these are not theory neutral. . . .

    "One example is lemma frequency; this is the cumulative frequency of all the word form frequencies of words within an inflectional paradigm. The lemma frequency of the verb help, for example, is the sum of the word form frequencies of help, helps, helped and helping. In accounts of language processing in which regular inflectional forms are decomposed and map onto root morphemes, we would expect the frequency of the root to be more critical for determining response latencies than word form frequency and hence the lemma frequency would play a prominent role.

    "Accounts in which other complex forms are also decomposed (e.g., inflections, derivations and compounds) will instead emphasise the cumulative morpheme frequency, which is the sum of the frequencies of all the complex words in which a root morpheme appears. For example, the cumulative morpheme frequency of help would be the sum of the lemma frequency of help plus the lemma frequencies of helpful, helpless, helplessness etc. Another measure, family size, is the number of word types in which a morpheme occurs, rather than the number of tokens in it. The word help has a family size of ten."
    (Michael A. Ford, William D. Marslen-Wilson, and Matthew H. Davis, "Morphology and Frequency: Contrasting Methodologies." Morphological Structure in Language Processing, ed. by R. Harald Baayen and Robert Schreuder. Mouton de Gruyter, 2003)
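
The three measures described by Ford, Marslen-Wilson, and Davis can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the token counts are invented, the `LEMMA_OF` and `ROOT_OF` tables are hand-written stand-ins for a real lemmatizer and morphological analysis, and the function names are hypothetical. The toy lexicon holds only four members of the help family, so its family size is smaller than the ten mentioned in the quotation.

```python
from collections import Counter

# Invented token counts for individual word forms (not real corpus data).
WORD_FORM_COUNTS = Counter({
    "help": 120, "helps": 30, "helped": 45, "helping": 25,
    "helpful": 18, "helpless": 6, "helplessness": 2,
})

# Assumed mapping from each word form to its lemma (its inflectional paradigm).
LEMMA_OF = {
    "help": "help", "helps": "help", "helped": "help", "helping": "help",
    "helpful": "helpful", "helpless": "helpless",
    "helplessness": "helplessness",
}

# Assumed mapping from each lemma to the root morpheme it contains.
ROOT_OF = {
    "help": "help", "helpful": "help",
    "helpless": "help", "helplessness": "help",
}


def lemma_frequency(lemma):
    """Sum the token counts of every word form in the lemma's paradigm."""
    return sum(count for form, count in WORD_FORM_COUNTS.items()
               if LEMMA_OF[form] == lemma)


def cumulative_morpheme_frequency(root):
    """Sum the lemma frequencies of every lemma that contains the root."""
    return sum(lemma_frequency(lemma) for lemma, r in ROOT_OF.items()
               if r == root)


def family_size(root):
    """Count word types (lemmas), not tokens, that contain the root."""
    return sum(1 for r in ROOT_OF.values() if r == root)


print(lemma_frequency("help"))                # 120 + 30 + 45 + 25 = 220
print(cumulative_morpheme_frequency("help"))  # 220 + 18 + 6 + 2 = 246
print(family_size("help"))                    # 4 types in this toy lexicon
```

The first two measures count tokens and differ only in how far the grouping extends (one inflectional paradigm versus every word built on the root). Family size counts types instead. Whether the base word itself is included in the family is a convention; this sketch includes it.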
