Word Boundaries

The importance of recognizing word boundaries is illustrated by this advertisement from the County Down Spectator.

Word boundaries are the points at which a word begins and ends.

In writing, word boundaries are conventionally represented by spaces between words. In speech, word boundaries are determined in various ways, as discussed below.

Examples of Word Boundaries

  • "When I was very young, my mother scolded me for flatulating by saying, 'Johnny, who made an odor?' I misheard her euphemism as 'who made a motor?' For days I ran around the house amusing myself with those delicious words." (John B. Lee, Building Bicycles in the Dark: A Practical Guide on How to Write. Black Moss Press, 2001)
  • "I could have sworn I heard on the news that the Chinese were producing new trombones. No, it was neutron bombs." (Doug Stone, quoted by Rosemarie Jarski in Dim Wit: The Funniest, Stupidest Things Ever Said. Ebury, 2008)
  • "As far as input processing is concerned, we may also recognize slips of the ear, as when we start to hear a particular sequence and then realize that we have misperceived it in some way; e.g. perceiving the ambulance at the start of the yam balanced delicately on the top . . .." (Michael Garman, Psycholinguistics. Cambridge University Press, 2000)

    Word Recognition

    • "The usual criterion for word recognition is that suggested by the linguist Leonard Bloomfield, who defined a word as 'a minimal free form.' . . .
    • "The concept of a word as 'a minimal free form' suggests two important things about words. First, their ability to stand on their own as isolates. This is reflected in the space which surrounds a word in its orthographical form. And secondly, their internal integrity, or cohesion, as units. If we move a word around in a sentence, whether spoken or written, we have to move the whole word or none of it--we cannot move part of a word."
      (Geoffrey Finch, Linguistic Terms and Concepts. Palgrave Macmillan, 2000)
    • "[T]he great majority of English nouns begins with a stressed syllable. Listeners use this expectation about the structure of English and partition the continuous speech stream employing stressed syllables."
      (Z.S. Bond, "Slips of the Ear." The Handbook of Speech Perception, ed. by David Pisoni and Robert Remez. Wiley-Blackwell, 2005)

    Tests of Word Identification

    • Potential pause: Say a sentence out loud, and ask someone to 'repeat it very slowly, with pauses.' The pauses will tend to fall between words, and not within words. For example, the / three / little / pigs / went / to / market. . . .
    • Indivisibility: Say a sentence out loud, and ask someone to 'add extra words' to it. The extra item will be added between the words and not within them. For example, the pig went to market might become the big pig once went straight to the market. . . .
    • Phonetic boundaries: It is sometimes possible to tell from the sound of a word where it begins or ends. In Welsh, for example, long words generally have their stress on the penultimate syllable . . .. But there are many exceptions to such rules.
    • Semantic units: In the sentence Dog bites vicar, there are plainly three units of meaning, and each unit corresponds to a word. But language is often not as neat as this. In I switched on the light, the has little clear 'meaning,' and the single action of 'switching on' involves two words.

      (Adapted from The Cambridge Encyclopedia of Language, 3rd ed., by David Crystal. Cambridge University Press, 2010)

      Explicit Segmentation

      • "[E]xperiments in English have suggested that listeners segment speech at strong syllable onsets. For example, finding a real word in a spoken nonsense sequence is hard if the word is spread over two strong syllables (e.g., mint in [mɪntef]) but easier if the word is spread over a strong and a following weak syllable (e.g., mint in [mɪntəf]; Cutler & Norris, 1988).

        The proposed explanation for this is that listeners divide the former sequence at the onset of the second strong syllable, so that detecting the embedded word requires recombination of speech material across a segmentation point, while the latter sequence offers no such obstacles to embedded word detection as the non-initial syllable is weak and so the sequence is simply not divided.

        Similarly, when English speakers make slips of the ear that involve mistakes in word boundary placement, they tend most often to insert boundaries before strong syllables (e.g., hearing by loose analogy as by Luce and Allergy) or delete boundaries before weak syllables (e.g., hearing how big is it? as how bigoted?; Cutler & Butterfield, 1992).

        These findings prompted the proposal of the Metrical Segmentation Strategy for English (Cutler & Norris, 1988; Cutler, 1990), whereby listeners are assumed to segment speech at strong syllable onsets because they operate on the assumption, justified by distributional patterns in the input, that strong syllables are highly likely to signal the onset of lexical words. . . .

        Explicit segmentation has the strong theoretical advantage that it offers a solution to the word boundary problem both for the adult and for the infant listener. . . .

        "Together these strands of evidence motivate the claim that the explicit segmentation procedures used by adult listeners may in fact have their origin in the infant's exploitation of rhythmic structure to solve the initial word boundary problem."

        (Anne Cutler, "Prosody and the Word Boundary Problem." Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition, ed. by James L. Morgan and Katherine Demuth. Lawrence Erlbaum, 1996)
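
The core idea of the Metrical Segmentation Strategy described above, placing a boundary at each strong syllable onset, can be illustrated with a small sketch. This is only a toy model and not an implementation from Cutler's work: the function name `segment_at_strong_onsets`, the representation of a syllable as a (text, is_strong) pair, and the syllable divisions and stress labels in the examples are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (syllable text, True if the syllable is strong).
Syllable = Tuple[str, bool]


def segment_at_strong_onsets(syllables: List[Syllable]) -> List[List[str]]:
    """Group syllables into chunks, opening a new chunk at every strong syllable.

    Weak syllables attach to the chunk already in progress. An utterance-initial
    weak syllable has nothing to attach to, so it starts a chunk of its own.
    """
    chunks: List[List[str]] = []
    for text, is_strong in syllables:
        if is_strong or not chunks:
            chunks.append([text])      # boundary placed before this syllable
        else:
            chunks[-1].append(text)    # no boundary: weak syllable joins the chunk
    return chunks


# Two strong syllables: a boundary falls inside the sequence, so "mint"
# would have to be recombined across a segmentation point.
print(segment_at_strong_onsets([("mɪn", True), ("tef", True)]))

# Strong followed by weak: the sequence is left undivided.
print(segment_at_strong_onsets([("mɪn", True), ("təf", False)]))

# "how big is it?" with assumed stress labels: the weak syllables attach to
# "big", in line with the reported mishearing "how bigoted?".
print(segment_at_strong_onsets(
    [("how", True), ("big", True), ("is", False), ("it", False)]
))
```

The sketch takes stress labels as given. In real listening those labels must themselves be recovered from the acoustic signal, which the toy model makes no attempt to do.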