Pumping lemma for context-free languages

In computer science, in particular in formal language theory, the pumping lemma for context-free languages, also known as the Bar-Hillel lemma, is a lemma that gives a property shared by all context-free languages and generalizes the pumping lemma for regular languages. Because the pumping lemma is only a necessary condition and does not suffice to guarantee that a language is context-free, there are more stringent necessary conditions, such as Ogden's lemma.

Formal statement

Proof idea: If $s$ is sufficiently long, its derivation tree w.r.t. a Chomsky normal form grammar must contain some nonterminal $N$ twice on some tree path (upper picture). Repeating $n$ times the derivation part $N \Rightarrow \dots \Rightarrow vNx$ obtains a derivation for $uv^n w x^n y$ (lower left and right picture for $n = 0$ and $n = 2$, respectively).

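If a language $L$ is context-free, then there exists some integer $p \geq 1$ (called a "pumping length"[1]) such that every string $s$ in $L$ that has a length of $p$ or more symbols (i.e. with $|s| \geq p$) can be written as

$$s = uvwxy$$

with substrings $u$, $v$, $w$, $x$ and $y$, such that

1. $|vx| \geq 1$,
2. $|vwx| \leq p$, and
3. $uv^n w x^n y \in L$ for all $n \geq 0$.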
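
As a rough illustration of conditions 1 and 2, the following Python sketch enumerates every way of cutting a string $s$ into $uvwxy$ with $|vx| \geq 1$ and $|vwx| \leq p$. The function name `decompositions` and its index variables are hypothetical, chosen only for this example. The lemma itself only asserts that at least one of these candidates also satisfies condition 3.

```python
from typing import Iterator, Tuple

Split = Tuple[str, str, str, str, str]  # (u, v, w, x, y)


def decompositions(s: str, p: int) -> Iterator[Split]:
    """Yield every (u, v, w, x, y) with s == u+v+w+x+y, |vx| >= 1 and |vwx| <= p."""
    n = len(s)
    for a in range(n + 1):                      # u = s[:a]
        for d in range(a, min(a + p, n) + 1):   # vwx = s[a:d], hence |vwx| <= p
            for b in range(a, d + 1):           # v = s[a:b]
                for c in range(b, d + 1):       # w = s[b:c], x = s[c:d]
                    if (b - a) + (d - c) >= 1:  # condition 1: v and x not both empty
                        yield s[:a], s[a:b], s[b:c], s[c:d], s[d:]


# Example input (an assumption for illustration): s = "aabb" with p = 2.
candidates = list(decompositions("aabb", 2))
```

Among the candidates for "aabb" is the split u = "a", v = "a", w = "", x = "b", y = "b", which is the kind of decomposition the lemma promises for the language of strings $a^k b^k$.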

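Below is a formal expression of the Pumping Lemma.

$$(\forall L \subseteq \Sigma^*)\,\Big(\text{context-free}(L) \Rightarrow (\exists p \geq 1)\,(\forall s \in L)\,\big(|s| \geq p \Rightarrow (\exists u, v, w, x, y \in \Sigma^*)\,(s = uvwxy \land |vx| \geq 1 \land |vwx| \leq p \land (\forall n \geq 0)\,(uv^n w x^n y \in L))\big)\Big)$$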

Informal statement and explanation

The pumping lemma for context-free languages (called just "the pumping lemma" for the rest of this article) describes a property that all context-free languages are guaranteed to have.

The property holds for all strings in the language that are of length at least $p$, where $p$ is a constant, called the pumping length, that varies between context-free languages.

Say $s$ is a string of length at least $p$ that is in the language.

The pumping lemma states that $s$ can be split into five substrings, $s = uvwxy$, where $vx$ is non-empty and the length of $vwx$ is at most $p$, such that repeating $v$ and $x$ any (and the same) number of times in $s$ produces a string that is still in the language (it is possible and often useful to repeat zero times, which removes $v$ and $x$ from the string). This process of "pumping up" additional copies of $v$ and $x$ is what gives the pumping lemma its name.
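
To make the pumping operation concrete, here is a minimal Python sketch. The helper `pump` and the sample split are assumptions made for illustration: the string "aabb" is taken from the context-free language of strings $a^k b^k$, cut as u = "a", v = "a", w = "", x = "b", y = "b".

```python
def pump(u: str, v: str, w: str, x: str, y: str, n: int) -> str:
    """Return u v^n w x^n y: v and x are each repeated n times."""
    return u + v * n + w + x * n + y


# Hypothetical example split of s = "aabb".
pumped = [pump("a", "a", "", "b", "b", n) for n in range(4)]
# pumped == ["ab", "aabb", "aaabbb", "aaaabbbb"]
```

Taking n = 0 deletes v and x, n = 1 returns the original string, and larger n adds matching copies, so every result keeps equal numbers of a's and b's.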

Finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having $p$ equal to the maximum string length in $L$ plus one. As there are no strings of this length or longer in $L$, the condition of the pumping lemma holds vacuously.
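
The vacuous case can be seen in a few lines of Python. This is only a sketch: the function name `finite_pumping_length` and the sample language are hypothetical.

```python
from typing import Iterable


def finite_pumping_length(language: Iterable[str]) -> int:
    """One more than the longest string length (1 for the empty language)."""
    return max((len(s) for s in language), default=0) + 1


L = {"", "ab", "abc"}                         # hypothetical finite language
p = finite_pumping_length(L)                  # p == 4
long_strings = [s for s in L if len(s) >= p]
# long_strings == []: no string is long enough, so nothing has to be pumped
```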

Usage of the lemma

The pumping lemma is often used to prove that a given language $L$ is non-context-free, by showing that $L$ contains arbitrarily long strings $s$ that cannot be "pumped" without producing strings outside $L$.

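For example, the language $L = \{ a^n b^n c^n \mid n > 0 \}$ can be shown to be non-context-free by using the pumping lemma in a proof by contradiction. First, assume that $L$ is context-free. By the pumping lemma, there exists an integer $p$ which is the pumping length of language $L$. Consider the string $s = a^p b^p c^p$ in $L$. The pumping lemma tells us that $s$ can be written in the form $s = uvwxy$, where $u$, $v$, $w$, $x$, and $y$ are substrings, such that $|vx| \geq 1$, $|vwx| \leq p$, and $uv^i w x^i y \in L$ for every integer $i \geq 0$. By the choice of $s$ and the fact that $|vwx| \leq p$, it is easily seen that the substring $vwx$ can contain no more than two distinct symbols. That is, we have one of five possibilities for $vwx$:

  1. $vwx = a^j$ for some $j \leq p$.
  2. $vwx = a^j b^k$ for some $j$ and $k$ with $j + k \leq p$.
  3. $vwx = b^j$ for some $j \leq p$.
  4. $vwx = b^j c^k$ for some $j$ and $k$ with $j + k \leq p$.
  5. $vwx = c^j$ for some $j \leq p$.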

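For each case, it is easily verified that $uv^i w x^i y$ does not contain equal numbers of each letter for any $i \neq 1$: since $vx$ is non-empty and $vwx$ contains at most two of the three letters, pumping changes the count of at least one letter while leaving the count of at least one other letter unchanged. Thus, $uv^2 w x^2 y$ does not have the form $a^i b^i c^i$ and is not in $L$. This contradicts the pumping lemma's guarantee that every pumped string lies in $L$. Therefore, our initial assumption that $L$ is context-free must be false.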
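
The case analysis can be cross-checked by brute force for small candidate pumping lengths. The Python sketch below is not a proof: it only tries $p = 1, \dots, 4$ and only pumps with $i = 2$. All helper names (`in_L`, `decompositions`, `pumpable_splits`) are hypothetical.

```python
from typing import Iterator, List, Tuple

Split = Tuple[str, str, str, str, str]  # (u, v, w, x, y)


def in_L(t: str) -> bool:
    """Membership in { a^n b^n c^n | n > 0 }."""
    k = len(t) // 3
    return k > 0 and t == "a" * k + "b" * k + "c" * k


def decompositions(s: str, p: int) -> Iterator[Split]:
    """All (u, v, w, x, y) with s == u+v+w+x+y, |vx| >= 1 and |vwx| <= p."""
    for a in range(len(s) + 1):
        for d in range(a, min(a + p, len(s)) + 1):
            for b in range(a, d + 1):
                for c in range(b, d + 1):
                    if (b - a) + (d - c) >= 1:
                        yield s[:a], s[a:b], s[b:c], s[c:d], s[d:]


def pumpable_splits(p: int) -> List[Split]:
    """Splits of a^p b^p c^p whose pumped string for i = 2 stays in L."""
    s = "a" * p + "b" * p + "c" * p
    return [
        (u, v, w, x, y)
        for u, v, w, x, y in decompositions(s, p)
        if in_L(u + v * 2 + w + x * 2 + y)
    ]


survivors = {p: pumpable_splits(p) for p in range(1, 5)}
# every list in survivors is empty: no admissible split survives pumping with i = 2
```

An empty result for each tested $p$ agrees with the argument above, which covers all $p$ at once.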

While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free.

On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma, for example $L = \{ b^j c^k d^l \mid j, k, l \in \mathbb{N} \} \cup \{ a^i b^j c^j d^j \mid i, j \in \mathbb{N}, i \geq 1 \}$: for $s = b^j c^k d^l$ with e.g. $j \geq 1$ choose $vwx$ to consist only of $b$'s, for $s = a^i b^j c^j d^j$ choose $vwx$ to consist only of $a$'s; in both cases all pumped strings are still in $L$.[2]
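
The pumpability of this language can be spot-checked on a finite sample. The Python sketch below rests on several assumptions that are not in the source: $\mathbb{N}$ is taken to include 0, the candidate pumping length is $p = 1$, and the split always uses the first symbol as $v$ with $u$, $w$ and $x$ empty. It tests only $n = 0, \dots, 5$, so it illustrates the claim rather than proving it, and it says nothing about why the language fails to be context-free.

```python
import re
from itertools import product


def in_L(t: str) -> bool:
    """Membership in { b^j c^k d^l } ∪ { a^i b^j c^j d^j | i >= 1 }."""
    if re.fullmatch(r"b*c*d*", t):
        return True
    m = re.fullmatch(r"a+(b*)(c*)(d*)", t)
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))


def pump_first_symbol(s: str, n: int) -> str:
    """Pump with u = "", v = s[0], w = x = "", y = s[1:] (an assumed choice)."""
    return s[0] * n + s[1:]


# A finite sample of L drawn from both parts of the union; the empty string is
# excluded because it is shorter than the assumed pumping length p = 1.
sample = {"b" * j + "c" * k + "d" * l for j, k, l in product(range(4), repeat=3)}
sample |= {"a" * i + "b" * j + "c" * j + "d" * j for i in range(1, 4) for j in range(4)}
sample.discard("")

all_pumpable = all(in_L(pump_first_symbol(s, n)) for s in sample for n in range(6))
# all_pumpable is True for this sample
```

Removing the only $a$ from a string such as "abcd" yields "bcd", which lands in the first part of the union; this is why pumping down never leaves the language.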

References

  1. Berstel, Jean; Lauve, Aaron; Reutenauer, Christophe; Saliola, Franco V. (2009). Combinatorics on words. Christoffel words and repetitions in words (PDF). CRM Monograph Series. 27. Providence, RI: American Mathematical Society. p. 90. ISBN 978-0-8218-4480-9. Zbl 1161.68043. (Also see Jean Berstel's website, www-igm.univ-mlv.fr/~berstel/.)
  2. John E. Hopcroft, Jeffrey D. Ullman (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 0-201-02988-X. Here: sect.6.1, p.129
  • Bar-Hillel, Y.; Micha Perles; Eli Shamir (1961). "On formal properties of simple phrase-structure grammars". Zeitschrift für Phonetik, Sprachwissenschaft, und Kommunikationsforschung. 14 (2): 143–172. Reprinted in: Y. Bar-Hillel (1964). Language and Information: Selected Essays on their Theory and Application. Addison-Wesley series in logic. Addison-Wesley. pp. 116–150. ISBN 0201003732. OCLC 783543642.
  • Michael Sipser (1997). Introduction to the Theory of Computation. PWS Publishing. ISBN 0-534-94728-X. Section 1.4: Nonregular Languages, pp. 77–83. Section 2.3: Non-context-free Languages, pp. 115–119.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.