2.3 Non-Context-Free Languages

How do you prove a language is context free?
    construct a CFG or a PDA

What happens when you try to build a machine that recognizes the language?
    B = { a^n b^n c^n | n >= 0 }

When you try to build a machine and you fail, have you proven the language is not context free?
    success in building a machine proves the language is context free
    failure proves nothing

How do you prove a language is not context free?

The Pumping Lemma for Context-Free Languages

What's Pumping?
    strings in context-free languages can be pumped (if they are long enough)
    pumping means two sections of a string can be repeated together, any number of times
    example: in a^n b^n, the string aaaabbbb (repeat an a and a b together)

How do you know that all context-free languages have the pumping property?

Consider a string s in some context-free language.
Suppose the parse tree for s has height h.
This means the parse tree has a path of length h from the root to a leaf.
How many nodes on this path are labelled with variables? terminals?
(The length of a path is the number of edges on the path, so a path of length h has h+1 nodes.)
    h variables
    one terminal

Suppose the grammar has v variables and the path has length v+1.
(Here v is a count of variables, not the substring v that appears later in uvxyz.)
Since the path has length v+1, v+1 nodes are labelled with variables.
If the grammar has only v variables but the path uses v+1 variables, what must be true about the variables used on the path?
    there's a repeated variable somewhere on the path

Suppose R is the repeated variable.
This means the parse tree has two subtrees with roots labelled R, where the lower subtree is contained within the upper subtree.
(Draw a diagram of the parse tree on the board.)
What happens if you replace the lower subtree with a copy of the upper subtree?
    you get another parse tree that can be produced by the grammar

Suppose you divide s into five parts, where
    u is the yield of the parse tree to the left of the upper subtree,
    v is the yield of the upper subtree to the left of the lower subtree,
    x is the yield of the lower subtree,
    y is the yield of the upper subtree to the right of the lower subtree,
    z is the yield of the parse tree to the right of the upper subtree.
The grammar generates the string uvxyz.
Does the grammar also generate the string uvvxyyz? What other strings does the grammar generate?
    yes, the grammar generates uvvxyyz
    the grammar generates u(v^i)x(y^i)z for each i >= 0

If the parse tree used to divide s into uvxyz is the smallest parse tree for s, what can you say about the length of the v and y parts of the string?
    the length of vy must be at least 1
    1. if both v and y were epsilon, you could pump down (remove v and y) and produce s with a smaller parse tree
    2. assume you don't use rules that turn R into R without producing anything else

Suppose the longest production in a grammar has b symbols on the right side.
How long a string can you produce with a parse tree of height 1? 2? h?
    in 1 step you can make a string of length at most b
    in 2 steps you can make a string of length at most b*b
    in h steps you can make a string of length at most b^h

Can you turn that around? How long must a string be to guarantee the parse tree is at least height h?
    a string of length at least b^h is long enough (assuming b >= 2)
    a parse tree of height h-1 yields at most b^(h-1) symbols, which is fewer than b^h

Since a tree must be at least height v+1 to cause a repeated variable, how long does the string s need to be to ensure that s is pumpable? What's the pumping length in terms of the number of variables in the grammar?
(Here v is the number of variables in the grammar, not the substring v.)
    the length of s must be at least b^(v+1)
    the pumping length p is b^(v+1)

How far from the leaves can you find a repeated variable? Is there a repeated variable at most v+1 steps away from a leaf?
If you choose such a repeated variable to divide s into uvxyz, can you say anything about the length of the vxy part of the string?
    the length of vxy is at most p
    the height of the upper subtree is at most v+1
    the length of the yield of the upper subtree (vxy) is at most b^(v+1)

What's the Pumping Lemma?
    if A is a context-free language
    then there is a number p (the pumping length)
    where if s is any string in A with length at least p
    then s can be divided into s = uvxyz where three conditions are true
    1. for each i >= 0, u(v^i)x(y^i)z is in A
    2. length of vy > 0
    3. length of vxy <= p

Using the Pumping Lemma

Can you use the Pumping Lemma to show a language is not context free?
    YES
    the pumping lemma says if a language is context free, it can be pumped
    show the language can't be pumped, then it can't be context free

What are the standard steps for using the Pumping Lemma to prove a language B is not context free?
    1. assume B is context free (proof by contradiction)
    2. state (by the pumping lemma) that every string in B with length at least the pumping length p can be pumped
    3. find a string s in B with length at least p that cannot be pumped
    4. show that no matter how you divide s into uvxyz, s cannot be pumped; this contradicts step 2, so B is not context free

Prove language C is not context free using the pumping lemma.
    C = { a^n b a^n b a^n | n >= 0 }

    The proof is by contradiction. Assume C is context free.
    Let p be the pumping length given by the pumping lemma.
    Choose s to be the string a^p b a^p b a^p.
    Since s is in C and s has length at least p, by the pumping lemma s can be split into uvxyz, with length of vy > 0 and length of vxy <= p, such that uvvxyyz is in C.
    Consider two cases for the division of s.
    1. Either v or y contains a b.
       Then uvvxyyz contains more than two b's and is not in C.
    2. Both v and y are all a's.
       Since neither contains a b, v lies within a single group of a's and so does y.
       Pumping therefore changes at most two of the three groups of a's, and it changes at least one because the length of vy > 0.
       Then uvvxyyz does not have the same number of a's in all three groups, so it is not in C.
    No matter how s is divided it cannot be pumped.
    So the assumption is false and C is not context free.
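The case analysis for C can be sanity-checked by brute force for one small pumping length. The sketch below is only an illustration, not part of the proof: the names `in_C`, `in_anbn`, `divisions`, and `pumpable`, the value p = 4, and the cutoff `max_i` are all assumptions made for the example. It tries every division of s allowed by conditions 2 and 3 of the lemma and tests the pumped strings for i = 0 through `max_i`. A result of False means every division already fails for some small i. A result of True only means some division survived the values of i that were tried.

```python
import re

def in_C(w):
    """Membership test for C = { a^n b a^n b a^n | n >= 0 }."""
    m = re.fullmatch(r"(a*)b(a*)b(a*)", w)
    return m is not None and len(m.group(1)) == len(m.group(2)) == len(m.group(3))

def in_anbn(w):
    """Membership test for the context-free language { a^n b^n | n >= 0 }."""
    m = re.fullmatch(r"(a*)(b*)", w)
    return m is not None and len(m.group(1)) == len(m.group(2))

def divisions(s, p):
    """Yield every split s = u v x y z with |vy| > 0 and |vxy| <= p."""
    n = len(s)
    for start in range(n + 1):                            # vxy begins here
        for end in range(start, min(start + p, n) + 1):   # vxy ends here
            for j in range(start, end + 1):               # v = s[start:j]
                for k in range(j, end + 1):               # y = s[k:end]
                    if (j - start) + (end - k) > 0:       # condition 2
                        yield s[:start], s[start:j], s[j:k], s[k:end], s[end:]

def pumpable(s, p, member, max_i=3):
    """True if some division keeps u v^i x y^i z in the language for i = 0..max_i."""
    return any(
        all(member(u + v * i + x + y * i + z) for i in range(max_i + 1))
        for u, v, x, y, z in divisions(s, p)
    )

p = 4  # hypothetical pumping length, chosen only to keep the search small
s = "a" * p + "b" + "a" * p + "b" + "a" * p

print(pumpable(s, p, in_C))              # False: no division of s survives
print(pumpable("aaaabbbb", 4, in_anbn))  # True: v = a, y = b can be pumped
```

The search only covers one value of p, so it cannot replace the proof, which has to work for whatever p the lemma provides.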
Are you allowed to choose a specific value for p when using the pumping lemma?
    NO, the lemma doesn't claim to be true for every value of p; it only says that some p exists

Are you allowed to choose a specific string s when using the pumping lemma?
    YES, as long as s is at least as long as p

Are you allowed to choose a specific division of s into u, v, x, y, and z when using the pumping lemma?
    NO, the lemma doesn't claim to work for every division; it only says that some division works, so you must rule out all of them

Can you use the Pumping Lemma to show a language is context free?
    NO
    the pumping lemma says if a language is context free, it can be pumped
    it does not say if a language can be pumped, it is context free

When you try to prove a language is not context free using the pumping lemma and you fail, have you proven the language is context free?
    NO
    never use the pumping lemma to prove a language is context free

Suppose you try to prove a language is not context free using the pumping lemma and you discover a string that can be pumped. What have you proven?
    nothing

Classwork
(You may work with a partner.)

Prove language D is not context free using the pumping lemma.
(Hint: use the limitation on the length of vxy to limit which groups of symbols can be part of vxy.)
    D = { a^n b^m a^n b^m | n,m >= 0 }

    The proof is by contradiction. Assume D is context free.
    Let p be the pumping length given by the pumping lemma.
    Choose s to be the string a^p b^p a^p b^p.
    Since s is in D and s has length at least p, by the pumping lemma s can be split into uvxyz, with length of vy > 0 and length of vxy <= p, such that uvvxyyz is in D.
    Consider two cases for the division of s.
    1. vxy contains symbols from only a single group of symbols.
       Then uvvxyyz increases the size of exactly one group of symbols.
       But for each group of symbols there is another group of symbols that must stay balanced in size, so uvvxyyz is not in D.
    2. vxy crosses a boundary between two groups of symbols.
       (Note that vxy cannot cross two boundaries between groups of symbols because each group has p symbols and the length of vxy is at most p.)
       If v or y itself contains both a's and b's, then uvvxyyz has symbols out of order and is not in D.
       Otherwise uvvxyyz increases the size of either one group of symbols or two adjacent groups of symbols.
       But the groups of symbols that must stay balanced in size are not adjacent, so uvvxyyz is not in D.
    No matter how s is divided it cannot be pumped.
    So the assumption is false and D is not context free.
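The hint about the length of vxy is what makes this proof work, and a small search can make that visible. The sketch below is an illustration under assumptions: the names `in_D` and `surviving_division`, the value p = 3, and the cutoff `max_i` are all hypothetical choices for the example. It looks for a division of s that stays in D for i = 0 through `max_i`, first with the limit on vxy set to p and then with the limit lifted to the whole string.

```python
def in_D(w):
    """Membership test for D = { a^n b^m a^n b^m | n, m >= 0 }.

    Every string in D is some a^n b^m written twice, so try each n.
    """
    if len(w) % 2 != 0:
        return False
    half = len(w) // 2
    return any(w == ("a" * n + "b" * (half - n)) * 2 for n in range(half + 1))

def surviving_division(s, max_vxy, member, max_i=3):
    """Return a split (u, v, x, y, z) with |vy| > 0 and |vxy| <= max_vxy whose
    pumped strings stay in the language for i = 0..max_i, or None if none exists."""
    n = len(s)
    for start in range(n + 1):
        for end in range(start, min(start + max_vxy, n) + 1):
            for j in range(start, end + 1):
                for k in range(j, end + 1):
                    v, y = s[start:j], s[k:end]
                    if not v and not y:
                        continue  # condition 2 requires |vy| > 0
                    u, x, z = s[:start], s[j:k], s[end:]
                    if all(member(u + v * i + x + y * i + z)
                           for i in range(max_i + 1)):
                        return u, v, x, y, z
    return None

p = 3  # hypothetical pumping length, chosen only to keep the search small
s = "a" * p + "b" * p + "a" * p + "b" * p

bounded = surviving_division(s, p, in_D)       # condition 3 enforced
free = surviving_division(s, len(s), in_D)     # condition 3 ignored

print(bounded)  # None: with |vxy| <= p, every division fails
print(free)     # some division whose vxy is longer than p
```

With the limit lifted, v and y can sit in two groups that must stay balanced (for example, one a from the first group and one a from the third), so the pumped strings stay in D. Condition 3 rules out exactly those divisions, which is why the proof only has to consider a single group or two adjacent groups.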