COMS W3261
Computer Science Theory
Lecture 10: October 8, 2012
Properties of Context-Free Languages
Outline
- Eliminating useless symbols
- Eliminating ε-productions
- Eliminating unit productions
- Chomsky normal form
- Pumping lemma for CFL's
- Cocke-Younger-Kasami algorithm
1. Eliminating Useless Symbols from a CFG
- A symbol X is useful for a CFG if there is a derivation of the
form S ⇒* αXβ ⇒* w
for some string of terminals w.
- If X is not useful, then we say X is useless.
- To be useful, a symbol X needs to be
- generating; that is, X needs to be able to derive some string of terminals.
- reachable; that is, there needs to be a derivation of the form
S ⇒* αXβ where α and β
are strings of nonterminals and terminals.
- To eliminate useless symbols from a grammar, we
- identify
the nongenerating symbols and eliminate all productions containing
one or more of these symbols, and then
- eliminate all productions containing symbols that are not reachable
from the start symbol.
- In the grammar
S → AB | a
A → b
S
, A
, a
, and
b
are generating. B
is not generating.
Eliminating the productions containing the nongenerating symbols we get
S → a
A → b
Now we see A
is not reachable from S
, so
we can eliminate the second production to get
S → a
The generating symbols can be computed inductively bottom-up from the
set of terminal symbols.
The reachable symbols can be computed inductively starting from S
.
2. Eliminating ε-productions from a CFG
- If a language L has a CFG, then L - { ε } has a CFG without
any ε-productions.
- A nonterminal A in a grammar is nullable if
A ⇒* ε.
- The nullable nonterminals can be determined iteratively.
- We can eliminate all ε-productions in a grammar as follows:
- Eliminate all productions with ε bodies.
- Suppose A → X1X2 ... Xk
is a production and m of the k Xi's
are nullable. Then add the 2m versions of this
production where the nullable Xi's are present
or absent. (But if all symbols are nullable, do not add an
ε-production.)
- Let us eliminate the ε-productions from the grammar G
S → AB
A → aAA | ε
B → bBB | ε
S, A and B are nullable.
For the production S → AB
we add the productions S → A | B
For the production A → aAA
we add the productions A → aA | a
For the production B → bBB
we add the productions B → bB | b
The resulting grammar H with no ε-productions is
S → AB | A | B
A → aAA | aA | a
B → bBB | bB | b
We can prove that L(H) = L(G) - { ε }.
3. Eliminating Unit Productions from a CFG
- A unit production is one of the form
A → B
where both A
and B
are nonterminals.
- Let us assume we are given a grammar G with no ε-productions.
- From G we can create an equivalent grammar H with no unit productions
as follows.
- Define (A, B) to be a unit pair if A ⇒* B in G.
- We can inductively construct all unit pairs for G.
- For each unit pair (A, B) in G, we add to H the productions
A → α where B → α is a nonunit production of G.
- Consider the standard grammar G for arithmetic expressions:
E → E + T | T
T → T * F | F
F → ( E ) | a
The unit pairs are (E,E), (E,T), (E,F), (T,T), (T,F), (F,F)
.
The equivalent grammar H with no unit productions is:
E → E + T | T * F | ( E ) | a
T → T * F | ( E ) | a
F → ( E ) | a
4. Putting a CFG into Chomsky Normal Form
- A grammar G is in Chomsky Normal Form if each production in G is
one of two forms:
- A → BC where A, B, and C are nonterminals, or
- A → a where a is a terminal.
- Every context-free language without ε can be generated by
a Chomsky Normal Form grammar.
- Let us assume we have a CFG G with no useless symbols, ε-productions,
or unit productions. We can transform G into an equivalent Chomsky Normal
Form grammar as follows:
- Arrange that all bodies of length two or more consist only of nonterminals.
- Replace bodies of length three or more with a cascade of productions, each with
a body of two nonterminals.
- Applying these two transformations to the grammar H above, we get:
E → EA | TB | LC | a
A → PT
P → +
B → MF
M → *
L → (
C → ER
R → )
T → TB | LC | a
F → LC | a
5. Pumping Lemma for CFL's
- For every nonfinite context-free language L,
there exists a constant n that depends on L such that
for all z in L with |z| ≥ n, we can write
z as uvwxy where
- vx ≠ ε,
- |vwx| ≤ n, and
- for all i ≥ 0, the string uviwxiy is in L.
- Proof: See HMU, pp. 281-282.
- One important use of the pumping lemma is to prove certain
languages are not context free.
- Example: The language L =
{ anbncn | n ≥ 0 }
is not context free.
- The proof will be by contradiction. Assume L is context free.
Then by the pumping lemma there is a constant n associated with L
such that for all z in L with |z| ≥ n,
z can be written as uvwxy
such that
- vx ≠ ε,
- |vwx| ≤ n, and
- for all i ≥ 0, the string
uviwxiy is in L.
- Consider the string z =
anbncn.
- From condition (2), vwx cannot contain both a's and c's.
- Two cases arise:
- vwx has no c's. But then uwy cannot be in L
since at least one of v or x is nonempty.
- vwx has no a's. Again, uwy cannot be in L.
- In both cases we have a contradiction, so we must conclude L cannot be context free.
The details of the proof can be found in HMU, p. 284.
6. Cocke-Younger-Kasami Algorithm for Testing Membership in a CFL
- Input: a Chomsky normal form CFG G = (V, T, P, S) and a string w =
a1a2 ... an
in T*.
- Output: "yes" if w is in L(G), "no" otherwise.
- Method: The CYK algorithm is a dynamic programming algorithm that fills in
a triangular table
Xij
with nonterminals A
such that A ⇒*
aiai+1 ...
aj.
for i = 1 to n do
if A → ai is in P then
add A to Xii
fill in the table, row-by-row, from row 2 to row n
fill in the cells in each row from left-to-right
if (A → BC is in P) and for some i ≤ k < j
(B is in Xik) and (C is in Xk+1,j) then
add A to Xij
if S is in X1n then
output "yes"
else
output "no"
The algorithm adds nonterminal A to Xij
iff there is a
production A → BC in P where B ⇒*
aiai+1 ... ak
and C ⇒*
ak+1ak+2 ... aj.
To compute entry Xij
, we examine at most
n pairs of entries:
(Xii
, Xi+1,j
),
(Xi,i+1
, Xi+2,j
),
and so on until
(Xi,j-1
, Xj,j
).
The running time of the CYK algorithm is O(n3).
7. Practice Problems
- Eliminate useless symbols from the following grammar:
S → AB | CA
A → a
B → BC | AB
C → aB | b
- Put the following grammar into Chomsky Normal Form:
S → ASB | ε
A → aAS | a
B → BbS | A | bb
C → aB | b
- Show that { anbncn | n ≥ 0 }
is not context free.
- Show that { anbnci | i ≤ n }
is not context free.
- Show that { ssRs | s is a string
of a's and b's } is not context free.
- (Hard) Show that the complement of { ss | ss is a string
of a's and b's } is context free.
8. Reading Assignment
aho@cs.columbia.edu