1 Alphabets, Languages, and Grammars
1.1 Terminology
- An alphabet (or vocabulary) is just a finite, non-empty set. It’s elements are called letters or symbols.
- A word (or string) is a finite sequence of symbols.
- A word consisting of no symbols is called the empty word and denoted \(\lambda\).
(Think
""
.) - The set of all words using exactly \(n\) symbols from \(V\) (repetition allowed) is denoted \(V^n\).
- The set of all words using symbols in alphabet \(V\) is denoted \(V^*\). (So \(V^* = V^0 \cup V^1 \cup V^2 \cdots\).)
- A language over \(V\) is a subset of \(V^*\).
Exercises
Let \(V = \{0, 1\}\). What is \(V^0\)? What is \(V^2\)? What is \(V^*\)?
Let \(V = \{2\}\). What is \(V^0\)? What is \(V^2\)? What is \(V^*\)?
Can \(\emptyset\) (empty set) be a letter? an alphabet? a word? a language?
1.2 (Phrase-Structure) Grammars
A phase-structure grammar consists of
- alphabet: \(V\)
- start symbol: \(S \in V\)
- terminal symbols: \(T \subset V\)
- \(N = V - T\) is the set of nonterminal symbols
- production rules: \(P \subseteq (V^*-N^*) \times V^*\)
- usually write elements of \(P\) as \(u \to v\) rather than \((u, v)\).
- \(V^*-N^*\) says that \(u\) must contain at least on nonterminal.
- \(v\) can be any combinaton of terminals and nonterminals (including the empty string).
Examples
Note: in each of these examples, capital letters are used for nonterminals and lower case letters or digits for terminals. That makes it easy to remember, but it is not a requirement of the definition of a grammar.
Grammar 1 (\(G_1\)): alphabet: \(\{a, b, A, B, S\}\), terminals: \(\{a, b\}\), start symbol: \(S\), production rules:
- \(S \to ABa\)
- \(A \to BB\)
- \(B \to ab\)
- \(AB \to b\)
Grammar 2 (\(G_2\)): alphabet: \(\{S, A, a, b\}\), terminals: \(\{a,b\}\), start symbol: \(S\), production rules:
- \(S \to aA\)
- \(S \to b\)
- \(A \to aa\)
Grammar 3 (\(G_3\)): alphabet: \(\{S, 0, 1\}\), terminals: \(\{0,1\}\), start symbol: \(S\), production rules:
- \(S \to 11S\)
- \(S \to 0\)
Grammar 4 (\(G_4\)): alphabet: \(\{S, A, B, C, a, b, c\}\), terminals: \(\{a, b, c\}\), start symbol: \(S\), production rules:
- \(S \to AB\)
- \(A \to Ca\)
- \(B \to Ba\)
- \(B \to Cb\)
- \(B \to b\)
- \(C \to cb\)
- \(C \to b\)
1.2.1 Derivations and Languages
The rules of a grammar are used to derive strings of terminals (elements of \(T^*\)) as follows.
- If \(w_0 \to w_1\) is a rule, then \(lw_0r \Rightarrow lw_1r\) for any strings \(l\) and \(r\).
- \(a \stackrel{*}{\Rightarrow}b\) is defined recursively. \(a \stackrel{*}{\Rightarrow}b\) if either
- \(a \Rightarrow b\), or
- \(\exists c \; (a \Rightarrow c \land c \stackrel{*}{\Rightarrow}b)\)
Basically the production rules are “rewrite rules” that allow us to replace the left side of the rule with the right side. A derivation is a sequence of rewrites.
The language of a grammar \(G\) is denoted \(L(G)\) and contains all strings of terminals that can be derived from the start symbol:
\[L(G) = \{w \in T^* \mid S \stackrel{*}{\Rightarrow}w\}\]
Exercises
Show that using Grammar 1, \(ABa \stackrel{*}{\Rightarrow}abababa\).
Is \(abababa \in L(G_1)\), the language generated by Grammar 1?
What is \(L(G_2)\)?
What is \(L(G_3)\)?
Generate several words using \(G_4\).
Create a grammar \(G_5\) such that \(L(G_5) = \{ 0^n1^n \mid n = 0, 1, 2, ... \}\)
Create a grammar \(G_6\) such that \(L(G_6) = \{ 0^n1^n \mid n \in \mathbb{Z}^+ \}\)
Create a grammar \(G_7\) such that \(L(G_7) = \{ 0^m1^n \mid m, n \in \mathbb{N} \}\)