Chomsky Normal Form: 두 판 사이의 차이

2026년 1월 15일 (목) 15:19 기준 최신판

개요

CNF(Chomsky Normal Form)는 CFG를 더욱 간단하고 증명하기 쉽게 바꾼 표준 형식이다. 모든 CFG는 CNF와 동등한 문법(Grammars)으로 바뀔 수 있으며, CFG가 다음 세 가지의 규칙만 가지면 CNF에 해당한다:

[math]\displaystyle{ A \rightarrow BC }[/math]^[1]
[math]\displaystyle{ A \rightarrow a }[/math]
[math]\displaystyle{ S \rightarrow \epsilon }[/math]

CFG를 CNF로 바꾸더라도 새로운 문자열이 생성되지는 않으며, 모든 grammar와 문자열이 그대로 보존된다.

CFG to CNF

아래와 같이 CFG가 주어졌다고 가정하자:

[math]\displaystyle{ S \rightarrow ASA | aB }[/math]
[math]\displaystyle{ A \rightarrow B | S }[/math]
[math]\displaystyle{ B \rightarrow b | \epsilon }[/math]

Step 1

CNF에서는 시작기호가 우변에 등장하면 안 되므로, 기존 시작기호 [math]\displaystyle{ S }[/math]를 포함하는 새 시작기호 [math]\displaystyle{ S_0 }[/math]를 만든다. 이를 통해 모든 변환은 [math]\displaystyle{ S_0 }[/math]에서 시작한다. 이를 통해 아래와 같은 식을 추가한다:

[math]\displaystyle{ S_0 \rightarrow S }[/math]

Step 2

주어진 식에 [math]\displaystyle{ B \rightarrow \epsilon }[/math]이 존재하므로 제거해야 한다. 이를 위해서 [math]\displaystyle{ B\rightarrow \epsilon }[/math] 규칙을 제거하고, [math]\displaystyle{ B }[/math]를 포함하는 식이 유도되는 모든 규칙에서, [math]\displaystyle{ B }[/math]를 [math]\displaystyle{ \epsilon }[/math]으로 치환한 아래의 식을 추가한다:

[math]\displaystyle{ S \rightarrow aB \rightarrow a }[/math]
[math]\displaystyle{ A \rightarrow B \rightarrow \epsilon }[/math]

위와 같은 과정을 [math]\displaystyle{ A \rightarrow \epsilon }[/math] 규칙에 대해서도 동일하게 적용해야 한다. 이를 위해 해당 식을 제거하고, 아래의 식들을 추가한다:

[math]\displaystyle{ S \rightarrow S }[/math]
[math]\displaystyle{ S \rightarrow AS }[/math]
[math]\displaystyle{ S \rightarrow SA }[/math]

또한, [math]\displaystyle{ A \rightarrow S, A \rightarrow B, S \rightarrow S }[/math]와 같은 규칙 대신, 해당 변수들이 만들어낼 수 있는 규칙으로 대체해야 한다. 따라서 이를 일괄적으로 적용하면 아래와 같은 결과가 된다:

[math]\displaystyle{ S \rightarrow ASA|SA|AS|aB|a }[/math]
[math]\displaystyle{ B \rightarrow b }[/math]
[math]\displaystyle{ S_0 \rightarrow ASA|SA|AS|aB|a }[/math] ^[2]
[math]\displaystyle{ A \rightarrow ASA|SA|AS|aB|a }[/math] ^[3]
[math]\displaystyle{ A \rightarrow b }[/math] ^[4]

Step 3

이제, CNF에서는 규칙의 오른쪽에 두개의 비단말만 가능하므로 3개 변수인 [math]\displaystyle{ ASA }[/math] 형태를 분해해야 한다. 이를 위해 해당 규칙을 아래와 같이 중간 비단말 [math]\displaystyle{ S_1, A_1 }[/math]을 활용한 규칙들로 대체해야 한다.^[5]:

[math]\displaystyle{ S \rightarrow AS_1 }[/math]
[math]\displaystyle{ S_1 \rightarrow SA }[/math]
[math]\displaystyle{ A \rightarrow AA_1 }[/math]
[math]\displaystyle{ A_1 \rightarrow SA }[/math]

또한, CNF에서는 단말이 반드시 단독으로만 나와야 하므로, [math]\displaystyle{ S \rightarrow aB }[/math]와 같은 규칙은 허용되지 않는다. 따라서 해당 규칙을 아래와 같이 중간 비단말 [math]\displaystyle{ A_2 }[/math]을 활용한 규칙들로 대체해야 한다:

[math]\displaystyle{ A_2 \rightarrow a }[/math]
[math]\displaystyle{ S \rightarrow A_2B }[/math]

따라서 아래와 같이 최종적인 CNF가 도출된다:

[math]\displaystyle{ S \rightarrow AS_1|SA|AS|A_2B|a }[/math]
[math]\displaystyle{ S_0 \rightarrow AS_1|SA|AS|A_2B|a }[/math]
[math]\displaystyle{ A \rightarrow AA_1|SA|AS|A_2B|a|b }[/math] 
[math]\displaystyle{ B \rightarrow b }[/math]
[math]\displaystyle{ S_1 \rightarrow SA }[/math]
[math]\displaystyle{ A_1 \rightarrow SA }[/math]
[math]\displaystyle{ A_2 \rightarrow a }[/math]

각주

↑ 이때 B, C는 시작 기호 S가 아니다
↑ [math]\displaystyle{ S_0 \rightarrow S }[/math] 규칙을 삭제하며 추가
↑ [math]\displaystyle{ A \rightarrow S }[/math] 규칙을 삭제하며 추가
↑ [math]\displaystyle{ A \rightarrow B }[/math] 규칙을 삭제하며 추가
↑ S_1, A_1과 같이 굳이 변수를 두개 추가하는 이유는 S에서 파생된 규칙과 A에서 파생된 규칙을 구분하기 위해서이다.

[1] 이때 B, C는 시작 기호 S가 아니다

[2] [math]\displaystyle{ S_0 \rightarrow S }[/math] 규칙을 삭제하며 추가

[3] [math]\displaystyle{ A \rightarrow S }[/math] 규칙을 삭제하며 추가

[4] [math]\displaystyle{ A \rightarrow B }[/math] 규칙을 삭제하며 추가

[5] S_1, A_1과 같이 굳이 변수를 두개 추가하는 이유는 S에서 파생된 규칙과 A에서 파생된 규칙을 구분하기 위해서이다.

[1]

[2]

[3]

[4]

[5]

@@ 1번째 줄: / 1번째 줄: @@
 [[분류:계산 이론 개론]]
-[[분류:컴퓨터 공학]]
 상위 문서: [[Grammars#Context-Free Languages#Chomsky Normal Form|Context-Free Languages]]
@@ 28번째 줄: / 28번째 줄: @@
   <math>S \rightarrow AS</math>
   <math>S \rightarrow SA</math>
-또한, <math>A \rightarrow S, A \rightarrow B, S \rightarrow</math>와 같은 규칙 대신, 해당 변수들이 만들어낼 수 있는 규칙으로 대체해야 한다. 따라서 이를 일괄적으로 적용하면 아래와 같은 결과가 된다:
+또한, <math>A \rightarrow S, A \rightarrow B, S \rightarrow S</math>와 같은 규칙 대신, 해당 변수들이 만들어낼 수 있는 규칙으로 대체해야 한다. 따라서 이를 일괄적으로 적용하면 아래와 같은 결과가 된다:
   <math>S \rightarrow ASA|SA|AS|aB|a</math>
   <math>B \rightarrow b</math>
@@ 46번째 줄: / 46번째 줄: @@
 따라서 아래와 같이 최종적인 CNF가 도출된다:
   <math>S \rightarrow AS_1|SA|AS|A_2B|a</math>
+ <math>S_0 \rightarrow AS_1|SA|AS|A_2B|a</math>
+ <math>A \rightarrow AA_1|SA|AS|A_2B|a|b</math>
   <math>B \rightarrow b</math>
-  <math>S_0 \rightarrow AS_1|SA|AS|A_2B|a</math>
+  <math>S_1 \rightarrow SA</math>
-  <math>A \rightarrow AA_1|SA|AS|A_2B|a</math>
+  <math>A_1 \rightarrow SA</math>
- <math>A \rightarrow b</math>
   <math>A_2 \rightarrow a</math>
-==CYK Algorithm==
-CNF으로 변환된 문법을 기반으로, 주어진 문자열이 언어 <math>L</math>에 속하는지를 결정론적으로(deterministic) 판단할 수 있다. 이때 recognizer라는 개념이 등장하는데, recognizer는 문자열 <math>w</math>를 입력받아, 그 문자열이 언어 <math>L</math>에 속하는지 여부를 판별하는 알고리즘이다. 이때 아래와 같이 수식이 정의된다:
-* <math>L = L(G)</math><ref>즉, <math>L</math>이 생성하는 언어를 의미한다.</ref>
-* <math>w = w_1w_2\cdots w_n</math>
-* <math>D(i, l, A) = true \leftrightarrow A \Rightarrow* w_iw_{i+1}\cdots w_{i+l-1}</math><ref>이는 비단말 <math>A</math>가 <math>w</math>의 i번째 문자부터 길이 <math>l</math>만큼의 부분 문자열을 생성할 수 있으면 true라는 의미이다.</ref>
-즉, 위에서 <math>D(i, l, A)</math>은 "비단말 <math>A</math>가 <math>w</math>의 특정 구간을 유도할 수 있는가?"를 기록하는 boolean 테이블이다. 아래는 <math>D(i, l, A)</math>가 true가 되는 두가지 경우이다:
-# 문법에 규칙 <math>A \rightarrow a</math>가 존재하는 경우
-#* 입력 문자열의 i번째 문자가 <math>a</math>이며 <math>l = 1</math>
-# 문법에 규칙 <math>A \rightarrow BC</math>이 존재하는 경우, 어떤 분할점 <math>k\,\, (1 \le k < l)</math>에 아래 두 조건이 모두 참
-#* <math>D(i,k,B)</math>
-#* <math>D(i+k, l-k, C)</math>
-즉, A가 길이 l의 부분문자열을 만들 수 있으려면 좌측 비단말 B가 앞쪽 부분, 우측 비단말 C가 뒷부분을 생성할 수 있어야 한다. 이때 아래와 같은 명제가 성립한다:
- <math>w \in L \Leftrightarrow (w = \epsilon \land S \rightarrow \epsilon) \lor D(1, |w|, S)</math>
-CYK(Cocke–Younger–Kasami) 알고리즘은 CFL(Context-Free Language)를 인식하기 위한 동적 프로그래밍 기반의 알고리즘이다.
-<math></math>
-<math></math>
-<math></math>
-<math></math>
-<math></math>
-<math></math>
-<math></math>
-<math></math>
-<math></math>
-<math></math>
 ==각주==