DFA / NFA to Regular Expression (without using GNFA)

The general approach to get a Regular expression (regex) from DFA/NFA is to convert a DFA to a GNFA by reducing states. Read more on wikipedia. However, for not too complicated DFA, it is often easier to get a regex from NFA instead of going through GNFA (which btw makes regex big).

For regex, the patterns that contains a substring, it is easy to write a regex. Lets say you have to write a regex for all strings that contains the string 010, where: ∑ = {0, 1}. Such cases are straight-forward. A regex would be:

(0 U 1)* 010 (0 U 1)*

But consider the following:

Write a regex for strings that does not contain substring 010. Where ∑ = {0, 1}.

It is not straight-forward. For such case, it is often easier to draw a DFA, then make NFA, and then get regex from it (without using GNFA approach).

Here’s one approach to solve such problem. The following steps are used:

  1. Create a DFA that contains the substring 010
  2. Complement the DFA and make the NFA from it (to get a NFA that does not contain 010)
  3. Get the Regex from it

Step #1: Creating the DFA that contains 010

Step #2: Complement the previous DFA (i.e. change the old accept state as reject state, and non-accept state as accept state). And, remove the unnecessary ‘d’ state to get the NFA.

Step #3: Now, we write the regex. There are several ways to do it, however, I use the following technique:

– From start state (a): You can either take 1 or 00*11 to come back to the start state (a), and repeat it.

So the first part of regex looks like following:

(1 U 00*11)*

– After reading the previous input sequence, you are still in start state (a). You can either end it there (as a is an accepting state). OR you can read 00* to get to State b to end it there, OR you read 00*1 and reach State c and end it there.

So the complete regex looks like the following:

(1 U 00*11)* (ε U 00* U 00*1)

Thats about it. I hope it helps out someone :).

Canada Postal Code & US Zip Code Regex

I decided to write a simple Canadian Postal Code and US Zip Code Regex. There are already good ones on the web, like here for example. But, I decided to write my own to make it slightly more accurate.

Canadian Postal Code Regex

Canada’s Postal Code format is ‘A1A 1X1’ or ‘a1a1x1’. Its made up of two parts. Forward Sortation Area (FSA) and Local Delivery Unit (LDU). Read more on wikipedia. The letters D, F, I, O, Q, or U are not used on postal Code. So, the Postal Code would be roughly (note that when applying the regex, the case-insensitive (i) modifier should be used):

US Zip Code Regex

US Zip Code format seems to be straightforward. It needs to have 5 digits followed by an optional 4 digit. So, the Zip Code would be roughly:

Merging US & Canadian Postal Code

Now, both Postal Code and Zip Code could be merged, using an OR (|) operator, into one regex: