I’m learning to write a lexical analyzer generator (a clone of lex), based on the direct regular-expression-to-DFA translation algorithm described in the “Dragon Book”.
I can now successfully convert a single regular expression to a DFA, but I got stuck when there are multiple rules, for example:
abc { printf("abc"); }
a* { printf("a*"); }
I can convert abc and a* to two DFA graphs, but how do I combine these two DFA graphs into a single one?
I actually did this exercise several years ago – I built an integrated lexer and LALR parser in C++ using the book as a guide. The book actually tells you how to convert regexes directly into NFAs, and then you convert the NFAs into DFAs using an algorithm I can’t quite remember the name of right now. To support multiple rules, you just need to create an NFA for each one. Then you create a new start state and add an epsilon transition from it to the start state of each rule’s NFA; converting that combined NFA then gives you a single DFA. At least, that’s what I can remember without reviewing my code.
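Here is a rough sketch of that idea in Python, not the code from my old project. The NFAs for abc and a* are written out by hand instead of being built from the regexes, and every name in it (combine, eps_closure, subset_construction, longest_match, the tuple layout) is made up for illustration. The NFA-to-DFA step shown is the standard subset construction. I am also assuming lex-style tie-breaking: each accepting DFA state remembers the earliest rule among the NFA accepting states it contains, and the scanner keeps the longest non-empty match.

```python
from collections import deque

EPS = None  # label used for epsilon transitions

def combine(nfas):
    """Merge per-rule NFAs under one new start state (state 0).

    Each NFA is (n_states, start, accept, edges) with edges given as
    (source, symbol, target) over that NFA's own local state numbers.
    """
    trans = {0: []}
    accepting = {}           # combined NFA state -> rule index
    offset = 1
    for rule, (n_states, start, accept, edges) in enumerate(nfas):
        for s in range(n_states):
            trans[offset + s] = []
        for src, sym, dst in edges:
            trans[offset + src].append((sym, offset + dst))
        trans[0].append((EPS, offset + start))   # new start --eps--> rule start
        accepting[offset + accept] = rule
        offset += n_states
    return 0, trans, accepting

def eps_closure(states, trans):
    """All states reachable from `states` using only epsilon transitions."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for sym, dst in trans[s]:
            if sym is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)

def subset_construction(start, trans, accepting):
    """Turn the combined NFA into a DFA whose states are sets of NFA states."""
    d_start = eps_closure({start}, trans)
    d_trans, d_accept = {}, {}
    queue, seen = deque([d_start]), {d_start}
    while queue:
        cur = queue.popleft()
        rules = [accepting[s] for s in cur if s in accepting]
        if rules:
            d_accept[cur] = min(rules)   # earliest rule wins a tie
        moves = {}
        for s in cur:
            for sym, dst in trans[s]:
                if sym is not EPS:
                    moves.setdefault(sym, set()).add(dst)
        for sym, targets in moves.items():
            nxt = eps_closure(targets, trans)
            d_trans[(cur, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return d_start, d_trans, d_accept

def longest_match(text, d_start, d_trans, d_accept):
    """Return (rule index, match length) for the longest non-empty match, or None."""
    state, best = d_start, None
    for i, ch in enumerate(text):
        state = d_trans.get((state, ch))
        if state is None:
            break
        if state in d_accept:
            best = (d_accept[state], i + 1)
    return best

# Hand-built NFAs: rule 0 is abc, rule 1 is a* (a single looping state).
ABC = (4, 0, 3, [(0, "a", 1), (1, "b", 2), (2, "c", 3)])
A_STAR = (1, 0, 0, [(0, "a", 0)])

dfa = subset_construction(*combine([ABC, A_STAR]))
print(longest_match("abc", *dfa))   # (0, 3): rule abc, 3 characters
print(longest_match("aaab", *dfa))  # (1, 3): rule a*, 3 characters
```

The rule index stored on each accepting DFA state is what tells the generated scanner which action to run, so the two printf actions from the question would be looked up by that index.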