Is it a good idea to use the CKY chart parsing algorithm to parse the syntax of programming languages (knowing that it is mostly used to parse the syntax of natural language)?
Share
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
CKY can parse any context free language, but the time complexity is not great compared to alternatives. CKY requires the grammar to be in Chomsky Normal Form, which can blow up the size of the grammar and hurt running time too. It’s an okay approach for a quick-and-dirty parser, but you’ll run into issues when you try to scale up to larger inputs or complex grammars.
If you’re looking for an understandable parsing algorithm that’s relatively straightforward to implement, take a look at Parsing Expression Grammars (PEGs). They can recognize a large subset of context-free languages, plus some languages with limited context sensitivity. Once you have a working PEG parser it’s easy to add memoization, which gives you a Packrat Parser that runs in linear time. The academic papers on PEGs, Packrat, and this extension to allow left-recursive grammars are all quite understandable.