I need to parse C#, Ruby and Python source code to generate some reports. I need to get a list of method names inside a class, and I need some other info such as usage of global variable or something. Just parsing using RE could be a solution, but I expect a better (systematic) solution using parsers, if it is easily possible.
What parsers for those languages are provided?
For C#, I found http://csparser.codeplex.com/Wikipage , but for the others, I found a bunch of parsers using those languages, but not the language parsers of them.
It may be worth looking into the ANTLR parser generator.
You’ll find, on the ANTLR site, grammars for all 3 languages you are interested in (Although the Ruby grammar is only for a “simplified” version of the language).
The next difficulty may be to adapt these grammars for the particular target language you would like, i.e. the language in which the parsers themselves will be generated.
ANTLR’s grammar language is very expressive, allowing one to deal with various context-sensitive languages. This is done by inserting various snippets (in the target language) and/or semantic or syntactic predicates (also in the target language) amid the EBNF-like grammar; consequently the grammar is a bit messier and may need adapting when the target language is changed. The “native” target language of ANTLR is Java, but many other targets languages are supported.
On the whole, ANTLR represents a bit a setup/learning-curve effort, but since you need to deal with 3 languages, it may well be worth the investment, as this will allow you to have a uniform framework (over which you have “full” control), rather than trying to corral three possibly very distinct, and possibly more “locked down” parsers as you started doing.
All three languages are relatively sophisticated languages and although your goal is “merely” to identify methods within programs, you may be able to hack/simplify some of the grammars (or maybe simply “ignore” parts of them), only mapping the few parser-level rules of interest to your eventual goal.
Once these rules are identified, you can apply the same or similar actions, i.e. snippets (in the target language) which implement what you wish to accomplish when the parser encounters such rules (eg: store the method’s signature for future reporting, start counting the number of lines… whatever).
A final suggestion:
As hinted in comments to the question, and depending on your goals, you may be able to reuse existing utility programs to perform directly, or indirectly these goals.
Also, because indeed messing with parsers for these sophisticated languages may be somewhat overkill for you possibly simple and possibly error-tolerant goals, the Regular Expressions approach may fit the bill, somehow; the fact of the matter is that none of these languages are regular nor context free, so success with regex will be highly dependent on the eventual goals and on the input data (programs).
Yet another suggestion!
See Larry Lustig‘s answer! Introspection may simplify much of you task as well. The implication is that you’d need to a) write your logic within each of the the underlying language b) integrate/load the programs to be inspected. All depends, but again, a possible way out from the -let’s be fair- relatively heavy investment with formal grammar tools.