I’m writing a Perl script and I’ve come to a point where I need to parse a Java source file line by line, checking for references to a fully qualified Java class name. I know the class I’m looking for up front, as well as the fully qualified name of the source file being searched (based on its path).
For example, find all valid references to foo.bar.Baz inside the com/bob/is/YourUncle.java file.
At this moment the cases I can think of that it needs to account for are:
-
The file being parsed is in the same package as the search class.
find foo.bar.Baz references in foo/bar/Boing.java
-
It should ignore comments.
// this is a comment saying this method returns a foo.bar.Baz or Baz instance
// it shouldn't count
/* a multiline comment as well
this shouldn't count
if I put foo.bar.Baz or Baz in here either */
-
In-line fully qualified references.
foo.bar.Baz fb = new foo.bar.Baz();
-
References based off an import statement.
import foo.bar.Baz;
...
Baz b = new Baz();
What would be the most efficient way to do this in Perl 5.8? Some fancy regex perhaps?
open F, $File::Find::name or die;

# these three things are already known
# $classToFind   looking for references of this class
# $pkgToFind     the package of the class you're finding references of
# $currentPkg    package name of the file being parsed

while(<F>){
    # ... do work here
}
close F;

# the results are available here in some form
You also need to skip quoted strings (you can’t even skip comments correctly if you don’t also deal with quoted strings).
I’d probably write a fairly simple, efficient, and incomplete tokenizer very similar to the one I wrote in node 566467.
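To make the idea concrete, here is a minimal sketch of that kind of tokenizer. It is not the code from the node mentioned above; the helper name strip_comments_and_strings and the sample Java text are made up for illustration, and it deliberately ignores corner cases such as unicode escapes. It walks the source once, blanks out comments (keeping newlines so line numbers still match) and empties string and character literals, leaving only real code to search.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): returns the Java source
# with comments blanked out and string/char literals emptied.
sub strip_comments_and_strings {
    my ($src) = @_;
    my $out = '';
    while ($src =~ m{
        \G (?:
            ( //[^\n]* )                      # 1: line comment
          | ( /\* .*? (?: \*/ | \z ) )        # 2: block comment
          | ( " (?: [^"\\\n] | \\. )* "? )    # 3: string literal
          | ( ' (?: [^'\\\n] | \\. )* '? )    # 4: character literal
          | ( [^/"']+ | / )                   # 5: ordinary code
        )
    }gxs) {
        if    (defined $5) { $out .= $5 }
        elsif (defined $3) { $out .= '""' }
        elsif (defined $4) { $out .= "''" }
        else {
            # Replace the comment with spaces but keep its newlines.
            my $comment = defined $1 ? $1 : $2;
            (my $blank = $comment) =~ tr/\n/ /c;
            $out .= $blank;
        }
    }
    return $out;
}

# Made-up sample input covering the cases from the question.
my $java = join "\n",
    'package com.bob.is;',
    '// returns a foo.bar.Baz or Baz instance',
    '/* foo.bar.Baz in a',
    '   multiline comment */',
    'String s = "not foo.bar.Baz // not a comment";',
    'foo.bar.Baz fb = new foo.bar.Baz();',
    '';

my $code   = strip_comments_and_strings($java);
my $toFind = 'foo.bar.Baz';
my $hits   = () = $code =~ /\b\Q$toFind\E\b/g;
print "$hits reference(s) found\n";    # prints "2 reference(s) found"
```

Note that the string literal has to be recognised before anything else on that line is considered, otherwise the // inside it would be mistaken for the start of a comment.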
Based on that code I’d probably just dig through the non-comment/non-string chunks looking for \bimport\b and \b\Q$toFind\E\b matches. Perhaps similar to: