I want to detect the programming language of a piece of source code using Ruby.
For example:
(PHP)
$a = array("1","2","3");
print_r($a);
(Ruby)
def index
end
etc.
What gem can do this?
Linguist might do that for you (it’s what GitHub uses to detect the primary languages in a project).
If you’re looking to build your own, that would be a good place to start. Here are a few more notes on what else you might have to do in order to make one.
File extensions are a good cheat. For example:
.rb: almost always Ruby
.cpp: almost always C++
.h: could be C or C++
…etc.
Then read the code line by line. There are usually common keywords, or a typical placement of those keywords within the code, that will tip you off pretty quickly as to what language it's written in. Reviewing a few "getting started" tutorials for the languages you want to support should give you a good summary of these things without needing to actually learn the languages themselves. All you really need is a few features unique to each language that make a file definitively one language or another.
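As a rough illustration of the extension-plus-keywords idea, here is a minimal sketch in plain Ruby. The EXTENSIONS and SIGNATURES tables and the guess_language method are hypothetical names made up for this example, and the patterns are deliberately tiny; a real detector would need far more of them.

```ruby
# Hypothetical lookup tables; extend them for the languages you care about.
EXTENSIONS = {
  ".rb"  => ["Ruby"],
  ".php" => ["PHP"],
  ".cpp" => ["C++"],
  ".h"   => ["C", "C++"] # ambiguous, so the file content has to decide
}.freeze

# A few tell-tale patterns per language (illustrative, far from complete).
SIGNATURES = {
  "Ruby" => [/^\s*def\s+\w+/, /^\s*end\s*$/, /\bputs\b/],
  "PHP"  => [/<\?php/, /\$\w+\s*=/, /\bprint_r\s*\(/, /\barray\s*\(/],
  "C++"  => [/#include\s*<\w+>/, /\bstd::/, /\bclass\s+\w+\s*\{/],
  "C"    => [/#include\s*<\w+\.h>/, /\bprintf\s*\(/, /\bstruct\s+\w+/]
}.freeze

# Returns a language name, or nil when nothing matches.
def guess_language(source, filename = nil)
  candidates = filename ? EXTENSIONS[File.extname(filename).downcase] : nil
  # An unambiguous extension settles it without reading the code.
  return candidates.first if candidates && candidates.size == 1

  candidates ||= SIGNATURES.keys
  # Otherwise count how many signature patterns match, line by line.
  scores = candidates.to_h do |lang|
    hits = source.each_line.sum { |line| SIGNATURES[lang].count { |re| line.match?(re) } }
    [lang, hits]
  end
  best, hits = scores.max_by { |_, count| count }
  hits.zero? ? nil : best
end

puts guess_language(['$a = array("1","2","3");', 'print_r($a);'].join("\n"))
puts guess_language("def index\nend\n")
```

With these tables, the two snippets from the question come out as PHP and Ruby respectively. Ties are broken by table order, which is an arbitrary choice in this sketch.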
You could also use a Bayesian learning filter (there is a Ruby gem called Classifier that appears to do this) to train a more flexible engine that identifies code by language on its own. Since programming languages are highly structured text, it shouldn't take very long for the classifier to get extremely good at identifying the language. If you wanted to go totally crazy, you could even train it to identify not only the language, but also the minimum version of the language that the code can be compiled against. For example, Java added generics in Java 5, so if you see generics in the code, you know the source needs at least Java 5.
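To make the Bayesian idea concrete, here is a small hand-rolled naive Bayes classifier in plain Ruby. It is a sketch of the general technique, not the Classifier gem's API; the LanguageClassifier class, its tokenizer, and the two one-line training samples are all assumptions made for the example, and real training data would need to be much larger.

```ruby
# Minimal multinomial naive Bayes with add-one (Laplace) smoothing.
class LanguageClassifier
  def initialize
    @token_counts = Hash.new { |h, k| h[k] = Hash.new(0) } # language => token => count
    @doc_counts   = Hash.new(0)                            # language => number of samples
    @vocabulary   = {}                                     # every token seen in training
  end

  def train(language, source)
    @doc_counts[language] += 1
    tokenize(source).each do |token|
      @token_counts[language][token] += 1
      @vocabulary[token] = true
    end
  end

  # Returns the language with the highest log-probability, or nil if untrained.
  def classify(source)
    return nil if @doc_counts.empty?

    tokens = tokenize(source)
    total_docs = @doc_counts.values.sum.to_f
    @doc_counts.keys.max_by do |language|
      counts = @token_counts[language]
      denominator = counts.values.sum + @vocabulary.size
      log_prior = Math.log(@doc_counts[language] / total_docs)
      log_prior + tokens.sum { |t| Math.log((counts[t] + 1.0) / denominator) }
    end
  end

  private

  # Identifiers and single punctuation characters; numbers are ignored.
  def tokenize(source)
    source.scan(/[A-Za-z_]\w*|[^\s\w]/)
  end
end

classifier = LanguageClassifier.new
classifier.train("PHP", '<?php $a = array("1","2","3"); print_r($a); echo $a[0];')
classifier.train("Ruby", "def index\n  puts items.map { |i| i.to_s }\nend")
puts classifier.classify('$b = array("x"); print_r($b);')
```

Punctuation is kept as tokens on purpose, since characters such as $ and ; carry a lot of signal about the language.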
A little more complex, but not much, will be questions like .erb files. Do you call those "Embedded Ruby", do you call them "Ruby", or do you count the lines of HTML vs. Ruby vs. JavaScript and call it by the most numerous language, or do you just tag the file with ALL the found languages? I suppose that's really more of a design decision.
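Whichever policy you pick, the first step is splitting the template into its languages. The sketch below is one possible way to do that with regular expressions; erb_languages and dominant_language are hypothetical helper names. It counts non-whitespace characters rather than lines, because ERB often mixes Ruby and HTML on the same line. It does not handle ERB comments or escaped tags specially, so treat it as a starting point only.

```ruby
# Rough per-language size of an ERB template, in non-whitespace characters.
def erb_languages(template)
  non_ws = ->(text) { text.gsub(/\s+/, "").length }

  # Ruby is whatever sits inside <% ... %>, <%= ... %> or <%- ... -%>.
  ruby = template.scan(/<%[=-]?(.*?)-?%>/m).flatten.join
  rest = template.gsub(/<%.*?%>/m, "")

  # JavaScript is whatever sits inside <script> elements; the remainder is HTML.
  js   = rest.scan(%r{<script\b[^>]*>(.*?)</script>}mi).flatten.join
  html = rest.gsub(%r{<script\b[^>]*>.*?</script>}mi, "")

  counts = {
    "Ruby"       => non_ws.call(ruby),
    "JavaScript" => non_ws.call(js),
    "HTML"       => non_ws.call(html)
  }
  counts.reject { |_, count| count.zero? }
end

# "Most numerous language" policy; returns nil for an empty template.
def dominant_language(template)
  erb_languages(template).max_by { |_, count| count }&.first
end

template = "<h1>Hi</h1>\n<% if admin %>\n<p><%= name %></p>\n<% end %>\n"
p erb_languages(template).keys
p dominant_language(template)
```

The keys of erb_languages give you the "tag the file with all found languages" option, and dominant_language gives you the "most numerous" option, so the design decision stays separate from the parsing.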