I am looking for a way to test a particular string to determine if it contains code.
For instance, I would like to pass a string such as “body{font-weight: bold;}” and determine that it is CSS.
I would like to do it for:
HTML,
CSS,
JavaScript,
Ruby,
C, C++, C#
I am guessing that it would be regex of some sort, but I am pretty stumped!
You need some kind of classifier that uses a heuristic/statistical approach. The accuracy will be better if the input string is larger (e.g. it's hard to say what language a very short fragment belongs to). Here's an example of a classifier that uses Bayesian methods: http://www.rubyinside.com/sourceclassifier-identifying-programming-languages-quickly-1431.html
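To give a rough idea of what the Bayesian approach looks like, here is a minimal naive Bayes sketch in Python. It is not the code from the linked article. The training snippets, the token regex, and the names `TRAINING`, `tokenize`, `train` and `classify` are all made up for illustration, and a usable classifier would need far more training data per language.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny training set; real use needs many samples per language.
TRAINING = {
    "css": [
        "body { font-weight: bold; margin: 0; }",
        "a:hover { color: red; text-decoration: none; }",
    ],
    "html": [
        "<html><head><title>Hi</title></head><body><p>Text</p></body></html>",
        "<div class=\"box\"><a href=\"#\">link</a></div>",
    ],
    "javascript": [
        "function add(a, b) { return a + b; }",
        "var x = document.getElementById('id'); console.log(x);",
    ],
    "ruby": [
        "def add(a, b)\n  a + b\nend",
        "puts items.map { |i| i * 2 }.inspect",
    ],
}

# Words (hyphens allowed, as in CSS property names) or single punctuation marks.
TOKEN = re.compile(r"[A-Za-z_][\w-]*|[^\sA-Za-z0-9_]")


def tokenize(text):
    return TOKEN.findall(text)


def train(samples):
    """Count how often each token appears in each language's samples."""
    counts = {lang: Counter() for lang in samples}
    for lang, snippets in samples.items():
        for snippet in snippets:
            counts[lang].update(tokenize(snippet))
    vocab = set().union(*counts.values())
    return counts, vocab


def classify(text, counts, vocab):
    """Return the language with the highest log-probability for the text."""
    scores = {}
    for lang, lang_counts in counts.items():
        total = sum(lang_counts.values())
        # Uniform prior over languages (an assumption of this sketch).
        score = math.log(1.0 / len(counts))
        for tok in tokenize(text):
            # Add-one (Laplace) smoothing so unseen tokens do not zero the score.
            score += math.log((lang_counts[tok] + 1) / (total + len(vocab)))
        scores[lang] = score
    return max(scores, key=scores.get)


counts, vocab = train(TRAINING)
print(classify("body{font-weight: bold;}", counts, vocab))
```

With this toy data the string from the question comes out as `css`, but short or ambiguous input is easily misclassified, which is exactly the point about larger input strings being more accurate.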
The highlight.js script does language detection in JavaScript. Take a look at the source.