Editorial Team
Asked: May 11, 2026


What would be the best way to detect what programming language is used in a snippet of code?


1 Answer

  1. Added an answer on May 11, 2026 at 2:45 am

    I think the method used in spam filters would work very well. You split the snippet into words, compare how often each of those words occurs in known snippets of each language, and compute the probability that the snippet is written in language X for every language you're interested in.

    http://en.wikipedia.org/wiki/Bayesian_spam_filtering
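    This is essentially a naive Bayes classifier. As a sketch, using notation introduced here for illustration: let $w_1, \dots, w_n$ be the words of the snippet, $\operatorname{count}(w, L)$ the number of times word $w$ occurred in the training code for language $L$, $N_L$ the total number of words seen for $L$, and $|V|$ the number of distinct words seen across all languages. Assuming the words are independent given the language, the guess is the language with the highest score, with add-one smoothing so that a single unseen word does not zero out the whole product:

$$
\hat{L} = \arg\max_{L} \left[ \log P(L) + \sum_{i=1}^{n} \log P(w_i \mid L) \right],
\qquad
P(w \mid L) \approx \frac{\operatorname{count}(w, L) + 1}{N_L + |V|}
$$

    Summing logs instead of multiplying the probabilities gives the same ranking but avoids floating-point underflow on long snippets.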

    If you have the basic mechanism then it’s very easy to add new languages: just train the detector with a few snippets in the new language (you could feed it an open source project). This way it learns that "System" is likely to appear in C# snippets and "puts" in Ruby snippets.
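    Feeding it an open source project could look something like the sketch below, which covers only the counting step. `word_counts_for_project` is a hypothetical helper, and it assumes the project's files for one language can be picked out with a glob such as `**/*.cs`.

```ruby
# Hypothetical helper: count how often each word occurs across every file
# in a project directory whose path matches the glob pattern (e.g. "**/*.cs").
def word_counts_for_project(dir, pattern)
  counts = Hash.new(0) # unseen words read as 0
  Dir.glob(File.join(dir, pattern)).each do |path|
    next unless File.file?(path)
    # A "word" is a run of ASCII letters, case preserved ("System" stays whole).
    File.read(path).scan(/[a-z]+/i) { |w| counts[w] += 1 }
  end
  counts
end
```

    The resulting hash could then be merged into whatever per-language word table the classifier keeps.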

    I’ve actually used this method to add language detection to code snippets in forum software. It classified every snippet I tried correctly, except genuinely ambiguous ones such as this line, which is valid in both Python 2 and Ruby:

    print "Hello" 

    Let me find the code.

    I couldn’t find the code, so I wrote a new one. It’s a bit simplistic, but it works for my tests. Currently, if you feed it much more Python code than Ruby code, it’s likely to say that this code:

        def foo
          puts "hi"
        end

    is Python code (although it really is Ruby). This is because Python has a def keyword too. So if it has seen def 1000 times in Python and 100 times in Ruby, it may still say Python even though puts and end are Ruby-specific. You could fix this by keeping track of the total number of words seen per language and dividing each word count by that total (or by feeding it equal amounts of code in each language).

        class Classifier
          def initialize
            @data = {}
            @totals = Hash.new(1)
          end

          # Split code into runs of letters. Case-insensitive, so capitalized
          # words like "System" are kept whole instead of losing their first letter.
          def words(code)
            code.split(/[^a-z]/i).reject(&:empty?)
          end

          def train(code, lang)
            @totals[lang] += 1
            @data[lang] ||= Hash.new(1)
            words(code).each { |w| @data[lang][w] += 1 }
          end

          def classify(code)
            ws = words(code)
            @data.keys.max_by do |lang|
              # We really want to multiply here, but I use logs
              # to avoid floating-point underflow
              # (adding logs is equivalent to multiplying).
              # sum (rather than reduce(:+)) also handles an empty snippet.
              Math.log(@totals[lang]) +
                ws.sum { |w| Math.log(@data[lang][w]) }
            end
          end
        end

        # Example usage

        c = Classifier.new

        # Train from files (File.read closes the file, unlike open(...).read)
        c.train(File.read("code.rb"), :ruby)
        c.train(File.read("code.py"), :python)
        c.train(File.read("code.cs"), :csharp)

        # Test it on another file
        c.classify(File.read("code2.py")) # => :python (hopefully)
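    Here is one way the "divide by the words seen per language" fix mentioned above might look: a minimal multinomial naive Bayes sketch with add-one smoothing, assuming the same letters-only tokenization. `NormalizedClassifier` is a hypothetical name, and the prior here is taken as each language's share of training snippets.

```ruby
# Sketch of a normalized classifier: each word count is divided by the total
# number of words seen for that language, so a language with far more
# training code no longer wins just because its raw counts are bigger.
class NormalizedClassifier
  def initialize
    @word_counts = Hash.new { |h, lang| h[lang] = Hash.new(0) }
    @word_totals = Hash.new(0) # total words seen per language
    @doc_counts  = Hash.new(0) # training snippets per language (for the prior)
    @vocab = {}                # every distinct word seen in any language
  end

  def words(code)
    code.scan(/[a-z]+/i)
  end

  def train(code, lang)
    @doc_counts[lang] += 1
    words(code).each do |w|
      @word_counts[lang][w] += 1
      @word_totals[lang] += 1
      @vocab[w] = true
    end
  end

  def classify(code)
    ws = words(code)
    total_docs = @doc_counts.values.sum
    vocab_size = @vocab.size
    @doc_counts.keys.max_by do |lang|
      # log P(lang) + sum of log P(word | lang); the +1 keeps an unseen
      # word from driving the whole score to log(0).
      prior = Math.log(@doc_counts[lang].to_f / total_docs)
      denom = (@word_totals[lang] + vocab_size).to_f
      prior + ws.sum { |w| Math.log((@word_counts[lang][w] + 1) / denom) }
    end
  end
end

c = NormalizedClassifier.new
c.train("def f(x):\n    return x\n" * 50, :python) # lots of Python
c.train("def foo\n  puts \"hi\"\nend\n", :ruby)    # very little Ruby
c.classify("def foo\n  puts \"hi\"\nend")          # => :ruby
```

    With the same lopsided training data, the unnormalized classifier above scores that snippet as :python, because the 50 Python occurrences of def outweigh everything else. The normalized version picks :ruby because each count is now relative to how much code it has seen for that language.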

Related Questions

What would be the best way to detect newline return method in PHP. CR,
Does anyone know what would be the best way to detect which version of
In Java, what would the best way be to have a constantly listening port
What would be the best way to have a list of items with a
What would be the best way to fill a C# struct from a byte[]
What would be the best way to implement a simple crash / error reporting
What would be the best way to calculate someone's age in years, months, and
What would be the best way to expose certain functionality in a Dotnet VSTO
What would be the best way to port an existing Drupal site to a
What would be the best way to fill an array from user input? Would


© 2021 The Archive Base. All Rights Reserved