The Archive Base


Editorial Team
Asked: May 28, 2026


Note: There are lots of possible duplicates, but none of them seems to solve my problem.

I am working on a plagiarism detector based on MOSS.

After successfully implementing a filter that strips out all the unnecessary details (comments, punctuation, etc.), I hash the content using a rolling hash implementation (Rabin-Karp).

However, the hashes that match between two text files of source code have very different underlying text (no plagiarism, and yet the same hashes).

The algorithm I implemented (Ruby), as a partial snippet:

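  # Preprocessing from the Rabin-Karp algorithm: hash the first k characters
  for c in 0...k do
    text_hash = (radix * text_hash + text_to_process[c].ord) % q
  end

  # Main loop: slide the window forward one character at a time.
  # Assumes highorder = radix**k % q (its definition is not shown in this snippet).
  for c in 0...loop do
    # The incoming character is read from text_to_process;
    # text_hash[c + k] would index a single bit of the integer hash instead.
    text_hash = ((radix * text_hash - text_to_process[c].ord * highorder) + text_to_process[c + k].ord) % q
  end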

Is there an issue with my implementation, or could the parameters I specify be at fault?

I take radix = 34.
(I am not sure this is the right value. I am assuming the stripped text will contain only letters plus some special characters such as '+', '-', '*', '/', so a rough estimate is 34 characters in total.)

I am taking q (the prime) to be 101.

Is this a collision issue I am dealing with? Any pointers on how to tackle the problem?


1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 3:51 am

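    I note that with q = 101, there are only 101 possible hash values: 0, 1, 2, ..., 100. Any two unrelated windows therefore have roughly a 1-in-101 chance of sharing a hash, so matches between unrelated text are to be expected. Have you tried increasing q? Another approach would be to check whether the hash values look as if they were chosen at random from the possible values 0, 1, ..., q-1.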
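
To make the effect of q concrete, here is a minimal Ruby sketch of a rolling hash with a configurable modulus, plus a helper that counts matches whose underlying text differs. The names `rolling_hashes` and `false_matches`, the radix of 256, and the modulus 1_000_000_007 are assumptions chosen for illustration; they are not taken from the question. The update uses the common form in which the outgoing character is weighted by radix**(k-1) mod q.

```ruby
# Hypothetical helper: rolling hash of every k-character window of text.
# radix and q are illustrative defaults, not values from the original post.
def rolling_hashes(text, k, radix: 256, q: 1_000_000_007)
  return [] if text.length < k

  # radix**(k-1) mod q: the weight of the character that leaves the window
  high_order = radix.pow(k - 1, q)

  # Hash of the first window
  h = 0
  text[0, k].each_char { |ch| h = (radix * h + ch.ord) % q }

  hashes = [h]
  (0...(text.length - k)).each do |i|
    # Drop the outgoing character, shift by radix, add the incoming character.
    # Ruby's % with a positive modulus never returns a negative value.
    h = (radix * (h - text[i].ord * high_order) + text[i + k].ord) % q
    hashes << h
  end
  hashes
end

# Hypothetical helper: count window pairs whose hashes are equal
# although the underlying text is different (false positives).
def false_matches(a, b, k, q:)
  ha = rolling_hashes(a, k, q: q)
  hb = rolling_hashes(b, k, q: q)
  count = 0
  ha.each_with_index do |x, i|
    hb.each_with_index do |y, j|
      count += 1 if x == y && a[i, k] != b[j, k]
    end
  end
  count
end
```

Calling `false_matches(file_a, file_b, k, q: 101)` and then again with a much larger prime should show how strongly the modulus affects the number of spurious matches. Whatever q is used, comparing the underlying text whenever two hashes agree, as `false_matches` does, is the usual safeguard against reporting a collision as a match.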

    You should of course also test your program on cases where there are repeated strings for it to find. A failure there could be another symptom of whatever problem is causing the collisions, and it would be easier to find and debug.
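
One way to set up such a test, sketched in Ruby under assumed parameters: compute each rolling hash incrementally, compare it against a hash recomputed from scratch for the same window, and then check that a known repeated string is found. `RADIX`, `Q`, `direct_hash`, `window_positions`, and `repeated_kgrams` are hypothetical names introduced for this example.

```ruby
# Illustrative parameters; not taken from the original post
RADIX = 256
Q = 1_000_000_007

# Hash a window from scratch: slow, but simple enough to trust as a reference
def direct_hash(window)
  window.each_char.reduce(0) { |h, ch| (RADIX * h + ch.ord) % Q }
end

# Hypothetical helper: map each rolling hash to the start positions of its windows
def window_positions(text, k)
  positions = Hash.new { |hsh, key| hsh[key] = [] }
  return positions if text.length < k

  high_order = RADIX.pow(k - 1, Q)
  h = direct_hash(text[0, k])
  positions[h] << 0
  (1..(text.length - k)).each do |i|
    h = (RADIX * (h - text[i - 1].ord * high_order) + text[i + k - 1].ord) % Q
    # Sanity check: the rolling update must agree with a full recomputation
    raise "rolling hash mismatch at window #{i}" unless h == direct_hash(text[i, k])
    positions[h] << i
  end
  positions
end

# Returns the k-grams that occur more than once in text.
# Windows with equal hashes are regrouped by their actual text,
# so a hash collision is never reported as a repeat.
def repeated_kgrams(text, k)
  window_positions(text, k).values.flat_map do |starts|
    starts.group_by { |s| text[s, k] }
          .select { |_, group| group.length > 1 }
          .keys
  end
end
```

For example, `repeated_kgrams("abcxyzabc", 3)` should report only `"abc"`. If the sanity check inside `window_positions` raises, the rolling update itself is wrong, which is a different problem from q being too small.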



© 2021 The Archive Base. All Rights Reserved