Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6742575
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 26, 20262026-05-26T11:51:36+00:00 2026-05-26T11:51:36+00:00

sorry for the lengthy post, I have a question about common crpytographic hashing algorithms,

  • 0

sorry for the lengthy post, I have a question about common crpytographic hashing algorithms, such as the SHA family, MD5, etc.

In general, such a hash algorithm cannot be injective, since the actual digest produced has usually a fixed length (e.g., 160 bits under SHA-1, etc.) whereas the space of possible messages to be digested is virtually infinite.

However, if we generate a digest of a message which is at most as long as the digest generated, what are the properties of the commonly used hashing algorithms? Are they likely to be injective on this limited message space? Are there algorithms, which are known to produce collisions even on messages whose bit lengths is shorter than the bit length of the digest produced?

I am actually looking for an algorithm, which has this property, i.e., which, at least in principle, may generate colliding hashes even for short input messages.

Background: We have a browser plug-in, which for every web-site visited, makes a server request asking, whether the web-site belongs to one of our known partners. But of course, we don’t want to spy on our users. So, in order to make it hard to generate some kind of surf history, we do not actually send the URL visited, but a hash digest (currently, SHA-1) of some cleaned up version. On the server side, we have a table of hashes of well-known URIs, which is matched against the received hash. We can live with a certain amount of uncertainty here, since we consider not being able to track our users a feature, not a bug.

For obvious reasons, this scheme is pretty fuzzy and admits false positives as well as URIs not matched which should have.

So right now, we are considering changing the fingerprint generation to something, which has more structure, for example, instead of hashing the full (cleaned up) URI, we might instead:

  1. split the host name into components at “.” and hash those individually
  2. check the path into components at “.” and hash those individually

Join the resulting hashes into a fingerprint value. Example: hashing “www.apple.com/de/shop” using this scheme (and using Adler 32 as hash) might yield “46989670.104268307.41353536/19857610/73204162”.

However, as such a fingerprint has a lot of structure (in particular, when compared to a plain SHA-1 digest), we might accidentally make it pretty easy again to compute actual URI visited by a user (for example, by using a pre-computed table of hash values for “common” compont values, such as “www”).

So right now, I am looking for a hash/digest algorithm, which has a high rate of collisions (Adler 32 is seriously considered) even on short messages, so that the probability of a given component hash being unique is low. We hope, that the additional structure we impose provides us with enough additional information to as to improve the matching behaviour (i.e., lower the rate of false positives/false negatives).

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-26T11:51:36+00:00Added an answer on May 26, 2026 at 11:51 am

    I do not believe hashes are guaranteed to be injective for messages the same size the digest. If they were, they would be bijective, which would be missing the point of a hash. This suggests that they are not injective for messages smaller than the digest either.

    If you want to encourage collisions, i suggest you use any hash function you like, then throw away bits until it collides enough.

    For example, throwing away 159 bits of a SHA-1 hash will give you a pretty high collision rate. You might not want to throw that many away.

    However, what you are trying to achieve seems inherently dubious. You want to be able to tell that the URL is one of your ones, but not which one it is. That means you want your URLs to collide with each other, but not with URLs that are not yours. A hash function won’t do that for you. Rather, because collisions will be random, since there are many more URLs which are not yours than which are (i assume!), any given level of collision will lead to dramatically more confusion over whether a URL is one of yours or not than over which of yours it is.

    Instead, how about sending the list of URLs to the plugin at startup, and then having it just send back a single bit indicating if it’s visiting a URL in the list? If you don’t want to send the URLs explicitly, send hashes (without trying to maximise collisions). If you want to save space, send a Bloom filter.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

First, sorry for the lengthy post. Basically, my question is this: I'm trying to
First I am sorry about the length of this post, but I wanted to
Sorry for a pretty specific question. I have a table (see bottom), and when
Sorry, a beginner's question: I have a very simple function in my Django application
Sorry for the lengthy post, I just want to illustrate my situation as best
I am sorry about how simple this question might be, but I cannot find
First off, sorry about the long post. Figured it's better to give context to
Sorry for the length, wanted to give a complete description! I have a need
Sorry for a long question and not a very descriptive title, but my problem
sorry for the lame question but with HTTP handler it is easy to do

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.