The Archive Base

Editorial Team
Asked: May 21, 2026 (2026-05-21T20:36:50+00:00)

Are there any algorithms that you can continue hashing from a known hash digest?

Are there any algorithms that let you continue hashing from a known hash digest? For example, a client uploads one chunk of a file to ServerA, and I can get an MD5 sum of the uploaded content. The client then uploads the rest of the file to ServerB. Can I transfer the internal MD5 state to ServerB and finish the hashing there?

Years ago I found a cool black-magic hack based on md5 on comp.lang.python, but it uses ctypes against a specific version of md5.so or _md5.dll, so it is not portable across Python interpreter versions or to other programming languages. Besides, the md5 module has been deprecated in the Python standard library since 2.5, so I need a more general solution.

What’s more, can the hashing state be stored in the hex digest itself? (That way I could continue hashing a stream of data from an existing hash digest, without a dirty internal hack.)

1 Answer

  1. Editorial Team
    Added an answer on May 21, 2026 at 8:36 pm

    Not from the known digest, but from the known state. You can use a pure Python MD5 implementation and save its state. Here is an example using _md5.py from PyPy:

    import _md5
    
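    def md5_getstate(md):
        # Snapshot the attributes that make up the hash state; "+ []" copies
        # the lists so later updates to md do not alter the saved state.
        return (md.A, md.B, md.C, md.D, md.count + [], md.input + [], md.length)
    
    def md5_continue(state):
        # Create a fresh object and overwrite its internals with the saved state.
        md = _md5.new()
        (md.A, md.B, md.C, md.D, md.count, md.input, md.length) = state
        return md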
    
    m1 = _md5.new()
    m1.update("hello, ")
    state = md5_getstate(m1)
    m2 = md5_continue(state)
    m2.update("world!")
    print m2.hexdigest()
    
    m = _md5.new()
    m.update("hello, world!")
    print m.hexdigest()
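
For readers on Python 3, where PyPy's Python 2 `_md5.py` is not directly usable, the same idea can be sketched with a small self-contained MD5 written from RFC 1321. This is only an illustration, not the PyPy code: the class name `ResumableMD5` and its `getstate()` method are hypothetical, and the "state" is assumed to be the four chaining words, the byte count, and the buffered partial block.

```python
import hashlib
import math
import struct

# Per-step rotation amounts and sine-derived constants (RFC 1321).
_S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
_K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]


def _rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


class ResumableMD5:
    """Hypothetical pure-Python MD5 whose state can be exported and restored."""

    def __init__(self, state=None):
        if state is None:
            self.h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
            self.length = 0   # total bytes hashed so far
            self.buf = b""    # bytes not yet forming a full 64-byte block
        else:
            h, self.length, self.buf = state
            self.h = list(h)

    def getstate(self):
        # Everything needed to resume: chaining words, byte count, partial block.
        return (tuple(self.h), self.length, self.buf)

    def _compress(self, block):
        m = struct.unpack("<16I", block)
        a, b, c, d = self.h
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + _K[i] + m[g]) & 0xFFFFFFFF
            a, d, c, b = d, c, b, (b + _rotl(f, _S[i])) & 0xFFFFFFFF
        self.h = [(x + y) & 0xFFFFFFFF for x, y in zip(self.h, (a, b, c, d))]

    def update(self, data):
        self.length += len(data)
        data = self.buf + data
        full = len(data) - len(data) % 64
        for off in range(0, full, 64):
            self._compress(data[off:off + 64])
        self.buf = data[full:]

    def hexdigest(self):
        # Pad a copy so this object can still be updated afterwards.
        tmp = ResumableMD5(self.getstate())
        bit_len = (self.length * 8) & 0xFFFFFFFFFFFFFFFF
        pad = b"\x80" + b"\x00" * ((55 - self.length) % 64)
        tmp.update(pad + struct.pack("<Q", bit_len))
        return struct.pack("<4I", *tmp.h).hex()


# "ServerA" hashes the first chunk and exports the state.
first = ResumableMD5()
first.update(b"hello, ")
state = first.getstate()

# "ServerB" restores the state and finishes the hash.
second = ResumableMD5(state)
second.update(b"world!")
assert second.hexdigest() == hashlib.md5(b"hello, world!").hexdigest()
```

The state tuple holds only integers and bytes, so it could be sent between servers with any serialization format. Note that it is larger than the 16-byte digest, which is why the digest alone is not enough to resume.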
    

    As e.dan noted, you can also use almost any checksumming algorithm (CRC, Adler, Fletcher), but these protect only against random errors, not against intentional data modification.
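
As a small illustration of the checksum route, Python's standard `zlib` module accepts a running value as the second argument to `crc32` and `adler32`, so the checksum itself acts as the resumable state. The chunk contents below are just example data.

```python
import zlib

chunk_a = b"hello, "
chunk_b = b"world!"

# "ServerA" computes the checksum of the first chunk; the result is a plain
# integer that can be handed to another machine.
crc_part = zlib.crc32(chunk_a)
adler_part = zlib.adler32(chunk_a)

# "ServerB" continues from the running value instead of starting over.
crc_resumed = zlib.crc32(chunk_b, crc_part)
adler_resumed = zlib.adler32(chunk_b, adler_part)

# Both match the checksum of the whole message computed in one go.
assert crc_resumed == zlib.crc32(chunk_a + chunk_b)
assert adler_resumed == zlib.adler32(chunk_a + chunk_b)
```

Here the running value really is the complete state, which is what makes these checksums resumable from their output, unlike MD5.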

    EDIT: Of course, you can also re-implement the ctypes-based serialization from the thread you referenced in a more portable way (without magic constants). I believe this should be version- and architecture-independent (tested on Python 2.4-2.7, on both i386 and x86_64):

    # based on idea from http://groups.google.com/group/comp.lang.python/msg/b1c5bb87a3ff5e34
    
    try:
        import _md5 as md5
    except ImportError:
        # python 2.4
        import md5
    import ctypes
    
    def md5_getstate(md):
        if type(md) is not md5.MD5Type:
            raise TypeError('not an MD5Type instance')
        # Copy the raw bytes of the C struct that follow the generic object header.
        return ctypes.string_at(id(md) + object.__basicsize__,
                                md5.MD5Type.__basicsize__ - object.__basicsize__)
    
    def md5_continue(state):
        md = md5.new()
        assert len(state) == md5.MD5Type.__basicsize__ - object.__basicsize__, \
               'invalid state'
        # Overwrite the new object's internal struct with the saved bytes.
        ctypes.memmove(id(md) + object.__basicsize__,
                       ctypes.c_char_p(state),
                       len(state))
        return md
    
    m1 = md5.new()
    m1.update("hello, ")
    state = md5_getstate(m1)
    m2 = md5_continue(state)
    m2.update("world!")
    print m2.hexdigest()
    
    m = md5.new()
    m.update("hello, world!")
    print m.hexdigest()
    

    This hack is not Python 3 compatible, since Python 3 does not have an _md5/md5 module.

    Unfortunately, hashlib’s openssl_md5 implementation is not suitable for such hacks, since the OpenSSL EVP API does not provide any calls/methods to reliably serialize EVP_MD_CTX objects.
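
For completeness, hashlib objects do offer a `copy()` method, which duplicates the internal state inside the same process. The sketch below shows it for data sharing a common prefix; the helper name `digests_with_shared_prefix` is made up for illustration. The copy is a live object rather than a serialized state, so it does not by itself solve the ServerA/ServerB case.

```python
import hashlib


def digests_with_shared_prefix(prefix, suffixes):
    """Hash prefix once, then branch a copy of the state for each suffix."""
    base = hashlib.md5(prefix)
    results = []
    for suffix in suffixes:
        branch = base.copy()   # independent clone of the current state
        branch.update(suffix)
        results.append(branch.hexdigest())
    return results


digests = digests_with_shared_prefix(b"hello, ", [b"world!", b"there!"])
assert digests[0] == hashlib.md5(b"hello, world!").hexdigest()
assert digests[1] == hashlib.md5(b"hello, there!").hexdigest()
```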
