The Archive Base

Editorial Team
Asked: June 12, 2026


I wanted to create a redis cache in python, and as any self-respecting scientist I made a benchmark to test the performance.

Interestingly, redis did not fare so well. Either Python is doing something magic (storing the file) or my version of redis is stupendously slow.

I don’t know if this is because of the way my code is structured, or what, but I was expecting redis to do better than it did.

To make a redis cache, I set my binary data (in this case, an HTML page) to a key derived from the filename with an expiration of 5 minutes.

In all cases, file handling is done with f.read() (this is ~3x faster than f.readlines(), and I need the binary blob).

Is there something I’m missing in my comparison, or is Redis really no match for a disk? Is Python caching the file somewhere, and reaccessing it every time? Why is this so much faster than access to redis?

I’m using redis 2.8, python 2.7, and redis-py, all on a 64 bit Ubuntu system.

I do not think Python is doing anything particularly magical, as I made a function that stored the file data in a python object and yielded it forever.

I grouped the benchmark into four sets of function calls:

  • Reading the file X times.
  • A function that is called to see if the redis object is still in memory, load it, or cache a new file (single and multiple redis instances).
  • A function that creates a generator that yields the result from the redis database (with single and multiple instances of redis).
  • Finally, storing the file in memory and yielding it forever.

import redis
import time

def load_file(fp, fpKey, r, expiry):
    with open(fp, "rb") as f:
        data = f.read()
    p = r.pipeline()
    p.set(fpKey, data)
    p.expire(fpKey, expiry)
    p.execute()
    return data

def cache_or_get_gen(fp, expiry=300, r=redis.Redis(db=5)):
    fpKey = "cached:"+fp

    while True:
        yield load_file(fp, fpKey, r, expiry)
        t = time.time()
        while time.time() - t - expiry < 0:
            yield r.get(fpKey)


def cache_or_get(fp, expiry=300, r=redis.Redis(db=5)):

    fpKey = "cached:"+fp

    if r.exists(fpKey):
        return r.get(fpKey)

    else:
        with open(fp, "rb") as f:
            data = f.read()
        p = r.pipeline()
        p.set(fpKey, data)
        p.expire(fpKey, expiry)
        p.execute()
        return data

def mem_cache(fp):
    with open(fp, "rb") as f:
        data = f.readlines()
    while True:
        yield data

def stressTest(fp, trials = 10000):

    # Read the file x number of times
    a = time.time()
    for x in range(trials):
        with open(fp, "rb") as f:
            data = f.read()
    b = time.time()
    readAvg = trials/(b-a)


    # Generator version

    # Read the file, cache it, read it with a new instance each time
    a = time.time()
    gen = cache_or_get_gen(fp)
    for x in range(trials):
        data = next(gen)
    b = time.time()
    cachedAvgGen = trials/(b-a)

    # Read file, cache it, pass in redis instance each time
    a = time.time()
    r = redis.Redis(db=6)
    gen = cache_or_get_gen(fp, r=r)
    for x in range(trials):
        data = next(gen)
    b = time.time()
    inCachedAvgGen = trials/(b-a)


    # Non generator version    

    # Read the file, cache it, read it with a new instance each time
    a = time.time()
    for x in range(trials):
        data = cache_or_get(fp)
    b = time.time()
    cachedAvg = trials/(b-a)

    # Read file, cache it, pass in redis instance each time
    a = time.time()
    r = redis.Redis(db=6)
    for x in range(trials):
        data = cache_or_get(fp, r=r)
    b = time.time()
    inCachedAvg = trials/(b-a)

    # Read file, cache it in python object
    a = time.time()
    for x in range(trials):
        data = mem_cache(fp)
    b = time.time()
    memCachedAvg = trials/(b-a)


    print "\n%s file reads: %.2f reads/second\n" %(trials, readAvg)
    print "Yielding from generators for data:"
    print "multi redis instance: %.2f reads/second (%.2f percent)" %(cachedAvgGen, (100*(cachedAvgGen-readAvg)/(readAvg)))
    print "single redis instance: %.2f reads/second (%.2f percent)" %(inCachedAvgGen, (100*(inCachedAvgGen-readAvg)/(readAvg)))
    print "Function calls to get data:"
    print "multi redis instance: %.2f reads/second (%.2f percent)" %(cachedAvg, (100*(cachedAvg-readAvg)/(readAvg)))
    print "single redis instance: %.2f reads/second (%.2f percent)" %(inCachedAvg, (100*(inCachedAvg-readAvg)/(readAvg)))
    print "python cached object: %.2f reads/second (%.2f percent)" %(memCachedAvg, (100*(memCachedAvg-readAvg)/(readAvg)))

if __name__ == "__main__":
    fileToRead = "templates/index.html"

    stressTest(fileToRead)

And now the results:

10000 file reads: 30971.94 reads/second

Yielding from generators for data:
multi redis instance: 8489.28 reads/second (-72.59 percent)
single redis instance: 8801.73 reads/second (-71.58 percent)
Function calls to get data:
multi redis instance: 5396.81 reads/second (-82.58 percent)
single redis instance: 5419.19 reads/second (-82.50 percent)
python cached object: 1522765.03 reads/second (4816.60 percent)

The results are interesting in that a) generators are faster than calling functions each time, b) redis is slower than reading from the disk, and c) reading from python objects is ridiculously fast.
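
A caveat on point c), offered as a reading of the listing rather than a new measurement: in stressTest the loop calls mem_cache(fp) but never calls next() on the result. Calling a generator function only builds a generator object; its body, including the open() call, does not run until the first next(). The "python cached object" figure may therefore reflect generator creation rather than reads. The sketch below shows the behaviour with a copy of mem_cache; the temp directory and file name are assumptions made for the demonstration.

```python
import os
import shutil
import tempfile


def mem_cache(fp):
    # Same shape as the generator in the listing above.
    with open(fp, "rb") as f:
        data = f.readlines()
    while True:
        yield data


# A path inside a fresh, empty temp directory, so the file cannot exist yet.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "index.html")

gen = mem_cache(path)      # no error here: the body has not started running
raised_on_next = False
try:
    next(gen)              # open() is only attempted at this point
except IOError:
    raised_on_next = True

# Likely intended usage: create the generator once, then advance it in the loop.
with open(path, "wb") as f:
    f.write(b"<html></html>\n")
gen = mem_cache(path)
reads = [next(gen) for _ in range(3)]

shutil.rmtree(tmp_dir)
print(raised_on_next, reads[0] is reads[2])
```

With this usage the file is read once and the same list object is handed back on every next() call.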

Why would reading from a disk be so much faster than reading from an in-memory file from redis?

EDIT:
Some more information and tests.

I replaced the function to

data = r.get(fpKey)
if data:
    return r.get(fpKey)

The results do not differ much from

if r.exists(fpKey):
    data = r.get(fpKey)


Function calls to get data using r.exists as test
multi redis instance: 5320.51 reads/second (-82.34 percent)
single redis instance: 5308.33 reads/second (-82.38 percent)
python cached object: 1494123.68 reads/second (5348.17 percent)


Function calls to get data using if data as test
multi redis instance: 8540.91 reads/second (-71.25 percent)
single redis instance: 7888.24 reads/second (-73.45 percent)
python cached object: 1520226.17 reads/second (5132.01 percent)

Creating a new redis instance on each function call does not actually have a noticeable effect on read speed; the variability from test to test is larger than the gain.
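
One possible reason, going by the listing above: the client is created in a default argument (r=redis.Redis(db=5)). Python evaluates default arguments once, when the function is defined, so the "multi instance" runs may in fact share a single client object. The sketch below illustrates the language behaviour only; FakeClient is a hypothetical stand-in, not part of redis-py.

```python
class FakeClient(object):
    """Hypothetical stand-in for redis.Redis; counts how often it is built."""
    created = 0

    def __init__(self):
        FakeClient.created += 1


def cache_or_get(fp, r=FakeClient()):
    # The default above was built once, when this def statement ran.
    return r


clients = [cache_or_get("templates/index.html") for _ in range(1000)]

# Every call received the very same object.
same_object = all(c is clients[0] for c in clients)
print(FakeClient.created, same_object)
```

To really build a new client per call, the construction would have to move inside the function body (for example with r=None as the default).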

Sripathi Krishnan suggested implementing random file reads. This is where caching starts to really help, as we can see from these results.

Total number of files: 700

10000 file reads: 274.28 reads/second

Yielding from generators for data:
multi redis instance: 15393.30 reads/second (5512.32 percent)
single redis instance: 13228.62 reads/second (4723.09 percent)
Function calls to get data:
multi redis instance: 11213.54 reads/second (3988.40 percent)
single redis instance: 14420.15 reads/second (5157.52 percent)
python cached object: 607649.98 reads/second (221446.26 percent)

There is a HUGE amount of variability in file reads so the percent difference is not a good indicator of speedup.

Total number of files: 700

40000 file reads: 1168.23 reads/second

Yielding from generators for data:
multi redis instance: 14900.80 reads/second (1175.50 percent)
single redis instance: 14318.28 reads/second (1125.64 percent)
Function calls to get data:
multi redis instance: 13563.36 reads/second (1061.02 percent)
single redis instance: 13486.05 reads/second (1054.40 percent)
python cached object: 587785.35 reads/second (50214.25 percent)

I used random.choice(fileList) to randomly select a new file on each pass through the functions.

The full gist is here if anyone would like to try it out – https://gist.github.com/3885957

Edit edit:
Did not realize that I was calling one single file for the generators (although the performance of the function call and generator was very similar). Here is the result of different files from the generator as well.

Total number of files: 700
10000 file reads: 284.48 reads/second

Yielding from generators for data:
single redis instance: 11627.56 reads/second (3987.36 percent)

Function calls to get data:
single redis instance: 14615.83 reads/second (5037.81 percent)

python cached object: 580285.56 reads/second (203884.21 percent)

1 Answer

  1. Editorial Team
    Added an answer on June 12, 2026 at 9:53 pm

    This is an apples to oranges comparison.
    See http://redis.io/topics/benchmarks

    Redis is an efficient remote data store. Each time a command is executed on Redis, a message is sent to the Redis server, and if the client is synchronous, it blocks waiting for the reply. So beyond the cost of the command itself, you will pay for a network roundtrip or an IPC.

    On modern hardware, network roundtrips or IPCs are surprisingly expensive compared to other operations. This is due to several factors:

    • the raw latency of the medium (mainly for network)
    • the latency of the operating system scheduler (not guaranteed on Linux/Unix)
    • memory cache misses are expensive, and the probability of cache misses increases while the client and server processes are scheduled in/out.
    • on high-end boxes, NUMA side effects
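
As a rough illustration of the roundtrip cost, the sketch below compares an in-process dictionary lookup with a request/reply exchange over a local socket pair. It does not use Redis at all: the echo thread, the 1 KB payload and the one-byte request are assumptions chosen for the demonstration, and the timings will vary from machine to machine.

```python
import socket
import threading
import time

PAYLOAD = b"x" * 1024


def echo_server(conn):
    # Answer every 1-byte request with PAYLOAD until the peer closes.
    while True:
        request = conn.recv(1)
        if not request:
            break
        conn.sendall(PAYLOAD)
    conn.close()


def recv_exact(sock, n):
    # Keep reading until exactly n bytes have arrived.
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise RuntimeError("connection closed early")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


client, server = socket.socketpair()
worker = threading.Thread(target=echo_server, args=(server,))
worker.daemon = True
worker.start()

trials = 2000
local_store = {"key": PAYLOAD}

start = time.time()
for _ in range(trials):
    data_local = local_store["key"]          # no IPC involved
local_elapsed = time.time() - start

start = time.time()
for _ in range(trials):
    client.sendall(b"g")                      # one request...
    data_remote = recv_exact(client, len(PAYLOAD))  # ...one blocking reply
remote_elapsed = time.time() - start

client.close()
worker.join()

print("in-process: %.6f s, socket roundtrips: %.6f s"
      % (local_elapsed, remote_elapsed))
```

Both loops return the same bytes; only the second one pays for a context switch and a kernel transfer on every iteration.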

    Now, let’s review the results.

    Comparing the implementation using generators and the one using function calls, they do not generate the same number of roundtrips to Redis. With the generator you simply have:

        while time.time() - t - expiry < 0:
            yield r.get(fpKey)
    

    So 1 roundtrip per iteration. With the function, you have:

    if r.exists(fpKey):
        return r.get(fpKey)
    

    So 2 roundtrips per iteration. No wonder the generator is faster.
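
The difference in roundtrips can be counted with a stand-in client. CountingClient below is a hypothetical in-memory class that only mimics the exists/get/set calls used in the question (returning None from get for a missing key, as redis-py does); it is a sketch for counting commands, not a benchmark.

```python
class CountingClient(object):
    """Hypothetical in-memory stand-in for a Redis client; counts commands."""

    def __init__(self):
        self.store = {}
        self.commands = 0

    def exists(self, key):
        self.commands += 1
        return key in self.store

    def get(self, key):
        self.commands += 1
        return self.store.get(key)  # None when the key is missing

    def set(self, key, value):
        self.commands += 1
        self.store[key] = value


def get_with_exists(r, key):
    # Pattern from the question: EXISTS, then GET.
    if r.exists(key):
        return r.get(key)
    return None


def get_only(r, key):
    # Single GET; the caller treats None as a cache miss.
    return r.get(key)


hits = 100

r1 = CountingClient()
r1.store["cached:index.html"] = b"<html></html>"
for _ in range(hits):
    get_with_exists(r1, "cached:index.html")

r2 = CountingClient()
r2.store["cached:index.html"] = b"<html></html>"
for _ in range(hits):
    get_only(r2, "cached:index.html")

print(r1.commands, r2.commands)
```

On cache hits the first pattern issues twice as many commands as the second, which is consistent with the gap between the function-call and generator figures.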

    Of course, you are supposed to reuse the same Redis connection for optimal performance. There is no point in running a benchmark that systematically connects and disconnects.

    Finally, regarding the performance difference between Redis calls and the file reads, you are simply comparing a local call to a remote one. File reads are cached by the OS filesystem, so they are fast memory transfer operations between the kernel and Python. There is no disk I/O involved here. With Redis, you have to pay for the cost of the roundtrips, so it is much slower.
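
To look at the file side in isolation, one can time repeated reads of a small file that was just written and is therefore very likely still in the OS cache. This is a minimal sketch under that assumption; the temp file, its size and the trial count are illustrative, and it cannot force a cold-cache read for comparison.

```python
import os
import tempfile
import time

content = b"<html>" + b"x" * 4096 + b"</html>"

# Write a small file; having just been written, it is likely cached by the OS.
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as f:
    f.write(content)

trials = 10000
start = time.time()
for _ in range(trials):
    with open(path, "rb") as f:
        data = f.read()
elapsed = time.time() - start

os.remove(path)

if elapsed > 0:
    print("%d reads, %.2f reads/second" % (trials, trials / elapsed))
```

Every iteration here is a local system call returning bytes already held in memory, with no second process to schedule.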

