When retrieving rows from Google App Engine's datastore, we would like to implement retrieval of all data of an entity type with several simultaneous processes

Question


Asked: June 15, 2026


When retrieving rows from the Google App Engine datastore, we would like to retrieve all the data of an entity type using several simultaneous processes. The processes run asynchronously on back-end Python servers. The point is for each process to retrieve one “chunk” of the whole data set, so that the load is distributed nearly evenly across all of them, like this:

    |_____|_____|_____|_____|_____|_____|_____|.....|_____|_____|
       p1   p2    p3     p4    p5    p6    p7         pk-1   pk

Here each pn is a process, and together the k processes retrieve all the entities.
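To make the "nearly even" split concrete, here is a small sketch that computes the [start, end) position range each of k processes would handle for n entities, giving the first n % k processes one extra entity. It only does the arithmetic; the function name chunk_bounds is hypothetical and nothing here talks to the datastore.

```python
def chunk_bounds(num_entities, num_chunks):
    """Split range(num_entities) into num_chunks contiguous [start, end)
    pieces whose sizes differ by at most one."""
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    base, extra = divmod(num_entities, num_chunks)
    bounds = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks absorb the remainder, one entity each.
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


print(chunk_bounds(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

These are positions in the result order, not keys; passing them to a query as OFFSET values would still make the later processes skip over everything before their chunk.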

I think the way to do this is something like the following (in Python):

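    # get_cursor, is_ready and do_task_with_data are hypothetical helpers.
    # Integer division; max(1, ...) avoids a zero step when there are few entities.
    chunk_size = max(1, num_entities // num_chunks)
    # GQL expects LIMIT before OFFSET.
    base_query = 'SELECT * FROM entity LIMIT %d OFFSET %d'

    # In practice each iteration would run in its own process.
    for chunk in range(0, num_entities, chunk_size):
        cursor = get_cursor(base_query, offset=chunk, limit=chunk_size)

        while is_ready(cursor):
            do_task_with_data(next(cursor))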

Here get_cursor would return an App Engine cursor that iterates over the results starting at the given offset. I am only including the limit argument in case it helps; it could also be enforced inside the while loop, for example. In any case, we would like to avoid queries whose cost grows as O(n) with the offset, where the last queries have to skip over nearly all of the data before returning anything.
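The O(n) concern can be illustrated without App Engine at all. The sketch below models the datastore as a sorted list of keys (an assumption for illustration only) and counts how many entries each strategy reads: an OFFSET query, as far as I know, still has to step past every skipped entity, while a query bounded by a key range (key >= lo AND key < hi) can seek straight to lo. The functions offset_query and key_range_query are hypothetical stand-ins, not App Engine APIs.

```python
import bisect


def offset_query(sorted_keys, offset, limit):
    """Model of an OFFSET/LIMIT query: step past `offset` entries, then
    collect up to `limit`. Returns (results, entries_touched)."""
    if limit <= 0:
        return [], 0
    results, touched = [], 0
    for i, key in enumerate(sorted_keys):
        touched += 1  # skipped entries are still read
        if i < offset:
            continue
        results.append(key)
        if len(results) == limit:
            break
    return results, touched


def key_range_query(sorted_keys, lo, hi):
    """Model of a range scan on the key index: seek to `lo` (O(log n) in a
    real index), then read entries until `hi` (exclusive).
    Returns (results, entries_touched), counting only entries read."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_left(sorted_keys, hi)
    results = sorted_keys[start:end]
    return results, len(results)


keys = ["key%05d" % i for i in range(10000)]
by_offset, cost_offset = offset_query(keys, offset=9000, limit=1000)
by_range, cost_range = key_range_query(keys, "key09000", "key10000")
print(cost_offset, cost_range)  # 10000 1000
```

Both calls return the same last 1000 keys, but the offset version reads ten times as many entries. Choosing good boundary keys, so that each range holds roughly the same number of entities, is the part this sketch leaves out; the random-value idea in the question is one way to sidestep it.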

Another option might be to distribute entities by a random value that each entity already has, splitting the range [0, 1) into num_chunks equal sub-ranges.
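A sketch of the random-value idea, assuming each entity stores a float in [0, 1) in a property called rand here (the name is hypothetical). Process i would then query only entities with lo_i <= rand < hi_i, a range filter rather than an offset. The snippet simulates entities in memory to check that every entity lands in exactly one chunk; with a uniform random value, chunk sizes are only approximately equal.

```python
import random


def random_value_ranges(num_chunks):
    """Split [0, 1) into num_chunks half-open sub-ranges [lo, hi)."""
    edges = [i / num_chunks for i in range(num_chunks)] + [1.0]
    return list(zip(edges[:-1], edges[1:]))


def entities_in_range(entities, lo, hi):
    """In-memory stand-in for a query filtered on lo <= rand < hi."""
    return [e for e in entities if lo <= e["rand"] < hi]


rng = random.Random(42)
entities = [{"id": n, "rand": rng.random()} for n in range(1000)]
ranges = random_value_ranges(4)
chunks = [entities_in_range(entities, lo, hi) for lo, hi in ranges]
print(ranges)  # [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
print([len(c) for c in chunks])  # per-chunk sizes; they always sum to 1000
```

Adjacent ranges share their boundary value exactly and are half-open, so no entity is counted twice or skipped.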

It might even be possible to somehow get a data dump out of App Engine and then work on that (although due to size it would not be our first choice).

What would be a good way to achieve this? Is there a better way to solve this problem? Any ideas on this would be really appreciated.


1 Answer

Editorial Team · answered June 15, 2026 at 8:47 am

I think you're pretty much describing what the App Engine MapReduce framework does: it splits the input into shards and processes them in parallel.
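The overall shape (split the data into shards, map over each shard concurrently, then combine the partial results) can be sketched in plain Python. This is not the MapReduce library's API: the shards here are contiguous slices of an in-memory list, do_task_with_data is a hypothetical per-entity task borrowed from the question, and a thread pool stands in for the back-end processes.

```python
from concurrent.futures import ThreadPoolExecutor


def make_shards(items, num_shards):
    """Cut an ordered sequence into num_shards contiguous, nearly equal shards."""
    base, extra = divmod(len(items), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        end = start + base + (1 if i < extra else 0)
        shards.append(items[start:end])
        start = end
    return shards


def do_task_with_data(entity):
    """Hypothetical per-entity work; here it just returns the entity's length."""
    return len(entity)


def map_shard(shard):
    """Map step: process every entity in one shard and return a partial result."""
    return sum(do_task_with_data(entity) for entity in shard)


def run(entities, num_shards):
    shards = make_shards(entities, num_shards)
    # Shards run concurrently; threads stand in for separate back-end processes.
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(map_shard, shards))
    # Reduce step: combine the per-shard partial results.
    return partials, sum(partials)


entities = ["entity-%d" % n for n in range(10)]
partials, total = run(entities, 3)
print(partials, total)  # [32, 24, 24] 80
```

A thread pool keeps the example self-contained; with real back-end servers, each shard would be handed to a separate process or task instead.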
