Editorial Team
Asked: June 9, 2026

I need to loop over a dataset which is sorted, grouping all the results

I need to loop over a dataset which is sorted, grouping all the results by that sorted attribute into chunks which all have the same value for that attribute. Then I run some operations on that chunk of results.

Sorry, that's a bit confusing; an example is probably a better way of describing what I'm doing.

I've got a dataset that's structured like this, except that the "data" strings are actually objects that contain plenty of other data.

[ [1, "data1"], [1, "data2"], [2, "moredata"], [2, "stuff"], 
  [2, "things"], [2, "foo"], [3, "bar"], [4, "baz"] ]

What I want to happen is for that data to get grouped into 4 different function calls:

process_data(1, ["data1", "data2"])
process_data(2, ["moredata", "stuff", "things", "foo"])
process_data(3, ["bar"])
process_data(4, ["baz"])

What I end up with is a construct that looks something like this:

last_id = None
grouped_data = []

for row in dataset:
    id = row[0]
    data = row[1]

    if grouped_data and last_id != id:
        # we're starting a new group, process the last group
        # (the grouped_data check skips the empty group on the first row)
        process_data(last_id, grouped_data)
        grouped_data = []
    last_id = id
    grouped_data.append(data)

if grouped_data:
    # we're done with the loop and we still have a last group of data to process
    # if there was no data in the dataset, grouped_data will still be empty
    # so we won't accidentally process any empty data.
    process_data(last_id, grouped_data)

It works, but it seems clumsy, especially the need to track everything with the last_id variable and the second call to process_data after the loop. I'd like to know if anyone can suggest a more elegant solution.

My language of choice is Python, but a general solution is fine.

1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 10:31 am

    itertools.groupby is just what you want:

    >>> data = [ [1, "data1"], [1, "data2"], [2, "moredata"], [2, "stuff"],
    ...   [2, "things"], [2, "foo"], [3, "bar"], [4, "baz"] ]
    >>>
    >>> from itertools import groupby
    >>> from operator import itemgetter
    >>>
    >>> def process_data(key, keydata):
    ...     print(key, ':', keydata)
    ...
    >>> for key,keydata in groupby(data, key=itemgetter(0)):
    ...   process_data(key, [d[1] for d in keydata])
    ...
    1 : ['data1', 'data2']
    2 : ['moredata', 'stuff', 'things', 'foo']
    3 : ['bar']
    4 : ['baz']
    

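    Pass groupby a list that is already sorted by the grouping attribute, plus a key function that extracts that attribute from each item. groupby only merges adjacent items with equal keys, which is why the input must be sorted. You get back an iterator of (key, group iterator) pairs; here each pair is unpacked and handed to my made-up process_data function.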
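As a minimal sketch of why the sort order matters, the snippet below runs groupby on rows that are not sorted and then on the same rows sorted by the grouping key. The helper name group_rows and the sample rows are made up for illustration and are not part of the original question. It also turns each group into a list straight away, because the group iterator is no longer usable once groupby moves on to the next key.

```python
from itertools import groupby
from operator import itemgetter


def group_rows(rows):
    """Return (key, [values]) pairs, one per distinct first element.

    Hypothetical helper: sorts first so equal keys end up adjacent.
    """
    # Sort with the same key used for grouping; groupby only merges neighbours.
    ordered = sorted(rows, key=itemgetter(0))
    # Build each group's list immediately, before groupby advances.
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


unsorted_rows = [[2, "stuff"], [1, "data1"], [2, "foo"], [1, "data2"]]

# Without sorting, every run of equal keys becomes its own group.
runs = [(key, [value for _, value in group])
        for key, group in groupby(unsorted_rows, key=itemgetter(0))]
print(runs)

# With sorting, each key appears exactly once.
print(group_rows(unsorted_rows))
```

Because sorted is stable, the values inside each group keep their original relative order.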

    [Added 8 Aug 2023]
    I have more details in a pair of blog posts on groupby, starting with this one.
