Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8718977
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 13, 20262026-06-13T06:40:39+00:00 2026-06-13T06:40:39+00:00

I am using itertools.groupby to parse a short tab-delimited textfile. the text file has

  • 0

I am using itertools.groupby to parse a short tab-delimited textfile. the text file has several columns and all I want to do is group all the entries that have a particular value x in a particular column. The code below does this for a column called name2, looking for the value in variable x. I tried to do this using csv.DictReader and itertools.groupby. In the table, there are 8 rows that match this criteria so 8 entries should be returned. Instead groupby returns two sets of entries, one with a single entry and another with 7, which seems like the wrong behavior. I do the matching manually below on the same data and get the right result:

import itertools, operator, csv
col_name = "name2"
x = "ENSMUSG00000002459"
print "looking for entries with value %s in column %s" %(x, col_name)
print "groupby gets it wrong: "
data = csv.DictReader(open(f), delimiter="\t", fieldnames=fieldnames)
for name, entries in itertools.groupby(data, key=operator.itemgetter(col_name)):
    if name == "ENSMUSG00000002459":
        wrong_result = [e for e in entries]
        print "wrong result has %d entries" %(len(wrong_result))
print "manually grouping entries is correct: "
data = csv.DictReader(open(f), delimiter="\t", fieldnames=fieldnames)
correct_result = []
for row in data:
    if row[col_name] == "ENSMUSG00000002459":
        correct_result.append(row)
print "correct result has %d entries" %(len(correct_result))

The output I get is:

looking for entries with value ENSMUSG00000002459 in column name2
groupby gets it wrong: 
wrong result has 7 entries
wrong result has 1 entries
manually grouping entries is correct: 
correct result has 8 entries

what is going on here? If groupby is really grouping, it seems like I should only get one set of entries per x, but instead it returns two. I cannot figure this out. EDIT: Ah got it it should be sorted.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-13T06:40:40+00:00Added an answer on June 13, 2026 at 6:40 am

    You’re going to want to change your code to force the data to be in key order…

    data = csv.DictReader(open(f), delimiter="\t", fieldnames=fieldnames)
    sorted_data = sorted(data, key=operator.itemgetter(col_name))
    for name, entries in itertools.groupby(data, key=operator.itemgetter(col_name)):
        pass # whatever
    

    The main use though, is when the datasets are large, and the data is already in key order, so when you have to sort anyway, then using a defaultdict is more efficient

    from collections import defaultdict
    name_entries = defaultdict(list)
    for row in data:
        name_entries[row[col_name]].append(row)
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Using ASIHTTPRequest, I downloaded a zip file containing a folder with several audio files.
Basically I have converted a tab delimited txt file into a list containing a
I'm having an odd problem using itertools.groupby to group the elements of a queryset.
I have a text file like this: 11 2 3 4 11 111 Using
Using a populated Table Type as the source for a TSQL-Merge. I want to
Using the Redis info command, I am able to get all the stats of
Using pythons itertools , I'd like to create an iterator over the outer product
In Python, it is quite simple to produce all permutations of a list using
Hello I am using something the Grouper function from python's itertools to cut large
I am using this code to generate random text : from collections import defaultdict,

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.