The Archive Base

Editorial Team
Asked: June 16, 2026


I have a data file with multiple rows and 8 columns. I want to average column 8 over the rows that have the same values in columns 1, 2, and 5. For example, my file can look like this:

564645  7371810 0   21642   1530    1   2   30.8007
564645  7371810 0   21642   8250    1   2   0.0103
564645  7371810 0   21643   1530    1   2   19.3619

I want to average the last column of the first and third rows, since their columns 1, 2, and 5 are identical.

I want the output to look like this:

564645  7371810 0   21642   1530    1   2   25.0813
564645  7371810 0   21642   8250    1   2   0.0103

My files (text files) are fairly big (~10000 lines), and the redundant rows (by the rule above) do not occur at regular intervals, so I want the code to find the redundant rows and average them.
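The rule described above can be sketched without any third-party library. This is only a minimal illustration, not a definitive solution: the function name `average_duplicates` is hypothetical, keys are compared as the raw text of columns 1, 2, and 5, and the other columns are assumed to be taken from the first row seen in each group (which matches the expected output in the question).

```python
SAMPLE = """\
564645  7371810 0   21642   1530    1   2   30.8007
564645  7371810 0   21642   8250    1   2   0.0103
564645  7371810 0   21643   1530    1   2   19.3619
"""


def average_duplicates(lines):
    """Average column 8 over rows that share columns 1, 2 and 5.

    Assumption: columns 3, 4, 6 and 7 are copied from the first row
    encountered in each group.
    """
    groups = {}  # key -> [first seven fields, running sum, row count]
    for line in lines:
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        key = (fields[0], fields[1], fields[4])
        value = float(fields[7])
        if key in groups:
            groups[key][1] += value
            groups[key][2] += 1
        else:
            groups[key] = [fields[:7], value, 1]
    # dicts keep insertion order (Python 3.7+), so groups appear in
    # the order in which their first row was read
    return [first + [total / count] for first, total, count in groups.values()]


rows = average_duplicates(SAMPLE.splitlines())
for row in rows:
    print("\t".join(row[:7]) + "\t%.4f" % row[7])
```

Because the keys are compared as text, `1530` and `1530.0` would land in different groups; converting the key fields to numbers first would be one way around that.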

In response to larsks' comment, here is the code I have so far:

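import os
import numpy as np

# Work inside the directory that holds the data files
datadirectory = input('path to the data directory, ')
os.chdir(datadirectory)

# Read the data file into a 2-D float array (np.loadtxt accepts a file name)
dataset = input('dataset_to_be_used, ')
data = np.loadtxt(dataset)

# Sort the rows so that equal X (column 1), Y (column 2) and frequency (column 5)
# end up adjacent; np.lexsort uses the LAST key as the primary sort key
datasort = np.lexsort((data[:, 0], data[:, 1], data[:, 4]))
datasorted = data[datasort]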
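Once the rows are sorted as above, rows with the same key sit next to each other, so the averaging step can be sketched in NumPy alone. This is one possible continuation under stated assumptions, not the asker's final code: the sample data is read from an in-memory string instead of a file, keys are compared for exact floating-point equality, and the non-key columns are taken from the first row of each group.

```python
import io

import numpy as np

SAMPLE = """\
564645  7371810 0   21642   1530    1   2   30.8007
564645  7371810 0   21642   8250    1   2   0.0103
564645  7371810 0   21643   1530    1   2   19.3619
"""

data = np.loadtxt(io.StringIO(SAMPLE))

# Same sort as in the question: column 5 is the primary key
order = np.lexsort((data[:, 0], data[:, 1], data[:, 4]))
datasorted = data[order]

# A row starts a new group when any of its key columns differs
# from the row just before it
keys = datasorted[:, [0, 1, 4]]
is_start = np.ones(len(keys), dtype=bool)
is_start[1:] = np.any(keys[1:] != keys[:-1], axis=1)
starts = np.flatnonzero(is_start)

# Sum column 8 within each run of equal keys, then divide by run length
sums = np.add.reduceat(datasorted[:, 7], starts)
counts = np.diff(np.append(starts, len(keys)))

# Keep the first row of every group and overwrite column 8 with the mean
result = datasorted[starts].copy()
result[:, 7] = sums / counts
print(result)
```

The output comes out ordered by the sort keys rather than by the original row order; `np.savetxt` could then write `result` back to a text file.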

1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 5:51 am

    OK, based on Hury's input I updated the code:

    import os  # needed system utils
    import numpy as np  # for array data processing
    import pandas as pd  # import the pandas module
    datadirectory = input('path to the data directory, ')
    working = os.environ.get("WORKING_DIRECTORY", datadirectory)
    os.chdir(working)

    ## Read the data file and convert it to a single string
    dataset = open(input('dataset_to_be_used, ')).readlines()
    data = ''.join(dataset)
    
    df = pd.read_csv(data, sep="\\s+", header=None)
    sorted_data = df.groupby(["X.1","X.2","X.5"])["X.8"].mean().reset_index()
    tuple_data = [tuple(x) for x in sorted_data.values]
    datas = np.asarray(tuple_data)
    

    This worked with the test data as posted by Hury, but with my own file the df = … line does not work. I get output like:

    Traceback (most recent call last):
      File "/media/DATA/arxeia/Programming/MyPys/data_refine_average.py", line 31, in <module>
        df = pd.read_csv(data, sep="\s+", header=None)
      File "/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 187, in read_csv
        return _read(TextParser, filepath_or_buffer, kwds)
      File "/usr/lib64/python2.7/site-packages/pandas/io/parsers.py", line 141, in _read
        f = com._get_handle(filepath_or_buffer, 'r', encoding=encoding)
      File "/usr/lib64/python2.7/site-packages/pandas/core/common.py", line 673, in _get_handle
        f = open(path, mode)
    IOError: [Errno 36] File name too long: '564645\t7371810\t0\t21642\t1530\t1\t2\t30.8007\r\n564645\t7371810\t0\t21642\t8250\t1\t2\t0.0103\r\n564645\t7371810\t0\t21642\t20370\t1\t2\t0.0042\r\n564645\t7371810\t0\t21642\t33030\t1\t2\t0.0026\r\n564645\t7371810\t0\t21642\t47970\t1\t2\t0.0018\r\n564645\t7371810\t0\t21642\t63090\t1\t2\t0.0013\r\n564645\t7371810\t0\t21642\t93090\t1\t2\t0.0009\r\n564645\t7371810\t0\t216……….

    Any ideas?
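A likely reading of the traceback: `pd.read_csv` treats a plain string as a file path, so handing it the whole file contents makes `open()` fail with "File name too long". Two ways around this are to pass the file name itself, or to wrap the text in `io.StringIO` so pandas sees a file-like object. The sketch below assumes a recent pandas on Python 3 (the traceback comes from an older install), uses sample text in place of the real file, and assigns hypothetical column names `c1` to `c8` explicitly so the grouping does not depend on how a given pandas version labels header-less columns.

```python
import io

import pandas as pd

# Stand-in for the joined file contents (the `data` string in the code above)
text = """\
564645  7371810 0   21642   1530    1   2   30.8007
564645  7371810 0   21642   8250    1   2   0.0103
564645  7371810 0   21643   1530    1   2   19.3619
"""

names = ["c%d" % i for i in range(1, 9)]  # hypothetical labels c1..c8
keys = ["c1", "c2", "c5"]

# StringIO turns the text into a file-like buffer, so it is parsed
# as data instead of being interpreted as a path
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None, names=names)

# Mean of c8 per group; the remaining columns take the value from the
# first row of each group (an assumption matching the expected output)
how = {name: "first" for name in names if name not in keys}
how["c8"] = "mean"
averaged = df.groupby(keys, sort=False, as_index=False).agg(how)
averaged = averaged[names]  # restore the original column order

datas = averaged.to_numpy()
print(averaged)
```

With a real file it is usually simpler to skip `readlines()` and the join altogether and call `pd.read_csv` with the file name as its first argument, keeping the other arguments the same.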

