Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 520581
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 13, 20262026-05-13T08:09:36+00:00 2026-05-13T08:09:36+00:00

I have data in a CSV file. One of the column lists a persons

  • 0

I have data in a CSV file. One of the column lists a persons name and all the rows that follow in that column provide some descriptive attributes about that person until the next persons name shows up. I can tell when the row has a name or an attribute by the LTYPE column, N in that column indicates that in that row the NAME value is actually a name, an A in that column indicates that the data in the NAME column is an attribute. The attributes are coded and I have 600K lines of the data. Here is a sample. The data is grouped and the befinning of each grouping is indicated by RID resetting to 1.

{'LTYPE': 'N', 'RID': '1', 'NAME': 'Jason Smith'}
{'LTYPE': 'A', 'RID': '2', 'NAME': 'DA'}
{'LTYPE': 'A', 'RID': '3', 'NAME': 'B'}
{'LTYPE': 'N', 'RID': '4', 'NAME': 'John Smith'}
{'LTYPE': 'A', 'RID': '5', 'NAME': 'BC'}
{'LTYPE': 'A', 'RID': '6', 'NAME': 'CB'}
{'LTYPE': 'A', 'RID': '7', 'NAME': 'DB'}
{'LTYPE': 'A', 'RID': '8', 'NAME': 'DA'}
{'LTYPE': 'N', 'RID': '9', 'NAME': 'Robert Smith'}
{'LTYPE': 'A', 'RID': '10', 'NAME': 'BC'}
{'LTYPE': 'A', 'RID': '11', 'NAME': 'DB'}
{'LTYPE': 'A', 'RID': '12', 'NAME': 'CB'}
{'LTYPE': 'A', 'RID': '13', 'NAME': 'RB'}
{'LTYPE': 'A', 'RID': '14', 'NAME': 'VC'}
{'LTYPE': 'N', 'RID': '15', 'NAME': 'Harvey Smith'}
{'LTYPE': 'A', 'RID': '16', 'NAME': 'SA'}
{'LTYPE': 'A', 'RID': '17', 'NAME': 'AS'}
{'LTYPE': 'N', 'RID': '18', 'NAME': 'Lukas Smith'}
{'LTYPE': 'A', 'RID': '19', 'NAME': 'BC'}
{'LTYPE': 'A', 'RID': '20', 'NAME': 'AS'}

I want to create the following:

{'PERSON_ATTRIBUTES': 'DA B ', 'LTYPE': 'N', 'RID': '1', 'PERSON_NAME': 'Jason Smith', 'NAME': 'Jason Smith'}
{'PERSON_ATTRIBUTES': 'DA B ', 'LTYPE': 'A', 'RID': '2', 'PERSON_NAME': 'Jason Smith', 'NAME': 'DA'}
{'PERSON_ATTRIBUTES': 'DA B ', 'LTYPE': 'A', 'RID': '3', 'PERSON_NAME': 'Jason Smith', 'NAME': 'B'}
{'PERSON_ATTRIBUTES': 'BC CB DB DA ', 'LTYPE': 'N', 'RID': '4', 'PERSON_NAME': 'John Smith', 'NAME': 'John Smith'}
{'PERSON_ATTRIBUTES': 'BC CB DB DA ', 'LTYPE': 'A', 'RID': '5', 'PERSON_NAME': 'John Smith', 'NAME': 'BC'}
{'PERSON_ATTRIBUTES': 'BC CB DB DA ', 'LTYPE': 'A', 'RID': '6', 'PERSON_NAME': 'John Smith', 'NAME': 'CB'}
{'PERSON_ATTRIBUTES': 'BC CB DB DA ', 'LTYPE': 'A', 'RID': '7', 'PERSON_NAME': 'John Smith', 'NAME': 'DB'}
{'PERSON_ATTRIBUTES': 'BC CB DB DA ', 'LTYPE': 'A', 'RID': '8', 'PERSON_NAME': 'John Smith', 'NAME': 'DA'}
{'PERSON_ATTRIBUTES': 'BC DB CB RB VC ', 'LTYPE': 'N', 'RID': '9', 'PERSON_NAME': 'Robert Smith', 'NAME': 'Robert Smith'}
{'PERSON_ATTRIBUTES': 'BC DB CB RB VC ', 'LTYPE': 'A', 'RID': '10', 'PERSON_NAME': 'Robert Smith', 'NAME': 'BC'}
{'PERSON_ATTRIBUTES': 'BC DB CB RB VC ', 'LTYPE': 'A', 'RID': '11', 'PERSON_NAME': 'Robert Smith', 'NAME': 'DB'}
{'PERSON_ATTRIBUTES': 'BC DB CB RB VC ', 'LTYPE': 'A', 'RID': '12', 'PERSON_NAME': 'Robert Smith', 'NAME': 'CB'}
{'PERSON_ATTRIBUTES': 'BC DB CB RB VC ', 'LTYPE': 'A', 'RID': '13', 'PERSON_NAME': 'Robert Smith', 'NAME': 'RB'}
{'PERSON_ATTRIBUTES': 'BC DB CB RB VC ', 'LTYPE': 'A', 'RID': '14', 'PERSON_NAME': 'Robert Smith', 'NAME': 'VC'}
{'PERSON_ATTRIBUTES': 'SA AS ', 'LTYPE': 'N', 'RID': '15', 'PERSON_NAME': 'Harvey Smith', 'NAME': 'Harvey Smith'}
{'PERSON_ATTRIBUTES': 'SA AS ', 'LTYPE': 'A', 'RID': '16', 'PERSON_NAME': 'Harvey Smith', 'NAME': 'SA'}
{'PERSON_ATTRIBUTES': 'SA AS ', 'LTYPE': 'A', 'RID': '17', 'PERSON_NAME': 'Harvey Smith', 'NAME': 'AS'}
{'PERSON_ATTRIBUTES': 'BC AS ', 'LTYPE': 'N', 'RID': '18', 'PERSON_NAME': 'Lukas Smith', 'NAME': 'Lukas Smith'}
{'PERSON_ATTRIBUTES': 'BC AS ', 'LTYPE': 'A', 'RID': '19', 'PERSON_NAME': 'Lukas Smith', 'NAME': 'BC'}
{'PERSON_ATTRIBUTES': 'BC AS ', 'LTYPE': 'A', 'RID': '20', 'PERSON_NAME': 'Lukas Smith', 'NAME': 'AS'}

I started off by getting the index positions of LTYPE

nameIndex=[]
attributeIndex=[]
for line in thedata:
    if line['LTYPE']=='N':
        nameIndex.append(int(line["RID"])-1)
    if line['LTYPE']=='A':
        attributeIndex.append(int(line["RID"])-1)

So I have the list index of each of the rows classified as a name in one list and the list index of each of the rows classified as an attribute in another list. It is then easy to attach the name to each observation as follows

for counter, row in enumerate(thedata):
    if counter in nameIndex:
        row['PERSON_NAME']=row['NAME']
        person_NAME=row['NAME']
    if counter not in nameIndex:
        row['PERSON_NAME']=person_NAME

I am struggling to determine and assign the list of attributes to each person.

First I need to combine the attributes that belong together so I did this:

 newAttribute=[]
 for counter, row in enumerate(thedata):
     if counter in attributeIndex:
         tempAttribute=tempAttribute+' '+row['NAME']

     if counter not in attributeIndex:
         if counter==0:
             tempAttribute=""
             pass
         if counter!=0:
             newAttribute.append(tempAttribute.lstrip())
             tempAttribute=""

one problem with my approach is that I still have to add the last group to the newAttribute list since the loop finishes before it is added. So to get the list of grouped attributes I have to run

newAttribute.append(tempAttribute)

But even then I can’t seem to find a clean way to add the attributes I have to do it in two steps. First, I create a dictionary with the nameIndex positions as the key and the attributes as the values

tempDict={}
for each in range(len(nameIndex)):
    tempdict[nameIndex[each]]=newAttribute[each]

I cycle through the list once putting in the attribute on the name line

for counter,row in enumerate(thedata):
    if counter in tempDict:
        thedata[counter]['TA']=tempDict[counter]

and then I go through it again checking if the key ‘TA’ exists and using the existence to set the PERSON_ATTRIBUTE key

for each in thedata:
    if each.has_key('TA'):
        each['PERSON_ATTRIBUTES']=each['TA']
        holdAttribute=each['TA']
    else:
        each['PERSON_ATTRIBUTES']=holdAttribute

There has got to be a cleaner way to think about this and so I was wondering if anyone would like to point me in the direction of some functions that I could read about that would let me clean up this code. I know I still have to drop the ‘TA’ key but I figured that I have taken enough space.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-13T08:09:37+00:00Added an answer on May 13, 2026 at 8:09 am

    I suggest a different, index-free approach based on itertools.groupby:

    import itertools, operator
    
    data = [
    {'LTYPE': 'N', 'RID': '1', 'NAME': 'Jason Smith'},
    {'LTYPE': 'A', 'RID': '2', 'NAME': 'DA'},
    {'LTYPE': 'A', 'RID': '3', 'NAME': 'B'},
    {'LTYPE': 'N', 'RID': '4', 'NAME': 'John Smith'},
    {'LTYPE': 'A', 'RID': '5', 'NAME': 'BC'},
    {'LTYPE': 'A', 'RID': '6', 'NAME': 'CB'},
    {'LTYPE': 'A', 'RID': '7', 'NAME': 'DB'},
    {'LTYPE': 'A', 'RID': '8', 'NAME': 'DA'},
    ]
    
    for k, g in itertools.groupby(data, operator.itemgetter('LTYPE')):
      if k=='N':
        person_name_record = next(g)
      else:
        attribute_records = list(g)
        person_attributes = ' '.join(r['NAME'] for r in attribute_records)
        addfields = dict(PERSON_ATTRIBUTES=person_attributes,
                         PERSON_NAME=person_name_record['NAME'])
        person_name_record.update(addfields)
        for r in attribute_records: r.update(addfields)
    
    for r in data: print r
    

    This prints your desired results for the first couple people (and each person is treated separately, so it should work just the same for a few hundred thousand people;-).

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have a CSV data file with rows that may have lots of columns
I have an application that reads a CSV file with piles of data rows.
I have a CSV of file of data that I can load in R
I have a large collection of data in an excel file (and csv files).
I have CSV data loaded into a multidimensional array. In this way each row
I have a datagrid which is populated with CSV data when the user drag/drops
I have a mysql database set as utf-8, and csv data set as utf-8,
I have a VB application which extracts data and creates 3 CSV files (a.csv,
I have data in a MySQL database. I am sending the user a URL
I have data coming from the database in the form of a DataSet .

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.