I have a file that is about 100Mb that looks like this: #meta data

Asked: May 26, 2026

I have a file that is about 100 MB that looks like this:

#meta data 1    
skadjflaskdjfasljdfalskdjfl
sdkfjhasdlkgjhsdlkjghlaskdj
asdhfk
#meta data 2
jflaksdjflaksjdflkjasdlfjas
ldaksjflkdsajlkdfj
#meta data 3
alsdkjflasdjkfglalaskdjf

In this file, each row of metadata is followed by several variable-length data lines containing only alphanumeric characters. What is the best way to read this data into a simple list like this:

data = [['#meta data 1', 'skadjflaskdjfasljdfalskdjflsdkfjhasdlkgjhsdlkjghlaskdjasdhfk'],
        ['#meta data 2', 'jflaksdjflaksjdflkjasdlfjasldaksjflkdsajlkdfj'],
        ['#meta data 3', 'alsdkjflasdjkfglalaskdjf']]

My initial idea was to use the read() method to read the whole file into memory and then use regular expressions to parse the data into the desired format. Is there a better, more Pythonic way? All metadata lines start with an octothorpe (#), and all data lines are entirely alphanumeric. Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-05-26T18:54:42+00:00

itertools.groupby provides an easy way to collect lines into groups:

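import itertools

data = []
with open('data.txt', 'r') as f:
    # Group consecutive lines by whether they start with '#meta'.
    for key, group in itertools.groupby(f, lambda line: line.startswith('#meta')):
        if key:
            # Metadata group: keep the header line without its newline.
            meta = next(group).strip()
        else:
            # Data group: join the lines that follow the header.
            lines = ''.join(group).strip()
            data.append((meta, lines))
print(data)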

yields

[('#meta data 1', 'skadjflaskdjfasljdfalskdjfl\nsdkfjhasdlkgjhsdlkjghlaskdj\nasdhfk'), ('#meta data 2', 'jflaksdjflaksjdflkjasdlfjas\nldaksjflkdsajlkdfj'), ('#meta data 3', 'alsdkjflasdjkfglalaskdjf')]
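The output above keeps the newline characters between the data lines of each block, whereas the question asks for one concatenated string per block. A minimal sketch of that variant follows. The helper name `read_blocks` is hypothetical, and it tests for a leading `#` (as the question describes metadata lines) rather than `#meta`; it also assumes the input starts with a metadata line.

```python
import itertools


def read_blocks(lines):
    """Return [[meta, data], ...] with each block's data lines concatenated.

    `lines` is any iterable of strings, such as an open file object.
    (Hypothetical helper for illustration, not part of the original answer.)
    """
    data = []
    meta = None
    for is_meta, group in itertools.groupby(lines, lambda line: line.startswith('#')):
        if is_meta:
            # Metadata group: keep the header line without surrounding whitespace.
            meta = next(group).strip()
        else:
            # Strip each line before joining so no newlines end up in the result.
            data.append([meta, ''.join(line.strip() for line in group)])
    return data


# In-memory stand-in for the file shown in the question.
sample = [
    '#meta data 1    \n',
    'skadjflaskdjfasljdfalskdjfl\n',
    'sdkfjhasdlkgjhsdlkjghlaskdj\n',
    'asdhfk\n',
    '#meta data 2\n',
    'jflaksdjflaksjdflkjasdlfjas\n',
    'ldaksjflkdsajlkdfj\n',
    '#meta data 3\n',
    'alsdkjflasdjkfglalaskdjf\n',
]
result = read_blocks(sample)
print(result)
```

For the real file, something like `with open('data.txt') as f: data = read_blocks(f)` should work the same way, since a file object yields its lines one at a time and the whole file is never held in memory as a single string.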

The expression

itertools.groupby(f,lambda line: line.startswith('#meta'))

returns an iterator. It loops through the lines in f and calls the lambda function on each line. The function returns True for a line that begins with #meta and False otherwise.

itertools.groupby collects contiguous lines for which the function returns the same value into a single group.

So the line that begins with #meta is placed in its own group, then all the subsequent lines not beginning with #meta are placed in the next group, and so on.

The key is the return value from the lambda function. In this case, it will be either True or False.
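To see the grouping described here in isolation, the sketch below runs itertools.groupby on a short in-memory list instead of a file. The sample lines are made up for illustration; each group is turned into a list so it can be printed.

```python
import itertools

# Made-up sample lines standing in for the contents of the file.
lines = ['#meta data 1\n', 'abc\n', 'def\n', '#meta data 2\n', 'ghi\n']

# Pair each key (True or False) with the stripped lines in its group.
groups = [
    (key, [line.strip() for line in group])
    for key, group in itertools.groupby(lines, lambda line: line.startswith('#meta'))
]

for key, members in groups:
    print(key, members)
# True ['#meta data 1']
# False ['abc', 'def']
# True ['#meta data 2']
# False ['ghi']
```

Each group must be consumed before moving on to the next one, because groupby shares a single underlying iterator across all groups.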
