I have a file which I’m trying to extract information from, the file has

Asked: June 6, 2026 (2026-06-06T02:52:37+00:00)

I have a file that I'm trying to extract information from. The data is in a neat line-by-line format, with the fields on each line separated by commas.

I want to put it in a list, or do whatever I can, to extract the value at a specific index. The file is huge, with over 1000000000 lines, and I have to extract the same index from every line to get the same piece of information. The values I want are hashes, so I was also wondering how I'd find all the occurrences of hashes based on their length.

import os

os.chdir('C:\HashFiles')

f = open('Part1.txt','r')

file_contents=f.readlines()

def linesA():
    for line in file_contents:
        lista = line.split(',')

print linesA()

This is all I have so far. It just puts everything in a list that I can index into, but I want to write the data at those indexes to another file, and I can't work out how to do that from inside the for loop. How can I get around this?

Wow, you guys are great. Now I have another problem: the file where this info is stored starts with information about the sponsor who provided the data. How do I skip those lines and start from a later line? The lines I need start about 100 lines down the file. At the moment I get an IndexError and can't figure out how to write a condition to avoid it. I tried this condition, but it didn't work: if line[:] != 15: continue

Most recent code to work with:

import csv

with open('c:/HashFiles/search_engine_primary.sql') as inf, open('c:/HashFiles/hashes.txt','w') as outf:
    for i in xrange(47):
        inf.next()       # skip a line

    for line in inf:
        data = line.split(',')
        if str(line[0]) == 'GO':
            continue
        hash = data[15]
        outf.write(hash + '\n')

1 Answer

Editorial Team · Answer 1 · 2026-06-06T02:52:40+00:00

You can process the file line-by-line, like so:

with open('c:/HashFiles/Part1.txt') as inf, open('c:/HashFiles/hashes.txt','w') as outf:
    for line in inf:
        data = line.split(',')
        hash = data[4]
        outf.write(hash + '\n')
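
The same streaming idea can be wrapped in a small reusable helper. This is only a sketch, written for Python 3: the names iter_field and extract_to_file are hypothetical (they are not from the question), and it assumes a plain comma-separated file with no quoted fields that contain commas. It skips lines that have too few fields instead of raising an IndexError.

```python
def iter_field(lines, index, sep=","):
    """Yield the field at position `index` from each delimited line.

    Lines with too few fields are skipped instead of raising IndexError.
    """
    for line in lines:
        # Drop the trailing newline so the last field comes out clean.
        fields = line.rstrip("\r\n").split(sep)
        if len(fields) > index:
            yield fields[index].strip()


def extract_to_file(src_path, dst_path, index):
    """Stream src_path line by line and write one field per line to dst_path."""
    with open(src_path) as inf, open(dst_path, "w") as outf:
        # Iterating over the file object reads one line at a time,
        # so the whole file is never held in memory.
        for value in iter_field(inf, index):
            outf.write(value + "\n")
```

Calling something like extract_to_file('c:/HashFiles/Part1.txt', 'c:/HashFiles/hashes.txt', 4) would then mirror the loop above. Because iter_field accepts any iterable of strings, it can also be tried on a short list of sample lines before running it on the full file.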

If you want to separate the hashes by length, maybe something like:

class HashStorage(object):
    def __init__(self, fname_fmt):
        self.fname_fmt = fname_fmt
        self.hashfile = {}

    def thefile(self, hash):
        hashlen = len(hash)
        try:
            return self.hashfile[hashlen]
        except KeyError:
            newfile = open(self.fname_fmt.format(hashlen), 'w')
            self.hashfile[hashlen] = newfile
            return newfile

    def write(self, hash):
        self.thefile(hash).write(hash + '\n')

    def __del__(self):
        # Close every file that was opened; the dict is named hashfile in __init__
        for f in self.hashfile.values():
            f.close()
        del self.hashfile

store = HashStorage('c:/HashFiles/hashes{}.txt')

with open('c:/HashFiles/Part1.txt') as inf:
    for line in inf:
        data = line.split(',')
        hash = data[4]
        store.write(hash)
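
One caveat with the class above is that it relies on __del__ to close its files, and Python does not guarantee exactly when that runs. A possible alternative, sketched here for Python 3, is to make the storage object a context manager so the files are closed when the with block ends. The class name HashWriter is hypothetical, and contextlib.ExitStack is a standard-library tool I am swapping in; it is not part of the code above.

```python
from contextlib import ExitStack


class HashWriter:
    """Write each value to a file chosen by the value's length."""

    def __init__(self, fname_fmt):
        self.fname_fmt = fname_fmt      # e.g. 'hashes{}.txt'
        self._stack = ExitStack()       # tracks every file we open
        self._files = {}                # length -> open file object

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Close every file that was opened, even if an error occurred.
        self._stack.close()
        return False

    def write(self, value):
        key = len(value)
        if key not in self._files:
            # Open the file for this length on first use and register it
            # with the stack so it is closed on exit.
            self._files[key] = self._stack.enter_context(
                open(self.fname_fmt.format(key), "w")
            )
        self._files[key].write(value + "\n")
```

It would be used as: with HashWriter('c:/HashFiles/hashes{}.txt') as store: followed by the same loop that calls store.write(hash). Only a handful of distinct lengths are expected, so the number of simultaneously open files should stay small.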

Edit: Is there any way to identify the sponsor lines? For example, do they start with "#"? If so, you could filter them out like this:

with open('c:/HashFiles/Part1.txt') as inf, open('c:/HashFiles/hashes.txt','w') as outf:
    for line in inf:
        if not line.startswith('#'):
            data = line.split(',')
            hash = data[4]
            outf.write(hash + '\n')

Otherwise, if you have to skip a fixed number of lines N (which is fragile, because the number may change from file to file), you can do this instead:

with open('c:/HashFiles/Part1.txt') as inf, open('c:/HashFiles/hashes.txt','w') as outf:
    for i in xrange(N):
        inf.next()       # skip a line

    for line in inf:
        data = line.split(',')
        hash = data[4]
        outf.write(hash + '\n')
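
The skip-N-lines step can also be written with itertools.islice from the standard library, which avoids calling next() by hand and does not raise an error if the file turns out to be shorter than N lines. A minimal sketch, where skip_lines is a hypothetical helper name:

```python
from itertools import islice


def skip_lines(lines, count):
    """Return an iterator over `lines` that starts after the first `count` items."""
    # islice(iterable, start, stop): stop=None means "until the end".
    return islice(lines, count, None)
```

Inside the with block this would read: for line in skip_lines(inf, N): and then the same split-and-write body as above.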

Edit2:

with open('c:/HashFiles/search_engine_primary.sql') as inf, open('c:/HashFiles/hashes.txt','w') as outf:
    for i in xrange(47):
        inf.next()       # skip a line

    for line in inf:
        data = line.split(',')
        if len(data) > 15:      # skip any line without enough data items
            hash = data[15]
            outf.write(hash + '\n')

Does this still give you errors?
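
If counting header lines proves too fragile, another option is to keep only the fields that look like hashes, judged by their length. The sketch below rests on several assumptions that the question does not confirm: that the hashes are stored as hexadecimal strings, that they have one of the common digest lengths (32 characters for MD5, 40 for SHA-1, 64 for SHA-256), and that values may be wrapped in quotes because the file is an .sql dump. The names looks_like_hash and iter_hashes are hypothetical, and the code is written for Python 3.

```python
import re

# Assumed hex digest lengths: MD5 = 32, SHA-1 = 40, SHA-256 = 64 characters.
HEX_LENGTHS = (32, 40, 64)
HEX_RE = re.compile(r"[0-9a-fA-F]+")


def clean(value):
    """Strip whitespace and surrounding quote characters (assumed SQL quoting)."""
    return value.strip().strip("'\"")


def looks_like_hash(value, lengths=HEX_LENGTHS):
    """Return True if `value` is a hex string with one of the given lengths."""
    value = clean(value)
    return len(value) in lengths and HEX_RE.fullmatch(value) is not None


def iter_hashes(lines, index, sep=","):
    """Yield the field at `index` from each line, but only if it looks like a hash."""
    for line in lines:
        fields = line.split(sep)
        # Sponsor text and 'GO' lines have too few fields or no hash-like
        # value at this position, so they are skipped.
        if len(fields) > index and looks_like_hash(fields[index]):
            yield clean(fields[index])
```

With this approach the header and 'GO' lines drop out on their own, since they do not have a hash-shaped value at index 15, so no fixed line count is needed.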
