I have a file that I’m trying to extract information from. The data is in a neat line-by-line format, with the fields separated by commas.
I want to put it in a list, or do whatever I can to extract information from a specific index. The file is huge, with over 1000000000 lines, and I have to extract the same index from every line to get the same piece of information. The values I want from the file are hashes, so I was wondering how I’d find all the occurrences of hashes based on length.
import os
os.chdir('C:\HashFiles')
f = open('Part1.txt','r')
file_contents=f.readlines()
def linesA():
    for line in file_contents:
        lista = line.split(',')
print linesA()
This is all I have so far. It just puts everything in a list that I can index into, but I want to write the data at those indexes to another file, and I can't because of the for statement. How can I get around this?
Wow, you guys are great. Now I have another problem: the file starts with information about the sponsor who provided the data. How do I skip those lines, given that the lines I need start about 100 lines down the file? At the moment I get an index error and can't figure out how to set a condition to counter it. I tried this condition, but it didn't work: if line[:] != 15: continue
Most recent code to work with:
import csv
with open('c:/HashFiles/search_engine_primary.sql') as inf, open('c:/HashFiles/hashes.txt','w') as outf:
    for i in xrange(47):
        inf.next() # skip a line
    for line in inf:
        data = line.split(',')
        if str(line[0]) == 'GO':
            continue
        hash = data[15]
        outf.write(hash + '\n')
You can process the file line-by-line, like so:
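A minimal sketch of the line-by-line approach, written to run on both Python 2.7 and Python 3. The names `extract_field` and `copy_field` are made up for illustration, and skipping lines that have too few fields is an assumption about what you want rather than something the file format requires.

```python
def extract_field(lines, index, sep=','):
    """Yield the field at position `index` from each line.

    `lines` can be any iterable of strings, including an open file.
    Lines with too few fields are skipped (an assumption, see above).
    """
    for line in lines:
        fields = line.rstrip('\r\n').split(sep)
        if len(fields) > index:
            yield fields[index]


def copy_field(in_path, out_path, index):
    """Write the chosen field of every input line to the output file."""
    # Iterating over the file object reads one line at a time, so the
    # whole file is never held in memory (unlike f.readlines()).
    with open(in_path) as inf, open(out_path, 'w') as outf:
        for value in extract_field(inf, index):
            outf.write(value + '\n')
```

With the names from the question, a call might look like `copy_field('Part1.txt', 'hashes.txt', 15)`. Because the write happens inside the loop, each value goes to the output file as soon as it is read.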
If you want to separate the hashes by length, maybe something like:
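One way this could look, as a sketch: hex digests have fixed lengths (32 characters for MD5, 40 for SHA-1, 64 for SHA-256), so the length of the string is enough to sort them into groups. `group_by_length` and `write_by_length` are hypothetical helper names, and the `prefix_length.txt` file naming is just an assumption for the example.

```python
from collections import defaultdict


def group_by_length(values):
    """Return {length: [values...]}; fine for small inputs only."""
    groups = defaultdict(list)
    for value in values:
        value = value.strip()
        if value:
            groups[len(value)].append(value)
    return dict(groups)


def write_by_length(values, prefix):
    """Stream values into one file per length, e.g. prefix_32.txt.

    Nothing is accumulated in memory, which matters for a huge file.
    Returns the sorted list of lengths that were seen.
    """
    outputs = {}
    try:
        for value in values:
            value = value.strip()
            if not value:
                continue
            size = len(value)
            if size not in outputs:
                outputs[size] = open('%s_%d.txt' % (prefix, size), 'w')
            outputs[size].write(value + '\n')
    finally:
        for handle in outputs.values():
            handle.close()
    return sorted(outputs)
```

For a file with this many lines, the streaming version is probably the one you want, since the dictionary version keeps every hash in memory.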
Edit: Is there any way to identify the sponsor lines, for example by a leading “#”? If so, you could filter them out like this:
Otherwise, if you have to skip N lines (this is nasty, because what if the number changes?), you can instead skip a fixed number of lines up front:
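A sketch of such a filter, assuming the sponsor lines really do start with a recognisable marker. The "#" is only a guess, and `data_lines` is a made-up name.

```python
def data_lines(lines, marker='#'):
    """Yield only the lines that do not start with `marker`.

    The marker is an assumption; use whatever actually distinguishes
    the sponsor lines in your file.
    """
    for line in lines:
        if line.startswith(marker):
            continue  # sponsor/header line, ignore it
        yield line
```

Since it takes and returns an iterable of lines, you can wrap the open file in it and loop over the result exactly as you would loop over the file itself.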
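For the fixed-count case, `itertools.islice` can drop the first N lines without a separate loop. This is a sketch; `skip_lines` is a hypothetical name, and the right value of N is something you have to supply.

```python
from itertools import islice


def skip_lines(lines, n):
    """Return an iterator over `lines` with the first `n` items dropped."""
    # islice(iterable, start, stop): start at item n, run to the end.
    # If the input has fewer than n lines, the result is simply empty.
    return islice(lines, n, None)
```

Unlike calling `next()` N times, this does not raise `StopIteration` when the file turns out to be shorter than expected.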
Edit2:
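Here is a sketch that guards against the index error directly. Two things to note about your latest code: `line[0]` is a single character, so it can never equal 'GO', and `data[15]` fails on any line with fewer than 16 fields. The version below assumes the 'GO' lines contain nothing but 'GO', and it skips short or blank lines instead of failing; `extract_hashes` is a made-up name, and the defaults 47 and 15 are taken from your code.

```python
from itertools import islice


def extract_hashes(lines, skip=47, index=15):
    """Yield field `index` from each usable line after the header."""
    for line in islice(lines, skip, None):
        line = line.strip()
        # Assumption: separator lines consist of just 'GO'.
        if not line or line == 'GO':
            continue
        data = line.split(',')
        if len(data) <= index:
            continue  # too few fields, so data[index] would fail
        yield data[index]


def write_hashes(in_path, out_path, skip=47, index=15):
    """Stream the extracted values from in_path into out_path."""
    with open(in_path) as inf, open(out_path, 'w') as outf:
        for value in extract_hashes(inf, skip, index):
            outf.write(value + '\n')
```

With your paths this would be called as `write_hashes('c:/HashFiles/search_engine_primary.sql', 'c:/HashFiles/hashes.txt')`.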
Does this still give you errors?