I downloaded many HTML files to disk, and now I need to read their content and extract the data I need

Asked: May 17, 2026

I downloaded many HTML files and stored them on disk. Now I need to read their content, extract the data I need, and persist it to MySQL.
I use the traditional approach of loading the files one by one, but it’s not efficient: it takes nearly 8 minutes.

Any advice is welcome.

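import glob
import mmap
import os.path

from pyquery import PyQuery as pq

g_fields = [
    'name',
    'price',
    'productid',
    'site',
    'link',
    'smallImage',
    'bigImage',
    'description',
    'createdOn',
    'modifiedOn',
    'size',
    'weight',
    'wrap',
    'material',
    'packagingCount',
    'stock',
    'location',
    'popularity',
    'inStock',
    'categories',
]

# g_xml2csv_file, g_filter, delete() and the cost_time decorator are
# defined elsewhere in the original script.
@cost_time
def batch_xml2csv():
    "Batch-import the XML files into a single CSV file."
    delete(g_xml2csv_file)
    f = open(g_xml2csv_file, "a")
    for file in glob.glob(g_filter):
        print("Reading %s" % file)
        ff = open(file, "rb")
        size = os.path.getsize(file)
        # Read-only mapping; read(size) still copies the whole file into memory
        data = mmap.mmap(ff.fileno(), size, access=mmap.ACCESS_READ)
        s = pq(data.read(size))
        data.close()
        ff.close()
        # s = pq(open(file, "r").read())
        line = []
        for field in g_fields:
            # Text of the <field name="..."> element for this column
            r = s("field[name='%s']" % field).text()
            if not r:
                # Missing field: write MySQL's NULL marker, unquoted
                line.append("\\N")
            else:
                # Quote the value and double any embedded quotes (CSV escaping)
                line.append('"%s"' % r.replace('"', '""'))
        f.write(",".join(line) + "\n")
    f.close()
    print("done!")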
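The CSV that `batch_xml2csv` writes (comma-separated, double-quoted values, an unquoted `\N` for missing fields) matches what MySQL’s `LOAD DATA INFILE` can read, so the whole file can be bulk-loaded in one statement rather than inserting rows one at a time. A minimal sketch that only builds the statement string; the helper `build_load_data_sql`, the path and the table name `products` are hypothetical, and `LOCAL` assumes `local_infile` is enabled on both client and server.

```python
# Hypothetical helper (not from the original script): build a MySQL
# LOAD DATA statement for the CSV that batch_xml2csv() writes.
from typing import List


def build_load_data_sql(csv_path: str, table: str, columns: List[str]) -> str:
    """Return a LOAD DATA LOCAL INFILE statement for a comma-separated,
    double-quoted CSV in which an unquoted \\N marks NULL.

    csv_path and table are interpolated as-is, so only pass trusted values.
    """
    column_list = ", ".join("`%s`" % name for name in columns)
    return (
        "LOAD DATA LOCAL INFILE '%s'\n" % csv_path
        + "INTO TABLE `%s`\n" % table
        + "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        + "LINES TERMINATED BY '\\n'\n"
        + "(%s)" % column_list
    )


if __name__ == "__main__":
    # Hypothetical path and table; the column order must match the CSV columns.
    print(build_load_data_sql("/tmp/products.csv", "products",
                              ["name", "price", "productid"]))
```

With the default `ESCAPED BY '\\'`, MySQL reads `\N` as NULL; the same setting also treats other backslashes in the data as escapes, which matters if descriptions contain them.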

I tried mmap, but it didn’t seem to help much.
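Mapping a file and then calling `read(size)` still copies the whole file into a string, so `mmap` is unlikely to help when most of the time goes into parsing. A minimal sketch of the same per-file extraction using only the standard library, assuming (as the question’s selector suggests) that each page holds `<field name="...">` elements and is UTF-8 encoded. It swaps pyquery for `html.parser`, keeps just three of the question’s field names, and `FieldExtractor`, `extract_row`, `to_csv_field` and `batch_to_csv` are hypothetical names.

```python
import glob
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Shortened for the sketch; the question's g_fields lists 20 names.
FIELDS = ["name", "price", "productid"]


class FieldExtractor(HTMLParser):
    """Collect the text of <field name="..."> elements (first occurrence wins)."""

    def __init__(self) -> None:
        super().__init__()
        self.values: Dict[str, str] = {}
        self._current: Optional[str] = None
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "field":
            self._current = dict(attrs).get("name")
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "field" and self._current is not None:
            self.values.setdefault(self._current, "".join(self._buffer).strip())
            self._current = None


def extract_row(html_text: str) -> List[Optional[str]]:
    """Return one value per FIELDS entry, None when the page lacks that field."""
    parser = FieldExtractor()
    parser.feed(html_text)
    parser.close()
    return [parser.values.get(name) for name in FIELDS]


def to_csv_field(value: Optional[str]) -> str:
    """Quote a value for LOAD DATA; None becomes the unquoted NULL marker \\N."""
    if value is None:
        return "\\N"
    return '"' + value.replace('"', '""') + '"'


def batch_to_csv(pattern: str, out_path: str) -> int:
    """Write one CSV line per file matching pattern; return the number of files."""
    count = 0
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        for path in sorted(glob.glob(pattern)):
            # A plain read: the whole file ends up in memory either way.
            with open(path, encoding="utf-8", errors="replace") as page:
                row = extract_row(page.read())
            out.write(",".join(to_csv_field(v) for v in row) + "\n")
            count += 1
    return count
```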

1 Answer

Editorial Team · Answer 1 · 2026-05-17T06:59:35+00:00

If you’ve got 25,000 text files on disk, ‘you’re doing it wrong’. Depending on how they are laid out on disk, the slowness could literally be the disk seeking to find each file.

If you’ve got 25,000 of anything, it’ll be faster to put it in a database with an intelligent index; even if you make the index field the filename, it’ll be faster.
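A minimal sketch of that idea, using the standard library’s `sqlite3` as a stand-in for MySQL so the example is self-contained; the `pages` table, its columns and the helper functions are hypothetical. The filename is the primary key, which plays the role of the index field the answer mentions.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a page store; SQLite stands in for MySQL here."""
    conn = sqlite3.connect(path)
    # filename is the PRIMARY KEY, so SQLite keeps an index on it and
    # lookups by name do not scan the whole table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " filename TEXT PRIMARY KEY,"
        " html TEXT NOT NULL)"
    )
    return conn


def put_pages(conn: sqlite3.Connection, pages: Iterable[Tuple[str, str]]) -> None:
    """Insert or overwrite (filename, html) pairs in a single transaction."""
    with conn:  # commits once at the end, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO pages (filename, html) VALUES (?, ?)", pages
        )


def get_page(conn: sqlite3.Connection, filename: str) -> Optional[str]:
    """Look a page up by filename through the primary-key index."""
    row = conn.execute(
        "SELECT html FROM pages WHERE filename = ?", (filename,)
    ).fetchone()
    return None if row is None else row[0]
```

Loading the pages through `put_pages` also means one commit for the whole batch instead of one per file.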

If you have multiple directories that descend N levels deep, a database would still be faster.
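For nested directories, a recursive walk can feed every file into the same kind of table in one transaction, keyed by its path relative to the root so that files with the same name in different folders stay distinct. A minimal sketch, again with `sqlite3` standing in for MySQL; the `.html` suffix, the UTF-8 encoding, the `pages` table and the helper names are assumptions.

```python
import sqlite3
from pathlib import Path
from typing import Iterator, Tuple


def iter_pages(root: Path) -> Iterator[Tuple[str, str]]:
    """Yield (relative path, content) for every .html file, however deeply nested."""
    for path in sorted(root.rglob("*.html")):
        if path.is_file():
            key = path.relative_to(root).as_posix()
            yield key, path.read_text(encoding="utf-8", errors="replace")


def import_tree(root: Path, conn: sqlite3.Connection) -> int:
    """Load the whole tree into the hypothetical pages table; return its row count."""
    with conn:  # one transaction for the whole import
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pages ("
            " filename TEXT PRIMARY KEY,"
            " html TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO pages (filename, html) VALUES (?, ?)",
            iter_pages(root),
        )
    return conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
```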
