I have a simple Python script for indexing a CSV file containing 1 million rows:
import csv
from pyes import *

reader = csv.reader(open('data.csv', 'rb'))
conn = ES('127.0.0.1:9200', timeout=20.0)
counter = 0
for row in reader:
    try:
        data = {"name":row[5]}
        conn.index(data,'namesdb',counter, bulk=True)
        counter += 1
    except:
        pass
This works quite well at first, but once we get into the thousands of rows, it slows down dramatically.
I'm guessing that if I indexed in smaller chunks, ES would perform better.
Is there a more efficient way of doing this? Would a sleep() delay help? Or is there an easy way to break up the CSV into smaller chunks programmatically?
Thanks.
on every Nth count run
example here
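For the "break up the CSV into smaller chunks" part of the question, here is a minimal sketch of one way to do it in plain Python. It only shows the chunking itself and does not talk to Elasticsearch. The names `iter_chunks`, `index_in_chunks` and `send_batch` are made up for illustration and are not part of pyes; `send_batch` is a placeholder for whatever call actually pushes one batch to ES. The chunk size of 3 and the in-memory sample data are only there to keep the demo small.

```python
import csv
import io
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of up to chunk_size rows from any iterable of rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def index_in_chunks(rows, chunk_size, send_batch):
    """Build one document per row and pass each batch to send_batch.

    send_batch is a hypothetical callback standing in for the real
    indexing call; it receives a list of {"name": ...} documents.
    Rows with fewer than 6 columns are skipped instead of relying on
    a bare except.
    """
    total = 0
    for chunk in iter_chunks(rows, chunk_size):
        docs = [{"name": row[5]} for row in chunk if len(row) > 5]
        send_batch(docs)
        total += len(docs)
    return total


# Demo with an in-memory CSV instead of data.csv (assumption for the example).
sample = io.StringIO("\n".join("a,b,c,d,e,name%d" % i for i in range(7)))
batches = []
count = index_in_chunks(csv.reader(sample), 3, batches.append)
```

Because `csv.reader` is consumed lazily, only one chunk of rows is held in memory at a time, so the same loop should work for a file with a million rows. In the real script, `send_batch` would be where each batch is indexed and flushed before the next chunk is read.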