X is a text file that contains 100000 equal size (500 elements) bit vector

Question

Asked: May 28, 2026

X is a text file that contains 100000 bit vectors of equal size (i.e. each row is a vector of 500 elements). I am generating an adjacency matrix (100000 x 100000) using the code below, but it's not optimized and is very time-consuming. How can I improve it?

import numpy as np
import scipy.spatial.distance


readFrom = "vector.txt"
fout = open("adjacencymatrix.txt","a")

X = np.genfromtxt(readFrom, dtype=None)

for outer in range(0,100000):
    tmp = ""  # start a fresh output row for each vector
    for inner in range(0,100000):
        dis = scipy.spatial.distance.euclidean(X[outer],X[inner])
        tmp += str(dis)+" "
    tmp += "\n"
    fout.write(tmp)
fout.close()

Thank you.


1 Answer

Editorial Team · Answer 1 · 2026-05-28T01:43:09+00:00

Edit: Completely rewrote this after understanding the question better. Given the size of the data, this one is tricky. The best speedup I have gotten so far is with the following:

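import time
import numpy as np
from scipy import spatial
import multiprocessing as mp

# Random stand-in for the real data: 100000 vectors of 500 elements each.
test_data = np.random.random(100000*500).reshape([100000,500])

def split(data,size):
    # Yield consecutive chunks of `size` rows.
    for i in range(0, len(data), size):
        yield data[i:i+size]

def distance(vecs):
    # Worker meant for the pool; the serial loop below does not use it yet.
    return spatial.distance.cdist(vecs,test_data)

if __name__ == '__main__':
    pool = mp.Pool(4)  # created for the parallel attempt; unused so far

    chunks = list(split(test_data,100))
    with open('/tmp/test.out','w') as outfile:
        for chunk in chunks:
            t0 = time.time()
            distances = spatial.distance.cdist(chunk,test_data)
            # Write one line per vector, element by element: str() on a
            # whole row would abbreviate a long array with "...".
            for row in distances:
                outfile.write(' '.join([str(x) for x in row]) + '\n')
            print('estimated: %.2f secs'%((time.time()-t0)*len(chunks)))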
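
For a sense of why the data is processed in chunks rather than all at once, here is a rough sketch of the memory arithmetic. It assumes 8 bytes per distance (cdist returns double-precision floats); the helper name distance_bytes is only illustrative and not part of any library.

```python
def distance_bytes(rows, cols, itemsize=8):
    """Bytes needed to hold a rows x cols block of distances."""
    return rows * cols * itemsize

# Entire 100000 x 100000 adjacency matrix held in memory at once.
full_matrix = distance_bytes(100000, 100000)
# One chunk of 100 vectors compared against all 100000 vectors.
one_chunk = distance_bytes(100, 100000)

print("full matrix: %.0f GB" % (full_matrix / 1e9))
print("one chunk:   %.0f MB" % (one_chunk / 1e6))
```

Under that assumption the full matrix needs about 80 GB, while a 100-row chunk needs about 80 MB, so larger chunks mean fewer cdist calls but a bigger temporary array per call.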

I tried to balance the size of each chunk of the dataset against the memory overhead. This got me down to an estimated 6,600 secs to finish, or ~110 mins. You can see I also started looking into whether I could parallelize using the multiprocessing pool. My strategy would have been to process each chunk asynchronously, save each result to a separate text file, and then concatenate the files afterwards, but I have to get back to work.
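
The parallel strategy described above might look roughly like the sketch below. It is untested at full scale and is not the finished version of the code in this answer. The names write_distance_matrix, _init_worker and _chunk_to_file, the temporary part files, and the start_method parameter are all assumptions made for illustration.

```python
import multiprocessing as mp
import os
import shutil
import tempfile

import numpy as np
from scipy import spatial

_DATA = None  # full data set; set once in each worker by _init_worker


def _init_worker(data):
    # Give each worker process its own reference to the full data set.
    global _DATA
    _DATA = data


def _chunk_to_file(job):
    # Compute distances from rows [start, stop) to every row, save as text.
    start, stop, path = job
    distances = spatial.distance.cdist(_DATA[start:stop], _DATA)
    np.savetxt(path, distances)
    return path


def write_distance_matrix(data, out_path, chunk_size=100, workers=4,
                          start_method=None):
    # start_method=None uses the platform's default process start method.
    ctx = mp.get_context(start_method)
    part_dir = tempfile.mkdtemp()
    jobs = [(i, min(i + chunk_size, len(data)),
             os.path.join(part_dir, "part_%08d.txt" % i))
            for i in range(0, len(data), chunk_size)]
    with ctx.Pool(workers, initializer=_init_worker,
                  initargs=(data,)) as pool:
        # Submit every chunk asynchronously, then wait for all of them.
        pending = [pool.apply_async(_chunk_to_file, (job,)) for job in jobs]
        parts = [p.get() for p in pending]
    # Concatenate the part files in row order into the final output.
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    shutil.rmtree(part_dir)
```

In a script this would be called as something like write_distance_matrix(test_data, '/tmp/test.out') from inside an if __name__ == '__main__': block, which multiprocessing needs on platforms that start workers by spawning. Writing the text files may well dominate the run time, so the speedup from four workers is not guaranteed to be fourfold.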
