I am using sparse matrices as a means of compressing data (lossily, of course). I create a sparse dictionary from all the values whose magnitude is greater than a specified threshold. I'd like the compressed data size to be a variable that my user can choose.
My problem is that I have a sparse matrix with a lot of near-zero values, and I must choose a threshold so that my sparse dictionary has a specific size (or, eventually, so that the reconstruction error is at a specific rate).
Here's how I create my dictionary (taken from Stack Overflow, I think >.< ):
n = abs(smat) > treshold  # boolean mask; smat is flattened (1D)
i = mega_range[n]  # mega_range is numpy.arange(smat.shape[0])
v = smat[n]
sparse_dict = dict(izip(i, v))  # izip is itertools.izip (Python 2); use zip on Python 3
How can I find the threshold so that it equals the nth greatest value of my array (smat)?
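scipy.stats.scoreatpercentile(arr, per) returns the value at a given percentile. The value is interpolated if the desired percentile lies between two points in arr. Note that per is expressed on a 0 to 100 scale, so if you set
per = 100.0 * (len(smat) - n) / len(smat)
then scoreatpercentile(smat, per) should give you (close to) the nth greatest value of the array smat. Since your mask compares abs(smat) against the threshold, pass abs(smat) instead if you want the nth greatest magnitude.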
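For comparison, here is a minimal sketch of the same idea using NumPy only. It is not the code from the question: the function names (percentile_threshold, nth_greatest_magnitude, sparse_dict_of_size) are made up for illustration, numpy.percentile stands in for scoreatpercentile, and numpy.partition is swapped in as a way to get the exact nth greatest magnitude without interpolation. It assumes smat is a 1D NumPy array.

```python
import numpy as np

def percentile_threshold(smat, n):
    """Interpolated threshold, roughly the n-th greatest magnitude (hypothetical helper)."""
    mags = np.abs(smat)
    per = 100.0 * (mags.size - n) / mags.size  # percentiles are on a 0-100 scale
    return np.percentile(mags, per)

def nth_greatest_magnitude(smat, n):
    """Exact n-th greatest value of abs(smat); n=1 is the maximum (hypothetical helper)."""
    mags = np.abs(smat)
    if not 1 <= n <= mags.size:
        raise ValueError("n must be between 1 and smat.size")
    k = mags.size - n
    # partition puts the element that belongs at sorted position k into slot k
    return np.partition(mags, k)[k]

def sparse_dict_of_size(smat, n):
    """Keep the n largest-magnitude entries as {index: value} (hypothetical helper)."""
    cutoff = nth_greatest_magnitude(smat, n)
    mask = np.abs(smat) >= cutoff  # >= so the n-th value itself is kept
    idx = np.flatnonzero(mask)
    return dict(zip(idx.tolist(), smat[mask].tolist()))

smat = np.array([0.01, -5.0, 0.2, 3.0, -0.03, 1.5])
sparse_dict = sparse_dict_of_size(smat, 3)
```

The interpolated threshold usually lands between the nth and (n+1)th greatest magnitudes, so a strict > comparison against it keeps about n entries. With the exact cutoff, ties at the cutoff value can leave the dictionary slightly larger than n.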