I am writing a function to randomly select elements stored in a dictionary


Asked: June 18, 2026


I am writing a function to randomly select elements stored in a dictionary:

import random
from liblas import file as lasfile
from collections import defaultdict

def point_random_selection(points, k):
    # random.sample raises ValueError if k > len(points); keep them all in that case
    try:
        sample_point = random.sample(points, k)
    except ValueError:
        sample_point = points
    return sample_point

def world2Pixel_Id(x, y, X_Min, Y_Max, xDist, yDist):
    # grid cell containing (x, y), encoded as a "col_row" string
    col = int((x - X_Min) / xDist)
    row = int((Y_Max - y) / yDist)
    return "{0}_{1}".format(col, row)

def point_GridGroups(inFile, X_Min, Y_Max, xDist, yDist):
    # read every point of the file and group it by grid cell
    Groups = defaultdict(list)
    for p in lasfile.File(inFile, None, 'r'):
        key = world2Pixel_Id(p.x, p.y, X_Min, Y_Max, xDist, yDist)
        Groups[key].append(p)
    return Groups

where k is the number of elements to select and Groups is the dictionary built by point_GridGroups. The selected points are then written out with:

file_out = lasfile.File("outPut", mode='w', header=h)  # h: header for the output file
for points in Groups.values():
    # select k points for each dictionary key
    point_selected = point_random_selection(points, k)
    for p in point_selected:
        # save the data
        file_out.write(p)
file_out.close()

My problem is that this approach is extremely slow: for a file of ~800 MB it takes around 4 days.
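A side note on point_random_selection, which does not by itself address the speed problem: the try/except can be avoided by capping the sample size with min(). This is a minimal sketch; select_up_to_k and the small groups dictionary are hypothetical, used only for illustration. One small difference from the original: a group with fewer than k points comes back as a shuffled copy rather than the original list.

```python
import random


def select_up_to_k(points, k):
    """Return k randomly chosen items from points, or all of them if there are fewer than k."""
    # random.sample raises ValueError when k > len(points); capping k avoids that case.
    return random.sample(points, min(k, len(points)))


# Hypothetical groups keyed by grid cell, just to show the call.
groups = {(0, 0): [1, 2, 3, 4, 5], (1, 0): [6]}
for key, points in groups.items():
    print(key, select_up_to_k(points, 3))
```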


1 Answer

Editorial Team · Answer 1 · 2026-06-18T15:34:14+00:00

You could try updating your samples as you read the points. At the very least, this saves you from having to store every point in memory before drawing your sample. It is not guaranteed to make things faster, though.

The following is based on BlkKnght’s excellent answer on building a random sample from file input without retaining all the lines; it simply extends that approach to keep one sample per group.
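The underlying technique is reservoir sampling (often called Algorithm R). For reference, here is a minimal single-stream sketch, independent of liblas; reservoir_sample and its rng parameter are illustrative names, not part of the original answer. Unlike the grouped version, it simply appends the first n items, so those start out in input order.

```python
import random


def reservoir_sample(iterable, n, rng=random):
    """Pick up to n items uniformly at random from iterable in one pass.

    Only the n-item reservoir is kept in memory, so the input can be a
    generator far too large to hold at once.
    """
    sample = []
    for i, item in enumerate(iterable):
        if i < n:
            sample.append(item)  # fill the reservoir with the first n items
        else:
            r = rng.randint(0, i)  # inclusive on both ends: i + 1 possible values
            if r < n:
                sample[r] = item  # item i is kept with probability n / (i + 1)
    return sample


print(reservoir_sample(range(1000000), 5))
```

Passing a seeded random.Random instance as rng makes runs reproducible.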

import random
from liblas import file as lasfile
from collections import defaultdict


def world2Pixel_Id(x, y, X_Min, Y_Max, xDist, yDist):
    col = int((x - X_Min) / xDist)
    row = int((Y_Max - y) / yDist)
    return (col, row)

def random_grouped_samples(inFile, n, X_Min, Y_Max, xDist, yDist):
    """Select up to n points *per group* from inFile"""

    groupcounts = defaultdict(int)  # points seen so far in each group
    samples = defaultdict(list)     # current sample for each group

    for p in lasfile.File(inFile, None, 'r'):
        key = world2Pixel_Id(p.x, p.y, X_Min, Y_Max, xDist, yDist)
        i = groupcounts[key]
        r = random.randint(0, i)

        if r < n:
            if i < n:
                samples[key].insert(r, p)  # add first n items in random order
            else:
                samples[key][r] = p  # at a decreasing rate, replace random items

        groupcounts[key] += 1

    return samples

The function above takes inFile, your boundary coordinates and the sample size n, and returns a dictionary of groups, each holding at most n points picked uniformly at random.
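To try the sampling logic without liblas or a LAS file, here is a sketch of the same per-group reservoir that takes any iterable of points plus a function mapping a point to its group key. The names grouped_reservoir_samples and grid_cell, the (x, y) tuple points and the 100 x 100 grid with 10 x 10 cells are all hypothetical test choices.

```python
import random
from collections import defaultdict


def grouped_reservoir_samples(points, n, key_func, rng=random):
    """Keep up to n uniformly chosen points per group in a single pass.

    key_func maps a point to its group key (e.g. a grid cell).
    """
    counts = defaultdict(int)    # points seen so far in each group
    samples = defaultdict(list)  # current reservoir for each group
    for p in points:
        key = key_func(p)
        i = counts[key]
        r = rng.randint(0, i)
        if r < n:
            if i < n:
                samples[key].insert(r, p)  # first n points, in random order
            else:
                samples[key][r] = p  # later points replace one at a decreasing rate
        counts[key] += 1
    return samples


def grid_cell(p):
    # Hypothetical grid: X_Min=0, Y_Max=100, 10 x 10 cells, same arithmetic as world2Pixel_Id.
    x, y = p
    return (int((x - 0.0) / 10.0), int((100.0 - y) / 10.0))


rng = random.Random(42)
pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(10000)]
result = grouped_reservoir_samples(pts, 3, grid_cell, rng)
print(len(result), "cells,", max(len(v) for v in result.values()), "points max per cell")
```

With liblas, points would be the lasfile.File(...) iterator and key_func something like lambda p: world2Pixel_Id(p.x, p.y, X_Min, Y_Max, xDist, yDist).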

Because you only use the id as a group key, I reduced it to calculating the (col, row) tuple; there is no need to turn it into a string.
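A quick way to convince yourself that the tuple key groups points exactly like the "col_row" string key is to compare both on synthetic data. This is a sketch: key_str, key_tuple, group_sizes and the random points are hypothetical, and the timings printed at the end depend entirely on your machine and Python version, so no particular speedup is claimed.

```python
import random
import timeit
from collections import defaultdict


def key_str(x, y, X_Min, Y_Max, xDist, yDist):
    # The question's version: formats a "col_row" string for every point.
    return "{0}_{1}".format(int((x - X_Min) / xDist), int((Y_Max - y) / yDist))


def key_tuple(x, y, X_Min, Y_Max, xDist, yDist):
    # The answer's version: a (col, row) tuple, hashable as-is.
    return (int((x - X_Min) / xDist), int((Y_Max - y) / yDist))


def group_sizes(points, key_func):
    # Count points per group on a hypothetical 100 x 100 area with 10 x 10 cells.
    counts = defaultdict(int)
    for x, y in points:
        counts[key_func(x, y, 0.0, 100.0, 10.0, 10.0)] += 1
    return sorted(counts.values())


rng = random.Random(0)
pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(10000)]

# Both keys produce the same number and sizes of groups...
assert group_sizes(pts, key_str) == group_sizes(pts, key_tuple)

# ...so the choice only affects cost; measure it on your own setup.
print(timeit.timeit(lambda: group_sizes(pts, key_str), number=5))
print(timeit.timeit(lambda: group_sizes(pts, key_tuple), number=5))
```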

You can write these out to a file with:

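samples = random_grouped_samples(inFile, k, X_Min, Y_Max, xDist, yDist)

file_out = lasfile.File("outPut", mode='w', header=h)  # h: header for the output file

for group in samples.values():
    for p in group:
        file_out.write(p)

file_out.close()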
