The Archive Base

Editorial Team
Asked: June 9, 2026

I have a text file that lists pairs, for example:

10,1
2,7
3,1
10,1

That list has then been turned into a symmetric matrix, so the (1,10) entry is the number of times the pair 1,10 (in either order) showed up in the list. I would now like to subsample this matrix. By subsample I mean: I would like to build the matrix that would have resulted from using only a random 30% of the lines in the original text file. So in this example, had I erased 70% of the text file, the (1,10) pair might show up only once instead of twice, and so the (1,10) entry in the matrix would be 1 instead of 2.

This is easy if I still have the original text file: I can just use random.sample to pick out 30% of the lines in the file. But if I only have the matrix, how can I randomly discard 70% of the data?
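For reference, the "easy" route with the original lines can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `pairs_to_matrix` and `subsample_lines` are hypothetical, labels are assumed to be integers in `[0, size)` used directly as indices, an off-diagonal pair increments both `(a, b)` and `(b, a)`, and a self-pair `(a, a)` increments the diagonal once.

```python
import random
import numpy as np

def pairs_to_matrix(lines, size):
    """Build a symmetric count matrix from lines such as "10,1".

    Hypothetical helper: assumes integer labels in [0, size); an
    off-diagonal pair increments both (a, b) and (b, a), while a
    self-pair (a, a) increments the diagonal once.
    """
    mat = np.zeros((size, size), dtype=np.int64)
    for line in lines:
        a, b = (int(x) for x in line.split(","))
        mat[a, b] += 1
        if a != b:
            mat[b, a] += 1
    return mat

def subsample_lines(lines, frac, rng=random):
    """Keep a random fraction of the lines, sampled without replacement."""
    k = int(len(lines) * frac)
    return rng.sample(lines, k)

lines = ["10,1", "2,7", "3,1", "10,1"]
full = pairs_to_matrix(lines, size=11)
# With 4 lines, 30% rounds down to a single kept line
sub = pairs_to_matrix(subsample_lines(lines, 0.30), size=11)
```

The question is then how to get the same distribution for `sub` when only `full` is available.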

1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 11:47 am

    The best approach probably depends on where your data is large:

    • Do you have a huge matrix, with mostly small counts in it, or
    • Do you have a moderately sized matrix with huge numbers of counts in it?

    Here’s a solution that will be suited to the second case, though it will also work
    OK in the first case.

    The fact that the counts happen to be in a 2D matrix is not very
    important: this is really the problem of sampling, without replacement, from a
    population that has been binned. So we can extract the bins directly and forget
    about the matrix for a bit:

    import numpy as np
    import random
    
    # Input counts matrix
    mat = np.array([
        [5, 5, 2],
        [1, 1, 3],
        [6, 0, 4]
    ], dtype=np.int64)
    
    # Build a list of (row,col) pairs, and a list of counts
    keys, counts = zip(*[
        ((i,j), mat[i,j])
            for i in range(mat.shape[0])
            for j in range(mat.shape[1])
            if mat[i,j] > 0
    ])
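    The same bins can also be extracted without a Python-level loop. As a small sketch on the example matrix: np.nonzero returns the indices of non-zero entries in row-major order, so keys comes out in the same order as with the list comprehension above.

```python
import numpy as np

mat = np.array([
    [5, 5, 2],
    [1, 1, 3],
    [6, 0, 4]
], dtype=np.int64)

# Row and column indices of every non-zero bin, in row-major order
rows, cols = np.nonzero(mat)
counts = mat[rows, cols]

# The same (row, col) tuples that the list comprehension produces
keys = list(zip(rows.tolist(), cols.tolist()))
```

    With rows and cols kept around, the matrix can later be rebuilt in one step as out_mat[rows, cols] = out_counts.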
    

    And then sample from those bins, using a cumulative array of counts:

    # Make the cumulative counts array
    counts = np.array(counts, dtype=np.int64)
    sum_counts = np.cumsum(counts)
    
    # Decide how many counts to include in the sample
    frac_select = 0.30
    count_select = int(sum_counts[-1] * frac_select)
    
    # Choose which individual counts (positions 0 .. total-1) to keep, without replacement
    ind_select = sorted(random.sample(range(int(sum_counts[-1])), count_select))
    
    # A vector to hold the new counts
    out_counts = np.zeros(counts.shape, dtype=np.int64)
    
    # This is essentially the merge step of merge sort: walk both sorted
    # sequences to find the bin that each selected position falls into
    i = 0
    j = 0
    while i<len(sum_counts) and j<len(ind_select):
        if ind_select[j] < sum_counts[i]:
            j += 1
            out_counts[i] += 1
        else:
            i += 1
    
    # Rebuild the matrix using the `keys` list from before
    out_mat = np.zeros(mat.shape, dtype=np.int64)
    for i in range(len(out_counts)):
        out_mat[keys[i]] = out_counts[i]
    

    Now you will have the sampled matrix in out_mat.
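    The merge loop can also be vectorized. A minimal sketch on the same example matrix (the name bins is illustrative): a selected position x belongs to the first bin i with x < sum_counts[i], which is exactly what np.searchsorted(sum_counts, x, side="right") returns, and np.bincount then tallies the selected positions per bin. Sorting the selected positions is no longer necessary.

```python
import random
import numpy as np

mat = np.array([
    [5, 5, 2],
    [1, 1, 3],
    [6, 0, 4]
], dtype=np.int64)

# Non-zero bins and their cumulative sizes
rows, cols = np.nonzero(mat)
counts = mat[rows, cols]
sum_counts = np.cumsum(counts)

frac_select = 0.30
total = int(sum_counts[-1])
count_select = int(total * frac_select)

# Positions 0 .. total-1 of the individual items to keep (without replacement)
ind_select = np.array(random.sample(range(total), count_select), dtype=np.int64)

# Map each position to its bin, then count how many positions hit each bin
bins = np.searchsorted(sum_counts, ind_select, side="right")
out_counts = np.bincount(bins, minlength=len(counts))

# Scatter the new counts back into a matrix of the original shape
out_mat = np.zeros(mat.shape, dtype=np.int64)
out_mat[rows, cols] = out_counts
```

    This still draws count_select individual positions, so its cost grows with the number of selected counts, just like the loop version.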

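    One caveat for the symmetric matrix in the question: the code above treats every cell as an independent bin, so it samples (i, j) and (j, i) separately, which counts each line twice and generally breaks symmetry. A minimal sketch that avoids this, under stated assumptions: it samples only the upper triangle (diagonal included) and mirrors the result, assuming a self-pair is counted once on the diagonal. It also swaps in a different technique for the draw itself: NumPy's Generator.multivariate_hypergeometric (NumPy 1.18 or later), which samples a fixed number of items without replacement from bins of given sizes and so never touches individual items. The NumPy documentation lists size limits on sum(colors) for its methods, so check them for very large totals.

```python
import numpy as np

# Hypothetical symmetric count matrix (assumption: off-diagonal pairs are
# counted in both triangles, self-pairs once on the diagonal)
mat = np.array([
    [0, 2, 1],
    [2, 4, 0],
    [1, 0, 3]
], dtype=np.int64)

rng = np.random.default_rng(seed=0)
frac_select = 0.30

# Each original line corresponds to one upper-triangle cell (diagonal included)
iu = np.triu_indices(mat.shape[0])
counts = mat[iu]
total = int(counts.sum())
n_select = int(total * frac_select)

# Draw n_select items without replacement from bins with these sizes
sampled = rng.multivariate_hypergeometric(counts, n_select)

# Rebuild the upper triangle, then mirror the strict upper part to restore symmetry
out_mat = np.zeros_like(mat)
out_mat[iu] = sampled
out_mat = out_mat + np.triu(out_mat, k=1).T
```

    Because only one triangle is sampled, each original line is represented exactly once in the draw, which matches what random.sample over the lines of the text file would do.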
Related Questions

I have a text file that lists the name of a track then after
I have two text files that contain columnar data of the variety position -
Hi I have two text files that each contain different information about certain structures.The
I have two large (~100 GB) text files that must be iterated through simultaneously.
I have a program that creates multiple text files of rdf triples. I need
I have some xml files that contain text, which are displayed on my website.
i have large numbers of text files and i am in problem that i
I have a couple different apps that write text files to the Documents Directory
I have some html files that I want to convert to text. I have
I have a text file that lists the names of a large number of

© 2021 The Archive Base. All Rights Reserved