The Archive Base


Editorial Team
Asked: June 13, 2026

Imagine we have four symbols – ‘a’, ‘b’, ‘c’, ‘d’.

Imagine we have four symbols – ‘a’, ‘b’, ‘c’, ‘d’. We also have four given probabilities of those symbols appearing in the function output – P1, P2, P3, P4 (the sum of which is equal to 1). How would one implement a function which generates a random sample of three of those symbols, such that the symbols appear in its output with the specified probabilities?

Example: ‘a’, ‘b’, ‘c’ and ‘d’ have the probabilities of 9/30, 8/30, 7/30 and 6/30 respectively. The function outputs various random samples of any three out of those four symbols: ‘abc’, ‘dca’, ‘bad’ and so on. We run this function many times, counting the number of times each symbol is encountered in its output. In the end, the count stored for ‘a’ divided by the total number of symbols output should converge to 9/30, for ‘b’ to 8/30, for ‘c’ to 7/30, and for ‘d’ to 6/30.

E.g. the function generates 10 outputs:

adc
dab
bca
dab
dba
cab
dcb
acd
cab
abc

which out of 30 symbols contains 9 of ‘a’, 8 of ‘b’, 7 of ‘c’ and 6 of ‘d’. This is an idealized example, of course, as the values would only converge when the number of samples is much larger – but it should hopefully convey the idea.
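The tally above can be reproduced with a few lines of Python. This is only a check of the ten sample outputs listed in the question; the variable names are chosen for illustration.

```python
from collections import Counter

# The ten sample outputs listed in the question.
outputs = ["adc", "dab", "bca", "dab", "dba", "cab", "dcb", "acd", "cab", "abc"]

# Count every symbol across all outputs.
counts = Counter("".join(outputs))
total = sum(counts.values())

for symbol in "abcd":
    print(f"{symbol}: {counts[symbol]}/{total}")
```

Running it should report 9/30, 8/30, 7/30 and 6/30 for ‘a’, ‘b’, ‘c’ and ‘d’.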

Obviously, all of this is only possible when none of the probabilities is larger than 1/3, since each single sample output always contains three distinct symbols. It is OK for the function to enter an infinite loop or otherwise behave erratically if it is impossible to satisfy the values provided.
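The 1/3 bound follows because a symbol can appear at most once per sample of three, so it can account for at most one third of all symbols output. A minimal sketch of that check, generalised to a sample size `k` (the function name `is_feasible` is made up for this example, and exact fractions are used to avoid rounding issues):

```python
from fractions import Fraction

def is_feasible(shares, k):
    """Necessary condition: shares sum to 1 and none exceeds 1/k."""
    limit = Fraction(1, k)
    return sum(shares) == 1 and all(0 <= s <= limit for s in shares)

example = [Fraction(9, 30), Fraction(8, 30), Fraction(7, 30), Fraction(6, 30)]
too_skewed = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]

print(is_feasible(example, 3))     # every share is at most 1/3
print(is_feasible(too_skewed, 3))  # 1/2 exceeds 1/3
```

The first call should print True and the second False.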

Note: the function should obviously use an RNG, but should otherwise be stateless. Each new invocation should be independent from any of the previous ones, except for the RNG state.

EDIT: Even though the description mentions choosing 3 out of 4 values, ideally the algorithm should be able to cope with any sample size.

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 11:18 am

    Your problem is underdetermined.

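    If we assign a probability to each string of three letters that we allow, p(abc), p(abd), p(acd) and so on, we can generate a system of equations, one per letter. Here p1 ... p4 are the probabilities that a sample contains 'a' ... 'd', which is three times the per-symbol share given in the question, since every sample holds three symbols:

    eqn1: p(abc) + p(abd) + ... others with an "a" ... = p1
      ...
      ...
    eqn4: p(abd) + p(acd) + ... others with a "d" ... = p4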
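To make these equations concrete, the sketch below enumerates the unknowns for the 3-of-4 example and checks one particular solution. The solution used here is an assumption of this sketch, not the method of this answer: a three-letter string omits exactly one letter, so the omitted letter is chosen with probability 1 minus the probability that a sample contains it, and that mass is spread equally over the six orderings of the remaining letters. The right-hand sides are taken as 3 times the shares from the question (9/30, 8/30, 7/30, 6/30).

```python
import random
from fractions import Fraction
from itertools import permutations

alphabet = "abcd"
k = 3
shares = {"a": Fraction(9, 30), "b": Fraction(8, 30),
          "c": Fraction(7, 30), "d": Fraction(6, 30)}
# Right-hand sides: probability that a sample contains each letter.
contains_prob = {letter: k * share for letter, share in shares.items()}

# The unknowns: one probability per ordered string of k distinct letters.
strings = ["".join(p) for p in permutations(alphabet, k)]
print(len(strings), "unknowns,", len(alphabet), "equations")

def string_prob(s):
    """Illustrative solution: depends only on which letter the string omits."""
    (omitted,) = set(alphabet) - set(s)
    return (1 - contains_prob[omitted]) / 6  # 6 = 3! orderings per letter set

probs = {s: string_prob(s) for s in strings}

# Check every equation: the strings containing a letter must sum to its RHS.
for letter in alphabet:
    lhs = sum(p for s, p in probs.items() if letter in s)
    print(letter, lhs == contains_prob[letter])
print("total:", sum(probs.values()))

# Roulette-style draw from the string probabilities.
weights = [float(probs[s]) for s in strings]
print(random.choices(strings, weights=weights, k=5))
```

With 24 unknowns and 4 equations, other assignments satisfy the same constraints, for example ones that favour some orderings of the same three letters over others.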
    

    The system has more unknowns than equations, so it has many solutions. Once a solution is found, by whatever method you choose (I would use the simplex algorithm), sample from the probabilities of each string using the roulette method that @alestanis describes.

    import numpy as np

    # originally written against cvxopt-1.1.5
    from cvxopt import matrix, solvers

    ###########################
    # Functions to do some parts

    # function to find all valid outputs
    def perms(alphabet, length):
        if length == 0:
            yield ""
            return
        for i in range(len(alphabet)):
            val1 = alphabet[i]
            for val2 in perms(alphabet[:i]+alphabet[i+1:], length-1):
                yield val1 + val2


    # roulette sampler
    def roulette_sampler(values, probs):
        # Create cumulative prob distro (uses its own argument, not a global)
        probs_cum = np.cumsum(probs)
        def fun():
            r = np.random.rand()
            for p,s in zip(probs_cum, values):
                if r < p:
                    return s
            # in case of rounding error
            return values[-1]
        return fun


    ############################
    #    Main Part

    # create list of all valid strings

    alphabet = "abcd"
    string_length = 3
    # probability that a sample contains each letter: string_length * share
    alpha_probs = [string_length*x/30. for x in range(9,5,-1)]

    # show probs
    for a,p in zip(alphabet, alpha_probs):
        print("p("+a+") =", p)

    # all valid outputs for this particular case
    strings = list(perms(alphabet, string_length))
    n_strings = len(strings)

    # constraints from probabilities p(abc) + p(abd) ... = p(a)
    contains = np.array([[1. if a in s else 0. for a in alphabet] for s in strings])
    #both = concatenate((contains,wons), axis=1).T # hacky, but whatever
    #A = matrix(both)
    #b = matrix(alpha_probs + [1.])
    A = matrix(contains.T)
    b = matrix(alpha_probs)

    # also need to constrain each p, and their sum, to [0,1]
    wons = np.array([[1. for s in strings]])
    G = matrix(np.concatenate((np.eye(n_strings),wons,-np.eye(n_strings),-wons)))
    h = matrix(np.concatenate((np.ones(n_strings+1),np.zeros(n_strings+1))))

    ## target matrices: quadratic stand-in for KL divergence from uniform
    # uniform prob over valid outputs
    u = 1./len(strings)
    P = matrix(np.eye(n_strings))
    q = -0.5*u*matrix(np.ones(n_strings))
    # solvers.qp minimises 0.5*p'Pp + q'p = 0.5*sum(p_i^2) - 0.5*u*sum(p_i),
    # which treats every p_i equally and so pulls them towards the same value


    # Do convex optimisation
    sol = solvers.qp(P,q,G,h,A,b)
    probs = np.array(sol['x']).ravel()

    # Print output
    for s,p in zip(strings,probs):
        print("p("+s+") =", p)
    checkprobs = [0. for char in alphabet]
    for i,a in enumerate(alphabet):
        for s,p in zip(strings,probs):
            if a in s:
                checkprobs[i] += p
        print("p("+a+") =", checkprobs[i])
    print("total =", sum(probs))

    # Create the sampling function
    rndstring = roulette_sampler(strings, probs)


    ###################
    # Verify

    print("sampling...")
    test_n = 1000
    output = [rndstring() for i in range(test_n)]

    # count how many sampled strings contain each letter
    sampled_freqs = []
    for char in alphabet:
        n = 0
        for val in output:
            if char in val:
                n += 1
        sampled_freqs += [n]

    print("plotting histogram...")
    import matplotlib.pyplot as plt
    plt.bar(range(0,len(alphabet)),np.array(sampled_freqs)/float(test_n), width=0.5)
    plt.show()
    

    EDIT: Python code
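For larger alphabets or sample sizes, enumerating every string grows quickly. As a different technique from the optimisation above, the sketch below uses systematic sampling: it lays the inclusion probabilities k * share end to end on the interval [0, k), draws one uniform offset u, and picks the symbols whose segments contain u, u+1, ..., u+k-1. Because no segment is longer than 1, no symbol is picked twice. The function name and parameters are made up for this example, and only the per-symbol frequencies are controlled, not which combinations occur together.

```python
import bisect
import random
from collections import Counter
from itertools import accumulate

def sample_symbols(symbols, shares, k, rng=random):
    """Draw k distinct symbols; symbol i is included with probability k*shares[i]."""
    inclusion = [k * p for p in shares]
    if any(pi > 1 + 1e-12 for pi in inclusion):
        raise ValueError("each share must be at most 1/k")
    cum = list(accumulate(inclusion))  # segment end points; cum[-1] is about k
    u = rng.random()                   # single uniform offset in [0, 1)
    picked = []
    for j in range(k):
        # Index of the segment containing u + j (clamped against rounding).
        idx = min(bisect.bisect_right(cum, u + j), len(symbols) - 1)
        picked.append(symbols[idx])
    rng.shuffle(picked)                # randomise the order within the sample
    return "".join(picked)

# Empirical check on the example from the question.
rng = random.Random(0)
shares = [9/30, 8/30, 7/30, 6/30]
counts = Counter()
n_samples = 20000
for _ in range(n_samples):
    counts.update(sample_symbols("abcd", shares, 3, rng))
total = sum(counts.values())
for symbol, share in zip("abcd", shares):
    print(symbol, round(counts[symbol] / total, 3), "target", round(share, 3))
```

The function keeps no state between calls apart from the RNG, and the same code accepts any sample size k for which every share is at most 1/k.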
