I have some data like this, where the second field is the probability of

Asked: June 1, 2026

I have some data like this, where the second field is the probability of the first field, so “0: 0.017” means there is a 0.017 chance of 0. The probabilities sum to 1.

My question is: how do I build a “range line” from the probabilities so that I can find the lower bound and the upper bound of each character? So 0 would be [0, 0.017), 1 would be [0.017, 0.039), and so on.

I am trying to implement arithmetic encoding.

(0: 0.017,
1: 0.022,
2: 0.033,
3: 0.033,
4: 0.029,
5: 0.028,
6: 0.035,
7: 0.032,
8: 0.028,
9: 0.027,
a: 0.019,
b: 0.022,
c: 0.029,
d: 0.03,
e: 0.028,
f: 0.035,
g: 0.026,
h: 0.037,
i: 0.029,
j: 0.025,
k: 0.025,
l: 0.037,
m: 0.025,
n: 0.023,
o: 0.026,
p: 0.035,
q: 0.033,
r: 0.031,
s: 0.023,
t: 0.022,
u: 0.038,
v: 0.022,
w: 0.016,
x: 0.026,
y: 0.021,
z: 0.033,)

edit*

nvm i figured it out, just messing up on the silly math… thanks for all the inputs!!!

1 Answer

Editorial Team · Answer 1 · 2026-06-01T16:32:22+00:00

# The data is input as '1: 0.022,' format
def process_data(line):
    # for returning the new string that is cleaned up
    result_line = ''
    for character in line:
        # check if it is either a number or a letter
        if character.isdigit() or character.isalpha():
            result_line += character
        # we want the decimal point
        elif character == '.':
            result_line += character
        # else we replace it with space ' '
        else:
            result_line += ' '
    return result_line

my_list = []  

with open('input.txt') as file:
    for lines in file:
        processed_line = process_data(lines)
        # temp_list has ['letter', 'frequency']
        temp_list = (processed_line.split())
        value = temp_list[0]
        # Require to cast it to a float, since it is a string
        frequency = float(temp_list[1])
        my_list.append([value, frequency])

print(my_list)

From this point on you can figure out what to do with your values. I documented the code (granted, it is a really simple, naive way to process the input file). my_list is now clean and nicely formatted: each entry holds a string (value) and a float (frequency). Hope this helps.

Output of the code from above:

[['0', 0.017], ['1', 0.022], ['2', 0.033], ['3', 0.033], 
['4', 0.029], ['5', 0.028], ['6', 0.035], ['7', 0.032], 
['8', 0.028], ['9', 0.027], ['a', 0.019], ['b', 0.022], 
['c', 0.029], ['d', 0.03], ['e', 0.028], ['f', 0.035], 
['g', 0.026], ['h', 0.037], ['i', 0.029], ['j', 0.025], 
['k', 0.025], ['l', 0.037], ['m', 0.025], ['n', 0.023], 
['o', 0.026], ['p', 0.035], ['q', 0.033], ['r', 0.031], 
['s', 0.023], ['t', 0.022], ['u', 0.038], ['v', 0.022], 
['w', 0.016], ['x', 0.026], ['y', 0.021], ['z', 0.033]]
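If the character-by-character cleanup feels heavy, the same parsing can be sketched with a regular expression. This is only a sketch under assumptions: every symbol is a single letter or digit, every entry has the 'symbol: probability' shape shown in the question, and the names ENTRY and parse_entries are made up for this example.

```python
import re

# Hypothetical pattern: one symbol character, a colon, then a decimal number.
ENTRY = re.compile(r'(\w)\s*:\s*(\d*\.?\d+)')

def parse_entries(text):
    """Return [[symbol, probability], ...] in the order found in text."""
    return [[symbol, float(prob)] for symbol, prob in ENTRY.findall(text)]

# A short sample in the same shape as the question's data.
sample = '(0: 0.017,\n1: 0.022,\nz: 0.033,)'
parsed = parse_entries(sample)
print(parsed)
```

To run it on the real file, something like parse_entries(open('input.txt').read()) should give the same list as my_list above, as long as the file only contains entries in that shape.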

And then…

# Took a page out of TokenMacGuy, credit to him
distribution = []
distribution.append(0.00)  # pad the front so the first range starts at 0
total = 0.0  # running sum of the probabilities, as a sanity check

for entry in my_list:
    distribution.append(entry[1])
    total += entry[1]  # add this entry's probability
    total = round(total, 3)  # round to 3 decimal places

distribution.append(1.00)  # pad the end so the look-ahead index in the final loop stays in range
print(distribution)  # print to check

Output:

[0.0, 0.017, 0.022, 0.033, 0.033, 0.029, 0.028, 0.035, 0.032, 
0.028, 0.027, 0.019, 0.022, 0.029, 0.03, 0.028, 0.035, 0.026, 
0.037, 0.029, 0.025, 0.025, 0.037, 0.025, 0.023, 0.026, 0.035, 
0.033, 0.031, 0.023, 0.022, 0.038, 0.022, 0.016, 0.026, 0.021, 
0.033, 1.0]

And finally, to output the final result. There isn’t anything too special here: I used a format pattern to make the ranges look nicer, and the calculation pretty much follows ninjagecko’s method. I did have to pad distribution with 0.00 at the front and 1.00 at the end, since the calculation did not show them. It is a pretty straightforward implementation once we figure out how to do the probability.

pattern = '{0}: [{1:1.3f}, {2:1.3f})'
count = 1 # a counter to keep track of the index

pre_p = distribution[0] 
p = distribution[1]

# Here we will print it out at the end in the format you said in the question
for entry in my_list:
    print(pattern.format(entry[0], pre_p, p))
    pre_p += distribution[count]
    p += distribution[count+1]
    count = count + 1

Output:

0: [0.000, 0.017)
1: [0.017, 0.039)
2: [0.039, 0.072)
3: [0.072, 0.105)
4: [0.105, 0.134)
5: [0.134, 0.162)
6: [0.162, 0.197)
7: [0.197, 0.229)
8: [0.229, 0.257)
9: [0.257, 0.284)
a: [0.284, 0.303)
b: [0.303, 0.325)
c: [0.325, 0.354)
d: [0.354, 0.384)
e: [0.384, 0.412)
f: [0.412, 0.447)
g: [0.447, 0.473)
h: [0.473, 0.510)
i: [0.510, 0.539)
j: [0.539, 0.564)
k: [0.564, 0.589)
l: [0.589, 0.626)
m: [0.626, 0.651)
n: [0.651, 0.674)
o: [0.674, 0.700)
p: [0.700, 0.735)
q: [0.735, 0.768)
r: [0.768, 0.799)
s: [0.799, 0.822)
t: [0.822, 0.844)
u: [0.844, 0.882)
v: [0.882, 0.904)
w: [0.904, 0.920)
x: [0.920, 0.946)
y: [0.946, 0.967)
z: [0.967, 1.000)

Full source is here: http://codepad.org/a6YkHhed
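For what it’s worth, the same ranges can probably be built in one pass with itertools.accumulate, without the padded distribution list. This is a sketch, not the code from the link above: the name cumulative_ranges is made up here, and it assumes the symbols are unique and the probabilities have at most 3 decimal places, as in the question.

```python
from itertools import accumulate

def cumulative_ranges(pairs):
    """Map each symbol to its half-open interval (low, high), meaning [low, high)."""
    # Running totals of the probabilities; rounding to 3 places hides float drift
    # (assumption: inputs have at most 3 decimal places).
    highs = [round(t, 3) for t in accumulate(p for _, p in pairs)]
    # Each interval starts where the previous one ended; the first starts at 0.
    lows = [0.0] + highs[:-1]
    return {symbol: (low, high)
            for (symbol, _), low, high in zip(pairs, lows, highs)}

# First three entries of my_list from above.
sample = [['0', 0.017], ['1', 0.022], ['2', 0.033]]
ranges = cumulative_ranges(sample)
for symbol, (low, high) in ranges.items():
    print('{0}: [{1:.3f}, {2:.3f})'.format(symbol, low, high))
```

Having the ranges in a dict keyed by symbol should make the lookup during encoding a bit easier, since each character’s lower and upper bound come back directly.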
