I have some data like this, where the second field is the probability of the first field, so “0: 0.017” means there is a 0.017 chance of 0. The probabilities sum to 1.
My question is: how do I split the number line into ranges from the probabilities, so that I can find the lower bound and the upper bound of each character? So 0 would be [0, 0.017), 1 would be [0.017, 0.039), and so on.
I am trying to implement arithmetic encoding.
(0: 0.017,
1: 0.022,
2: 0.033,
3: 0.033,
4: 0.029,
5: 0.028,
6: 0.035,
7: 0.032,
8: 0.028,
9: 0.027,
a: 0.019,
b: 0.022,
c: 0.029,
d: 0.03,
e: 0.028,
f: 0.035,
g: 0.026,
h: 0.037,
i: 0.029,
j: 0.025,
k: 0.025,
l: 0.037,
m: 0.025,
n: 0.023,
o: 0.026,
p: 0.035,
q: 0.033,
r: 0.031,
s: 0.023,
t: 0.022,
u: 0.038,
v: 0.022,
w: 0.016,
x: 0.026,
y: 0.021,
z: 0.033,)
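The bounds asked about are running sums: a symbol's lower bound is the total probability of every symbol listed before it, and its upper bound adds the symbol's own probability. Writing the listed symbols in order as $s_1, \dots, s_n$ with probabilities $p_1, \dots, p_n$:

$$
\text{low}(s_k) = \sum_{j=1}^{k-1} p_j, \qquad \text{high}(s_k) = \sum_{j=1}^{k} p_j = \text{low}(s_k) + p_k,
$$

so $s_k$ owns $[\text{low}(s_k), \text{high}(s_k))$. With the data above, 0 gets [0, 0.017), 1 gets [0.017, 0.039), 2 gets [0.039, 0.072), and so on, until z gets [0.967, 1.000).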
Edit: never mind, I figured it out; I was just messing up the simple math. Thanks for all the input!
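Since the goal is arithmetic encoding: once every symbol has its range, encoding a message repeatedly shrinks a working interval down to the current symbol's slice of it. Below is a minimal sketch of that narrowing, assuming a hypothetical three-symbol model (not the data from the question) and using fractions.Fraction to avoid float rounding. The names cumulative_intervals and encode_interval are illustrative. It only computes the final interval; emitting bits and renormalizing, which a practical coder needs, are left out.

```python
from fractions import Fraction

def cumulative_intervals(model):
    """Map each symbol to its [low, high) slice of [0, 1), in the model's order."""
    intervals = {}
    low = Fraction(0)
    for symbol, prob in model:
        high = low + prob
        intervals[symbol] = (low, high)
        low = high
    return intervals

def encode_interval(message, model):
    """Narrow [0, 1) once per symbol; any number in the result identifies the message."""
    intervals = cumulative_intervals(model)
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        sym_low, sym_high = intervals[symbol]
        width = high - low
        # Rescale the symbol's slice of [0, 1) into the current working interval.
        low, high = low + width * sym_low, low + width * sym_high
    return low, high

# Hypothetical toy model, not the question's data.
toy_model = [("a", Fraction(1, 2)), ("b", Fraction(1, 4)), ("c", Fraction(1, 4))]
print(encode_interval("ab", toy_model))  # (Fraction(1, 4), Fraction(3, 8))
```

Probabilities written as decimals, like those in the question, can be loaded exactly with Fraction("0.017").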
From this point on you can figure out what to do with your values. I documented the code (admittedly a really simple, naive way to process the input file), but my_list is now clean and nicely formatted, with string (value) and float (frequency). Hope this helps.
Output of the code from above:
And then…
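A minimal sketch of the parsing and range-building steps, assuming the input is text in the same format as the question's listing (read from a string here rather than a file). The names parse_listing and probability_ranges are hypothetical and this is not the answer's original code; the ranges are the running sums computed with itertools.accumulate.

```python
from itertools import accumulate

def parse_listing(text):
    """Turn lines like '(0: 0.017,' ... 'z: 0.033,)' into (symbol, probability) pairs."""
    pairs = []
    for line in text.splitlines():
        # Drop the wrapping parentheses and the trailing commas used in the listing.
        line = line.strip().strip("(),").strip()
        if not line:
            continue
        symbol, _, prob = line.partition(":")
        pairs.append((symbol.strip(), float(prob)))
    return pairs

def probability_ranges(pairs):
    """Map each symbol to its half-open range [low, high) within [0, 1)."""
    probs = [prob for _, prob in pairs]
    # Upper bound = running total of the probabilities up to and including the symbol.
    highs = list(accumulate(probs))
    # Lower bound = the previous symbol's upper bound (0.0 for the first symbol).
    lows = [0.0] + highs[:-1]
    return {symbol: (low, high) for (symbol, _), low, high in zip(pairs, lows, highs)}

sample = """(0: 0.017,
1: 0.022,
2: 0.033,)"""
my_list = parse_listing(sample)
ranges = probability_ranges(my_list)
print(my_list)      # [('0', 0.017), ('1', 0.022), ('2', 0.033)]
print(ranges["1"])  # about (0.017, 0.039), possibly with float rounding noise
```

The sample uses only the first three entries of the question's data; the same calls work on the full listing. The sums are plain floats, so printed bounds can carry extra trailing digits, which is one reason to format them for display.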
Output of is here:
And finally, to output the final result: there isn’t anything too special; I used pattern and format to make the output look nicer. The calculation pretty much follows ninjagecko’s method. I did have to pad 0.00 and 1.00 into the distribution, since the calculation did not produce them. It’s a pretty straightforward implementation once we figured out how to do the probability.
Output:
Full source is here: http://codepad.org/a6YkHhed
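Along the lines of that final step, here is a sketch that pads the boundary list with 0.0 and 1.0 and prints each range through a format pattern. Treating “pattern” as a str.format template is an assumption about what the answer meant, and format_distribution is a hypothetical name. It also assumes the probabilities sum to 1, as the question states.

```python
from itertools import accumulate

def format_distribution(pairs, pattern="{symbol}: [{low:.3f}, {high:.3f})"):
    """Render one line per symbol, with the boundary list padded by 0.0 and 1.0."""
    probs = [prob for _, prob in pairs]
    # Interior boundaries come from the running sums; the last running sum is
    # dropped and replaced by an explicit 1.0 so float rounding can't leave it at 0.9999...
    boundaries = [0.0] + list(accumulate(probs))[:-1] + [1.0]
    return "\n".join(
        pattern.format(symbol=symbol, low=boundaries[i], high=boundaries[i + 1])
        for i, (symbol, _) in enumerate(pairs)
    )

# Hypothetical three-symbol distribution, chosen so the sums are exact in binary.
print(format_distribution([("x", 0.25), ("y", 0.5), ("z", 0.25)]))
# x: [0.000, 0.250)
# y: [0.250, 0.750)
# z: [0.750, 1.000)
```

Passing a different pattern string (for example with .2f) changes only the display, not the computed boundaries.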