I am probably thinking about this wrong but here goes. A computer starts spitting

Question

0

Asked: June 16, 20262026-06-16T07:32:52+00:00 2026-06-16T07:32:52+00:00

I am probably thinking about this wrong but here goes. A computer starts spitting

0

I am probably thinking about this wrong but here goes.

A computer starts spitting out a gazillion random numbers between 11111111111111111111 and 99999999999999999999, in a linear row:

Sometimes the computer adds a number to one end of the line.
Sometimes the computer adds a number to the other end of the line.
Each number has a number that comes, or will come, before.
Each number has a number that comes, or will come, after.
Not all numbers are unique, many, but not most, are repeated.
The computer never stops spitting out numbers.

As I record all of these numbers, I need to be able to make an educated guess, at any given time:

If this is the second time I have seen a number I must know what number preceded it in line last time.
If it has appeared more than two times, I must know the probability/frequency of numbers preceding it.
If this is the second time I have seen a number, I must also know what number came after it in line last time.
If it has appeared more than two times, I must know the probability/frequency of numbers coming after it.

How the heck do I structure the tables in a MySQL database to store all these numbers? Which engine do I use and why? How do I formulate my queries? I need to know fast, but capacity is also important because when will the thing stop spitting them out?

My ill-conceived plan:

2 Tables:

1. Unique ID/#
2. #/ID/#

My thoughts:

Unique ID’s are almost always going to be shorter than the number = faster match.
Numbers repeat = fewer ID rows = faster match initially.

Select * in table2 where id=(select id in table1 where #=?)

OR:

3 Tables:

1. Unique ID/#
2. #/ID
3. ID/#

My thoughts:

If I only need left/before, or only need after/right, im shrinking the size of the second query.

SELECT # IN table2(or 3) WHERE id=(SELECT id IN table1 WHERE #=?)

OR

1 Table:

1. #/#/#

Thoughts:

Less queries = less time.

SELECT * IN table WHERE col2=#.

I’m lost…. 🙁 Each number has four attributes, that which comes before+frequency and that which comes after+frequency.

Would I be better off thinking of it in that way? If I store and increment frequency in the table, I do away with repetition and thus speed up my queries? I was initially thinking that if I store every occurrence, it would be faster to figure the frequency programmatically…….

Such simple data, but I just don’t have the knowledge of how databases function to know which is more efficient.

In light of a recent comment, I would like to add a bit of information about the actual problem: I have a string of indefinite length. I am trying to store a Markov chain frequency table of the various characters, or chunks of characters, in this string.

Given any point in the string I need to know the probability of the next state, and the probability of the previous state.

I am anticipating user input, based on a corpus of text and past user input. A major difference compared to other applications I have seen is that I am going farther down the chain, more states, at a given time and I need the frequency data to provide multiple possibilities.

I hope that clarifies the picture a lot more. I didn’t want to get into the nitty gritty of the problem, because in the past I have created questions that are not specific enough to get a specific answer.

This seems maybe a bit better. My primary question with this solution is: Would providing the “key” (first few characters of the state) increase the speed of the system? i.e query for state_key, then query only the results of that query for the full state?

Table 1:
name: state
col1:state_id - unique, auto incrementing
col2:state_key - the first X characters of the state
col3:state - fixed length string or state

Table 2:
name: occurence
col1:state_id_left - non unique key from table 1
col2:state_id_right - non unique key from table 1
col3:frequency - int, incremented every time the two states occur next to each other.

QUERY TO FIND PREVIOUS STATES:
SELECT * IN occurence WHERE state_id_right=(SELECT state_id IN state WHERE state_key=? AND state=?)

QUERY TO FIND NEXT STATES:
SELECT * IN occurence WHERE state_id_left=(SELECT state_id IN state WHERE state_key=? AND state=?)

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-16T07:32:54+00:00

I’m not familiar with Markov Chains but here is an attempt to answer the question. Note: To simplify things, let’s call each string of numbers a ‘state’.

First of all I imagine a table like this

Table states:
order : integer autonumeric (add an index here)
state_id : integer (add an index here)
state : varchar (?)

order: just use a sequential number (1,2,3,…,n) this will make it easy to search for the previous or next state.

state_id: a unique number associated to the state. As an example, you can use the number 1 to represent the state ‘1111111111…1’ (whatever the length of the sequence is). What’s important is that a reoccurrence of a state needs to use the same state_id that was used before. You may be able to formulate the state_id based on the string (maybe substracting a number). Of course a state_id only makes sense if the number of possible states fits in a MySQL int field.

state: that is the string of numbers ‘11111111…1’ to ‘99999999…9’ … I’m guessing this can only be stored as a string but if it fits in an integer/number column you should try it as it may well be that you don’t need the state_id

The point of state_id is that searching number is quicker than searching texts, but there will always be trade-offs when it comes to performance … profile and identify your bottlenecks to make better design decisions.

So, how do you look for a previous occurrence of the state S_i ?

“SELECT order, state_id, state FROM states WHERE state_id = ” and then attach get_state_id(S_i) where get_state_id ideally uses a formula to generate a unique id for the state.

Now, with order – 1 or order + 1 you can access the neighboring states issuing an additional query.

Next we need to track the frequency of different occurrences. You can do that in a different table that could look like this:

Table state_frequencies:
state_id integer (indexed)
occurrences integer

And only add records as you get the numbers.

Finally, you can have tables to track frequency for the neighboring states:

Table prev_state_frequencies (next_state_frequencies is the same):
state_id: integer (indexed)
prev_state_id: integer (indexed)
occurrences: integer

You will be able to infer probabilities (i guess this is what you are trying to do) by looking at the number of occurrences of a state (in state_frequencies) vs the number of occurrences of it’s predecessor state (in prev_state_frequencies).

I’m not sure if I got your problem right but if this makes sense I’m guessing I have.

Hope it helps,
AH

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am probably thinking about this wrong but here goes. A computer starts spitting

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply