I got a litte problem understanding conceptually the structure of a random writing program (that takes input in form of a text file) and uses the Markov algorithm to create a somewhat sensible output.
So the data structure i am using is to use cases ranging from 0-10. Where at case 0: I count the number a letter/symbol or digit appears and base my new text on this to simulate the input. I have already implemented this by using an Map type that holds each unique letter in the input text and a array of how many there are in the text. So I can simply ask for the size of the array for the specific letter and create output text easy like this.
But now I Need to create case1/2/3 and so on… case 1 also holds what letter is most likely to appear after any letter aswell. Do i need to create 10 seperate arrays for these cases, or are there an easier way?
There are a lot of ways to model this. One approach is as you describe, with an multi-dimensional array where each index is the following character in the chain and the final result is the count.
Then to randomly generate names you would count the number of total outcomes for a given character (ex: 2 outcomes for ‘p’ in the previous example) and pick a weighted random number for one of the possible outcomes.
For smaller sizes (say up to 4 characters) that should work fine. For anything larger you may start to run into memory issues since (assuming you’re using A-Z) 26^N entries for an N-length chain.
I wrote something like a couple of years ago. I think I used random pages from Wikipedia to for seed data to generate the weights.