Editorial Team
Asked: June 10, 2026

I am writing a program that generates random text based on the Markov model.

I am writing a program that generates random text based on a Markov model. I am running into a problem with files that have a lot of spaces between words: the initial seed turns out to be a space, every following character is then chosen as a space as well, and so the generated text is just a blank document because nextChosenChar is always a space.

Can someone suggest some solution to this problem?

I tried to come up with a solution, as seen in the latter part of the code below, but to no avail.

char ChooseNextChar(string seed, int order, string fileName){
    Map<string, Vector<char> > nextCharMap;
    ifstream inputStream;
    inputStream.open(fileName.c_str());
    int offset = 0;
    Vector<char> charsFollingSeedVector;
    inputStream.clear();
    char* buffer = new char [order + 1];
    char charFollowingSeed;
    static int consecutiveSpaces = 0;
    while (!inputStream.eof()) {    
        inputStream.seekg(offset);
        inputStream.read(buffer, order + 1);
        string key(buffer, order);
        if (equalsIgnoreCase(key, seed)) {
            //only insert key if not present otherwise overwriting old info 
            if (!nextCharMap.containsKey(seed)) {
                nextCharMap.put(seed, charsFollingSeedVector);
            }
            //read the char directly following seed
            charFollowingSeed = buffer[order];
            nextCharMap[seed].push_back(charFollowingSeed);
        }
        offset++;
    }
    //case where no chars following seed
    if (nextCharMap[seed].isEmpty()) {
        return EOF;
    }
    //determine which is the most frequent following char
    char nextChosenChar = MostFequentCharInVector(seed, nextCharMap);

    //TRYING TO FIX PROBLEM OF ONLY OUTPUTTING SPACES**********
     if (nextChosenChar == ' ') {
        consecutiveSpaces++;
        if (consecutiveSpaces >= 1) {
            nextChosenChar = nextCharMap[seed].get(randomInteger(0, nextCharMap[seed].size()-1));
            consecutiveSpaces = 0;
        }
    }
    return nextChosenChar;
}
1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 11:55 am

    If you really want a character-based model, you won’t get very natural looking text as output, but it is definitely possible, and that model will fundamentally be able to deal with sequences of space characters as well. There is no need to remove them from the input if you consider them a natural part of the text.

    What is important is that a Markov model does not always fall back to predicting the one character that has the highest probability at any given stage. Instead, it must look at the entire probability distribution of possible characters and choose one randomly.

    Here, randomly means that the character picked is not pre-determined by the programmer. Still, the random distribution is not the uniform distribution, i.e. not all characters are equally likely. It has to take into account the relative probabilities of the various possible characters. One way to do this is to build a cumulative probability distribution of the characters. For example, if the probabilities are

    p('a') == 0.2
    p('b') == 0.4
    p('c') == 0.4
    

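    we represent them as cumulative probabilities P, where each value is the sum of all probabilities up to and including that character:

    P('a') == 0.2
    P('b') == P('a') + 0.4 == 0.6
    P('c') == P('b') + 0.4 == 1.0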
    

    Then to generate a random character, we first generate a uniformly distributed random number N between 0 and 1, and then choose the first character whose cumulative probability is no less than N.
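
    As a minimal sketch of that lookup, using the three-character example above: the names pick_char and example_distribution are hypothetical helpers introduced only for illustration, and the linear scan is just the simplest way to express "first cumulative probability that is no less than N".

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Cumulative distribution: each entry pairs a character with the running
// total of the probabilities up to and including that character.
using Cumulative = std::vector<std::pair<char, float>>;

// Return the first character whose cumulative probability is not less
// than n. pick_char is a hypothetical helper name used for illustration.
char pick_char(const Cumulative &dist, float n)
{
  assert(!dist.empty());
  for (const auto &entry : dist) {
    if (entry.second >= n)
      return entry.first;
  }
  // Guard against rounding: if n exceeds the last cumulative value,
  // fall back to the last character.
  return dist.back().first;
}

// The example distribution from the text: p(a)=0.2, p(b)=0.4, p(c)=0.4.
Cumulative example_distribution()
{
  return { {'a', 0.2f}, {'b', 0.6f}, {'c', 1.0f} };
}
```

    With this table, a random number of 0.1 selects 'a', 0.5 selects 'b', and 0.95 selects 'c', so 'b' and 'c' are each chosen about twice as often as 'a'.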

    I have implemented this in the example code below. The train() procedure generates a cumulative probability distribution of the following characters, for every character in the training input. The predict() procedure applies this to generate random text.

    For a full implementation, this still lacks:

    • A representation of the probability distribution for the initial character. As you see in the main() function, my output simply always starts with 't'.
    • A representation of the length of the output string, or the final character. main() simply always generates a string of length 100.

    The code was tested with GCC 4.7.0 (C++11 option) on Linux. Example output below.

    #include <iostream>
    #include <string>
    #include <vector>
    #include <utility>
    #include <map>
    #include <numeric>
    #include <algorithm>
    #include <random>
    
    template <typename Char>
    class Markov
    {
    public:
      /* Data type used to count the frequencies (integer!) of
         characters. */
      typedef std::map<Char,unsigned>            CharDistributionMap;
    
      /* Data type used to represent a cumulative probability (float!)
         distribution. */
      typedef std::vector<std::pair<Char,float>> CharDistribution;
    
      /* Data type used to represent the Markov model. Each character is
         mapped to a probability distribution of the characters that follow
         it. */
      typedef std::map<Char,CharDistribution>    MarkovModel;
    
    
      /* The model. */
      MarkovModel  _model;
    
      /* Training procedure. */
      template <typename Iterator>
      void train(Iterator from, Iterator to)
      {
        _model = {};
        if (from == to)
          return;
    
        std::map<Char,CharDistributionMap> proto_model {};
    
        /* Count frequencies. */
        Char current = *from;
        while (true) {
          ++from;
          if (from == to)
            break;
          Char next = *from;
          proto_model[current][next] += 1;
          current = next;
        }
    
        /* Transform into probability distribution. */
        for (const auto &entry : proto_model) {
          const Char current              = entry.first;
          const CharDistributionMap &freq = entry.second;
    
          /* Calculate total frequency of current character. */
          unsigned total =
             std::accumulate(std::begin(freq),std::end(freq),0,
               [](unsigned res,const std::pair<Char,unsigned> &p){
                       return res += p.second;
                   });
    
          /* Determine the probability distribution of characters that
             follow the current character. This is calculated as a cumulative
             probability. */
          CharDistribution dist {};
          float probability { 0.0 };
          std::for_each(std::begin(freq),std::end(freq),
                 [total,&probability,&dist](const std::pair<Char,unsigned> &p){
                       // using '+=' to get cumulative probability:
                       probability += static_cast<float>(p.second) / total; 
                       dist.push_back(std::make_pair(p.first,probability));
                 });
    
          /* Add probability distribution for current character to the model. */
          _model[current] = dist;
        }
      }
    
    
      /* Predict the next character, assuming that training has been
         performed. */
      template <typename RandomNumberGenerator>
      Char predict(RandomNumberGenerator &gen, const Char current)
      {
        static std::uniform_real_distribution<float> generator_dist { 0, 1 };
    
        /* Assume that the current character is known to the model. Otherwise,
           an std::out_of_range exception will be thrown. */
        const CharDistribution &dist { _model.at(current) };
    
        /* Generate random number between 0 and 1. */
        float random { generator_dist(gen) };
    
        /* Identify the first character whose cumulative probability is
           not less than the random number generated. */
        auto res =
             std::lower_bound(std::begin(dist),std::end(dist),
                              std::make_pair(Char(),random),
                 [](const std::pair<Char,float> &p1, const std::pair<Char,float> &p2) {
                        return (p1.second < p2.second);
                 });
        if (res == std::end(dist)) {
          if (dist.empty())
            throw "Empty probability distribution. This should not happen.";
          /* Rounding can leave the last cumulative value slightly below 1,
             so fall back to the last character. */
          return dist.back().first;
        }
        return res->first;
      }
    
    };
    
    int main()
    {
      /* Initialize random-number generator. */
      std::random_device rd;
      std::mt19937 gen(rd());
    
    
      std::string input { "this   is    some   input text   with   many spaces." };
    
      if (input.empty())
        return 1;
    
      /* We append the first character to the end, to ensure that even the
         last character of the text gets a non-empty probability
         distribution. A more proper way of dealing with characters that
         have empty distributions would be _smoothing_. */
      input += input[0];
    
      Markov<char> markov {};
      markov.train(std::begin(input),std::end(input));
    
      /* We set the initial character. In a real stochastic model, there
         would have to be a separate probability distribution for the
         initial character, and we would choose the initial character
         randomly, too. */
      char current_char { 't' };
    
      for (unsigned i = 0 ; i < 100 ; ++i) {
        std::cout << current_char;
        current_char = markov.predict(gen,current_char);
      }
      std::cout << current_char << std::endl;
    }
    

    Some example output generated by this program:

    t  mext s.t th   winy  iny  somaces      sputhis inpacexthispace te  iny            me   mext mexthis
    
    tes    is  manputhis.th is  wis.th with it    is  is.t  s   t   winy    it mext    is        ispany
    
    this  maces      somany  t    s        it this  winy sputhisomacext manput    somanputes  macexte iso
    
    t   wispanpaces maces  tesomacexte s  s  mes.th     isput t wit   t   somanputes   s  withit  sput ma
    

    As you can see, the distribution of space characters follows, sort of naturally, the distribution found in the input text.
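
    The same idea carries over to the question's setup, where the seed is a string of `order` characters. The following is only a sketch under a few assumptions: the whole text fits in memory as a std::string (instead of rereading the file for every character), the standard containers replace the Map and Vector classes from the question, and build_table and generate are hypothetical names. Because every follower is stored, duplicates included, drawing a uniformly random element already respects the relative frequencies, so no explicit cumulative table is needed here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// Maps each seed of length `order` to every character that follows it in
// the text. Duplicates are kept on purpose, so frequent followers are
// proportionally more likely to be drawn.
using FollowerTable = std::map<std::string, std::vector<char>>;

// Scan the text once and record the follower of every seed.
FollowerTable build_table(const std::string &text, std::size_t order)
{
  FollowerTable table;
  for (std::size_t i = 0; i + order < text.size(); ++i)
    table[text.substr(i, order)].push_back(text[i + order]);
  return table;
}

// Start from the first `order` characters of the text and extend the
// output until it reaches `length` characters. Stops early if the current
// seed has no known follower.
template <typename Rng>
std::string generate(const FollowerTable &table, const std::string &text,
                     std::size_t order, std::size_t length, Rng &gen)
{
  assert(order > 0 && text.size() >= order);
  std::string out = text.substr(0, order);
  while (out.size() < length) {
    auto it = table.find(out.substr(out.size() - order));
    if (it == table.end())
      break;
    const std::vector<char> &followers = it->second;
    std::uniform_int_distribution<std::size_t> index(0, followers.size() - 1);
    out += followers[index(gen)];
  }
  return out;
}
```

    A call such as generate(build_table(text, 2), text, 2, 100, gen) with a std::mt19937 gen then produces text in which a seed made of spaces is followed by a non-space character as often as it was in the input, rather than always by the single most frequent follower.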
