The Archive Base

Editorial Team
Asked: June 10, 2026 (2026-06-10T13:30:39+00:00)

I have a string I want to parse that looks a bit like github

I have a string I want to parse that looks a bit like GitHub Markdown, but I really don’t want the full implementation. The string will be a mixture of “code” blocks and “text” blocks. A code block is three backticks followed by an optional “language”, then some code, and finally three more backticks. Non-code will be pretty much everything else. I don’t (but possibly should) care if the user can’t input three backticks in the “text” blocks. Here’s an example …

This is some text followed by a code block
```ruby
def function
   "hello"
end
```
Some more text

Of course, there may be more code and text blocks interspersed. I’ve tried writing a regex for this and it seemed to work, but I couldn’t get the groups (in parens) to give me all of the matches, and scan() loses the ordering. I’ve looked at a couple of Ruby parsers (Treetop, Parslet), but they look a bit big for what I want. I am willing to go that route if that’s my best option, though.

Thoughts?

A couple of people have asked for the RE I was trying (I tried many variations of the one below) …

re = 
  /
    ```\s*\w+\s*          # 3 backticks followed by the language
      (?!```).*?          # The code everything that's not 3 backticks
    ```                   # 3 more backticks
    |                     # OR
    (?!```).*             # Some text that doesn't include 3 backticks
  /x                      # Ignore white space in RE

It seems, though, that even in simple cases, for example

md = /(a|b)*/.match("abaaabaa")

I’m not able to get all of the a’s and b’s, say from md[3], which doesn’t exist. I hope that makes more sense. That’s why I don’t think a RE will work in my case, but I wouldn’t mind being proven wrong.
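
The behaviour described above can be reproduced with a short sketch that uses only Ruby's core Regexp and String methods. A group placed under a quantifier keeps just its last iteration, so md[1] holds the final character and md[3] is nil. String#scan, by contrast, returns one entry per match in the order the matches occur in the string. The helper name inspect_repeated_group is made up for this illustration.

```ruby
# Hypothetical helper, named here only for illustration.
def inspect_repeated_group(text)
  md = /(a|b)*/.match(text)
  {
    whole: md[0],            # everything the pattern matched
    last:  md[1],            # the group's final iteration only
    third: md[3],            # nil, because the pattern has just one group
    all:   text.scan(/a|b/)  # every single match, left to right
  }
end

p inspect_repeated_group("abaaabaa")
```

So the individual a's and b's are not recoverable from the MatchData of a single match; they have to be collected match by match.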

1 Answer

  1. Editorial Team
    Added an answer on June 10, 2026 at 1:30 pm

    I will be making some assumptions here, based on my knowledge of Markdown (GitHub and Stack Overflow flavors) and on your question (which isn’t very precise about the rest of the text).

    1. Every code block starts with a single line that contains only
       three backticks, an optional language name, and the newline character.

    2. Every code block ends with a single line containing only
       three backticks.

    3. A code block is not empty.

    If you can accept these assumptions, the following code should work
    (assuming the text is in the str variable):

    regex = %r{
      ^```[[:blank:]]*(?<lang>\w+)?[[:blank:]]*\n  # opening fence line; captures the optional language in :lang
      (?<content>.+?)                              # code block content, captured lazily in :content
      \n[[:blank:]]*```[[:blank:]]*(?:\n|\z)       # closing fence line, which may also be the last line of str
    }xm # x: free-spacing mode, m: . also matches newlines

    position = 0
    matches = []
    # Regexp#match accepts a start offset, so each search resumes where the previous match ended.
    while (match = regex.match(str, position))
      position = match.end(0)
      matches << [match[:lang], match[:content]]
    end
    

    After this, matches contains an array of arrays. Each inner array represents one match:
    the first element is the (optional) language, which may be nil, and the second element
    is the content.
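
The question also asks for the text between the code blocks, in its original order, and the same kind of loop can collect both. The following is a sketch, not a drop-in solution: split_blocks, BLOCK, FENCE, SAMPLE and the [:text, text] / [:code, lang, content] layout are names made up for this illustration. The pattern follows the three assumptions listed above, with the small liberty that the closing fence may be the last line of the string.

```ruby
FENCE = "`" * 3 # three backticks, built this way so the example holds no literal fence

# Assumed pattern: a fence line with an optional language, lazy content,
# then a fence line on its own (or the end of the string).
BLOCK = /
  ^`{3}[[:blank:]]*(?<lang>\w+)?[[:blank:]]*\n   # opening fence, optional language
  (?<content>.+?)                                # block content, lazy
  \n[[:blank:]]*`{3}[[:blank:]]*(?:\n|\z)        # closing fence, or end of string
/xm

# Returns text and code segments in the order they appear in str.
def split_blocks(str)
  parts = []
  position = 0
  while (match = BLOCK.match(str, position))
    # Everything between the previous match and this one is plain text.
    text = str[position...match.begin(0)]
    parts << [:text, text] unless text.strip.empty?
    parts << [:code, match[:lang], match[:content]]
    position = match.end(0)
  end
  # Whatever follows the last code block is plain text as well.
  rest = str[position..-1].to_s
  parts << [:text, rest] unless rest.strip.empty?
  parts
end

# The example from the question, rebuilt line by line.
SAMPLE = [
  "This is some text followed by a code block",
  "#{FENCE}ruby",
  "def function",
  '   "hello"',
  "end",
  FENCE,
  "Some more text"
].join("\n")

split_blocks(SAMPLE).each { |part| p part }
```

Whitespace-only gaps between blocks are dropped. Text segments are otherwise kept verbatim, including their trailing newline.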

    If you have more assumptions about the text, I could alter the regular expression.

    This is the test string I used:

    str = %{
    this is some random text.
    ```ruby
      def print
        puts "this is a code block with lang-argument"
      end
    ```
    
    some other text follows here.
    i want some ``` backticks here.
    
    ```
      def print
        puts "this is a code block without lang-argument"
      end
    ```
    }
    