I have a string I want to parse that looks a bit like github


Asked: June 10, 2026


I have a string I want to parse that looks a bit like GitHub Markdown, but I really don’t want the full implementation. The string will be a mixture of “code” blocks and “text” blocks. A code block is three backticks followed by an optional “language”, then some code, and finally three more backticks. Non-code is pretty much everything else. I don’t (but possibly should) care if the user can’t input three backticks in the “text” blocks. Here’s an example:

This is some text followed by a code block
```ruby
def function
   "hello"
end
```
Some more text

Of course, there may be more code and text blocks interspersed. I’ve tried writing a regex for this and it seemed to work, but I couldn’t get the groups (in parens) to give me all of the matches, and scan() loses the ordering. I’ve looked at a couple of Ruby parsers (Treetop, Parslet), but they look a bit big for what I want. I am willing to go that route if that’s my best option.

Thoughts?

A couple of people have asked for the RE I was trying (I tried many variations of the one below):

re = 
  /
    ```\s*\w+\s*          # 3 backticks followed by the language
      (?!```).*?          # The code: everything that's not 3 backticks
    ```                   # 3 more backticks
    |                     # OR
    (?!```).*             # Some text that doesn't include 3 backticks
  /x                      # Ignore white space in RE

It seems, though, that even in simple cases, for example

md = /(a|b)*/.match("abaaabaa")

I’m not able to get all of the a’s and b’s, say from md[3], which doesn’t exist. I hope that makes more sense. That’s why I don’t think an RE will work in my case, but I wouldn’t mind being proven wrong.
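
For reference, the behaviour described here can be reproduced with a short sketch. It assumes nothing beyond the pattern and string from the example; the variable names are only for illustration. A repeated group keeps just its last repetition, so there is only one capture to read, while scanning with the unrepeated alternation returns every piece in order.

```ruby
md = /(a|b)*/.match("abaaabaa")

whole = md[0]  # the entire matched substring
last  = md[1]  # only the LAST repetition of the group is kept
third = md[3]  # nil, because the pattern contains a single group

# Scanning with the alternation alone yields each match, left to right.
all_in_order = "abaaabaa".scan(/a|b/)

p whole, last, third, all_in_order
```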


1 Answer

Editorial Team · Answer 1 · 2026-06-10T13:30:41+00:00

I will be making some assumptions here, based on what I know about Markdown (the GitHub and Stack Overflow flavors) and on your question (which isn’t very precise about the rest of the text).

1. Every code block starts with a single line that contains only three backticks, an optional language name and the newline character.

2. Every code block ends with a single line containing only three backticks.

3. A code block is not empty.

If you can accept these assumptions, the following code should work
(assuming the text is in the str variable):

regex = %r{
  ^```[[:blank:]]*(?<lang>\w+)?[[:blank:]]*\n  # opening fence; captures the optional language in :lang
    (?<content>.+?)                            # code block content, captured in :content
  \n[[:blank:]]*```[[:blank:]]*(?:\n|\z)       # closing fence, followed by a newline or the end of the string
}xm # x: free-spacing mode; m: . also matches newlines

position = 0
matches = []
# Search repeatedly, each time starting where the previous match ended.
while (match = regex.match(str, position))
  position = match.end(0)
  matches << [match[:lang], match[:content]]
end

After this, matches contains an array of arrays. Each inner array represents one code block:
the first element is the (optional) language, which may be nil, and the second element
is the content.
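
To make the shape of the result concrete, here is a minimal sketch that runs the same kind of loop over a short, made-up string. The `sample` string, the `CODE_BLOCK` constant and the `extract_code_blocks` helper name are assumptions for illustration, not part of the answer's code; the closing fence here also accepts the end of the string.

```ruby
# A regex following the three assumptions listed above.
CODE_BLOCK = %r{
  ^```[[:blank:]]*(?<lang>\w+)?[[:blank:]]*\n  # opening fence with optional language
    (?<content>.+?)                            # non-empty content, matched lazily
  \n[[:blank:]]*```[[:blank:]]*(?:\n|\z)       # closing fence, then newline or end of string
}xm

# Hypothetical helper: collects [lang, content] pairs in document order.
def extract_code_blocks(str)
  position = 0
  matches = []
  while (match = CODE_BLOCK.match(str, position))
    position = match.end(0)
    matches << [match[:lang], match[:content]]
  end
  matches
end

sample = "intro\n```ruby\nputs 1\n```\nmiddle\n```\nplain\n```\n"
blocks = extract_code_blocks(sample)
p blocks
```

The block without a language name comes back with nil in the first position, so callers should be prepared to handle that case.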

If you have more assumptions about the text, I could alter the regular expression.

This is the test string I used:

str = %{
this is some random text.
```ruby
  def print
    puts "this is a code block with lang-argument"
  end
```

some other text follows here.
i want some ``` backticks here.

```
  def print
    puts "this is a code block without lang-argument"
  end
```
}
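
The question also asks for the text between code blocks, in order. One possible extension, sketched under the same three assumptions, is to record the text that sits between the end of one match and the start of the next. The `FENCE_RE` constant, the `split_blocks` helper, the `:text`/`:code` tags and the `sample` string are all hypothetical names chosen for this example.

```ruby
FENCE_RE = %r{
  ^```[[:blank:]]*(?<lang>\w+)?[[:blank:]]*\n  # opening fence with optional language
    (?<content>.+?)                            # non-empty content, matched lazily
  \n[[:blank:]]*```[[:blank:]]*(?:\n|\z)       # closing fence, then newline or end of string
}xm

# Hypothetical helper: returns [:text, string] and [:code, lang, string]
# entries in the order they appear in the input.
def split_blocks(str)
  segments = []
  position = 0
  while (match = FENCE_RE.match(str, position))
    text = str[position...match.begin(0)]      # text before this code block
    segments << [:text, text] unless text.empty?
    segments << [:code, match[:lang], match[:content]]
    position = match.end(0)
  end
  rest = str[position..-1]                     # trailing text after the last block
  segments << [:text, rest] unless rest.empty?
  segments
end

sample = "before\n```ruby\nputs 1\n```\nafter\n"
segments = split_blocks(sample)
p segments
```

Because three backticks only count as a fence at the start of a line, backticks in the middle of a text line stay part of the surrounding text segment.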
