I’m trying to parse a PDF to XML in C# and I want to

Asked: May 23, 2026 (2026-05-23T23:41:20+00:00)

I’m trying to parse a PDF to XML in C#, and I want to extract headings from my file that are numbered with Roman numerals, such as "I. INTRODUCTION" and "II. PAGE LAYOUT". I would like to write a regex that matches strings like these. I tried a couple of things, but they didn’t work. Can anyone help?

1 Answer

Editorial Team · Answer 1 · 2026-05-23T23:41:22+00:00

This should do what you need:

[IVXLCDM]+\. [A-Z ]+

As stated here:

\. will match a literal period. The backslash is needed because an unescaped period is a special character (meaning match any character) in regular expression syntax.
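To see why the backslash matters, here is a small sketch comparing the escaped and unescaped forms. The class and method names (`DotEscapeDemo`, `MatchesUnescaped`, `MatchesEscaped`) are made up for illustration; only `Regex.IsMatch` comes from .NET.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class DotEscapeDemo
{
    // Unescaped '.' matches any single character (except a newline).
    public static bool MatchesUnescaped(string s) =>
        Regex.IsMatch(s, @"[IVXLCDM]+. [A-Z ]+");

    // Escaped '\.' matches only a literal period.
    public static bool MatchesEscaped(string s) =>
        Regex.IsMatch(s, @"[IVXLCDM]+\. [A-Z ]+");
}
```

With the input "IV INTRODUCTION" (no period at all), `MatchesUnescaped` returns true because the engine lets `I` be the numeral and `V` be the "any character", while `MatchesEscaped` returns false. Both return true for "IV. INTRODUCTION".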

On the other hand, if you want to make sure that the string contains only Roman numerals and a heading name, you might want to use this:

^[IVXLCDM]+\. [A-Z ]+$

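The ^ and $ are called anchors. The ^ matches only at the very beginning of the string, and the $ matches only at the very end (or just before a final newline), so the pattern has to cover the whole string rather than a fragment of it. Note that [IVXLCDM]+ accepts any run of those letters; it does not check that they form a valid Roman numeral.
The complete list of Roman numerals can be obtained from Wikipedia.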
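If the PDF text arrives as one multi-line string, something like the following rough sketch might work. It assumes each heading sits on its own line. `HeadingExtractor` and `ExtractHeadings` are hypothetical names, and the named groups and the `\r?` are additions to the pattern above, not part of it.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; assumes the PDF text has already been extracted to a string.
public static class HeadingExtractor
{
    // RegexOptions.Multiline makes ^ and $ match at every line boundary
    // instead of only at the ends of the whole string.
    // The \r? tolerates Windows line endings, since $ only matches before '\n'.
    private static readonly Regex HeadingPattern = new Regex(
        @"^(?<numeral>[IVXLCDM]+)\. (?<title>[A-Z ]+)\r?$",
        RegexOptions.Multiline);

    public static List<string> ExtractHeadings(string text)
    {
        var headings = new List<string>();
        foreach (Match match in HeadingPattern.Matches(text))
        {
            // Rebuild the heading from the two named groups, dropping trailing spaces.
            string numeral = match.Groups["numeral"].Value;
            string title = match.Groups["title"].Value.TrimEnd();
            headings.Add(numeral + ". " + title);
        }
        return headings;
    }
}
```

Calling `HeadingExtractor.ExtractHeadings("I. INTRODUCTION\r\nSome body text.\r\nII. PAGE LAYOUT\nMore text.")` returns the two headings "I. INTRODUCTION" and "II. PAGE LAYOUT". Body lines are skipped because they either do not start with a numeral letter followed by a period, or contain lowercase letters.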
