I am attempting to parse a Wikipedia SQL dump with the Python regular expressions

Question


Asked: June 9, 2026 (2026-06-09T07:09:55+00:00)


I am attempting to parse a Wikipedia SQL dump with Python's regular expressions library (re). The ultimate goal is to import the dump into PostgreSQL, but I know that the apostrophes inside strings need to be doubled first.

Every apostrophe inside a string in this dump is preceded by a backslash, though, and I’d rather not remove the backslashes.

(42,'Thirty_Years\'_War',33,5,0,0)

Using the command

re.match(".*?([\w]+?'[\w\s]+?).*?", line)

I cannot identify the apostrophe in the middle of 'Thirty_Years\'_War' when line is read from a text file.

For comparison, these lines work fine when parsed (sans the last line).

The person's car

The person's car's gasoline

Hodges' Harbrace Handbook

'Hodges' Harbrace Handbook'

portspeople',1475,29,0,0),(42,'Thirty_Years\'_War',33,5,0,0)

Correct and expected output (sans the last line):

The person''s car

The person''s car''s gasoline

Hodges'' Harbrace Handbook

('Hodges'' Harbrace Handbook')

portspeople',1475,29,0,0),(42,'Thirty_Years\'_War',33,5,0,0)

Using the command

re.match(".*?([\w\\]+?'[\w\s]+?).*?", line)

breaks it.

The person''s car

The person''''s car''''s gasoline

Hodges'' Harbrace Handbook

(''''''''Hodges'''''''' Harbrace Handbook'''''''')

portspeople'''''''''''''''',1475,29,0,0),(42,''''''''''''''''Thirty_Years\''''''''''''''''_War'''''''''''''''',33,5,0,0)

Is it stuck in some sort of loop? What is the correct regex to use?

I am not thinking about SQL injection attacks because this script is only going to be used for parsing dumps of Wikipedia articles (that don’t contain examples of SQL injection attacks).


1 Answer

Editorial Team · Answer 1 · 2026-06-09T07:09:58+00:00


If the dump consists of things like the string you provided, you could try something like this:

re.findall(r"[^,()]+", line)
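If the goal is only to double the apostrophes that the dump has already escaped with a backslash, while keeping the backslash itself, a substitution may be simpler than splitting the line into fields. The following is a minimal sketch, not a tested import pipeline. The function name double_escaped_apostrophes is made up for illustration, and it assumes the dump only uses single-character backslash escapes such as \' and \\. It does not touch unescaped apostrophes like the one in "The person's car".

```python
import re

# Any backslash escape: a backslash followed by exactly one character.
# Consuming escapes pairwise keeps an escaped backslash (\\) from being
# mistaken for the start of an escaped apostrophe.
ESCAPE = re.compile(r"\\(.)")


def double_escaped_apostrophes(line):
    """Turn each \\' in a dump line into \\'' and leave other escapes alone."""

    def fix(match):
        if match.group(1) == "'":
            return "\\''"  # keep the backslash, double the apostrophe
        return match.group(0)  # any other escape is returned unchanged

    return ESCAPE.sub(fix, line)


line = r"(42,'Thirty_Years\'_War',33,5,0,0)"
print(double_escaped_apostrophes(line))
# prints (42,'Thirty_Years\''_War',33,5,0,0)
```

Using a function as the replacement avoids having to escape the backslash a second time in a replacement string. Whether PostgreSQL then stores the remaining backslash as a literal character depends on its standard_conforming_strings setting, so it is worth checking a few imported rows.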
