I’m working on a project that involves taking some source code and boiling it

Question

0

Asked: June 5, 20262026-06-05T14:24:05+00:00 2026-06-05T14:24:05+00:00

I’m working on a project that involves taking some source code and boiling it

0

I’m working on a project that involves taking some source code and boiling it down to just the words that are displayed on the page. I can get it to remove all the html tags, and all of the stuff between script tags, but I can’t figure out how to remove all the characters that start with a backslash. A page will have \t, \n, and \x** where * seem to be any lowercase letter or number.

How would I write a code that would replace all these parts of the strings with spaces? I’m working in python.

For example, this is a string from a webpage:

\n\t\n\t\n\t\tApple - Wikipedia, the free encyclopedia\n\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\n\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\n\t\t\tLanguage:English\xd8\xa7\xd9\x84\xd8\xb9\xd8\xb1\xd8\xa8\xd9\x8a\xd8\xa9Aragon\xc3\xa9sAsturianuAz\xc9\x99rbaycanca\xe0\xa6\xac\xe0\xa6\xbe\xe0\xa6\x82\xe0\xa6\xb2\xe0\xa6\xbeB\xc3\xa2n-l\xc3\xa2m-g\xc3\xbaBasa Banyumasan\xd0\x91\xd0\xb5\xd0\xbb\xd0\xb0\xd1\x80\xd1\x83\xd1\x81\xd0\xba\xd0

would become:

Apple - Wikipedia, the free encyclopedia Language:English sAsturianuAz rbaycanca Basa Banyumasan

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-05T14:24:07+00:00

s = repr('''\n\t\n\t\n\t\tApple - Wikipedia, the free encyclopedia\n\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\n\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\n\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\n\t\t\t\n\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\n\t\t\tLanguage:English\xd8\xa7\xd9\x84\xd8\xb9\xd8\xb1\xd8\xa8\xd9\x8a\xd8\xa9Aragon\xc3\xa9sAsturianuAz\xc9\x99rbaycanca\xe0\xa6\xac\xe0\xa6\xbe\xe0\xa6\x82\xe0\xa6\xb2\xe0\xa6\xbeB\xc3\xa2n-l\xc3\xa2m-g\xc3\xbaBasa Banyumasan\xd0\x91\xd0\xb5\xd0\xbb\xd0\xb0\xd1\x80\xd1\x83\xd1\x81\xd0\xba\xd0''')
s =  re.sub(r'\\[tn]', '', s)
s =  re.sub(r'\\x..', '', s)
print s

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m working on a project that involves taking some source code and boiling it

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply