It’s my first foray into UTF-8 land. I’m an IIS Admin, so I’ve never

Question


Asked: May 18, 2026 (2026-05-18T21:00:36+00:00)


It’s my first foray into UTF-8 land. I’m an IIS Admin, so I’ve never gotten to touch this professionally. I’m trying to help a missionary who’s translated the Bible into an African language and now needs to do some global matching against large UTF-8 files. We’re specifically matching accented characters.

We’re using older XP computers here, so I cobbled together a quick script in VBS knowing the language would be installed on their boxes already. After playing around for a few minutes, it appears VBS regexes handle UTF-8 by breaking each character up into 2 characters. To match a single â, my pattern is \u00c3\u00a2. Shouldn’t this be \u00e2?

Since I’m out of my depth, I thought I’d seek a little guidance. It almost looks like UTF-8 simply requires this kind of double matching (and UTF-8 is required). Can someone tell me into which box canyon I’m coding? 🙂

Downloading and installing Perl or Java is probably outside this project’s bandwidth and technical know-how. The tool should be built in. MS Office is installed, so VBA is an option if there’s some library that offers specific support. JavaScript is installed as well, though I don’t know what versions.

Thanks


1 Answer

Editorial Team · Answer 1 · 2026-05-18T21:00:37+00:00


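You can use any ASCII (byte-oriented) regex library on UTF-8 and expect literal text to match correctly, as long as the pattern is written as the same byte sequence as the data. The constructs to watch are the ones that count characters. A dot matches one byte, so an accented character needs two of them (.. or … in a pattern is where this bites; .* is unaffected). Likewise, a character class or quantifier applied to an accented character acts on its individual bytes rather than on the character as a whole.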
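To make the dot caveat concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the asker's VBS setup). It compares a character-aware match against a byte-oriented match on the same word. The sample word "pâte" and the variable names are assumptions made up for this example.

```python
import re

# Illustrative sample text (an assumption, not taken from the asker's files).
text = "p\u00e2te"                 # "pâte": 4 characters
raw = text.encode("utf-8")         # 5 bytes, because â takes two of them

# Character-aware engine: one dot stands for the whole accented character.
char_match = re.fullmatch(r"p.te", text)

# Byte-oriented engine: one dot covers only one of the two bytes of â,
# so the same pattern no longer matches, while two dots do.
byte_match_one_dot = re.fullmatch(rb"p.te", raw)
byte_match_two_dots = re.fullmatch(rb"p..te", raw)

print(len(text), len(raw))                 # 4 5
print(char_match is not None)              # True
print(byte_match_one_dot is not None)      # False
print(byte_match_two_dots is not None)     # True
```

A pattern such as p.*te behaves the same in both modes, since .* does not care how many bytes it consumes.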

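The trick is to know what you are looking for. UTF-8 really does that kind of breakup: â (U+00E2) is stored as the two bytes C3 A2, and the script is most likely reading the file one byte per character, which is why the pattern has to be \u00c3\u00a2 rather than \u00e2. So write your regex in whatever you are familiar with, convert it to its UTF-8 byte form, and it will work unless the pattern counts characters with dots (such as “..”).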
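The byte breakup described above can be reproduced in a few lines. This is a sketch in Python for illustration only, and it assumes the bytes are being interpreted with the Windows-1252 code page, which is a guess about the asker's machines rather than something stated in the question.

```python
import re

char = "\u00e2"                          # â as a single code point
utf8_bytes = char.encode("utf-8")        # stored as two bytes: C3 A2

# Assumption: the reader treats each byte as one Windows-1252 character.
misread = utf8_bytes.decode("cp1252")    # two characters: U+00C3, U+00A2

# Build the escaped pattern such a reader would need for a literal â.
pattern = "".join(f"\\u{ord(c):04x}" for c in misread)
print(pattern)                           # \u00c3\u00a2

# The two-character pattern finds â in text that was misread the same way.
sample = "p\u00e2te".encode("utf-8").decode("cp1252")
found = re.search(pattern, sample)
print(found is not None)                 # True
```

This matches the asker's observation: the pattern \u00c3\u00a2 is simply â seen one byte at a time. Under a different single-byte code page, the second escape could come out differently.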
