I’m using the DocSplit gem for Ruby 1.9.3 to create Unicode UTF-8 versions of

Asked: June 11, 2026

I’m using the DocSplit gem for Ruby 1.9.3 to create Unicode UTF-8 versions of Word documents. To my surprise, while running a test today on a particular piece of one of these documents, I started running into character encoding inconsistencies.

I have tried a number of different methods to resolve the issue, which I list below, but the best success I’ve had so far is to remove all non-ASCII characters. This is far from ideal, as I don’t think the characters are really going to be all that problematic in the DB.

gsub(/[^[:ascii:]]/, "")

This is a sample of what my output looks like vs. what I’m expecting:

My CODES'S APOSTROPHE

My CODES’S APOSTROPHE

The second apostrophe should look squiggly. If you paste it into irb, you get the following: \U+FFE2

I tried matching this character specifically with a regex, and it appears to work in Rubular. As soon as I put it in my model, however, I got a syntax error.

syntax error, unexpected $end, expecting ')'
raw_title = raw_title.gsub(/’/, "")

I also tried forcing the encoding to UTF-8, but everything is already in UTF-8 and this does not appear to have an effect. I tried forcing the output to US-ASCII, but I get a byte sequence error.

I also tried a few of the encoding options found in the Ruby library. These basically did the same thing as the regex.

This all comes down to my trying to match output for testing purposes. Should I even be concerned about these special characters? Is there a better way to match these characters without blindly removing them?

1 Answer

Editorial Team · Answer 1 · 2026-06-11T20:53:49+00:00

Try adding:

# encoding: utf-8

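at the top of the failing RSpec file. Ruby 1.9 reads source files as US-ASCII unless a magic comment says otherwise, so a literal ’ in the file cannot be parsed without it. This should ensure things like: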

raw_title = raw_title.gsub(/’/, "")

in your spec work.
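
If you would rather not strip the character at all, here is a minimal sketch of one alternative: map the typographic quotes to their ASCII equivalents and leave every other non-ASCII character alone. The helper name normalize_quotes and the choice of which characters to map are my own assumptions, not anything DocSplit provides. Writing the characters as \u escapes also keeps the source file itself pure ASCII, so the regex parses even if the magic comment is missing.

```ruby
# encoding: utf-8
# The magic comment above only takes effect as the first line of a file
# (or the second, after a shebang).

# Hypothetical helper (not part of DocSplit): replaces typographic quotes
# with ASCII equivalents and leaves all other characters untouched.
def normalize_quotes(text)
  replacements = {
    "\u2018" => "'",  # left single quotation mark
    "\u2019" => "'",  # right single quotation mark
    "\u201C" => '"',  # left double quotation mark
    "\u201D" => '"'   # right double quotation mark
  }
  # gsub with a hash looks up each matched character in the hash
  text.gsub(/[\u2018\u2019\u201C\u201D]/, replacements)
end

raw_title = "My CODES\u2019S APOSTROPHE"
puts normalize_quotes(raw_title)
```

Whether to normalize at all is a judgment call. If the curly apostrophe is legitimate content, it may be simpler to keep it in the DB and write the expected string in the spec with the same \u2019 escape.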
