I would like to remove all apostrophes from an input String of English prose while retaining the original meaning and capitalisation, i.e.
- isn’t –> is not
- I’m –> I am
- they’re –> they are
- shouldn’t –> should not
- can’t –> can not
- John’s –> Johns (good enough)
What’s the best/simplest way to achieve this in Java?
There are some hard and fast rules for replacing contractions. Just have a method that performs those functions on your strings.
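As a rough sketch of such a method: the class name `ContractionExpander`, the method name `expand`, and the particular list of rules are illustrative assumptions, and the list is a sample rather than a complete set. It accepts both straight (') and typographic (’) apostrophes and deliberately leaves 's alone.

```java
class ContractionExpander {

    // Character class matching a straight (') or typographic (’) apostrophe.
    private static final String APOS = "['\u2019]";

    // Expands a sample of unambiguous contractions; 's is left untouched.
    static String expand(String text) {
        return text
                // Irregular negations first; the captured first letter keeps its case.
                .replaceAll("\\b([Cc])an" + APOS + "t\\b", "$1an not")
                .replaceAll("\\bwon" + APOS + "t\\b", "will not")
                .replaceAll("\\bWon" + APOS + "t\\b", "Will not")
                // Regular suffixes.
                .replaceAll("n" + APOS + "t\\b", " not")
                .replaceAll(APOS + "re\\b", " are")
                .replaceAll(APOS + "m\\b", " am")
                .replaceAll(APOS + "ll\\b", " will")
                .replaceAll(APOS + "ve\\b", " have");
    }

    public static void main(String[] args) {
        // Prints: I am sure they are right, but John's dog is not.
        System.out.println(expand("I'm sure they're right, but John's dog isn't."));
    }
}
```

The irregular forms run first because the generic n't rule would otherwise turn can't into "ca not".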
This will even preserve your possessives.
Of course, some contractions depend on context, such as he'd. This could be “he would” or “he had”, and as such is beyond simple replacement algorithms and more in the realm of machine learning.
Perhaps for the 's you could check to see if the word containing it begins with a capital letter (indicating a name) and conditionally replace it with either “s” or “ is”. However, this wouldn’t catch normal contractions at the beginning of sentences, so…
If you want a simple and perfect approach, I’m not sure you’ll get one. To do these more complicated things, you’ll need either a large dictionary file which you constantly reference or machine learning techniques.
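The capital-letter check for 's described above might look roughly like this. The class and method names are hypothetical, and the heuristic has the weakness already noted: a sentence-initial "It's" is treated as a name and becomes "Its". It also reads a lower-case possessive such as "the dog's bone" as "the dog is bone".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApostropheSHeuristic {

    // A run of letters, a straight or typographic apostrophe, then "s".
    private static final Pattern WORD_APOS_S =
            Pattern.compile("\\b(\\p{L}+)['\u2019]s\\b");

    static String replaceApostropheS(String text) {
        Matcher m = WORD_APOS_S.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group(1);
            boolean capitalised = Character.isUpperCase(word.charAt(0));
            // Capitalised: assume a name, so treat 's as possessive and drop the apostrophe.
            // Otherwise: assume a contraction of "is".
            String replacement = capitalised ? word + "s" : word + " is";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, `replaceApostropheS("John's hat")` gives "Johns hat", while `replaceApostropheS("she's here")` gives "she is here".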