How do I turn this text:
• Ban Ki-moon calls for immediate ceasefire• Residents targeted in
al-Qusayr, witnesses tell HRWIsrael ignoring expanding violence by
settlers, EU reports9.18am: Footage from activists suggests that
opposition forces continue to resist government troops.This footage…
into this text:
Ban Ki-moon calls for immediate ceasefire. Residents targeted in
al-Qusayr, witnesses tell HRW. Israel ignoring expanding violence by
settlers, EU reports. 9.18am: Footage from activists suggests that
opposition forces continue to resist government troops. This
footage…
This needs to be fixed with JavaScript (multiple .replace calls are fine):
- “• ” has to be replaced by “. ”; however, the first “• ” should just be removed
- If there is no space after a dot “.”, a space must be added (.This footage)
- If there is no space before a time (9.18am), a space must be added
- If there is no space before a capital letter (HRWIsrael) that is
followed by non-capital letters, then a dot and space “. ” must be added in front
of that capital letter.
Breaking this down into several replace statements (as listed below) is the way I would go about it (working fiddle).
The fixBullets function turns all bullets into HTML entities, and the fixBulletEntities function fixes those. I did this to normalize bullets, as I’m not sure whether they are bullet characters or HTML entities in your source string.
The fixTimes function changes “9.18am:” into “ 9:18am. ” (otherwise, the fixPeriods function makes it look like “ 9. 18am”, which I am sure you do not want).
One major caveat regarding the fixCapitalsEndSentence function: it will also convert strings like “WOrDS” into “WO. rDS”, which may not be what you want.
At the least, this should get you started…
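A minimal sketch of the chained replace approach, reusing the function names mentioned in the answer. The function bodies and the cleanText wrapper are assumptions, not the code from the fiddle. In particular, this fixTimes keeps the time as written and inserts “. ” in front of it (to match the target text in the question), and this fixPeriods only touches a period that is directly followed by a letter, so it leaves 9.18am alone.

```javascript
// Hypothetical reconstruction: names come from the answer, bodies are assumptions.

// Normalize literal bullet characters to the HTML entity.
function fixBullets(text) {
  return text.replace(/\u2022/g, "&bull;");
}

// Remove the leading bullet, then turn every other bullet into ". ".
function fixBulletEntities(text) {
  return text
    .replace(/^\s*&bull;\s*/, "")
    .replace(/\s*&bull;\s*/g, ". ");
}

// Insert ". " before a time such as 9.18am that is glued to the previous word.
function fixTimes(text) {
  return text.replace(/([A-Za-z])(\d{1,2}\.\d{2}(?:am|pm))/g, "$1. $2");
}

// Add a space after a period that is directly followed by a letter.
function fixPeriods(text) {
  return text.replace(/\.(?=[A-Za-z])/g, ". ");
}

// Split "HRWIsrael" into "HRW. Israel": a capital followed by capital + lowercase.
function fixCapitalsEndSentence(text) {
  return text.replace(/([A-Z])([A-Z][a-z])/g, "$1. $2");
}

// Hypothetical wrapper that applies the steps in order.
function cleanText(text) {
  return [fixBullets, fixBulletEntities, fixTimes, fixPeriods, fixCapitalsEndSentence]
    .reduce((acc, fn) => fn(acc), text);
}

const input = "• Residents targeted, witnesses tell HRWIsrael ignoring violence, EU reports9.18am: Footage shows troops.This footage";
console.log(cleanText(input));
// Residents targeted, witnesses tell HRW. Israel ignoring violence, EU reports. 9.18am: Footage shows troops. This footage
```

The capital-letter step is still a heuristic: with this pattern, “WOrDS” becomes “W. OrDS”, so mixed-case words and acronyms glued to names would need extra handling.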