If a document contains consecutive uppercase words, such as “I AM ALL UPPERCASE”, what I get back is four separate uppercase words. What I need is to return the whole uppercase phrase, “I AM ALL UPPERCASE”. How do I do this?
String ucParensRegEx = "\\([A-Z]+\\)";
if (we.getParagraphText() != null) {
    String[] dataArray = we.getParagraphText();
    for (int i = 0; i < dataArray.length; i++) {
        String data = dataArray[i].toString();
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(data);
        while (m.find()) {
            if (!sequences.contains(data.substring(m.start(), m.end())) && !data.equals("ARABIC") && !data.equals("ALATEC") && !data.equals("HYPERLINK")) {
                sequences.add(data.substring(m.start(), m.end()));
                System.out.println(data.substring(m.start(), m.end()));
                Acronym acc = new Acronym(data.substring(m.start(), m.end()), data, false);
                accronymList.add(acc);
            }
        }
    }
}
Try
"\\b([A-Z][A-Z ]+[A-Z])\\b"
instead of the expression you have. This should match any sequence of A-Z or spaces, as long as it starts and ends with an upper case letter and has a word boundary on both sides. That should hopefully cover the full sequence of upper case words, unless you’ve got some requirements about allowing numbers in there.
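As a rough sketch of how that expression behaves, here is a small standalone helper. The class name UppercaseRuns and the method findUppercaseRuns are made up for illustration and are not part of your code; you would plug the pattern into your own loop instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class UppercaseRuns {

    // The pattern from the answer: starts and ends with an upper case
    // letter, with upper case letters or spaces in between.
    private static final Pattern UPPERCASE_RUN =
            Pattern.compile("\\b([A-Z][A-Z ]+[A-Z])\\b");

    // Returns every uppercase run found in the text, in order of appearance.
    static List<String> findUppercaseRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = UPPERCASE_RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group(1));
        }
        return runs;
    }
}
```

Calling UppercaseRuns.findUppercaseRuns("The document says I AM ALL UPPERCASE here.") should give a single entry, "I AM ALL UPPERCASE". Note that the pattern needs at least three characters, so a two-letter word on its own (for example "AM" surrounded by lower case text) is not matched.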