I have a text like this:
The C language is%y% widely used today in application, operating
system, and embedded system development, and its influence is seen in
most modern programming languages. UNIX has also been influential,
establishing %y% concepts and principles that are now precepts of
computing.%p%
The text contains some unnecessary indicators: %y% and %p%.
I split the text into words with this regex:
Pattern p = Pattern.compile("[a-zA-Z]+");
This splits out all the words, but the regex also matches the letters “y” and “p” inside the indicators. How can I ignore these indicators?
You could use some pre-processing to remove all of the unnecessary indicators before you do your main processing. Something like this should work:
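A minimal sketch of that idea, assuming every indicator has the form `%`, one or more letters, `%` (as %y% and %p% do). It first replaces each indicator with a space, then runs the word regex from the question on the cleaned text. The class name `WordSplitter` and the method `extractWords` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordSplitter {
    // Assumption: an indicator is '%', one or more letters, then '%' (e.g. %y%, %p%).
    private static final Pattern INDICATOR = Pattern.compile("%[a-zA-Z]+%");
    // The word pattern from the question.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    static List<String> extractWords(String text) {
        // Pre-processing: replace each indicator with a space rather than an
        // empty string, so letters on either side of it are never glued together.
        String cleaned = INDICATOR.matcher(text).replaceAll(" ");

        // Main processing: collect every run of letters from the cleaned text.
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(cleaned);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "The C language is%y% widely used today in application, operating\n"
                + "system, and embedded system development, and its influence is seen in\n"
                + "most modern programming languages. UNIX has also been influential,\n"
                + "establishing %y% concepts and principles that are now precepts of\n"
                + "computing.%p%";
        // Prints one word per line; the stray "y" and "p" no longer appear.
        for (String word : extractWords(text)) {
            System.out.println(word);
        }
    }
}
```

If the indicators can also contain digits or other characters, the `INDICATOR` pattern would need to be widened accordingly (for example `%[^%\s]+%`).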