I have a string that I need to convert into a string[] containing each word in the string. However, I do not want any white space or punctuation, EXCEPT hyphens and apostrophes that belong to a word.
Example Input:
Hello! This is a test and it's a short-er 1. - [ ] { } ___)
Example of the Array made from Input:
[ "Hello", "this", "is", "a", "test", "and", "it's", "a", "short-er", "1" ]
This is the code I have tried so far
(Note: the second version causes an error later in the program, when string.First() is called):
private string[] ConvertWordsFromFile(String NewFileText)
{
    char[] delimiterChars = { ' ', ',', '.', ':', '/', '|', '<', '>', '/', '@', '#', '$', '%', '^', '&', '*', '"', '(', ')', ';' };
    string[] words = NewFileText.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
    return words;
}
or
private string[] ConvertWordsFromFile(String NewFileText)
{
    return Regex.Split(NewFileText, @"\W+");
}
The second version crashes in the following code:
private string GroupWordsByFirstLetter(List<String> words)
{
    var groups =
        from w in words
        group w by w.First();
    return FormatGroupsByAlphabet(groups);
}
Specifically, it crashes when w.First() is called.
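A likely cause, assuming the input starts or ends with a non-word character (the example input ends with ")"): Regex.Split includes an empty string at that end of the returned array, and First() throws an InvalidOperationException when called on an empty string. The sketch below filters out the empty entries. The names SplitDemo and SplitNonEmpty are illustrative and not part of the original program.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Same split as the second attempt, with empty entries removed so that
    // a later call to w.First() never sees an empty string.
    public static string[] SplitNonEmpty(string text)
    {
        return Regex.Split(text, @"\W+")
                    .Where(w => w.Length > 0)
                    .ToArray();
    }
}
```

This only addresses the crash. The pattern \W+ still splits on apostrophes and hyphens, so "it's" becomes "it" and "s", and it treats the underscore as a word character, so "___" survives as a word.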
To remove unwanted characters from a String
LINQ Option 1
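The code for this option is not shown, so the following is only a minimal sketch of one way it could look, assuming a word is a run of letters or digits that may contain single hyphens or apostrophes between such runs. Instead of splitting on delimiters, it matches the words directly and projects the matches with LINQ. The class name WordExtractor, the method ExtractWords, and the pattern itself are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordExtractor
{
    // One or more letters/digits, optionally followed by groups of
    // (a hyphen or an apostrophe) + more letters/digits.
    // A standalone "-" or "'" therefore never matches.
    private static readonly Regex WordPattern =
        new Regex(@"[\p{L}\p{N}]+(?:['-][\p{L}\p{N}]+)*");

    public static string[] ExtractWords(string text)
    {
        return WordPattern.Matches(text)
                          .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                          .Select(m => m.Value)
                          .ToArray();
    }
}
```

Because every match contains at least one character, the result never includes empty strings, so calling First() on each word is safe. The sketch preserves the original casing; append .ToLowerInvariant() to m.Value if lowercase words are wanted.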
LINQ Option 2
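The code for this option is not shown either, so this is a hedged sketch of a regex-free alternative: split on white space, keep only letters, digits, hyphens, and apostrophes in each token, then trim hyphens and apostrophes from the ends and drop anything left empty. The names WordTrimmer, ExtractWords, and TrimToWord are hypothetical.

```csharp
using System;
using System.Linq;

public static class WordTrimmer
{
    public static string[] ExtractWords(string text)
    {
        // A null separator array makes Split break on white space.
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                   .Select(token => TrimToWord(token))
                   .Where(w => w.Length > 0)   // drop tokens that were pure punctuation
                   .ToArray();
    }

    private static string TrimToWord(string token)
    {
        // Keep letters, digits, hyphens, and apostrophes; discard everything else.
        var kept = new string(token
            .Where(c => char.IsLetterOrDigit(c) || c == '-' || c == '\'')
            .ToArray());

        // Hyphens and apostrophes only count when they sit inside a word.
        return kept.Trim('-', '\'');
    }
}
```

One caveat of this design: punctuation inside a token is removed rather than treated as a separator, so "a,b" becomes "ab". That is acceptable only if words in the input are always separated by white space.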