I’m in a situation where I’m given a character string and need to determine whether it is written in Spanish or English. I plan on parsing for stop words: Spanish (‘de’, ‘es’, ‘si’, ‘y’) vs. English (‘of’, ‘is’, ‘if’, ‘and’). If there are more Spanish occurrences than English occurrences, I conclude the string is Spanish.
Are there any Ruby snippets already available to do this? If not, what would be a good string-parsing or regex approach?
If you have a string that contains a sentence (or at least a series of words), you can use `string.split(' ')` to split the string into an array of words. From there, you can use `.each` to iterate through the array and process each word. For example:
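Here is a minimal sketch of that split-and-iterate approach applied to the stop-word counting described in the question. The constant names and the `detect_language` helper are hypothetical, and the stop-word lists are only the four words per language from the question, so short or mixed-language text may be misclassified. Returning `:unknown` on a tie is also an assumption.

```ruby
require 'set'

# Stop words taken from the question; longer lists give more reliable results.
SPANISH_STOP_WORDS = Set.new(%w[de es si y])
ENGLISH_STOP_WORDS = Set.new(%w[of is if and])

# Hypothetical helper: returns :spanish, :english, or :unknown on a tie.
def detect_language(text)
  spanish = 0
  english = 0

  # Lowercase and strip punctuation so "Es," still matches "es",
  # then split on whitespace into an array of words.
  words = text.downcase.gsub(/[^[:alpha:]\s]/, '').split(' ')

  # Count how many words appear in each stop-word list.
  words.each do |word|
    spanish += 1 if SPANISH_STOP_WORDS.include?(word)
    english += 1 if ENGLISH_STOP_WORDS.include?(word)
  end

  if spanish > english
    :spanish
  elsif english > spanish
    :english
  else
    :unknown
  end
end

puts detect_language("El libro de Juan es muy bueno y barato.")      # prints "spanish"
puts detect_language("The end of the story is near, and it is sad.") # prints "english"
```

A `Set` keeps each lookup constant-time, which matters only if you grow the stop-word lists. Without the `gsub` step, `split(' ')` would leave punctuation attached to words (e.g. `"near,"`), so they would never match the lists.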