I need to extract the words and phrases from a text. For example, the text is:
Hello World, “Japan and China”, Americans, Asians, “Jews and Christians”, and semi-catholics, Jehovah’s witnesses
Using preg_split(), the result should be the following:
- Hello
- World
- Japan and China
- Americans
- Asians
- Jews and Christians
- and
- semi-catholics
- Jehovah’s
- witnesses
I need to know the regex for this to work (or whether it is possible at all). Note the rules: phrases are enclosed in double quotes (“). Alphanumerics, single quotes (’) and dashes (-) count as part of a word (that’s why “Jehovah’s” and “semi-catholics” are each treated as one word). Everything else separated by spaces is treated as individual words, and any other symbols not mentioned are ignored.
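For reference, here is a minimal sketch of how these rules could be expressed as a single preg_split() pattern. It assumes UTF-8 input and accepts both straight (") and curly (“ ”) double quotes; the function name split_words is hypothetical. Quoted phrases are matched as delimiters and their inner text is captured, while every other run of ignored symbols is a plain delimiter that gets dropped.

```php
<?php
// Hypothetical helper: split a UTF-8 string into words and quoted phrases.
// Accepts straight (") and curly (“ ”) double quotes around phrases.
function split_words(string $text): array
{
    // Alternative 1: a quoted phrase; the text inside the quotes is group 1,
    //   so PREG_SPLIT_DELIM_CAPTURE returns it as its own piece.
    // Alternative 2: a run of characters that are NOT letters, digits,
    //   apostrophes, dashes or quote marks; these separate words and are dropped.
    //   Quote marks are excluded so this run cannot swallow an opening quote.
    $pattern = '/["“]([^"”]*)["”]|[^\p{L}\p{N}\'’‘\-"“”]+/u';

    // PREG_SPLIT_NO_EMPTY removes the empty strings left around each phrase.
    return preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY) ?: [];
}

$text = 'Hello World, “Japan and China”, Americans, Asians, “Jews and Christians”, and semi-catholics, Jehovah’s witnesses';

print_r(split_words($text));
```

With the sample text this returns Hello, World, Japan and China, Americans, Asians, Jews and Christians, and, semi-catholics, Jehovah’s, witnesses. A preg_match_all() pattern that matches the tokens themselves would work just as well; preg_split() is used here only because the question asks for it.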
You can actually do it very simply with str_getcsv like this:
output:
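One possible str_getcsv()-based version, sketched under stated assumptions (it is not necessarily the code this answer originally showed): the phrases use straight double quotes ("), because str_getcsv() only accepts a single-byte enclosure and the curly “ ” are multibyte in UTF-8. The function name csv_words and the set of trimmed punctuation characters are hypothetical choices.

```php
<?php
// Hypothetical helper: treat the text as one CSV "row" with a space as the
// delimiter and a straight double quote as the enclosure.
function csv_words(string $text): array
{
    // The escape character is passed explicitly rather than relying on its default.
    $fields = str_getcsv($text, ' ', '"', '\\');

    // str_getcsv() keeps text between a closing quote and the next delimiter,
    // so "Japan and China", becomes "Japan and China," and plain words keep
    // their trailing commas. Strip that surrounding punctuation from each field.
    $fields = array_map(fn ($f) => trim((string) $f, ' ,.;:!?'), $fields);

    // Drop fields that ended up empty (e.g. from double spaces) and reindex.
    return array_values(array_filter($fields, fn ($f) => $f !== ''));
}

$text = 'Hello World, "Japan and China", Americans, Asians, "Jews and Christians", and semi-catholics, Jehovah’s witnesses';

print_r(csv_words($text));
```

The trim step only removes the listed characters at the edges of a field; other symbols are left in place, so this is looser than the regex rules in the question.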