In the program I’m working on, I need to strip the tags around certain parts of a string, and then insert a comma after each character WITHIN the tag (but not after any other characters in the string). In case this doesn’t make sense, here’s an example of what needs to happen:
This is a string with a < a > tag < /a > (please ignore the spaces within the tag)
(needs to become)
This is a string with a t,a,g,.
Can anyone help me with this? I’ve managed to strip the tags using RegEx, but I can’t figure out how to insert the commas only after the characters contained within the tag. If someone could help that would be great.
@Dour High Arch I’ll elaborate a little bit. The code is for a text-to-speech app that won’t recognize SSML tags. When the user enters a message for the text-to-speech app, they have the option of enclosing a word in a < a > tag to make the speaker say the word as an acronym. Because the acronym SSML tag won’t work, I want to remove the < a > tag whenever present, and place a comma after each character contained in the tag to fake it out (ex: < a > test< /a > becomes t,e,s,t,). The non-tagged words in the string do not need commas after them, just those enclosed in tags (see my first example if need be).
Parsing XML is very problematic because you may have to deal with things like CDATA sections, nested elements, entities, surrogate characters, and on and on. I would use a proper parser, for example one generated with ANTLR.
However, if you are just starting out with C#, it is instructive to solve this using the built-in .NET string and array classes. No ANTLR, LINQ, or regular expressions needed:
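Something along these lines should work; treat it as a minimal sketch rather than a finished implementation. The class and method names (`AcronymTags`, `ExpandAcronyms`) are made up for illustration, the tags are assumed to be written exactly as `<a>` and `</a>` with no spaces or attributes, and `StringBuilder` is used only to avoid repeated string concatenation. Whitespace inside the tag is skipped so that `<a> test</a>` gives `t,e,s,t,` as in your example; that is a guess at what you want.

```csharp
using System;
using System.Text;

public static class AcronymTags
{
    // Hypothetical helper: removes each openTag/closeTag pair and puts a comma
    // after every non-whitespace character that was between them.
    public static string ExpandAcronyms(string input, string openTag = "<a>", string closeTag = "</a>")
    {
        var result = new StringBuilder();
        int position = 0; // start of the text not yet copied to the result

        while (position < input.Length)
        {
            int open = input.IndexOf(openTag, position, StringComparison.Ordinal);
            if (open < 0) break; // no more tags

            int contentStart = open + openTag.Length;
            int close = input.IndexOf(closeTag, contentStart, StringComparison.Ordinal);
            if (close < 0) break; // unmatched opening tag: leave the rest untouched

            // Copy the untagged text before the tag as-is.
            result.Append(input, position, open - position);

            // Copy the tagged characters, each followed by a comma.
            for (int i = contentStart; i < close; i++)
            {
                char c = input[i];
                if (char.IsWhiteSpace(c)) continue; // assumption: skip spaces inside the tag
                result.Append(c).Append(',');
            }

            position = close + closeTag.Length;
        }

        // Copy whatever follows the last complete tag pair.
        result.Append(input, position, input.Length - position);
        return result.ToString();
    }
}
```

Calling `AcronymTags.ExpandAcronyms("This is a string with a <a>tag</a>.")` should give `This is a string with a t,a,g,.`, and text without any tags comes back unchanged.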
Please be aware this does not deal with any of the issues I mentioned. But then, none of the other suggestions do either.