EDIT: A working regex (take the second group):
(^|[ ,\t\n]+)([0-9\\.]+)($|[ ,\t\n]+)
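For anyone who wants to try it, here is a minimal sketch of pulling out the second group with Text.Regex from regex-compat. The names floatPattern and extractFloat are made up for illustration; only mkRegex and matchRegex come from the library.

```haskell
import Text.Regex (mkRegex, matchRegex)

-- The pattern from the edit above, copied verbatim. Group 1 is the left
-- separator (or start of line), group 2 the number, group 3 the right
-- separator (or end of line).
floatPattern :: String
floatPattern = "(^|[ ,\t\n]+)([0-9\\.]+)($|[ ,\t\n]+)"

-- Hypothetical helper: return the text captured by the second group, if any.
extractFloat :: String -> Maybe String
extractFloat s =
  case matchRegex (mkRegex floatPattern) s of
    Just (_ : num : _) -> Just num   -- skip group 1, keep group 2
    _                  -> Nothing    -- no match at all
```

The helper returns the number as a String; converting it to a Double (for example with reads) is left out, since a run of digits and periods such as "1.2.3" would match the regex but is not a valid float.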
Original post:
I’m new to Haskell, trying to use Text.Regex (from regex-compat) to extract float values from a string. I want my regex to match any series of numbers and periods that is buffered by at least one separator character to the left and the right. This is what I wrote:
regex = "[^ \t\n,]+([0-9\\.])+[$ \t\n,]+"
EDIT: I originally thought this worked properly in Scala, but I now believe I simply got lucky with my test strings. This does not work in Haskell. An example:
matchRegexAll (mkRegex regex) " 12.34 "
yields
Just (" ","12.34 ","",["4"])
when it seems to me it should yield
Just (""," 12.34 ","",["12.34"])
Another example:
matchRegexAll (mkRegex regex) "12.34"
yields
Nothing
when I think it should yield
Just ("","12.34","",["12.34"])
I’m guessing the parser treats “^” and “$” differently than the Scala parser does, but that’s all I’ve got.
Inside a character class like [^ \t\n,], normal regex metacharacters (such as ^ and $) lose their special meaning; they match themselves instead. [1] Something like (^|[ \t\n,]) should do what you want.
I’m surprised your regex works in Scala; I’ve never seen a regex implementation that doesn’t behave in this manner.
[1] Although, as FlopCoder points out, ^ at the start of a character class actually negates it.
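A small sketch to check this behaviour yourself, assuming Text.Regex from regex-compat as in the question. The helper matches and the three named checks are hypothetical, written only for this illustration.

```haskell
import Data.Maybe (isJust)
import Text.Regex (mkRegex, matchRegex)

-- Hypothetical helper: does the pattern match anywhere in the input?
matches :: String -> String -> Bool
matches pat = isJust . matchRegex (mkRegex pat)

-- "[$]" is a class containing only a literal dollar sign, not an end anchor.
dollarIsLiteral :: Bool
dollarIsLiteral = matches "[$]" "a$b" && not (matches "[$]" "ab")

-- A leading '^' negates the class: "[^ ]" matches any character but a space.
caretNegates :: Bool
caretNegates = matches "[^ ]" "x" && not (matches "[^ ]" " ")

-- Outside brackets '^' is an anchor, so the alternation also accepts a
-- number that sits at the very start of the input.
anchorOutside :: Bool
anchorOutside = matches "(^|[ \t\n,])12" "12.34"
```

All three values should evaluate to True in GHCi, which is why the original pattern needs a non-separator character before the number and why moving ^ and $ into an alternation fixes it.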