Under Python:
ttsiod@elrond:~$ python
>>> import re
>>> a='This is a test'
>>> re.sub(r'(.*)', 'George', a)
'George'
Under Perl:
ttsiod@elrond:~$ perl
$a="This is a test";
$a=~s/(.*)/George/;
print $a;
(Ctrl-D)
George
Under C#:
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Text.RegularExpressions;
namespace IsThisACsharpBug
{
    class Program
    {
        static void Main(string[] args)
        {
            var matchPattern = "(.*)";
            var replacePattern = "George";
            var newValue = Regex.Replace("This is nice", matchPattern, replacePattern);
            Console.WriteLine(newValue);
        }
    }
}
Unfortunately, C# prints:
$ csc regexp.cs
Microsoft (R) Visual C# 2008 Compiler version 3.5.30729.5420
for Microsoft (R) .NET Framework version 3.5
Copyright (C) Microsoft Corporation. All rights reserved.
$ ./regexp.exe
GeorgeGeorge
Is this a bug in the C# (.NET) regular expression library? Why does it print “George” twice, when Perl and Python print it only once?
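In your example the difference lies in the semantics of the ‘replace’ function rather than in the regular expression processing itself.
.NET’s Regex.Replace does a “global” replace, i.e. it replaces all matches rather than just the first one. With the pattern (.*) there are two matches: the first consumes the entire string, and because .* is also allowed to match zero characters, a second, empty match is found at the end of the string. Each match is replaced by “George”, which gives “GeorgeGeorge”.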
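To see where the second “George” comes from, here is a small Python sketch that lists every match (.*) finds in the test string. The helper name `match_spans` is hypothetical. It assumes Python 3.7 or later: Python’s re.sub is global by default as well, but versions before 3.7 (such as the one in the question) skipped an empty match immediately following a previous match, which is why they printed “George” only once. Since 3.7, re.sub replaces that empty match too.

```python
import re
from typing import List, Tuple

def match_spans(pattern: str, text: str) -> List[Tuple[int, int]]:
    """Return the (start, end) span of every successive match, i.e. what a global replace sees."""
    return [m.span() for m in re.finditer(pattern, text)]

text = "This is a test"
spans = match_spans(r"(.*)", text)

# The first match consumes all 14 characters; the second is a
# zero-length match sitting at the very end of the string.
print(spans)  # [(0, 14), (14, 14)]

# Replacing every match therefore inserts "George" twice (Python 3.7+).
print(re.sub(r"(.*)", "George", text))  # GeorgeGeorge
```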
Global Replace in Perl
(notice the small ‘g’ at the end of the =~s line):
$a="This is a test";
$a=~s/(.*)/George/g;
print $a;
which produces
GeorgeGeorge
Single Replace in .NET
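var regex = new Regex(matchPattern);
var newValue = regex.Replace("This is nice", replacePattern, 1);
which produces
George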
since it stops after the first replacement.
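The same global-versus-single distinction exists in Python, where the optional `count` argument of re.sub plays the role of Perl’s `g` flag (by its absence) or of the count parameter of .NET’s Regex.Replace: the default `count=0` replaces every match, while `count=1` stops after the first. A minimal sketch, again assuming Python 3.7+; the names `replace_all` and `replace_first` are hypothetical.

```python
import re

PATTERN = r"(.*)"
TEXT = "This is a test"

def replace_all(pattern: str, replacement: str, text: str) -> str:
    # count=0 (the default) replaces every match, like Perl's s///g
    # or .NET's Regex.Replace without a count.
    return re.sub(pattern, replacement, text)

def replace_first(pattern: str, replacement: str, text: str) -> str:
    # count=1 stops after the first match, like Perl's s/// without g
    # or .NET's regex.Replace(input, replacement, 1).
    return re.sub(pattern, replacement, text, count=1)

print(replace_all(PATTERN, "George", TEXT))    # GeorgeGeorge
print(replace_first(PATTERN, "George", TEXT))  # George
```

If the intent is simply to replace the whole string, changing the pattern instead, e.g. anchoring it as ^(.*)$ or requiring at least one character with (.+), should also rule out the trailing empty match, even with a global replace.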