Is there a better way to format text from Twitter so that hyperlinks, usernames, and hashtags become links? What I have works, but I know it could be done better, and I am interested in alternative techniques. I am setting this up as an HTML helper for ASP.NET MVC.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Web;
using System.Web.Mvc;

namespace Acme.Mvc.Extensions
{
    public static class MvcExtensions
    {
        const string ScreenNamePattern = @"@([A-Za-z0-9\-_&;]+)";
        const string HashTagPattern = @"#([A-Za-z0-9\-_&;]+)";
        const string HyperLinkPattern = @"(http://\S+)\s?";

        public static string TweetText(this HtmlHelper helper, string text)
        {
            return FormatTweetText(text);
        }

        public static string FormatTweetText(string text)
        {
            string result = text;

            if (result.Contains("http://"))
            {
                var links = new List<string>();
                foreach (Match match in Regex.Matches(result, HyperLinkPattern))
                {
                    var url = match.Groups[1].Value;
                    if (!links.Contains(url))
                    {
                        links.Add(url);
                        result = result.Replace(url, String.Format("<a href=\"{0}\">{0}</a>", url));
                    }
                }
            }

            if (result.Contains("@"))
            {
                var names = new List<string>();
                foreach (Match match in Regex.Matches(result, ScreenNamePattern))
                {
                    var screenName = match.Groups[1].Value;
                    if (!names.Contains(screenName))
                    {
                        names.Add(screenName);
                        result = result.Replace("@" + screenName,
                            String.Format("<a href=\"http://twitter.com/{0}\">@{0}</a>", screenName));
                    }
                }
            }

            if (result.Contains("#"))
            {
                var names = new List<string>();
                foreach (Match match in Regex.Matches(result, HashTagPattern))
                {
                    var hashTag = match.Groups[1].Value;
                    if (!names.Contains(hashTag))
                    {
                        names.Add(hashTag);
                        result = result.Replace("#" + hashTag,
                            String.Format("<a href=\"http://twitter.com/search?q={0}\">#{1}</a>",
                                HttpUtility.UrlEncode("#" + hashTag), hashTag));
                    }
                }
            }

            return result;
        }
    }
}
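One alternative worth considering, offered only as a rough sketch: do the whole job in a single `Regex.Replace` pass with a `MatchEvaluator`, so that markup produced for one token is never re-scanned by a later pattern (for example, a `#fragment` inside a URL is not picked up as a hashtag, and `@ann` does not clobber `@anna`). The class name `TweetLinker`, the combined pattern, and the `https?` and `(?<!&)` tweaks are my own assumptions, not part of the code above. It also assumes the tweet text arrives already HTML-encoded, as the `&;` in the original character classes suggests.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names and pattern are illustrative assumptions.
public static class TweetLinker
{
    // One alternation: a URL, an @mention, or a #hashtag.
    // Named groups tell the evaluator which branch matched.
    // (?<!&) keeps numeric entities such as &#39; from being read as hashtags.
    static readonly Regex Token = new Regex(
        @"(?<url>https?://[^\s<>""]+)|@(?<user>[A-Za-z0-9_]+)|(?<!&)#(?<tag>[A-Za-z0-9_]+)",
        RegexOptions.Compiled);

    public static string Link(string text)
    {
        // Each match is replaced exactly once, left to right, so the
        // generated <a> tags are never matched again.
        return Token.Replace(text, m =>
        {
            if (m.Groups["url"].Success)
            {
                string url = m.Groups["url"].Value;
                return string.Format("<a href=\"{0}\">{0}</a>", url);
            }
            if (m.Groups["user"].Success)
            {
                string user = m.Groups["user"].Value;
                return string.Format("<a href=\"http://twitter.com/{0}\">@{0}</a>", user);
            }
            string tag = m.Groups["tag"].Value;
            return string.Format(
                "<a href=\"http://twitter.com/search?q={0}\">#{1}</a>",
                Uri.EscapeDataString("#" + tag), tag);
        });
    }
}
```

The existing `TweetText` helper could simply return `TweetLinker.Link(text)`. Note that this sketch still treats trailing punctuation (such as a full stop right after a URL) as part of the URL, so the URL branch would need tightening for real use.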
That is remarkably similar to the code I wrote to display my Twitter status on my blog. The only further things I do are:
1) looking up @name and replacing it with <a href="http://twitter.com/name">Real Name</a>;
2) multiple @names in a row get commas, if they don’t already have them;
3) tweets that start with @name(s) are formatted “To @name:”.
I don’t see any reason this can’t be an effective way to parse a tweet. Tweets have a very consistent format (good for regex), and in most situations the speed (milliseconds) is more than acceptable.
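The three extra steps listed above might look roughly like the following. This is a minimal sketch, not the actual parser: the class name `TweetAnswerSketch`, the three patterns, and the `lookupRealName` delegate (standing in for whatever real-name lookup is used) are all assumptions made for illustration.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical sketch of the three steps; not the answerer's real code.
public static class TweetAnswerSketch
{
    // An @name followed only by whitespace and then another @name.
    static readonly Regex MissingComma =
        new Regex(@"(@[A-Za-z0-9_]+)\s+(?=@[A-Za-z0-9_])");

    // A comma-separated run of @names at the very start of the tweet.
    static readonly Regex LeadingMentions =
        new Regex(@"^(@[A-Za-z0-9_]+(?:,\s*@[A-Za-z0-9_]+)*)[\s:]*");

    static readonly Regex Mention = new Regex(@"@([A-Za-z0-9_]+)");

    // lookupRealName is an assumed callback: screen name in, display name
    // out, or null when the name is unknown.
    public static string Format(string tweet, Func<string, string> lookupRealName)
    {
        // Step 2: add commas between consecutive @names that lack them.
        string text = MissingComma.Replace(tweet, "$1, ");

        // Step 3: a tweet that opens with @name(s) becomes "To @name: ...".
        text = LeadingMentions.Replace(text, "To $1: ");

        // Step 1: link each @name, showing the real name when one is found.
        return Mention.Replace(text, m =>
        {
            string name = m.Groups[1].Value;
            string realName = lookupRealName != null ? lookupRealName(name) : null;
            string label = string.IsNullOrEmpty(realName)
                ? "@" + name
                : WebUtility.HtmlEncode(realName);
            return string.Format("<a href=\"http://twitter.com/{0}\">{1}</a>", name, label);
        });
    }
}
```

The commas and the “To” prefix are applied before linking so that the patterns only ever see plain @names rather than generated markup.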
Edit:
Here is the code for my Tweet parser. It’s a bit too long to put in a Stack Overflow answer. It takes a tweet like:
And turns it into:
It also wraps all that markup in a little JavaScript:
This is so the tweet fetcher can run asynchronously as JavaScript, so if Twitter is down or slow it won’t affect my site’s page load time.