The Archive Base

Editorial Team
Asked: May 11, 2026 (2026-05-11T12:09:28+00:00)

I’m writing a microformats parser in C# and am looking for some refactoring advice.

I’m writing a microformats parser in C# and am looking for some refactoring advice. This is probably the first ‘real’ project I’ve attempted in C# for some time (I program almost exclusively in VB6 at my day job), so I have the feeling this question may become the first in a series 😉

Let me provide some background about what I have so far, so that my question will (hopefully) make sense.

Right now, I have a single class, MicroformatsParser, doing all the work. It has an overloaded constructor that lets you pass a System.Uri or a string containing a URI: upon construction, it downloads the HTML document at the given URI and loads it into an HtmlAgilityPack.HtmlDocument for easy manipulation by the class.

The basic API works like this (or will, once I finish the code…):

    MicroformatsParser mp = new MicroformatsParser("http://microformats.org");
    List<HCard> hcards = mp.GetAll<HCard>();

    foreach (HCard hcard in hcards)
    {
        Console.WriteLine("Full Name: {0}", hcard.FullName);

        foreach (string email in hcard.EmailAddresses)
            Console.WriteLine("E-Mail Address: {0}", email);
    }

The use of generics here is intentional. I got my inspiration from the way that the Microformats library in Firefox 3 works (and the Ruby mofo gem). The idea here is that the parser does the heavy lifting (finding the actual microformat content in the HTML), and the microformat classes themselves (HCard in the above example) basically provide the schema that tells the parser how to handle the data it finds.

The code for the HCard class should make this clearer (note this is not a complete implementation):

    [ContainerName("vcard")]
    public class HCard
    {
        [PropertyName("fn")]
        public string FullName;

        [PropertyName("email")]
        public List<string> EmailAddresses;

        [PropertyName("adr")]
        public List<Address> Addresses;

        public HCard()
        {
        }
    }

The attributes here are used by the parser to determine how to populate an instance of the class with data from an HTML document. The parser does the following when you call GetAll<T>():

  • Checks that the type T has a ContainerName attribute (and it’s not blank)
  • Searches the HTML document for all nodes with a class attribute that matches the ContainerName. Call these the ‘container nodes’.
  • For each container node:
    • Uses reflection to create an object of type T.
    • Gets the public fields (a MemberInfo[]) for type T via reflection.
    • For each field’s MemberInfo:
      • If the field has a PropertyName attribute:
        • Gets the value of the corresponding microformat property from the HTML.
        • Injects the value found in the HTML into the field (i.e. sets the value of the field on the object of type T created in the first step).
    • Adds the object of type T to a List<T>.
  • Returns the List<T>, which now contains a bunch of microformat objects.
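
As a rough illustration of the steps above, here is a minimal sketch. It is not the actual MicroformatsParser code: the attribute declarations, the ParserSketch and SimpleCard names, and the idea of passing each container node in as a property-name-to-text dictionary (instead of an HtmlAgilityPack node) are all assumptions made so the example is self-contained. It only handles string fields, which is exactly where the type-dependent part of the problem begins.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute declarations; the question does not show them.
[AttributeUsage(AttributeTargets.Class)]
public sealed class ContainerNameAttribute : Attribute
{
    public string Name { get; }
    public ContainerNameAttribute(string name) { Name = name; }
}

[AttributeUsage(AttributeTargets.Field)]
public sealed class PropertyNameAttribute : Attribute
{
    public string Name { get; }
    public PropertyNameAttribute(string name) { Name = name; }
}

// Hypothetical, cut-down microformat class used only for this sketch.
[ContainerName("vcard")]
public class SimpleCard
{
    [PropertyName("fn")]
    public string FullName;
}

public static class ParserSketch
{
    // Assumption: the container nodes have already been located, and each one
    // is reduced to a map from microformat property name to its text value.
    public static List<T> GetAll<T>(
        IEnumerable<IDictionary<string, string>> containerNodes) where T : new()
    {
        var container = typeof(T).GetCustomAttribute<ContainerNameAttribute>();
        if (container == null || string.IsNullOrEmpty(container.Name))
            throw new ArgumentException(typeof(T).Name + " has no ContainerName.");

        var results = new List<T>();
        foreach (var node in containerNodes)
        {
            object item = new T(); // boxed so SetValue also works if T is a struct
            foreach (FieldInfo field in typeof(T).GetFields())
            {
                var prop = field.GetCustomAttribute<PropertyNameAttribute>();
                if (prop == null) continue;

                // Only string fields are handled; other field types are skipped.
                if (field.FieldType != typeof(string)) continue;

                if (node.TryGetValue(prop.Name, out string text))
                    field.SetValue(item, text);
            }
            results.Add((T)item);
        }
        return results;
    }
}
```

Calling ParserSketch.GetAll<SimpleCard>(...) with one dictionary that maps "fn" to a name yields a single SimpleCard with FullName set to that name.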

I’m trying to figure out a better way to implement the step that reads a property’s value from the HTML and injects it into the field. The problem is that the Type of a given field in the microformat class determines not only what node to look for in the HTML, but also how to interpret the data.

For example, going back to the HCard class I defined above, the 'email' property is bound to the EmailAddresses field, which is a List<string>. After the parser finds all the 'email' child nodes of the parent 'vcard' node in the HTML, it has to put them in a List<string>.

What’s more, if I want my HCard to be able to return phone number information, I would probably want to be able to declare a new field of type List<HCard.TelephoneNumber> (which would have its own ContainerName("tel") attribute) to hold that information, because there can be multiple 'tel' elements in the HTML, and the 'tel' format has its own sub-properties. But now the parser needs to know how to put the telephone data into a List<HCard.TelephoneNumber>.

The same problem applies to Floats, DateTimes, List<Float>s, List<Integer>s, etc.

The obvious answer is to have the parser switch on the type of field, and do the appropriate conversions for each case, but I want to avoid a giant switch statement. Note that I’m not planning to make the parser support every possible Type in existence, but I will want it to handle most scalar types, and the List<T> versions of them, along with the ability to recognize other microformat classes (so that a microformat class can be composed from other microformat classes).

Any advice on how best to handle this?

Since the parser has to handle primitive data types, I don’t think I can add polymorphism at the type level…

My first thought was to use method overloading, so I would have a series of GetPropValue overloads like GetPropValue(HtmlNode node, ref string retrievedValue), GetPropValue(HtmlNode node, ref List<Float> retrievedValue), etc., but I’m wondering if there is a better approach to this problem.

1 Answer

  1. Added an answer on May 11, 2026 at 12:09 pm

    Mehrdad’s approach is basically the one I’d suggest to start with, but only as the first of potentially several steps.

    You can use a simple IDictionary<Type,Delegate> (where each entry is actually from T to Func<ParseContext,T> – but that can’t be expressed with generics) for single types (strings, primitives etc) but then you’ll also want to check for lists, maps etc. You won’t be able to do this using the map, because you’d have to have an entry for each type of list (i.e. a separate entry for List<string>, List<int> etc). Generics make this quite tricky – if you’re happy to restrict yourself to just certain concrete types such as List<T> you’ll make it easier for yourself (but less flexible). For instance, detecting List<T> is straightforward:

        if (type.IsGenericType && type.GetGenericTypeDefinition() == typeof(List<>))
        {
            // Handle lists
            // use type.GetGenericArguments() to work out the element type
        }
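
    To make the combination concrete, here is a minimal sketch that pairs a Dictionary<Type, Delegate> for scalar types with the List<T> check above. The ValueConverter class, its method names, and the use of plain strings in place of a ParseContext are assumptions for illustration, not part of any existing library.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;

public static class ValueConverter
{
    // Maps a scalar type T to a Func<string, T>. The values are stored as
    // Delegate because the per-entry link between key and delegate type
    // cannot be expressed in the dictionary's generic signature.
    private static readonly Dictionary<Type, Delegate> Parsers =
        new Dictionary<Type, Delegate>();

    static ValueConverter()
    {
        Register<string>(s => s);
        Register<int>(s => int.Parse(s, CultureInfo.InvariantCulture));
        Register<float>(s => float.Parse(s, CultureInfo.InvariantCulture));
        Register<DateTime>(s => DateTime.Parse(s, CultureInfo.InvariantCulture));
    }

    // Registration stays type-safe even though storage is not.
    public static void Register<T>(Func<string, T> parser)
    {
        Parsers[typeof(T)] = parser;
    }

    // rawValues: text of every matching node (assumed to be extracted already).
    public static object ConvertValues(Type targetType, IList<string> rawValues)
    {
        if (targetType.IsGenericType &&
            targetType.GetGenericTypeDefinition() == typeof(List<>))
        {
            Type elementType = targetType.GetGenericArguments()[0];
            // List<T> also implements the non-generic IList, so items can be
            // added without knowing T at compile time.
            var list = (IList)Activator.CreateInstance(targetType);
            foreach (string raw in rawValues)
                list.Add(ConvertScalar(elementType, raw));
            return list;
        }

        // Scalar field: use the first value, or null when nothing was found.
        return rawValues.Count > 0 ? ConvertScalar(targetType, rawValues[0]) : null;
    }

    private static object ConvertScalar(Type type, string raw)
    {
        if (!Parsers.TryGetValue(type, out Delegate parser))
            throw new NotSupportedException("No parser registered for " + type);
        // Exceptions thrown by the parser surface as TargetInvocationException.
        return parser.DynamicInvoke(raw);
    }
}
```

    With this shape, supporting another scalar type is one Register call, and every List<T> of a registered T works without its own dictionary entry. Nested microformat classes would need a further branch that this sketch leaves out.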

    Detecting whether a type implements IList<T> for some T (and then discovering T) can be a pain, especially as there could be multiple implementations, and the concrete type itself may or may not be generic. This effort could be worthwhile if you really need a very flexible library used by thousands of developers, but otherwise I’d keep it simple.
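
    For comparison, here is a minimal sketch of the more general check. ListTypeInspector and GetListElementTypes are hypothetical names. The method returns an array rather than a single type because a class may implement IList<T> for more than one T, and the caller then has to decide what to do in that case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListTypeInspector
{
    // Returns every T for which 'type' implements (or is) IList<T>.
    // An empty array means the type is not a generic list at all.
    public static Type[] GetListElementTypes(Type type)
    {
        IEnumerable<Type> candidates = type.GetInterfaces();

        // GetInterfaces() does not include the type itself, so add it when
        // the type being inspected is an interface such as IList<string>.
        if (type.IsInterface)
            candidates = candidates.Concat(new[] { type });

        return candidates
            .Where(i => i.IsGenericType &&
                        i.GetGenericTypeDefinition() == typeof(IList<>))
            .Select(i => i.GetGenericArguments()[0])
            .ToArray();
    }
}
```

    Unlike the typeof(List<>) comparison, this also matches arrays and custom collection classes, at the cost of having to handle the zero-result and multiple-result cases.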
