The Archive Base

Editorial Team
Asked: May 15, 2026

I’m really tempted to drop RegexKit (or my own libpcre wrapper) into my project

I’m really tempted to drop RegexKit (or my own libpcre wrapper) into my project in order to do this, but before I do that I want to know how Cocoa developers manage to do half of this basic stuff without really convoluted code or without linking with RegexKit or another regular expression library.

I find it gobsmacking that Cocoa does not include any regular expression matching features. I’ve become so accustomed to using regular expressions for all kinds of things that I’m lost without them. I can do what I need without them, but the code would be rather convoluted. So, Cocoa devs, I ask you, what’s the “Cocoa way” to do this…

The problem is an everyday problem in programming as far as I’m concerned. Cocoa must have ways of doing this with the built-in features. Note that the position of the elements I want to match changes, and sometimes “quotes” are present. Whitespace is variable.

Take the following strings:

Content-Type: application/xml; charset=utf-8

Content-Type: text/html; charset="iso-8859-1"

Content-Type: text/plain;
 charset=us-ascii

Content-Type: text/plain; name="example.txt"; charset=utf-8

From all of these strings, how would you go about determining the mime type (e.g. text/plain) and the charset (e.g. utf-8) using just the built-in Cocoa classes?

I’d end up performing a series of -rangeOfString: and substring calls, with conditional checks to deal with the optional quotes etc. Is there a way to do this with NSScanner? The NSScanner class seems to have a pretty naive API to me.

Something like C’s sscanf() that works for NSString objects would be an ideal fit. Most of my string parsing needs are as simple as this example, so maybe regular expressions, while I’m accustomed to them, are overkill?

EDIT | The code is a bit long-winded, but it turns out NSScanner is actually quite easy to work with. It basically walks along your string doing as you tell it. The most annoying part is creating the NSCharacterSet instances it needs.

- (void)testNSScannerUseCase {
  NSString *testString = @"Content-type: application/xml; name=\"test\";\n charset=\"utf-8\"";

  unsigned int a = 'a', zero = '0';

  // There's probably a quicker way than to make these character sets this way
  NSMutableCharacterSet *alphaNumSet = [NSMutableCharacterSet characterSetWithRange:NSMakeRange(a, 26)];
  [alphaNumSet addCharactersInRange:NSMakeRange(zero, 10)];

  NSMutableCharacterSet *mimeTypeSet = [NSMutableCharacterSet characterSetWithCharactersInString:@"/-"];
  [mimeTypeSet formUnionWithCharacterSet:alphaNumSet];

  NSMutableCharacterSet *charsetSet = [NSMutableCharacterSet characterSetWithCharactersInString:@"-"];
  [charsetSet formUnionWithCharacterSet:alphaNumSet];

  // Initialize a case-insensitive scanner
  NSScanner *scanner = [NSScanner scannerWithString:testString];
  [scanner setCaseSensitive:NO];

  // Prepare to capture mime-type
  NSString *mimeType = nil;

  // Skip past the Content-Type: section
  if ([scanner scanUpToString:@":" intoString:NULL] && [scanner scanString:@":" intoString:NULL]) {
    [scanner scanCharactersFromSet:mimeTypeSet intoString:&mimeType];
  }

  GHAssertEqualStrings(@"application/xml", mimeType, @"Mime-type should be application/xml");

  // Prepare to look for the charset attribute
  NSString *charset = nil;

  // Ignore quotes as well as whitespace
  [scanner setCharactersToBeSkipped:[NSCharacterSet characterSetWithCharactersInString:@"\r\n\t \""]];

  // Skip past the charset attribute declaration
  if ([scanner scanUpToString:@"charset=" intoString:NULL]
    && [scanner scanString:@"charset=" intoString:NULL]) {

    [scanner scanCharactersFromSet:charsetSet intoString:&charset];
  }

  GHAssertEqualStrings(@"utf-8", charset, @"Charset should be utf-8");
}

This could be made a little smarter by using a while loop reading up to “;” then checking to see if it’s the attribute I’m scanning for.
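A minimal sketch of that loop, assuming the header has the usual "type/subtype; name=value; ..." shape. The function name ParameterValue is hypothetical (it is not part of Cocoa or of the test above), and backslash-escaped quotes inside a quoted value are not handled. It only uses NSScanner calls that exist in Foundation: scanString:intoString:, scanUpToString:intoString: and scanUpToCharactersFromSet:intoString:.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an illustration, not an existing API): returns the value
// of the parameter called `wanted` in a Content-Type header, or nil if absent.
static NSString *ParameterValue(NSString *header, NSString *wanted) {
    NSCharacterSet *space = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSCharacterSet *nameEnd = [NSCharacterSet characterSetWithCharactersInString:@"=;"];
    NSScanner *scanner = [NSScanner scannerWithString:header];

    // Skip "Content-Type: type/subtype", i.e. everything before the first ";".
    [scanner scanUpToString:@";" intoString:NULL];

    // Each pass consumes one ";name=value" parameter.
    while ([scanner scanString:@";" intoString:NULL]) {
        NSString *name = nil;
        NSString *value = nil;

        // The attribute name ends at "=" (or at ";" if it has no value).
        [scanner scanUpToCharactersFromSet:nameEnd intoString:&name];
        if (![scanner scanString:@"=" intoString:NULL]) {
            continue; // no value here; move on to the next parameter
        }

        if ([scanner scanString:@"\"" intoString:NULL]) {
            // Quoted value: read up to the closing quote, semicolons included.
            [scanner scanUpToString:@"\"" intoString:&value];
            [scanner scanString:@"\"" intoString:NULL];
        } else {
            // Bare value: read up to the next ";" or the end of the string.
            [scanner scanUpToString:@";" intoString:&value];
        }

        name = [name stringByTrimmingCharactersInSet:space];
        value = [value stringByTrimmingCharactersInSet:space];

        // Guard against nil: messaging nil returns 0, which equals NSOrderedSame.
        if (name != nil && [name caseInsensitiveCompare:wanted] == NSOrderedSame) {
            return value != nil ? value : @"";
        }

        // Drop anything left over before the next ";".
        [scanner scanUpToString:@";" intoString:NULL];
    }
    return nil;
}
```

Because the scanner skips whitespace and newlines by default, the folded header with charset on a continuation line needs no special handling here.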

I dare say it benchmarks faster than using a regex and that my rather long code can be refactored down to something much smaller.

1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 12:18 am

    If these are HTTP Content-Type headers then, technically, the second one is illegal according to my reading of RFC 2616: you don’t quote character set names. Having said that, you can’t control your input, and if you are getting quoted values you need to deal with them.

    Anyway, assuming we are talking about HTTP headers, I’d be tempted to write a proper parser even if I did have a regex library to hand. Assuming you want to be a bit lazy, without a regex library or a parser, you need to do something like this:

    • Strip “Content-Type:”.
    • Use -componentsSeparatedByString: to split at semicolons.

    The mime type is the first part, trimmed of leading and trailing white space.

    Now comes the tricky part. Iterate through each of the remaining components.

    • for the part you are on, make sure the semicolon you split on was not embedded in a string. The easiest way to do this is to count the number of unescaped double quote characters and make sure there are zero or two. If you did split on a quoted semicolon, join the next component back on and repeat
    • split at the = sign
    • if the first part is charset (case insensitive) you have found the one you are looking for. The second part is the actual character set – strip white space and enclosing double quotes.

    The above is quite complex and there are probably edge cases it fails on, but then any regular expression you create to do the same will also be complex, have edge case failures, be unreadable and impossible to debug with the Xcode debugger.
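The steps above can be sketched roughly as follows, using only Foundation. Both function names (UnescapedQuoteCount and ParseContentType) and the dictionary keys are hypothetical, chosen for illustration. The sketch treats an odd number of unescaped quotes as the sign that a component was split inside a quoted string, which is a slight generalisation of the "zero or two" check, and it has not been hardened against malformed input.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts double quotes that are not preceded by a backslash.
static NSUInteger UnescapedQuoteCount(NSString *s) {
    NSUInteger count = 0;
    BOOL escaped = NO;
    for (NSUInteger i = 0; i < [s length]; i++) {
        unichar c = [s characterAtIndex:i];
        if (escaped) {
            escaped = NO;          // this character was escaped; ignore it
        } else if (c == '\\') {
            escaped = YES;
        } else if (c == '"') {
            count++;
        }
    }
    return count;
}

// Hypothetical helper: returns a dictionary with a @"mimeType" entry and,
// if a charset parameter was found, a @"charset" entry.
static NSDictionary *ParseContentType(NSString *header) {
    NSCharacterSet *space = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSCharacterSet *quote = [NSCharacterSet characterSetWithCharactersInString:@"\""];
    NSMutableDictionary *result = [NSMutableDictionary dictionary];

    // Strip a leading "Content-Type:" (any case) if it is present.
    NSRange prefix = [header rangeOfString:@"Content-Type:"
                                   options:NSCaseInsensitiveSearch | NSAnchoredSearch];
    NSString *body = (prefix.location == NSNotFound)
        ? header
        : [header substringFromIndex:NSMaxRange(prefix)];

    // Split at semicolons; the first component is the mime type.
    NSArray *parts = [body componentsSeparatedByString:@";"];
    [result setObject:[[parts objectAtIndex:0] stringByTrimmingCharactersInSet:space]
               forKey:@"mimeType"];

    NSUInteger i = 1;
    while (i < [parts count]) {
        NSString *part = [parts objectAtIndex:i++];

        // An odd quote count means we split on a quoted semicolon: rejoin.
        while (UnescapedQuoteCount(part) % 2 == 1 && i < [parts count]) {
            part = [NSString stringWithFormat:@"%@;%@", part, [parts objectAtIndex:i++]];
        }

        // Split at the first "=" sign.
        NSRange eq = [part rangeOfString:@"="];
        if (eq.location == NSNotFound) {
            continue;
        }
        NSString *name = [[part substringToIndex:eq.location]
                             stringByTrimmingCharactersInSet:space];
        if ([name caseInsensitiveCompare:@"charset"] != NSOrderedSame) {
            continue;
        }

        // Strip white space first, then any enclosing double quotes.
        NSString *value = [[part substringFromIndex:NSMaxRange(eq)]
                              stringByTrimmingCharactersInSet:space];
        [result setObject:[value stringByTrimmingCharactersInSet:quote]
                   forKey:@"charset"];
        break;
    }
    return result;
}
```

Returning a dictionary keeps the sketch short; a real implementation might prefer a small value class or out-parameters.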

© 2021 The Archive Base. All Rights Reserved