Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7074143
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 28, 20262026-05-28T06:00:56+00:00 2026-05-28T06:00:56+00:00

I’ve got some code to convert a large (many gigabytes) XML file into another

  • 0

I’ve got some code to convert a large (many gigabytes) XML file into another format.

Among other things, I need to store one or two gigabytes of floats in a hash table (two floats for each entry), with an int as the value’s key.

Currently, I’m using NSMutableDictionary and a custom class containing the two floats:

// create the dictionary
NSMutableDictionary *points = [[NSMutableDictionary alloc] init];

// add an entry (the data is read from an XML file using libxml)
int pointId = 213453;
float x = 42.313554; 
float y = -21.135213; 

MyPoint *point = [[MyPoint alloc] initWithX:x Y:y];
[points setObject:point forKey:[NSNumber numberWithInt:pointId]];
[point release];

// retrieve an entry (this happens later on while parsing the same XML file)
int pointId = 213453;
float x;
float y;
MyPoint *point = [points objectForKey:[NSNumber numberWithInt:pointId]];
x = point.x;
y = point.y;

This data set is consuming about 800MB of RAM with the XML file I’m working with now, and it takes quite a long time to execute. I’d like to have better performance, but even more important I need to get the memory consumption down so I can process even larger XML files.

objc_msg_send is right up there in a profile of the code, as is - [NSNumber numberWithInt:], and I’m sure I can get the memory usage down by avoiding objects altogether, but I don’t know much about C programming (this project is certainly teaching me!).

How can I replace NSMuableDictionary, NSNumber MyPoint with an efficient C data structure? Without any third party library dependencies?

I’d also like to be able to write this data structure to files on the disk, so I can work with a dataset that doesn’t entirely fit into memory, but I can probably live without this capability.

(for those not familiar with Objective-C, the NSMutableDictionary class can only store Obj-C objects, and it the keys must also be objects. NSNumber and MyPoint are dumb container classes to allow NSMutableDictionary to work with float and int values.)

EDIT:

I’ve tried using CFMutableDictionary to store structs, as per apple’s sample code. When the dictionary is empty, it performs great. But as the dictionary grows it gets slower and slower. About 25% through parsing a file (~4 million items in the dictionary) it starts to chug, two orders of magnitude slower than earlier in the file.

NSMutableDictionary doesn’t have the same performance issue. Instruments shows a lot of activity applying hashes and comparing the keys of the dictionary (the intEqual() method below). Comparing an int is fast, so something is very wrong for it to be executing so often.

Here’s my code to create the dictionary:

typedef struct {
  float lat;
  float lon;
} AGPrimitiveCoord;

void agPrimitveCoordRelease(CFAllocatorRef allocator, const void *ptr) {
    CFAllocatorDeallocate(allocator, (AGPrimitiveCoord *)ptr);
}

Boolean agPrimitveCoordEqual(const void *ptr1, const void *ptr2) {
    AGPrimitiveCoord *p1 = (AGPrimitiveCoord *)ptr1;
    AGPrimitiveCoord *p2 = (AGPrimitiveCoord *)ptr2;

    return (fabsf(p1->lat - p2->lat) < 0.0000001 && fabsf(p1->lon - p2->lon) < 0.0000001);

}

Boolean intEqual(const void *ptr1, const void *ptr2) {
    return (int)ptr1 == (int)ptr2;
}

CFHashCode intHash(const void *ptr) {
  return (CFHashCode)((int)ptr);
}

// init storage dictionary
CFDictionaryKeyCallBacks intKeyCallBacks = {0, NULL, NULL, NULL, intEqual, intHash};
CFDictionaryValueCallBacks agPrimitveCoordValueCallBacks = {0, NULL /*agPrimitveCoordRetain*/, agPrimitveCoordRelease, NULL, agPrimitveCoordEqual};
temporaryNodeStore = CFDictionaryCreateMutable(NULL, 0, &intKeyCallBacks, &agPrimitveCoordValueCallBacks);

// add an item to the dictionary
- (void)parserRecordNode:(int)nodeId lat:(float)lat lon:(float)lon
{
  AGPrimitiveCoord *coordPtr = (AGPrimitiveCoord *)CFAllocatorAllocate(NULL, sizeof(AGPrimitiveCoord), 0);
  coordPtr->lat = lat;
  coordPtr->lon = lon;

  CFDictionarySetValue(temporaryNodeStore, (void *)nodeId, coordPtr);
}

EDIT 2:

The performance problem was due to the almost useless hashing implementation in Apple’s sample code. I got the performance way up by using this:

// hash algorithm from http://burtleburtle.net/bob/hash/integer.html
uint32_t a = abs((int)ptr);
a = (a+0x7ed55d16) + (a<<12);
a = (a^0xc761c23c) ^ (a>>19);
a = (a+0x165667b1) + (a<<5);
a = (a+0xd3a2646c) ^ (a<<9);
a = (a+0xfd7046c5) + (a<<3);
a = (a^0xb55a4f09) ^ (a>>16);
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-28T06:00:57+00:00Added an answer on May 28, 2026 at 6:00 am

    If you want NSMutableDictionary-like behavior but with malloc’d memory, you can drop down to CFDictionary (or in your case, CFMutableDictionary). It’s actually the underpinnings of NSMutableDictionary, but it allows some customization, namely you can tell it that you’re not storing objects. When you call CFDictionaryCreateMutable() you give it a struct that describes what sort of values you’re handing it (it contains pointers that tell it how to retain, release, describe, hash, and compare your values). So if you want to use a struct containing two floats, and you’re happy using malloc’d memory for each struct, you can malloc your struct, populate it, and hand that to the CFDictionary, and then you can write the callback functions such that they work with your particular struct. The only restriction on the keys and objects you can use CFDictionary with is they need to fit inside a void *.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have just tried to save a simple *.rtf file with some websites and
link Im having trouble converting the html entites into html characters, (&# 8217;) i
We are using XSLT to translate a RIXML file to XML. Our RIXML contains
I'm parsing an XML file, the creators of it stuck in a bunch social
I ran into a problem. Wrote the following code snippet: teksti = teksti.Trim() teksti
I want to count how many characters a certain string has in PHP, but
For some reason, after submitting a string like this Jack’s Spindle from a text
I've got a string that has curly quotes in it. I'd like to replace
this is what i have right now Drawing an RSS feed into the php,
I have a French site that I want to parse, but am running into

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.