So, recently I had the unfortunate need to write a C extension for Ruby (for performance reasons). Since I was having trouble understanding VALUE (and still am), I looked into the Ruby source and found: typedef unsigned long VALUE; (Link to Source; you will notice that there are a few other ‘ways’ it’s defined, but I think it’s essentially a long; correct me if I’m wrong). While investigating this further I found an interesting blog post, which says:
“…in some cases the VALUE object could BE the data instead of POINTING TO the data.”
What confuses me is that when I pass a string from Ruby to C, call RSTRING_PTR() on the VALUE (passed to the C function from Ruby), and try to ‘debug’ it with strlen(), it returns 4. Always 4.
Example code:
VALUE test(VALUE inp) {
    unsigned char* c = RSTRING_PTR(inp);
    //return rb_str_new2(c); //this returns some random gibberish
    return INT2FIX(strlen(c));
}
This example always returns 1 as the string length:
VALUE test(VALUE inp) {
    unsigned char* c = (unsigned char*) inp;
    //return rb_str_new2(c); // Always "\x03" in Ruby.
    return INT2FIX(strlen(c));
}
Sometimes in Ruby I see an exception saying “Can’t convert Module to String” (or something along those lines; I was messing with the code so much trying to figure this out that I am unable to reproduce the error now). The error would happen when I tried StringValuePtr() on inp [I’m a bit unclear what exactly this does; the documentation says it changes the passed parameter to char*]:
VALUE test(VALUE inp) {
    StringValuePtr(inp);
    return rb_str_new2((char*)inp); //Without the cast, I would get compiler warnings
}
So, the Ruby code in question is: MyMod::test("blahblablah")
EDIT: Fixed a few typos and updated the post a little.
The questions
- What exactly does VALUE hold? A pointer to the object/value? The value itself?
- If it holds the value itself: when does it do that, and is there a way to check for it?
- How do I actually access the value (since I seem to be accessing almost everything but the value)?
P.S: My understanding of C isn’t really the best, but it’s a work in progress; also, read the comments in the code snippets for some additional description (if it helps).
Thanks!
Ruby Strings vs. C strings
Let’s start with strings. Before trying to retrieve a string in C, it is a good habit to call StringValue(obj) on your VALUE first. This ensures that you really are dealing with a Ruby string in the end: if the object is not already a string, StringValue turns it into one by coercing it with a call to that object’s to_str method. This makes things safer and prevents the occasional segfault you might otherwise get.

The next thing to watch out for is that Ruby strings are not \0-terminated the way your C code would expect in order for things like strlen to work. Ruby’s strings carry their length information with them instead; that’s why, in addition to RSTRING_PTR(str), there is also the RSTRING_LEN(str) macro to determine the actual length.

So what StringValuePtr does is return the non-zero-terminated char * to you. This is great for buffers where you have a separate length, but not what you want for e.g. strlen. Use StringValueCStr instead: it will modify the string to be zero-terminated so that it is safe to use with C functions that expect zero termination. But try to avoid this wherever possible, because this modification is much less performant than retrieving the non-zero-terminated string, which does not have to be modified at all. If you keep an eye on this, it’s surprising how rarely you will actually need “real” C strings.

self as an implicit VALUE argument
Another reason why your current code doesn’t work as expected is that every C function to be called by Ruby gets passed self as an implicit VALUE.

- No arguments in Ruby (e.g. obj.doit) translates to VALUE doit(VALUE self) in C.
- A fixed number of arguments (>0, e.g. obj.doit(a, b)) translates to VALUE doit(VALUE self, VALUE a, VALUE b) in C.
- Var args in Ruby (e.g. obj.doit(a, b=nil)) translates to VALUE doit(int argc, VALUE *argv, VALUE self) in C.

So what you were working on in your example is not the string passed to you by Ruby but actually the current value of self, that is, the object that was the receiver when you called that function. This also explains the “Can’t convert Module to String” exception: StringValuePtr was applied to self, the module MyMod, which cannot be coerced with to_str. A correct definition for your example would be static VALUE test(VALUE self, VALUE input).

I made it static to point out another rule that you should follow in your C extensions. Make your C functions public only if you intend to share them among several source files. Since that’s almost never the case for a function that you attach to a Ruby class, you should declare them as static by default and only make them public if there is a good reason to do so.

What is VALUE and where does it come from?
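To recap the two points above before going deeper, here is a minimal sketch of how the corrected function might look as a complete extension. The module name MyMod comes from the question; the library name mymod (and therefore Init_mymod) and the function name mymod_test are assumptions made for illustration. It needs Ruby’s headers and has to be built as an extension (for example with mkmf), so treat it as a sketch rather than a drop-in implementation.

```c
#include <ruby.h>

/* Hypothetical module function, called from Ruby as MyMod.test("blahblablah").
 * The first parameter is always self; the Ruby argument comes second. */
static VALUE
mymod_test(VALUE self, VALUE input)
{
    /* Make sure we really have a String (coerces via to_str, or raises). */
    StringValue(input);

    /* Pointer and length travel together; the buffer is not treated
     * as a NUL-terminated C string, so no strlen() here. */
    const char *ptr = RSTRING_PTR(input);
    long len = RSTRING_LEN(input);

    /* Build a new Ruby string from exactly len bytes.
     * To return the length instead, use: return LONG2NUM(len); */
    return rb_str_new(ptr, len);
}

/* Entry point; the name must match the compiled library name
 * (assumed here to be "mymod"). */
void
Init_mymod(void)
{
    VALUE mod = rb_define_module("MyMod");
    /* 1 = one Ruby-level argument, in addition to the implicit self. */
    rb_define_module_function(mod, "test", mymod_test, 1);
}
```

With this in place, MyMod.test("blahblablah") should hand back a copy of the string instead of gibberish, because the function now reads its second parameter rather than self.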
Now to the harder part. If you dig deep into Ruby internals, you will find the function rb_newobj in gc.c. There you can see that any newly created Ruby object becomes a VALUE by being cast to one from something called the freelist, which is defined in terms of objspace.

You can imagine the objspace as a huge map that stores each and every object that is alive at a given point in time in your code. This is also where the garbage collector fulfills its duty, and the heap struct in particular is the place where new objects are born. The “freelist” of the heap is in turn declared as an RVALUE *. This is the C-internal representation of the Ruby built-in types. An RVALUE is basically a union of the core data types that Ruby knows about. Missing something? Yes: Fixnums, Symbols, nil and boolean values are not included there. That’s because these kinds of objects are represented directly in the unsigned long that a VALUE boils down to in the end. I think the design decision there was (besides being a cool idea) that dereferencing a pointer might be slightly less performant than the bit shifts that are needed to transform the VALUE into what it actually represents.

Essentially, the cast to VALUE says: give me whatever freelist currently points to and treat it as an unsigned long. This is safe because freelist is a pointer to an RVALUE, and a pointer can also be safely interpreted as an unsigned long. This implies that every VALUE except those carrying Fixnums, Symbols, nil or booleans is essentially a pointer to an RVALUE; the others are represented directly within the VALUE.

Your last question: how can you check what a VALUE stands for? You can use the TYPE(x) macro to check whether a VALUE’s type is one of the “primitive” ones.
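To make the “value inside the VALUE” idea concrete, here is a small standalone model in plain C. It is a simplified sketch, not Ruby’s actual headers: value_t, model_object and all model_* names are hypothetical, and only the Fixnum case is modelled (shift left by one, set the lowest bit). In a real extension you would rely on Ruby’s own macros such as FIXNUM_P, NIL_P or SPECIAL_CONST_P instead.

```c
#include <stdint.h>

/* Simplified model of a tagged VALUE; illustrative names only. */
typedef uintptr_t value_t;

#define MODEL_FIXNUM_FLAG ((value_t)0x01)

/* Pack a small integer directly into the word:
 * shift left by one and set the lowest bit as a tag. */
static value_t model_int2fix(long n)
{
    return ((value_t)n << 1) | MODEL_FIXNUM_FLAG;
}

/* Heap objects are aligned, so a real pointer has a clear lowest bit;
 * a set lowest bit therefore marks an integer stored in place. */
static int model_is_fixnum(value_t v)
{
    return (v & MODEL_FIXNUM_FLAG) != 0;
}

/* Undo the packing. Assumes two's complement, as on mainstream platforms. */
static long model_fix2int(value_t v)
{
    return (long)(((intptr_t)v - 1) / 2);
}

/* Stand-in for a heap-allocated object. */
struct model_object { int type; };

/* Any non-immediate value is simply the object's address. */
static value_t model_from_object(struct model_object *obj)
{
    return (value_t)obj;
}
```

In this model, model_int2fix(1) yields the word 3, and a word produced by model_from_object never passes model_is_fixnum, which is the distinction between “being the data” and “pointing to the data”.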