I am working on an application that takes input from a YAML file, parses it into objects, and lets them do their thing. The only problem I’m having now is that the YAML parser seems to ignore the objects’ `initialize` method. I was counting on the constructor to fill in any instance variables the YAML file was lacking with defaults, and to store some things in class variables. Here is an example:
```ruby
require 'yaml' # lowercase: 'YAML' fails on case-sensitive filesystems

class Test
  @@counter = 0

  def initialize(a, b)
    @a = a
    @b = b
    @a = 29 if @b == 3
    @@counter += 1
  end

  def self.how_many
    p @@counter
  end

  attr_accessor :a, :b
end

a = Test.new(2, 3)
s = a.to_yaml
puts s
b = YAML.load(s) # on Ruby 3.1+ (Psych 4) use YAML.unsafe_load(s)
puts b.a
puts b.b
Test.how_many
puts ""
c = Test.new(4, 4)
c.b = 3
t = c.to_yaml
puts t
d = YAML.load(t) # on Ruby 3.1+ (Psych 4) use YAML.unsafe_load(t)
puts d.a
puts d.b
Test.how_many
```
I would have expected the above to output:

```
--- !ruby/object:Test
a: 29
b: 3
29
3
2
--- !ruby/object:Test
a: 4
b: 3
29
3
4
```

Instead I got:

```
--- !ruby/object:Test
a: 29
b: 3
29
3
1
--- !ruby/object:Test
a: 4
b: 3
4
3
2
```
I don’t understand how it creates these objects without using their defined `initialize` method. I’m also wondering whether there is any way to force the parser to use the `initialize` method.
Deserializing an object from Yaml doesn’t use the `initialize` method because, in general, there is no correspondence between the object’s instance variables (which is what the default Yaml serialization stores) and the parameters to `initialize`.

As an example, consider an object whose `initialize` takes two parameters but which stores only a single instance variable, `@a_variable`. When an instance of this is deserialized, the Yaml processor has a value for `@a_variable`, but the `initialize` method requires two parameters, so it can’t call it. Even if the number of instance variables matches the number of parameters to `initialize`, it is not necessarily the case that they correspond, and even if they did, the processor doesn’t know the order in which they should be passed to `initialize`.

The default process for serializing and deserializing a Ruby object to Yaml is to write out all instance variables (with their names) during serialization, then, when deserializing, allocate a new instance of the class and simply set the same instance variables on this new instance. That is why, in your example, loading `d` leaves `@a` at 4 and never increments `@@counter`.
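To make the default process concrete, here is a small sketch using a hypothetical `Example` class shaped like the one described above (two `initialize` parameters, one instance variable). The class, the counter, and the manual `allocate` comparison are illustrative assumptions, not Psych internals. Loading uses `YAML.safe_load` with `permitted_classes`, which works on Psych 3.1+ and is required on Psych 4, where plain `YAML.load` refuses arbitrary classes.

```ruby
require 'yaml'

# Hypothetical class: initialize takes two parameters but only one
# instance variable survives, so @a_variable can't be mapped back to (x, y).
class Example
  @@initialize_calls = 0

  attr_reader :a_variable

  def self.initialize_calls
    @@initialize_calls
  end

  def initialize(x, y)
    @@initialize_calls += 1
    @a_variable = x + y
  end
end

yaml = Example.new(1, 2).to_yaml
puts yaml # only the instance variable is stored, as "a_variable: 3"

# Loading allocates a new Example and sets @a_variable directly;
# initialize is never called.
loaded = YAML.safe_load(yaml, permitted_classes: [Example])

# Conceptually the same as doing it by hand:
manual = Example.allocate
manual.instance_variable_set(:@a_variable, 3)

puts loaded.a_variable        # => 3
puts manual.a_variable        # => 3
puts Example.initialize_calls # => 1 (only from Example.new)
```

The counter staying at 1 is the same effect the question observes with `@@counter`.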
Of course, sometimes you need more control over this process. If you are using the Psych Yaml processor (the default since Ruby 1.9.3), you should implement the `encode_with` (for serialization) or `init_with` (for deserialization) methods as appropriate.

For serialization, Psych will call the `encode_with` method of an object if it is present, passing a `coder` object. This object allows you to specify how the object should be represented in Yaml; normally you just treat it like a hash.

For deserialization, Psych will call the `init_with` method if it is present on your object instead of using the default procedure described above, again passing a `coder` object. This time the `coder` will contain the information about the object’s representation in Yaml.

Note that you don’t need to provide both methods; you can provide just one if you want. If you do provide both, the `coder` object you get passed in `init_with` will essentially be the same as the one passed to `encode_with` after that method has run.

As an example, consider an object that has some instance variables that are calculated from others (perhaps as an optimization to avoid a large calculation) but that shouldn’t be serialized to the Yaml.
When you dump an instance of this class to Yaml, the output contains only the stored instance variables, without the calculated value. When you load this Yaml back into Ruby, the created object will still have the `@calculated` instance variable set, because `init_with` recomputes it.

If you wanted, you could call `initialize` from within `init_with`, but I think it would be better to keep a clear separation between initializing a new instance of the class and deserializing an existing instance from Yaml. I would recommend extracting the common logic into methods that can be called from both instead.
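Applied to the `Test` class from the question, that recommendation could look like this sketch. The shared logic lives in a hypothetical private `setup` method called from both `initialize` and `init_with`. `how_many` here returns the count instead of printing it so the result is easy to check. Because `Test` defines no `encode_with`, Psych still dumps `@a` and `@b` by default, and `init_with` reads them back by name.

```ruby
require 'yaml'

class Test
  @@counter = 0

  attr_accessor :a, :b

  def initialize(a, b)
    @a = a
    @b = b
    setup
  end

  # Psych calls this instead of setting instance variables directly
  # when it loads a !ruby/object:Test.
  def init_with(coder)
    @a = coder['a']
    @b = coder['b']
    setup
  end

  def self.how_many
    @@counter
  end

  private

  # Hypothetical helper holding the logic shared by both paths.
  def setup
    @a = 29 if @b == 3
    @@counter += 1
  end
end

c = Test.new(4, 4)
c.b = 3
d = YAML.safe_load(c.to_yaml, permitted_classes: [Test])
puts d.a           # => 29
puts d.b           # => 3
puts Test.how_many # => 2
```

The default applied in `setup` now runs on load as well, and `@@counter` counts loaded objects too, which matches what the question expected.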