I need to open a YAML file with aliases used inside it:
defaults: &defaults
  foo: bar
  zip: button
node:
  <<: *defaults
  foo: other
This obviously expands out to an equivalent YAML document of:
defaults:
  foo: bar
  zip: button
node:
  foo: other
  zip: button
This is how YAML::load reads it.
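As a minimal sketch of that expansion (assuming the YAML shown above is held in a string; the variable names are only for illustration), loading the document yields a merged hash with no trace of the `<<` key. Newer Psych versions disable aliases in `YAML.load` by default, so the sketch falls back to `unsafe_load` where it exists:

```ruby
require "yaml"

source = <<~YML
  defaults: &defaults
    foo: bar
    zip: button
  node:
    <<: *defaults
    foo: other
YML

# Psych 4 and later reject aliases in YAML.load unless told otherwise;
# unsafe_load allows them. Older versions only have the permissive load.
loader = YAML.respond_to?(:unsafe_load) ? :unsafe_load : :load
data = YAML.public_send(loader, source)

# The merge has already been applied: "zip" was copied in from defaults
# and the "<<" key itself is gone.
p data["node"]
p data["node"].key?("<<")
```

Once the data is in this form there is nothing left to tell a writer that `node` was originally built from the `defaults` anchor.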
I need to set new keys in this YAML document and then write it back out to disk, preserving the original structure as much as possible.
I have looked at YAML::Store, but this completely destroys the aliases and anchors.
Is there anything available that could do something along the lines of:
thing = Thing.load("config.yml")
thing[:node][:foo] = "yet another"
Saving the document back as:
defaults: &defaults
  foo: bar
  zip: button
node:
  <<: *defaults
  foo: yet another
?
I opted to use YAML for this because it handles this aliasing well, but writing YAML that contains aliases appears to be a bit of a bleak-looking playing field in reality.
The use of << to indicate an aliased mapping should be merged in to the current mapping isn’t part of the core Yaml spec, but it is part of the tag repository.
The current Yaml library provided by Ruby – Psych – provides the dump and load methods, which allow easy serialization and deserialization of Ruby objects and use the various implicit type conversions in the tag repository, including << to merge hashes. It also provides tools to do more low level Yaml processing if you need it. Unfortunately it doesn’t easily allow selectively disabling or enabling specific parts of the tag repository – it’s an all or nothing affair. In particular the handling of << is pretty baked in to the handling of hashes.
One way to achieve what you want is to provide your own subclass of Psych’s ToRuby class and override the method that builds hashes from mapping nodes, so that it just treats mapping keys of << as literals. This involves overriding a private method in Psych, so you need to be a little careful. You would then use it by parsing the Yaml into a tree and passing that tree to an instance of your subclass.
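A minimal sketch of such a subclass follows. The class name `ToRubyNoMerge` is hypothetical, and the overridden method is assumed to be the private `revive_hash(hash, node, tagged = false)` found in recent Psych releases, where the caller has already registered the mapping's anchor. Since this is private API, check the Psych source for the version you actually run:

```ruby
require "psych"

# Hypothetical subclass name. Overrides a private Psych method, assumed
# here to be revive_hash(hash, node, tagged = false).
class ToRubyNoMerge < Psych::Visitors::ToRuby
  private

  def revive_hash(hash, node, tagged = false)
    # Mapping children alternate key, value, key, value, ...
    # Store each pair as-is, so "<<" stays an ordinary key instead of
    # triggering a merge.
    node.children.each_slice(2) do |key_node, value_node|
      hash[accept(key_node)] = accept(value_node)
    end
    hash
  end
end

yaml = <<~YML
  defaults: &defaults
    foo: bar
    zip: button
  node:
    <<: *defaults
    foo: other
YML

tree = Psych.parse(yaml)                  # AST of the document
data = ToRubyNoMerge.create.accept(tree)  # Ruby hashes, nothing merged

p data["node"].keys
p data["node"]["<<"].equal?(data["defaults"])
```

The alias resolves to the very same Hash object as the anchored mapping, which is what lets a later dump emit an anchor and an alias again.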
With the Yaml from your example, data would then keep << as a literal key under data["node"] instead of having the defaults merged in. Also the hash under the data["defaults"] key is the same hash as the one under the data["node"]["<<"] key, i.e. they have the same object_id. You can now manipulate the data as you want, and when you write it out as Yaml the anchors and aliases will still be in place, although the anchor names will have changed. (Psych used the object_id of the hash to ensure unique anchor names; the current version of Psych uses sequential numbers rather than object_id.)
If you want to have control over the anchor names, you can provide your own Psych::Visitors::Emitter. For a simple case like your example, where there’s only the one anchor, the emitter can hard-code the anchor name. When used with the modified data hash from above, the output then keeps the defaults anchor and alias.
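The emitter idea can be sketched as below, under a few stated assumptions. The class name `DefaultsEmitter` is hypothetical, the data hash is built by hand to mirror the unmerged structure described above, and the anchor name is hard-coded because there is only one anchor. The scalar override is an addition of mine, not something the answer describes: it assumes some Psych versions may quote or tag a literal `<<` key when dumping, and resets that key to a plain scalar so it reads as a merge key again.

```ruby
require "psych"
require "stringio"

# Hypothetical emitter subclass that hard-codes the single anchor name.
class DefaultsEmitter < Psych::Visitors::Emitter
  ANCHOR = "defaults"

  def visit_Psych_Nodes_Mapping(node)
    node.anchor = ANCHOR if node.anchor
    super
  end

  def visit_Psych_Nodes_Alias(node)
    node.anchor = ANCHOR if node.anchor
    super
  end

  # Assumption: force a literal "<<" key back to a plain, untagged scalar
  # in case the tree builder quoted or tagged it.
  def visit_Psych_Nodes_Scalar(node)
    if node.value == "<<"
      node.tag = nil
      node.plain = true
      node.quoted = false
      node.style = Psych::Nodes::Scalar::PLAIN
    end
    super
  end
end

# Same shape as the unmerged data: one Hash object referenced twice.
defaults = { "foo" => "bar", "zip" => "button" }
data = {
  "defaults" => defaults,
  "node" => { "<<" => defaults, "foo" => "yet another" }
}

# Build an AST from the Ruby data structure.
builder = Psych::Visitors::YAMLTree.create
builder << data
ast = builder.tree

# Write the tree out through the custom emitter.
out = StringIO.new
DefaultsEmitter.new(out).accept(ast)
puts out.string
```

Hard-coding the name only works while the document has a single anchor. With several anchors you would need some way to map each anchored node back to its original name.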
(Update: another question asked how to do this with more than one anchor, where I came up with a possibly better way to keep anchor names when serializing.)