There seems to be quite a bit of folklore knowledge floating about in restricted circles about the pitfalls of hash-consing combined with marshalling-unmarshalling of data. I am looking for citable references to these tidbits.
For instance, someone once pointed me to library aterm and mentioned that the authors had clearly thought about this and that the representation on disk was bottom-up (children of a node come before the node itself in the data stream). This is indeed the right way to do things when you need to re-share each node (with a possible identical node already in memory). This re-sharing pass needs to be done bottom-up, so the unmarshalling itself might as well be, too, so that it’s possible to do everything in a single pass.
I am in the process of describing difficulties encountered in our own context, and the solutions we found. I would appreciate any citable reference to the kind of aforementioned folklore knowledge. Some people obviously have encountered the problems before (the aterm library is only one example). But I didn’t find anything in writing. Even the little piece of information I have about aterm is hear-say. I am not worried it’s not reliable (you can’t make this up), but “personal communication” and “look how it’s done in the source code” are considered poor form in citations.
I have enough references on hash-consing alone. I am only interested in references where it interferes with other aspects of programming, such as marshalling or distribution.
OK, this is not much more use, but Andrew Kennedy wrote a functional pearl called simply Pickling Combinators, which appears in the Journal of Functional Programming, (2004), 14:6:727-739. There is extensive discussion of structure sharing and how it is handled in pickles, but no direct discussion of how this problem might relate to hash-consing in the implementation of the language. But the article does discuss structure sharing in memory as well as in a pickle, so I hope it is better than nothing.
Martin Elsman had a follow-on paper in 2005 in Trends in Functional Programming; the title is Type-specialized serialization with sharing. The article deals primarily with hash-consing by the unpickler (deserializer), not with hash-consing in the impelementation, but again it may be worth something.
The JFP paper is proprietary, but there appears to be a preprint on Andrew’s web page.
Elsman’s paper appears to be available through Google Scholar at http://tinyurl.com/yd5tw2b.
(In a previous life, I worked on a project to create ASCII pickles that people could read and edit. I stupidly failed to publish it, but I have retained an interest.)