Concrete use case: There is an abstraction for binary data, which is widely used to handle binary blobs of arbitrary size. Since the abstraction was created without any thought about things outside the VM, existing implementations rely on the garbage collector for their life cycle.
Now I want to add a new implementation that uses off-heap storage (e.g. a temporary file). Since there is a lot of existing code that uses the abstraction, introducing additional methods for explicit life cycle management is impractical: I can’t rewrite every client use case to ensure it meets the new life cycle requirements.
I can think of two solution approaches, but can’t decide which one is better:
a.) Use of finalize() to manage the associated resource’s life cycle (e.g. the temporary file is deleted in finalize()). This seems very simple to implement.
b.) Use of a reference queue and java.lang.Reference (but which one, weak or phantom?) with some extra object that deletes the file when the reference is enqueued. This seems to be a bit more work to implement: I would need to create not only the new implementation, but also separate out its cleanup data and ensure the cleanup object can’t be GC’d before the object that has been exposed to the user.
c.) Some other method I haven’t thought of?
Which approach should I take (and why should I prefer it)? Implementation hints are also welcome.
Edit: Degree of reliability required: for my purpose it’s perfectly fine if a temporary file is not cleaned up in case the VM terminates abruptly. The main concern is that while the VM runs, it could very well fill up the local disk (over the course of a few days) with temporary files. This has happened to me for real with Apache Tika, which created temporary files when extracting text from certain document types (zip files were the culprit, I believe). I have a periodic cleanup scheduled on the machine, so if a file occasionally escapes cleanup it isn’t the end of the world, as long as it doesn’t happen regularly within a short interval.
As far as I could determine, finalize() works with the Oracle JRE. And if I interpret the javadocs correctly, References must work as documented (there is no way an only softly/weakly reachable object is not cleared before an OutOfMemoryError is thrown). This would mean that while the VM may decide not to reclaim a particular object for a long time, it has to do so at the latest when the heap gets full. In turn, this means only a limited number of my file-based blobs can exist on the heap. The VM has to clean them up at some point, or it would definitely run out of memory. Or is there any loophole that allows the VM to run OOM without clearing references (assuming they aren’t strongly referenced anymore)?
Edit2: As far as I can see at this point, both finalize() and Reference should be reliable enough for my purposes, but I gather Reference may be the better solution, since its interaction with the GC can’t revive dead objects and thus its performance impact should be lower?
Edit3: Solution approaches which rely on VM termination or startup (shutdown hook or similar) are not of use to me, since typically the VM runs for extended periods of time (server environment).
Here’s a relevant item from Effective Java: Avoid finalizers
Contained within that item is a recommendation to do just what @delnan suggests in a comment: provide an explicit termination method. Plenty of examples are provided as well: InputStream.close(), Graphics.dispose(), etc. I understand that the cows may have already left the barn on that one…
At any rate, here’s a sketch of how this might be accomplished with reference objects. First, an interface for binary data:
Next, a file-based implementation:
Then, a factory to create and track the file-based blobs:
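A minimal sketch of these three pieces together might look like the following. The names `Blob`, `FileBlob`, `FileBlobFactory` and `Tracker` are hypothetical stand-ins, not the existing abstraction, and the sketch assumes a phantom reference is sufficient because the cleanup code only needs the file path and never the blob itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the existing binary-data abstraction.
interface Blob {
    long length() throws IOException;
    InputStream openStream() throws IOException;
}

// File-backed implementation; it contains no cleanup logic of its own.
final class FileBlob implements Blob {
    private final Path file;

    FileBlob(Path file) { this.file = file; }

    @Override public long length() throws IOException { return Files.size(file); }

    @Override public InputStream openStream() throws IOException {
        return Files.newInputStream(file);
    }
}

// Creates file-backed blobs and deletes each temp file once its blob is unreachable.
final class FileBlobFactory {
    // Remembers the path; a phantom reference never hands the blob back out.
    private static final class Tracker extends PhantomReference<Blob> {
        final Path file;

        Tracker(Blob blob, Path file, ReferenceQueue<Blob> queue) {
            super(blob, queue);
            this.file = file;
        }
    }

    private final ReferenceQueue<Blob> queue = new ReferenceQueue<>();
    // Strong references to the trackers, so they cannot be collected before their blobs.
    private final Set<Tracker> trackers = ConcurrentHashMap.newKeySet();

    Blob create(byte[] data) throws IOException {
        drain(); // opportunistic cleanup on every creation
        Path file = Files.createTempFile("blob", ".bin");
        Files.write(file, data);
        Blob blob = new FileBlob(file);
        trackers.add(new Tracker(blob, file, queue));
        return blob;
    }

    // Deletes the files of all blobs the GC has enqueued so far; returns the count.
    int drain() {
        int deleted = 0;
        Reference<? extends Blob> ref;
        while ((ref = queue.poll()) != null) {
            Tracker tracker = (Tracker) ref;
            trackers.remove(tracker);
            try {
                if (Files.deleteIfExists(tracker.file)) deleted++;
            } catch (IOException e) {
                // Assumption: leave the file to the periodic machine-level cleanup.
            }
        }
        return deleted;
    }

    // Number of blobs whose files have not been cleaned up yet.
    int tracked() { return trackers.size(); }
}
```

Polling the queue inside `create()` keeps the sketch free of extra threads, but it means files are only removed when new blobs are created; a daemon thread blocking on `queue.remove()` would be an alternative. Note also that a stream returned by `openStream()` does not keep its blob reachable, so the file may be deleted (or, on some platforms, fail to delete) while a stream is still open.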
Finally, a test that includes some artificial GC “pressure” to get things going:
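As a rough illustration of what artificial GC pressure can look like, the sketch below drops the only strong reference to an object and then allocates garbage and requests collections until the phantom reference shows up on the queue. `GcPressureDemo` and `awaitEnqueue` are hypothetical names, a plain `Object` stands in for a blob, and `System.gc()` is only a hint, so the timing (and on some VM configurations the outcome) is not guaranteed.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

final class GcPressureDemo {
    // Returns true if the phantom reference was enqueued before the timeout expired.
    static boolean awaitEnqueue(long timeoutMillis) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        PhantomReference<Object> ref = new PhantomReference<>(referent, queue);
        referent = null; // drop the only strong reference

        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            byte[] garbage = new byte[1 << 20]; // allocate 1 MiB of short-lived garbage
            System.gc();                        // a request, not a guarantee
            Reference<?> polled = queue.remove(50); // wait up to 50 ms for the GC
            if (polled == ref) {
                return true;
            }
        }
        return false;
    }
}
```

In a real test the same loop would be run against the blob factory, checking afterwards that the temporary files are gone.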
Which should produce some output like: