I have a lost of sentences generated from http://www.ywing.net/graphicspaper.php, a random computer graphics paper title generator, some of example sentences sorted are as following:
- Abstract Ambient Occlusion using Texture Mapping
- Abstract Ambient Texture Mapping
- Abstract Anisotropic Soft Shadows
- Abstract Approximation
- Abstract Approximation of Adaptive Soft Shadows using Culling
- Abstract Approximation of Ambient Occlusion using Hardware-accelerated Clustering
- Abstract Approximation of Distributed Surfaces using Estimation
- Abstract Approximation of Geometry for Texture-mapped Ambient Occlusion
- Abstract Approximation of Mipmaps for Opacity
- Abstract Approximation of Occlusion Fields for Subsurface Scattering
- Abstract Approximation of Soft Shadows using Reflective Texturing
- Abstract Arbitrary Rendering
- Abstract Attenuation and Displacement Mapping of Geometry
- Abstract Attenuation of Ambient Occlusion using View-dependent Texture Mapping
- Abstract Attenuation of Light Fields for Mipmaps
- Abstract Attenuation of Non-linear Ambient Occlusion
- Abstract Attenuation of Pre-computed Mipmaps using Re-meshing
– …
I would like to try reverse engineering the grammar behind and learn how to do it in some sort of ways, like in common lisp way or NLTK way. Any ideas about that?
— Drake
This seems to be an interesting problem. How ever, I was under the impression that it is not easy to guess a generator from it’s generated sequence of bits. What you can get is a model that may be or may not be a close approximation of the original generator. The approximation will be closer when a large number of sequences generated is processed.
A simple technique would be to create a parse tree and create a vocabulary in each portion of the tree.
Some thing like this:
xxxx -> list of vocabularies
yyyy -> list of vocabularies