I have a Subversion repository, and I would like to extract detailed information about the history of who edited what and when and how. I know I can run svn log --xml -v to produce a nice easy-to-use record of which paths were altered in each revision. But what I would also like to find out is the size of the edits to each file.
I know there are many ways to define “edit distance”, but I will be happy with anything simple like “number of lines that are different” for text files.
Presumably I can get all this by parsing the output of svnadmin dump, but then I’d need to spend time learning about the dump file format, which I’d rather avoid if I can.
The svn dump format, it turns out, is very easy to parse. If I generate it with
svnadmin dump --deltas
then the dump file contains deltas for each file modification, and I can reasonably take the size of the delta (in bytes) to be the edit distance.
In case anyone comes looking here, here is a simple Python script which takes an svn dump file and prints out an XML file containing all the properties. The edit sizes are contained in the //path/Text-content-length entries.
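For anyone who only wants the edit sizes, here is a minimal sketch of the parsing idea. It is not the script mentioned above, and it prints nothing as XML. It assumes the dump is a sequence of records, each a block of "Key: value" header lines ended by a blank line and followed by exactly Content-length bytes of body. The function names read_records and edit_sizes are made up for this example.

```python
import io


def read_records(stream):
    """Yield the headers of each record in an svn dump stream as a dict.

    Assumption: every record is a block of "Key: value" lines ended by a
    blank line, followed by Content-length bytes of body (if that header
    is present). The stream must be opened in binary mode.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if line == b"\n":
            continue  # blank separator between records
        headers = {}
        while line and line != b"\n":
            text = line.decode("utf-8", errors="replace").rstrip("\n")
            key, _, value = text.partition(": ")
            headers[key] = value
            line = stream.readline()
        # Skip the body (properties and/or file delta); only sizes matter here.
        stream.read(int(headers.get("Content-length", 0)))
        yield headers


def edit_sizes(stream):
    """Yield (revision, path, delta size in bytes) for each changed file."""
    revision = None
    for headers in read_records(stream):
        if "Revision-number" in headers:
            revision = int(headers["Revision-number"])
        elif "Node-path" in headers and "Text-content-length" in headers:
            size = int(headers["Text-content-length"])
            yield revision, headers["Node-path"], size
```

To use it, open the dump in binary mode and loop over the results, for example: for rev, path, size in edit_sizes(open("repo.dump", "rb")): print(rev, path, size). The file name repo.dump is just a placeholder. Records with no text content, such as deletions and directory changes, are skipped.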