Is there anyone using git in such a fashion?
I would like to distribute some multimedia content from a server to some remote Android devices, and have them send back a log file with device usage statistics (provided by an Android app I will write).
The server could be anything, but I would prefer a Linux box.
Since Git handles and syncs only the differences between files, I thought it would be a nice tool for this purpose, and I would get content revision history as a bonus.
I need some advice on how the repository architecture could be organized: does it have to be a star topology, or something different?
The remote end of the system doesn't need any interactivity; in other words, the remote Git repository could pull and push whatever it needs to, autonomously and automatically.
UPDATE: I've found here on SO the author of Git Internals (I'm downloading it right now), Scott Chacon, talking about the architecture I would like to implement.
UPDATE 2: OK I read the chapter about “Non-SCM uses of Git” and here is what the author says about a Peer to Peer CDN:
You have to get new content […] consist of any combination of xml files, images, animations, text and sound. You need to build a content distribution framework that will easily and efficiently transfer all the necessary content to the machines on your network. You need to constantly determine what content each machine has and what it needs to have and transfer the difference as efficiently as possible.[…] It turns out that Git is an excellent solution to this problem.
I couldn't find anything in the book about quoting small portions of it, so I hope I'm not violating any copyright. In any case, I will delete it if someone complains.
So in a previous job, we used Git for exactly this. Our media assets did not change often, so no matter what we used, we would likely have had to send the whole file anyway. The issue with deltifying binary files, which also affects other content distribution tools, was therefore not important.
The main advantage over rsync (and presumably unison, though I've never used it) is that you can build the content trees in the index and store them in Git under a branch per client, rather than having to have everything on disk to run rsync on. If you have several variations on content, it's pretty cool to be able to record the unique tree of content needed by each client (of which you could have thousands of combinations) and have a simple pull command fetch only what's needed and update it on the client. That was the reason we chose Git instead of rsync. If every client needs exactly the same set of data, perhaps rsync would be easier. However, the other nice thing about Git is that you get a history of the content on each client: when and how it changed for every single client.
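To make the branch-per-client idea concrete, here is a minimal sketch in Python that drives Git's plumbing commands. It is not the setup described above, just one way it could be done. The function names, the `clients/<name>` branch naming and the placeholder identity are all assumptions made for illustration. Blobs are written to the object store once, and each client's tree is assembled in a throwaway index, so nothing has to be checked out on the server.

```python
import os
import subprocess
import tempfile

# Placeholder identity so commit-tree works without any Git config (assumption).
IDENTITY = {
    "GIT_AUTHOR_NAME": "content-bot", "GIT_AUTHOR_EMAIL": "bot@example.invalid",
    "GIT_COMMITTER_NAME": "content-bot", "GIT_COMMITTER_EMAIL": "bot@example.invalid",
}


def git(repo, *args, env=None, stdin=None):
    """Run a git command inside `repo` and return its stripped stdout."""
    full_env = {**os.environ, **IDENTITY, **(env or {})}
    result = subprocess.run(["git", "-C", repo, *args], input=stdin,
                            env=full_env, check=True, capture_output=True)
    return result.stdout.decode().strip()


def store_blob(repo, data):
    """Write raw bytes into the object store and return the blob id."""
    return git(repo, "hash-object", "-w", "--stdin", stdin=data)


def publish_client_tree(repo, client, entries):
    """Commit `entries` (path -> blob id) to the branch clients/<client>."""
    ref = f"refs/heads/clients/{client}"
    with tempfile.TemporaryDirectory() as tmp:
        # A private index file: the tree is built without touching any work tree.
        env = {"GIT_INDEX_FILE": os.path.join(tmp, "index")}
        for path, blob in entries.items():
            git(repo, "update-index", "--add", "--cacheinfo",
                f"100644,{blob},{path}", env=env)
        tree = git(repo, "write-tree", env=env)
    # Empty output means the branch does not exist yet (first publish).
    parent = git(repo, "for-each-ref", "--format=%(objectname)", ref)
    args = ["commit-tree", tree, "-m", f"content update for {client}"]
    if parent:
        args += ["-p", parent]
    commit = git(repo, *args)
    git(repo, "update-ref", ref, commit)
    return commit
```

A device would then only need something like `git pull origin clients/device-001` (a hypothetical branch name) to fetch just the objects it is missing, and `git log` on that branch gives the per-client content history.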
We also used it to record log files. Since they are generally pretty uniform and text-based, they delta excellently and transfer very efficiently, and we were very happy using Git to record our log data and transfer it back upstream.
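The log direction could look roughly like the following sketch, again in Python and again only an illustration under assumptions: the file name `usage.log`, the `logs/<device>` branch naming, the remote name `origin` and the placeholder identity are all hypothetical. Each device appends to its log in a local clone, commits, and pushes to its own branch on the server, so devices never conflict with each other.

```python
import os
import subprocess

# Placeholder identity so commits work on a device with no Git config (assumption).
IDENTITY = {
    "GIT_AUTHOR_NAME": "device", "GIT_AUTHOR_EMAIL": "device@example.invalid",
    "GIT_COMMITTER_NAME": "device", "GIT_COMMITTER_EMAIL": "device@example.invalid",
}


def git(repo, *args):
    """Run a git command inside `repo` and return its stripped stdout."""
    result = subprocess.run(["git", "-C", repo, *args],
                            env={**os.environ, **IDENTITY},
                            check=True, capture_output=True)
    return result.stdout.decode().strip()


def push_log(workdir, device, line, remote="origin"):
    """Append one line to usage.log, commit it, and push it upstream."""
    with open(os.path.join(workdir, "usage.log"), "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    git(workdir, "add", "usage.log")
    git(workdir, "commit", "--quiet", "-m", f"usage log update from {device}")
    # Each device pushes to its own branch, so pushes are always fast-forwards.
    git(workdir, "push", "--quiet", remote, f"HEAD:refs/heads/logs/{device}")
```

On the server, `git show logs/device-001:usage.log` would then print the latest log for that (hypothetical) device, and the branch history records when each batch arrived.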