I’m building a web application with Ruby on Rails that needs to be highly scalable. In this application, a mobile client produces roughly 20 bytes of data every second. All of this data must be transferred to a server at some point, preferably as soon as possible.
To accomplish this task, I want the server to act as a RESTful service. The client could buffer locations (say, for 5 to 30 seconds) and then send them off as an HTTP PUT request, which the server would then store. I believe this model is simpler to implement and handles high-volume traffic better, as the clients could keep buffering data until they hear a response from the server.
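Here is a rough sketch of the client-side buffering I have in mind, written in Python only for brevity. The names `LocationBuffer` and `send_batch` are made up for illustration, and the transport is passed in as a function, so the same logic would apply whether the batch goes out over HTTP or a raw socket. Readings are removed from the buffer only after the server acknowledges them.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, List


class LocationBuffer:
    """Holds readings until the server confirms it has stored them."""

    def __init__(self, send_batch: Callable[[List[bytes]], bool], max_batch: int = 30):
        # send_batch is a hypothetical transport hook: it should return True
        # only when the server acknowledged the batch (e.g. a 2xx response).
        self._send_batch = send_batch
        self._max_batch = max_batch
        self._pending: Deque[bytes] = deque()

    def add(self, reading: bytes) -> None:
        """Queue one reading (about 20 bytes in my case)."""
        self._pending.append(reading)

    def flush(self) -> int:
        """Send pending readings in batches; return how many were acknowledged."""
        acknowledged = 0
        while self._pending:
            batch = list(islice(self._pending, self._max_batch))
            if not self._send_batch(batch):
                break  # keep the data and try again on the next flush
            for _ in batch:
                self._pending.popleft()
            acknowledged += len(batch)
        return acknowledged

    def __len__(self) -> int:
        return len(self._pending)
```

The client would call `add` once a second and `flush` on whatever interval we settle on; if the server is slow or unreachable, the readings simply stay queued.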
My boss, on the other hand, wants to implement the server using socket programming. He believes socket programming will result in less data being transferred, which will increase the total efficiency of the system. I can’t disagree on this point, but I think that, given modern bandwidth, the extra overhead of HTTP is worth it. Plus, I think trying to maintain thousands (or millions) of simultaneous connections with users will cause its own problems and greatly increase the complexity of the server.
Honestly, I don’t know the right approach to this problem, so I thought I’d post it here and get the opinions of much smarter people than myself. I’d appreciate it if any answers included the pros and cons of the proposed solution.
Thanks.
Update
We now have a few additional requirements fleshed out. First, the mobile client cannot upload more than 5 GB of data per month. In this case, we’re talking about one message a second for eight hours a day, every day of the month. Second, we want to combine messages as little as possible. This ensures that if something happens to the mobile client (say, a car crash), we lose as little data as possible.
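To put the cap in perspective, here is a back-of-the-envelope calculation. It assumes a 30-day month and decimal gigabytes (1 GB = 10^9 bytes); neither is an official part of the requirement, so treat the results as rough.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 8               # from the requirement above
DAYS_PER_MONTH = 30             # assumption: a 30-day month
MONTHLY_CAP_BYTES = 5 * 10**9   # assumption: 5 GB in decimal gigabytes
PAYLOAD_BYTES = 20              # approximate size of one message

# One message per second, eight hours a day, every day of the month.
messages_per_month = SECONDS_PER_HOUR * HOURS_PER_DAY * DAYS_PER_MONTH

# Raw payload alone, with no protocol overhead at all.
payload_per_month = messages_per_month * PAYLOAD_BYTES

# How many bytes each message may cost on the wire before hitting the cap.
budget_per_message = MONTHLY_CAP_BYTES // messages_per_month

print(messages_per_month)   # 864000
print(payload_per_month)    # 17280000
print(budget_per_message)   # 5787
```

So the payload itself is only about 17 MB a month, and each message has a budget of a few kilobytes, overhead included, before the cap is reached.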
Your boss appears to be optimizing prematurely, which is not really a good idea.
Instead of trying to fight an imaginary performance bogeyman before you’ve even started writing your code, you should examine your application’s requirements and design to them. Don’t let perceived problems drive your design.
If it comes to it, have your boss outline exactly how he’d marshal data across his socket connection and then do some quick calculations to see if you could match or beat them with HTTP. Will he use something like Google’s Protocol Buffers, or write his own marshaling protocol? If so, will it be self-describing? How about application “verbs” like what you’d get for free in HTTP? Will his connections be persistent? There’s a lot more to “sockets” than just opening a connection and spewing bytes down it.
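A quick calculation along those lines might look like the sketch below. The two overhead figures are placeholders I made up for illustration (a few hundred bytes of HTTP headers per request and response, a small length prefix for a custom framed protocol), and the sketch ignores TCP/IP and TLS overhead entirely. Replace them with numbers measured from real traffic before drawing conclusions.

```python
PAYLOAD_BYTES = 20  # approximate size of one reading, from the question

# Assumed per-request overheads; measure real traffic to replace these.
HTTP_OVERHEAD_BYTES = 500    # hypothetical request + response headers
SOCKET_OVERHEAD_BYTES = 4    # hypothetical length prefix on a custom frame


def bytes_per_reading(overhead_per_request: int, batch_size: int) -> float:
    """Average bytes on the wire per reading when readings are batched."""
    return PAYLOAD_BYTES + overhead_per_request / batch_size


# Batch sizes of 1, 5 and 30 match the buffering intervals in the question.
for batch in (1, 5, 30):
    http = bytes_per_reading(HTTP_OVERHEAD_BYTES, batch)
    sock = bytes_per_reading(SOCKET_OVERHEAD_BYTES, batch)
    print(f"batch={batch:2d}  http={http:6.1f} B  socket={sock:5.1f} B")
```

The point is less the exact figures than the shape: the per-request overhead is amortized across the batch, so the gap between the two approaches narrows as the batch grows.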
You’ve also correctly noted that your boss seems to be favoring the raw speed of sockets over everything else: scalability, maintainability, availability of development and testing tools, protocol sniffers, the helpful semantics of the HTTP verbs, and so on. HTTP is well understood by load balancers, firewalls, and the like. Your proprietary socket protocol will not be so lucky.
What I’d suggest is you look into all the options out there and evaluate them from a performance perspective through testing, prototyping and benchmarking. Then weigh those numbers against the difficulty of building and maintaining the application with that technology.