I’m currently trying to grasp the whole cloud thing, and I have already read through a lot of similar questions here on Stack Overflow.
What I am trying to build is something like a high-I/O storage service. It’s going to receive A LOT of data through FTP (we are talking 50 to 100 Mbit/sec constantly) and then run some post-processing on some of the data received.
The application is currently being written in C# for deployment on a Windows Azure VPS. I’m writing my own simple FTP server for the maximum level of control and security (like my own authentication process). This is not a problem, as I’ve become quite skilled with socket servers and high-performance .NET applications.
HOWEVER, those have always run as a single instance. It’s always been about squeezing more performance out of a single Windows Service / Console Application running on some VPS.
I must face the facts this time though. No matter how big that VM gets, the data can quickly overwhelm the server’s I/O capacity as the volume grows (it’s generated by customers, so more customers = more data!).
So how would you go about doing load balancing in the cloud? I have read about “cloud services” with “cloud workers” and so on, but I think it just gets so ** complex, and the pricing seems so blurred when I’m going to be using storage from one service, a database from another and the compute from a third type of service, while factoring in bandwidth and other stuff. I would really like to just keep it simple, in an environment that I know and am confident working with. So VPS it is.
But how should I do the load balancing? It’s my first time ever and I know it’s quite an ambitious project, but I really just want to learn!
To sum it up: Load balancing a custom FTP application written in C# running on Windows Azure VPS. Every instance of the application/service should have access to the same storage and database. No inter-instance communication is needed.
So throw everything you got at me and I will try to keep up. 🙂
Roles
You seem somewhat confused about what Roles are, so let me do a quick aside on that. A Role is basically a template for a VM: it defines the code and configuration that each instance runs. It’s a little bit like what a Puppet or Chef script is. There are a few types of Roles, including the Worker Role and the VM Role.
Load Balancing
In simple cases (e.g., web serving) Azure will automatically load balance incoming requests between your machines. It should work just as well for any service that involves requests coming in from the internet. I think this may be more complicated with FTP PASV mode, since the client then opens a second connection for the data transfer and that connection has to reach the same instance as the control connection, but I’m not an expert on that topic. There is an open source project for FTP on Azure, as well as a blog post on the topic. Perhaps their approach can be applied to your custom FTP software.
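To make the PASV concern a bit more concrete, here is a minimal sketch of the reply a server sends when a client asks for passive mode. The reply format comes from RFC 959; the class name, the method names and the idea of giving every instance its own block of data ports are my own assumptions for illustration, and it only helps if the platform can forward a distinct port range to each instance.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper, not part of any Azure or FTP library.
public static class PassiveMode
{
    // Builds the RFC 959 reply "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)".
    // publicAddress must be the IPv4 address clients can actually reach
    // (the public, load-balanced one, not the VM's internal address).
    public static string BuildReply(IPAddress publicAddress, int dataPort)
    {
        if (publicAddress.AddressFamily != AddressFamily.InterNetwork)
            throw new ArgumentException("PASV only supports IPv4.", nameof(publicAddress));
        if (dataPort < 1 || dataPort > 65535)
            throw new ArgumentOutOfRangeException(nameof(dataPort));

        byte[] h = publicAddress.GetAddressBytes();
        int p1 = dataPort / 256; // high byte of the port
        int p2 = dataPort % 256; // low byte of the port
        return $"227 Entering Passive Mode ({h[0]},{h[1]},{h[2]},{h[3]},{p1},{p2})";
    }

    // Assumption: each instance owns a disjoint block of data ports, so a data
    // connection to a given port can only belong to one instance.
    // basePort and portsPerInstance are illustrative defaults.
    public static int PortForInstance(int instanceIndex, int slot,
                                      int basePort = 50000, int portsPerInstance = 100)
    {
        if (instanceIndex < 0)
            throw new ArgumentOutOfRangeException(nameof(instanceIndex));
        if (slot < 0 || slot >= portsPerInstance)
            throw new ArgumentOutOfRangeException(nameof(slot));
        return basePort + instanceIndex * portsPerInstance + slot;
    }
}
```

With these defaults, instance 1 would announce ports 50100 to 50199, so the port number alone tells you which machine a data connection is meant for.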
From what you’ve said, I think you could use a Worker Role for your FTP servers. In your WorkerRole.cs file you would just start up your FTP code, and away you go. You could also spawn threads or processes to have each VM do double-duty as a post-processor for the uploaded data. You could do all that in a VM Role if you wanted; it’s just a question of which involves more work for you.
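As a rough illustration of the double-duty idea, the sketch below hands finished uploads from the FTP code to a background post-processing loop inside the same process. Everything named here (DoubleDutyHost, OnUploadCompleted and so on) is hypothetical, and it deliberately leaves out the Azure SDK; in a Worker Role you would presumably create and start something like this from the role's Run method.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical host class: the FTP server and the post-processor share one process.
public sealed class DoubleDutyHost
{
    // Bounded queue: if processing falls behind, Add blocks and the FTP side
    // slows down instead of the queue growing without limit.
    private readonly BlockingCollection<string> _pending =
        new BlockingCollection<string>(boundedCapacity: 1000);
    private readonly Action<string> _process;

    public DoubleDutyHost(Action<string> process)
    {
        _process = process ?? throw new ArgumentNullException(nameof(process));
    }

    // Called by the FTP code once an upload has been fully received.
    public void OnUploadCompleted(string fileName) => _pending.Add(fileName);

    // Starts the post-processing loop on its own thread. The returned task
    // completes after Shutdown() has been called and the queue has drained.
    public Task StartProcessing() =>
        Task.Factory.StartNew(() =>
        {
            foreach (string fileName in _pending.GetConsumingEnumerable())
                _process(fileName);
        }, TaskCreationOptions.LongRunning);

    // Signals that no more uploads will be queued.
    public void Shutdown() => _pending.CompleteAdding();
}
```

Since you said no inter-instance communication is needed, each VM can run its own queue like this without knowing about the others.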
Storage
This is a textbook use case for Blob Storage. The uploaded files should definitely go there. It sounds like the different billing model is confusing you a bit, but Blob Storage is pretty cheap. You can take a look at the Azure Pricing Calculator.
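To get a number to plug into the calculator, you can convert the 50 to 100 Mbit/sec from your question into a daily volume. This is only back-of-the-envelope arithmetic, assuming the rate really is sustained around the clock and using decimal units (1 TB = 10^12 bytes); the class name is made up for the example.

```csharp
using System;

// Hypothetical helper for a rough capacity estimate.
public static class IngestEstimate
{
    // Converts a sustained rate in Mbit/s into terabytes received per day.
    public static double TerabytesPerDay(double megabitsPerSecond)
    {
        double bytesPerSecond = megabitsPerSecond * 1_000_000 / 8; // 8 bits per byte
        return bytesPerSecond * 86_400 / 1e12;                     // 86,400 seconds per day
    }
}
```

That works out to roughly 0.54 TB per day at 50 Mbit/sec and 1.08 TB per day at 100 Mbit/sec, so somewhere around 16 to 32 TB over a 30-day month if nothing is deleted.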