
The Archive Base

Editorial Team
Asked: May 26, 2026

This is something I am really curious about, and I do not really understand how it is possible.

So let's say I am the owner of Facebook (ahah) and I have millions of people visiting my website every day, with thousands and thousands of images, videos, logs, etc.

How do I store all this data?

Do I have multiple databases on different servers around the world, and then connect to them from a single location?

Do I use an internal API system that requests info from other servers where the data is stored?

For example, I know that Facebook has a lot of data centers around the world and hundreds of servers.

How do they connect to these servers? Are the profiles stored in different locations, so that when I connect to my profile I will be using that specific server? Or is there one main server supported by hundreds of other servers around the world?

Is there a way to use PHP to connect to different servers and different MySQL (?) databases to store and retrieve data whenever I want?

Sorry if this looks like a silly question, but since I might one day work on a successful website, I really want to know what I will have to do and what the logic behind it is.

Thank you very much.

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 10:48 am

    I’ll try to answer your (big) question, but not from Facebook’s point of view, since their architecture is already fairly well known.

    The first thing you have to know is that you will need to distribute the workload of your web application. The question is how. To determine what’s going to be slow, you have to divide your app into segments.

    First up is the HTTP server, the one that accepts all the requests. By going to “www.your-facebook.com”, you’re contacting a service at an IP address. Naturally, you would probably have more than one IP, but let’s say you have a single entry point.

    Now what happens? You have HTTP server software, let’s say Apache, and it handles incoming connections. Since Apache dedicates a worker (a process or a thread, depending on the MPM it is configured with) to each connection, every connection requires a certain amount of memory. Eventually the server runs out of memory, and then shit hits the fan: stuff stops working and your site is unavailable.
    Therefore, you have to somehow scale this part of your application that connects your PHP code / MySQL db to people who want to interact with it.
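    To make the memory argument concrete, here is a back-of-the-envelope sketch. The RAM and per-worker figures are made-up assumptions, not Apache defaults; the only point is that if each connection pins one worker, the number of simultaneous connections one machine can hold is roughly memory divided by worker size, and beyond that you need more machines.

```python
import math

# Hypothetical figures for illustration; measure your own server before relying on them.
RAM_FOR_WORKERS_MB = 8 * 1024  # memory left over for Apache workers (assumed)
MB_PER_WORKER = 25             # resident size of one worker with PHP loaded (assumed)


def max_concurrent_connections(ram_mb: int, mb_per_worker: int) -> int:
    """Each open connection pins one worker, so concurrency is capped by memory."""
    return ram_mb // mb_per_worker


def servers_needed(concurrent_users: int, per_server_limit: int) -> int:
    """Web servers required if every server can hold `per_server_limit` connections."""
    return math.ceil(concurrent_users / per_server_limit)


limit = max_concurrent_connections(RAM_FOR_WORKERS_MB, MB_PER_WORKER)
print(limit)                          # 327
print(servers_needed(10_000, limit))  # 31
```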

    Let’s assume you successfully scaled your Apache and you have a cluster of computers that can take on new machines as you scale out. You’ve solved your first problem.
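    The answer doesn’t say how requests get spread across such a cluster. One common approach (an assumption here, not something the answer prescribes) is a load balancer in front of the web servers; in practice that is a dedicated tool such as HAProxy or nginx. A minimal round-robin sketch, with hypothetical host names:

```python
from itertools import count


class RoundRobinBalancer:
    """Hands each incoming request to the next web server in the pool, in turn."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._counter = count()

    def add_server(self, server):
        # Scaling out: a new machine simply joins the rotation.
        self.servers.append(server)

    def pick(self):
        return self.servers[next(self._counter) % len(self.servers)]


lb = RoundRobinBalancer(["web1", "web2"])  # hypothetical host names
print([lb.pick() for _ in range(4)])  # ['web1', 'web2', 'web1', 'web2']
lb.add_server("web3")
print([lb.pick() for _ in range(3)])  # ['web2', 'web3', 'web1']
```

    Round-robin ignores how busy each server is; real balancers also offer strategies such as least-connections, but the scale-out idea is the same.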

    The next part is the layer that does the actual work: it accepts input from the user and saves it somewhere (MySQL). That’s where you’ll have your biggest problem. Why?
    Because of the database.

    Databases store their data on media such as hard drives. Hard drives, be it an SSD or a mechanical one, are limited in how fast they can write or retrieve data. RAM is far faster: if I’m not mistaken, it operates at transfer rates of around 6 GB/s, and its access time is much, much lower than an HDD’s seek time.

    Therefore, if X users are asking for a piece of information and you can only deliver it at a certain rate, your app crashes or becomes unresponsive: the layer handling database queries slows down because the hardware cannot match the speed at which you need the data.
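    Why “a certain rate” matters so much: as long as queries arrive faster than storage can answer them, the queue of waiting requests keeps growing. Here is a toy model with hypothetical rates; it ignores caching, bursts and timeouts.

```python
def backlog_after(seconds: int, arrivals_per_sec: int, served_per_sec: int) -> int:
    """Requests still waiting after `seconds`, assuming steady arrival and service rates."""
    backlog = 0
    for _ in range(seconds):
        backlog += arrivals_per_sec               # new queries this second
        backlog -= min(backlog, served_per_sec)   # what the storage manages to answer
    return backlog


# Hypothetical storage that can answer 1,000 queries per second.
print(backlog_after(60, arrivals_per_sec=900, served_per_sec=1000))   # 0
print(backlog_after(60, arrivals_per_sec=1200, served_per_sec=1000))  # 12000
```

    Below capacity the queue stays empty; just 20% above capacity, 12,000 requests are waiting after one minute, which users experience as an unresponsive site.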

    What are the options here? There are many; I won’t mention all of them:

    1. Split reads and writes. Set up your database layer so that dedicated machines write the data and completely different ones read it. You have to use replication for this, and replication has its own quirks: expect it to break from time to time.

    2. Optimize the handling of your data set by sharding your data. It’s great for read/write performance, but a pain when you need to query multiple shards and merge the results.

    3. Get better hardware, especially storage (such as FusionIO).

    4. Pay for a better storage engine (such as TokuDB).

    5. Alleviate load on the database by using caching. The data your users request probably doesn’t change so often that you have to query the db every single time (say you’re viewing someone’s profile: what’s the chance they’ll change it every second?). That’s why Facebook uses Memcached extensively. It stores small pieces of data in RAM, it’s easily scalable, and, most importantly, it’s damn quick!

    6. Use different solutions alongside MySQL. MySQL (like some other databases) isn’t good for every type of data storage or retrieval. Someone mentioned NoSQL before. NoSQL solutions are quick, but still immature, and they don’t do as much as relational databases do. Many of them delay disk writes (they keep a cached copy of the data they need to write in RAM) so that they can achieve fast insert rates. That’s why it’s not unusual to lose data when using NoSQL.
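    Option 1 (splitting reads and writes) usually comes down to a routing decision in the application or a proxy. A minimal sketch, assuming one primary and a list of replicas with hypothetical host names; looking only at the first SQL keyword is a simplification, and real setups must also handle transactions and replication lag.

```python
from itertools import cycle


class ReadWriteRouter:
    """Sends writes to the primary and spreads reads over the replicas."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self._replicas = cycle(replicas)  # rotate reads across replicas

    def route(self, sql: str) -> str:
        verb = sql.lstrip().split(None, 1)[0].upper()  # first keyword of the statement
        if verb in self.WRITE_VERBS:
            return self.primary
        return next(self._replicas)


router = ReadWriteRouter("db-primary", ["db-replica1", "db-replica2"])
print(router.route("INSERT INTO posts (body) VALUES ('hi')"))  # db-primary
print(router.route("SELECT * FROM posts"))                      # db-replica1
print(router.route("select * from users"))                      # db-replica2
```

    Because replicas lag behind the primary, a user who just wrote something may not see it on the next read; that is one of the replication quirks mentioned above.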
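    Option 2 (sharding) is also where the original question’s “connect to different MySQL databases” idea fits: the application decides which database holds a given user. A minimal sketch, assuming shards keyed by user ID with hypothetical shard names; it also shows why cross-shard queries hurt, since the application has to run them everywhere and merge the results itself.

```python
import hashlib

SHARDS = ["db-shard0", "db-shard1", "db-shard2", "db-shard3"]  # hypothetical hosts


def shard_for(user_id: int, shards: list[str] = SHARDS) -> str:
    """Map a user to a shard with a stable hash, so the same user always lands on the same shard."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def top_posts_across_shards(per_shard_rows: dict[str, list[tuple[int, int]]],
                            limit: int) -> list[tuple[int, int]]:
    """A query spanning shards runs on every shard; the application merges and re-sorts.

    Each row is a hypothetical (post_id, likes) pair.
    """
    merged = [row for rows in per_shard_rows.values() for row in rows]
    return sorted(merged, key=lambda row: row[1], reverse=True)[:limit]


print(shard_for(42))  # always the same shard for user 42
print(top_posts_across_shards({"db-shard0": [(1, 5), (2, 9)], "db-shard1": [(3, 7)]}, limit=2))
# [(2, 9), (3, 7)]
```

    MD5 is used here only to spread keys evenly, not for security. With plain modulo, adding a shard remaps most users, which is why schemes such as consistent hashing or a lookup directory are often used instead.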
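    Option 5 is typically done as a “cache-aside” pattern: check the cache, fall back to the database on a miss, then store the result. This sketch uses an in-process dictionary with expiry as a stand-in for Memcached; it does not show any real Memcached client API, and `load_from_db` is a hypothetical callback.

```python
import time


class TTLCache:
    """Stand-in for Memcached: values expire `ttl` seconds after being stored."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:  # stale: drop it and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)


def get_profile(user_id, cache, load_from_db):
    """Cache-aside: try the cache first, hit the database only on a miss."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = load_from_db(user_id)  # the expensive query
        cache.set(key, profile)
    return profile


db_calls = []

def fake_db(user_id):  # hypothetical database lookup
    db_calls.append(user_id)
    return {"id": user_id, "name": "Alice"}

cache = TTLCache(ttl=60)
get_profile(7, cache, fake_db)
get_profile(7, cache, fake_db)
print(len(db_calls))  # 1
```

    The TTL is the trade-off: a longer one takes more load off the database but serves stale profiles for longer.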
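    The trade-off in option 6 (keeping writes in RAM and flushing later) can be shown with a hypothetical write-behind buffer. This is not how any particular NoSQL store works internally; it only illustrates why inserts become fast and why anything still in the buffer is lost if the process dies before `flush()`.

```python
import os
import tempfile


class WriteBehindLog:
    """Keeps inserts in RAM and writes them to disk in batches (a write-behind buffer)."""

    def __init__(self, path: str, batch_size: int = 100):
        self.path = path
        self.batch_size = batch_size
        self.buffer: list[str] = []

    def insert(self, record: str) -> None:
        self.buffer.append(record)  # fast path: no disk I/O at all
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        with open(self.path, "a", encoding="utf-8") as f:
            f.write("\n".join(self.buffer) + "\n")
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to put the bytes on the physical disk
        self.buffer.clear()


path = os.path.join(tempfile.mkdtemp(), "inserts.log")
log = WriteBehindLog(path, batch_size=3)
for i in range(5):
    log.insert(f"row {i}")
# rows 0-2 were flushed when the batch filled; rows 3-4 exist only in RAM,
# so a crash right now would lose them.
```

    A larger batch means fewer expensive disk syncs per insert but a bigger window of data that a crash can wipe out.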

    The topic of MySQL vs. “insert database or whatever here” is broad and I don’t want to go into it, but remember: practically every data store out there eventually saves its data to the hard drive. The physical difference is how each one optimizes flushing to the disk itself.

    I also didn’t mention the various reports you can run on the gathered data (how many men aged 19 to 21 clicked advert X between 01:15 and 13:37 CET, and so on), which is what Facebook is actually collecting (scary stuff!).

    Third up is the language gluing together the data store (MySQL) and the output (HTTP server): PHP.

    As you can see, most of the work here is already done by Apache and MySQL. Optimization at the PHP level yields little; even Facebook got modest results with HipHop, their PHP-to-C++ compiler (they claim 50%, but that’s UP TO 50%). I tried HipHop extensively, and it is not as fast as it claims to be. Naturally, the Facebook engineers have mentioned that already, so it’s no surprise. The advantage they get comes from replacing Apache with their own server built into HipHop. Some people claim “language X is better than language Y”, and sometimes they’re right, but not always: each language has its own advantages and disadvantages.

    For example, PHP is widespread, but it’s slow for certain operations (implementing a trie with over 1 billion entries, for example). It’s great for things like echoing some HTML after parsing the output from the db. It’s quick to insert and retrieve data from the database, and that’s about 90% of PHP usage: talk to the db, display the data, done.

    Therefore, no matter what language you use (say, C++ instead of PHP), your bottleneck will be the data storage/retrieval layer.

    On the other hand, why is using C++ NOT handy? Because more people know PHP than C++, and it’s MUCH slower to develop web apps in C++. Sure, they will execute faster, but who will notice the difference between 1 millisecond and 1 microsecond?
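    A rough way to check the “bottleneck is the database” argument: if a request spends most of its time waiting on storage, making the application code 1000 times faster (1 ms becoming 1 µs) barely changes the total. The 40 ms database time below is a made-up assumption, as is the 1000x speedup.

```python
def request_time_ms(db_ms: float, app_ms: float, app_speedup: float = 1.0) -> float:
    """Time for one request when only the application-language part gets faster."""
    return db_ms + app_ms / app_speedup


# Hypothetical request: 40 ms waiting on MySQL, 1 ms running PHP code.
php = request_time_ms(db_ms=40, app_ms=1)
cpp = request_time_ms(db_ms=40, app_ms=1, app_speedup=1000)  # assumed: 1 ms of app work becomes 1 µs
saving = (php - cpp) / php
print(f"{php:.3f} ms vs {cpp:.3f} ms, a {saving:.1%} improvement")
```

    Under these assumptions the rewrite saves well under 3% per request, while speeding up the 40 ms database part would pay off far more.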

    This post is more like an informative blog post. I know it’s not backed by resources, but anyone who has done any work with larger data sets or websites will know that the P.I.T.A. is always the data storage component. Some of what I said probably won’t fit everyone, but in a NUTSHELL, this is how you’d go about optimizing your site.

Related Questions

I have encountered something weird (probably not, it's more likely that I don't really
hope to get some help here because this is something that really makes me
A really simple shell script question. I have a file with something like this:
This is a small doubt and something that shouldn't really come to one's mind.So
This is something I am curious about since I learnt how to invoke an
I am sure this is something really simple, but I am going bonkers trying
This is probably something really simple, however I am quite new to PHP, and
This is probably something really simple but I cant see what! Any images I
I know SQL well but I must be missing something really dumb here. This
This might be something very straight forward and I really think it should work

© 2021 The Archive Base. All Rights Reserved