Many databases I’ve encountered (like SQL Server) use a single file to store the entire database. This seems to be a pretty common approach. What are the advantages of storing the entire database in a single file, as opposed to breaking up the data into more logical units, such as a single table per file?
Also, how does a database work internally? How does it handle concurrent writes to the same file by different threads? In most applications I’ve seen, you can only have one open write handle on a file at a time. How do the various database engines handle the concurrent writes?
A single non-fragmented large file can be treated by the server application much like a raw disk is treated by the operating system: a random-seekable block of bytes. The database server could, if it chose to, implement an entire file system on top of that block of bytes, if there was a benefit to implementing tables as separate files.
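As a rough illustration of the "random-seekable block of bytes" idea, here is a minimal Python sketch that treats one preallocated file as an array of fixed-size pages. The `PagedFile` class, the `PAGE_SIZE` value and the file name are all hypothetical; real engines use their own page layouts and add caching, checksums and logging on top.

```python
import os
import tempfile

PAGE_SIZE = 4096  # hypothetical page size; real engines choose their own


class PagedFile:
    """Treats one file as an array of fixed-size pages."""

    def __init__(self, path, page_count):
        # Preallocate the file so every page has a fixed, known offset.
        with open(path, "wb") as f:
            f.truncate(page_count * PAGE_SIZE)
        self._f = open(path, "r+b")

    def write_page(self, page_no, data):
        if len(data) > PAGE_SIZE:
            raise ValueError("data does not fit in one page")
        # Seek straight to the page; no other part of the file is touched.
        self._f.seek(page_no * PAGE_SIZE)
        self._f.write(data.ljust(PAGE_SIZE, b"\x00"))

    def read_page(self, page_no):
        self._f.seek(page_no * PAGE_SIZE)
        return self._f.read(PAGE_SIZE)

    def close(self):
        self._f.close()


def demo_paged_file():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    db = PagedFile(path, page_count=8)
    # Pages can be written in any order, much like blocks on a raw disk.
    db.write_page(5, b"row data for table A")
    db.write_page(2, b"row data for table B")
    page = db.read_page(5)
    db.close()
    return page.rstrip(b"\x00")
```

Which table owns which page would be recorded in the engine's own metadata pages, which is what makes the single file behave like a small private file system.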
Concurrent writes to different sections of the same file are not a problem. The database uses locking strategies to make sure that multiple threads aren’t trying to modify the same section of the file at the same time. This is one of the main reasons that database transactions exist: to isolate the visible effects of one transaction from another.
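To show that writes to different sections of one file can proceed in parallel, here is a small Python sketch in which several threads write to disjoint regions through a single file descriptor. It assumes a Unix-like system, since `os.pwrite` and `os.pread` are not available on Windows; `REGION_SIZE`, the file name and the helper functions are made up for the example.

```python
import os
import tempfile
import threading

REGION_SIZE = 16  # hypothetical size of each writer's region, in bytes


def write_region(fd, region_no, payload):
    # os.pwrite writes at an explicit offset and does not use the shared
    # file position, so threads cannot disturb each other's seeks.
    os.pwrite(fd, payload.ljust(REGION_SIZE, b"."), region_no * REGION_SIZE)


def demo_concurrent_writes(writers=4):
    path = os.path.join(tempfile.mkdtemp(), "shared.db")
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        os.ftruncate(fd, writers * REGION_SIZE)
        threads = [
            threading.Thread(target=write_region, args=(fd, i, b"writer-%d" % i))
            for i in range(writers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # Read the whole file back to see every writer's region.
        return os.pread(fd, writers * REGION_SIZE, 0)
    finally:
        os.close(fd)
```

No lock is needed here only because the regions never overlap. As soon as two writers could target the same bytes, the engine has to coordinate them.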
For example, a database server might keep track of which rows in which tables have been accessed by which in-flight transactions; when a transaction retires, the rows it had touched are released so that they can be freely accessed by other transactions. In this scenario, other transactions might simply block (i.e. wait) when they try to access rows that are currently part of another transaction. If the other transaction doesn’t complete within a reasonable (configurable) time, then the waiting transaction might be aborted. Often the reason for such a timeout is a deadlock. The application using the database can then choose, if it wants, to retry the transaction.
This locking could be implemented using semaphores or other synchronization mechanisms, depending on the performance tradeoffs.
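The row-tracking scheme described above could be sketched roughly as follows in Python. This is only an illustration, not how any particular engine does it: `RowLockManager`, `LockTimeout` and the `(table, row id)` keys are hypothetical names, a plain `threading.Lock` stands in for whatever synchronization primitive the engine would really use, and each transaction is assumed to be driven by one thread at a time.

```python
import threading


class LockTimeout(Exception):
    """Raised when a row lock cannot be obtained in time."""


class RowLockManager:
    def __init__(self):
        self._guard = threading.Lock()  # protects the two dicts below
        self._row_locks = {}            # row key -> threading.Lock
        self._held = {}                 # transaction id -> set of row keys

    def acquire(self, txn_id, row_key, timeout=1.0):
        with self._guard:
            if row_key in self._held.get(txn_id, set()):
                return  # this transaction already holds the row
            lock = self._row_locks.setdefault(row_key, threading.Lock())
        # Wait outside the guard so that other rows stay available.
        if not lock.acquire(timeout=timeout):
            raise LockTimeout(f"txn {txn_id} timed out waiting for {row_key}")
        with self._guard:
            self._held.setdefault(txn_id, set()).add(row_key)

    def release_all(self, txn_id):
        # Called when the transaction commits or aborts ("retires").
        with self._guard:
            for row_key in self._held.pop(txn_id, set()):
                self._row_locks[row_key].release()
```

A caller would catch `LockTimeout`, call `release_all` for its own transaction to roll back, and then decide whether to retry, which mirrors the abort-and-retry behaviour described above.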