There are 12 million posts already, and people seem to be using it like a chat. I don’t know whether it’s more efficient to have a bunch of little tables than to have the database look up the last 10 messages in a single table with so many rows. I know I’d have to benchmark, but I’m asking whether anyone has observations or anecdotes from similar situations.
Edit: adding the schema:
create table reply(
    id int(11) unsigned not null auto_increment,
    thread_id int(10) unsigned not null default 0,
    ownerId int(9) unsigned not null default 0,
    ownerName varchar(20),
    profileId int(9) unsigned,
    profileName varchar(50),
    creationDate datetime,
    ip int unsigned,
    pic varchar(255) default '',
    reply text,
    index(thread_id),
    primary key(id)
) ENGINE=MyISAM;
I assume that “thread” here means a thread in a pool of postings.
The way you are going to get long-term scalability here is to develop an architecture in which you can have multiple database instances, and avoid having queries that need to be performed across all instances.
Creating multiple tables on the same DB won’t really help in terms of scalability. (In fact, it might even reduce throughput … due to increasing the load on the DB’s caches.) But it sounds like in your application you could partition into “pools” of messages in different databases, provided that you can arrange that a reply to a message goes into the same pool as the message it replies to.
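To illustrate the "same pool as the message it replies to" idea, here is a minimal sketch that picks a pool from thread_id alone. The pool names and the modulo rule are assumptions made for illustration, not something from the question; a real deployment would map each name to a separate database instance and would need a plan for changing the number of pools.

```python
# Hypothetical pool names; each would correspond to its own database instance.
POOLS = ["pool_0", "pool_1", "pool_2", "pool_3"]


def pool_for_thread(thread_id: int) -> str:
    """Choose a pool using only thread_id, so a whole thread lives in one pool."""
    return POOLS[thread_id % len(POOLS)]


def pool_for_reply(reply: dict) -> str:
    """A reply is routed by the thread it belongs to, never by its own id."""
    return pool_for_thread(reply["thread_id"])


if __name__ == "__main__":
    first = {"id": 1, "thread_id": 42, "reply": "original message"}
    later = {"id": 9001, "thread_id": 42, "reply": "a reply much later"}
    # Both rows resolve to the same pool because they share thread_id.
    print(pool_for_reply(first), pool_for_reply(later))
```

Because routing depends only on thread_id, fetching the last 10 replies of a thread touches exactly one pool. The trade-off is that simple modulo routing remaps threads if the pool count changes.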
The problem that arises is that certain operations will involve querying across the data in all DB instances. In this case, that might be listing all of a user’s messages, or doing a keyword search. So you really have to look at the entire picture to figure out how best to partition. You need to analyze all of the queries, taking account of their relative frequencies. And at the end of the day, the solution might involve denormalizing the schema so that the database can be partitioned.
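The difference between a single-pool query and a cross-pool query can be sketched as follows. This is only an illustration: in-memory SQLite databases stand in for separate MySQL instances, the table is a cut-down version of the reply schema, and the helper names (make_pool, pool_for, add_reply, last_replies, replies_by_owner) are hypothetical. It also assumes reply ids are globally unique and supplied by the caller, since a per-instance auto_increment would collide across pools.

```python
import sqlite3

NUM_POOLS = 3  # hypothetical number of database instances


def make_pool() -> sqlite3.Connection:
    """Create one stand-in 'instance' holding a reduced reply table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table reply ("
        " id integer primary key,"
        " thread_id integer not null,"
        " ownerId integer not null,"
        " reply text)"
    )
    conn.execute("create index reply_thread on reply(thread_id)")
    return conn


pools = [make_pool() for _ in range(NUM_POOLS)]


def pool_for(thread_id: int) -> sqlite3.Connection:
    """All rows of a thread go to the pool chosen by thread_id."""
    return pools[thread_id % NUM_POOLS]


def add_reply(reply_id: int, thread_id: int, owner_id: int, text: str) -> None:
    pool_for(thread_id).execute(
        "insert into reply (id, thread_id, ownerId, reply) values (?, ?, ?, ?)",
        (reply_id, thread_id, owner_id, text),
    )


def last_replies(thread_id: int, limit: int = 10) -> list:
    """Frequent query: newest replies of one thread; touches a single pool."""
    return pool_for(thread_id).execute(
        "select id, reply from reply where thread_id = ?"
        " order by id desc limit ?",
        (thread_id, limit),
    ).fetchall()


def replies_by_owner(owner_id: int) -> list:
    """Rarer query: one user's replies; has to ask every pool and merge."""
    rows = []
    for conn in pools:
        rows.extend(
            conn.execute(
                "select id, thread_id, reply from reply where ownerId = ?",
                (owner_id,),
            ).fetchall()
        )
    return sorted(rows)


add_reply(1, 1, 7, "a")
add_reply(2, 2, 7, "b")
add_reply(3, 1, 8, "c")
print(last_replies(1))
print(replies_by_owner(7))
```

If the per-user listing turned out to be frequent, one way to avoid the fan-out would be to keep a second, denormalized copy of each reply keyed by ownerId, at the cost of writing every reply twice.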