I am developing a issue logger for my project and am running into an issue when analyzing the logged data. The problem being that this table grows very fast and that the filters used to search on data in the table can vary in almost every way, seeing as we’re not always interested in the same fields. So indexes aren’t really an option.
The table is currently on a MySQL database, with the following structure:
CREATE TABLE `log_issues` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`id_user` int(11) DEFAULT NULL,
`type` varchar(50) NOT NULL,
`title` varchar(100) NOT NULL DEFAULT '',
`message` mediumtext NOT NULL,
`debug` mediumtext,
`duration` float DEFAULT NULL,
`date` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `date` (`date`,`title`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Now my question is, how can I run queries on this table when it has millions of entries without having to wait forever for a result? For example just filtering on the id of a user takes forever. I know I can place an index on the id_user portion, but I might want to combine it with other fields, or due to the way the query is generated by the tool that views these logs it might not utilize the indexes properly.
I think I might be better off using MongoDB or a different NoSQL database, but I don’t have any experience with them. Do document based database have an easier time filtering a large dataset without indexes or will I always be stuck with this problem no matter the database?
To summarize:
I have a table with a large amount of data, indexes can’t be used (least not if they need to be ordered) and I need to get results without waiting for over 10 seconds. What technologies can I use?
Any suggestions would be much appreciated.
First you should decide whether you want to remain in RDBMSes land or not. Nowadays it doesn’t make much sense for most scenarios, especially ones with complex data structure or a requirement to scale a lot.
You may want to check RavenDB. You can get a prototype working with it in hours, including the initial learning of concepts there.
Indexes are required anywhere, definitely in any NoSQL too. The real question is how easy it is to create and maintain them. With RavenDB you get hands-free indexing; indexes are created automatically as you go, based on the type of queries you make. It is recommended to pre-define them to reduce staleness, but yet they are the same indexes also when they’re created automatically.
I see in another answer you tackled the issue with Mongo; well, with Raven you don’t HAVE to define indexes yet they will be created for you.