Which way of storing large amounts of data would be more effective for fast searches and reporting?
{
  website: "google.com",
  description: "google is a search engine",
  visits: [
    {date: 1334565455, referrer: "http://bing.com"},
    {date: 1334565455, referrer: "http://bing.com"},
    {date: 1334565455, referrer: "http://bing.com"},
    {date: 1134565455, referrer: "http://bing.com"},
    {date: 1334542455, referrer: "http://bing.com"},
    {date: 1334555455, referrer: "http://bing.com"},
    {date: 1334575455, referrer: "http://bing.com"},
    {date: 1324565455, referrer: "http://bing.com"},
    {date: 1334565455, referrer: "http://bing.com"}
  ]
}
Or should I use the traditional approach, where visits are stored in a separate table with the site id as a reference?
It depends. If you have many visits from certain sites, those documents will grow really fast. Eventually, they will be too big to load.
On the other hand, if your reporting tool always needs to load all visits, splitting them into several documents reduces performance.
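To make the trade-off concrete, here is a minimal in-memory sketch of the two layouts, using plain Python dicts rather than a real database. The field names `_id` and `site_id` and both helper functions are assumptions made for illustration; they are not taken from the question.

```python
# Embedded layout: one document per site, visits stored inside it.
site_embedded = {
    "_id": 1,  # hypothetical id field
    "website": "google.com",
    "visits": [
        {"date": 1334565455, "referrer": "http://bing.com"},
        {"date": 1334542455, "referrer": "http://bing.com"},
    ],
}

# Referenced layout: visits live in their own collection and point
# back to the site through a (hypothetical) site_id field.
sites = [{"_id": 1, "website": "google.com"}]
visits = [
    {"site_id": 1, "date": 1334565455, "referrer": "http://bing.com"},
    {"site_id": 1, "date": 1334542455, "referrer": "http://bing.com"},
    {"site_id": 2, "date": 1334555455, "referrer": "http://bing.com"},
]


def visits_embedded(site):
    """One read: the visits arrive together with the site document."""
    return site["visits"]


def visits_referenced(all_visits, site_id):
    """Second lookup: filter the separate collection by site id."""
    return [v for v in all_visits if v["site_id"] == site_id]
```

The embedded version gets everything in one read but the document keeps growing; the referenced version keeps each document small at the cost of a second lookup.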
Try to balance the two goals. For example, if the array visits grows too big, create another document and save its document id as continuedIn. That way, you can limit the size of each document but still keep a lot of information together.
You can also try to group visits by day (i.e. one document contains all visits on a certain day) if your reporting tool aggregates by day anyway. That way, documents can still grow, but only for a single day rather than forever.
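Both ideas can be sketched in a few lines. The example below is only an illustration under stated assumptions: `store` is a plain dict standing in for a collection, the tiny `MAX_VISITS_PER_DOC` limit, the `part-N` id scheme, and the `website:YYYY-MM-DD` bucket id are all made up for the demo, and timestamps are assumed to be Unix seconds interpreted as UTC.

```python
from datetime import datetime, timezone

MAX_VISITS_PER_DOC = 3  # hypothetical limit, kept tiny for the demo


def add_visit(store, head_id, visit, max_visits=MAX_VISITS_PER_DOC):
    """Append a visit to the last document of the chain.

    If that document is full, create an overflow document and link
    it from the full one through continuedIn.
    """
    doc, part = store[head_id], 0
    while doc.get("continuedIn") is not None:
        doc = store[doc["continuedIn"]]
        part += 1
    if len(doc["visits"]) >= max_visits:
        new_id = f"{head_id}/part-{part + 1}"  # hypothetical id scheme
        store[new_id] = {"website": doc["website"], "visits": [], "continuedIn": None}
        doc["continuedIn"] = new_id
        doc = store[new_id]
    doc["visits"].append(visit)


def all_visits(store, head_id):
    """Yield every visit by following the continuedIn links."""
    doc_id = head_id
    while doc_id is not None:
        doc = store[doc_id]
        yield from doc["visits"]
        doc_id = doc.get("continuedIn")


def day_bucket_id(website, timestamp):
    """Document id for the alternative 'one document per site per day' layout."""
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{website}:{day}"


store = {"google.com": {"website": "google.com", "visits": [], "continuedIn": None}}
for ts in (1334565455, 1334565456, 1334565457, 1334565458):
    add_visit(store, "google.com", {"date": ts, "referrer": "http://bing.com"})
```

After the loop, the first document holds three visits and points to an overflow document holding the fourth. A report that needs everything walks the chain with `all_visits`, while one that only needs recent data can stop at the first document.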
Lastly, you could stop recording after N visits (say 100,000). What's the point of knowing whether you had 100,001 or 100,015 visits?
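A cap like that is a one-line check before the append. This is a minimal sketch on a plain dict; the function name `record_visit` and its `limit` parameter are assumptions for illustration.

```python
MAX_RECORDED_VISITS = 100_000  # the N from the text above


def record_visit(doc, visit, limit=MAX_RECORDED_VISITS):
    """Store the visit only while the document holds fewer than `limit` visits.

    Returns True if the visit was stored, False if it was dropped.
    """
    if len(doc["visits"]) >= limit:
        return False
    doc["visits"].append(visit)
    return True


site = {"website": "google.com", "visits": []}
# A tiny limit is used here only to show the behaviour.
results = [
    record_visit(site, {"date": ts, "referrer": "http://bing.com"}, limit=2)
    for ts in (1334565455, 1334542455, 1334555455)
]
```

With a limit of 2, the first two calls store their visit and the third is silently dropped, so the document stops growing once the cap is reached.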
Note: Depending on your location, recording IP addresses and referrer information over a longer period of time may only be legal if you have the written permission of each visitor. And even when it's legal in your country, some visitors are sensitive to sites that track them. Sure, they can't do much about it, except stop visiting you, configure their web browser to stop talking to your server, or post negative comments on blogs and forums.