I am currently working on a system that generates product recommendations like those on Amazon: "People who bought this also bought this..."
Current scenario:
- Extract the client's Google Analytics data and insert it into the database.
- On the client's website, when a product page loads, an API call is made to get recommendations for the product being viewed.
- When the API receives the product ID in the request, it looks in the database, retrieves the recommended product IDs (using association rules), and sends them back as the response.
- The list of product IDs is processed at the client end to get the product details (image, price, ...) and displayed on the website.
- Currently I am using PHP and MySQL with the gapi package and a REST API, with storage on Amazon EC2.
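To make the lookup step concrete, here is a minimal sketch in Python rather than the PHP/MySQL code actually in use. It assumes the association rules are precomputed and stored one rule per row. The table name `association_rules`, its columns, and the sample rows are all hypothetical, and SQLite only stands in for MySQL so the example runs on its own.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of precomputed rules (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE association_rules ("
        " antecedent_id TEXT, consequent_id TEXT, confidence REAL)"
    )
    conn.executemany(
        "INSERT INTO association_rules VALUES (?, ?, ?)",
        [
            ("P1", "P2", 0.60),
            ("P1", "P3", 0.45),
            ("P1", "P4", 0.10),
            ("P2", "P1", 0.30),
        ],
    )
    return conn


def recommend(conn, product_id, limit=5):
    """Return recommended product IDs for product_id, highest confidence first."""
    rows = conn.execute(
        "SELECT consequent_id FROM association_rules"
        " WHERE antecedent_id = ? ORDER BY confidence DESC LIMIT ?",
        (product_id, limit),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(recommend(conn, "P1", limit=2))
```

The API handler would then only need to serialize the returned list as its response. A product with no stored rules simply yields an empty list.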
My question is:
If I have to choose among the following, which would be the best choice for implementing the concept described above?
- PHP with SimpleDB or BigQuery.
- R language with BigQuery.
- RHIPE (R and Hadoop) with SimpleDB.
- Apache Mahout.
Please help!
This isn’t so easy to answer, because the constraints are fairly specialized.
The following considerations can be made, though:
As a result, this eliminates the 1st, 2nd, and 4th options.
What I don’t quite get is why a real-time server would need Hadoop and RHIPE. That work belongs in the batch processing that develops the recommendation models, not in the real-time path. I suppose you could use RHIPE as a simple one-stop front end for firing off queries.
I’d recommend using RApache instead of RHIPE, because you can get your packages and models pre-loaded. I see no advantage to using Hadoop in the front end, but it would be a very natural back-end system for the model fitting.
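To illustrate what the batch side might compute, here is a rough sketch of mining pairwise "bought X, also bought Y" rules from purchase baskets. It is plain Python rather than R or Hadoop, it is not necessarily the algorithm you are using, and the function name, thresholds, and sample baskets are made up for the example. Here "support" is the number of baskets containing both items, and "confidence" is that count divided by the number of baskets containing X.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(baskets, min_support=2, min_confidence=0.5):
    """Return (antecedent, consequent, confidence) rules for item pairs.

    Illustrative only: pairs are counted in a single in-memory pass,
    which is the part a batch system would distribute over a large log.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), together in pair_counts.items():
        if together < min_support:
            continue
        # Each frequent pair can yield a rule in both directions.
        for x, y in ((a, b), (b, a)):
            confidence = together / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, confidence))

    # Highest confidence first; ties broken by item IDs for stable output.
    rules.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rules


if __name__ == "__main__":
    demo = [["P1", "P2"], ["P1", "P2", "P3"], ["P1", "P3"], ["P2"]]
    for rule in mine_pair_rules(demo):
        print(rule)
```

The output of a job like this is what would be written to the database, so the real-time API only has to do a keyed lookup.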
(Update 1) Other interface options include Rserve (http://www.rforge.net/Rserve/) and possibly RStudio in server mode. There are R/PHP interfaces (see comments below), but I suspect it would be better to access R through HTTP or TCP/IP.
(Update 2) Addressing the whole process, the basic idea I see is that you could query the data from PHP and pass it to R. If you would rather query from within R, look at the link in the comments (to the OmegaHat tools) or post a new question about R & SimpleDB; I’m sure someone else on SO would be able to give better insight on this particular connection. RApache would let you instantiate many R processes already prepared with packages loaded and data in RAM, so you would only need to pass whatever data is needed for the prediction. If your new data is a small vector, RApache should be fine, and that seems to be the case for the data being processed in real time.
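The "load once, then pass only a small input per request" pattern is not specific to R. As a hedged illustration only, the sketch below shows the same idea in Python with the standard-library WSGI server. It is not RApache code: the `MODEL` contents, the `product_id` query parameter, and the `serve` helper are invented for the example.

```python
import json
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Loaded once when the process starts and kept in RAM, analogous to an R
# process with packages and data pre-loaded. The contents are demo data.
MODEL = {"P1": ["P2", "P3"], "P2": ["P1"]}


def app(environ, start_response):
    """WSGI handler: each request carries only a product ID."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    product_id = query.get("product_id", [""])[0]
    payload = {
        "product_id": product_id,
        "recommendations": MODEL.get(product_id, []),
    }
    body = json.dumps(payload).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"),
         ("Content-Length", str(len(body)))],
    )
    return [body]


def serve(host="127.0.0.1", port=8000):
    """Run the demo server until interrupted (not called automatically)."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` and requesting `/?product_id=P1` would return the preloaded recommendations as JSON without reloading anything per request.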