I am doing research on text processing and mining. The principle is simple: we collect all posts made on a specific date, for example “2011Jan01”. We do not care which client posted the content; we only care about when it was posted. For example, if five clients posted thoughts about products in our forum on “2011Jan01”, we delete the client information and combine the contents of their posts.
However, we have a large forum, so thousands of people may post long or short threads every day. If we combine them, a single day could come to tens of thousands or even hundreds of thousands of lines.
We would like to use a database such as MySQL to build a table that stores this text so we can mine it later. Our first idea for the table is quite simple:
Table
Date combinedPostContents
2011Jan01 "blablalbla everything from clients, lot of contents"
Is this simple design reasonable? Or should we save the contents in local text files, each named after the date we collected them? Which one is better?
Thanks a lot in advance, Gurus!! :)
Mining free text to find out what customers think about products will be VERY difficult. You will definitely want to use a database, and you really should add some sort of rating system for the products they are reviewing.
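To make the database option concrete, here is a minimal sketch of the two-column table from the question. It is only an illustration under some assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the names daily_posts, post_date, combined_contents, open_store, add_post and contents_for are hypothetical. Dates are written as ISO strings such as "2011-01-01" (also an assumption) so that they sort in calendar order.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) daily_posts table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_posts ("
        " post_date TEXT PRIMARY KEY,"
        " combined_contents TEXT NOT NULL)"
    )
    return conn


def add_post(conn, post_date, text):
    """Append one anonymized post to the combined text stored for that date."""
    row = conn.execute(
        "SELECT combined_contents FROM daily_posts WHERE post_date = ?",
        (post_date,),
    ).fetchone()
    if row is None:
        # First post seen for this date: create the row.
        conn.execute(
            "INSERT INTO daily_posts (post_date, combined_contents) VALUES (?, ?)",
            (post_date, text),
        )
    else:
        # Later posts are appended on a new line.
        conn.execute(
            "UPDATE daily_posts SET combined_contents = ? WHERE post_date = ?",
            (row[0] + "\n" + text, post_date),
        )
    conn.commit()


def contents_for(conn, post_date):
    """Return the combined text for a date, or an empty string if none exists."""
    row = conn.execute(
        "SELECT combined_contents FROM daily_posts WHERE post_date = ?",
        (post_date,),
    ).fetchone()
    return row[0] if row else ""


conn = open_store()
add_post(conn, "2011-01-01", "The new model is great.")
add_post(conn, "2011-01-01", "Battery life could be better.")
print(contents_for(conn, "2011-01-01"))
```

If you build the same table in MySQL, note that a plain TEXT column holds at most 65,535 bytes, so a day with hundreds of thousands of lines would probably need MEDIUMTEXT or LONGTEXT for the combined column.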