I am trying to learn about Big Data and figured the only way to

Question


Asked: June 14, 2026 (2026-06-14T21:52:33+00:00)


I am trying to learn about “Big Data” and figured the only way to start is to dive right in. Worth noting is that I am going to use a single machine that I have at home. For context, I have about 700 text files that total about 300 GB of data. Each file contains JSON responses captured from Twitter’s firehose over the last month.

I love R and ultimately want to use it to study my dataset, but I know that I need a way to “store” the data. I hear a lot about Hadoop and HDFS, but can’t get my head wrapped around them. Would I simply “copy” the text files to HDFS on my local machine and use RHadoop to write Map/Reduce statements to create datasets?

Lastly, I have MongoDB up and running and was considering storing the data there, but I am not sure that I would capture analytical performance gains, although I know that there is an adaptor for Hadoop.

My question: Having successfully captured the data, what is the best way to store it so that I can use R (and other tools) to analyze it?


1 Answer

Editorial Team · Answer 1 · 2026-06-14T21:52:34+00:00

If you do not need much batch processing and mainly want to run real-time queries on tweets, a non-relational database such as MongoDB would suit your needs well. For real-time queries, have a look at MongoDB’s Aggregation Framework.
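As a rough illustration of what an Aggregation Framework query could look like, here is a minimal Python sketch that builds a pipeline for the average follower count per place. It assumes each tweet was stored as one document with the usual Twitter JSON fields `place.full_name` and `user.followers_count`; the function name `build_place_followers_pipeline` and the `min_tweets` parameter are made up for this example.

```python
def build_place_followers_pipeline(min_tweets=1):
    """Build an aggregation pipeline: average follower count per place.

    Assumes tweet documents carry `place.full_name` and
    `user.followers_count` (field names are an assumption about how
    the tweets were stored).
    """
    return [
        # Keep only tweets that have a place attached.
        {"$match": {"place": {"$ne": None}}},
        # Group by place name, averaging followers and counting tweets.
        {"$group": {
            "_id": "$place.full_name",
            "avg_followers": {"$avg": "$user.followers_count"},
            "tweets": {"$sum": 1},
        }},
        # Drop places with too few tweets to be meaningful.
        {"$match": {"tweets": {"$gte": min_tweets}}},
        # Highest average first.
        {"$sort": {"avg_followers": -1}},
    ]


pipeline = build_place_followers_pipeline(min_tweets=50)
# With pymongo and a running server, this would be passed to
# collection.aggregate(pipeline); the collection itself is not set up here.
```

The pipeline is plain data, so it can be inspected or adjusted before it is sent to the database.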

So it comes down to what you really want to do with the data. Do you want to find tweets around places and show the average follower count, or do long-term trend analysis?
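For the trend-analysis case, a batch pass over the raw files may be enough on a single machine. The sketch below counts tweets per day while reading line by line, so a file never has to fit in memory. It assumes one JSON object per line and the classic Twitter `created_at` format (for example `Thu May 14 10:00:00 +0000 2026`); neither is confirmed by the question, and `daily_tweet_counts` is a hypothetical helper name.

```python
import json
from collections import Counter
from datetime import datetime

# Classic Twitter timestamp layout (an assumption about the captured data).
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def daily_tweet_counts(lines):
    """Count tweets per calendar day from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines
        # Non-tweet messages (e.g. delete notices) have no created_at.
        created = record.get("created_at") if isinstance(record, dict) else None
        if not created:
            continue
        day = datetime.strptime(created, TWITTER_TIME_FORMAT).date()
        counts[day.isoformat()] += 1
    return counts


sample = [
    '{"created_at": "Thu May 14 10:00:00 +0000 2026", "text": "a"}',
    '{"created_at": "Thu May 14 23:59:59 +0000 2026", "text": "b"}',
    '{"delete": {"status": {"id": 1}}}',
    '{"created_at": "Fri May 15 00:00:01 +0000 2026", "text": "c"}',
]
print(daily_tweet_counts(sample))
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly, and the per-file counters can be added together across all 700 files. The resulting small table is easy to export for analysis in R.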

Here is a Ruby/MongoDB post on how someone scraped 3 million tweets: how-i-scraped-and-stored-over-3-million-tweets
