The Archive Base

Editorial Team
Asked: June 15, 2026 (2026-06-15T14:04:09+00:00)

I’m starting to learn some stuff about big data with a big focus on predictive analysis and for that I have a case study I would like to implement:

I have a dataset of server health information that is polled every 5 seconds. I want to show the retrieved data but, more importantly, I want to run a previously built machine learning model and show the results (alerts about servers that are about to crash).

The machine learning model will be built by a machine learning specialist so that’s completely out of scope. My job would be to integrate the machine learning model in a platform that runs the model and shows the results in a nice dashboard.

My problem is the “big picture” architecture of this system: I see that all the pieces already exist (Cloudera + Mahout), but I’m missing a simple integrated solution for all my needs, and I don’t believe the state of the art is writing custom software…

So, can anyone shed some light on production systems like this (showing data with predictive analysis)? Reference architecture for this? Tutorials/documentation?


Notes:

  1. I’ve investigated some related technologies: Cloudera/Hadoop, Pentaho, Mahout and Weka. I know that Pentaho, for example, can store big data and run ad-hoc Weka analysis on that data. Using Cloudera and Impala, a data specialist can also run ad-hoc queries and analyse the data, but that’s not my goal. I want my system to run the ML model and show the results in a nice dashboard alongside the retrieved data. And I’m looking for a platform that already allows this usage instead of building something custom.

  2. I’m focusing on Pentaho as it seems to have nice Machine Learning integration, but every tutorial I read was more about “ad-hoc” ML analysis than real-time. Any tutorial on that subject would be welcome.

  3. I don’t mind open-source or commercial solutions (with a trial).

  4. Depending on the specifics, maybe this isn’t big data: more “traditional” solutions are also welcome.

  5. Also, real time here is a broad term: if the ML model performs well, running it every 5 seconds is good enough.

  6. The ML model is static (it isn’t updated in real time and doesn’t change its behavior).

  7. I’m not looking for a customized application for my example as my focus is on the big picture: big data with predictive analysis generic platforms.

1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 2:04 pm (2026-06-15T14:04:10+00:00)

    (I’m an author of Mahout, and am commercializing a productization of some of the ML in Mahout, with a focus on both real-time and scale: Myrrix. I don’t know that it’s exactly what you are looking for, but it seems to address some of the issues you pose here. It might be useful as another reference point.)

    You have highlighted the tension between real-time and large-scale. These aren’t the same thing. Hadoop, as a computation environment, scales well but can do nothing in real-time. Part of Mahout is built on Hadoop, so its ML is of that form too. Weka, and the other parts of Mahout, are disposed to be more or less real-time, but then are challenged to scale.

    An ML system that does both well necessarily has two layers: scalable offline model-building, with real-time online serving and updates. This is how it should look, IMHO, for recommenders for example: http://myrrix.com/design/
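To make the two layers concrete, here is a minimal sketch, not a description of how Myrrix or Mahout actually does it. A local directory stands in for the backing store (HDFS, a NoSQL DB, etc.), and `publish_model`, `latest_model_path` and `ModelHolder` are hypothetical names made up for illustration. The offline layer writes versioned model files; the online layer picks up the newest one when asked to refresh.

```python
import json
import os
import tempfile


def publish_model(store_dir, version, model):
    """Offline layer: write the model as model-<version>.json in one atomic step."""
    final_path = os.path.join(store_dir, f"model-{version:06d}.json")
    fd, tmp_path = tempfile.mkstemp(dir=store_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(model, f)
    # Rename only after the file is fully written, so readers never see a partial model.
    os.replace(tmp_path, final_path)
    return final_path


def latest_model_path(store_dir):
    """Return the path of the highest-versioned model file, or None if there is none."""
    names = sorted(
        n for n in os.listdir(store_dir)
        if n.startswith("model-") and n.endswith(".json")
    )
    return os.path.join(store_dir, names[-1]) if names else None


class ModelHolder:
    """Online layer: keeps the most recently published model in memory."""

    def __init__(self, store_dir):
        self.store_dir = store_dir
        self.path = None
        self.model = None

    def refresh(self):
        """Reload if a newer model file exists; return True when the model changed."""
        path = latest_model_path(self.store_dir)
        if path is None or path == self.path:
            return False
        with open(path) as f:
            self.model = json.load(f)
        self.path = path
        return True
```

With a static model you would call `refresh()` once at startup. If models are republished later, calling it on a timer is enough, and the serving code does not need to change.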

    But you don’t have any issue with model building, right? Someone’s going to build a static model? If so, that makes it much easier. Updating your model in real-time is useful, but it complicates things. If you don’t have to, you’re just generating predictions out of a static model, which is usually fast.

    I don’t think Pentaho is relevant if you are interested in ML, or, running something based on your own ML model.

    1 query every 5 seconds is not challenging — is this 1 query per 5 seconds per machine or something?

    My advice is to simply create a server that can answer queries against the model. Just reuse any old HTTP server container like Tomcat. It can load the latest model as it is published from some backing store like HDFS or a NoSQL DB. You can create N instances of the server effortlessly as they don’t seem to need to communicate.
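As a rough illustration of such a server, here is a sketch using Python's standard `http.server` in place of a Tomcat container (that swap is mine, purely to keep the example short). The `MODEL` dictionary, the `cpu`/`mem` metric names, the weights, the threshold and the `/predict` path are all invented placeholders; the real model is whatever your ML specialist hands over.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical static model: a weighted sum of health metrics against a threshold.
MODEL = {"weights": {"cpu": 0.6, "mem": 0.4}, "threshold": 0.8}


def predict(model, metrics):
    """Return a risk score and an alert flag for one server reading."""
    score = sum(
        weight * float(metrics.get(name, 0.0))
        for name, weight in model["weights"].items()
    )
    return {"score": round(score, 4), "alert": score >= model["threshold"]}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            metrics = json.loads(self.rfile.read(length) or b"{}")
            result = predict(MODEL, metrics)
        except (ValueError, TypeError, AttributeError):
            self.send_error(400, "expected a JSON object of numeric metrics")
            return
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def make_server(port=0):
    """Build the server without starting it; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Calling `make_server(8080).serve_forever()` starts one instance. Because the handler keeps no state besides the loaded model, running N copies behind a load balancer needs no coordination between them.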

    The only custom code there is whatever you need to wrap your ML model. This is quite a simple problem if you truly don’t need to build your own models or update them dynamically. If you do — harder question but still possible to architect for.
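The wrapper itself can be very small. The sketch below assumes a poller that returns the latest metrics per server and a dashboard sink that accepts scored rows; `crash_risk`, `score_readings`, `run_cycles`, the metric names and the 0.8 threshold are all hypothetical stand-ins, not anything taken from Mahout or Pentaho.

```python
import time


def crash_risk(metrics):
    """Hypothetical stand-in for the specialist's static model."""
    return 0.6 * metrics["cpu"] + 0.4 * metrics["mem"]


def score_readings(readings, model=crash_risk, threshold=0.8):
    """Score every server's latest reading and flag the ones above the threshold."""
    rows = [{"server": name, "risk": model(m)} for name, m in readings.items()]
    for row in rows:
        row["alert"] = row["risk"] >= threshold
    # Highest risk first, which is the order a dashboard would usually show.
    return sorted(rows, key=lambda row: row["risk"], reverse=True)


def run_cycles(poll, publish, cycles, interval=5.0, sleep=time.sleep):
    """Poll, score and publish a fixed number of times, pausing between cycles."""
    for i in range(cycles):
        publish(score_readings(poll()))
        if i + 1 < cycles:
            sleep(interval)
```

Here `poll` would read from wherever the 5-second health data lands and `publish` would push rows to the dashboard. Passing `sleep` as a parameter is only there so the loop can be exercised without actually waiting.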

