The Archive Base

Editorial Team
Asked: May 21, 2026

Is there any way to use Pentaho to parse a table's td elements from an HTML page?

Let's say I have this HTML content:

<html>
  <body>
    <table>
      <tr>
        <td>info1</td>
        <td>info2</td>
      </tr>
      <tr>
        <td>info3</td>
        <td>info4</td>
      </tr>
    </table>
  </body>
</html>

In Pentaho I am using the "Get data from XML" step with the following settings:

Content::
Loop XPath: /html/body/table/tr
Fields::
Name: tableData
XPath: td


The data I would like to extract is

info1 info2 info3 info4

in any form.
Any help would be truly appreciated!
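
For reference, the XPath behaviour described above can be reproduced outside Pentaho with the JDK's own XPath API. This is only a sketch: the class name `TdXPathSketch` and the helper `select` are hypothetical, and it does not claim to mirror how the "Get data from XML" step works internally. It shows that a relative `td` evaluated as a string against each `tr` returns only the first cell of that row, whereas looping over the `td` elements themselves and reading `.` returns every cell.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TdXPathSketch {
    // The sample table from the question, written without whitespace between tags.
    static final String HTML = "<html><body><table>"
        + "<tr><td>info1</td><td>info2</td></tr>"
        + "<tr><td>info3</td><td>info4</td></tr>"
        + "</table></body></html>";

    // Hypothetical helper: select the loop nodes, then evaluate the field
    // XPath as a string relative to each loop node.
    static List<String> select(String loopXPath, String fieldXPath) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            loopXPath, new InputSource(new StringReader(HTML)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // The string value of a node-set is the string value of its first node.
            values.add(xpath.evaluate(fieldXPath, nodes.item(i)));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Looping over rows: only the first td of each row is returned.
        System.out.println(select("/html/body/table/tr", "td"));
        // Looping over cells: every td is returned.
        System.out.println(select("/html/body/table/tr/td", "."));
    }
}
```

Under these assumptions, one thing to try in the step itself would be a Loop XPath of `/html/body/table/tr/td` with a field XPath of `.`, provided the HTML is well-formed XML.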

1 Answer

  1. Editorial Team
    Added an answer on May 21, 2026 at 2:30 pm

    I solved it by reading every line of my file as rows. Then I added a Pentaho “User Defined Java Class” step and had it use XSLT to transform my table content into a new XML file. From that XML I was able to get the data needed to complete the task.
    Here is what I wrote in the “User Defined Java Class” step:

    
    import java.util.*;
    import java.io.FileOutputStream;
    
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    
    private int infilenameIndex;
    private int xsltfilenameIndex;
    private int outfilenameIndex;
    
    
    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
      Object[] r=getRow();
      if (r==null) {
        setOutputDone();
        return false;
      }
    
    
      // 'first' is true only for the first row, so the field checks below run once.
      if (first) {
         infilenameIndex = getInputRowMeta().indexOfValue(getParameter("infilename"));
         if (infilenameIndex < 0) {
             throw new KettleException("Field not found in the input row, check parameter 'infilename'!");
         }
         xsltfilenameIndex = getInputRowMeta().indexOfValue(getParameter("xsltfilename"));
         if (xsltfilenameIndex < 0) {
             throw new KettleException("Field not found in the input row, check parameter 'xsltfilename'!");
         }
         outfilenameIndex = getInputRowMeta().indexOfValue(getParameter("outfilename"));
         if (outfilenameIndex < 0) {
             throw new KettleException("Field not found in the input row, check parameter 'outfilename'!");
         }
    
         first=false;
      }
    
      String infilename = get(Fields.In, "infilename").getString(r);
      String xsltfilename = get(Fields.In, "xsltfilename").getString(r);
      String outfilename = get(Fields.In, "outfilename").getString(r);
    
      Object[] outputRowData = RowDataUtil.resizeArray(r, data.outputRowMeta.size());
      int outputIndex = getInputRowMeta().size();
    
      transform(infilename, xsltfilename, outfilename);
    
    
      putRow(data.outputRowMeta, outputRowData);
    
      return true;
    }
    public void transform(String infilename, String xsltfilename, String outfilename) throws KettleException {
    
        javax.xml.transform.stream.StreamSource inss = null;
        javax.xml.transform.stream.StreamSource xsltss = null;
        javax.xml.transform.stream.StreamResult outss = null;
    
        logBasic("");
        logBasic("Transforming " +  infilename + " with " + xsltfilename + " to " + outfilename );
        logBasic("");
    
        try {
           inss = new javax.xml.transform.stream.StreamSource(infilename);
        }     
        catch (Exception e) {
           logError("Input file missing " +  infilename);
           throw new KettleException(e);
        }
    
        try {
           xsltss = new javax.xml.transform.stream.StreamSource(xsltfilename);
        }     
        catch (Exception e) {
           logError("XSLT file missing " +  xsltfilename);
           throw new KettleException(e);
        }
    
        try {
           outss = new javax.xml.transform.stream.StreamResult(outfilename);
        }     
        catch (Exception e) {
           logError("Output file missing " +  outfilename);
           throw new KettleException(e);
        }
    
        try {       
            TransformerFactory tFactory = TransformerFactory.newInstance();
    
            // Set the TransformerFactory to the SAXON implementation.
            //tFactory = new net.sf.saxon.TransformerFactoryImpl();
    
            Transformer transformer = tFactory.newTransformer(xsltss);
    
            // Do the transformation
            transformer.transform(inss, outss);
        }
        catch (Exception e) {
           throw new KettleException(e);
        }
        return;
    }
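
The answer does not show the stylesheet it used, so the following is only a guess at what such an XSLT could look like, wrapped in a standalone Java program so it can be run outside Pentaho. The class name `TableXsltSketch`, the method `flatten`, and the output element names `cells` and `cell` are all hypothetical. The sketch copies the text of every `td` into a flat list of elements, which is a shape that is easy to loop over afterwards.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TableXsltSketch {
    // Hypothetical stylesheet: one <cell> per <td>, wrapped in a <cells> root.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<cells><xsl:for-each select='//td'>"
      + "<cell><xsl:value-of select='.'/></cell>"
      + "</xsl:for-each></cells>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an in-memory document and returns the result as a string.
    static String flatten(String html) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(html)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><table>"
            + "<tr><td>info1</td><td>info2</td></tr>"
            + "<tr><td>info3</td><td>info4</td></tr>"
            + "</table></body></html>";
        // Prints the flattened XML containing one <cell> element per table cell.
        System.out.println(flatten(html));
    }
}
```

Note that an XSLT processor expects well-formed XML as input, so this approach only works directly when the HTML page is well-formed, as the sample in the question is.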
    
    

© 2021 The Archive Base. All Rights Reserved