Editorial Team
Asked: May 18, 2026


I need to load data from a single file with 100,000+ records into multiple tables in MySQL while maintaining the relationships defined in the file and tables; that is, the relationships already match. The solution should work on the latest version of MySQL and must use the InnoDB engine, since MyISAM does not support foreign keys.

I am completely new to using Pentaho Data Integration (aka Kettle), and any pointers would be appreciated.

I might add that it is a requirement that the foreign key constraints are NOT disabled. It is my understanding that MySQL does not re-validate existing rows when foreign key checks are turned back on, so any referential integrity problems introduced while they were off would go undetected. SOURCE: 5.1.4. Server System Variables: foreign_key_checks

All approaches should include some form of validation and a rollback strategy should an insert fail or fail to maintain referential integrity.
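To make the validation-and-rollback requirement concrete, here is a minimal sketch of an all-or-nothing load. It is not Kettle and not MySQL: it assumes Python's built-in sqlite3 module as a stand-in for an InnoDB database, a cut-down version of the schema, and hypothetical helper names (open_db, load_links, count_links). The idea carries over: keep foreign key checks on, run the batch in one transaction, and roll back if any row violates a constraint.

```python
import sqlite3

# Hypothetical cut-down schema: just enough of the question's tables to show the idea.
SCHEMA = """
CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Office (id INTEGER PRIMARY KEY, office_number INTEGER NOT NULL);
CREATE TABLE Employee2Office (
  employee_id INTEGER NOT NULL REFERENCES Employee(id),
  office_id INTEGER NOT NULL REFERENCES Office(id),
  PRIMARY KEY (employee_id, office_id)
);
INSERT INTO Employee (id, name) VALUES (1, 'John Smith');
INSERT INTO Office (id, office_number) VALUES (1, 501), (2, 601);
"""

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn

def load_links(conn, links):
    """Insert (employee_id, office_id) pairs atomically: all rows or none."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO Employee2Office (employee_id, office_id) VALUES (?, ?)",
                links,
            )
        return True
    except sqlite3.IntegrityError:
        # A foreign key or primary key violation aborted the batch.
        return False

def count_links(conn):
    return conn.execute("SELECT COUNT(*) FROM Employee2Office").fetchone()[0]

conn = open_db()
bad_ok = load_links(conn, [(1, 1), (1, 2), (1, 99)])  # office 99 does not exist
rows_after_bad = count_links(conn)                    # nothing from the bad batch remains
good_ok = load_links(conn, [(1, 1), (1, 2)])
rows_after_good = count_links(conn)
print(bad_ok, rows_after_bad, good_ok, rows_after_good)  # False 0 True 2
```

The two valid rows in the failing batch are discarded along with the invalid one, which is the behaviour the requirement asks for. A per-row strategy that skips bad rows and logs them would be a different design choice.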

Again, I am completely new to this and doing my best to provide as much information as possible. If you have any questions or requests for clarification, just let me know.

If you are able to post the XML from the kjb and ktr files (jobs/transformations), that would be SUPER. I might even hunt down every comment/answer you’ve ever made anywhere and upvote them… 🙂 Really, it’s important to me to find an answer for this.

Thanks!


SAMPLE DATA: To elaborate with an example, let’s assume I am trying to load a file containing each employee’s name, the offices they have occupied in the past, and their job title history, with the fields separated by tabs.

File:

EmployeeName<tab>OfficeHistory<tab>JobLevelHistory
John Smith<tab>501<tab>Engineer
John Smith<tab>601<tab>Senior Engineer
John Smith<tab>701<tab>Manager
Alex Button<tab>601<tab>Senior Assistant
Alex Button<tab>454<tab>Manager

NOTE: The single-table data is already as normalized as a single table can be. For example, there is only one “John Smith”, so there are no duplicate names that would lead to conflicts in referential integrity.

The MyOffice database schema has the following tables:

Employee (nId, name)
Office (nId, number)
JobTitle (nId, titleName)
Employee2Office (nEmpID, nOfficeId)
Employee2JobTitle (nEmpId, nJobTitleID)

So in this case, the tables should look like:

Employee
1 John Smith
2 Alex Button

Office
1 501
2 601
3 701
4 454

JobTitle
1 Engineer
2 Senior Engineer
3 Manager
4 Senior Assistant

Employee2Office
1 1
1 2
1 3
2 2
2 4

Employee2JobTitle
1 1
1 2
1 3
2 4
2 3
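As a cross-check on the expected contents above, the following sketch derives the same five tables from the sample rows in plain Python, with no database involved. The function name normalize and the use of dictionaries to hand out surrogate ids in first-seen order are assumptions made for illustration only.

```python
SAMPLE = (
    "EmployeeName\tOfficeHistory\tJobLevelHistory\n"
    "John Smith\t501\tEngineer\n"
    "John Smith\t601\tSenior Engineer\n"
    "John Smith\t701\tManager\n"
    "Alex Button\t601\tSenior Assistant\n"
    "Alex Button\t454\tManager\n"
)

def normalize(text):
    """Split the flat file into three entity tables and two link tables."""
    employees, offices, titles = {}, {}, {}
    emp_office, emp_title = [], []

    def get_id(table, value):
        # Reuse the existing id, or assign the next one in first-seen order.
        return table.setdefault(value, len(table) + 1)

    lines = text.strip().split("\n")[1:]  # skip the header row
    for line in lines:
        name, office, title = line.split("\t")
        emp_id = get_id(employees, name)
        office_id = get_id(offices, int(office))
        title_id = get_id(titles, title)
        # Record each relationship once, preserving file order.
        if (emp_id, office_id) not in emp_office:
            emp_office.append((emp_id, office_id))
        if (emp_id, title_id) not in emp_title:
            emp_title.append((emp_id, title_id))
    return employees, offices, titles, emp_office, emp_title

employees, offices, titles, emp_office, emp_title = normalize(SAMPLE)
print(employees)   # {'John Smith': 1, 'Alex Button': 2}
print(emp_office)  # [(1, 1), (1, 2), (1, 3), (2, 2), (2, 4)]
print(emp_title)   # [(1, 1), (1, 2), (1, 3), (2, 4), (2, 3)]
```

The list membership checks are fine for a five-row sample; for 100,000+ records a set of already-seen pairs would be the more sensible choice.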

Here’s the MySQL DDL to create the database and tables:

create database MyOffice2;

use MyOffice2;

CREATE TABLE Employee (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  name CHAR(50) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE Office (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  office_number INT NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE JobTitle (
  id MEDIUMINT NOT NULL AUTO_INCREMENT,
  title CHAR(30) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE Employee2JobTitle (
  employee_id MEDIUMINT NOT NULL,
  job_title_id MEDIUMINT NOT NULL,
  FOREIGN KEY (employee_id) REFERENCES Employee(id),
  FOREIGN KEY (job_title_id) REFERENCES JobTitle(id),
  PRIMARY KEY (employee_id, job_title_id)
) ENGINE=InnoDB;

CREATE TABLE Employee2Office (
  employee_id MEDIUMINT NOT NULL,
  office_id MEDIUMINT NOT NULL,
  FOREIGN KEY (employee_id) REFERENCES Employee(id),
  FOREIGN KEY (office_id) REFERENCES Office(id),
  PRIMARY KEY (employee_id, office_id)
) ENGINE=InnoDB;



My Notes in Response to Selected Answer:

PREP:

  1. Using the sample data, create a CSV file by replacing each <tab> with a comma.
  2. Install MySQL and create the sample database using the MySQL DDL above.
  3. Install Kettle (it’s Java-based and will run on anything that runs Java).
  4. Download the KTR file.

Dataflow by Step: (My Notes)

  1. Open the KTR file in Kettle, double-click “CSV file input”, and browse to the CSV file that you created. The delimiter should already be set to comma. Then click OK.
  2. Double-click “Insert Employees”, select the DB connector, then follow these directions on Creating a New Database Connection
1 Answer

  1. Editorial Team
    Added an answer on May 18, 2026 at 3:03 am

    I put together a sample transformation (right-click and choose “save link”) based on what you provided. The only step I feel a bit uncertain about is the last table inserts. I’m basically writing the join data to the table and letting an insert fail if that specific relationship already exists.

    note:

    This solution doesn’t really meet the "All approaches should include some form of validation and a rollback strategy should an insert fail, or fail to maintain referential integrity." criterion, though it probably won’t fail. If you really want to set up something more complex we can, but this should definitely get you going with these transformations.


    Dataflow by Step

    1. We start by reading in your file. In my case I converted it to CSV, but tab-delimited is fine too.

    2. Now we insert the employee names into the Employee table using a combination lookup/update.
    After the insert, we append the employee’s id to our data stream as id and remove EmployeeName from the data stream.

    3. Here we’re just using a Select Values step to rename the id field to employee_id.

    4. Insert the job titles just like we did the employees, append the title id to our data stream, and remove JobLevelHistory from the data stream.

    5. Simple rename of the title id to title_id (see step 3).

    6. Insert the offices, get their ids, and remove OfficeHistory from the data stream.

    7. Simple rename of the office id to office_id (see step 3).

    8. Copy the data from the last step into two streams, one carrying employee_id and office_id and the other carrying employee_id and title_id.

    9. Use a table insert to write the join data. I have it set to ignore insert errors, since there could be duplicates and the PK constraints will make those rows fail.
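For readers without the KTR file, the nine steps can be approximated in a short script. This is only a sketch of the same dataflow, not the transformation itself: it assumes Python's built-in sqlite3 module in place of MySQL, a hypothetical lookup_or_insert helper standing in for the combination lookup/update step, and SQLite's INSERT OR IGNORE standing in for the "ignore insert errors" setting on the join tables.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Employee (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL);
CREATE TABLE Office (id INTEGER PRIMARY KEY AUTOINCREMENT, office_number INTEGER NOT NULL);
CREATE TABLE JobTitle (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT NOT NULL);
CREATE TABLE Employee2JobTitle (
  employee_id INTEGER NOT NULL REFERENCES Employee(id),
  job_title_id INTEGER NOT NULL REFERENCES JobTitle(id),
  PRIMARY KEY (employee_id, job_title_id)
);
CREATE TABLE Employee2Office (
  employee_id INTEGER NOT NULL REFERENCES Employee(id),
  office_id INTEGER NOT NULL REFERENCES Office(id),
  PRIMARY KEY (employee_id, office_id)
);
"""

def lookup_or_insert(conn, table, column, value):
    """Return the id of the row holding value, inserting it first if missing.

    table and column are trusted constants from this script, never user input.
    """
    row = conn.execute(
        f"SELECT id FROM {table} WHERE {column} = ?", (value,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(f"INSERT INTO {table} ({column}) VALUES (?)", (value,))
    return cur.lastrowid

def load(conn, rows):
    with conn:  # one transaction for the whole file
        for name, office, title in rows:
            employee_id = lookup_or_insert(conn, "Employee", "name", name)
            title_id = lookup_or_insert(conn, "JobTitle", "title", title)
            office_id = lookup_or_insert(conn, "Office", "office_number", office)
            # Duplicate relationships are skipped instead of raising a PK error.
            conn.execute(
                "INSERT OR IGNORE INTO Employee2Office VALUES (?, ?)",
                (employee_id, office_id),
            )
            conn.execute(
                "INSERT OR IGNORE INTO Employee2JobTitle VALUES (?, ?)",
                (employee_id, title_id),
            )

def counts(conn):
    tables = ["Employee", "Office", "JobTitle", "Employee2Office", "Employee2JobTitle"]
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}

ROWS = [
    ("John Smith", 501, "Engineer"),
    ("John Smith", 601, "Senior Engineer"),
    ("John Smith", 701, "Manager"),
    ("Alex Button", 601, "Senior Assistant"),
    ("Alex Button", 454, "Manager"),
    ("John Smith", 501, "Engineer"),  # deliberate duplicate to exercise the ignore path
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
load(conn, ROWS)
result = counts(conn)
print(result)
```

Because every lookup runs before the corresponding join-table insert, the foreign keys are always satisfied without disabling any checks, and loading the same file a second time leaves the row counts unchanged.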

    Output Tables (five screenshots in the original post; images not preserved)
