The Archive Base


Editorial Team
Asked: June 12, 2026


Our organization is currently building a new data warehouse. We are able to use some techniques borrowed from the DW community, such as ETL processing to conform data and denormalized dimensions in the Kimball style. Data warehousing is still fairly new to our organization, but we are learning the concepts as we go along.

The problem: We have multiple sources of data that often provide conflicting versions of a fact. For example, we have a Master Person Index, where we use a score-based matching algorithm during ETL to match an inbound person to an existing person, so even if the inbound record doesn’t exactly match, we can score on other attributes such as zip code radius.

Here’s the question: What is the standard way to handle multiple versions of a fact from two or more sources?

I understand one of the main ideas of the data warehouse is to keep a running history of any fact, which we are doing. That works well when a record is maintained by one inbound source: we keep the history of that fact over time. The problem occurs when two different sources, each updating daily, report two different values for the same fact. For example, source A says the name is Mary Smith and source B says the name is Mary Jane, and each source resends its value every day. Based on the matching algorithm we’re confident it’s the same person, but because of our history-style table, the name flips back and forth every day, since each load reads the other source’s name as a “change”.

An example table:

first_name  last_name    source    last_updated
Mary        Smith        A         5/2/12 1:00am
Mary        Jane         B         5/2/12 2:00am
Mary        Smith        A         5/3/12 1:00am
Mary        Jane         B         5/3/12 2:00am
Mary        Smith        A         5/4/12 1:00am
Mary        Jane         B         5/4/12 2:00am
...
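
The flip-flopping can be reproduced with a small Python sketch. The function name `naive_history` and the row layout are made up for illustration and are not our actual ETL code; the sketch only assumes that every inbound name is compared against a single shared "current" value.

```python
# Hypothetical sketch: one shared "current" name per person, compared against every feed.
def naive_history(feed_rows):
    """Append a history row whenever an inbound name differs from the current one."""
    history = []
    current = None
    for first_name, last_name, source, loaded_at in feed_rows:
        name = (first_name, last_name)
        if name != current:  # any difference counts as a "change"
            history.append((first_name, last_name, source, loaded_at))
            current = name
    return history


feed = [
    ("Mary", "Smith", "A", "5/2/12 1:00am"),
    ("Mary", "Jane", "B", "5/2/12 2:00am"),
    ("Mary", "Smith", "A", "5/3/12 1:00am"),
    ("Mary", "Jane", "B", "5/3/12 2:00am"),
]
history = naive_history(feed)
print(len(history))  # every inbound row ends up recorded as a change
```

Neither source ever changed its own value, yet every load produces a new history row.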

1 Answer

  1. Editorial Team
    Added an answer on June 12, 2026 at 2:03 pm

    Have one table that stores your external data:

     id | first_name | last_name | source | external_unique_id | import_date
    ----+------------+-----------+--------+--------------------+-------------
      1 | Mary       | Smith     |    A   |     abcdefg123     | 5/2/12 1:00am
      2 | Mary       | Jane      |    B   |     1234567abc     | 5/2/12 2:00am
    

    Then have a second table that contains your cleaned data:

     id | first_name | last_name 
    ----+------------+-----------
      1 | Mary       | Jane-Smith     (or whatever)
    

    Then have a mapping table between the two.

     local_person_id | foreign_person_id
    -----------------+-------------------
           1         |        1 
           1         |        2
    

    Or something broadly similar.
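
As a rough sketch, the three tables above could be created and queried with Python's built-in `sqlite3` module. The table names `external_person`, `person`, and `person_map` are assumptions (the layout above does not name them), and the dates are stored as ISO-style text purely so that they sort correctly.

```python
import sqlite3

# Assumed table names; only the column names come from the layout above.
SCHEMA = """
CREATE TABLE external_person (
    id                 INTEGER PRIMARY KEY,
    first_name         TEXT,
    last_name          TEXT,
    source             TEXT NOT NULL,
    external_unique_id TEXT NOT NULL,
    import_date        TEXT NOT NULL
);
CREATE TABLE person (
    id         INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT
);
CREATE TABLE person_map (
    local_person_id   INTEGER NOT NULL REFERENCES person(id),
    foreign_person_id INTEGER NOT NULL REFERENCES external_person(id),
    PRIMARY KEY (local_person_id, foreign_person_id)
);
"""


def build_example():
    """Create the three tables in memory and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO external_person VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "Mary", "Smith", "A", "abcdefg123", "2012-05-02 01:00"),
            (2, "Mary", "Jane", "B", "1234567abc", "2012-05-02 02:00"),
        ],
    )
    conn.execute("INSERT INTO person VALUES (1, 'Mary', 'Jane-Smith')")
    conn.executemany("INSERT INTO person_map VALUES (?, ?)", [(1, 1), (1, 2)])
    return conn


conn = build_example()
# Walk from the master record to every external fact mapped to it.
rows = conn.execute(
    """SELECT p.last_name, e.source, e.last_name
       FROM person p
       JOIN person_map m ON m.local_person_id = p.id
       JOIN external_person e ON e.id = m.foreign_person_id
       ORDER BY e.id"""
).fetchall()
print(rows)
```

The join returns one row per source fact, each carrying the master's cleaned name alongside the name that source reported.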

    The objective is to load each fact from its source once and keep it.

    Then use your fuzzy matching logic to relate them to master records. You only need to do this when new facts are loaded or existing facts change.
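
One way to picture "load once, re-match only on change" is the sketch below. It is an assumption-laden illustration: `load_fact` is a hypothetical helper, and a plain dictionary keyed by `(source, external_unique_id)` stands in for the external-data table. The point is that each fact is compared only against the previous value from the same source record, never against another source.

```python
# Hypothetical in-memory stand-in for the external-data table.
def load_fact(store, source, external_id, first_name, last_name):
    """Store the fact and return True only if matching needs to run again."""
    key = (source, external_id)
    incoming = (first_name, last_name)
    if store.get(key) == incoming:
        return False  # the same source repeated the same fact: nothing to do
    store[key] = incoming  # new or changed fact for this source record
    return True


store = {}
daily_feed = [
    ("A", "abcdefg123", "Mary", "Smith"),
    ("B", "1234567abc", "Mary", "Jane"),
]
day1 = [load_fact(store, *row) for row in daily_feed]
day2 = [load_fact(store, *row) for row in daily_feed]
print(day1, day2)  # [True, True] [False, False]
```

On the second day both sources resend unchanged values, so nothing is reloaded and no matching is triggered.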

    You still have to choose which last_name to use, but in the absence of data that settles the question, that choice can be almost arbitrary. For example: pick the last name from the fact loaded most recently.
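
The "most recently loaded" rule could be sketched as follows. The function name `pick_last_name` and the dictionary layout are illustrative assumptions; any other tie-breaking rule could be swapped in at the same place.

```python
from datetime import datetime


def pick_last_name(facts):
    """Return the last_name of the fact with the latest import_date."""
    newest = max(facts, key=lambda fact: fact["import_date"])
    return newest["last_name"]


facts = [
    {"source": "A", "last_name": "Smith", "import_date": datetime(2012, 5, 2, 1, 0)},
    {"source": "B", "last_name": "Jane", "import_date": datetime(2012, 5, 2, 2, 0)},
]
print(pick_last_name(facts))  # Jane
```

Since facts are loaded once and kept, the import date only moves when a source actually changes its value, so the chosen name should not alternate from day to day.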

    You can still quickly and simply relate the master record to its child facts, their sources, and their corresponding data, but you now have a unified entity in your warehouse to hang these external facts on.


© 2021 The Archive Base. All Rights Reserved