The Archive Base

Editorial Team
Asked: May 13, 2026 (2026-05-13T19:44:10+00:00)

I receive data files from a source I have no control over (the government)

I receive data files from a source I have no control over (the government). Each record has a Company Name field that I need to match against existing company records in my database. I’m concerned that some of the names will differ in minor ways, such as ‘Company X, Inc.’ vs ‘Company X Inc’.

My initial thought is to create a collation key field: take the name, apply ToLower(), and use a regex to strip out all whitespace and special characters.
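
For illustration, the collation key idea could look roughly like the sketch below. It is written in Python rather than .NET, and the function name `collation_key` is an assumption made for this example, not something from the original post.

```python
import re

def collation_key(name: str) -> str:
    """Lowercase the name and strip whitespace, punctuation, and underscores."""
    # [\W_]+ matches every run of characters that are not letters or digits.
    return re.sub(r"[\W_]+", "", name.lower())

print(collation_key("Company X, Inc."))  # companyxinc
print(collation_key("Company X Inc"))    # companyxinc
print(collation_key("Company X"))        # companyx
```

The first two variations collapse to the same key, while a name that drops "Inc" entirely still produces a different key, so the key alone does not catch every variation.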

Is there any better methodology to apply to this?

1 Answer

  1. Editorial Team
    Added an answer on May 13, 2026 at 7:44 pm (2026-05-13T19:44:11+00:00)

    That may work, but an algorithm-only solution can produce false matches, and you have no way to prevent them. Your best bet is to create an alias table: include every variation ever found for each company name, with a FK to the real company’s ID. Include a row for the actual name as well.

    AliasID CompanyID CompanyAlias
    ------- --------- ------------
    1       1         Company X, Inc   <<--actual real company name
    2       1         Company X Inc
    3       1         Company X
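
A minimal sketch of such an alias table, using Python's built-in `sqlite3` module with an in-memory database. The column names come from the table above; the table names `Company` and `CompanyAliases` and the helper `find_company_id` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (
        CompanyID   INTEGER PRIMARY KEY,
        CompanyName TEXT NOT NULL
    );
    CREATE TABLE CompanyAliases (
        AliasID      INTEGER PRIMARY KEY,
        CompanyID    INTEGER NOT NULL REFERENCES Company(CompanyID),
        CompanyAlias TEXT NOT NULL UNIQUE  -- UNIQUE also gives an index for lookups
    );
""")
conn.execute("INSERT INTO Company VALUES (1, 'Company X, Inc')")
conn.executemany(
    "INSERT INTO CompanyAliases (CompanyID, CompanyAlias) VALUES (?, ?)",
    [(1, "Company X, Inc"), (1, "Company X Inc"), (1, "Company X")],
)

def find_company_id(alias):
    """Return the CompanyID for an exact alias match, or None if unknown."""
    row = conn.execute(
        "SELECT CompanyID FROM CompanyAliases WHERE CompanyAlias = ?", (alias,)
    ).fetchone()
    return row[0] if row else None

print(find_company_id("Company X Inc"))  # 1
print(find_company_id("Company Y"))      # None
```

The comparison here is an exact, case-sensitive match, which is SQLite's default for `=` on text; other databases may compare differently depending on collation settings.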
    

    If an exact name match is not found in this table when importing data, you can use your proposed algorithm or another one, or ask a human, to find a match or create a new company. At that point, insert the new name into the alias table. If you later find that a match was wrong, you can alter the alias table to make the proper mapping. If you rely on an algorithm alone, you’d need to build in exceptions, and the algorithm would grow large and slow. With this table and a good index, finding your matches should be fast.
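
The import flow described above could be sketched as follows. This is only an illustration under stated assumptions: a plain dictionary stands in for the alias table, the class `AliasResolver` and its method names are hypothetical, and the fallback matcher is the lowercase-and-strip key proposed in the question.

```python
import re

def collation_key(name):
    # Fallback matcher from the question: lowercase, drop non-alphanumerics.
    return re.sub(r"[\W_]+", "", name.lower())

class AliasResolver:
    def __init__(self, aliases):
        # alias text -> company ID (stands in for the alias table)
        self.aliases = dict(aliases)
        self.next_id = max(self.aliases.values(), default=0) + 1

    def resolve(self, name):
        # Step 1: exact alias match.
        if name in self.aliases:
            return self.aliases[name]
        # Step 2: fall back to the algorithm against known aliases.
        key = collation_key(name)
        for alias, company_id in self.aliases.items():
            if collation_key(alias) == key:
                break
        else:
            # Step 3: nothing matched, so treat it as a new company.
            # A real import might queue the name for human review instead.
            company_id = self.next_id
            self.next_id += 1
        # Record the decision so the next import finds an exact match.
        self.aliases[name] = company_id
        return company_id

    def remap(self, name, company_id):
        """Correct a wrong match by pointing the alias at another company."""
        self.aliases[name] = company_id

resolver = AliasResolver({"Company X, Inc": 1})
print(resolver.resolve("Company X Inc"))  # 1, found by the fallback key
print(resolver.resolve("Company Y LLC"))  # 2, created as a new company
```

Because every decision is stored as an alias, a wrong fallback match is fixed by calling `remap` once rather than by adding a special case to the matching algorithm.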

© 2021 The Archive Base. All Rights Reserved