The Archive Base

Editorial Team
Asked: May 12, 2026

I have two massive tables with about 100 million records each and I’m afraid


I have two massive tables with about 100 million records each, and I’m afraid I need to perform an inner join between the two. Both tables are very simple; here’s the description:

BioEntity table:

  • BioEntityId (int)
  • Name (nvarchar(4000), although this is overkill)
  • TypeId (int)

EGM table (an auxiliary table, in fact, resulting from bulk import operations):

  • EMGId (int)
  • PId (int)
  • Name (nvarchar(4000), although this is overkill)
  • TypeId (int)
  • LastModified (date)

I need to match on Name in order to associate each BioEntityId with the PId residing in the EGM table. Originally, I tried to do everything with a single inner join, but the query took far too long and the database log file (in simple recovery mode) chewed up all the available disk space (just over 200 GB, while the database itself occupies 18 GB); if I’m not mistaken, the query failed after running for two days. I have since managed to keep the log from growing (it is only 33 MB now), but the query has been running non-stop for 6 days and it doesn’t look like it’s going to stop anytime soon.

I’m running it on a fairly decent computer (4 GB RAM, Core 2 Duo (E8400) 3 GHz, Windows Server 2008, SQL Server 2008), and I’ve noticed that the computer freezes for a couple of seconds roughly every 30 seconds. This makes it quite hard to use for anything else, which is really getting on my nerves.

Now, here’s the query:

 SELECT EGM.Name, BioEntity.BioEntityId INTO AUX
 FROM EGM INNER JOIN BioEntity 
 ON EGM.name LIKE BioEntity.Name AND EGM.TypeId = BioEntity.TypeId

I had manually set up some indexes; both EGM and BioEntity had a non-clustered covering index containing TypeId and Name. However, the query ran for five days without finishing, so I tried running the Database Engine Tuning Advisor to get the thing to work. It suggested deleting my older indexes and creating statistics and two clustered indexes instead (one on each table, containing just the TypeId, which I find rather odd, or just plain dumb, but I gave it a go anyway).

It has been running for 6 days now and I’m still not sure what to do…
Any ideas, guys? How can I make this faster (or, at least, finite)?

Update:
– OK, I’ve canceled the query and rebooted the server to get the OS up and running again.
– I’m rerunning the workflow with your proposed changes, specifically cropping the nvarchar field to a much smaller size and swapping “LIKE” for “=”. This is going to take at least two hours, so I’ll post further updates later on.
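
For reference, the two changes mentioned in this update could look roughly like the sketch below. It is only an illustration under stated assumptions: the target size of 400 characters is a guess (the first two queries check whether it is safe), the nullability of Name is assumed, and any index on Name would have to be dropped before the column can be shrunk. Note also that “=” is not a drop-in replacement for “LIKE”: with LIKE, any %, _ or [ characters in BioEntity.Name act as wildcards, so the third query counts names where the results could differ.

```sql
-- Check the longest existing value before shrinking the column.
SELECT MAX(LEN(Name)) AS MaxNameLen FROM EGM;
SELECT MAX(LEN(Name)) AS MaxNameLen FROM BioEntity;

-- Count pattern values that LIKE would treat as wildcards.
SELECT COUNT(*) AS NamesWithWildcards
FROM BioEntity
WHERE Name LIKE '%[%]%' OR Name LIKE '%[_]%' OR Name LIKE '%[[]%';

-- Assumption: 400 characters is enough and Name allows NULLs.
-- The ALTER fails with a truncation error if a longer value exists.
ALTER TABLE EGM ALTER COLUMN Name nvarchar(400) NULL;
ALTER TABLE BioEntity ALTER COLUMN Name nvarchar(400) NULL;

-- Same join as the original query, with an equality comparison on Name.
SELECT EGM.Name, BioEntity.BioEntityId
INTO AUX
FROM EGM
INNER JOIN BioEntity
    ON EGM.Name = BioEntity.Name
   AND EGM.TypeId = BioEntity.TypeId;
```

With equality on both columns the optimizer can use hash or merge joins on (TypeId, Name) rather than evaluating a pattern match for every candidate pair.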

Update 2 (1 PM GMT, 18/11/09):
– The estimated execution plan shows 67% of the cost in table scans, followed by a 33% hash match. Next comes 0% parallelism (isn’t this strange? This is the first time I’m using the estimated execution plan, but this particular fact raised an eyebrow), 0% hash match, more 0% parallelism, 0% top, 0% table insert and finally another 0% SELECT INTO. It seems the indexes are crap, as expected, so I’ll be creating indexes manually and discarding the crappy suggested ones.
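
A manual index pair for this join might look like the sketch below. The index names are hypothetical, and it assumes Name has already been cropped as described in the first update: SQL Server 2008 limits an index key to 900 bytes, so an nvarchar(4000) column cannot safely be a key column, while nvarchar(400) plus the int TypeId fits.

```sql
-- Hypothetical index names; assumes Name has been narrowed to
-- nvarchar(400) or less so that the key stays under 900 bytes.
CREATE NONCLUSTERED INDEX IX_EGM_TypeId_Name
    ON EGM (TypeId, Name);

-- BioEntityId is included so the join can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_BioEntity_TypeId_Name
    ON BioEntity (TypeId, Name)
    INCLUDE (BioEntityId);
```

Both indexes are ordered by (TypeId, Name), which is the order a merge join on those two columns would need.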


1 Answer

  1. Editorial Team
    Added an answer on May 12, 2026 at 10:51 pm

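    For huge joins, explicitly choosing a loop join sometimes speeds things up. Keep in mind that a join hint in the FROM clause also forces the join order, and that a loop join only pays off if the inner table (BioEntity here) has a suitable index to seek on: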

    SELECT EGM.Name, BioEntity.BioEntityId INTO AUX
    FROM EGM 
    INNER LOOP JOIN BioEntity 
        ON EGM.name LIKE BioEntity.Name AND EGM.TypeId = BioEntity.TypeId
    

    As always, posting your estimated execution plan could help us provide better answers.
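
If it helps, one way to capture the estimated plan without running the query is SET SHOWPLAN_XML, sketched below with the query from the question. GO is the batch separator used by Management Studio and sqlcmd, and the SET statement has to be alone in its batch. In Management Studio, "Display Estimated Execution Plan" (Ctrl+L) gives the same information graphically.

```sql
SET SHOWPLAN_XML ON;
GO

-- While SHOWPLAN_XML is ON the statement is compiled but not executed;
-- the result is the estimated plan as an XML document.
SELECT EGM.Name, BioEntity.BioEntityId INTO AUX
FROM EGM
INNER JOIN BioEntity
    ON EGM.Name LIKE BioEntity.Name AND EGM.TypeId = BioEntity.TypeId;
GO

SET SHOWPLAN_XML OFF;
GO
```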

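    EDIT: If both inputs are sorted on the join columns (they should be, with the covering index), you can try a MERGE JOIN. A merge join needs an equality predicate, which here is only TypeId; the LIKE comparison on Name is applied as a residual filter, so replacing it with = lets the merge use both columns: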

    SELECT EGM.Name, BioEntity.BioEntityId INTO AUX
    FROM EGM 
    INNER JOIN BioEntity 
        ON EGM.name LIKE BioEntity.Name AND EGM.TypeId = BioEntity.TypeId
    OPTION (MERGE JOIN)
    