The Archive Base

Editorial Team
Asked: May 14, 2026

I am importing massive amounts of data from Excel that have various table layouts.


I am importing massive amounts of data from Excel files that have various table layouts. My table detection routines and merged-cell handling are good enough, but I am running into a performance problem when dealing with borders. The bordered regions in some of these files have meaning.

Data Setup:
I am importing directly from Office Open XML using VB6 and MSXML. The data is parsed from the XML into a dictionary of cell data. This works wonderfully and is just as fast as using DoCmd.TransferSpreadsheet in Access, but returns much better results. Each cell contains a pointer to a style element, which in turn contains a pointer to a border element that defines the visibility and weight of each border (this is also how the data is structured inside OpenXML).

Challenge:
What I’m trying to do is find every region that is enclosed inside borders, and create a list of cells that are inside that region.

What I have done:
I initially created a BFS (breadth-first search) fill routine to find these areas. This works wonderfully and fast for “normal”-sized spreadsheets, but gets far too slow for imports that run into the thousands of rows. One problem is that a border in Excel can be stored either in the cell you are checking or as the opposing border of the adjacent cell. That’s OK; I can consolidate that data on import to reduce the number of checks needed.
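
As a rough illustration of the consolidation step described above, here is a minimal Python sketch (not the VB6 import code). It assumes the parsed borders are available as a hypothetical mapping from a (row, col) cell to the set of sides that carry a visible border. The names `consolidate_walls`, `borders`, and the side labels are assumptions made for this example. Each border is stored once as a normalized pair of adjacent cells, so a later fill needs a single set lookup per neighbor regardless of which cell declared the border.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]   # (row, col)
Wall = Tuple[Cell, Cell]  # pair of adjacent cells, smaller cell first

# Offset from a cell to the neighbor on the other side of each border.
SIDE_OFFSETS = {"top": (-1, 0), "bottom": (1, 0), "left": (0, -1), "right": (0, 1)}


def consolidate_walls(borders: Dict[Cell, Set[str]]) -> Set[Wall]:
    """Collapse per-cell border flags into one entry per shared edge.

    `borders` is a hypothetical structure: cell -> sides with a visible border.
    A right border on (r, c) and a left border on (r, c + 1) describe the
    same edge, so both normalize to the same (cell_a, cell_b) pair.
    """
    walls: Set[Wall] = set()
    for (r, c), sides in borders.items():
        for side in sides:
            dr, dc = SIDE_OFFSETS[side]
            a, b = (r, c), (r + dr, c + dc)
            walls.add((a, b) if a < b else (b, a))
    return walls
```

Borders on the outer edge of the sheet end up as pairs that include an out-of-range cell, which is harmless as long as the later fill only looks up pairs of in-range neighbors.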

One thing I thought about doing is to create a separate graph that outlines the cells using the borders as my edges and using a graph algorithm to find regions that way, but I’m having trouble figuring out how to implement the algorithm. I’ve used Dijkstra in the past and thought I could do similar with this. So I can span out using no endpoint to search the entire graph, and if I encounter a closed node I know that I just found an enclosed region, but how can I know if the route I’ve found is the optimal one? I guess I could flag that to run a separate check for the found closed node to the previous node ignoring that one edge.

This could work, but it wouldn’t perform much better on dense graphs. Can anyone suggest a better method? Thanks for taking the time to read this.

1 Answer

  1. Editorial Team
    Added an answer on May 14, 2026 at 12:08 am

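    Your question is pretty complicated, but it sounds as though you need an algorithm to find the connected components of a graph (connected component = set of nodes all connected to one another but to no other nodes). One way to map your problem onto this would be to treat each cell as a node and to link two adjacent cells whenever no border separates them. The components can then be found in time linear in the number of nodes and edges by repeated traversals, provided the visited marks are shared across all traversals. Pseudocode: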

    FindComponents(G):
        For all vertices v in G:
            Let C be a mutable empty collection
            Traverse(G, C, v)
            If C is nonempty, then it is a connected component
    
    Traverse(G, C, v):
        If v has not been visited:
            Mark v as visited
            Add v to C
            For each neighbor w of v in G:
                Traverse(G, C, w)
    

    Iterative variant of Traverse:

    Traverse(G, C, r):
        Let S be an empty stack
        Push r onto S
        While S is not empty:
            Pop the top element v of S
            If v is not marked as visited:
                Mark v as visited
                Add v to C
                For each neighbor w of v in G:
                    Push w onto S
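
For concreteness, the iterative traversal above could be sketched in Python for a grid of cells as follows. This is only an illustration under stated assumptions, not the asker's VB6 code. The function name `find_regions`, the rectangular `rows` by `cols` grid, and the `walls` set of normalized (cell_a, cell_b) pairs with cell_a < cell_b are all hypothetical. Two adjacent cells count as neighbors only when no wall separates them. Cells are marked when pushed rather than when popped, which is a small variation on the pseudocode that keeps each cell on the stack at most once.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]    # (row, col)
Wall = Tuple[Cell, Cell]  # adjacent cells separated by a border, smaller first

NEIGHBOR_OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def find_regions(rows: int, cols: int, walls: Set[Wall]) -> List[List[Cell]]:
    """Return the connected components of the grid, treating walls as barriers."""
    visited = [[False] * cols for _ in range(rows)]
    regions: List[List[Cell]] = []
    for r0 in range(rows):
        for c0 in range(cols):
            if visited[r0][c0]:
                continue  # already assigned to an earlier region
            visited[r0][c0] = True
            region: List[Cell] = []
            stack = [(r0, c0)]
            while stack:
                r, c = stack.pop()
                region.append((r, c))
                for dr, dc in NEIGHBOR_OFFSETS:
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue  # outside the grid
                    if visited[nr][nc]:
                        continue
                    a, b = (r, c), (nr, nc)
                    if ((a, b) if a < b else (b, a)) in walls:
                        continue  # a border blocks this step
                    visited[nr][nc] = True
                    stack.append((nr, nc))
            regions.append(region)
    return regions
```

Each cell is visited once and each of its four neighbors is checked once, so the work grows linearly with the number of cells. Note that this returns every component, including one that reaches the edge of the grid without crossing a border. Deciding whether such a component counts as "enclosed" is left to the caller.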
    
