I have an Access database of 4M rows, each representing an individual customer order.
I need to run a query from Excel (I use VBA) in order to retrieve only the orders from customers in REGION1.
I tried the following (names should be pretty self-explanatory):
Sub Query()
    Dim cn As Object
    Dim rs As Object
    Dim strFile As String
    Dim strCon As String
    Dim strSQL As String

    strFile = "C:\Users\MyName\Desktop\DataBase.accdb"
    strCon = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & strFile
    Set cn = CreateObject("ADODB.Connection")
    Set rs = CreateObject("ADODB.Recordset")
    cn.Open strCon

    ' Leading spaces keep the concatenated clauses apart
    strSQL = "SELECT [CUSTOMER], [DATE], [REVENUE]" _
           & " FROM [SALES DB]" _
           & " WHERE [REGION]='REGION1'"

    ' 0 = adOpenForwardOnly, 1 = adLockReadOnly
    rs.Open strSQL, cn, 0, 1
    Worksheets(1).Cells(2, 1).CopyFromRecordset rs

    rs.Close
    Set rs = Nothing
    cn.Close
    Set cn = Nothing
End Sub
This works nicely but it’s a bit slow, as it returns ~600k rows.
So I thought: “Who cares about the detailed list of all customer orders? I just need the monthly aggregate. This should reduce the number of returned rows and hence improve speed!”
So I changed my code to:
strSQL = "SELECT [CUSTOMER], MONTH([DATE]), YEAR([DATE]), SUM([REVENUE])" _
       & " FROM [SALES DB]" _
       & " WHERE [REGION]='REGION1'" _
       & " GROUP BY [CUSTOMER], MONTH([DATE]), YEAR([DATE])"
As I expected, now only ~450K results show up. The thing is, the query actually became slower.
I’m actually better off extracting the ungrouped data and then aggregating it with a simple pivot table.
How can less data be slower to extract? I know there are some calculations to be performed in between, but still.
Does anybody out there have any idea how I can overcome this problem?
You don’t mention the actual time taken by the queries, but here are a few thoughts:
Make sure you have indexes for all the fields in the database that you are grouping or filtering.
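As a rough sketch, assuming the file path, table and column names from the question, the indexes could be created once through Access DDL over an ADO connection. The index names idxRegion and idxCustomerDate are made up for illustration.

```vba
Sub CreateIndexes()
    ' Assumption: same database file and table/column names as in the question.
    ' idxRegion and idxCustomerDate are hypothetical index names.
    ' Run this once: creating an index that already exists raises an error.
    Dim cn As Object

    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
            "Data Source=C:\Users\MyName\Desktop\DataBase.accdb"

    ' Index the column used in the WHERE clause
    cn.Execute "CREATE INDEX idxRegion ON [SALES DB] ([REGION])"
    ' Index the columns involved in the grouping
    cn.Execute "CREATE INDEX idxCustomerDate ON [SALES DB] ([CUSTOMER], [DATE])"

    cn.Close
    Set cn = Nothing
End Sub
```

Since MONTH([DATE]) and YEAR([DATE]) are computed for every row, the index on [DATE] may help the grouping less than the index on [REGION] helps the filter.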
If you are the only user of that database, open it in exclusive mode:
For ADO, request it in the connection string, for instance by appending Mode=Share Exclusive; (an assumption based on the standard OLE DB Mode keyword).
Use DAO instead of ADO. DAO is native to Access and generally faster.
First, add a reference to the Access engine in Excel. From the IDE, under Tools > References, go down the list and check:
Microsoft Office 12.0 Access database engine Object Library.
If you have Access 2010, the reference will be:
Microsoft Office 14.0 Access database engine Object Library.
With older versions of Access (2003 and earlier), which use the Jet engine instead (MDB files only), it would be:
Microsoft DAO 3.6 Object Library.
Then use the VBA code below to load the data into your worksheet:
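A minimal sketch of such a DAO routine, assuming one of the references listed above is checked and reusing the file path, table and column names from the question, might look like this:

```vba
Sub QueryWithDAO()
    ' Requires a reference to the DAO / Access database engine library.
    ' Assumption: file path, table and column names come from the question.
    Dim db As DAO.Database
    Dim rs As DAO.Recordset
    Dim strSQL As String

    strSQL = "SELECT [CUSTOMER], [DATE], [REVENUE]" _
           & " FROM [SALES DB]" _
           & " WHERE [REGION]='REGION1'"

    ' Second argument True = open exclusively, third argument True = read-only
    Set db = DBEngine.OpenDatabase("C:\Users\MyName\Desktop\DataBase.accdb", True, True)
    ' A snapshot is a static, read-only copy of the result set
    Set rs = db.OpenRecordset(strSQL, dbOpenSnapshot)

    ' CopyFromRecordset accepts DAO recordsets as well as ADO ones
    Worksheets(1).Cells(2, 1).CopyFromRecordset rs

    rs.Close
    db.Close
    Set rs = Nothing
    Set db = Nothing
End Sub
```

Opening the file exclusively only works while no other user or process has the database open, so set the second argument to False if the file is shared.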
If you are mostly importing your data from Access to display them in a Pivot, you may be better served by the pivot table in Access itself.
On that subject, did you know that you can split your database to share the data backend and use the free Access Runtime to allow all your users to view your reports and play with the data on their machine?
Moving to SQL Server or another database may or may not solve your issues:
If SQL Server is on your machine, it will take more or less as many resources to compute your query as if the MS Access database were on your machine.
If SQL Server is on a remote machine, most of the time will be spent on the network data transfer.
Your bottleneck is probably not the database; it’s the time to import that much data into the spreadsheet itself. You can try executing the query from Access itself and see how long it takes.
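One way to check where the time goes from Excel is to time the query and the copy into the sheet separately. The sketch below assumes the connection string and query from the question and uses a client-side cursor so that the rows are fetched when the recordset opens rather than during the copy.

```vba
Sub TimeQueryAndCopy()
    ' Assumption: connection string, table and columns are those of the question.
    Dim cn As Object, rs As Object
    Dim strSQL As String
    Dim tStart As Single, tQuery As Single, tCopy As Single

    strSQL = "SELECT [CUSTOMER], [DATE], [REVENUE]" _
           & " FROM [SALES DB]" _
           & " WHERE [REGION]='REGION1'"

    Set cn = CreateObject("ADODB.Connection")
    Set rs = CreateObject("ADODB.Recordset")
    cn.CursorLocation = 3   ' 3 = adUseClient: rows are fetched on rs.Open
    cn.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
            "Data Source=C:\Users\MyName\Desktop\DataBase.accdb"

    ' Time the query plus the fetch of all rows
    tStart = Timer
    rs.Open strSQL, cn, 3, 1   ' 3 = adOpenStatic, 1 = adLockReadOnly
    tQuery = Timer - tStart

    ' Time the write into the worksheet
    tStart = Timer
    Worksheets(1).Cells(2, 1).CopyFromRecordset rs
    tCopy = Timer - tStart

    ' Results appear in the Immediate window (Ctrl+G in the VBA IDE)
    Debug.Print "Query and fetch: " & Format(tQuery, "0.00") & " s"
    Debug.Print "Copy to sheet:   " & Format(tCopy, "0.00") & " s"

    rs.Close
    cn.Close
    Set rs = Nothing
    Set cn = Nothing
End Sub
```

If the second figure dominates, the spreadsheet import is the bottleneck and tuning the database is unlikely to help much.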
If you have that much data to sift through, Excel is probably not the best tool for the job, and you may be better served by a dedicated reporting or Business Intelligence application.
There are plenty of open-source and commercial platforms, for instance: