We assume that there is no primary key defined for a table T. In that case, how does one count all the rows in T quickly/efficiently for these databases – Oracle 11g, MySql, Mssql ?
It seems that count(*) and count(column_name) can be slow and inaccurate respectively. The following seems to be the fastest and most reliable way to do it-
select count(rowid) from MySchema.TableInMySchema;
Can you tell me if the above statement also has any shortcomings ? If it is good, then do we have similar statements for mysql and mssql ?
Thanks in advance.
Source –
http://www.thewellroundedgeek.com/2007/09/most-people-use-oracle-count-function.html
count(column_name)is not inaccurate, it’s simply something completely different than count(*).The SQL standard defines
count(column_name)as equivalent tocount(*) where column_name IS NOT NULL. To the result is bound to be different if column_name is nullable.In Oracle (and possibly other DBMS as well),
count(*)will use an available index on a not null column to count the rows (e.g. PK index). So it will be just as fasAdditionally there is nothing similar to the rowid in SQL Server or MySQL (in PostgreSQL it would be
ctid).Do use
count(*). It’s the best option to get the row count. Let the DBMS do any optimization in the background if adequate indexes are available.Edit
A quick demo on how Oracle automatically uses an index if available and how that reduces the amount of work done by the database:
The setup of the test table:
That generates 10000 rows with each row filling up some space in order to make sure the table has a realistic size.
Now in SQL*Plus I run the following:
SQL> set autotrace traceonly explain statistics; SQL> select count(*) from foo; Execution Plan ---------------------------------------------------------- Plan hash value: 1342139204 ------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2740 (1)| 00:00:33 | | 1 | SORT AGGREGATE | | 1 | | | | 2 | TABLE ACCESS FULL| FOO | 9999 | 2740 (1)| 00:00:33 | ------------------------------------------------------------------- Statistics ---------------------------------------------------------- 181 recursive calls 0 db block gets 10130 consistent gets 0 physical reads 0 redo size 430 bytes sent via SQL*Net to client 420 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 5 sorts (memory) 0 sorts (disk) 1 rows processed SQL>As you can see a full table scan is done on the table which requires 10130 “IO Operations” (I know that that is not the right term, but for the sake of the demo it should be a good enough explanation for someone never seen this before)
Now I create an index on that column and run the count(*) again:
SQL> create index i1 on foo (id); Index created. SQL> select count(*) from foo; Execution Plan ---------------------------------------------------------- Plan hash value: 129980005 ---------------------------------------------------------------------- | Id | Operation | Name | Rows | Cost (%CPU)| Time | ---------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 7 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | | | | 2 | INDEX FAST FULL SCAN| I1 | 9999 | 7 (0)| 00:00:01 | ---------------------------------------------------------------------- Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 27 consistent gets 21 physical reads 0 redo size 430 bytes sent via SQL*Net to client 420 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SQL>As you can see Oracle did use the index on the (not null!) column and the amount of IO went drastically down (from 10130 to 27 – not something I’d call “grossly ineffecient“).
The “physical reads” stem from the fact that the index was just created and was not yet in the cache.
I would expect other DBMS to apply the same optimizations.