Editorial Team
Asked: May 18, 2026

I’m trying to learn SQLite and searching for techniques to speed up my query.

I see some people here trying to squeeze out milliseconds, when I’m easily in the megaseconds. I have one SQLite db with four tables, although I’m only querying three of them. Here’s the query (I am using R to invoke the query):

SELECT a.date, a.symbol, SUM (a.oi*a.contract_close) AS oi, c.ret, c.prc
    FROM (SELECT date, symbol, oi, contract_close FROM ann
            UNION
            SELECT date, symbol AS sym, oi, contract_close FROM qtr
            WHERE oi > 100 AND contract_close > 0 AND date > 20090600) a
    INNER JOIN
    (SELECT date, symbol || '1C' AS sym, ret, prc FROM crsp
            WHERE prc > 5 AND date>20090600) c
    ON a.date = c.date AND a.symbol = c.sym
    GROUP BY a.date, a.symbol

I have an index on each table by date and symbol and just VACUUMed, but it’s still very slow, as in an hour plus (and notice that I’m looking for a six-month subset; I really want to query back to 2003).

Is this just a cache size issue? I have a relatively new laptop (MacBook Pro with 4gb RAM). Thanks!

Here’s the .schema:

CREATE TABLE ann 
( "date" INTEGER,
 symbol TEXT,
 contract_type_1 TEXT,
 contract_type_2 TEXT,
 product_type TEXT,
 block_volume INTEGER,
 oi_change INTEGER,
 oi INTEGER,
 efp_volume INTEGER,
 total_volume INTEGER,
 name TEXT,
 contract_change INTEGER,
 contract_open INTEGER,
 contract_high INTEGER,
 contract_low INTEGER,
 contract_close INTEGER,
 contract_settle INTEGER 
);
CREATE TABLE crsp 
( "date" INTEGER,
 symbol TEXT,
 permno INTEGER,
 prc REAL,
 ret REAL,
 vwretd REAL,
 ewretd REAL,
 sprtrn REAL 
);
CREATE TABLE dly 
( "date" INTEGER,
 symbol TEXT,
 expiration INTEGER,
 product_type TEXT,
 shares_per_contract INTEGER,
 "open" REAL,
 high REAL,
 low REAL,
 "last" REAL,
 settle REAL,
 change REAL,
 total_volume INTEGER,
 efp_volume INTEGER,
 block_volume INTEGER,
 oi INTEGER 
);
CREATE TABLE qtr 
( "date" INTEGER,
 symbol TEXT,
 total_volume INTEGER,
 block_volume INTEGER,
 efp_volume INTEGER,
 contract_high INTEGER,
 contract_low INTEGER,
 contract_open INTEGER,
 contract_close INTEGER,
 contract_settle INTEGER,
 oi INTEGER,
 oi_change INTEGER,
 shares_per_contract INTEGER,
 expiration INTEGER,
 product_type TEXT,
 unk TEXT,
 name TEXT 
);
CREATE INDEX idx_ann_date_sym ON ann (date, symbol);
CREATE INDEX idx_crsp_date_sym ON ann (date, symbol);
CREATE INDEX idx_dly_date_sym ON ann (date, symbol);
CREATE INDEX idx_qtr_date_sym ON ann (date, symbol);
1 Answer

    Editorial Team
    Added an answer on May 18, 2026 at 2:42 am

    You don’t mention the critical piece of information, which is how many rows are in each table and how many are in your result set. A query shouldn’t take an hour unless you have really enormous data sets.
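One quick way to collect those row counts is sketched below with Python's built-in sqlite3 module (the same SQL can be sent from R). The in-memory tables are made-up stand-ins for the real database, and `row_counts` is a hypothetical helper name, not something from the question.

```python
import sqlite3


def row_counts(conn, tables):
    """Return {table_name: row_count} for the given tables."""
    counts = {}
    for name in tables:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


# Stand-in data; with a real file, open it with sqlite3.connect(<path to db>).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ann  (date INTEGER, symbol TEXT);
    CREATE TABLE qtr  (date INTEGER, symbol TEXT);
    CREATE TABLE crsp (date INTEGER, symbol TEXT);
    INSERT INTO ann VALUES (20090701, 'AAA'), (20090702, 'AAA');
    INSERT INTO qtr VALUES (20090701, 'BBB');
""")

counts = row_counts(conn, ["ann", "qtr", "crsp"])
print(counts)
```

Running the same count on the joined result (or on each subquery) would show where the row volume comes from.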

    That said, a few things I notice about your query:

    1. I assume you’re aware that in your UNION the WHERE clause only applies to the second table and you’re getting the entire “ann” table included?

    2. UNION ALL is generally faster than plain UNION unless you really need the de-duplication provided by plain UNION.

    3. You do not need to repeat the date filter on both sides of the JOIN. One side is enough, and you may get different speeds depending on which side of the JOIN carries the filter. By using it in both places you could be confusing the query optimizer.

    4. I’m not sure what “AS sym” is doing in the second SELECT in the UNION, because that column will be named “symbol” in the output (from the first SELECT in the UNION) and you’re relying on the name symbol in your main SELECT statement.

    5. In your main SELECT statement, c.ret and c.prc are neither wrapped in aggregate functions nor listed in the GROUP BY, so it’s not clear to me what values you expect to see in the results when c contains multiple rows for a GROUP BY set.

    6. The JOIN cannot be fully optimized because one of the JOIN values, sym (symbol || '1C'), is calculated in an inner SELECT and so has no index behind it. I’m not sure if there’s a clever way to rewrite the JOIN conditions to be optimizable without storing a calculated symbol value in crsp.

    7. Depending on the distribution of symbol and date values, you might want to reverse the order of the columns in your indexes (but only if you solve the problem of calculating the symbol value).
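Several of the points above can be sketched on toy data with Python's built-in sqlite3 module. Everything here is an assumption for illustration rather than the asker's actual data or intent: the rows are made up, the filter is assumed to be wanted on both branches of the UNION, the stored `sym` column on crsp is one possible way to handle point 6, and adding c.ret and c.prc to the GROUP BY is only one reading of point 5. The schema as posted creates all four indexes on ann; the sketch assumes the crsp index was meant to be on crsp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ann  (date INTEGER, symbol TEXT, oi INTEGER, contract_close INTEGER);
    CREATE TABLE qtr  (date INTEGER, symbol TEXT, oi INTEGER, contract_close INTEGER);
    CREATE TABLE crsp (date INTEGER, symbol TEXT, ret REAL, prc REAL);

    -- Made-up rows: one old ann row, one recent ann row, one recent qtr row.
    INSERT INTO ann  VALUES (20080101, 'AAA1C', 500, 10), (20090701, 'AAA1C', 500, 10);
    INSERT INTO qtr  VALUES (20090701, 'BBB1C', 300, 20);
    INSERT INTO crsp VALUES (20090701, 'AAA', 0.01, 25.0), (20090701, 'BBB', 0.02, 30.0);
""")

# Point 1: a WHERE after the second SELECT filters only qtr,
# so the 2008 row from ann still comes through.
unfiltered = conn.execute("""
    SELECT date, symbol FROM ann
    UNION
    SELECT date, symbol FROM qtr WHERE date > 20090600
""").fetchall()

# Point 6 (assumed approach): store the computed join key and index it.
conn.executescript("""
    ALTER TABLE crsp ADD COLUMN sym TEXT;
    UPDATE crsp SET sym = symbol || '1C';
    CREATE INDEX idx_crsp_date_sym ON crsp (date, sym);
""")

# Points 1-3 and 5: filter each UNION branch, use UNION ALL, keep the date
# filter on one side of the JOIN only, and group by every non-aggregated column.
query = """
    SELECT a.date, a.symbol, SUM(a.oi * a.contract_close) AS oi, c.ret, c.prc
    FROM (SELECT date, symbol, oi, contract_close FROM ann
           WHERE oi > 100 AND contract_close > 0 AND date > 20090600
          UNION ALL
          SELECT date, symbol, oi, contract_close FROM qtr
           WHERE oi > 100 AND contract_close > 0 AND date > 20090600) a
    INNER JOIN crsp c ON a.date = c.date AND a.symbol = c.sym
    WHERE c.prc > 5
    GROUP BY a.date, a.symbol, c.ret, c.prc
    ORDER BY a.date, a.symbol
"""
rows = conn.execute(query).fetchall()
print(rows)

# Inspect how SQLite plans the query; the plan text varies by SQLite version.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```

Comparing the EXPLAIN QUERY PLAN output before and after a change is a cheap way to check whether an index is actually being used, before committing to an hour-long run.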
