Asked by Editorial Team on May 11, 2026


We use grep, cut, sort, uniq, and join at the command line all the time for data analysis. They work well, but they have shortcomings; for example, you have to give column numbers to each tool. Our files are often wide (many columns) and have a header line that gives the column names. In fact, they look a lot like SQL tables. I’m sure there is a driver (ODBC?) that operates on delimited text files, and some query engine that can use that driver, so we could run SQL queries directly on our text files. Because our analysis is usually ad hoc, querying a new file would have to take minimal setup (just use the files I specify in this directory) rather than declaring particular tables in some config.

Practically speaking, what’s the easiest option? That is, which SQL engine and driver are easiest to set up and use against text files?
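To make the column-number complaint concrete: a pipeline such as `cut -d' ' -f2 file | sort | uniq -c` has to address the column by position, while the equivalent SQL `GROUP BY` query addresses it by its header name. The sketch below computes both on a small, made-up whitespace-delimited table using Python's standard `sqlite3` module; the column names `user` and `action` and the rows are purely illustrative.

```python
import sqlite3
from collections import Counter

# Made-up whitespace-delimited data with a header line (illustrative only).
TEXT = """user action
alice login
bob login
alice logout
carol login
"""

lines = TEXT.splitlines()
header = lines[0].split()
rows = [line.split() for line in lines[1:]]

# Equivalent of: cut -d' ' -f2 file | sort | uniq -c
# The column must be addressed by position (field 2 -> index 1).
by_position = sorted(Counter(row[1] for row in rows).items())

# Equivalent SQL: the column is addressed by its header name instead.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (%s)" % ", ".join('"%s"' % h for h in header))
placeholders = ", ".join("?" * len(header))
con.executemany("INSERT INTO data VALUES (%s)" % placeholders, rows)
by_name = con.execute(
    "SELECT action, COUNT(*) FROM data GROUP BY action ORDER BY action"
).fetchall()

print(by_position)  # [('login', 3), ('logout', 1)]
print(by_name)      # [('login', 3), ('logout', 1)]
```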

1 Answer

  1. Editorial Team answered on May 11, 2026 at 4:28 pm

    Building on someone else’s suggestion, here is a Python script that loads a text file into sqlite3 and runs a query on it. It is a little verbose, but it works.

    I don’t like having to copy the whole file just to drop the header line, but I don’t know how else to make sqlite3’s .import skip it. I could generate INSERT statements instead, but that seems just as bad, if not worse.

    Sample invocation:

    $ sql.py --file foo --sql "select count(*) from data"
    

    The code:

    #!/usr/bin/env python3
    
    """Run a SQL statement on a whitespace-delimited text file with a header line."""
    
    import getopt
    import subprocess
    import sys
    import tempfile
    
    
    class Usage(Exception):
        def __init__(self, msg):
            super().__init__(msg)
            self.msg = msg
    
    
    def run_sqlite(script):
        """Pipe a script of SQL and dot-commands into the sqlite3 shell."""
        result = subprocess.run(["sqlite3"], input=script, text=True)
        if result.returncode:
            print("Error running sqlite3", file=sys.stderr)
        # Propagate sqlite3's actual exit code to the caller.
        return result.returncode
    
    
    def usage():
        print("Usage: sql.py --file file --sql sql", file=sys.stderr)
    
    
    def main(argv=None):
        if argv is None:
            argv = sys.argv
    
        try:
            try:
                opts, args = getopt.getopt(argv[1:], "h",
                                           ["help", "file=", "sql="])
            except getopt.GetoptError as msg:
                raise Usage(msg)
        except Usage as err:
            print(err.msg, file=sys.stderr)
            print("for help use --help", file=sys.stderr)
            return 2
    
        filename = None
        sql = None
        # getopt raises GetoptError for unknown options, so only these can appear.
        for o, a in opts:
            if o in ("-h", "--help"):
                usage()
                return 0
            elif o == "--file":
                filename = a
            elif o == "--sql":
                sql = a
    
        if not filename:
            print("Must give --file", file=sys.stderr)
            return 1
        if not sql:
            print("Must give --sql", file=sys.stderr)
            return 1
    
        # Use the first line of the file to build a CREATE statement.
        #
        # Copy the rest of the lines into a new file (datafile) so that
        # sqlite3 can import the data without the header.  If sqlite3 could
        # skip the first line with .import, this copy would be unnecessary.
        with open(filename) as infile, \
                tempfile.NamedTemporaryFile("w") as datafile:
            headers = infile.readline().split()
            if not headers:
                print("No header line in " + filename, file=sys.stderr)
                return 1
            for line in infile:
                datafile.write(line)
            datafile.flush()
    
            # Create columns with NUMERIC affinity so that if they are numbers,
            # SQL queries will treat them as such.
            create_statement = "CREATE TABLE data (" + ",".join(
                "`%s` NUMERIC" % h for h in headers) + ");"
    
            script = "\n".join([
                create_statement,
                ".separator ' '",
                ".import '" + datafile.name + "' data",
                sql + ";",
            ]) + "\n"
            # sqlite3 reads datafile by name, so run it while the file still exists.
            return run_sqlite(script)
    
    
    if __name__ == "__main__":
        sys.exit(main())
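The question also asks to query several files in a directory without declaring tables up front, and join is one of the tools to replace. One way to extend the idea, sketched here as an assumption rather than as part of the script above, is to swap the sqlite3 shell for Python's built-in `sqlite3` module: load each file into an in-memory database as a table named after the file's base name, then run a query that can JOIN across files. Because rows are inserted with `executemany` and parameter binding, the header is simply the first line read and no header-less copy is needed. The helper names `table_name` and `load_files` and the sample files are hypothetical.

```python
import os
import re
import sqlite3
import tempfile


def table_name(path):
    """Hypothetical naming rule: 'dir/orders.txt' -> 'orders' (non-word chars -> '_')."""
    base = os.path.splitext(os.path.basename(path))[0]
    return re.sub(r"\W", "_", base)


def load_files(paths):
    """Load each whitespace-delimited file (header line first) as its own table."""
    con = sqlite3.connect(":memory:")
    for path in paths:
        with open(path) as f:
            header = f.readline().split()
            # NUMERIC affinity, as in the script, so numbers compare as numbers.
            cols = ", ".join('"%s" NUMERIC' % h.replace('"', '""') for h in header)
            name = table_name(path)
            con.execute('CREATE TABLE "%s" (%s)' % (name, cols))
            marks = ", ".join("?" * len(header))
            # Remaining lines are streamed straight into the table.
            con.executemany('INSERT INTO "%s" VALUES (%s)' % (name, marks),
                            (line.split() for line in f if line.strip()))
    return con


# Usage example with two made-up files standing in for `join` input.
with tempfile.TemporaryDirectory() as tmp:
    files = {
        "users.txt": "id name\n1 alice\n2 bob\n",
        "orders.txt": "id amount\n1 5\n1 7\n2 3\n",
    }
    paths = []
    for fname, text in files.items():
        paths.append(os.path.join(tmp, fname))
        with open(paths[-1], "w") as f:
            f.write(text)
    con = load_files(paths)
    totals = con.execute(
        "SELECT name, SUM(amount) FROM users JOIN orders USING (id) "
        "GROUP BY name ORDER BY name").fetchall()

print(totals)  # [('alice', 12), ('bob', 3)]
```

If you would rather keep the sqlite3 shell approach, recent sqlite3 versions (3.32.0 and later, as far as I know) accept `.import --skip 1 FILE TABLE`, which should make the temporary copy unnecessary; running `.help import` in your installed shell will confirm whether that option is available.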
    

© 2021 The Archive Base. All Rights Reserved