Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 230267
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 11, 20262026-05-11T19:50:06+00:00 2026-05-11T19:50:06+00:00

This is sort of a general question that has come up in several contexts,

  • 0

This is sort of a general question that has come up in several contexts, the example below is representative but not exhaustive. I am interested in any ways of learning to work with Postgres on imperfect (but close enough) data sources.

The specific case — I am using Postgres with PostGIS for working with government data published in shapefiles and xml. Using the shp2pgsql module distributed with PostGIS (for example on this dataset) I often get schema like this:

   Column   |         Type          | 
------------+-----------------------+-
 gid        | integer               |
 st_fips    | character varying(7)  | 
 sfips      | character varying(5)  | 
 county_fip | character varying(12) | 
 cfips      | character varying(6)  | 
 pl_fips    | character varying(7)  | 
 id         | character varying(7)  | 
 elevation  | character varying(11) | 
 pop_1990   | integer               | 
 population | character varying(12) | 
 name       | character varying(32) | 
 st         | character varying(12) | 
 state      | character varying(16) | 
 warngenlev | character varying(13) | 
 warngentyp | character varying(13) | 
 watch_warn | character varying(14) | 
 zwatch_war | bigint                | 
 prog_disc  | bigint                | 
 zprog_disc | bigint                | 
 comboflag  | bigint                | 
 land_water | character varying(13) | 
 recnum     | integer               | 
 lon        | numeric               | 
 lat        | numeric               | 
 the_geom   | geometry              |

I know that at least 10 of those varchars — the fips, elevation, population, etc., should be ints; but when trying to cast them as such I get errors. In general I think I could solve most of my problems by allowing Postgres to accept an empty string as a default value for a column — say 0 or -1 for an int type — when altering a column and changing the type. Is this possible?

If I create the table before importing with the type declarations generated from the original data source, I get better types than with shp2pgsql, and can iterate over the source entries feeding them to the database, discarding any failed inserts. The fundamental problem is that if I have 1% bad fields, evenly distributed over 25 columns, I will lose 25% of my data since a given insert will fail if any field is bad. I would love to be able to make a best-effort insert and fix any problems later, rather than lose that many rows.

Any input from people having dealt with similar problems is welcome — I am not a MySQL guy trying to batter PostgreSQL into making all the same mistakes I am used to — just dealing with data I don’t have full control over.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-11T19:50:06+00:00Added an answer on May 11, 2026 at 7:50 pm

    Could you produce a SQL file from shp2pgsql and do some massaging of the data before executing it? If the data is in COPY format, it should be easy to parse and change “” to “\N” (insert as null) for columns.

    Another possibility would be to use shp2pgsql to load the data into a staging table where all the fields are defined as just ‘text’ type, and then use an INSERT…SELECT statement to copy the data to your final location, with the possibility of massaging the data in the SELECT to convert blank strings to null etc.

    I don’t think there’s a way to override the behaviour of how strings are converted to ints and so on: possibly you could create your own type or domain, and define an implicit cast that was more lenient… but this sounds pretty nasty, since the types are really just artifacts of how your data arrives in the system and not something you want to keep around after that.

    You asked about fixing it up when changing the column type: you can do that too, for example:

    steve@steve@[local] =# create table test_table(id serial primary key, testvalue text not null);
    NOTICE:  CREATE TABLE will create implicit sequence "test_table_id_seq" for serial column "test_table.id"
    NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "test_table_pkey" for table "test_table"
    CREATE TABLE
    steve@steve@[local] =# insert into test_table(testvalue) values('1'),('0'),('');
    INSERT 0 3
    steve@steve@[local] =# alter table test_table alter column testvalue type int using case testvalue when '' then 0 else testvalue::int end;
    ALTER TABLE
    steve@steve@[local] =# select * from test_table;
     id | testvalue
    ----+-----------
      1 |         1
      2 |         0
      3 |         0
    (3 rows)
    

    Which is almost equivalent to the “staging table” idea I suggested above, except that now the staging table is your final table. Altering a column type like this requires rewriting the entire table anyway: so actually, using a staging table and reformatting multiple columns at once is likely to be more efficient.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have a sort of general question but i think that if I tried
I know this sort of code is not best practice, but nevertheless in certain
This question is related but not the same as this: How do you give
I know that this type of question has been asked over and over again,
This is more of a general design question, but it will be implemented in
This is sort of a general implementation question. If I have an arbitrarily deep
Sorry this is such a long question but it touches on a general issue
Background: This is really a general best-practices question, but some background about the specific
So this may seem like a general question..but for learning a programming language quickly
I asked this sort of question before ( Application fails to dynamically _re_load JavaScript

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.