Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 570411
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 13, 20262026-05-13T13:23:55+00:00 2026-05-13T13:23:55+00:00

I’ve been working on text parsing task when I found strange Postgres behavior. My

  • 0

I’ve been working on text parsing task when I found strange Postgres behavior. My original code exposing strange error was written in Java with JDBC connectivity for PostgreSQL (v8.3.3 and v8.4.2 tested), here is my original post: Is it an error of PostgreSQL SQL engine and how to avoid (workaround) it?. I’ve just ported my Java code given there to pure plpgsql and it gives the same errors (same behavior as described in original post).

Simplified code has now nothing to do with parsing – it just generates pseudo-random (but repeatable) words and inserts them after normalizing (table spb_word holds unique words and ids, they are referenced by id in final table spb_obj_word and table spb_word4obj works as input buffer).

Here are my tables (c&p from OP):

create sequence spb_word_seq;

create table spb_word (
  id bigint not null primary key default nextval('spb_word_seq'),
  word varchar(410) not null unique
);

create sequence spb_obj_word_seq;

create table spb_obj_word (
  id int not null primary key default nextval('spb_obj_word_seq'),
  doc_id int not null,
  idx int not null,
  word_id bigint not null references spb_word (id),
  constraint spb_ak_obj_word unique (doc_id, word_id, idx)
);

create sequence spb_word4obj_seq;

create table spb_word4obj (
  id int not null primary key default nextval('spb_word4obj_seq'),
  doc_id int not null,
  idx int not null,
  word varchar(410) not null,
  word_id bigint null references spb_word (id),
  constraint spb_ak_word4obj unique (doc_id, word_id, idx),
  constraint spb_ak_word4obj2 unique (doc_id, word, idx)
);

and code ported to plpgsql from original Java code:

create sequence spb_wordnum_seq;

create or replace function spb_getWord() returns text as $$
declare
  rn int;
  letters varchar(255) :=   'ąćęłńóśźżjklmnopqrstuvwxyz';
                          --'abcdefghijklmnopqrstuvwxyz';
  llen int := length(letters);
  res text := '';
  wordnum int;
begin
  select nextval('spb_wordnum_seq') into wordnum;

  rn := 3 * (wordnum + llen * llen * llen);
  rn := (rn + llen) / (rn % llen + 1);
  rn := rn % (rn / 2 + 10);

  loop
    res := res || substring(letters, rn % llen, 1);
    rn := floor(rn / llen);
    exit when rn = 0;
  end loop;

  --raise notice 'word for wordnum=% is %', wordnum, res;

  return res;
end;
$$ language plpgsql;



create or replace function spb_runme() returns void as $$
begin
  perform setval('spb_wordnum_seq', 1, false);
  truncate table spb_word4obj, spb_word, spb_obj_word;

  for j in 0 .. 50000-1 loop

    if j % 100 = 0 then raise notice 'j = %', j; end if;

    delete from spb_word4obj where doc_id = j;

    for i in 0 .. 20 - 1 loop
      insert into spb_word4obj (word, idx, doc_id) values (spb_getWord(), i, j);         
    end loop;

    update spb_word4obj set word_id = w.id from spb_word w 
    where w.word = spb_word4obj.word and doc_id = j;

    insert into spb_word (word) 
    select distinct word from spb_word4obj 
    where word_id is null and doc_id = j;

    update spb_word4obj set word_id = w.id 
    from spb_word w 
    where w.word = spb_word4obj.word and 
    word_id is null and doc_id = j;

    insert into spb_obj_word (word_id, idx, doc_id) 
    select word_id, idx, doc_id from spb_word4obj where doc_id = j;
  end loop;
end;
$$ language plpgsql;

To run this simply execute select spb_runme() as SQL statement.

Here is first example of error:

NOTICE:  j = 8200
ERROR:  duplicate key value violates unique constraint "spb_word_word_key"
CONTEXT:  SQL statement "insert into spb_word (word) select distinct word from spb_word4obj where word_id is null and doc_id =  $1 "
PL/pgSQL function "spb_runme" line 18 at SQL statement

and second one:

NOTICE:  j = 500
ERROR:  null value in column "word_id" violates not-null constraint
CONTEXT:  SQL statement "insert into spb_obj_word (word_id, idx, doc_id) select word_id, idx, doc_id from spb_word4obj where doc_id =  $1 "
PL/pgSQL function "spb_runme" line 27 at SQL statement

These errors occur in unpredictable manner – every time in different iteration (j) and with different words causing error.

When polish national characters (ąćęłńóśźż) are eliminated from generated words (line letters varchar(255) := 'ąćęłńóśźżjklmnopqrstuvwxyz'; becomes letters varchar(255) := 'abcdefghijklmnopqrstuvwxyz';) there is no error! My DB is created with UTF-8 encoding, so there should be no problem with non-ascii chars, but apparently it is very important!

Now my question: what is wrong with my code? Or is it something serious wrong with PostgreSQL? How to work around this error?

BTW: If it is an error in PostgreSQL engine, then how this DB can be trustworthy? Should I move to one of free alternatives (e.g. MySQL)?


UPDATE: extra explanations (mainly for OMG Ponies)

If I remove unnecessary delete – I still have the same errors.

Function spb_getWord() must generate words with duplicates – it simulates text parsing and dividing it to words – and some words get repeated – that is normal and rest of my code is dealing with duplicates. Because of possible duplicates generated by spb_getWord() I insert words to buffer table spb_word4obj and then I update word_id in this table for already processed words from spb_word. So now – if row in spb_word4obj has word_id not null – then it is a duplicate, so I will not insert this word into spb_word. But – as OMG Ponies mentioned, I get error duplicate key value violates unique constraint which means my code that handles duplicates properly fails. I.e. my code fails because of internal Postgres error – proper code is somehow badly executed by Postgres and fails.

After inserting new words (duplicates recognized and marked not to be inserted) into spb_word my code finally inserts normalized words into spb_obj_word – replacing word body with reference to unduplicated entry in spb_word, but this again fails sometimes because of Postgres internal error. Again I think my code is proper but it fail because there is problem in Postgres SQL engine itself.

Adding or removing polish national letters from generated words by spb_getWord only assures me that it is strange Postgres error – all unique/duplicates considerations remain the same but allowing/disallowing some letters from words leads to errors or eliminates them. So it is not the case of error in my code – unproper handling of duplicates.

Second thing that assures me that it is not error in my code is unpredictable moment of errors that are detected. Every run of my code does the same sequence of words so it should always break in the same place with the same value causing the error. But it is not – it is quite random moment that it fails.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-13T13:23:55+00:00Added an answer on May 13, 2026 at 1:23 pm

    I’ve managed to simplify test code – now it uses one table. Simplified problem was posted on pgsql-bugs mailing list: http://archives.postgresql.org/pgsql-bugs/2010-01/msg00182.php. It is confirmed to occur on other machines (not only mine).

    Here is this simplified version of main test function (it needs one table spb_word, sequences spb_wordnum_seq and spb_word_seq and one function spb_getWord given in my question).

    create or replace function spb_runmeSimple2(cnt int) returns void as $$
    declare
      w varchar(410);
      wordId int;
    begin
      perform setval('spb_wordnum_seq', 1, false);
      truncate table spb_word cascade;
    
      for i in 1 .. cnt loop
    
        if i % 100 = 0 then raise notice 'i = %', i; end if;
    
        select spb_getWord() into w;
        select id into wordId from spb_word where word = w;
        if wordId is null then 
          insert into spb_word (word) values (w);
        end if;
    
      end loop;
    end;
    $$ language plpgsql;
    

    Now error occurs (but in unpredictable manner) while executing select spb_runmeSimple2(10000000).

    Here is workaround: change database collation from polish to standard ‘C’. With ‘C’ collation there is no error. But without polish collation polish words are sorted incorrectly (with respect to polish national characters), so problem should be fixed in Postgres itself.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Ask A Question

Stats

  • Questions 313k
  • Answers 313k
  • Best Answers 0
  • User 1
  • Popular
  • Answers
  • Editorial Team

    How to approach applying for a job at a company ...

    • 7 Answers
  • Editorial Team

    What is a programmer’s life like?

    • 5 Answers
  • Editorial Team

    How to handle personal stress caused by utterly incompetent and ...

    • 5 Answers
  • Editorial Team
    Editorial Team added an answer You could use a pictureBox as a button. May 13, 2026 at 10:53 pm
  • Editorial Team
    Editorial Team added an answer Your suggestion of appending the extension to the result of… May 13, 2026 at 10:53 pm
  • Editorial Team
    Editorial Team added an answer If you have any special characters, they should be escaped,… May 13, 2026 at 10:53 pm

Related Questions

I've got a string that has curly quotes in it. I'd like to replace
I'm trying to decode HTML entries from here NYTimes.com and I cannot figure out
I ran into a problem. Wrote the following code snippet: teksti = teksti.Trim() teksti
I have a French site that I want to parse, but am running into
I have text I am displaying in SIlverlight that is coming from a CMS

Trending Tags

analytics british company computer developers django employee employer english facebook french google interview javascript language life php programmer programs salary

Top Members

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.