For legacy reasons, we have a VARCHAR2 column in our Oracle 10 database—where the

Question

0

Asked: June 12, 20262026-06-12T15:34:41+00:00 2026-06-12T15:34:41+00:00

For legacy reasons, we have a VARCHAR2 column in our Oracle 10 database—where the

0

For legacy reasons, we have a VARCHAR2 column in our Oracle 10 database—where the character encoding is set to AL32UTF8—that contain some non-UTF-8 values. The values are always in one of these character sets:

US-ASCII
UTF-8
CP1252
Latin-1

I’ve written a Perl function to fix broken values outside the database. For a value from this database column, it loops through this list of encodings and tries to convert the value to UTF-8. If the conversion fails, it tries the next encoding. The first one to convert without error is the value we keep. Now, I would like to replicate this functionality inside the database so that anyone can use it.

However, all I can find for this is the CONVERT function, which never fails, but inserts a replacement character for characters it does not recognize. So there is no way, as far as I can tell, to know when the conversion failed.

Therefor, I have two questions:

Is there some existing interface that tries to convert a string into one of list of encodings, returning the first that succeeds?
And if not, is there some other interface that indicates failure if it’s not able to convert a string to an encoding? If so, then I could write the previous function.

UPDATE:

For reference, I have written this PostgreSQL function in PL/pgSQL that does exactly what I need:

CREATE OR REPLACE FUNCTION encoding_utf8(
    bytea
) RETURNS TEXT LANGUAGE PLPGSQL STRICT IMMUTABLE AS $$
DECLARE
    encoding TEXT;
BEGIN
    FOREACH encoding IN ARRAY ARRAY[
        'UTF8',
        'WIN1252',
        'LATIN1'
    ] LOOP
        BEGIN
            RETURN convert_from($1, encoding);
        EXCEPTION WHEN character_not_in_repertoire OR untranslatable_character THEN
            CONTINUE;
        END;
    END LOOP;
END;
$$;

I’d dearly love to know how to do the equivalent in Oracle.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-12T15:34:42+00:00

Thanks to the key information about the illegal characters in UTF-8 from @collapsar, as well as some digging by a co-worker, I’ve come up with this:

CREATE OR REPLACE FUNCTION reencode(string IN VARCHAR2) RETURN VARCHAR2
AS
    encoded VARCHAR2(32767);
    type  array_t IS varray(3) OF VARCHAR2(15);
    array array_t := array_t('AL32UTF8', 'WE8MSWIN1252', 'WE8ISO8859P1');
BEGIN
    FOR I IN 1..array.count LOOP
        encoded := CASE array(i)
            WHEN 'AL32UTF8' THEN string
            ELSE CONVERT(string, 'AL32UTF8', array(i))
        END;
        IF instr(
            rawtohex(
                utl_raw.cast_to_raw(
                    utl_i18n.raw_to_char(utl_raw.cast_to_raw(encoded), 'utf8')
                )
            ),
            'EFBFBD'
        ) = 0 THEN
            RETURN encoded;
        END IF;
    END LOOP;
    RAISE VALUE_ERROR;
END;

Curiously, it never gets to WE8ISO8859P1: WE8MSWIN1252 converts every single one of the list of 800 or so bad values I have without complaint. The same is not true for my Perl or PostgreSQL implementations, where CP1252 fails for some values but ISO-8859-1 succeeds. Still, the values from Oracle seem adequate, and appear to be valid Unicode (tested by loading them into PostgreSQL), so I can’t complain. This will be good enough to sanitize my data, I think.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

For legacy reasons, we have a VARCHAR2 column in our Oracle 10 database—where the

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply