I’m quite inexperienced with RegEx – just an occasional straightforward RegEx for a programming


Asked: May 22, 2026


I’m quite inexperienced with RegEx: so far I have only written the occasional straightforward RegEx for a programming task, worked out by trial and error. Now I have a serious RegEx challenge:

I have about 970 text files containing Sybase Transact-SQL snippets, and I need to find every table name in those files and prefix it with ‘ #’. So my options are either to spend a week editing the files by hand or to write a script or application using RegEx (Python 3 or Delphi-PCRE) that will perform this task.

The rules are as follows:

Table names are ALWAYS upper case, so I’m only looking for upper-case words;

Column names, SQL expressions and variables are ALWAYS lower case;

SQL keywords, table aliases and column values CAN BE upper case, but must NOT be prefixed with ‘ #’;

Table aliases (must not be prefixed) are always separated by whitespace from the preceding word, which will be a table name;

Column values (must not be prefixed) will either be numerical values or characters enclosed in quotes.

Here is some sample text requiring application of all the above mentioned rules:

update SYBASE_TABLE
set ok = convert(char(10),MB.limit)
from MOVE_BOOKS MB, PEOPLEPLACES PPL
where MB.move_num = PPL.move_num
AND PPL.mot_ind = 'B'
AND PPL.trade_type_ind = 'P'

So far I’ve only gotten this far (not too far…):

(?-i)[[:upper:]]

Any help would be most appreciated.
TIA,

MN


1 Answer

Editorial Team · Answer 1 · 2026-05-22T18:56:42+00:00

This is not doable with a simple regex replacement. You will not be able to distinguish upper-case words that are table names from upper-case words that sit inside string literals or comments:

update TABLE set x='NOT_A_TABLE' where y='NOT TABLES EITHER' 
-- AND NO TABLES HERE AS WELL
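
To make the problem concrete, here is a small sketch of what a naive replacement does to the SQL above. The pattern and the `#` prefix are just my guess at what a first attempt would look like, not something from your files:

```python
import re

sql = (
    "update TABLE set x='NOT_A_TABLE' where y='NOT TABLES EITHER'\n"
    "-- AND NO TABLES HERE AS WELL"
)

# Naive attempt: prefix every upper-case word, wherever it appears.
naive = re.sub(r"\b[A-Z][A-Z_]*\b", lambda m: "#" + m.group(), sql)
print(naive)
```

Every upper-case word gets the prefix, including the ones inside the two string literals and the comment, which is exactly what you do not want.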

EDIT

If you think that determining whether a word is inside a string literal is easy, consider SQL like this:

-- a quote: '
update TABLE set x=42 where y=666
-- another quote: '

or

update TABLE set x='not '' A '''' table' where y=666
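
As a rough illustration (assuming the "obvious" trick of splitting on quotes and treating every second piece as string content), the first snippet already fools that approach:

```python
sql = (
    "-- a quote: '\n"
    "update TABLE set x=42 where y=666\n"
    "-- another quote: '"
)

# Naive idea: after splitting on single quotes, the pieces at odd
# positions are assumed to be the contents of string literals.
pieces = sql.split("'")
assumed_string_contents = pieces[1::2]
print(assumed_string_contents)
```

The whole `update` statement, table name included, ends up classified as string content, because the quotes inside the two comments were counted as string delimiters.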

EDIT II

Okay, I (obsessively) hammered on the fact that a simple regex replacement is not doable, but I haven’t offered a (possible) solution yet. What you could do is create some sort of “hybrid lexer” based on a couple of different regexes: scan through the input and, at each position, try to match a comment, a string literal, a keyword, or a capitalized word. If none of these four patterns matches, just consume a single character and repeat the process.

A little demo in Python:

#!/usr/bin/env python
import re 

input = """
UPDATE SYBASE_TABLE
SET ok = convert(char(10),MB.limit) -- ignore me!
from MOVE_BOOKS MB, PEOPLEPLACES PPL
where MB.move_num = PPL.move_num
-- comment '
AND PPL.mot_ind = 'B '' X'
-- another comment '
AND PPL.trade_type_ind = 'P -- not a comment'
"""

regex = r"""(?xs)          # x = enable inline comments, s = enable DOT-ALL
  (--[^\r\n]*)             # [1] comments
  |                        # OR
  ('(?:''|[^\r\n'])*')     # [2] string literal
  |                        # OR
  (\b(?:AND|UPDATE|SET)\b) # [3] keywords
  |                        # OR
  ([A-Z][A-Z_]*)           # [4] capitalized word
  |                        # OR
  .                        # [5] fall through: matches any char
"""

output = ''

for m in re.finditer(regex, input): 
    # append a `#` if group(4) matched
    if m.group(4): output += '#'
    # append the matched text (any of the groups!)
    output +=  m.group()

# print the adjusted SQL
print(output)

which produces:

UPDATE #SYBASE_TABLE
SET ok = convert(char(10),#MB.limit) -- ignore me!
from #MOVE_BOOKS #MB, #PEOPLEPLACES #PPL
where #MB.move_num = #PPL.move_num
-- comment '
AND #PPL.mot_ind = 'B '' X'
-- another comment '
AND #PPL.trade_type_ind = 'P -- not a comment'

This may not be the exact output you want, but I’m hoping the script is simple enough for you to adjust to your needs.
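
One possible adjustment, as a sketch only: the version below also tries to honour your alias rules. It rests on assumptions of mine that you should check against your files: the keyword list is a guess and certainly incomplete, an upper-case word directly followed by a dot is taken to be an alias qualifier (as in `MB.limit`), an upper-case word separated only by whitespace from a table name is taken to be its alias, and table names may contain digits. The names `KEYWORDS`, `WORD`, `TOKEN` and `prefix_tables` are made up for this example.

```python
import re

# Assumed keyword list: extend it to cover the SQL in your files.
KEYWORDS = (r"AND|OR|NOT|UPDATE|SET|FROM|WHERE|SELECT|INSERT|INTO|"
            r"VALUES|DELETE|JOIN|ON|IN|AS|NULL")
# Assumed shape of an upper-case identifier (digits allowed after the first letter).
WORD = r"[A-Z][A-Z0-9_]*"

TOKEN = re.compile(rf"""
    (?P<comment>   --[^\r\n]* )                  # line comment
  | (?P<string>    '(?:''|[^\r\n'])*' )          # string literal, '' is an escaped quote
  | (?P<keyword>   \b(?:{KEYWORDS})\b )          # SQL keyword, left alone
  | (?P<qualifier> \b{WORD}(?=\.) )              # alias in front of a dot, left alone
  | (?P<table>     \b{WORD}\b )                  # table name, gets the prefix
    (?P<alias>     \s+(?!(?:{KEYWORDS})\b){WORD}\b )?   # optional alias right after it
  | (?P<other>     . )                           # fall through: any single character
""", re.VERBOSE | re.DOTALL)


def prefix_tables(sql, prefix="#"):
    """Return sql with prefix put in front of every token classified as a table name."""
    out = []
    for m in TOKEN.finditer(sql):
        if m.group("table"):
            # Prefix the table name; copy a directly following alias unchanged.
            out.append(prefix + m.group("table") + (m.group("alias") or ""))
        else:
            out.append(m.group())
    return "".join(out)


sample = """update SYBASE_TABLE
set ok = convert(char(10),MB.limit)
from MOVE_BOOKS MB, PEOPLEPLACES PPL
where MB.move_num = PPL.move_num
AND PPL.mot_ind = 'B'
AND PPL.trade_type_ind = 'P'
"""
print(prefix_tables(sample))
```

On the sample from the question this prefixes only SYBASE_TABLE, MOVE_BOOKS and PEOPLEPLACES. Any upper-case word that is neither in the keyword list nor covered by the two alias rules will still be treated as a table name, so run it on a handful of files and check the result before trusting it on all 970.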

Good luck.
