I have a csv file which has two columns, a numeric ID ( IDVAR

Question


Asked: May 30, 2026 (2026-05-30T01:56:23+00:00)


I have a csv file which has two columns, a numeric ID (IDVAR) and an associated value (VAL). The second variable contains non-alphabetic garbage characters which need cleaning up. The structure looks like this:

IDVAR   VAL
001     abc - 1
002     zfas $^6
003     asdf_78
004     hg :65

I want to throw out the "-", "_", "1", "$", "^" etc. from the 2nd variable only, i.e. remove a specified set of characters from VAL, without touching IDVAR.

Post-Solution Edit: Many thanks to SiegeX for such an elegant solution. Please note that my file is indeed comma-separated, so I just have to add an “-F,” option to his awk command.


1 Answer

Editorial Team · Answer 1 · 2026-05-30T01:56:24+00:00

This will work for you:

awk 'NR>1{t=$1;gsub(/[^[:alpha:]]/,"");$0=t "\t" $0}1' file

Example

$ awk 'NR>1{t=$1;gsub(/[^[:alpha:]]/,"");$0=t "\t" $0}1' file
IDVAR   VAL
001     abc
002     zfas
003     asdf
004     hg

Explanation

NR>1 : Apply the action only to rows after the header (IDVAR VAL); the header itself is still printed unchanged by the trailing ‘1’
t=$1 : Save the first field (IDVAR) into temporary variable ‘t’
gsub(/[^[:alpha:]]/,"") : Replace every non-alphabetic character with the empty string. With no third argument, gsub() operates on the entire line ($0), so it also strips the digits of IDVAR and the whitespace between the fields, which is why we saved the ID in ‘t’ above
$0=t "\t" $0 : Prepend ‘t’ to the cleaned line, separated by a tab
1 : Awk shortcut for print $0: ‘1’ is a pattern that is always true, and the default action for a pattern with no action block is to print the current line.
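
Since the file turned out to be comma-separated, a variation that may be worth trying is to pass the field as a third argument to gsub(), so only VAL is edited and IDVAR never needs to be saved. This is a minimal sketch, not the original answer's command: the sample data is copied from the question with commas assumed as the delimiter, the temporary file and the variable names (tmp, cleaned) are made up for illustration, and the output stays comma-separated via OFS rather than switching to tabs.

```shell
# Hypothetical sample file: the rows from the question, assumed comma-separated.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
IDVAR,VAL
001,abc - 1
002,zfas $^6
003,asdf_78
004,hg :65
EOF

# -F, splits each line on commas; OFS="," keeps the commas when awk rebuilds
# the line after a field is modified.
# gsub(regex, "", $2) edits only the second field, so $1 is left untouched.
cleaned=$(awk -F, 'BEGIN{OFS=","} NR>1{gsub(/[^[:alpha:]]/,"",$2)}1' "$tmp")

printf '%s\n' "$cleaned"
rm -f "$tmp"
```

To remove only a chosen set of characters instead of everything non-alphabetic, the bracket expression could list them explicitly, for example /[-_$^:0-9 ]/ in place of /[^[:alpha:]]/.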
