I have a CSV file with two columns: cat @ c a t dog

Question

Asked: May 28, 2026

I have a CSV file with two columns:

cat @ c a t
dog @ d o g
bat @ b a t

To simplify communication, I’ve used English letters for this example, but I’m dealing with CJK in UTF-8.

I would like to delete from the second column any character that appears on fewer than 20 lines in the first column (characters could be anything from numbers and letters to Chinese characters and punctuation, but not spaces).

For example, if “o” appears on 15 lines in the first column, all appearances of “o” are deleted from the second column. If “a” appears on 35 lines in the first column, no change is made.

The first column must not be changed.
I don’t need to count multiple appearances of a letter on a single line. For example, “robot” has two o’s, but only the fact that “robot” contains an “o” matters, so it counts as one line.

How can I delete the characters that appear on fewer than 20 lines?

1 Answer

Editorial Team · Answer 1 · 2026-05-28T00:31:51+00:00

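Here is a script using awk. Set the variable num to your frequency cutoff: any character that occurs on num or fewer lines of the first column is removed from the second column, so use num=19 for “fewer than 20 lines”. I’ve set it to 1 to show how it works against a small sample file. Note how f is still deleted even though it shows up three times on a single line. Passing the same input file twice is not a typo: the first pass counts the characters and the second pass rewrites the lines.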

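awk -v num=1 '
BEGIN { OFS=FS="@" }
# First pass: count the lines on which each character occurs in column 1.
FNR==NR{
    split($1,a,"")                      # empty separator: one element per character
    for (x in a)
        if(a[x] != " " && !c[a[x]]++)   # count each character once per line
            l[a[x]]++
    delete c
    next
}
# Second pass, first line only: collect the rare characters.
!flag++{
    for (x in l)
        if (l[x] <= num)
            cclass = cclass x
}
# Delete the rare characters from column 2, then print the line.
{
    if (cclass != "")                   # an empty "[]" is not a valid regex
        gsub("["cclass"]", "", $2)
}1' ./infile.csv ./infile.csv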

Sample Input

$ cat ./infile.csv
fff @ f f f
cat @ c a t
dog @ d o g
bat @ b a t

Output

$ ./delchar.sh
fff @
cat @  a t
dog @
bat @  a t
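
Since the question mentions punctuation, note that characters such as ], ^, - or \ have a special meaning inside a bracket expression, so gluing the rare characters into a single regex may misbehave on some inputs. Below is a minimal sketch of a variant that avoids the regex by rebuilding the second column one character at a time. The function name filter_rare and its argument order are hypothetical, chosen only for illustration. It assumes an awk whose length() and substr() work on characters rather than bytes (for CJK, for instance gawk in a UTF-8 locale). It also compares strictly against the cutoff, so min=20 means “fewer than 20 lines”, and it removes characters that never occur in the first column at all (0 lines), which is one reading of the question.

```shell
# Hypothetical helper: filter_rare MIN FILE
# Removes from column 2 every non-space character that occurs on fewer
# than MIN lines of column 1. Column 1 is left untouched.
filter_rare() {
    awk -v min="$1" '
    BEGIN { OFS = FS = "@" }
    # Pass 1: for each character, count the lines whose column 1 contains it.
    FNR == NR {
        n = length($1)
        for (i = 1; i <= n; i++) {
            ch = substr($1, i, 1)
            if (ch != " " && !(ch in seen)) {
                seen[ch] = 1        # count each character once per line
                lines[ch]++
            }
        }
        split("", seen)             # portable way to empty the array
        next
    }
    # Pass 2: keep spaces and characters that are frequent enough.
    {
        out = ""
        n = length($2)
        for (i = 1; i <= n; i++) {
            ch = substr($2, i, 1)
            if (ch == " " || lines[ch] >= min)
                out = out ch
        }
        $2 = out
        print
    }' "$2" "$2"
}
```

For the data in the question this would be called as filter_rare 20 ./infile.csv. The file is still read twice, for the same reason as in the script above.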
