I am trying to make an executable that will take in any number of

Question

Asked: June 18, 2026 (2026-06-18T11:05:19+00:00)

I am trying to make an executable that will take in any number of text files and give an output that is the distribution of words by number of occurrences. This is to be done in bash scripting, and what I have so far is:

#!/bin/bash
y=$(cat $* | wc -w)

cat $* | tr ' ' '//' |  tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]' | 
grep -v '[^a-z]'| sort | uniq -c | sort -rn | head -$y

I get an error trying to set y and I can’t figure out how to get head to print out every word otherwise.

Is there a better way to print it out?

1 Answer

Editorial Team · Answer 1 · 2026-06-18T11:05:22+00:00

Why run head at all? There’s no guarantee that there will be as many distinct words as there are total words in the files; indeed, it is practically guaranteed that there won’t be (since some words will be repeated). And if you want to see all the data, then show all the data; don’t filter the output from sort -rn.

The first tr only needs one slash, I think. Normally, you’d map blanks and punctuation to newlines (with the -s option to tr to squeeze adjacent newlines into one). The slashes produced by the first tr count as punctuation in the third tr, which deletes them again and so joins the words on each line together; it isn’t obvious what you’re up to there. I think I’d expect to see something like:

cat "$@" |
tr -cs '[:alpha:]' '\n' |      # Convert any non-alpha character to newline
tr '[:upper:]' '[:lower:]' |   # Case-convert to lower case
sort | uniq -c | sort -nr

Note the use of "$@" rather than $*; there’s no difference when the file names you specify don’t contain blanks (newlines, tabs, etc); when they do, the "$@" form is correct and $* is not, so you may as well always use "$@". It is correct far more often than $* is.
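To make the difference concrete, here is a small sketch. The helper names count_args and show_difference and the file names are made up for illustration; the script only counts how many arguments arrive when the same list is forwarded each way.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

# show_difference (also hypothetical) forwards its own arguments twice:
# once through unquoted $* and once through quoted "$@".
show_difference() {
    echo "unquoted \$*: $(count_args $*)"      # word-split on blanks
    echo "quoted \"\$@\": $(count_args "$@")"  # one word per original argument
}

# Two file names, one of which contains a space.
show_difference "my notes.txt" "todo.txt"
```

With the default IFS, the unquoted form reports 3 arguments and the quoted form reports 2, so cat $* would look for files named my and notes.txt instead of my notes.txt.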

For some C source code I had lying around, the output from the script was:

 246 n
 217 i
 153 int
 141 list
 124 if
 118 t
 103 char
  99 a
  97 size
  90 buffer
  89 context
  82 d
  81 void
  79 include
  79 h
  78 s
  65 for
  62 j
  55 ptr
  54 r
  54 const
  53 static
  53 sem
  51 pthread
  49 z
  49 oldneedle
  49 err
  47 to
  47 return
  46 mutex
  44 printf
  43 error
  43 c

Note that the word ‘h’ appears as often as the word ‘include’; there’s a reason for that! The word t appears a lot, but that’s because, for example, size_t is treated as two words by the filtering. Preserving underscores is possible; change the first tr to use '[:alpha:]_' (note the underscore). You eliminated digits, but you can keep those too if you want.
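A sketch of that variant, keeping both underscores and digits, might look like the following. Wrapping the pipeline in a function called word_freq is my own choice for illustration, not something from the question.

```shell
#!/bin/bash
# word_freq is a hypothetical wrapper around the pipeline above.
# It reads the named files, or standard input when no files are given.
word_freq() {
    cat "$@" |
    tr -cs '[:alnum:]_' '\n' |     # Keep letters, digits and underscores; everything else becomes a newline
    tr '[:upper:]' '[:lower:]' |   # Case-convert to lower case
    sort | uniq -c | sort -nr      # Count each word, most frequent first
}

# Example: size_t now stays a single word.
printf 'size_t n; size_t m2;\n' | word_freq
```

On that sample line, size_t is counted twice as one word rather than being split into size and t, and m2 keeps its digit.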
