Ok, so I need to create a command that lists the 100 most frequent words in any given file, in a block of text.
What I have at the moment:
$ alias words='tr " " "\012" <hamlet.txt | sort -n | uniq -c | sort -r | head -n 10'
outputs
$ words
14 the
14 of
8 to
7 and
5 To
5 The
5 And
5 a
4 we
4 that
I need it to output in the following format:
the of to and To The And a we that
((On that note, how would I tell it to print the output in all caps?))
And I need to change it so that I can pipe ‘words’ to any file, so instead of having the file specified within the pipe, the initial input would name the file & the pipe would do the rest.
Okay, taking your points one by one, though not necessarily in order.
You can change words to use standard input just by removing the <hamlet.txt bit, since tr will take its input from standard input by default. Then, if you want to process a specific file, either pipe the file into words (for example with cat) or redirect the file into it with <.
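As a rough sketch of those two ways of calling it, assuming a shell function in place of the alias (functions behave the same in scripts and interactive shells) and a throwaway sample file that is purely hypothetical:

```shell
# Same pipeline as the question's alias, minus the hard-coded <hamlet.txt.
# (sort -rn is my substitution, so the counts are compared as numbers.)
words() {
    tr " " "\012" | sort | uniq -c | sort -rn | head -n 10
}

# Hypothetical sample file, only here to make the example self-contained.
sample=$(mktemp)
printf 'to be or not to be to\n' > "$sample"

cat "$sample" | words    # pipe the file in
words < "$sample"        # or redirect it into standard input
```

Both calls should print the same counts, since in each case the text reaches tr on standard input.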
You can remove the effects of capital letters by putting a tr command that maps upper-case letters to lower-case at the front of the pipeline, so your input is lower-cased before anything else happens.
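Something along these lines should do it. This is only a sketch: the function name count_words is made up for illustration, and I'm using the POSIX character classes [:upper:] and [:lower:] rather than whatever ranges you may prefer.

```shell
# Fold everything to lower case first, so "The", "the" and "THE"
# are counted as one word.
count_words() {
    tr '[:upper:]' '[:lower:]' | tr ' ' '\012' | sort | uniq -c | sort -rn
}

printf 'The the THE And and\n' | count_words

# For the all-caps output asked about in the question, swap the two
# classes and apply the translation to the final result instead.
printf 'The the THE And and\n' | count_words | tr '[:lower:]' '[:upper:]'
```

Lower-casing at the start merges the counts; upper-casing at the end only changes how the result is displayed.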
Lastly, take that entire pipeline (with the suggested modifications above) and pass its output through one more formatting step. That step prints the second field of each line (the word) followed by a space, then prints an empty string with a terminating newline at the end.
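One way to write that formatting step is with awk. Treat this as an assumption about the tool rather than the only option; the function name join_words is hypothetical too.

```shell
# Take "count word" lines and print just the words on a single line.
join_words() {
    awk '{printf "%s ", $2} END {print ""}'
}

# Hand-written input in the shape that uniq -c produces.
printf '      4 four\n      3 three\n      2 two\n' | join_words
```

Note that every word, including the last, is followed by a space, so the line ends with one trailing space before the newline.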
For example, a script words.sh holding that pipeline (on one line, or split across lines for readability) will give you what you need.
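Putting the pieces together, a minimal version of such a script might look like the following. The head -n 3 is just to keep the demo short (use 100 for the real thing), and sort -rn and the character classes are my own choices rather than anything the pipeline strictly requires.

```shell
# Write the script; the quoted 'EOF' stops the shell expanding $2 here.
cat > words.sh <<'EOF'
#!/bin/sh
# Read text on standard input, print the most frequent words on one line.
tr '[:upper:]' '[:lower:]' | tr ' ' '\012' | sort | uniq -c | sort -rn |
    head -n 3 | awk '{printf "%s ", $2} END {print ""}'
EOF

# Try it out on some text where the counts are easy to check by eye.
echo One Two two Three three three Four four four four | sh words.sh
```

That should print four three two, most frequent first. After chmod +x words.sh you can also run it as ./words.sh <hamlet.txt.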
You can achieve the same end with an alias holding the same pipeline (again, on one line) but, when things get this complex, I prefer a script, if only to avoid interminable escape characters 🙂
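For comparison, here is roughly what the alias form could look like. This is a sketch under the same assumptions as before (awk for the formatting, sort -rn for the numeric sort); the shopt line only matters if you paste it into a bash script, since interactive shells expand aliases already.

```shell
# bash scripts ignore aliases unless asked; harmless elsewhere.
shopt -s expand_aliases 2>/dev/null || true

# The whole alias body is single-quoted, so the awk program has to be
# double-quoted, with its own quotes and the $ escaped.
alias words='tr "[:upper:]" "[:lower:]" | tr " " "\012" | sort | uniq -c | sort -rn | head -n 10 | awk "{printf \"%s \", \$2} END {print \"\"}"'
```

With that defined in your shell, words <hamlet.txt works the same way as the script does, and the backslashes show why the script is easier to live with.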