I am currently visualizing word and phrase frequency across a large database of textual information (approximately 108 MB spread across 307 text files). My goal is a way to see quickly which files are the most relevant, in a visually attractive format (although this project will probably also demonstrate that a plain textual representation is always clearer).
Right now I have the following:
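SetDirectory["/MYMATHEMATICADIRECTORY/"];
filelist = FileNames[];  (* every file in the working directory *)
(* each list starts with a dummy 0 so that AppendTo has something to extend *)
viewerCount1 = {0};
viewerCount2 = {0};
word1 = "freedom";
word2 = "liberty";
Do[
  (* split the file into whitespace-separated words, then count exact matches *)
  searchDB = StringSplit[Import[filename]];
  AppendTo[viewerCount1, Count[searchDB, word1]];
  AppendTo[viewerCount2, Count[searchDB, word2]],
  {filename, filelist}]
(* drop the dummy leading 0 *)
list3 = Take[viewerCount1, {2, -1}]
list4 = Take[viewerCount2, {2, -1}]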
FileNames[] generates a list such as: {"001ABbenevolat.txt-cleaned.txt", "002abnature.txt-cleaned.txt", "003aboriginaldocs.txt-cleaned.txt", "004ABpresse.txt-cleaned.txt", "005acadian.txt-cleaned.txt", "006acadiedelile.txt-cleaned.txt", "007acfa.txt-cleaned.txt"} (except with 307 entries, all numbered).
list3 generates a list such as: {0, 0, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 100, 2, 0, 0, 0, 10, 1, 7, 0, 0, 0, 0, 23, 3, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 9, 0, 1, 0, 1, 0, 5, 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 4, 0, 0, 0, 1, 11, 0, 2, 0, 0, 2, 7, 1, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 13,…} and so on.
The command:
BarChart3D[{list3, list4}, BarSpacing -> {0.5, 0}, ChartLayout -> "Grid"]
generates something close to what I want (imagine the bars as file folders sticking up). However, I want to add meaningful tooltips. By default, the tooltip shows only the frequency. Is there a quick way to show the filename alongside the frequency? For example, a tooltip that reads "007acfa.txt-cleaned.txt - 32" when the word occurs 32 times in file 7.
As an example, suppose your data is something like
Then you could do something like
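A minimal sketch of one way to do this, assuming the counts are wrapped in Tooltip before they are passed to the chart. The names files, counts1, counts2 and withTips are made up for illustration, and the three-file sample stands in for filelist, list3 and list4 from the question:

```mathematica
(* Hypothetical sample data standing in for filelist, list3 and list4 *)
files = {"001ABbenevolat.txt-cleaned.txt", "002abnature.txt-cleaned.txt",
   "003aboriginaldocs.txt-cleaned.txt"};
counts1 = {0, 10, 2};
counts2 = {3, 0, 7};

(* Wrap each count in a Tooltip whose label reads "filename - count" *)
withTips[names_List, counts_List] :=
  MapThread[Tooltip[#2, #1 <> " - " <> ToString[#2]] &, {names, counts}];

(* Same chart options as in the question; only the data are wrapped *)
BarChart3D[{withTips[files, counts1], withTips[files, counts2]},
 BarSpacing -> {0.5, 0}, ChartLayout -> "Grid"]
```

With the real data this would become withTips[filelist, list3] and withTips[filelist, list4]. The bar heights are unchanged, since the chart reads the first argument of each Tooltip as the value.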
Edit
Another way is to use LabelingFunction:
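A minimal sketch of this idea, assuming the labeling function receives the bar's value as its first argument and the bar's position in the data as its second (for nested data, a list whose last entry is the position within the dataset). The names files, counts1, counts2 and tipLabel are made up for illustration:

```mathematica
(* Hypothetical sample data standing in for filelist, list3 and list4 *)
files = {"001ABbenevolat.txt-cleaned.txt", "002abnature.txt-cleaned.txt",
   "003aboriginaldocs.txt-cleaned.txt"};
counts1 = {0, 10, 2};
counts2 = {3, 0, 7};

(* Build a "filename - count" label and place it in the bar's tooltip.
   Assumption: the last entry of index is the file's position in files. *)
tipLabel[value_, index_, ___] :=
  Placed[files[[Last[Flatten[{index}]]]] <> " - " <> ToString[value], Tooltip];

BarChart3D[{counts1, counts2},
 BarSpacing -> {0.5, 0}, ChartLayout -> "Grid",
 LabelingFunction -> tipLabel]
```

Here the data stay as plain numbers and the file name is looked up from the bar's index, so nothing needs to be wrapped beforehand. If the tooltip shows the wrong file, it is worth checking which entry of the index corresponds to the file position in your version.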