I’m trying to search across a large array of textual files in Mathematica 8

Question

Asked: May 25, 2026

I’m trying to search across a large array of textual files in Mathematica 8 (12k+). So far, I’ve been able to plot the sheer number of times that a word appears (e.g., the word “love” appears 5,000 times across those 12k files). However, I’m having difficulty determining the number of files in which “love” appears once, which might be only 1,000 files, with it repeating several times in others.

I’m finding the documentation regarding FindList, streams, RecordSeparators, etc. a bit murky. Is there a way to set it up so that it finds one incidence of a term in a file and then moves on to the next file?

Example of filelist:

{"89001.txt", "89002.txt", "89003.txt", "89004.txt", "89005.txt", "89006.txt", "89007.txt", "89008.txt", "89009.txt", "89010.txt", "89011.txt", "89012.txt", "89013.txt", "89014.txt", "89015.txt", "89016.txt", "89017.txt", "89018.txt", "89019.txt", "89020.txt", "89021.txt", "89022.txt", "89023.txt", "89024.txt"}

The following returns all of the lines containing “love” across every file. Is there a way to return only the first incidence of “love” in each file before moving on to the next one?

FindList[filelist, "love"]
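FindList also accepts an optional third argument n, in which case it returns at most the first n matching lines. Mapping it over the files with n = 1 therefore stops at the first hit in each file. A minimal sketch follows; the three demo files are hypothetical stand-ins, and with real data you would map over filelist instead.

```mathematica
(* Hypothetical demo files standing in for filelist *)
demo = FileNameJoin[{$TemporaryDirectory, #}] & /@
   {"lovedemo1.txt", "lovedemo2.txt", "lovedemo3.txt"};
MapThread[Export[#1, #2, "Text"] &,
  {demo, {"no match here", "I love it", "love one\nlove two"}}];

(* FindList[file, text, 1] returns at most the first line containing text,
   so each file yields either {} or a one-element list *)
firstHits = FindList[#, "love", 1] & /@ demo

(* Files in which "love" appears at least once, and how many there are *)
filesWithLove = Select[demo, FindList[#, "love", 1] =!= {} &];
Length[filesWithLove]
```

Like the original call, this matches the substring "love" (so a line containing only "lovely" also counts) and is case-sensitive. FindList's IgnoreCase and WordSearch options can tighten that if needed.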

Thanks so much. This is my first post and I’m largely learning Mathematica through peer/supervisory help, online tutorials, and the documentation.

1 Answer

Editorial Team · Answer 1 · 2026-05-25T17:38:04+00:00

In addition to Daniel’s answer, you also seem to be asking for a list of files where the word occurs only once. To do that, I’d continue to run FindList across all the files:

res = FindList[filelist, "love"]

Then reduce the results to those with a single matching line:

lines = Select[ res, Length[#]==1& ]

But this doesn’t eliminate cases where there is more than one occurrence on a single line. To handle those, you could use StringCount and accept only instances where it returns 1:

Select[ lines, StringCount[ First[#], RegularExpression[ "\\blove\\b" ] ] == 1& ]  (* each element of lines is {line}, so First[#] is the string itself *)

The RegularExpression specifies that “love” must be a distinct word using the word boundary marker (\\b), so that words like “lovely” won’t be included.
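A quick check of the word-boundary pattern on a literal string. Inside a Mathematica string, "\\b" is the regex token \b. StringCount is case-sensitive by default; the IgnoreCase option in the last call also counts "Love". The symbolic WordBoundary pattern is shown as an equivalent alternative to the regular expression.

```mathematica
text = "love lovely glove love";

(* Only the two standalone occurrences match; "lovely" and "glove" do not *)
exact = StringCount[text, RegularExpression["\\blove\\b"]]

(* The same count written with the symbolic WordBoundary pattern *)
symbolic = StringCount[text, WordBoundary ~~ "love" ~~ WordBoundary]

(* Case-insensitive variant: counts both "Love" and "love" *)
noCase = StringCount["Love and love", RegularExpression["\\blove\\b"],
  IgnoreCase -> True]
```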

Edit: When passed a list of files, FindList returns a single flattened list, so you can’t determine which item came from which file. For instance, if you have 3 files that contain the word “love” 0, 1, and 2 times, respectively, you’d get a list that looks like

{"love", "love", "love"}

which is clearly not useful. To overcome this, you’ll have to process each file individually, which is best done via Map (/@):

res = FindList[#, "love"]& /@ filelist

and the rest of the above code works as expected.
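A self-contained check of the point made in the Edit, using three hypothetical demo files that contain "love" 0, 1, and 2 times. The flat result holds three lines with no record of their source, while the mapped result keeps one sublist per file, in the same order as the file list.

```mathematica
(* Hypothetical demo files containing "love" 0, 1, and 2 times *)
files = FileNameJoin[{$TemporaryDirectory, #}] & /@
   {"lovedemo_zero.txt", "lovedemo_one.txt", "lovedemo_two.txt"};
MapThread[Export[#1, #2, "Text"] &,
  {files, {"nothing here", "I love it", "love\nmore love"}}];

(* One call over the whole list: three lines, but the file of origin is lost *)
flat = FindList[files, "love"]

(* Mapping over the files: one sublist per file, in the same order as files *)
perFile = FindList[#, "love"] & /@ files

(* Only the middle file has exactly one matching line *)
Select[perFile, Length[#] == 1 &]
```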

But if you want to associate the results with file names, you have to change it a little:

res = {#, FindList[#, "love"]}& /@ filelist
lines = Select[res, 
         Length[ #[[2]] ] ==1 &&  (* <-- Note the use of [[2]] *)
         StringCount[ #[[2, 1]], RegularExpression[ "\\blove\\b" ] ] == 1&  (* [[2, 1]] is the single matching line *)
        ]

which returns a list of the form

{ {filename, {"string with love in it"}},
  {filename, {"string with love in it"}}, ... }

To extract the file names, you simply type lines[[All, 1]].

Note that, in order to Select on the properties you wanted, I used Part ([[ ]]) to pick out the second element of each pair; extracting the file names works the same way, using the first element.
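An alternative sketch, which is a different technique from the one above: read each whole file with Import, count whole-word matches with StringCount, and keep the files whose count is exactly 1. Because FindList matches the substring, a file with one "love" and one "lovely" on separate lines returns two lines and is dropped by the Length test; counting whole words over the entire file avoids that. The demo files are hypothetical, and this approach reads each file fully into memory, one at a time.

```mathematica
(* Hypothetical demo files; with real data, map over filelist instead *)
files = FileNameJoin[{$TemporaryDirectory, #}] & /@
   {"lovedemo_a.txt", "lovedemo_b.txt", "lovedemo_c.txt"};
MapThread[Export[#1, #2, "Text"] &,
  {files, {"lovely day", "I love it\nso lovely", "love\nmore love"}}];

(* Whole-word pattern, as in the answer *)
wordLove = RegularExpression["\\blove\\b"];

(* Number of whole-word occurrences of "love" in each file *)
counts = StringCount[Import[#, "Text"], wordLove] & /@ files

(* Files where "love" occurs exactly once, paired by position with counts *)
onceFiles = Pick[files, counts, 1]
```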
