What I currently do is, Parse texts from a URL, and then clean the

Question

Editorial Team

Asked: June 1, 2026

What I currently do is parse the text from a URL, clean it, split it into words on spaces, and save the words to a file.

What I find hard is saving only unique words when scraping multiple URLs:

Case: I scraped the words from site.com/page1 and saved the unique words to a file. When scraping site.com/page2, I need to check whether each word is already in the file and save it only if it is not present.

What I have in mind is to take $word[0], read each line of the file with fgets, compare, and save the word if it is not found. But repeating that for every word would mean thousands to hundreds of thousands of iterations.

I am not looking for code, just an idea of how to handle this efficiently and quickly.

1 Answer

Editorial Team · Answer 1 · 2026-06-01T19:54:03+00:00

I’m assuming that you have already stored the unique words from site1 in a file called site1.txt, one word per line, and that the words scraped from site2 are in an array called $site2. You now want to keep only the words in $site2 that are not already in site1.txt, so that they can be written line by line to site2.txt:

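// Read site1.txt into an array, one word per element, without the trailing newlines
// (without FILE_IGNORE_NEW_LINES every element ends in "\n" and the lookup below never matches)
$wordsInFile1 = file('site1.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// Flip so the words become array keys; isset() on a key is a hash lookup, not a scan
$wordsInFile1 = array_flip($wordsInFile1);

// Drop words that are repeated within $site2 itself
$site2 = array_unique($site2);

foreach ($site2 as $i => $word) {
    if (isset($wordsInFile1[$word])) {
        unset($site2[$i]); // already saved from site1
    }
}

// now $site2 contains only unique words that are not in site1.txt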
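
If you would rather keep a single word file that grows with every page you scrape, the same lookup idea can be wrapped in one function. This is only a rough sketch under a few assumptions: the file holds one word per line, the whole word list fits in memory, and the function name appendNewWords and the file name words.txt are made up for the example.

```php
<?php
/**
 * Append to $path only the words that are not stored there yet.
 * Returns the words that were actually added.
 * (appendNewWords is a hypothetical helper name used for illustration.)
 */
function appendNewWords(string $path, array $words): array
{
    // Load the words saved so far; a missing file counts as an empty list.
    $existing = is_file($path)
        ? file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : [];
    if ($existing === false) {
        throw new RuntimeException("Could not read $path");
    }

    // Words become array keys, so each membership check is a hash lookup
    // instead of a pass over the whole file.
    $seen = array_flip($existing);

    $added = [];
    foreach ($words as $word) {
        $word = trim((string) $word);
        if ($word === '' || isset($seen[$word])) {
            continue; // empty, already in the file, or repeated in this batch
        }
        $seen[$word] = true;
        $added[] = $word;
    }

    // Write all new words in one call rather than one write per word.
    if ($added !== []) {
        file_put_contents($path, implode("\n", $added) . "\n", FILE_APPEND | LOCK_EX);
    }

    return $added;
}
```

You would call it once per scraped page, for example appendNewWords('words.txt', $site2). The file is read once per page and each word costs one lookup, so the work grows with the number of words rather than with words times lines.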
