The Archive Base

Editorial Team
Asked: May 26, 2026

I am scraping a site, and it will scrape two address boxes – each of which may have minor differences.

One of the addresses is like this:

ONE MICROSOFT WAY
REDMOND WA 98052-6399
425-882-8080

And the other is like this:

ONE MICROSOFT WAY
REDMOND WA 98052-6399

I save the entire string for both (there are HTML tags around them in the original, but I didn’t think they were necessary to illustrate my point) and then split on those HTML tags. This means each line (e.g. ONE MICROSOFT WAY) is processed as a separate value.

What I want to do is check for duplicates between the addresses. The problem is that the two addresses are separate values in the outer array, and each inner array holds one address split line by line.

So basically, is there a way to check for duplicate values?

Here is sample data:

<div class="mailer">
Mailing Address
<span class="mailerAddress">ONE MICROSOFT WAY</span>
<span class="mailerAddress">REDMOND WA 98052-6399</span>
<div class="mailer">
Business Address
<span class="mailerAddress">ONE MICROSOFT WAY</span>
<span class="mailerAddress">REDMOND WA 98052-6399</span>
<span class="mailerAddress">425-882-8080</span>
1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 1:32 am

    I use the following method to clean data.
    First, find the pattern. For example, if array A is array('Hello', 'World') and array B is array('Hello World'), you can bring A into the same shape as B by merging its elements: if (count($a) > 1) { $a = array($a[0] . ' ' . $a[1]); }
    In your case, I assume each line is wrapped in HTML tags and each address is stored in its own array. Is that right?

    It would help if you could share sample data.
    I’ll use made-up data for my sample code below.

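    <?php
    // Two scraped addresses; the second has an extra line.
    $sampleData = array(
      array('<p>ONE MICROSOFT WAY</p>', 'REDMOND'),
      array('<p>ONE MICROSOFT WAY</p>', 'REDMOND', 'Number')
    );

    $cleanData = array();
    foreach ($sampleData as $value) {
      // Build a key from the first two lines, with tags and whitespace removed.
      $newKey = trim(strip_tags($value[0])) . trim(strip_tags($value[1]));
      // Entries with the same key overwrite each other, so only one survives.
      $cleanData[$newKey] = $value;
    }
    ?>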
    

    The point is that entries with the same key overwrite each other, so you end up with unique keys, each holding one value.
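As a rough sketch of the same idea applied to the sample data in the question: the helper `addressKey()` and the choice of "first two lines" as the identity of an address are assumptions made for illustration, not something the original code defines.

```php
<?php
// Hypothetical data mirroring the question's sample: each inner array
// holds the lines of one scraped address block.
$addresses = array(
    array('<span class="mailerAddress">ONE MICROSOFT WAY</span>',
          '<span class="mailerAddress">REDMOND WA 98052-6399</span>'),
    array('<span class="mailerAddress">ONE MICROSOFT WAY</span>',
          '<span class="mailerAddress">REDMOND WA 98052-6399</span>',
          '<span class="mailerAddress">425-882-8080</span>'),
);

// Assumption: two blocks are "the same address" when their first two
// lines match after stripping tags and surrounding whitespace.
function addressKey(array $lines)
{
    $clean = array_map(function ($line) {
        return trim(strip_tags($line));
    }, array_slice($lines, 0, 2));
    return implode('|', $clean);
}

$unique = array();
foreach ($addresses as $lines) {
    // A repeated key overwrites the earlier entry, so the last block wins.
    $unique[addressKey($lines)] = $lines;
}

echo count($unique), "\n"; // 1
echo implode(', ', array_keys($unique)), "\n"; // ONE MICROSOFT WAY|REDMOND WA 98052-6399
```

Because the last write wins, the surviving entry here is the one that includes the phone number. If the key should cover more or fewer lines, only the `array_slice()` call needs to change.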

    Another example: removing entries with duplicate email addresses from data loaded from a CSV file or an array.

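    <?php
    // $data is assumed to be a list of rows, each with an 'email' key.
    $cleanData = array();
    foreach ($data as $value) {
      // Rows with the same email overwrite each other; the last one is kept.
      $cleanData[$value['email']] = $value;
    }
    ?>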
    

    It’s as simple as that. $cleanData now contains at most one entry per email address.
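A runnable variant of the email case, as a sketch only. The rows are made up, and two choices here differ from the snippet above and are assumptions: the key is lowercased and trimmed so that near-identical addresses collapse, and the first row seen is kept rather than the last.

```php
<?php
// Hypothetical rows, as if loaded from a CSV file.
$rows = array(
    array('name' => 'Ann',    'email' => 'ann@example.com'),
    array('name' => 'Bob',    'email' => 'bob@example.com'),
    array('name' => 'Ann B.', 'email' => ' Ann@Example.com '),
);

$byEmail = array();
foreach ($rows as $row) {
    // Assumption: addresses differing only in case or padding are duplicates.
    $key = strtolower(trim($row['email']));
    // Keep the first row seen for each address instead of the last.
    if (!isset($byEmail[$key])) {
        $byEmail[$key] = $row;
    }
}

// Drop the email keys to get back a plain list of rows.
$cleanRows = array_values($byEmail);
echo count($cleanRows), "\n"; // 2
```

Dropping the `isset()` check restores the overwrite behaviour, where the last row for each address is the one that survives.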
