I have a collection of documents and I’m trying to pull the dates out

Question

Asked: May 26, 2026

I have a collection of documents and I’m trying to pull the dates out of them. They are mostly plain text and HTML, but the date formats they use vary greatly (though they are all English dates). How can I find and parse dates like these in a long string of text?

updated 2011-03-21T00:43:14
Sunday, March 20, 2011
Wednesday, March 16, 2011 | 11:25 AM
March 20, 2011 @ 12:21 pm
May 5, 2011
Published March 19, 2011
Some text here (March 19, 2011)
10/28/2011 21:16
<a href="#">Author Name</a> on Mar 17th 2011 ...
Location, ABBR., Jan. 8, 2008
01/07/2008 (6:00 pm)
By Author Name and Company 03/19/2011 09:59
Posted by Author Name on March 16, 2011 at 03:20 PM EDT

1 Answer

Editorial Team · Answer 1 · 2026-05-26T17:41:19+00:00

Have a look at PHP's strtotime function, which parses most English date strings into a Unix timestamp and returns false when it cannot.

// Output: March 20th, 2011 12:00:00 AM
echo date( 'F jS, Y h:i:s A', strtotime( "Sunday, March 20, 2011"));

Edit: Here is a more complete example showing how to parse a bunch of the dates provided.

<?php
$dates = array( '03/19/2011 09:59', 'Wednesday, March 16, 2011 | 11:25 AM', 'Sunday, March 20, 2011', 'March 20, 2011 @ 12:21 pm', 'May 5, 2011');
foreach( $dates as $date)
{
    echo $date . ' ---- ' . date( 'F jS, Y h:i:s A', strtotime( str_replace( array( '@', '|'), '', $date))) . "<br />\n";
}

Of course, some dates will not parse as-is because they do not match any of the supported date formats. For those, you’ll need to do some additional filtering or parsing, either to extract the date or to reshape it into a string that strtotime accepts.

Edit: Since there’s interest in further processing of the input string, here is an example of how you can parse the text and get the dates out without using a regex. Notice that some of the dates just can’t be extracted this way. For those you will need either more string processing or a regex.

As a side note, I would investigate using a regex if the provided string is only one of many variants of lines that contain dates. However, if the provided string shows the only formats in which the dates will appear, string processing should be enough.

$str = 'updated 2011-03-21T00:43:14
Sunday, March 20, 2011
Wednesday, March 16, 2011 | 11:25 AM
March 20, 2011 @ 12:21 pm
May 5, 2011
Published March 19, 2011
Some text here (March 19, 2011)
10/28/2011 21:16
<a href="#">Author Name</a> on Mar 17th 2011 ...
Location, ABBR., Jan. 8, 2008
01/07/2008 (6:00 pm)
By Author Name and Company 03/19/2011 09:59
Posted by Author Name on March 16, 2011 at 03:20 PM EDT';

foreach( explode( "\n", $str) as $line)
{
    $line = str_replace( array( '@', '|', '(', ')'), '', trim( $line));
    $line = strip_tags( $line);
    if( ($time = strtotime( $line)) === false)
    {
        echo "Could not parse line - '" . $line . "'\n"; // Need additional processing / regex here
        continue;
    }
    echo "Converted '" . $line . "' to '" . date( 'F jS, Y h:i:s A', $time) . "'\n";
}
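
For the lines that still fail, the regex route mentioned above could look roughly like the sketch below. It is only an illustration: the helper name extract_dates and the three patterns are assumptions made for this example, they cover just the ISO, slash-separated and month-name shapes seen in the sample text, and they drop any time that follows a month-name date. Each match is handed to strtotime, so anything strtotime rejects is skipped.

```php
<?php
// Hypothetical helper (not part of the original answer): find date-like
// substrings with a regex, then let strtotime() do the actual parsing.
function extract_dates($text)
{
    // Month name, abbreviated or full, optionally followed by a period.
    $month = '(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?';

    $pattern = '~'
        . '\b\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2})?'            // 2011-03-21T00:43:14
        . '|\b\d{1,2}/\d{1,2}/\d{4}(?:\s+\d{1,2}:\d{2})?'          // 10/28/2011 21:16
        . '|\b' . $month . '\s+\d{1,2}(?:st|nd|rd|th)?,?\s+\d{4}'  // Mar 17th 2011
        . '~i';

    preg_match_all($pattern, $text, $matches);

    $found = array();
    foreach ($matches[0] as $candidate) {
        $time = strtotime($candidate);
        if ($time !== false) {
            // Key: the matched text, value: the normalized date.
            $found[$candidate] = date('Y-m-d', $time);
        }
    }
    return $found;
}

print_r(extract_dates("Location, ABBR., Jan. 8, 2008\n01/07/2008 (6:00 pm)"));
```

Because the regex only isolates candidates, the list of patterns can grow as new formats turn up without touching the parsing step.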

Final Edit:

Finally, here is an example of how to do some text processing to get more of the dates to parse.

foreach( explode( "\n", $str) as $line)
{
    $line = str_replace( array( '@', '|', '(', ')', 'Published', '...'), '', trim( $line));
    $line = strip_tags( trim( $line));
    if( ($time = strtotime( $line)) === false)
    {
        // Fall back to the text after the word "on", e.g. "Posted by ... on March 16, 2011"
        if( ($on_position = stripos( $line, ' on ')) === false)
        {
            echo "Could not parse line - '" . $line . "'\n";
            continue;
        }
        $line = trim( substr( $line, $on_position + 4));
        if( ($time = strtotime( $line)) === false)
        {
            echo "Could not parse line that contains 'on' - '" . $line . "'\n";
            continue;
        }
    }
    // Reached only when $time holds a valid timestamp
    echo "Converted '" . $line . "' to '" . date( 'F jS, Y h:i:s A', $time) . "'\n";
}
