I am not sure if this is even possible but I am trying to extract all the anchor tag links in a few HTML files on my website. I have currently written a php script that scans a few directories and sub directories that builds an array of HTML file links. Here is that code:
$di = new RecursiveDirectoryIterator('Migration');
$migrate = array();
foreach (new RecursiveIteratorIterator($di) as $filename => $file) {
if (eregi("\.html",$file) || eregi("\.htm",$file) ) {
$migrate[] .= $filename;
}
}
This method successfully produces the HTML File links that I need. Ex:
Migration/administration/billing/Billing.htm
Migration/administration/billing/_notes/Billing.htm.mno
Migration/administration/new business/_notes/New Business.htm.mno
Migration/administration/new business/New Business.htm
Migration/account/nycds/_notes/NYCDS Index.htm.mno
Migration/account/nycds/NYCDS Index.htm
There’s more links but this gives you an idea. The next part is where I am stuck. I was thinking that I would need a for loop to loop through each array element, open the file, extract the links, then store those links somewhere. I am just not sure how I would go about this process. I tried to google this question but I never seemed to get results that matched what I was looking to do. Here is the simplified for loop that I have.
var obj = <?php echo json_encode($migrate); ?>;
for(var i=0;i< obj.length;i++){
// alert(obj[i]);
}
The above code is in javascript. From what I am reading, It seems that I shouldn’t be using javascript but should maybe continue using PHP. I am confused on what my next steps should be. If someone can point me in the right direction I would really appreciate it. Thank you so much for your time.
Use
DOMDocument::getElementsByTagNameto retrieve all<a>tagshttp://www.php.net/manual/en/domdocument.getelementsbytagname.php
Example,