The Archive Base

Editorial Team
Asked: June 1, 2026 (2026-06-01T18:49:05+00:00)

I'm pretty sure this is really basic. However I have no knowledge of Perl

I'm pretty sure this is really basic. However, I have no knowledge of Perl and only need to use it this once, so I appreciate your patience.

I am trying to remove unwanted text from the single line of HTML below:

    <a target="_blank"          href="http://sharepoint/sites/cerner/quickreferenceguides/Documents/EXP001_Run_Printable_TCI_List.pdf" onmouseover="return overlib('This guide outlines the process for running a printable TCI List', CAPTION, 'TCI LIST');" onmouseout="return nd();">Run Printable TCI List (<i>Revised<i>)</a> 

All I want to be left with is Run Printable TCI List (<i>Revised</i>) which is the text at the end before the </a>. I have around 500 of these lines and since they could be changed in the future it makes sense to create a program. Below is my Perl code so far:

    open (SEARK, 'C:\\HTMLsorter\\sources.txt');
    open (OUTSEARK, '>C:\\HTMLsorter\\outseark.txt');
    while(<SEARK>) {
      chomp;

      if ($_=~/<a target/) {
        $_ =~ s/\<i>//g;
        $_ =~ s/\<\/i>//g;
        @itemsa = split(/>/);
        @itemsb = split(/</, $itemsa[1]);
        print OUTSEARK ("$itemsb[0]\n");
      }
    }
    close (SEARK);
    close (OUTSEARK);

I’m sure you can read this, but just to explain: I am opening a file called sources.txt, which contains the 500 lines to be sorted. The output file will be outseark.txt. So far it outputs this:

Run Printable TCI List (Revised)

This is obviously due to the split targeting everything in and around the angle brackets. Any ideas how I can keep the italics inside the parentheses, so that I am left with:

Run Printable TCI List (<i>Revised<i>)

Thanks for looking.

1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 6:49 pm (2026-06-01T18:49:07+00:00)

    You should use a proper HTML parser, such as HTML::TreeBuilder. The code is no more complex, as this program demonstrates:

    use strict;
    use warnings;
    
    use HTML::TreeBuilder;
    
    my $tree = HTML::TreeBuilder->new_from_file(*DATA);
    
    print $_->as_text, "\n" for $tree->look_down(_tag => 'a', target => qr/./);
    
    __DATA__
        <a target="_blank"          href="http://sharepoint/sites/cerner/quickreferenceguides/Documents/EXP001_Run_Printable_TCI_List.pdf" onmouseover="return overlib('This guide outlines the process for running a printable TCI List', CAPTION, 'TCI LIST');" onmouseout="return nd();">Run Printable TCI List (<i>Revised<i>)</a> 
    

    output

    Run Printable TCI List (Revised)
    

    Edit

    To use this technique on the files in your example, the code looks like this:

    use strict;
    use warnings;
    
    use HTML::TreeBuilder;
    
    my $tree = HTML::TreeBuilder->new_from_file('C:\HTMLsorter\sources.txt');
    
    open my $out, '>', 'C:\HTMLsorter\outseark.txt' or die $!;
    
    print $out $_->as_text, "\n" for $tree->look_down(_tag => 'a', target => qr/./);
    

    Edit 2

    Now that I understand better what you need I can offer this alternative solution. It uses the HTML::DOM module to access the Document Object Model of an HTML document, as getting the result you needed with HTML::TreeBuilder is relatively difficult.

    I’ve also noticed that your sample HTML contains <i>Revised<i>, which clearly should be <i>Revised</i>, and I have corrected it for this sample test. Regardless, the parser tries to handle bad HTML as a browser would, and even with the error the output is usable.

    use strict;
    use warnings;
    
    use HTML::DOM;
    
    my $dom = HTML::DOM->new;
    $dom->parse_file('C:\HTMLsorter\sources.txt') or die $!;
    
    open my $out, '>', 'C:\HTMLsorter\outseark.txt' or die $!;
    print $out $_->innerHTML, "\n" for grep $_->attr('target'), $dom->getElementsByTagName('a');
    

    output

    (With tags corrected)

    Run Printable TCI List (<i>Revised</i>)
    

    (With original tags)

    Run Printable TCI List (<i>Revised<i>)</i></i>
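
If installing a CPAN module is not an option, a core-only regex sketch along the following lines might be enough for input as regular as the sample. This is a different technique from the parser-based answer above, not a replacement for it: it assumes one link per line and no literal `>` inside any attribute value. The helper names `anchor_inner_html` and `has_unbalanced_italics`, and the shortened `href` values, are made up for illustration. The second helper flags the kind of `<i>Revised<i>` typo mentioned above so it can be fixed in the source file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the markup between an opening <a target=...>
# tag and its closing </a>, or undef when the line has no such link.
# Assumes one link per line and no literal '>' inside attribute values.
sub anchor_inner_html {
    my ($line) = @_;
    return $line =~ m{<a\s+target\b[^>]*>(.*?)</a>}is ? $1 : undef;
}

# Hypothetical helper: true when the counts of <i> and </i> tags differ,
# which catches typos such as <i>Revised<i>.
sub has_unbalanced_italics {
    my ($html) = @_;
    my $opens  = () = $html =~ m{<i\s*>}gi;
    my $closes = () = $html =~ m{</i\s*>}gi;
    return $opens != $closes;
}

# Shortened sample lines; the href values are placeholders.
my @lines = (
    q{<a target="_blank" href="a.pdf" onmouseout="return nd();">Run Printable TCI List (<i>Revised</i>)</a>},
    q{<a target="_blank" href="b.pdf">Run Printable TCI List (<i>Revised<i>)</a>},
    q{<p>No link on this line</p>},
);

for my $line (@lines) {
    my $inner = anchor_inner_html($line);
    next unless defined $inner;    # skip lines without a matching link
    warn "Unbalanced <i> tags in: $inner\n" if has_unbalanced_italics($inner);
    print "$inner\n";
}
```

To run this against the real data, the `@lines` array could be replaced by a loop reading from a filehandle opened on sources.txt, with `print` directed at a handle opened on outseark.txt. Unlike the parser, the regex does not repair malformed markup; it only copies whatever sits between the tags.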
    
© 2021 The Archive Base. All Rights Reserved