I’m trying to match a regex against the lines of a data file in Perl, but it keeps skipping exactly the line I’m after. What could be wrong here?
My file contains:
<div class="definitionBox details" id="id-udt">
<span class="stempel">Udtale</span>
<span class="tekstmedium allow-glossing">
<span class="lydskrift"><span class="diskret">[</span>beˈgønˀə<span class="diskret">]</span></span>
</span>
I’m after the line with class “lydskrift”, so I tried several ways of grabbing its content, and eventually tried matching everything, like so:
while (<FILE>) {
if ( <FILE> =~ m/(.+)/ ) {
open FARA, '>>:encoding(UTF-8)', 'udtale.txt';
print (FARA $1 . "\n");
close (FARA);
}
}
Surprisingly, it keeps giving me this:
<div class="definitionBox details" id="id-udt">
<span class="tekstmedium allow-glossing">
</span>
Interestingly enough, it matches all four lines if I put them in a __DATA__ section inside the same Perl file! But that’s not what I want, so what makes the difference here?
First of all, I think your file has one more line at the top that you’re not including. The reason for my suspicion is below.
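To test that suspicion, here is a small sketch that replays the question’s loop on an in-memory copy of the sample lines. The real first line of the file is unknown, so a hypothetical placeholder comment is prepended; buggy_loop is an illustrative name, not something from the question.

```perl
use strict;
use warnings;
# No "use utf8": the sample stays as raw UTF-8 bytes, which an
# in-memory filehandle can read (wide characters would not be allowed).

# Replays the question's loop: while(<$fh>) reads one line into $_,
# then the <$fh> inside the if reads the NEXT line and matches that one.
sub buggy_loop {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @printed;
    while (<$fh>) {                 # line N goes into $_ and is never used
        if (<$fh> =~ m/(.+)/) {     # line N+1 is read and matched instead
            push @printed, $1;
        }
    }
    close $fh;
    return @printed;
}

# The sample from the question, with a hypothetical extra first line.
my $sample = join '', map { "$_\n" } (
    '<!-- hypothetical extra first line -->',
    '<div class="definitionBox details" id="id-udt">',
    '<span class="stempel">Udtale</span>',
    '<span class="tekstmedium allow-glossing">',
    '<span class="lydskrift"><span class="diskret">[</span>beˈgønˀə<span class="diskret">]</span></span>',
    '</span>',
);

print "$_\n" for buggy_loop($sample);
```

Run as is, it prints the div, tekstmedium and closing </span> lines, the same three lines reported in the question. Without the placeholder line it prints the stempel and lydskrift lines instead, plus an uninitialized-value warning because the inner read runs past the end of the input.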
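Your problem isn’t the regex; your problem is that <FILE> reads a line each time you call it. So every pass through your loop reads one line in the while (<FILE>) and then another one in the if (<FILE> =~ m/(.+)/). That is also why I suspect an extra first line: with one, the while (<FILE>) consumes and discards the first, third and fifth lines, while the if matches the second, fourth and sixth, which are exactly the three lines you’re getting. Your if should be just this:

if (m/(.+)/) {

so that it uses the default $_ variable that the while (<FILE>) will be populating.

Furthermore, your while loop is doing a lot more work than it has to; you could just do this:

or even this: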
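A minimal sketch of a leaner loop, assuming the goal is simply to copy every non-empty line to the output: the output handle is opened once instead of once per match, and m// with no =~ matches against $_. In-memory handles stand in for FILE and udtale.txt so the sketch runs on its own; copy_matches is a hypothetical name. If you keep the '>>:encoding(UTF-8)' output layer from the question, open FILE with a matching '<:encoding(UTF-8)' layer as well, otherwise non-ASCII characters such as those in beˈgønˀə get encoded twice.

```perl
use strict;
use warnings;

# Reads each input line exactly once: while(<$in>) puts it in $_,
# and the bare m// below matches against that same $_.
sub copy_matches {
    my ($in, $out) = @_;
    while (<$in>) {
        print {$out} "$1\n" if m/(.+)/;   # . does not match "\n", so empty lines are skipped
    }
}

# Demo on in-memory handles; in the real script these would be FILE
# and a handle on udtale.txt, each opened once outside the loop.
my $input = "line one\nline two\n\nline four\n";
open my $in,  '<', \$input     or die "cannot open input: $!";
open my $out, '>', \my $output or die "cannot open output: $!";
copy_matches($in, $out);
close $in;
close $out;
print $output;
```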
If you’re trying to skip blank lines, then maybe this:
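Note that m/(.+)/ already skips empty lines, because . does not match the newline, but it still matches lines that contain only spaces or tabs. A sketch that skips both kinds, using next with a /\S/ test (non_blank_lines is a hypothetical name, and the in-memory input stands in for FILE):

```perl
use strict;
use warnings;

# Returns the lines that contain at least one non-whitespace
# character, with their trailing newlines removed.
sub non_blank_lines {
    my ($in) = @_;
    my @kept;
    while (<$in>) {
        next unless /\S/;   # skip empty and whitespace-only lines
        chomp;
        push @kept, $_;
    }
    return @kept;
}

my $input = "first\n\n   \n\tsecond\nthird\n";
open my $in, '<', \$input or die "cannot open input: $!";
print "$_\n" for non_blank_lines($in);
close $in;
```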