
The Archive Base


Editorial Team
Asked: May 23, 2026

I am completely lost with encoding issues, I have no idea what’s going on

I am completely lost with encoding issues. I have no idea what’s going on, what exactly the problem is, or how to fix it.

Basically, I’m just trying to read an HTML file out of a .tar.gz archive, parse it, and then output pieces of it to XML. Something funky is happening with the text I get out of the parser.

When parsing the HTML, I get á instead of a space, but only when I write the text to the screen. If I keep it in a variable and write it to a file, it looks fine in the file. However, even though the XML looks right, something is wrong with it: my PHP parser can’t parse that XML, and IE doesn’t seem to like it either.
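A minimal sketch of what is probably happening, assuming the page uses `&nbsp;` entities and the screen is a Windows console using code page 437. HTML::TreeBuilder decodes `&nbsp;` into the single character U+00A0, and how that character looks depends on how it is encoded on the way out. Only core modules are used here; the string literal is a stand-in for the parser's output, not taken from the actual file.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Stand-in for the parser's text for "Number&nbsp;of&nbsp;source&nbsp;lines":
# each &nbsp; has already been decoded to the character U+00A0.
my $text = "Number\x{A0}of\x{A0}source\x{A0}lines";

# Printed with no output layer, U+00A0 goes out as the single byte 0xA0.
my $latin1_bytes = encode('iso-8859-1', $text);

# A console using code page 437 displays byte 0xA0 as "á".
my $console_view = decode('cp437', $latin1_bytes);

# Properly encoded as UTF-8, the same character is the two bytes C2 A0.
my $utf8_bytes = encode('UTF-8', $text);

binmode STDOUT, ':encoding(UTF-8)';
print "$console_view\n";
printf "latin-1: %s\n", unpack('H*', substr($latin1_bytes, 6, 1));
printf "utf-8:   %s\n", unpack('H*', substr($utf8_bytes, 6, 2));
```

A lone 0xA0 byte is not valid UTF-8, and an XML file without an encoding declaration is read as UTF-8, which would explain why the PHP parser and IE reject a file that looks fine in an editor.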

I had to run mb_convert_encoding($xmlcontent, "ASCII") on it first before PHP would parse that XML.

Any idea what my problem is?

  1. Extract the HTML from a .tar.gz file using Perl

    use Archive::Tar;
    my $tar = Archive::Tar->new;
    $tar->read("myfile.tar.gz") or die $tar->error;
    $tar->extract_file('index.html', 'output.html') or die $tar->error;
    
  2. Load the HTML. This is where it starts to get funky; I get output like Numberáofásourceálines

    use HTML::TreeBuilder;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file('output.html') or die $!;
    $tree->elementify;
    
  3. Write the pieces to XML

    use IO::File;
    use XML::Writer;
    my $output = IO::File->new('output.xml', '>') or die "output.xml: $!";
    my $writer = XML::Writer->new(OUTPUT => $output, DATA_MODE => 1, DATA_INDENT => 2);
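A minimal sketch of one way to make step 3 produce well-formed UTF-8, assuming the text coming out of HTML::TreeBuilder is a Perl character string: open the output handle with an `:encoding(UTF-8)` layer and declare the same encoding in the XML declaration. The element names `lines` and `line` and the sample string are hypothetical placeholders for whatever pieces are actually extracted.

```perl
use strict;
use warnings;
use IO::File;
use XML::Writer;

# Hypothetical sample: one piece of text as the parser would return it,
# with &nbsp; already decoded to U+00A0.
my @pieces = ("Number\x{A0}of\x{A0}source\x{A0}lines");

# The :encoding(UTF-8) layer turns Perl characters into UTF-8 bytes on output,
# so U+00A0 is written as C2 A0 instead of a lone 0xA0 byte.
my $output = IO::File->new('output.xml', '>:encoding(UTF-8)')
    or die "output.xml: $!";

my $writer = XML::Writer->new(OUTPUT => $output, DATA_MODE => 1, DATA_INDENT => 2);
$writer->xmlDecl('UTF-8');              # tell XML parsers which encoding to expect
$writer->startTag('lines');
$writer->dataElement('line', $_) for @pieces;
$writer->endTag('lines');
$writer->end;                           # checks that every start tag was closed
$output->close or die "output.xml: $!";
```

With the bytes and the declaration in agreement, a consumer such as PHP's XML parser should not need the content forced down to ASCII first.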
    

1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 4:58 am

    I think I just fixed it by running this substitution on the HTML before parsing it. Thanks for all the great pointers!

    s/&nbsp;/ /g;    # operates on $_: replace every &nbsp; entity with a plain space
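The substitution in this answer only matches the named entity. A hedged sketch of a slightly broader version, assuming the raw HTML is read into a string before parsing: it also catches the numeric forms `&#160;` and `&#xA0;`, and a second helper handles U+00A0 characters that are already decoded in text taken from the parse tree. Both helper names, `strip_nbsp_entities` and `strip_nbsp_chars`, are hypothetical. Like the original fix, this discards non-breaking spaces rather than encoding them correctly.

```perl
use strict;
use warnings;

# Hypothetical helper: replace named and numeric non-breaking-space
# entities in raw HTML with a plain space, before the HTML is parsed.
sub strip_nbsp_entities {
    my ($html) = @_;
    $html =~ s/&(?:nbsp|#0*160|#[xX]0*[aA]0);/ /g;
    return $html;
}

# Hypothetical helper: replace already-decoded U+00A0 characters,
# e.g. in text pulled out of the parse tree.
sub strip_nbsp_chars {
    my ($text) = @_;
    $text =~ tr/\x{A0}/ /;
    return $text;
}

print strip_nbsp_entities('Number&nbsp;of&#160;source&#xA0;lines'), "\n";
print strip_nbsp_chars("Number\x{A0}of\x{A0}source\x{A0}lines"), "\n";
```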
    

Related Questions

We've completely lost our repository and we have 8 developers with uncomitted changes. Restoring
I am completely lost on this one-- I have a .Rpt file that runs
This is very stupid but I seem to be completely lost trying to test
Hey I am completely lost with this one. Basically the website i'm working on
I have an assignment to correct an ambiguous BNF, but I am completely lost.
I'm completely new to AIR but what I'm trying to do feels like it
I'm completely lost on this one: System.getProperty(user.home) and System.getProperty(user.name) returns a questionmark ?. System-Specs:
I'm completely lost as to why this isn't working. Should work precisely, right? UserName
I'm struggling with a deployment issue which leaves me completely lost. It goes like
Hey all, I am completely lost on this one. I found a code snippet

© 2021 The Archive Base. All Rights Reserved