
The Archive Base


Editorial Team
Asked: May 28, 2026


A PHP function I am writing pulls a small bit of HTML data from another webpage using file_get_contents(), then parses out a piece of text and tries to store it in a database. The problem is that the data it gets must be encoded with a different charset or something (I’m not positive how to check this), because it often adds  (at seemingly random places in the string, not always at the beginning or end) and every once in a while adds a newline where I don’t want one. The  is annoying, but when the newline is added it causes the JavaScript function to fail. The JavaScript function is printed from a PHP script as follows:

print <<<END
    setUpSend("{$a}", "{$b}", "{$c}", "{$d}");
END;

When the newline is inserted, the function no longer works (I suppose because of the newline), and viewing the page source shows something like this:

    setUpSend("a information", "b information
", "c information", "d information");

I did some research and found that this  is the UTF-8 BOM (Byte Order Mark), and that it is suggested to parse the information as XML rather than as a string. I found that there are some PHP libraries to do this (http://php.net/manual/en/book.xml.php), but I was thinking there might be an easier way, like a simple PHP function that converts it automatically or strips unwanted characters.

Also, the information sometimes contains quotes, which would break the JS function as well. I tried to use PHP’s addslashes function, but it doesn’t add any slashes at all. If I manually write the exact same string in PHP, however, and call addslashes on that, it adds the slashes normally, which makes me think that PHP somehow can’t understand the encoding of the text I am getting. Something weird is going on, but I’m lost on how to fix it.

I’d be more than open to any suggestions, as I’ve looked up a lot of things but can’t figure out a good way to solve this.
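A sketch that is not part of the original thread, aimed at the newline and quote problems described above: instead of wrapping each value in quotes by hand and escaping it with addslashes, let json_encode() produce the complete JavaScript string literal. It escapes newlines as \n, and with the JSON_HEX_* flags it also turns quotes, <, > and & into \u00XX escapes. The js_string() helper name and the sample values are hypothetical. Note that json_encode() returns false on invalid UTF-8, so the encoding still has to be sorted out first.

```php
<?php
/**
 * Hypothetical helper: encode a PHP string as a JavaScript string literal,
 * including the surrounding double quotes.
 */
function js_string(string $value): string
{
    // The JSON_HEX_* flags turn <, >, &, ' and " into \u00XX escapes,
    // which keeps the literal safe inside an HTML <script> block too.
    $json = json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
    if ($json === false) {
        // json_encode() refuses invalid UTF-8, so normalize the encoding first.
        throw new InvalidArgumentException(json_last_error_msg());
    }
    return $json;
}

// Sample values with the problems described in the question.
$a = js_string('a information');
$b = js_string("b information\n");        // stray newline
$c = js_string('c "quoted" information'); // embedded double quotes
$d = js_string('d information');

// No quotes around the placeholders: js_string() already adds them.
$js = <<<END
    setUpSend({$a}, {$b}, {$c}, {$d});
END;
print $js . "\n";
```

Because the newline becomes the two characters \n inside the literal, the generated call stays on one line.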


1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 12:03 am

    The  might be a UTF-8 encoded BOM. You can normally remove it safely if you know the source encoding is UTF-8.

    That’s a simple string operation:

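    $withOutUTF8BOM = remove_UTF8BOM($withOrWithOutUTF8BOM);
    
    
    /**
     * Remove the UTF-8 BOM from the beginning of a string (if it exists)
     *
     * @param string $str
     * @return string
     */
    function remove_UTF8BOM($str)
    {
        $UTF8BOM = "\xEF\xBB\xBF";
        if (strpos($str, $UTF8BOM) === 0) {
            // Drop the three BOM bytes; the (string) cast guards against
            // substr() returning false on very old PHP versions.
            $str = (string) substr($str, 3);
        }
        return $str;
    }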
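The question mentions that the  shows up at seemingly random places, not only at the start, while remove_UTF8BOM() only strips a leading BOM. A minimal variant, assuming the input is already known to be UTF-8 (in another encoding those three bytes could be legitimate characters), removes every occurrence. The function name remove_all_UTF8BOMs() is hypothetical.

```php
<?php
/**
 * Hypothetical variant of remove_UTF8BOM(): strips every UTF-8 BOM
 * (bytes EF BB BF, i.e. U+FEFF) from the string, not only a leading one.
 * Assumes $str is UTF-8 encoded.
 */
function remove_all_UTF8BOMs(string $str): string
{
    return str_replace("\xEF\xBB\xBF", '', $str);
}

$dirty = "\xEF\xBB\xBFHello \xEF\xBB\xBFworld";
echo remove_all_UTF8BOMs($dirty), "\n"; // Hello world
```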
    

    However, it looks like you should make your code aware of the input encoding. HTML data can come in different encodings, so it is probably worth normalizing the encoding upfront (e.g. converting all non-UTF-8 charsets to UTF-8) and then making your own functions deal properly with UTF-8 encoded data.
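A minimal sketch of that normalization step, assuming the source charset is already known (for example from the Content-Type header) and that the mbstring extension is installed. The function name normalize_to_utf8() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: convert a fetched HTML fragment to UTF-8.
 * $sourceCharset is assumed to be known already (e.g. from Content-Type).
 * Requires the mbstring extension.
 */
function normalize_to_utf8(string $html, string $sourceCharset): string
{
    if (strcasecmp($sourceCharset, 'UTF-8') === 0) {
        // Already UTF-8: only drop a leading BOM.
        return (strpos($html, "\xEF\xBB\xBF") === 0) ? (string) substr($html, 3) : $html;
    }
    return mb_convert_encoding($html, 'UTF-8', $sourceCharset);
}

$latin1 = "caf\xE9";  // "café" encoded as ISO-8859-1
$utf8 = normalize_to_utf8($latin1, 'ISO-8859-1');
var_dump($utf8 === "caf\xC3\xA9"); // bool(true)
```

After this step, string functions that assume UTF-8 (and json_encode()) see consistent input.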

    A PHP function I am writing pulls a small bit of HTML data from another webpage using file_get_contents(), then parses out a piece of text and tries to store it in a database. The problem is, the data it gets must be encoded with a different charset or something (I’m not positive how to check this)

    You can obtain the response headers after retrieving the data with file_get_contents(); PHP stores them in $http_response_header. The following example demonstrates this (parse_http_response_header is not a built-in PHP function; its definition is in HEAD first with PHP Streams):

    $url = 'http://example.com/';
    
    $body = file_get_contents($url);
    
    $responses = parse_http_response_header($http_response_header);
    
    $contentType = $responses[0]['fields']['CONTENT-TYPE']; // CONTENT-TYPE
    
    echo "Content-Type: $contentType\n";  # Content-Type: text/html; charset=UTF-8
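If you would rather not pull in that helper, a smaller sketch can read the Content-Type directly from the raw header lines. It assumes that $http_response_header lists the status line and headers of every response in order (so after a redirect the last status line belongs to the final response). The function name last_content_type() and the sample header array are made up for illustration.

```php
<?php
/**
 * Hypothetical helper: return the Content-Type value of the *last* response
 * in a $http_response_header-style array, or null if it has none.
 */
function last_content_type(array $headerLines): ?string
{
    $contentType = null;
    foreach ($headerLines as $line) {
        if (strncasecmp($line, 'HTTP/', 5) === 0) {
            // A new status line starts a new response (e.g. after a redirect).
            $contentType = null;
            continue;
        }
        $parts = explode(':', $line, 2);
        if (count($parts) === 2 && strcasecmp(trim($parts[0]), 'Content-Type') === 0) {
            $contentType = trim($parts[1]);
        }
    }
    return $contentType;
}

// Made-up header array after one redirect; with a real request you would pass
// $http_response_header right after calling file_get_contents($url).
$headers = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://example.com/new',
    'Content-Type: text/html; charset=iso-8859-1',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=UTF-8',
);
echo last_content_type($headers), "\n"; // text/html; charset=UTF-8
```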
    

    You only need to check whether that header line exists and whether a charset has been specified. See the Content-Type header specification in RFC 2616 for how it is written:

       Content-Type   = "Content-Type" ":" media-type
    
       media-type     = type "/" subtype *( ";" parameter )
       type           = token
       subtype        = token

    list($typeAndSubType, $parameter) = explode(';', $contentType, 2) + array(NULL, NULL);

    If there is no media-type given (type and subtype), you can (but do not have to) try to guess it. As you’re dealing with HTML, this is normally text/html.
    

    If no charset parameter is given, use the default charset for that type (text). In HTTP/1.1 this is ISO-8859-1 (RFC 2616, Section 3.7.1).
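A small sketch of that fallback rule. The regex is a simplification of the full parameter grammar (it only looks for a charset parameter, optionally quoted), and the function name charset_from_content_type() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: pick the charset for a Content-Type value, falling
 * back to ISO-8859-1 (the RFC 2616 default) for text/* types.
 * Returns null for non-text types without an explicit charset.
 * The regex is a simplification and ignores some grammar edge cases.
 */
function charset_from_content_type(string $contentType): ?string
{
    if (preg_match('/;\s*charset\s*=\s*"?([^";\s]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    list($typeAndSubType) = explode(';', $contentType, 2);
    if (stripos(trim($typeAndSubType), 'text/') === 0) {
        return 'ISO-8859-1';
    }
    return null;
}

echo charset_from_content_type('text/html; charset=utf-8'), "\n"; // UTF-8
echo charset_from_content_type('text/html'), "\n";                // ISO-8859-1
```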

    To properly parse the parameter(s), see Section 3.6 of RFC 2616:

       parameter               = attribute "=" value
       attribute               = token
       value                   = token | quoted-string
    

    Properly parsing the parameter string is left as an exercise.
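One possible take on that exercise, as a sketch rather than a complete implementation: it follows the grammar quoted above (attribute "=" value, where value is a token or a quoted-string), lower-cases attribute names because they are case-insensitive, and undoes backslash escapes inside quoted strings. The token character class is simplified, and the function name parse_content_type_parameters() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: parse the parameters of a Content-Type value.
 *   parameter = attribute "=" value
 *   value     = token | quoted-string
 * Simplification: a token is any run of characters other than
 * whitespace, ';', ',', '=' and '"'.
 */
function parse_content_type_parameters(string $contentType): array
{
    $params = array();
    // Group 1: attribute; group 2: quoted-string contents; group 3: token.
    $pattern = '/;\s*([^\s;,="]+)\s*=\s*(?:"((?:[^"\\\\]|\\\\.)*)"|([^\s;,="]+))/';
    if (preg_match_all($pattern, $contentType, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $name = strtolower($m[1]);
            if (isset($m[3]) && $m[3] !== '') {
                $params[$name] = $m[3];                                   // token
            } else {
                $params[$name] = preg_replace('/\\\\(.)/s', '$1', $m[2]); // quoted-string: undo \x escapes
            }
        }
    }
    return $params;
}

print_r(parse_content_type_parameters('text/html; Charset="UTF-8"; level=1'));
```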



Related Questions

Coming from a PHP background, I'm used to writing small functions that return a
I am writing a tail function in PHP, and using jQuery to refresh a
Basically I am writing a report script where it pulls data from a database
I am writing a query in php using a string sent from a android
I am writing a custom error handling / reporting function for PHP file upload
I'm writing a construct in PHP where a parser determins which function to call
I'm using the PHP function imagettftext() to convert text into a GIF image. The
I writing a php function to check existence of bad whole words (keep in
I am writing a PHP function that will need to loop over an array
i have php function that parses a xml url and gives me an array.this


© 2021 The Archive Base. All Rights Reserved