
The Archive Base

Editorial Team
Asked: June 13, 2026

In a previous question I was told that Google passes UTF-8 encoded responses to queries. That solved a problem with non-breaking spaces (A0) being garbled after curl passed them to my terminal: I piped the curl output to iconv and converted it to UTF-8. However, even with this solution in place, I am still getting some strange output.

Consider the following conversion of 2 m to feet:

http://www.google.com/ig/calculator?hl=en&q=2%20m%20in%20feet

This is the output I’m seeing in my browser and elsewhere:

{lhs: "2 meters",rhs: "6.56167979 feet (6 feet 6\x3csup\x3e47\x3c/sup\x3e\x26#8260;\x3csub\x3e64\x3c/sub\x3e inches)",error: "",icc: false}

The expected output is:

{lhs: "2 meters",rhs: "6.56167979 feet (6 feet 6 47/64 inches)",error: "",icc: false}

I could just do a text replacement using regular expressions or some other solution, but I would like to know what’s happening here. Any insight?

I am running Mac OS X Mountain Lion 10.8.2.

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 5:12 am

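    Google Calculator, as accessed via curl, is returning JSON-like data (strictly speaking a JavaScript object literal, since the keys are unquoted). The \xHH notation Google uses is a JavaScript string escape; standard JSON would escape these characters as \uHHHH instead. If the output were being sent to a browser (or anything else that parses JavaScript and HTML) instead of standard output, a good decoder that understands these escapes would be all that is needed.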

    Let’s see what we can do from the command line to parse the JSON.

    echo -en "$(curl -s 'http://www.google.com/ig/calculator?hl=en&q=4^22')" > ~/temp.html

    This gives us HTML that we can view in a browser, but we need to reduce everything to something that displays properly on standard output.

    echo -en "$(curl -s --connect-timeout 10 "http://www.google.com/ig/calculator?hl=en&q=2%20m%20in%20feet")" | sed -e 's/<sup>/ &/g' -e :a -e 's/<[^>]*>//g;/</N;//ba' | perl -MHTML::Entities -ne 'print decode_entities($_)' | iconv -f ISO-8859-1 -t UTF-8

    For the echo command, -e interprets escapes such as \x3e, \x3c, and \x26 (>, <, and & respectively), while -n suppresses the trailing newline that echo would normally add.
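    echo -e is not specified by POSIX, and its handling of -e and \xHH differs between shells, so here is a minimal portable sketch that decodes only the three escapes \x3c, \x3e, and \x26, using nothing but sed. It works on the sample string from the question rather than calling Google; decode_js_escapes is a hypothetical name, and any other \xHH escape is deliberately left untouched.

```shell
#!/bin/sh
# decode_js_escapes: hypothetical helper that turns the three JavaScript-style
# escapes seen in the calculator output back into the characters they stand for.
# Only \x3c (<), \x3e (>) and \x26 (&) are handled; other \xHH escapes pass through.
decode_js_escapes() {
  # In each sed pattern, \\ matches one literal backslash.
  # In the replacement, \& is needed because a bare & means "the whole match".
  printf '%s\n' "$1" | sed -e 's/\\x3c/</g' -e 's/\\x3e/>/g' -e 's/\\x26/\&/g'
}

decode_js_escapes '6\x3csup\x3e47\x3c/sup\x3e\x26#8260;\x3csub\x3e64\x3c/sub\x3e'
# prints: 6<sup>47</sup>&#8260;<sub>64</sub>
```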

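    The pipe to sed adds a space before every <sup> (superscript) tag and then removes all HTML tags. The :a, /</N, and //ba parts form a loop: while a line still contains a < that starts an unclosed tag, sed appends the next line and tries again, so tags split across a line break are removed too.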
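    A minimal single-line sketch of that sed step, assuming the escapes have already been decoded. strip_tags is a hypothetical name, and the :a/N/ba loop is omitted because the sample input fits on one line. It also shows how an exponent such as 10<sup>13</sup> collapses to 10 13.

```shell
#!/bin/sh
# strip_tags: hypothetical helper mirroring the answer's sed step on a single line.
strip_tags() {
  # 1) Put a space in front of every <sup> (& in the replacement is the
  #    matched "<sup>"), so "6<sup>47" ends up as "6 47".
  # 2) Delete every remaining <...> tag.
  printf '%s\n' "$1" | sed -e 's/<sup>/ &/g' -e 's/<[^>]*>//g'
}

strip_tags '6<sup>47</sup>&#8260;<sub>64</sub> inches'
# prints: 6 47&#8260;64 inches
strip_tags '10<sup>13</sup>'
# prints: 10 13
```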

    The pipe to perl then decodes all the HTML entities, such as &#8260; to ⁄ (fraction slash):
    http://en.wikipedia.org/wiki/Html_special_characters#Character_entity_references_in_HTML
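    Without Perl, a single known entity can still be decoded in plain sh by writing out its UTF-8 bytes with printf octal escapes. This hypothetical sketch (decode_fraction_slash is an illustrative name) handles only &#8260;, U+2044 FRACTION SLASH, whose UTF-8 bytes are E2 81 84; HTML::Entities covers the full entity set.

```shell
#!/bin/sh
# decode_fraction_slash: hypothetical helper that replaces the numeric entity
# &#8260; with U+2044 FRACTION SLASH. Only this one entity is handled.
decode_fraction_slash() {
  # Octal 342 201 204 = hex E2 81 84, the UTF-8 encoding of U+2044.
  slash=$(printf '\342\201\204')
  # None of those bytes is a sed metacharacter (/, &, \), so the character
  # can go straight into the replacement.
  printf '%s\n' "$1" | sed -e "s/&#8260;/$slash/g"
}

decode_fraction_slash '6 47&#8260;64 inches'
```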

    Finally, the pipe to iconv converts the ISO-8859-1 output to the expected UTF-8. This is done last because the perl step can output characters that also need to be converted.
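    To see what the iconv step does at the byte level, this sketch converts a string containing the byte 0xA0 (octal \240), which ISO-8859-1 defines as the non-breaking space, and prints the UTF-8 result in hex. The helper name is hypothetical, and it assumes iconv and od are available, as they are on OS X.

```shell
#!/bin/sh
# latin1_nbsp_as_utf8_hex: hypothetical helper. Converts "6<0xA0>feet" from
# ISO-8859-1 to UTF-8 and prints the resulting bytes as one hex string.
latin1_nbsp_as_utf8_hex() {
  printf '6\240feet' |
    iconv -f ISO-8859-1 -t UTF-8 |
    od -An -tx1 |
    tr -d ' \n'   # od separates bytes with spaces; join them into one string
}

latin1_nbsp_as_utf8_hex; echo
# prints: 36c2a066656574
```

    The single byte a0 becomes the two-byte UTF-8 sequence c2 a0; the ASCII letters are unchanged.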

    This is still going to have trouble distinguishing fractions from exponents: in 47/64, the 47 is wrapped in superscript tags and the 64 in subscript tags, while in 10^13 the 13 is wrapped in superscript tags, so once the tags are stripped the exponent comes out as "10 13".
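    One way around that ambiguity, sketched here on already-decoded tags, is to rewrite the fraction pattern before touching any other superscript, which is also the order the AppleScript sed line in this answer uses. tags_to_ascii is a hypothetical name, and the patterns assume plain decimal digits inside the tags.

```shell
#!/bin/sh
# tags_to_ascii: hypothetical helper that keeps the fraction/exponent distinction
# that plain tag stripping loses. Order matters: the fraction pattern
# <sup>N</sup>&#8260;<sub>D</sub> is rewritten first, otherwise its numerator
# would be turned into an exponent by the second rule.
tags_to_ascii() {
  printf '%s\n' "$1" | sed \
    -e 's/<sup>\([0-9]*\)<\/sup>&#8260;<sub>\([0-9]*\)<\/sub>/ \1\/\2/g' \
    -e 's/<sup>\([0-9]*\)<\/sup>/^\1/g'
}

tags_to_ascii '6 feet 6<sup>47</sup>&#8260;<sub>64</sub> inches'
# prints: 6 feet 6 47/64 inches
tags_to_ascii '10<sup>13</sup>'
# prints: 10^13
```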

    We could get super silly and write a really long sed line that handles all the special characters (the following is in AppleScript, so you can see just how ridiculous the syntax gets):

    set jsonResponse to do shell script "curl " & queryURL & " | sed -e 's/[†]/,/g' -e 's/\\\\x26#215;/*/g' -e 's/\\\\x26#188;/ 1\\/4/g' -e 's/\\\\x26#189;/ 1\\/2/g' -e 's/\\\\x26#190;/ 3\\/4/g' -e 's/\\\\x26#8539;/ 1\\/8/g' -e 's/\\\\x26#8540;/ 3\\/8/g' -e 's/\\\\x26#8541;/ 5\\/8/g' -e 's/\\\\x26#8542;/ 7\\/8/g' -e 's/\\\\x3csup\\\\x3e\\([0-9]*\\)\\\\x3c\\/sup\\\\x3e\\\\x26#8260;\\\\x3csub\\\\x3e\\([0-9]*\\)\\\\x3c\\/sub\\\\x3e/ \\1\\/\\2/g' -e 's/\\\\x3csup\\\\x3e\\([0-9]*\\)\\\\x3c\\/sup\\\\x3e/^\\1/' -e 's/( /(/g'"

    The † (dagger) character is 160 in decimal in the MacRoman set (Macintosh encoding). In hexadecimal this is 0xA0 (\xA0), which is also the code point of the non-breaking space (U+00A0) that Google is passing. So in AppleScript, to replace the non-breaking space, we have to use the † (dagger) because of the Macintosh encoding.

    • http://en.wikipedia.org/wiki/Mac_Roman#Codepage_layout
    • http://en.wikipedia.org/wiki/UTF-8
    • http://en.wikipedia.org/wiki/C1_Controls_and_Latin-1_Supplement
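    The byte-level point can be checked with iconv: decoding the same single byte 0xA0 as ISO-8859-1 gives the non-breaking space, while decoding it as MacRoman gives the dagger. This sketch assumes your iconv recognizes the charset name MACINTOSH (check iconv -l if it does not); byte_a0_as is a hypothetical helper.

```shell
#!/bin/sh
# byte_a0_as: hypothetical helper. Decodes the single byte 0xA0 (octal \240)
# using the given source charset and prints the UTF-8 result as hex.
byte_a0_as() {
  printf '\240' | iconv -f "$1" -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

echo "ISO-8859-1: $(byte_a0_as ISO-8859-1)"  # c2a0   = U+00A0 NO-BREAK SPACE
echo "MACINTOSH:  $(byte_a0_as MACINTOSH)"   # e280a0 = U+2020 DAGGER
```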

    There are also several special fraction symbols that the sed line deals with:
    http://tlt.its.psu.edu/suggestions/international/bylanguage/mathchart.html#fractions
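    Outside AppleScript the same fraction substitutions need far less escaping. This hypothetical sketch (fraction_entities_to_ascii is an illustrative name) applies the entity table from the AppleScript line to text whose \x26 escapes have already been turned into &, keeping the AppleScript's leading space before each fraction and its mapping of &#215; to *.

```shell
#!/bin/sh
# fraction_entities_to_ascii: hypothetical plain-sh version of the fraction part
# of the AppleScript sed line. Input is assumed to contain "&#NNN;" entities
# (already decoded from "\x26#NNN;"). Each fraction entity becomes " n/d".
fraction_entities_to_ascii() {
  printf '%s\n' "$1" | sed \
    -e 's/&#215;/*/g' \
    -e 's/&#188;/ 1\/4/g' \
    -e 's/&#189;/ 1\/2/g' \
    -e 's/&#190;/ 3\/4/g' \
    -e 's/&#8539;/ 1\/8/g' \
    -e 's/&#8540;/ 3\/8/g' \
    -e 's/&#8541;/ 5\/8/g' \
    -e 's/&#8542;/ 7\/8/g'
}

fraction_entities_to_ascii '2&#189; cups'
# prints: 2 1/2 cups
```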

    The moral of the story is that when dealing with JSON, just use a good JSON parser.

    A sub-moral is: don’t use AppleScript to deal with JSON.
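    In that spirit, the calculator response can be made acceptable to a standard JSON parser with two rewrites: quote the bare keys, and turn the JavaScript-only \xHH escapes into JSON's \u00HH form. This is a minimal sketch under stated assumptions: to_strict_json is a hypothetical name, and the key rule assumes no string value contains a { or , immediately followed by a word and a colon.

```shell
#!/bin/sh
# to_strict_json: hypothetical helper that rewrites the calculator's JavaScript
# object literal into standard JSON so a real JSON parser can read it.
#  - \xHH escapes (JavaScript-only) become \u00HH (valid JSON).
#    Assumption: no escaped backslash ("\\x..") appears in the input.
#  - Bare keys such as lhs: become quoted keys "lhs":.
to_strict_json() {
  printf '%s\n' "$1" | sed \
    -e 's/\\x\([0-9A-Fa-f][0-9A-Fa-f]\)/\\u00\1/g' \
    -e 's/\([{,]\) *\([A-Za-z_][A-Za-z0-9_]*\):/\1"\2":/g'
}

to_strict_json '{lhs: "2 meters",rhs: "6.56167979 feet",error: "",icc: false}'
# prints: {"lhs": "2 meters","rhs": "6.56167979 feet","error": "","icc": false}
```

    If jq is installed, the result can then be queried directly, for example by piping it to jq -r .rhs, which also turns \u003c back into <.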

© 2021 The Archive Base. All Rights Reserved