I’m creating an app that screen-scrapes a site with the help of user input (street, city). However, a street or a city could contain the characters ‘å’, ‘ä’ and ‘ö’, which need to be encoded.
I’ve tried encodeURIComponent, but its output doesn’t match what the site itself produces when I enter the street and city directly in the form on the page (see below). What can I use instead to get the desired output?
var url = 'http://www.foosite.com/result.jspv?street=' +
encodeURIComponent(street) + '&city=' + encodeURIComponent(city);
From my app: http://www.foosite.com/result.jspv?street=Vaktarev%C3%A4gen&city=M%C3%B6nster%C3%A5s
From the site: http://www.foosite.com/result.jspv?street=Vaktarev%E4gen&city=M%F6nster%E5s
The site is probably using escape, which percent-encodes each character by its Unicode code point: ä (U+00E4) becomes %E4, the same single byte it has in ISO-8859-1. In contrast, encodeURIComponent percent-encodes the character’s UTF-8 bytes, so ä becomes %C3%A4. (Scroll down to the U+00E4 code point on http://www.utf8-chartable.de/ for the differing values of ä.)

I would generally discourage use of the escape function, since it is deprecated and survives only in the legacy Annex B of the ECMAScript specification. However, since this is probably the easiest way to match the behavior of the site (which sadly does not use the preferred encodeURIComponent), it’s certainly the best option for you here. Despite being deprecated, it should work in virtually all browsers.
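If you would rather not depend on escape, here is a minimal sketch of a hand-written alternative. It assumes the site expects ISO-8859-1 percent-encoding, which is only a guess based on the URLs in the question. The helper name encodeLatin1Component is hypothetical, not a built-in.

```javascript
// Hypothetical helper (not a built-in): percent-encode a string as
// ISO-8859-1 bytes. Assumes the target site expects Latin-1, as the
// %E4 / %F6 / %E5 values in the question suggest.
function encodeLatin1Component(value) {
  return Array.from(value, function (ch) {
    var code = ch.codePointAt(0);
    if (code > 0xFF) {
      // Latin-1 has no byte for this character, so fail loudly.
      throw new RangeError('Not representable in ISO-8859-1: ' + ch);
    }
    // Leave the RFC 3986 unreserved characters untouched.
    if (/[A-Za-z0-9\-._~]/.test(ch)) {
      return ch;
    }
    // Two uppercase hex digits, zero-padded.
    return '%' + (code < 16 ? '0' : '') + code.toString(16).toUpperCase();
  }).join('');
}

// \u escapes keep the sample independent of the file's own encoding.
var street = 'Vaktarev\u00E4gen'; // Vaktarevägen
var city = 'M\u00F6nster\u00E5s'; // Mönsterås

var url = 'http://www.foosite.com/result.jspv?street=' +
    encodeLatin1Component(street) + '&city=' + encodeLatin1Component(city);
// url === 'http://www.foosite.com/result.jspv?street=Vaktarev%E4gen&city=M%F6nster%E5s'
```

Unlike escape, this sketch also encodes + and /, and it throws on characters above U+00FF, where escape would emit a %uXXXX sequence that most servers do not understand.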