I need to fetch HTML from Turkish webpages using Java. However, I am finding

Asked: May 31, 2026

I need to fetch HTML from Turkish webpages using Java. However, I am finding that my Java code is not able to pick up certain Turkish characters. Here is the Java code I am using:

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.InputStream;
import java.net.URL;

public class fetchHTML {
    public static void main(String[] args) throws Exception {

        URL urls = new URL("http://www.parkbravo.com.tr/pantolon.php");
        InputStream is = urls.openStream();
        DataInputStream dis = new DataInputStream(new BufferedInputStream(is));

        String line;

        while ((line = dis.readLine()) != null) {
            System.out.println(line);
        }
    }
}

The first few lines of output of this code are:

ï»¿<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
<html lang="tr" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<title>ParkBravo - ÃrÃ¼nler - Pantolonlar</title>

You can see that the title is incorrect: ÃrÃ¼nler should be Ürünler

If I use the following Python code to get the HTML:

import urllib2

url = 'http://www.parkbravo.com.tr/pantolon.php' 

usock = urllib2.urlopen(url)
data = usock.read()
usock.close()

print data

then the output is correct. Title comes out as:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
<html lang="tr" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<title>ParkBravo - Ürünler - Pantolonlar</title>

But I want to be able to get the HTML with Java. Does anyone know how I can get this working?

Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-05-31T05:52:19+00:00

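DataInputStream.readLine() is deprecated because it does not properly convert bytes to characters: it treats every byte as a separate character, so the multi-byte UTF-8 sequences for Ü and ü come out garbled. Use a Reader instead, which decodes bytes into characters using a charset you choose.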

If you use InputStreamReader, you can specify the encoding in the constructor and if you wrap it in BufferedReader, you can read lines.

Instead of

 DataInputStream dis = new DataInputStream(new BufferedInputStream(is));

you can have

 BufferedReader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));

Here “UTF-8” can be replaced by whatever encoding the page actually uses.
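For completeness, here is a minimal sketch of the whole program with that change applied. It assumes the page is served as UTF-8, which the ï»¿ at the start of your output (a UTF-8 byte order mark read one byte at a time) suggests. The class name FetchHtml and the helper readLines are my own names for illustration, not part of any library. The helper also drops the byte order mark, since Java's UTF-8 decoder passes it through as the character U+FEFF.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class FetchHtml {

    // Hypothetical helper: decodes the stream with the given charset
    // and returns its content line by line.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // A UTF-8 byte order mark decodes to U+FEFF; strip it from the first line.
                if (lines.isEmpty() && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.parkbravo.com.tr/pantolon.php");
        try (InputStream is = url.openStream()) {
            // Assumption: the page is encoded as UTF-8.
            for (String line : readLines(is, StandardCharsets.UTF_8)) {
                System.out.println(line);
            }
        }
    }
}
```

Passing StandardCharsets.UTF_8 instead of the string "UTF-8" avoids the checked UnsupportedEncodingException. If the title still looks wrong after this change, check the encoding of the console you print to, because System.out applies its own encoding when writing characters.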
