I wrote a tokenizer for HTTP messages in Java. It has a method nextToken() which is supposed to return a string containing the whole HTTP message that was received. The problem is that the message ends before the expected body size has been read.
I read the input stream up to the beginning of the body. Then I try to read n bytes from the stream, where n is the body size in bytes as stated in the Content-Length header. The problem is that inside the while loop, the line charsRead = in.read(buffer) blocks because no more input arrives on the stream, even though n bytes have not been read yet.
Example: with a body of 12,493 bytes, it blocks while 675 more bytes are still expected.
The input stream is decoded as UTF-8, so every byte is decoded into one char.
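The one-byte-per-char assumption can be checked directly. This small sketch compares `String.length()` (which counts Java chars) with the number of bytes the same text occupies in UTF-8; the class and method names are illustrative only:

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthCheck {

    // Number of bytes the string occupies when encoded as UTF-8
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "hello";
        String accented = "h\u00e9llo"; // U+00E9 is one Java char but two UTF-8 bytes
        System.out.println(ascii.length() + " chars, " + utf8Length(ascii) + " bytes");       // 5 chars, 5 bytes
        System.out.println(accented.length() + " chars, " + utf8Length(accented) + " bytes"); // 5 chars, 6 bytes
    }
}
```

For pure ASCII the two counts agree; as soon as the text contains a non-ASCII character they diverge.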
/* Somewhere else in the code:
   InputStreamReader _isr =
       new InputStreamReader(clientSocket.getInputStream(), "UTF-8");
*/
BufferedReader in = new BufferedReader(_isr);
StringBuilder tmp = new StringBuilder();
String line = "";
boolean body = false;
int bodylen = -1;
for (;;) {
    line = in.readLine();
    if (line == null)
        break;
    if (line.equals("")) { /* We've reached the body */
        body = true;
        break;
    }
    tmp.append(line + "\r\n");
    if ((bodylen == -1) && (line.contains("Content-Length:"))) {
        /* Make `bodylen` hold the length of the body */
        String[] splitted = line.split("Content-Length:");
        bodylen = Integer.parseInt(splitted[1].trim());
    }
}
if (body == true) {
    int charsRead;
    char[] buffer = new char[1024];
    while (bodylen > 0) {
        charsRead = in.read(buffer);
        if (charsRead == -1)
            break;
        bodylen -= charsRead;
        tmp.append(buffer, 0, charsRead); /* append only the chars actually read */
    }
}
Why does this happen, and how can I solve it?
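It seems you are confusing characters with bytes. Content-Length is measured in bytes, but you are counting characters. UTF-8 encodes every non-ASCII character in two to four bytes, so a body of 12,493 bytes can decode to fewer than 12,493 chars. Your loop then keeps waiting for chars that the sender never sends.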
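One way to avoid the mismatch is to work in bytes until the very end: read the raw stream up to the blank line that ends the headers, read exactly Content-Length bytes, and only then decode the body as UTF-8. The following is a sketch under stated assumptions, not a complete HTTP parser: the class and method names are hypothetical, it needs Java 11+ (for `InputStream.readNBytes`), it decodes headers as ISO-8859-1, and it handles only bodies delimited by Content-Length (no chunked transfer encoding; a missing header is treated as an empty body).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class HttpMessageReader {

    /** Reads one HTTP message (headers + body) from a raw byte stream. */
    public static String readMessage(InputStream in) throws IOException {
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        byte[] terminator = {'\r', '\n', '\r', '\n'}; // blank line ending the headers
        int matched = 0; // how many bytes of CRLFCRLF have been seen in a row
        while (matched < terminator.length) {
            int b = in.read();
            if (b == -1) throw new EOFException("stream ended inside headers");
            head.write(b);
            // advance the match, or restart it (a stray '\r' may begin a new match)
            matched = (b == terminator[matched]) ? matched + 1 : (b == '\r' ? 1 : 0);
        }
        String headers = new String(head.toByteArray(), StandardCharsets.ISO_8859_1);

        // Find Content-Length (header names are case-insensitive)
        int bodyLen = 0;
        for (String line : headers.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                bodyLen = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }

        // Content-Length counts bytes, so read exactly that many bytes
        byte[] body = in.readNBytes(bodyLen);
        if (body.length < bodyLen) throw new EOFException("stream ended inside body");
        // Decode only once the whole body is in hand, so no multi-byte char is split
        return headers + new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] bodyBytes = "h\u00e9llo".getBytes(StandardCharsets.UTF_8); // 5 chars, 6 bytes
        String head = "HTTP/1.1 200 OK\r\nContent-Length: " + bodyBytes.length + "\r\n\r\n";
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write(head.getBytes(StandardCharsets.ISO_8859_1));
        wire.write(bodyBytes);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(readMessage(in).endsWith("h\u00e9llo")); // prints true
    }
}
```

On a socket this would be called as `readMessage(new BufferedInputStream(clientSocket.getInputStream()))`. Note that you cannot keep the `BufferedReader` for the headers and then switch to the underlying `InputStream` for the body: both `InputStreamReader` and `BufferedReader` read ahead, so part of the body may already be sitting in their internal buffers.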