I am getting a stream of byte data from a telnet session via TcpClient.GetStream().ReadByte(). I am then converting this byte data to ASCII via char casting. The data comes through fine, but with a lot of extra junk like 1[01;001H[0k[01.
Anyone have any idea what this extra junk might be?
UPDATE
More detailed response stream below
1[01;001H[0K[01;017H[0;1;4mTitle of Page Here[0;1m[0;1m[02;001H[02;051H[0KWed Mar 28, 2012 03:03 pm[02;051HDate Time Here[0J[03;001H[0J[23;001H[0J[0;1;7mPrompt Here[P]– [0;1m[23;044H
When it should read
Title of Page Here
Date Time Here
Prompt Here
Parts of the ‘junk’ you’re seeing are part of the Telnet protocol. The remote is trying to negotiate some options with you, and may also send you some other commands (although that’s relatively rare in practice). See the TELNET COMMAND STRUCTURE section of the applicable RFC for the exact format and meaning of all possible commands.
In most cases, you’ll be able to simply ignore any Telnet commands (including option negotiation) received, but you do have to filter them: as you discovered, simply treating a Telnet session as a clean TCP stream won’t work.
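As a rough illustration of what "filtering" Telnet commands can look like, here is a minimal sketch in Python (the same logic ports directly to C#). The function name `strip_telnet_commands` is made up for this example, and it assumes the whole buffer is available at once; a byte-at-a-time reader would have to carry the parsing state between reads. It only drops commands and never answers option negotiation, which some hosts may not tolerate.

```python
# Telnet command bytes as defined by the Telnet protocol.
IAC = 255                                # "Interpret As Command" marker
SB, SE = 250, 240                        # subnegotiation begin / end
WILL, WONT, DO, DONT = 251, 252, 253, 254


def strip_telnet_commands(data: bytes) -> bytes:
    """Return `data` with Telnet commands removed (hypothetical helper)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b != IAC:
            out.append(b)                # ordinary data byte
            i += 1
            continue
        if i + 1 >= n:
            break                        # truncated command at end of buffer
        cmd = data[i + 1]
        if cmd == IAC:
            out.append(IAC)              # IAC IAC is an escaped literal 0xFF
            i += 2
        elif cmd in (WILL, WONT, DO, DONT):
            i += 3                       # IAC + verb + option code
        elif cmd == SB:
            # Skip everything up to and including the closing IAC SE.
            j = i + 2
            while j + 1 < n and not (data[j] == IAC and data[j + 1] == SE):
                j += 2 if data[j] == IAC else 1
            i = j + 2
        else:
            i += 2                       # other two-byte commands (NOP, GA, ...)
    return bytes(out)
```

For instance, feeding it `IAC DO 24` followed by `Hi` returns just `Hi`.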
In addition to protocol-level options, the remote may also assume you’re a terminal, and send escape sequences to ensure the data is properly displayed. Interpreting or filtering those codes will depend on the type of terminal the remote is configured to use — it’s not unlikely you’ll encounter a VT100, for example.
There’s no real need to delve too deeply into the specs, by the way: it’s entirely feasible to use something pre-built like this minimalistic Telnet library to deal with the most important details for you.
EDIT, 29 March 2012: The additional examples of the ‘junk’ you’re seeing confirm that the remote is treating you as a VT100. For example:
[0;1;4mTitle of Page Here corresponds to Set Attribute Mode: <ESC>[{attr1};...;{attrn}m and tries to make the page title appear bright (1) and underlined (4).
Simplest option here: as soon as you see an ESCape character (ASCII 27), ignore everything after that up to and including the first character that isn’t in the list [;0123456789. That will strip the most common VT100 codes: there are a few that may require special handling, but those are rare, and anyway, you have the specs now.
But even if you strip the control codes, you may still end up with an unparseable data stream, especially if the host tries to maintain a fancy screen layout. For example, it may randomly update a status field (e.g. a clock) in the middle of a stream of values that you’re interested in. If that’s the case, you’ll need a (virtual) VT100 emulator combined with a screen scraper. Those kinds of solutions mostly seem to involve expensive commercial software, although libvt100 – A purely .net/C# library for parsing a VT100/ANSI stream may work for you.
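The simple stripping rule described above can be sketched as a two-state filter. This is a minimal Python illustration, not a full VT100 parser; `strip_vt100` is a hypothetical name, and it assumes the real stream contains an ESC byte (ASCII 27) before each `[`, even though the pasted sample no longer shows them.

```python
ESC = "\x1b"                             # ASCII 27
PARAM_CHARS = set("[;0123456789")        # characters that continue a sequence


def strip_vt100(text: str) -> str:
    """Drop ESC, its parameters, and the terminating character."""
    out = []
    in_escape = False
    for ch in text:
        if in_escape:
            # The first character outside PARAM_CHARS ends the sequence
            # and is discarded along with it.
            if ch not in PARAM_CHARS:
                in_escape = False
        elif ch == ESC:
            in_escape = True
        else:
            out.append(ch)
    return "".join(out)


sample = "\x1b[01;017H\x1b[0;1;4mTitle of Page Here\x1b[0;1m"
print(strip_vt100(sample))               # prints: Title of Page Here
```

Note that this only removes the codes. Text that the host later overwrites by repositioning the cursor (such as the date field in the sample stream) still comes through, which is exactly the case where a virtual screen is needed.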