Here is a snippet that includes my string.
'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
The string was returned from an SSH command that I executed. I can’t use the string in its current state because it contains ANSI-standardized escape sequences. How can I programmatically remove the escape sequences so that the only part of the string remaining is 'examplefile.zip'?
Delete them with a regular expression:
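A minimal sketch of such a pattern, written with Python's `re.VERBOSE` flag so that each part can carry a comment. The names `ansi_escape` and `strip_ansi` are illustrative choices, not a fixed API:

```python
import re

# Matches 7-bit C1 ANSI sequences: ESC followed by either a single
# Fe byte or a complete CSI control sequence.
ansi_escape = re.compile(r'''
    \x1B            # ESC
    (?:             # then either
        [@-Z\\-_]   # a 7-bit C1 Fe byte (CSI is excluded here)
    |               # or
        \[          # the CSI opener
        [0-?]*      # parameter bytes
        [ -/]*      # intermediate bytes
        [@-~]       # final byte
    )
''', re.VERBOSE)


def strip_ansi(text):
    """Return text with all matched escape sequences removed (hypothetical helper)."""
    return ansi_escape.sub('', text)
```

Calling `strip_ansi(sometext)` on a `str` returns a new string; the input is left untouched.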
or, without the `VERBOSE` flag, in condensed form (with a demo):
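A sketch of a one-line version of the pattern, applied to the string from the question. The variable names are illustrative, and the final step assumes the wanted value is the last line of the cleaned output:

```python
import re

# Condensed pattern: ESC + Fe byte, or ESC [ + parameters + intermediates + final byte.
ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')

sometext = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'

cleaned = ansi_escape.sub('', sometext)
print(repr(cleaned))    # 'ls\r\nexamplefile.zip\r\n'

# The echoed command and the line endings are ordinary text, not escape
# sequences, so they survive. Assuming the filename is the last line:
filename = cleaned.splitlines()[-1]
print(repr(filename))   # 'examplefile.zip'
```

Note that the regular expression only strips the escape sequences; isolating `examplefile.zip` from the echoed `ls` and the `\r\n` line endings is a separate step.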
The above regular expression covers all 7-bit ANSI C1 escape sequences, but not the 8-bit C1 escape sequence openers. The latter are essentially never used in today’s UTF-8 world, where the same range of bytes has a different meaning.
If you do need to cover the 8-bit codes too (and are then, presumably, working with `bytes` values), then the regular expression becomes a bytes pattern, which can again be written either verbosely or condensed down to a single line.
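A sketch of what such a bytes pattern could look like, in both verbose and condensed form. The names `ansi_escape_8bit` and `ansi_escape_8bit_short` are illustrative assumptions:

```python
import re

# Verbose form: 7-bit and 8-bit C1 codes on bytes input.
ansi_escape_8bit = re.compile(br'''
    (?:
        \x1B[@-Z\\-_]           # 7-bit C1: ESC followed by an Fe byte (CSI excluded)
    |
        [\x80-\x9A\x9C-\x9F]    # a single 8-bit C1 byte (0x9B, CSI, excluded)
    |
        (?:\x1B\[|\x9B)         # CSI opener, 7-bit or 8-bit
        [0-?]*                  # parameter bytes
        [ -/]*                  # intermediate bytes
        [@-~]                   # final byte
    )
''', re.VERBOSE)

# The same pattern condensed to one line.
ansi_escape_8bit_short = re.compile(
    br'(?:\x1B[@-Z\\-_]|[\x80-\x9A\x9C-\x9F]|(?:\x1B\[|\x9B)[0-?]*[ -/]*[@-~])'
)

# Mixed 7-bit (ESC [) and 8-bit (0x9B) CSI openers in one bytes value.
data = b'ls\r\n\x1b[00m\x9b01;31mexamplefile.zip\x9b00m\r\n'
print(ansi_escape_8bit.sub(b'', data))   # b'ls\r\nexamplefile.zip\r\n'
```

This should only be applied to raw bytes that are known not to be UTF-8 text, since bytes in the 0x80 to 0x9F range also occur inside UTF-8 multi-byte characters.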
For more information, see:
The example you gave contains 4 CSI (Control Sequence Introducer) codes, as marked by the `\x1B[` or ESC `[` opening bytes, and each contains an SGR (Select Graphic Rendition) code, because they each end in `m`. The parameters (separated by `;` semicolons) in between those tell your terminal what graphic rendition attributes to use. So for each `\x1B[....m` sequence, the 3 codes that are used are:

- 0 (or `00` in this example): reset, disable all attributes
- 1 (or `01` in the example): bold
- 31: red (foreground)

However, there is more to ANSI than just CSI SGR codes. With CSI alone you can also control the cursor, clear lines or the whole display, or scroll (provided the terminal supports this, of course). And beyond CSI, there are codes to select alternative fonts (`SS2` and `SS3`), to send ‘private messages’ (think passwords), to communicate with the terminal (`DCS`), the OS (`OSC`), or the application itself (`APC`, a way for applications to piggy-back custom control codes on to the communication stream), and further codes to help define strings (`SOS`, Start of String, and `ST`, String Terminator) or to reset everything back to a base state (`RIS`). The above regexes cover all of these.

Note that the above regex only removes the ANSI C1 codes, however, and not any additional data that those codes may be marking up (such as the strings sent between an `OSC` opener and the terminating `ST` code). Removing those would require additional work outside the scope of this answer.
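To make the CSI SGR breakdown of the example string concrete, here is a small sketch that finds each `ESC [ ... m` sequence and decodes its parameters. The helper `describe_sgr` and the `SGR_NAMES` table are hypothetical and deliberately incomplete; they only know the three codes that occur in the example:

```python
import re

# CSI sequence: ESC [, then parameter bytes, intermediate bytes, and a final byte.
csi = re.compile(r'\x1B\[([0-?]*)[ -/]*([@-~])')

# Partial lookup table, covering only the codes used in the example.
SGR_NAMES = {0: 'reset', 1: 'bold', 31: 'red foreground'}


def describe_sgr(text):
    """Return one list of attribute names per plain SGR sequence in text."""
    described = []
    for params, final in csi.findall(text):
        # Skip non-SGR sequences and ones with private (non-numeric) parameters.
        if final != 'm' or not re.fullmatch(r'[0-9;]*', params):
            continue
        # An empty parameter means the default, which is 0 for SGR.
        codes = [int(p) if p else 0 for p in params.split(';')]
        described.append([SGR_NAMES.get(c, f'unknown ({c})') for c in codes])
    return described


sometext = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
print(describe_sgr(sometext))
# [['reset'], ['bold', 'red foreground'], ['reset'], ['bold', 'red foreground']]
```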