I want to use sed to take any arbitrary stream and append a null

Editorial Team

Asked: May 23, 2026 (2026-05-23T15:35:57+00:00)

I want to use sed to take any arbitrary stream and append a null byte to each byte.

I’ve tried a number of things, but have trouble with:

matching any byte – . seems to be a subset, i.e. any character, not any byte.
adding a null byte – I thought it should be \0, but that doesn’t work.

1 Answer

Editorial Team · Answer 1 · 2026-05-23T15:35:58+00:00

Answer for original question

I suggest using Perl or Python; here’s a (verbose) Perl solution:

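#!/usr/bin/env perl
use strict;
use warnings;
# Read the input line by line (files named on the command line, or stdin).
# By default no encoding layer is applied, so each byte is one character.
while (<>)
{
    # /s lets . match the newline too; /g repeats the match for every byte.
    # $& is the matched byte; \0 appends a NUL byte after it.
    s/./$&\0/gs;
    print;
}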

For ASCII text input, this gives you UTF-16LE output (without a BOM). Given that it is Perl, TMTOWTDI, and it can be reduced to a one-liner; see the answer by paxdiablo.
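Since Python was suggested as the alternative, here is a minimal sketch of the same byte-wise transformation in Python. The names append_nul and copy_with_nuls are made up for illustration. The sketch works on raw bytes, so it does not care whether the input is valid text in any encoding.

```python
from typing import BinaryIO


def append_nul(data: bytes) -> bytes:
    """Return data with a NUL byte inserted after every input byte."""
    out = bytearray(2 * len(data))   # zero-filled buffer, twice the input size
    out[0::2] = data                 # even offsets receive the original bytes
    return bytes(out)


def copy_with_nuls(src: BinaryIO, dst: BinaryIO, chunk_size: int = 65536) -> None:
    """Copy src to dst in chunks, appending a NUL after each byte."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:                # empty read means end of input
            break
        dst.write(append_nul(chunk))
```

To use it as a filter, call copy_with_nuls(sys.stdin.buffer, sys.stdout.buffer) after importing sys; the .buffer attributes give binary streams, so no byte is decoded or translated on the way through.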

Given this explicit loop structure, the easiest way to print the BOM is to add a print statement before the loop:

printf "%c%c", 0xFF, 0xFE;

Given a one-liner, you need a BEGIN block:

perl -pe 'BEGIN{printf "%c%c", 0xFF, 0xFE;} s/(.)/\1\0/gs;' "$@"

There are at least 4, arguably 5, superfluous spaces in that script.
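For comparison, the BOM-plus-padding idea might look like this in Python. This is only a sketch: the function name is hypothetical, and the ASCII check is my own addition, included because the result is only valid UTF-16LE when every input byte is in the range 0 to 127.

```python
import codecs


def ascii_to_utf16le_with_bom(ascii_bytes: bytes) -> bytes:
    """NUL-pad ASCII bytes and prefix the UTF-16LE byte order mark."""
    # Assumption: reject non-ASCII input, since padding it is not UTF-16.
    if any(b > 0x7F for b in ascii_bytes):
        raise ValueError("input is not pure ASCII")
    padded = bytearray(2 * len(ascii_bytes))
    padded[0::2] = ascii_bytes
    # codecs.BOM_UTF16_LE is the two bytes 0xFF 0xFE.
    return codecs.BOM_UTF16_LE + bytes(padded)
```

Decoding the result with Python's "utf-16" codec, which reads the BOM to pick the byte order, should give back the original text.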

Answer for revised then reverted question

The modified question was:

I want to use sed to take a UTF-8 stream and convert it to UTF-16. What’s the magic sauce to make this happen?

The revised question is a very different proposition from the original. Converting UTF-8 to UTF-16 is, in general, moderately complex; you have to read 1-4 bytes of input, and generate 2 or 4 bytes of output, worrying about surrogates and malformed input, etc. The original question – how to add a NUL (or zero) byte after each character in the input – is much, much, much simpler. (It remains true that if the input is ASCII – 7-bit byte values between 0 and 127 – then the ‘add a NUL afterwards’ gives you UTF-16LE. But only if the UTF-8 data is in the ASCII subset.)
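A small Python sketch makes the difference concrete. The helper names are illustrative only; the real conversion is done by Python's built-in codecs. It compares NUL padding with a real UTF-8 to UTF-16LE conversion for an ASCII string, a 2-byte character, and a character outside the Basic Multilingual Plane that needs a surrogate pair.

```python
def nul_pad(data: bytes) -> bytes:
    """Insert a NUL byte after every byte of data."""
    out = bytearray(2 * len(data))
    out[0::2] = data
    return bytes(out)


def utf8_to_utf16le(data: bytes) -> bytes:
    """Decode UTF-8 (raising on malformed input) and re-encode as UTF-16LE."""
    return data.decode("utf-8").encode("utf-16-le")


ascii_text = "sed".encode("utf-8")
accented = "\u00e9".encode("utf-8")       # e-acute: 2 bytes in UTF-8
emoji = "\U0001F600".encode("utf-8")      # U+1F600: 4 bytes in UTF-8

# For ASCII the two approaches agree.
print(nul_pad(ascii_text) == utf8_to_utf16le(ascii_text))        # True
# For anything else they do not.
print(nul_pad(accented).hex(), utf8_to_utf16le(accented).hex())  # c300a900 e900
# U+1F600 becomes the surrogate pair D83D DE00, stored little-endian.
print(utf8_to_utf16le(emoji).hex())                              # 3dd800de
```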

However, for accurate translation, the tool of choice should be iconv:

Usage: iconv [OPTION...] [-f ENCODING] [-t ENCODING] [INPUTFILE...]
or:    iconv -l

Converts text from one encoding to another encoding.

Options controlling the input and output format:
  -f ENCODING, --from-code=ENCODING
                              the encoding of the input
  -t ENCODING, --to-code=ENCODING
                              the encoding of the output

Options controlling conversion problems:
  -c                          discard unconvertible characters
  --unicode-subst=FORMATSTRING
                              substitution for unconvertible Unicode characters
  --byte-subst=FORMATSTRING   substitution for unconvertible bytes
  --widechar-subst=FORMATSTRING
                              substitution for unconvertible wide characters

Options controlling error output:
  -s, --silent                suppress error messages about conversion problems

Informative output:
  -l, --list                  list the supported encodings
  --help                      display this help and exit
  --version                   output version information and exit

Hence, to convert from UTF-8 to UTF-16LE:

iconv -f UTF-8 -t UTF-16LE input > output

Interestingly, I don’t see an option to add a BOM to the output, at least not with iconv version 1.11 from 2007 on RHEL 5 (nor the same version on Mac OS X, dated 2006; don’t ask, I don’t know!).
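If iconv is not available, or a BOM is wanted, a streaming conversion can be sketched in Python. The function name, the bom flag and the chunk size are my own choices, not part of any tool. An incremental decoder is used so that a multi-byte UTF-8 sequence split across two reads is still decoded correctly.

```python
import codecs
from typing import BinaryIO


def transcode_utf8_to_utf16le(src: BinaryIO, dst: BinaryIO,
                              bom: bool = False,
                              chunk_size: int = 65536) -> None:
    """Stream UTF-8 from src to dst as UTF-16LE, optionally with a BOM."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    if bom:
        dst.write(codecs.BOM_UTF16_LE)        # bytes 0xFF 0xFE
    while True:
        chunk = src.read(chunk_size)
        # An empty chunk means end of input; final=True then makes a
        # truncated trailing sequence raise UnicodeDecodeError.
        text = decoder.decode(chunk, final=not chunk)
        dst.write(text.encode("utf-16-le"))
        if not chunk:
            break
```

Passing sys.stdin.buffer and sys.stdout.buffer as src and dst turns this into a filter. Malformed input raises an exception rather than being silently dropped, which is roughly what iconv does without -c.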
