Common wisdom is that rep movsb is much slower than rep movsd (or on

Question

Asked: June 11, 2026

Common wisdom is that rep movsb is much slower than rep movsd (or, on 64-bit, rep movsq) when performing identical operations. However, I’ve been testing on a few modern machines, and the run times are coming out identical (up to measurement noise) across a huge range of buffer sizes (10 bytes to 2 MB). So far I have tested on just two machines (a 32-bit Intel Atom D510 and a 64-bit AMD FX 8120).
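For reference, a comparison of this kind could be timed roughly as follows. This is a minimal sketch, not the harness actually used for the numbers above: `copy_fn`, `time_copy`, and `memcpy_wrapper` are hypothetical names, `clock()` is a coarse timer, and `memcpy` stands in for whichever copy routine is being measured.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by every copy routine under test. */
typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Average CPU seconds per call of fn on an n-byte buffer.
   Returns a negative value if reps is 0 or allocation fails.
   A serious benchmark would also pin the thread, use a cycle counter,
   and vary buffer alignment; none of that is done here. */
static double time_copy(copy_fn fn, size_t n, unsigned reps)
{
    if (reps == 0)
        return -1.0;

    unsigned char *src = malloc(n ? n : 1);
    unsigned char *dst = malloc(n ? n : 1);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }

    memset(src, 0xA5, n);
    fn(dst, src, n); /* warm-up pass so both buffers are touched once */

    clock_t t0 = clock();
    for (unsigned i = 0; i < reps; i++)
        fn(dst, src, n);
    clock_t t1 = clock();

    /* Read the destination so the copies are less likely to be optimized out. */
    volatile unsigned char sink = n ? dst[n - 1] : 0;
    (void)sink;

    free(src);
    free(dst);
    return (double)(t1 - t0) / CLOCKS_PER_SEC / reps;
}

/* Stand-in routine; swap in a rep movsb or rep movsd based copy here. */
static void memcpy_wrapper(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Calling `time_copy(memcpy_wrapper, n, reps)` for `n` from 10 up to 2 MB, once per copy routine, gives the kind of per-size comparison described above.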

Are there any modern x86 (32- or 64-bit) machines where rep movsb is slower than rep movsd (or rep movsq)?
If not, what was the last machine where the difference was significant, and how significant was it?

I’m asking this question from a standpoint of wanting to avoid cargo-culting a bunch of tests to break memory up into unaligned head/tail and aligned middle for the sake of using rep movsd or rep movsq if there’s no actual benefit to doing this…
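To make the two alternatives concrete, here is a minimal sketch of both copies, assuming GCC or Clang inline assembly on x86. `copy_movsb` and `copy_split` are hypothetical names, and on any other compiler or target both simply fall back to `memcpy` so the code still builds. Note that the split version only aligns the destination; the source may still be unaligned in the middle section.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && defined(__x86_64__)
#define WORD_SIZE 8
#define REP_MOVS_WORD "rep movsq"
#elif defined(__GNUC__) && defined(__i386__)
#define WORD_SIZE 4
#define REP_MOVS_WORD "rep movsl" /* AT&T spelling of rep movsd */
#endif

/* The simple option: one rep movsb for the whole buffer. */
static void copy_movsb(void *dst, const void *src, size_t n)
{
#ifdef REP_MOVS_WORD
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n); /* non-x86 fallback */
#endif
}

/* The more elaborate option: byte-wise head up to a word-aligned
   destination, word-wise middle, byte-wise tail. */
static void copy_split(void *dst, const void *src, size_t n)
{
#ifdef REP_MOVS_WORD
    /* Bytes needed to bring dst up to the next word boundary. */
    size_t head = (size_t)(-(uintptr_t)dst) & (WORD_SIZE - 1);
    if (head > n)
        head = n;

    size_t count = head;
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(count) : : "memory");

    n -= head;
    count = n / WORD_SIZE; /* aligned middle, one word per iteration */
    __asm__ volatile(REP_MOVS_WORD
                     : "+D"(dst), "+S"(src), "+c"(count) : : "memory");

    count = n % WORD_SIZE; /* leftover tail bytes */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(count) : : "memory");
#else
    memcpy(dst, src, n); /* non-x86 fallback */
#endif
}
```

Both functions produce the same result for any size and alignment; the question is only whether the extra bookkeeping in `copy_split` buys any speed.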

1 Answer

Editorial Team · Answer 1 · 2026-06-11T03:45:21+00:00

Lots of benchmarks here: instlatx64.atw.hu

For example (Intel Core 2 Duo E6700):

REP MOVSB   BW in L1D:13.04 B/c  34829MiB/s
REP MOVSW   BW in L1D:13.29 B/c  35493MiB/s
REP MOVSD   BW in L1D:13.40 B/c  35783MiB/s

This shows that there is a difference, but it’s tiny: REP MOVSD is only about 3% faster than REP MOVSB.

This one for Sandy Bridge is a little weird, with REP MOVSW noticeably slower than both REP MOVSB and REP MOVSD:

REP MOVSB   BW in L1D:25.50 B/c  86986MiB/s
REP MOVSW   BW in L1D:18.09 B/c  61721MiB/s
REP MOVSD   BW in L1D:27.47 B/c  93693MiB/s

There seems to be a big difference on some Atoms, where REP MOVSD is roughly 7x faster than REP MOVSB (it seems to have disappeared with the D5xx, so you just missed it):

REP MOVSB   BW in L1D: 0.53 B/c    990MiB/s
REP MOVSW   BW in L1D: 1.93 B/c   3598MiB/s
REP MOVSD   BW in L1D: 3.74 B/c   6960MiB/s

I haven’t found such a big difference on anything else that can be considered new.
