Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8046777
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 5, 20262026-06-05T05:54:04+00:00 2026-06-05T05:54:04+00:00

My task is to calculate RAM Read/Write speed. I using asm inserts to avoid

  • 0

My task is to calculate RAM Read/Write speed.
I using asm inserts to avoid compiler optimizations. To measure time I use TSC and CPU frequency. To move data I use asm instruction MOVNTDQ which doesn’t use cache hierarchy.

Problem is in results. Data rate (by datasheet) is 800 Mbps, and I got by my test > 2000 Mbps write speed.

void memory_notCache_write_128(void* src, long blocks_amount) 
{
    _asm    
    {
        mov ecx, blocks_amount
        mov     esi, src
    a20:
        movntdq [esi], xmm0
        movntdq [esi + 16], xmm1
        movntdq [esi + 32], xmm2
        movntdq [esi + 48], xmm3
        movntdq [esi + 64], xmm4
        movntdq [esi + 80], xmm5
        movntdq [esi + 96], xmm6
        movntdq [esi + 112], xmm7
        add esi,  128
        loop    a20;
    }
}

int main()
{ 
    unsigned __int64 tick1, tick2;
    const long nBytes = 32*KByte;   

    char* source = (char*)_mm_malloc(nBytes*sizeof(char),16);

    tick1 =  getTicks();
    memory_notCache_write_128(source, current_times.t128);
    tick2 =  getTicks();

    double time = (double)(tick2-tick1)/(ProcSpeedCalc());
    cout << "Time WRITE_128[seconds]:" << time << endl;
    cout << (double) nBytes / time / MByte << endl;

    return 0;
}

Datasheet of RAM, that I used – http://www.alldatasheet.com/datasheet-pdf/pdf/308537/ELPIDA/EBE11UE6ACUA-8G-E.html

Source code (was written for Win patform): https://bitbucket.org/closed_eyes/ram_speed_for_win/downloads/memory_test.cpp

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-05T05:54:05+00:00Added an answer on June 5, 2026 at 5:54 am

    You shouldn’t use non-temporal operations for this sort of code. The real way to build a memory performance tester is to use the access pattern to make sure that you never hit in the cache. Generally, this is done by looping over a very large chunk of memory that is bigger than the last level of cache in your system where your stride is the same as the cache line size. If you do this, you’ll ensure that every access will be a cache miss in all levels. Don’t forget though that when you read just one byte from memory, the processor will fetch a whole cache line, so if you do a 64-bit load, on a machine with a 64-byte cache line (very common), you should be counting 64-bytes as being read from memory.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

My task is it to write a script using opencv which will later run
Pretty simple task: using jquery I need to calculate the width of an input
Consider the task: Read a polygon from a KML file using library A Intersect
I'd like to calculate a path in a MsBuild task, to be used by
My task is to build a postfix calculator using a linked list to represent
I have a task, where I have to calculate optimal policy (Reinforcement Learning -
My task is to create a function which should calculate the arcsin for my
I have the interesting task of doing some graphs using VB.NET. So far, everything
I would like to be able to schedule a task at a specific time
I am currently doing a task in a book which asks me to calculate

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.