I’m writing an algorithm in C++ that scans a file with a sliding window,

Question

0

Asked: June 13, 20262026-06-13T09:56:59+00:00 2026-06-13T09:56:59+00:00

I’m writing an algorithm in C++ that scans a file with a sliding window,

0

I’m writing an algorithm in C++ that scans a file with a “sliding window,” meaning it will scan bytes 0 to n, do something, then scan bytes 1 to n+1, do something, and so forth, until the end is reached.

My first algorithm was to read the first n bytes, do something, dump one byte, read a new byte, and repeat. This was very slow because to “ReadFile” from HDD one byte at a time was inefficient. (About 100kB/s)

My second algorithm involves reading a chunk of the file (perhaps n*1000 bytes, meaning the whole file if it’s not too large) into a buffer and reading individual bytes off the buffer. Now I get about 10MB/s (decent SSD + Core i5, 1.6GHz laptop).

My question: Do you have suggestions for even faster models?

edit: My big buffer (relative to the window size) is implemented as follows:
– for a rolling window of 5kB, the buffer is initialized to 5MB
– read the first 5MB of the file into the buffer
– the window pointer starts at the beginning of the buffer
– upon shifting, the window pointer is incremented
– when the window pointer nears the end of the 5MB buffer, (say at 4.99MB), copy the remaining 0.01MB to the beginning of the buffer, reset the window pointer to the beginning, and read an additional 4.99MB into the buffer.
– repeat

edit 2 – the actual implementation (removed)

Thank you all for many insightful response. It was hard to select a “best answer”; they were all excellent and helped with my coding.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-13T09:57:00+00:00

I use a sliding window in one of my apps (actually, several layers of sliding windows working on top of each other, but that is outside the scope of this discussion). The window uses a memory-mapped file view via CreateFileMapping() and MapViewOfFile(), then I have an an abstraction layer on top of that. I ask the abstraction layer for any range of bytes I need, and it ensures that the file mapping and file view are adjusted accordingly so those bytes are in memory. Every time a new range of bytes is requested, the file view is adjusted only if needed.

The file view is positioned and sized on page boundaries that are even multiples of the system granularity as reported by GetSystemInfo(). Just because a scan reaches the end of a given byte range does not necessarily mean it has reached the end of a page boundary yet, so the next scan may not need to alter the file view at all, the next bytes are already in memory. If the first requested byte of a range exceeds the right-hand boundary of a mapped page, the left edge of the file view is adjusted to the left-hand boundary of the requested page and any pages to the left are unmapped. If the last requested byte in the range exceeds the right-hand boundary of the right-most mapped page, a new page is mapped and added to the file view.

It sounds more complex than it really is to implement once you get into the coding of it:

Creating a View Within a File

It sounds like you are scanning bytes in fixed-sized blocks, so this approach is very fast and very efficient for that. Based on this technique, I can sequentially scan multi-GIGBYTE files from start to end fairly quickly, usually a minute or less on my slowest machine. If your files are smaller then the system granularity, or even just a few megabytes, you will hardly notice any time elapsed at all (unless your scans themselves are slow).

Update: here is a simplified variation of what I use:

class FileView
{
private:
    DWORD m_AllocGran;
    DWORD m_PageSize;

    HANDLE m_File;
    unsigned __int64 m_FileSize;

    HANDLE m_Map;
    unsigned __int64 m_MapSize;

    LPBYTE m_View;
    unsigned __int64 m_ViewOffset;
    DWORD m_ViewSize;

    void CloseMap()
    {
        CloseView();

        if (m_Map != NULL)
        {
            CloseHandle(m_Map);
            m_Map = NULL;
        }
        m_MapSize = 0;
    }

    void CloseView()
    {
        if (m_View != NULL)
        {
            UnmapViewOfFile(m_View);
            m_View = NULL;
        }
        m_ViewOffset = 0;
        m_ViewSize = 0;
    }

    bool EnsureMap(unsigned __int64 Size)
    {
        // do not exceed EOF or else the file on disk will grow!
        Size = min(Size, m_FileSize);

        if ((m_Map == NULL) ||
            (m_MapSize != Size))
        {
            // a new map is needed...

            CloseMap();

            ULARGE_INTEGER ul;
            ul.QuadPart = Size;

            m_Map = CreateFileMapping(m_File, NULL, PAGE_READONLY, ul.HighPart, ul.LowPart, NULL);
            if (m_Map == NULL)
                return false;

            m_MapSize = Size;
        }

        return true;
    }

    bool EnsureView(unsigned __int64 Offset, DWORD Size)
    {
        if ((m_View == NULL) ||
            (Offset < m_ViewOffset) ||
            ((Offset + Size) > (m_ViewOffset + m_ViewSize)))
        {
            // the requested range is not already in view...

            // round down the offset to the nearest allocation boundary
            unsigned __int64 ulNewOffset = ((Offset / m_AllocGran) * m_AllocGran);

            // round up the size to the next page boundary
            DWORD dwNewSize = ((((Offset - ulNewOffset) + Size) + (m_PageSize-1)) & ~(m_PageSize-1));

            // if the new view will exceed EOF, truncate it
            unsigned __int64 ulOffsetInFile = (ulNewOffset + dwNewSize);
            if (ulOffsetInFile > m_FileSize)
                dwNewViewSize -= (ulOffsetInFile - m_FileSize);

            if ((m_View == NULL) ||
                (m_ViewOffset != ulNewOffset) ||
                (m_ViewSize != ulNewSize))
            {
                // a new view is needed...

                CloseView();

                // make sure the memory map is large enough to contain the entire view
                if (!EnsureMap(ulNewOffset + dwNewSize))
                    return false;

                ULARGE_INTEGER ul;
                ul.QuadPart = ulNewOffset;

                m_View = (LPBYTE) MapViewOfFile(m_Map, FILE_MAP_READ, ul.HighPart, ul.LowPart, dwNewSize);
                if (m_View == NULL)
                    return false;

                m_ViewOffset = ulNewOffset;
                m_ViewSize = dwNewSize;
            }
        }

        return true;
    }

public:
    FileView() :
        m_AllocGran(0),
        m_PageSize(0),
        m_File(INVALID_HANDLE_VALUE),
        m_FileSize(0),
        m_Map(NULL),
        m_MapSize(0),
        m_View(NULL),
        m_ViewOffset(0),
        m_ViewSize(0)
    {
        // map views need to be positioned on even multiples
        // of the system allocation granularity.  let's size
        // them on even multiples of the system page size...

        SYSTEM_INFO si = {0};
        if (GetSystemInfo(&si))
        {
            m_AllocGran = si.dwAllocationGranularity;
            m_PageSize = si.dwPageSize;
        }
    }

    ~FileView()
    {
        CloseFile();
    }

    bool OpenFile(LPTSTR FileName)
    {
        CloseFile();

        if ((m_AllocGran == 0) || (m_PageSize == 0))
            return false;

        HANDLE hFile = CreateFile(FileName, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
        if (hFile == INVALID_HANDLE_VALUE)
            return false;

        ULARGE_INTEGER ul;
        ul.LowPart = GetFileSize(hFile, &ul.HighPart);
        if ((ul.LowPart == INVALID_FILE_SIZE) && (GetLastError() != 0))
        {
            CloseHandle(hFile);
            return false;
        }

        m_File = hFile;
        m_FileSize = ul.QuadPart;

        return true;
    }

    void CloseFile()
    {
        CloseMap();

        if (m_File != INVALID_HANDLE_VALUE)
        {
            CloseHandle(m_File);
            m_File = INVALID_HANDLE_VALUE;
        }
        m_FileSize = 0;
    }

    bool AccessBytes(unsigned __int64 Offset, DWORD Size, LPBYTE *Bytes, DWORD *Available)
    {
        if (Bytes) *Bytes = NULL;
        if (Available) *Available = 0;

        if ((m_FileSize != 0) && (offset < m_FileSize))
        {
            // make sure the requested range is in view
            if (!EnsureView(Offset, Size))
                return false;

            // near EOF, the available bytes may be less than requested

            DWORD dwOffsetInView = (Offset - m_ViewOffset);

            if (Bytes) *Bytes = &m_View[dwOffsetInView];
            if (Available) *Available = min(m_ViewSize - dwOffsetInView, Size);
        }

        return true;
    }
};

.

FileView fv;
if (fv.OpenFile(TEXT("C:\\path\\file.ext")))
{
    LPBYTE data;
    DWORD len;

    unsigned __int64 offset = 0, filesize = fv.FileSize();

    while (offset < filesize)
    {
        if (!fv.AccessBytes(offset, some size here, &data, &len))
            break; // error

        if (len == 0)
            break; // unexpected EOF

        // use data up to len bytes as needed...

        offset += len;
    }

    fv.CloseFile();
}

This code is designed to allow random jumping anywhere in the file at any data size. Since you are reading bytes sequentially, some of the logic can be simplified as needed.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m writing an algorithm in C++ that scans a file with a sliding window,

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply