The Archive Base

Editorial Team
Asked: June 12, 2026

I’m currently investigating how to use the SSE 4.2 String and Text Processing Instructions

I’m currently investigating how to use the SSE 4.2 String and Text Processing Instructions (STTNI, described in http://software.intel.com/en-us/articles/xml-parsing-accelerator-with-intel-streaming-simd-extensions-4-intel-sse4/) for efficient CSV parsing.

My question is whether this has been tried before for parsing CSV files or in-memory CSV data, and whether any examples are available online. So far I have not been able to find good resources (other than the Intel article mentioned above) on using SSE 4.2 for text parsing.

The strategy I’m currently trying is to create 4 bitmasks for each 16-byte block:

  • one matching each character against the delimiter
  • one matching each character against the newline character
  • one matching each character against the quotation character (strings); and
  • one matching each character against the escape character (escaping delimiter, newlines, quotes)

With the information from these bitmasks, it is easy to determine the offset and length of each value in the CSV.
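A minimal C sketch of the four-mask strategy, assuming x86-64 with GCC or Clang (the `target("sse4.2")` attribute lets the intrinsic compile without `-msse4.2`; callers should still check `__builtin_cpu_supports("sse4.2")` at run time) and an input buffer padded so that 16 bytes can always be read. Each mask comes from `_mm_cmpistrm` (PCMPISTRM) in Equal Any mode with a one-character needle. The names `csv_masks`, `match_mask` and `csv_block_masks` are hypothetical.

```c
#include <nmmintrin.h> /* SSE 4.2 string intrinsics such as _mm_cmpistrm */
#include <stdint.h>

/* Hypothetical result type: bit i of each mask refers to byte i of the block. */
typedef struct {
    uint16_t delim, newline, quote, escape;
} csv_masks;

/* Unsigned bytes, Equal Any, result returned as a 16-bit bit mask. */
#define MASK_MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_BIT_MASK)

/* Bit i is set when block[i] == c. Bytes after the first NUL in the block
   are treated as invalid by PCMPISTRM and never match. */
__attribute__((target("sse4.2")))
static uint16_t match_mask(__m128i block, char c)
{
    __m128i needle = _mm_cvtsi32_si128((unsigned char)c); /* "c" then NULs */
    __m128i m = _mm_cmpistrm(needle, block, MASK_MODE);
    return (uint16_t)_mm_cvtsi128_si32(m); /* mask is in the low 16 bits */
}

/* Builds the four masks for the 16 bytes starting at p.
   Assumption: at least 16 readable bytes at p (pad the buffer). */
__attribute__((target("sse4.2")))
csv_masks csv_block_masks(const char *p, char delim, char quote, char escape)
{
    __m128i block = _mm_loadu_si128((const __m128i *)p);
    csv_masks m;
    m.delim = match_mask(block, delim);
    m.newline = match_mask(block, '\n');
    m.quote = match_mask(block, quote);
    m.escape = match_mask(block, escape);
    return m;
}
```

Because bit i corresponds to byte i, walking a mask with `__builtin_ctz` and clearing the lowest set bit each time yields the field offsets in order; quote and escape masks then decide which delimiters are real.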

1 Answer

  1. Editorial Team
    Added an answer on June 12, 2026 at 1:26 pm

    Why are you using the bitmasks? Wouldn’t it be better to check for all of those events with a single STTNI instruction and then use the returned index to handle whichever event occurred (if any)?

    (Edit) Let me try to be more helpful…

    (I’ll assume you are using null-terminated strings of 8-bit chars. Let me know if that is not the case.)

    I think you’d do better to put the delimiter, the newline, the quotation and the escape characters into a single register (as a null-terminated string) and use PCMPISTRI once, instead of using PCMPISTRM once per character. For the control word you’ll want: unsigned bytes, Equal Any, positive polarity, least significant index. (Pretty sure I got that right.)

    You can then use JA to check, with a single branch, whether any of the 4 special characters was hit or the end of the string was reached. If either happened, exit the loop to deal with it. If not, add ECX (which is 16 when nothing matched) to the xmm2/m128 pointer and jump back to the PCMPISTRI.

    The first instruction of the code that deals with a “hit” should add ECX to the xmm2/m128 pointer; then test each possibility in turn. I suggest ordering the tests from most likely to least likely.

    So, the asm should end up looking something like:

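        ; XMM1 holds the special characters as a null-terminated string
        XOR       ECX, ECX
    TAG1:
        ADD       EAX, ECX                ; advance (0 on entry, 16 after a miss)
        PCMPISTRI XMM1, [EAX], 0x00       ; ECX = index of first special char, 16 if none
        JA        TAG1                    ; CF=0 and ZF=0: no hit, no NUL, keep scanning
        JNC       "handle end of string"  ; CF=0, ZF=1: NUL is in the block at EAX, no hit
        ADD       EAX, ECX                ; EAX now points at the special character
        CMP       BYTE PTR [EAX], "delimiter"
        JE        "handle delimiter"
        CMP       BYTE PTR [EAX], "newline"
        JE        "handle newline"
        CMP       BYTE PTR [EAX], "quotation"
        JE        "handle quotation"
        JMP       "handle escape"         ; the only remaining possibility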
    

    I’ll let you decide what the best order for testing delimiters is. 🙂

    When I was developing the instructions, I could get the compiler to generate asm like the code above from intrinsics. It has been a while since I worked with these instructions, though, so I’m not sure how well the average compiler does. (It would be interesting to hear what results you get.)
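    A minimal intrinsics version of the scan loop, as a sketch rather than the answerer's code. It assumes x86-64 with GCC or Clang, a buffer padded so 16 bytes can always be read, and the hypothetical function name `find_special`. `_mm_cmpistra` corresponds to the JA test (CF=0 and ZF=0), `_mm_cmpistrc` to CF, and `_mm_cmpistri` to the index in ECX; the static assertion confirms that the control word described above is 0x00.

```c
#include <nmmintrin.h> /* SSE 4.2: _mm_cmpistri, _mm_cmpistra, _mm_cmpistrc */

/* Unsigned bytes, Equal Any, positive polarity, least significant index. */
#define SCAN_MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | \
                   _SIDD_POSITIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT)
_Static_assert(SCAN_MODE == 0x00, "control word should be 0x00");

/* Returns a pointer to the first delimiter, newline, quote or escape at or
   after p, or to the terminating NUL if there is none.
   Assumption: the buffer is padded so 16 bytes can be read from any p. */
__attribute__((target("sse4.2")))
const char *find_special(const char *p, char delim, char quote, char escape)
{
    /* Needle as a null-terminated string: delim, '\n', quote, escape. */
    const __m128i needle = _mm_setr_epi8(delim, '\n', quote, escape,
                                         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
    for (;; p += 16) {
        __m128i block = _mm_loadu_si128((const __m128i *)p);
        if (_mm_cmpistra(needle, block, SCAN_MODE))
            continue;                 /* JA: no hit and no NUL in this block */
        if (_mm_cmpistrc(needle, block, SCAN_MODE))
            return p + _mm_cmpistri(needle, block, SCAN_MODE); /* a hit */
        while (*p != '\0')            /* ZF=1 without a hit: locate the NUL */
            p++;
        return p;
    }
}
```

    Compilers often merge the three intrinsic calls into one PCMPISTRI followed by flag tests, but whether that happens depends on the compiler and optimization level, so it is worth checking the generated asm.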


    By the way, the mask versions of the instructions do have all kinds of uses; they just aren’t the best choice for finding the first or last occurrence of something, since the “I” versions calculate the offset for you. The mask versions are good for counting, or for processing only certain items, among other more exotic things. Right now I’m using them to count the A’s, C’s, G’s, and T’s in DNA strings.
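    A sketch of counting with the mask form: PCMPISTRM in Equal Any mode with a bit-mask result, followed by a population count of each 16-bit mask. This is an illustration under assumptions (x86-64, GCC or Clang for `__builtin_popcount` and the target attribute, a null-terminated buffer padded for 16-byte reads); `count_char` is a hypothetical name, not the answerer's code.

```c
#include <nmmintrin.h> /* SSE 4.2: _mm_cmpistrm, _mm_cmpistrz */
#include <stddef.h>

/* Unsigned bytes, Equal Any, bit-mask result. */
#define COUNT_MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_BIT_MASK)

/* Counts occurrences of c in the null-terminated string s.
   Assumption: the buffer is padded so 16 bytes can be read per block. */
__attribute__((target("sse4.2")))
size_t count_char(const char *s, char c)
{
    const __m128i needle = _mm_cvtsi32_si128((unsigned char)c); /* "c" */
    size_t n = 0;
    for (;; s += 16) {
        __m128i block = _mm_loadu_si128((const __m128i *)s);
        /* Bytes after the NUL never match, so the tail is not counted. */
        unsigned mask = (unsigned)_mm_cvtsi128_si32(
                            _mm_cmpistrm(needle, block, COUNT_MODE)) & 0xFFFFu;
        n += (size_t)__builtin_popcount(mask);
        if (_mm_cmpistrz(needle, block, COUNT_MODE))
            break;                    /* this block contained the NUL */
    }
    return n;
}
```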


Related Questions

For some reason, after submitting a string like this Jack’s Spindle from a text
I have a string like this: La Torre Eiffel paragonata all’Everest What PHP function
I want use html5's new tag to play a wav file (currently only supported
I'm trying to use string.replace('’','') to replace the dreaded weird single-quote character: ’ (aka
I have a .ini file as follows: [playlist] numberofentries=2 File1=http://87.230.82.17:80 Title1=(#1 - 365/1400) Example
link Im having trouble converting the html entites into html characters, (&# 8217;) i
I want to count how many characters a certain string has in PHP, but
I am trying to understand how to use SyndicationItem to display feed which is
I've got a string that has curly quotes in it. I'd like to replace
Specifically, suppose I start with the string string =hello \'i am \' me And

© 2021 The Archive Base. All Rights Reserved