The Archive Base

Editorial Team
Asked: June 6, 2026

I am processing a flat file with data in line-by-line format, like this:

... blah blah blah | sku: 01234567 | price: 150 | ... blah blah blah

I want to extract the sku field, which is an 8-digit number. However, I am not sure whether I should use split or a regex; I am not very good at using regexes in Python.

1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 5:41 am

    Assuming your sku values are always 8 digits long and are always preceded by ‘sku’, possibly followed by a ‘:’ (with or without spaces in between), I would use the regex r'sku[\s:]*(\d{8})':

    >>> import re
    >>> string = '... | sku: 01234567 | price: 150 | ... '
    >>> re.findall(r'sku[\s:]*(\d{8})', string)[0]
    '01234567'
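Since the file is processed line by line, the same pattern can be applied to each line in turn. The following is a minimal sketch, not a definitive implementation: the helper name `extract_sku` and the sample lines are made up for illustration. It uses `re.search` rather than `findall(...)[0]` so that a line without a sku yields `None` instead of raising an `IndexError`.

```python
import re

# Compile once; the pattern is the one from the answer above.
SKU_RE = re.compile(r'sku[\s:]*(\d{8})')

def extract_sku(line):
    """Return the 8-digit sku found in line, or None if there is none."""
    match = SKU_RE.search(line)
    return match.group(1) if match else None

# Hypothetical sample lines standing in for the lines of the flat file.
lines = [
    '... blah | sku: 01234567 | price: 150 | ... blah',
    '... blah | price: 99 | ... blah',  # no sku field on this line
]
skus = [extract_sku(line) for line in lines]
print(skus)  # ['01234567', None]
```

With a real file, the list of sample lines would be replaced by iterating over the open file object.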
    

    If the length of your sku values may vary, use r'sku[\s:]*(\d*)' instead (or \d+ in place of \d* if you want to require at least one digit):

    >>> import re
    >>> string = '... | sku: 01234 | price: 150 | sku: 99872453 | blah blah ... '
    >>> re.findall(r'sku[\s:]*(\d*)', string)[0]
    '01234'
    >>> re.findall(r'sku[\s:]*(\d*)', string)[1]
    '99872453'
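For comparison, the split approach mentioned in the question could look like the sketch below. It assumes that fields are always separated by ‘|’ and that each field of interest has the form `name: value`; the function name `extract_sku_by_split` is hypothetical.

```python
def extract_sku_by_split(line):
    """Return the value of the first 'sku' field, or None if absent."""
    for field in line.split('|'):
        # partition returns (before, separator, after); sep is '' if no ':'.
        key, sep, value = field.partition(':')
        if sep and key.strip() == 'sku':
            return value.strip()
    return None

line = '... blah | sku: 01234567 | price: 150 | ... blah'
print(extract_sku_by_split(line))  # 01234567
```

This is easy to read when the layout is strictly regular, but it does not check that the value is made of digits, which the regex does.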
    

    edit

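    If your ‘sku’ is followed by some other non-digit characters, like sku-sp or sku_anything, you could try the pattern below. Beware of labels that themselves contain digits, such as sku1, sku2 or sku-18: the pattern would capture the digits of the label (‘1’, ‘2’ or ‘18’) rather than the value.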

    >>> re.findall(r'sku\D*(\d*)', string)[0]
    

    For ASCII input, this is equivalent to:

    >>> re.findall(r'sku[^0-9]*([0-9]*)', string)[0]
    

    It’s very general. It will match anything that begins with sku, followed by any number of non-digit characters (\D*, or [^0-9]*), and then by some digits (\d*, or [0-9]*). It returns the latter (a string of digits of undetermined length).
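How the general pattern behaves when the label itself contains digits can be checked directly. In the sketch below, the `anchored` pattern is only one possible workaround and rests on an assumption about the data: that a ‘:’ always separates the label from the value.

```python
import re

general = r'sku\D*(\d*)'
print(re.findall(general, '| sku-sp: 01234567 |'))   # ['01234567']
print(re.findall(general, '| sku-18: 01234567 |'))   # ['18'], the label's digits

# Assumed workaround: skip everything up to the ':' before capturing digits.
anchored = r'sku[^:|]*:\s*(\d+)'
print(re.findall(anchored, '| sku-18: 01234567 |'))  # ['01234567']
print(re.findall(anchored, '| sku1: 01234567 |'))    # ['01234567']
```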

    Now, here is what the pieces I used to build these expressions mean:

    quantifiers

    • *: when following a single character or a class of characters, this symbol means the expression will match any number of repetitions of the character or class it follows (* means "0 or more", + means "at least one", ? means "0 or 1").
    • {}: used in the same way as *, + and ?, i.e. they follow a character or a class of characters and are also quantifiers. c{4} matches exactly 4 consecutive ‘c’s; c{1,6} matches between 1 and 6 consecutive ‘c’s.
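The quantifiers can be tried out with a small sketch. It uses `re.fullmatch`, which succeeds only when the whole string matches the pattern; the helper name `matches` is made up for illustration.

```python
import re

def matches(pattern, text):
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches(r'c*', ''))             # True: * accepts zero repetitions
print(matches(r'c+', ''))             # False: + needs at least one 'c'
print(matches(r'c?', 'cc'))           # False: ? accepts at most one 'c'
print(matches(r'c{4}', 'cccc'))       # True: exactly four
print(matches(r'c{1,6}', 'ccccccc'))  # False: seven is more than six
```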

    classes

    • []: defines a class of characters. [abc] means any of the characters ‘a’, ‘b’, or ‘c’. [a-z] means any of the lower case letters, [A-Z] any of the upper case letters, [a-zA-Z] any of the lower and upper case letters, [0-9] any of the digits. If you want to match decimals with dots or commas, with plus, minus and ‘e’ (for exponentials, for example), just say [0-9,.+e-] (the hyphen goes last so that it is taken literally rather than as a range).
    • ^: as the first character inside a class defined with [], it means ‘inverted class’, everything but the class. So [^0-9] means anything but digits, [^a-z] anything but lower case letters, and so on.
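A short sketch of character classes in action follows; the sample strings are made up. It also shows one pitfall: a hyphen between two characters inside a class denotes a range, so a literal hyphen is best placed last (or escaped).

```python
import re

print(re.findall(r'[abc]', 'a1b2c3d4'))           # ['a', 'b', 'c']
print(re.findall(r'[^0-9]+', 'ab12cd34'))         # ['ab', 'cd']
print(re.findall(r'[a-zA-Z]+', 'sku: 01234567'))  # ['sku']

# '+-e' is read as the range from '+' to 'e', which includes 'A' and ':'.
print(re.findall(r'[+-e]', 'A:z'))                # ['A', ':']
# With the hyphen last, only '+', 'e' and '-' themselves are in the class.
print(re.findall(r'[+e-]', 'A:z'))                # []

# Class for simple decimal/exponential notation, hyphen placed last.
print(re.findall(r'[0-9,.+e-]+', 'x = -1.5e+3;')) # ['-1.5e+3']
```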

    predefined classes

    These are classes that are predefined in Python, to make the regex syntax friendlier:

    • \s: matches any whitespace character (space, tab, etc.)
    • \d: matches any digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9). For ASCII text this is equivalent to [0-9], which is another way to express a character class in regexes.
    • \D: matches any non-digit character. For ASCII text this is equivalent to [^0-9], which is another way to express an excluded class of characters in regexes.
    • \S: matches any non-whitespace character
    • \w: matches any ‘word character’ (letters, digits and the underscore)
    • \W: matches any non-word character
    • …
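The predefined classes can be explored with a sketch like the one below; the sample text is made up. The last lines illustrate a detail of Python 3: for `str` patterns, `\d` also matches non-ASCII digits unless the `re.ASCII` flag is given, so it is slightly broader than `[0-9]`.

```python
import re

text = 'sku:\t01234567\n'
print(re.findall(r'\s', text))   # ['\t', '\n']
print(re.findall(r'\S+', text))  # ['sku:', '01234567']
print(re.findall(r'\d+', text))  # ['01234567']
print(re.findall(r'\w+', text))  # ['sku', '01234567']
print(re.findall(r'\W+', text))  # [':\t', '\n']

# U+0663 is ARABIC-INDIC DIGIT THREE, a digit outside the ASCII range.
arabic_three = '\u0663'
print(bool(re.fullmatch(r'\d', arabic_three)))            # True
print(bool(re.fullmatch(r'[0-9]', arabic_three)))         # False
print(bool(re.fullmatch(r'\d', arabic_three, re.ASCII)))  # False
```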

    groups

    • () defines groups. They have many uses. Here, in findall, the group marks the part of the match you want returned, i.e. (\d{8}) or ([0-9]{8}) means you want only the 8 digits out of the full matching string.
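The effect of groups on what `findall` returns can be seen in the sketch below. The two-group pattern that also captures the price and the group name `sku` are illustrative additions, not part of the patterns above.

```python
import re

s = '... | sku: 01234567 | price: 150 | ...'

# Without a group, findall returns the whole match.
print(re.findall(r'sku[\s:]*\d{8}', s))    # ['sku: 01234567']
# With one group, it returns only the captured part.
print(re.findall(r'sku[\s:]*(\d{8})', s))  # ['01234567']
# With several groups, it returns one tuple per match.
print(re.findall(r'sku[\s:]*(\d{8}).*?price[\s:]*(\d+)', s))
# [('01234567', '150')]

# A named group can be read back by name from a match object.
m = re.search(r'sku[\s:]*(?P<sku>\d{8})', s)
print(m.group('sku'))                      # 01234567
```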

    Regular expressions are really easy to use, and very useful. You just have to understand well what they can and cannot do. They are limited to regular languages: if you need to deal with arbitrarily nested structures, for example, or other languages defined by context-free grammars, regexes won’t be enough. You will probably want to have a look at the following pages:

    • http://docs.python.org/library/re.html
    • http://www.regular-expressions.info/tutorial.html

© 2021 The Archive Base. All Rights Reserved