
The Archive Base


Asked by Editorial Team on May 15, 2026


Hi there, I am looking for best practices or ideas for cleaning tags, or at least grabbing the data from within custom tags in a text.

I am sure I could code some sort of “parser” that goes through every line manually, but isn’t there a smarter way today?

Example data:

{Phone:555-123456789}

Here “Phone” is the key and the number is the data. It looks a lot like JSON, but it’s easier for a human to write.

or

{link:   article123456  ;    title:    Read about article 123456 here   } 

Could be normal (X)HTML too:

<a         href="article123456.html"      >  Read about article 123456 here  </a>

Humans aren’t always careful to trim their input, and neither are old websites made with lazy WYSIWYG editors, so I first need to figure out which pairs belong together, then find the “data within” and trim the results.

The problem with the “title” part above is that there are no quotation marks surrounding the title text, so the parser could either add them automatically or show an error to the user.

Any thoughts on the best way to grab this data? There seem to be several approaches that might work, but what’s your best approach to this problem?


1 Answer

    Answered by Editorial Team on May 15, 2026 at 8:21 am

    I would first write a “tokenizer” for the syntax of the data I was parsing. A tokenizer is a (relatively) simple process that breaks a string down into a series of fragments, or tokens. For example, in your first two cases the structural tokens would be “{”, “}”, “:”, and “;”, and everything else would be treated as a data token. This can be done with a loop, a recursive function, or a number of other ways. Tokenizing your second example would produce an array (or some other sort of list) with the following values:

    "{", "link", ":", "   article123456  ", ";", "    title", ":", "    Read about article 123456 here   ", "}"
    

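A minimal Python sketch of such a tokenizer, assuming the four structural characters above are the only special characters (so, for example, a colon inside a title or URL would also be split off as a structural token). Whitespace is deliberately left inside the data tokens, matching the list above. The names `tokenize` and `STRUCTURAL` are illustrative.

```python
from __future__ import annotations

# Assumption: these four characters are the only structural tokens.
STRUCTURAL = frozenset("{}:;")


def tokenize(text: str) -> list[str]:
    """Split text into structural tokens and data tokens.

    Whitespace is kept inside data tokens; it is not trimmed here.
    """
    tokens = []
    current = []  # characters of the data token currently being built
    for ch in text:
        if ch in STRUCTURAL:
            if current:  # flush any pending data token first
                tokens.append("".join(current))
                current = []
            tokens.append(ch)
        else:
            current.append(ch)
    if current:  # trailing data after the last structural token
        tokens.append("".join(current))
    return tokens


print(tokenize("{Phone:555-123456789}"))
# ['{', 'Phone', ':', '555-123456789', '}']
```

Note that any whitespace after the closing brace (as in the sample in the question) comes out as one extra whitespace-only data token.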
    The next step would be to “sanitize” your data, though in these cases all that really means is removing unwanted whitespace. Iterate through the token array that was produced, and alter each token so that there is no beginning or ending whitespace. This step could be combined with tokenization, but I think it’s much cleaner and clearer to do it separately. Your tokens would then look like this:

    "{", "link", ":", "article123456", ";", "title", ":", "Read about article 123456 here", "}"
    

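The trimming step is short in Python. This sketch returns a new list rather than altering the tokens in place; `sanitize` is an illustrative name.

```python
def sanitize(tokens):
    """Return a copy of tokens with leading/trailing whitespace stripped.

    Structural tokens such as "{" contain no whitespace, so they pass
    through unchanged.
    """
    return [token.strip() for token in tokens]


raw = ["{", "link", ":", "   article123456  ", ";", "    title", ":",
       "    Read about article 123456 here   ", "}"]
print(sanitize(raw))
# ['{', 'link', ':', 'article123456', ';', 'title', ':', 'Read about article 123456 here', '}']
```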
    And finally, the actual “interpretation.” You’ll need to convert your token array into whatever sort of data structure you intend to be the final product of the parsing process. For this you’ll definitely want a recursive function. If the function is called on a data token, followed by a colon token, followed by a data token, it will interpret them as a key-value pair and produce a data structure accordingly. If it is called on a series of tokens containing semicolon tokens, it will split the tokens up at each semicolon and call itself on each of the resulting groups. And if it is called on tokens contained within curly-brace tokens, it will call itself on the contained tokens before doing anything else. Note that this is not necessarily the order in which you’ll want to check for these various cases; in particular, if you intend to nest curly braces (or any other sort of grouping tokens, such as square brackets, angle brackets, or parentheses), you’ll need to make sure to interpret those tokens in the correct nested order.
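A sketch of the recursive interpretation step in Python, under a few assumptions not stated in the answer: the result is a dict, a value is either a single data token or a nested {...} group, and a later duplicate key overwrites an earlier one. The outer-brace check comes first and semicolons are only split at brace depth zero, which is one way to handle the nested order. `interpret`, `ParseError`, and the helper functions are hypothetical names, not from any library.

```python
STRUCTURAL = frozenset("{}:;")


class ParseError(ValueError):
    """Raised when the tokens do not fit the key:value; grammar."""


def matching_brace(tokens, start):
    """Return the index of the '}' that closes the '{' at tokens[start]."""
    depth = 0
    for i in range(start, len(tokens)):
        if tokens[i] == "{":
            depth += 1
        elif tokens[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    raise ParseError("unbalanced '{'")


def split_top_level(tokens, sep):
    """Split tokens at sep, ignoring separators nested inside braces."""
    parts, current, depth = [], [], 0
    for tok in tokens:
        if tok == "{":
            depth += 1
        elif tok == "}":
            depth -= 1
            if depth < 0:
                raise ParseError("unbalanced '}'")
        if tok == sep and depth == 0:
            parts.append(current)
            current = []
        else:
            current.append(tok)
    parts.append(current)
    return parts


def is_group(tokens):
    """True if tokens form exactly one {...} group."""
    return bool(tokens) and tokens[0] == "{" and matching_brace(tokens, 0) == len(tokens) - 1


def interpret(tokens):
    """Recursively turn a sanitized token list into a dict."""
    if not tokens:  # e.g. "{}" or a trailing ";"
        return {}
    # Braces: interpret the contained tokens before anything else.
    if is_group(tokens):
        return interpret(tokens[1:-1])
    # Semicolons: interpret each top-level group and merge the results.
    parts = split_top_level(tokens, ";")
    if len(parts) > 1:
        result = {}
        for part in parts:
            result.update(interpret(part))
        return result
    # key : value, where value is one data token or a nested {...} group.
    if len(tokens) >= 3 and tokens[1] == ":" and tokens[0] not in STRUCTURAL:
        key, value_tokens = tokens[0], tokens[2:]
        if len(value_tokens) == 1 and value_tokens[0] not in STRUCTURAL:
            return {key: value_tokens[0]}
        if is_group(value_tokens):
            return {key: interpret(value_tokens)}
    raise ParseError(f"cannot interpret tokens: {tokens!r}")


print(interpret(["{", "Phone", ":", "555-123456789", "}"]))
# {'Phone': '555-123456789'}
```

With this design, input with a missing value such as {title:} raises `ParseError`, which is one way to report the problem back to the person who wrote the input instead of guessing.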

    The result of these processes will be a fully parsed data structure of whatever type you’d like. Keep in mind that this process assumes that your data is all implicitly stored as the string type; if you’d like “3” and 3 to be interpreted differently, then things get a bit more complicated. This method I’ve outlined is not at all the only way to do it, but it’s how I’d approach the problem.
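The tokenizer approach above targets the brace syntax. For the (X)HTML variant in the question, one alternative that is not part of this answer is Python’s standard-library `html.parser.HTMLParser`, which already tolerates the irregular whitespace inside tags. This sketch collects each link’s href and trimmed text; `LinkExtractor` is a hypothetical name, and an `<a>` whose closing tag is missing is simply dropped.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> tags, trimming whitespace."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href.strip(), "".join(self._text).strip()))
            self._href = None


parser = LinkExtractor()
parser.feed('<a         href="article123456.html"      >  Read about article 123456 here  </a>')
parser.close()
print(parser.links)
# [('article123456.html', 'Read about article 123456 here')]
```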


© 2021 The Archive Base. All Rights Reserved