Editorial Team
Asked: May 28, 2026

Intro

This post is long, but I consider it thorough. I hope it will be helpful to others who work with addresses, and that it teaches some complex Vim regexes along the way. Thank you for your time.

Worldwide addresses:

Users in the US, Canada, and a few other countries are offered 5 fields on a form. The form's contents are then displayed in a comma-delimited format that I need to dissect further. Ideally, the comma-separated content looks like:

Some Really Nice Place, 111 Street, Beautiful Town, StateOrProvince, zip

where zip can be either a series of just numbers (US) or numbers and letters (Canada).

Invariably, people throw an extra comma into their text box field input and that adds some complexity to the parsing of this data. For example:

Some Really Nice Place, 111 Street, suite 101, Beautiful Town, StateOrProvince, zip

Further complicating the parse, data from countries other than the US and Canada contains an extra comma-delimited field, somehow provided to those users, where they enter their country. (No, there is no “US” or “Canada” field for US and Canadian entries, so this field is in addition to the original 5 comma-delimited fields.) For example:

Foreign Name of Building, A street name, A City, ,zip, Country

The state field (“, ,”) is usually empty, as most non-US countries are not segmented into states. And, yes, the same “additional commas” problem described above happens here too.

Foreign Name of Building, cross streets, district, A street name, A City, ,zip, Country

Parsing Strategy:

A country name will never include a digit, whereas a US or Canadian zip will always have at least some digits. If you work backwards from the last field using this assumption, you should be able to place the country, zip, state (if not an empty “, ,”), city and street into their respective positions, which are the most important fields to get right. Anything beyond those sections could be lumped together in the first one or two lines as descriptions of the address (i.e. building, name, suite, cross streets, etc.). For example:

Some Really Nice Place, 111 Street, suite 101, Beautiful Town, Lovely State, Digits&Letters

  1. Last section has a digit (therefore a US or Canadian address)
  2. There are a total of 6 sections, so that’s one more than the original 5
  3. In the original 5-section layout, sections 5 down to 2 are zip, state, town, address…
  4. 6 minus 5 (original) = 1 extra section, so add an extra address field (Address2) and leave the first section as the header, resulting in:

Header: Some Really Nice Place, Address1: 111 Street, Address2: Suite 101, Town: Beautiful Town, State/Province: Lovely State, Zip: Digits&Letters

There might be a discrepancy over where “111 Street” or “Suite 101” goes (Address1 or Address2), but this at least gets the zip, state, city and address(es) lumped together and leaves the first section as the “Header” to the email address for data entry purposes.
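The counting steps above can be sketched in Vim script. This is only a minimal illustration under the question's own assumption (a last section with no digit is a country). The function name `CountExtraSections` and the keys of the returned dictionary are hypothetical, chosen for this example.

```vim
" Hypothetical helper illustrating the counting steps above.
" Returns the total number of sections, the base count (5 without a
" country, 6 with one), and the number of extra sections beyond the base.
function! CountExtraSections(line)
    " Split on commas, trimming surrounding blanks; the final argument 1
    " keeps empty sections such as a blank state field.
    let fields = split(a:line, '\s*,\s*', 1)
    " Assumption from the question: a country name never contains a digit,
    " while a US or Canadian zip always does.
    let hascountry = fields[-1] !~ '\d'
    let base = hascountry ? 6 : 5
    return {'total': len(fields), 'base': base, 'extra': len(fields) - base}
endfunction
```

For the practice address "City Gas & Electric – Bldg 4, 222 Middle Park Ct, CP4120F, Dallas, Texas, 44984" this gives a total of 6, a base of 5 and 1 extra section, which is the case where an Address2 field is needed.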

Under this approach, foreign addresses get parsed like:

Foreign Name of Building, cross streets, district, A street name, A City, ,zip, Country

  1. Last section has no digit, so it must be a Country
  2. That means, moving right to left, the second section is the zip
  3. So now (foreign) you have an “original 6 sections” to subtract from the total of 8 in the example
  4. 8th section = country, 7th = zip, 6th = state (mostly blank on foreign addresses), 5th = city, 4th = address1, 3rd and 2nd = address2, 1st = header
  5. We knew to use two address fields because the example had 8 sections and foreign addresses have a base of 6 sections. Any sections above the base are added to the address2 field. If there are 3 sections above the base section count, they are appended to each other inside the address2 field.
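As a rough sketch of the last step, the sections above the base count can be picked out with a list slice and joined back together. The function name `ForeignAddress2` is hypothetical, and the code assumes the line really is a foreign address, that is, it ends with a country section.

```vim
" Hypothetical helper: collect the extra sections of a foreign address.
" Assumed base layout, right to left: country, zip, state, city, address1,
" with the header as the first section. Anything between the header and
" address1 counts as extra and is lumped into address2.
function! ForeignAddress2(line)
    let fields = split(a:line, '\s*,\s*', 1)
    " With 6 sections or fewer there is nothing above the base.
    " Index -6 is the section just before address1.
    let extras = len(fields) > 6 ? fields[1:-6] : []
    return join(extras, ', ')
endfunction
```

For the practice address "Lot 459, Block 14, Jalan Sultan Tengah, Petra Jaya, Kuching, , 93050, Malaysia" this returns "Block 14, Jalan Sultan Tengah", leaving "Petra Jaya" as address1 and "Lot 459" as the header.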

Coding

In this approach using Vim, how would I initially read the number of comma-delimited sections (after I’ve captured the entire address in a register)? How do I use submatch() on a series of comma-delimited sections when I don’t know how many sections there are?
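One possible starting point, as a sketch: rather than writing a regex with a fixed number of groups, the register contents can be turned into a list with split() and counted with len(). The helper names `AddressSections` and `CountSectionsInRegister` are hypothetical.

```vim
" Hypothetical helper: split an address into comma-delimited sections.
" The final argument 1 makes split() keep empty sections, such as the
" blank state field in ", ,".
function! AddressSections(text)
    return split(a:text, '\s*,\s*', 1)
endfunction

" Count the sections of an address held in a register (for example,
" register a after yanking a line with "ayy).
function! CountSectionsInRegister(regname)
    " A linewise yank ends with a newline; strip it before splitting.
    let text = substitute(getreg(a:regname), '\n\+$', '', '')
    return len(AddressSections(text))
endfunction
```

With an address yanked into register a, `:echo CountSectionsInRegister('a')` shows the section count. Inside a `:s` replacement expression, submatch(0) holds the whole matched text, so the same list-based approach works there too, for example `\=len(AddressSections(submatch(0)))`, without needing one capture group per section.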

Example Addresses

Here are some practice addresses (US and foreign), if you are inclined to help:

City Gas & Electric – Bldg 4, 222 Middle Park Ct, CP4120F, Dallas, Texas, 44984

MHG Engineering, Inc. Suite 200, 9899 Balboa Ave, San Diego, California, 92123-1502

SolarWind Turbines, 2nd Floor Conference Room, 2300 Ruffin Road, Seattle, Washington, 84444

123 Aeronautics, 2239 Industry Parkway, Salt Lake City, Utah, 55344

Ongwanda Gov’t Resources, 6000 Portsmouth Avenue, Ottawa, Ontario, K7M 8A6

Graylang Seray Center, 6600 Haig Rd, Singapore, , 437848, Singapore

Lot 459, Block 14, Jalan Sultan Tengah, Petra Jaya, Kuching, , 93050, Malaysia

Virtual Steel, 1 Umgazi Rd Aspec Park, Pretoria, , 0075, South Africa

Idiom Towers South, Fifth Floor, Jasmen Conference Room, 1500 Freedom Street, Pretoria, , 0002, South Africa

1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 4:12 am

    The following code is a draft-quality Vim script (hopefully) implementing the
    address parsing routine described in the question.

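    function! ParseAddress(line)
        " Split on commas, keeping empty fields (such as a blank state).
        let r = split(a:line, ',\s*', 1)
        " A last field without any digit is taken to be a country name.
        let hadcountry = r[-1] !~ '\d'
        let a = {}
        let a.country = hadcountry ? r[-1] : ''
        " Drop the country field, if present, so that the zip is always last.
        let r = r[:-1-hadcountry]
        let a.zip = r[-1]
        let a.state = r[-2]
        let a.city = r[-3]
        let a.header = r[0]
        " Number of fields between the header and the city.
        let nleft = len(r) - 4
        if hadcountry
            " With a country: the field next to the city is address 1;
            " everything between it and the header goes to address 2.
            let a.address1 = r[-4]
            let a.address2 = join(r[1:nleft-1], ', ')
        else
            " Without a country: the field after the header is address 1;
            " the remaining fields before the city go to address 2.
            let a.address1 = r[1]
            let a.address2 = join(r[2:nleft], ', ')
        endif
        return a
    endfunction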
    
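    function! FormatAddress(a)
        " Pair each label with its dictionary key, render the non-empty
        " fields as "Label: value", and join them with "; ".
        let t = map([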
        \   ['Header', 'header'],
        \   ['Address 1', 'address1'],
        \   ['Address 2', 'address2'],
        \   ['Town', 'city'],
        \   ['State/Province', 'state'],
        \   ['Country', 'country'],
        \   ['Zip', 'zip']],
        \   'has_key(a:a, v:val[1]) && !empty(a:a[v:val[1]])' .
        \       '? v:val[0] . ": " . a:a[v:val[1]] : ""')
        return join(filter(t, '!empty(v:val)'), '; ')
    endfunction
    

    The command below can be used to test the above parsing routines.

    :g/\w/call setline(line('.'), FormatAddress(ParseAddress(getline('.'))))
    

    (One can give the :global command a range to run it over fewer
    test address lines.)
