The Archive Base

Editorial Team
Asked: June 6, 2026

I am trying to extract some specific data from a text file using regular expressions in a shell script.

That means a multiline grep; the tool I am using is pcregrep, for compatibility with Perl’s regular expressions.

 [58]Walid Chamoun Architects WLL
     * [59]Map
     * [60]Website
     * [61]Email
     * [62]Profile
     * [63]Display Ad

   Walid Chamoun Architects WLL

   PO Box:
          55803, Doha, Qatar

   Location:
          D-Ring Road, New Salata Shamail 40, Villa 340, Doha, Qatar

   Tel:
          (00974) 44568833

   Fax:
          (00974) 44568811

   Mob:
          (00974) 44568822

     * Accurate Budget Costing
     * Eco-Friendly Structural Design
     * Exclusive & Unique Design
     * Quality Architecture & Design

Company Profile

   Walid Chamoun Architects (WCA) was founded in Beirut, Lebanon, in 1992,
   committed to the concept of fully integrated design-build delivery of
   projects. In late '90s, company established in-house architectural and
   engineering services. As a full service provider, WCA expanded from
   multi-family projects to industrial and office construction, which
   added development services, including site acquisition and financing.
   In 2001, WCA had opportunity and facilities to experience European
   market and establish office in Puerto Banus, Marbella, Spain. By 2005,
   WCA refined its structure to focus on specific market segments and new
   office was opened in Doha, state of Qatar. From a solid foundation and
   reputation built over eighteen years, WCA continually to provide
   leadership in design-build through promotion of benefits and education
   to its practitioners.
   Project Planning: Project planning and investigation occurs before
   design begins has greatest impact on cost, schedule and ultimately the
   success of project. Creativity in Design: You can rely on our in-house
   designers for design excellence in all aspects of the project. Our
   designs have received recommendations and appreciations on national and
   international levels. Creativity in Execution: Experienced in close
   collaboration with the designers as part of the integrated team, our
   construction managers, superintendents and field staff create value
   throughout the project. Post Completion Services: Your needs can be
   served through our skills and experience long after the last
   construction crew has left the site. Performance: Corporate and
   institutional clients, developers and public agencies repeatedly select
   WCA on the basis of its consistent record of performance excellence.
   Serving clients throughout the Middle East and GCC, WCA provides
   complete planning for architectural, interior design and construction
   on a single-responsibility basis. Our expertise spans industrial,
   commercial, institutional, public and residential projects. Benefits of
   Design-Build: Design-build is a system of contracting under which one
   entity performs both design and construction. Benefits of design-build
   project delivery include: Single point responsibility Early knowledge
   of cost Time and Cost savings

   Classification:
          Architects - [64]Architects

   [65]Al Ali Consulting & Engineering
     * [66]Map
     * Website
     * Email
     * Profile
     * Display Ad

   Is this your company?
   [67]Upgrade this free listing here

   PO Box:
          467, Doha, Qatar

   Tel:
          (00974) 44360011

Company Profile

   Classification:
          Architects - [68]Architects

   [69]Al Gazeerah Consulting Engineering
     * [70]Map
     * Website
     * Email
     * Profile
     * Display Ad

   Is this your company?
   [71]Upgrade this free listing here

   PO Box:
          22414, Doha, Qatar

   Tel:
          (00974) 44352126

Company Profile

   Classification:
          Architects - [72]Architects

   [73]Al Murgab Consulting Engineering
     * [74]Map
     * Website
     * Email
     * Profile
     * Display Ad

   Is this your company?
   [75]Upgrade this free listing here

   PO Box:
          2856, Doha, Qatar

   Tel:
          (00974) 44448623

Company Profile

   Classification:
          Architects - [76]Architects
References

   Visible links
   1. http://www.qatcom.com/useraccounts/login
   2. http://www.qatcom.com/useraccounts/register
   3. http://www.qatcom.com/
   4. http://www.qatcom.com/
   5. http://www.qatcom.com/qataryellowpages/map-of-doha
   6. http://www.qatcom.com/qataryellowpages/about-qatcom
   7. http://www.qatcom.com/qataryellowpages/advertise-with-qatcom
   8. http://www.qatcom.com/qataryellowpages/advertiser_testimonials
   9. http://www.qatcom.com/useraccounts/login
  10. http://www.qatcom.com/useraccounts/register
  11. http://www.qatcom.com/contact-qatcom
  12. http://www.qatcom.com/qataryellowpages/companies
  13. http://www.qatcom.com/classifications/index/A
  14. http://www.qatcom.com/classifications/index/B
  15. http://www.qatcom.com/classifications/index/C
  16. http://www.qatcom.com/classifications/index/D
  17. http://www.qatcom.com/classifications/index/E
  18. http://www.qatcom.com/classifications/index/F
  19. http://www.qatcom.com/classifications/index/G
  20. http://www.qatcom.com/classifications/index/H
  21. http://www.qatcom.com/classifications/index/I
  22. http://www.qatcom.com/classifications/index/J
  23. http://www.qatcom.com/classifications/index/K
  24. http://www.qatcom.com/classifications/index/L
  25. http://www.qatcom.com/classifications/index/M
  26. http://www.qatcom.com/classifications/index/N
  27. http://www.qatcom.com/classifications/index/O
  28. http://www.qatcom.com/classifications/index/P

For sample data like this, I am trying to grab the following details for each company:

company name
PO box
tel
fax
mobile
company profile

and write them to a .csv file.
I am new to regular expressions, and to Linux too.
All I could manage was something like this:

\[\d*\][^\.]*[\(\d*\)\s\d*)]

Can anyone help me out with this, please?

Improvements:

I figured out something like this:

$ awk '/^\[/ && ! /Upgrade this free listing/ {print $0} /:$/ && ! /Classification/ {printf $0 ;  getline x ; print x}' file

But that still isn’t what I want it to be…

1 Answer

    Editorial Team
    Added an answer on June 6, 2026 at 9:15 pm

    You can do this in awk, but you’ll be better off parsing the HTML instead. A good tool to do that with would be Python using the Beautiful Soup module. But that’s not very exciting, so here’s how to do it the awkward (hah!) way:

    #!/usr/bin/awk -f
    
    function trim(s) {
        gsub(/(^ +)|( +$)/, "", s)
        return s
    }
    
    BEGIN {
        count = 0
        fields[0] = "company"
        fields[1] = "pobox"
        fields[2] = "tel"
        fields[3] = "fax"
        fields[4] = "mob"
        fields[5] = "profile"
    }
    
    # company name
    /^ +\[[0-9]+\].*$/ {
        sub(/^ +\[[0-9]+\]/, "") # get rid of the Lynx reference
        # this is a bit naughty: our regex also matches this other link, but there's only one of them, so we just filter it
        if ($0 != "Upgrade this free listing here") data[count,"company"]=$0
    }
    
    # two line fields, easy!
    / +PO Box:$/ { getline; data[count,"pobox"]=$0 }
    / +Tel:$/ { getline; data[count,"tel"]=$0 }
    / +Fax:$/ { getline; data[count,"fax"]=$0 }
    / +Mob:$/ { getline; data[count,"mob"]=$0 }
    
    # multi-line field, tricky because it can be empty
    /^Company Profile$/ {
        getline # skip empty line
    
        # collect lines until the Classification field; stop at end of input
        # so that a truncated file cannot make this loop forever
        s = ""
        do {
            s = s $0
        } while ((getline) > 0 && $0 !~ / +Classification:$/)
        data[count,"profile"]=s
        count++ # the Classification field denotes the end of the company record
    }
    
    END {
        # output CSV header row (for-in visits the keys in an unspecified order)
        for ( key in fields ) {
            printf "\"%s\",", fields[key]
        }
        printf "\n"
    
        # output data; "%s" keeps a literal % in the data from being read as a format
        for ( i=0; i<count; i++ ) {
            for ( key in fields ) {
                printf "\"%s\",", trim(data[i,fields[key]])
            }
            printf "\n"
        }
    }
    

    Save as parse.awk, make it executable with chmod +x parse.awk, and then invoke it with ./parse.awk < sample.txt. Out comes a CSV, like this:

    "tel","fax","mob","profile","company","pobox",
    "(00974) 44568833","(00974) 44568811","(00974) 44568822","Walid Chamoun Architects (WCA) was founded in Beirut, Lebanon, in 1992,   committed to the blablabla","Walid Chamoun Architects WLL","55803, Doha, Qatar",
    "(00974) 44360011","","","","Al Ali Consulting & Engineering","467, Doha, Qatar",
    "(00974) 44352126","","","","Al Gazeerah Consulting Engineering","22414, Doha, Qatar",
    "(00974) 44448623","","","","Al Murgab Consulting Engineering","2856, Doha, Qatar",
    

    There are comments that should hopefully explain what’s going on. This will run in plain old awk and doesn’t require fancy gawk features. Keep in mind that awk arrays are arbitrarily ordered, so the column order may differ from the one shown here. This is prone to breaking a whole bunch with varying input data, which is just one of the many reasons why you really should parse the HTML instead of such lynx -dump shenanigans.
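
    For comparison, here is a minimal Python sketch of the same line-by-line approach, using only the standard library. It is not the Beautiful Soup route: it still reads the lynx -dump text rather than the HTML, and it assumes the layout of the sample above. The names parse_dump, to_csv, FIELDS, LABELS and SAMPLE are made up for illustration. Unlike the awk script, it writes the columns in a fixed order, leaves the quoting to the csv module, and joins profile lines with single spaces.

```python
import csv
import io
import re

# Fixed column order (the awk version iterates its array in arbitrary order).
FIELDS = ["company", "pobox", "tel", "fax", "mob", "profile"]
# Two-line fields: label as it appears in the dump -> column name.
LABELS = {"PO Box:": "pobox", "Tel:": "tel", "Fax:": "fax", "Mob:": "mob"}
# Indented "[58]Company name" lines (Lynx link reference plus link text).
NAME_RE = re.compile(r"^ +\[\d+\](.+)$")


def parse_dump(text):
    """Return one dict per company found in a lynx -dump style listing."""
    records = []
    current = dict.fromkeys(FIELDS, "")
    lines = iter(text.splitlines())
    for line in lines:
        stripped = line.strip()
        match = NAME_RE.match(line)
        if match:
            name = match.group(1).strip()
            # The "Upgrade" link matches the same pattern, so filter it out.
            if name != "Upgrade this free listing here":
                current["company"] = name
        elif stripped in LABELS:
            # The value sits on the line after the label.
            current[LABELS[stripped]] = next(lines, "").strip()
        elif stripped == "Company Profile":
            profile = []
            for body in lines:  # consume lines up to the Classification label
                if body.strip() == "Classification:":
                    break
                if body.strip():
                    profile.append(body.strip())
            current["profile"] = " ".join(profile)
            # As in the awk script, Classification ends the company record.
            records.append(current)
            current = dict.fromkeys(FIELDS, "")
    return records


def to_csv(records):
    """Serialize the records as CSV text with every field quoted."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS,
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# One record copied from the sample data in the question.
SAMPLE = """\
   [65]Al Ali Consulting & Engineering
     * [66]Map
     * Website

   Is this your company?
   [67]Upgrade this free listing here

   PO Box:
          467, Doha, Qatar

   Tel:
          (00974) 44360011

Company Profile

   Classification:
          Architects - [68]Architects
"""

if __name__ == "__main__":
    print(to_csv(parse_dump(SAMPLE)), end="")
```

    To run it on a real dump, replace SAMPLE with the contents of the file, for example parse_dump(open("sample.txt").read()). The same caveat applies: this depends on the exact layout of the dump and will break if that layout changes.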
