
The Archive Base

Editorial Team
Asked: June 12, 2026


I’m new to bash scripting, and the previous answers didn’t help me.

I am trying to harvest IDs from web pages: I need to parse page 1, get a list of IDs, and use them to parse the corresponding web pages.

The problem is that I’m not sure how to write the script.

Here’s what I would like to do:

  1. Parse url1 with a regular expression. Output: a list of extracted IDs (101, 102, 103, etc.).
  2. Parse the page for each extracted ID, for example http://someurl/101, then http://someurl/102, etc.
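The two steps above could be sketched as a pair of shell functions. This is a minimal sketch that assumes GNU grep (for -P) and runs against a hard-coded HTML snippet instead of the real page; the snippet and the http://someurl/ prefix are placeholders.

```shell
#!/usr/bin/env bash
# Sketch of the two steps, run against a hard-coded sample page so it works offline.
# The HTML snippet and the http://someurl/ prefix are placeholders, not real pages.

# Step 1: extract the IDs that follow "list.php?mid=" (GNU grep -P), deduplicated
extract_ids() {
    grep -o -P '(?<=list\.php\?mid=)\d+' | sort -un
}

# Step 2: turn each ID (one per line on stdin) into the URL of the page to parse next
ids_to_urls() {
    local id
    while IFS= read -r id; do
        printf 'http://someurl/%s\n' "$id"
    done
}

sample_page='<a href="list.php?mid=101">A</a> <a href="list.php?mid=102">B</a>
<a href="list.php?mid=103">C</a> <a href="list.php?mid=101">A again</a>'

printf '%s\n' "$sample_page" | extract_ids | ids_to_urls
```

Swapping the printf for the curl command turns this into the real first step.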

So far, I have come up with this command:

curl -s 'http://subtitle.co.il/browsesubtitles.php?cs=movies' | grep -o -P '(?<=list\.php\?mid=)\d+'

The command above works and gives a list of IDs.
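To keep that list around for the next step, one option is to read it into a bash array. This sketch assumes bash 4+ (for mapfile) and GNU grep (for -P), and uses an inline HTML sample in place of the curl output so it runs offline.

```shell
#!/usr/bin/env bash
# Store the extracted IDs in a bash array (needs bash >= 4 for mapfile).
# sample_html is a stand-in for the real page; with curl it would be:
#   mapfile -t ids < <(curl -s 'http://subtitle.co.il/browsesubtitles.php?cs=movies' | grep -o -P '(?<=list\.php\?mid=)\d+')
sample_html='<a href="list.php?mid=101">x</a><a href="list.php?mid=102">y</a>'

# Process substitution <(...) feeds grep's output to mapfile, one ID per array element
mapfile -t ids < <(printf '%s\n' "$sample_html" | grep -o -P '(?<=list\.php\?mid=)\d+')

echo "found ${#ids[@]} IDs"
for id in "${ids[@]}"; do
    echo "next page to parse: http://someurl/$id"
done
```

Unlike `for id in $(...)`, mapfile stores one line per element without word splitting; that matters little for numeric IDs but is a safer habit.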

Any advice for the next steps? Am I on the right track?

Thanks!

1 Answer

  1. Editorial Team
    Added an answer on June 12, 2026 at 1:33 pm

    Your next step would probably be a loop over all the IDs:

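    declare -A seen   # IDs already visited (bash 4+), so the recursion stops

    parse_url () {
        local id url new_page_file   # local, so recursive calls don't clobber them
        for id in $(grep -o -P '(?<=list\.php\?mid=)\d+' "$1" | sort -u); do
            # Skip IDs that were already processed
            [[ -n ${seen[$id]} ]] && continue
            seen[$id]=1
            # Use $id to build the URL
            url="http://someurl/$id"
            # or grep the full URL containing this ID (double quotes so $id expands)
            url="$(grep -o -P "http://[a-zA-Z./%0-9:&;=?]*list\.php\?mid=${id}\b[a-zA-Z./%0-9:&;=?]*" "$1" | head -n 1)"
            # Get page
            new_page_file="$(mktemp)"
            wget -q -O "$new_page_file" "$url"
            # Parse the downloaded page (recursive call)
            parse_url "$new_page_file"
            # Delete the temporary file
            rm -f "$new_page_file"
        done
    }

    wget -q -O file.html 'http://subtitle.co.il/browsesubtitles.php?cs=movies'
    parse_url file.html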
    

    Here we define a function called parse_url that iterates over all the IDs it finds in the file passed to it as an argument ($1 is the first argument passed to the function).
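    To see $1 on its own, here is a tiny sketch with a hypothetical count_ids helper (not part of the script above) that counts the IDs in whatever file is passed as its first argument. It assumes GNU grep for -P and uses a throwaway sample file so it runs offline.

```shell
#!/usr/bin/env bash
# Minimal illustration of how a function receives its argument in $1.
# count_ids is a hypothetical helper, not part of the answer's script.
count_ids () {
    local file="$1"   # first argument passed to the function
    grep -o -P '(?<=list\.php\?mid=)\d+' "$file" | wc -l
}

sample_file="$(mktemp)"
printf '%s\n' '<a href="list.php?mid=7">a</a>' '<a href="list.php?mid=8">b</a>' > "$sample_file"
count_ids "$sample_file"   # inside count_ids, $1 is the value of $sample_file
rm -f "$sample_file"
```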

    We can then use the ID to build a URL, or grep the full URL containing that ID from the same file. Note that the regular expression for finding the URL assumes that the URL has a specific format:

    1. It starts with “http://”
    2. It contains only the characters listed between the square brackets
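    One thing worth checking in that regular expression: it has to contain the value of $id, so the pattern must be in double quotes. Inside single quotes bash passes $id through literally, and PCRE reads $ as an end-of-line anchor, so nothing matches. A small offline comparison, using an invented sample line (GNU grep assumed for -P):

```shell
#!/usr/bin/env bash
# Compare single- vs double-quoted patterns when the pattern contains $id.
# The sample line and URLs are invented for illustration.
line='see http://example.com/list.php?mid=101&lang=en and http://example.com/list.php?mid=1012'
id=101

# Single quotes: $id reaches grep literally ($ = end anchor), so nothing matches
grep -o -P 'http://[a-zA-Z./%0-9:&;=?]*list\.php\?mid=$id[a-zA-Z./%0-9:&;=?]*' <<< "$line" \
    || echo "single quotes: no match"

# Double quotes: bash substitutes 101; \b stops mid=101 from also matching mid=1012
grep -o -P "http://[a-zA-Z./%0-9:&;=?]*list\.php\?mid=${id}\b[a-zA-Z./%0-9:&;=?]*" <<< "$line"
```

    The \b (word boundary) after ${id} is an extra assumption on my part: without it, ID 101 would also match the URL for ID 1012.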

    To download the page, we create a temporary file with the mktemp command. Since you said you’re new to bash scripting, here is a quick explanation of the $(...) constructs that appear. Each one runs the command (or series of commands) between the parentheses, captures its standard output, and puts that output where the $(...) was. Here the output of mktemp is placed inside the double quotes assigned to the new_page_file variable, so $new_page_file holds the name of a newly created, uniquely named temporary file.
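    A quick way to see command substitution in isolation (a minimal sketch; the file contents are made up). Note that $(...) also strips trailing newlines from the captured output, which is why the file name printed by mktemp comes through without one:

```shell
#!/usr/bin/env bash
# Command substitution: $(...) runs the command and is replaced by its stdout.
new_page_file="$(mktemp)"          # mktemp creates an empty file and prints its name
echo "temporary file: $new_page_file"

# The captured value is an ordinary string, so it can be used like any path
printf 'hello\n' > "$new_page_file"
first_line="$(head -n 1 "$new_page_file")"
echo "first line: $first_line"     # prints "first line: hello"

rm -f "$new_page_file"             # clean up, as the answer does after parsing
```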

    We can then download the URL into that temporary file, call the function to parse it, and then delete it.
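    If you add error handling that bails out early (for instance when the download fails), the rm at the end can be skipped and temporary files pile up. One common pattern, sketched here under the assumption that a subshell is acceptable, is to run the function body in ( ... ) and register an EXIT trap that deletes the file. fetch is a hypothetical stand-in for the wget call so the example runs offline:

```shell
#!/usr/bin/env bash
# Sketch: fetch a page into a temp file, parse it, and always delete the file.
# fetch is a hypothetical stand-in for `wget -q -O "$2" "$1"` so this runs offline.
fetch () {
    printf '<a href="list.php?mid=%s">x</a>\n' 201 202 > "$2"
}

# The body is a subshell ( ... ), so the EXIT trap fires when the function finishes,
# removing the temp file even when we bail out early with exit 1.
process_page () (
    tmp="$(mktemp)" || exit 1
    trap 'rm -f "$tmp"' EXIT
    fetch "$1" "$tmp" || exit 1
    grep -o -P '(?<=list\.php\?mid=)\d+' "$tmp"
)

process_page 'http://someurl/101'
```

    The trade-off is that a subshell cannot change variables in the caller, so anything the recursion needs to share (such as a list of visited IDs) has to be kept outside it.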

    To start the process, we download the initial URL into file.html and then call the function with that file name as its argument.

    EDIT: Added recursion, based on Barmar’s answer.
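    One caveat with recursion: without a record of visited pages, it never stops if a page links back to one already processed (say 102 links to 101). A common fix is to remember which IDs have been handled. The sketch below assumes bash 4+ (associative arrays) and GNU grep, and replaces the real download with a hypothetical pages table so it runs offline:

```shell
#!/usr/bin/env bash
# Sketch of a recursive crawl that stops on pages it has already seen.
# Needs bash >= 4 (associative arrays) and GNU grep (-P).
# pages is a hypothetical offline stand-in for the site: ID -> page HTML.
declare -A pages=(
    [101]='<a href="list.php?mid=102">b</a> <a href="list.php?mid=103">c</a>'
    [102]='<a href="list.php?mid=101">back to a</a>'
    [103]=''
)
declare -A seen=()   # IDs already visited; this is what ends the recursion

crawl () {
    local id child
    id="$1"
    [[ -n ${seen[$id]:-} ]] && return   # already visited: stop here
    seen[$id]=1
    echo "visiting $id"
    # In the real script this would download http://someurl/$id and parse that file
    for child in $(printf '%s\n' "${pages[$id]:-}" | grep -o -P '(?<=list\.php\?mid=)\d+'); do
        crawl "$child"
    done
}

crawl 101
```

    This prints "visiting 101", "visiting 102" and "visiting 103" once each and then stops, even though page 102 links back to 101.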

    Hope this helps a little =)

Related Questions

I'm new to bash scripting so would need your help. I am trying to
I am new to bash scripting and trying to learn a few things. Here
I am very new to Bash scripting. I am trying to write a script
I am new to bash-scripting & trying to understand how things work. It's all
I'm very new to bash scripting and I'm trying to practice by making this
I need to generate an XML file in bash(I am new to bash/scripting languages,
I'm quiet new to bash scripting, and I would like to convert recursively all
HI, I'm completely new to Bash and StackOverflow. I need to move a set
I am new in bash scripting and I have to write a script that
I'm rather new to bash scripting, and Google isn't as useful as I'd like


© 2021 The Archive Base. All Rights Reserved