New to bash scripting, The previous answers didn’t helped me. I am trying to

Question

Asked: June 12, 2026 (2026-06-12T13:33:29+00:00)

I'm new to bash scripting, and the previous answers didn't help me.

I am trying to harvest ids from web pages and I need to parse page1, get a list of ids, and use them to parse corresponding web pages.

The thing is I’m not sure how to write the script…

Here’s what I would like to do:

Parse url1 according to regexp. Output: list of extracted ids (101, 102, 103, etc).
Parse each url with output id, for example: parse (http://someurl/101), then parse (http://someurl/102), etc.

So far, I have come up with this command:

curl 'http://subtitle.co.il/browsesubtitles.php?cs=movies' | grep -o -P '(?<=list\.php\?mid=)\d+'

The command above works, and gives a list of ids.

Any advice for the next steps? Am I on the right track?

Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-06-12T13:33:31+00:00

Your next step would probably be to loop over all the ids:

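parse_url () {
    # Keep these local so that recursive calls don't clobber the caller's values
    local id url new_page_file
    for id in $(grep -o -P '(?<=list\.php\?mid=)\d+' "$1"); do
        # Use $id to build the URL
        url="http://someurl/$id"
        # or, instead, grep the file for the full URL containing the ID
        # (double quotes so that $id is expanded; keep only the first match)
        # url="$(grep -o -P "http://[a-zA-Z./%0-9:&;=?]*list\.php\?mid=${id}[a-zA-Z./%0-9:&;=?]*" "$1" | head -n 1)"
        # Get page
        new_page_file="$(mktemp)"
        wget -q -O "$new_page_file" "$url"
        # Parse the downloaded page
        parse_url "$new_page_file"
        # Delete the temporary file
        rm "$new_page_file"
    done
}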

wget -q -O file.html 'http://subtitle.co.il/browsesubtitles.php?cs=movies'
parse_url file.html

Here we have defined a function called parse_url that iterates over all the ids it finds in the file passed as an argument (i.e. $1 is the first argument passed to the function).

We can then use the ID to generate a URL, or we can grep the same file again, this time extracting the full URL that contains the ID. Note that the regular expression for finding the URL assumes that the URL has a specific format:

It starts with “http://”
It only contains the characters that are used between the square brackets
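
To see what the two regular expressions match, here is a small sketch that runs them against a made-up line of HTML. The sample markup and the helper names extract_ids and extract_url_for_id are assumptions for illustration only; the real page may look different. The -P option requires GNU grep.

```shell
#!/usr/bin/env bash
# Hypothetical sample line; the real page's markup may differ.
sample='<a href="http://someurl/list.php?mid=101&x=1">A</a> <a href="http://someurl/list.php?mid=102">B</a>'

# Print only the digits that follow "list.php?mid=" (reads standard input).
extract_ids () {
    grep -o -P '(?<=list\.php\?mid=)\d+'
}

# Print the full URL that carries the id given as $1 (reads standard input).
# Double quotes are used so that ${1} is expanded inside the pattern.
extract_url_for_id () {
    grep -o -P "http://[a-zA-Z./%0-9:&;=?]*list\.php\?mid=${1}[a-zA-Z./%0-9:&;=?]*"
}

printf '%s\n' "$sample" | extract_ids
printf '%s\n' "$sample" | extract_url_for_id 102
```

Because digits are among the characters allowed after the id, asking for id 10 would also match the URL for id 101, so the URL pattern is only safe when no id is a prefix of another.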

To download the page, we create a temporary file with the mktemp command. Since you said you're new to bash scripting, I'll give a quick explanation of the $(...) constructs that appear. They run the command or series of commands written between the parentheses, capture the standard output, and place it where the $(...) was. In this case, the output is placed inside the double quotes whose contents we assign to the new_page_file variable. Therefore $new_page_file contains the name of the randomly named file that mktemp created for storing the downloaded page.
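
A minimal sketch of command substitution on its own, in case it helps to try it outside the function. The variable names greeting, tmp_file, and line_count are made up for this example.

```shell
#!/usr/bin/env bash
# $(...) runs the command and is replaced by whatever that command printed.
greeting="$(echo hello)"
echo "greeting is: $greeting"

# mktemp creates an empty file and prints its path, which we capture.
tmp_file="$(mktemp)"
echo "some data" > "$tmp_file"

# Substitutions can wrap any command, including ones that use redirection.
line_count="$(wc -l < "$tmp_file")"
echo "lines in temp file: $line_count"

# Clean up the temporary file when done.
rm "$tmp_file"
```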

We can then download the URL into that temporary file, call the function to parse it, and then delete it.

To call the function initially, we download the initial URL into a file file.html, and then call the function passing that file name as the argument.
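
One thing to watch with the recursive approach is that it keeps following ids for as long as the downloaded pages contain them. The sketch below is one possible variant, not the method from the answer: it adds a depth limit so that the crawl stops after a fixed number of levels. The names fetch and parse_page and the depth argument are assumptions introduced here, and http://someurl/ is the placeholder from the question.

```shell
#!/usr/bin/env bash
# Download URL $1 into file $2. Kept separate so it can be swapped out.
fetch () {
    wget -q -O "$2" "$1"
}

# parse_page FILE DEPTH: print each id found in FILE, then download and
# parse the page for that id while DEPTH is greater than zero.
parse_page () {
    local file="$1" depth="$2" id page
    for id in $(grep -o -P '(?<=list\.php\?mid=)\d+' "$file"); do
        echo "$id"
        if [ "$depth" -gt 0 ]; then
            page="$(mktemp)"
            # Only descend if the download succeeded.
            if fetch "http://someurl/$id" "$page"; then
                parse_page "$page" "$((depth - 1))"
            fi
            rm -f "$page"
        fi
    done
}

# Example call (commented out because it needs network access):
# fetch 'http://subtitle.co.il/browsesubtitles.php?cs=movies' file.html
# parse_page file.html 1
```

With a depth of 1, this covers the two steps from the question: the ids on the first page are listed, and each corresponding page is downloaded and parsed once.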

EDIT: Added recursion, based on Barmar’s answer

Hope this helps a little =)
