I have a page exported from a wiki and I would like to find

Question

Asked: June 9, 2026 (2026-06-09T21:18:38+00:00)

I have a page exported from a wiki and I would like to find all the links on that page using bash. All the links on that page are in the form [wiki:<page_name>]. I have a script that does:

...
# First search for the links to the pages
search=`grep '\[wiki:' pages/*`

# Check if our search turned up anything
if [ -n "$search" ]; then
    # Now, we want to cut out the page name and find unique listings
    uniquePages=`echo "$search" | cut -d'[' -f 2 | cut -d']' -f 1 | cut -d':' -f2 | cut -d' ' -f 1 | sort -u`
....

However, when presented with a grep result with multiple [wiki: text in it, it only pulls the last one and not any others. For example if $search is:

Before starting the configuration, all the required libraries must be installed to be detected by Cmake. If you have missed this step, see the [wiki:CT/Checklist/Libraries “Libr By pressing [t] you can switch to advanced mode screen with more details. The 5 pages are available [wiki:CT/Checklist/Cmake/advanced_mode here]. To obtain information about ea – ”’Installation of Cantera”’: If Cantera has not been correctly installed or if you do not have sourced the setup file ”’~/setup_cantera”’ you should receive the following message. Refer to the [wiki:CT/FormulationCantera “Cantera installation”] page to fix this problem. You can set the Cantera options to OFF if you plan to use built-in transport, thermodynamics and chemistry.

then it only returns CT/FormulationCantera and doesn’t give me any of the other links. I know this is due to using cut, which keeps only one field per line, so I need a replacement for the uniquePages line.

Does anybody have any suggestions in bash? It can use sed or perl if needed, but I’m hoping for a one-liner to extract a list of page names if at all possible.

1 Answer

Editorial Team · Answer 1 · 2026-06-09T21:18:40+00:00

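grep -Eoh '\[wiki:[^]]*]' pages/* | sed 's/\[wiki://;s/]//' | sort -u

Here -o prints every match on its own line, so several links on the same input line are all reported. The -h option (available in GNU and BSD grep) suppresses the file name prefix that grep would otherwise add when pages/* expands to more than one file. egrep is a deprecated alias for grep -E.

Update: to also drop everything after the first space (the link label) without cut:

grep -Eoh '\[wiki:[^]]*]' pages/* | sed 's/\[wiki://;s/]//;s/ .*//' | sort -u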
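
If this is needed in more than one place in the script, the pipeline can be wrapped in a small function. The following is only a sketch: the name extract_wiki_links is made up for illustration, it assumes page names never contain a space or a closing bracket, and it relies on grep's -o and -h options (present in GNU and BSD grep, but not required by POSIX).

```shell
# Hypothetical helper (the name is an assumption, not from the question):
# print the unique page names of all [wiki:...] links in the given files.
extract_wiki_links() {
    # -o prints each match on its own line, so several links on one input
    #    line are all reported.
    # -h drops the "file:" prefix grep adds when given more than one file.
    # The pattern stops at the first space or ']', so the link label and
    # the closing bracket are never part of the match.
    grep -oh '\[wiki:[^] ]*' "$@" |
        sed 's/^\[wiki://' |   # strip the leading "[wiki:"
        sort -u                # keep each page name once
}

# Usage, assuming the exported pages live under pages/ as in the question:
#   uniquePages=$(extract_wiki_links pages/*)
```

Because the match itself ends at the first space or closing bracket, a single sed substitution is enough, and a link whose label is cut off before the closing bracket still yields its page name.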
