If you won’t have more than one match on a single line, I’d probably use sed:
sed -n -e 's%.*https://\([-.0-9A-Za-z]\{1,\}\.[A-Za-z]\{2,\}\).*%\1%p'
Given the data file:
Nothing here
Before https://example.com after
https://example.com and after
Before you get to https://www.example.com
And double your https://example.com for fun and happiness https://www.example.com in triplicate https://a.bb
and nothing here
The sed script prints at most one entry per input line. Because the leading .* is greedy, it shows the last entry when there’s more than one on the line:
example.com
example.com
www.example.com
a.bb
A Perl script can be used for multiple entries per line:
$ perl -nle 'print $1 while (m%https://([-.0-9A-Za-z]+\.[A-Za-z]{2,})%g);' data
example.com
example.com
www.example.com
example.com
www.example.com
a.bb
$
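If Perl isn’t available, a rough equivalent can be sketched with grep -o piped into sed. This assumes a grep that supports the -o option (GNU and BSD grep do, but POSIX doesn’t require it); the function name extract_hosts and the temporary sample file are only for illustration:

```shell
#!/bin/sh
# Print every host that follows "https://", one per output line.
# Assumption: grep supports -o (print only the matched text), which is
# a common extension rather than a POSIX requirement.
extract_hosts() {
    grep -oE 'https://[-.0-9A-Za-z]+\.[A-Za-z]{2,}' "$1" |
        sed 's%^https://%%'    # strip the prefix, keep the host
}

# Recreate the sample data in a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
Nothing here
Before https://example.com after
https://example.com and after
Before you get to https://www.example.com
And double your https://example.com for fun and happiness https://www.example.com in triplicate https://a.bb
and nothing here
EOF

extract_hosts "$data"
rm -f "$data"
```

On the sample data this should print the same six lines as the Perl version, since grep -o emits each match on its own line and sed then removes the https:// prefix.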