[Solved] sed, awk, perl or lex: find strings by prefix+regex, ignoring rest of input [closed]


If there is never more than one match on a single line, I’d probably use sed:

sed -n -e 's%.*https://\([-.0-9A-Za-z]\{1,\}\.[A-Za-z]\{2,\}\).*%\1%p' data

Given the data file:

Nothing here
Before https://example.com after
https://example.com and after
Before you get to https://www.example.com
And double your https://example.com for fun and happiness https://www.example.com in triplicate https://a.bb
and nothing here

The sed script prints at most one entry per matching line and nothing for lines without a match. Because the leading .* is greedy, it shows the last entry when there’s more than one on the line:

example.com
example.com
www.example.com
a.bb
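
To see the last-match behaviour in isolation, here is a minimal sketch that wraps the same sed command in a shell function. The function name last_host and the sample input line are assumptions made for illustration; they are not part of the original answer.

```shell
# Hypothetical helper: print the last https:// host on each matching line.
# With no file arguments, sed reads standard input.
last_host() {
    sed -n -e 's%.*https://\([-.0-9A-Za-z]\{1,\}\.[A-Za-z]\{2,\}\).*%\1%p' "$@"
}

# The greedy leading .* consumes the first URL, so only the second host is captured.
printf '%s\n' 'x https://a.bb y https://c.dd z' | last_host
# prints: c.dd
```

Lines without an https:// URL produce no output because of -n combined with the p flag on the substitution.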

A Perl script can handle multiple entries per line, because the g modifier makes the while loop visit every match in turn:

$ perl -nle 'print $1 while (m%https://([-.0-9A-Za-z]+\.[A-Za-z]{2,})%g);' data
example.com
example.com
www.example.com
example.com
www.example.com
a.bb
$
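
Since the question also mentions awk, the same loop can be sketched there with match() and substr(). This is an illustrative alternative, not part of the original answer: the function name all_hosts is hypothetical, and [A-Za-z][A-Za-z]+ stands in for [A-Za-z]{2,} on the assumption that some awk implementations do not support interval expressions.

```shell
# Hypothetical helper: print every https:// host, one per line.
all_hosts() {
    awk '{
        rest = $0
        # match() sets RSTART and RLENGTH for the leftmost-longest match.
        while (match(rest, /https:\/\/[-.0-9A-Za-z]+\.[A-Za-z][A-Za-z]+/)) {
            # Skip the 8-character "https://" prefix and print only the host.
            print substr(rest, RSTART + 8, RLENGTH - 8)
            # Continue scanning after the text just matched.
            rest = substr(rest, RSTART + RLENGTH)
        }
    }' "$@"
}

printf '%s\n' 'one https://example.com two https://www.example.com three https://a.bb' | all_hosts
```

Run as all_hosts data, it should list the same six hosts as the Perl version for the sample file.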

