Like this (taking one for the team :)? Using awk (note: it creates files named like Abc 1:2, or whatever is between <b> and <sup>):
$ awk '
BEGIN {
    FS="<sup>"                  # split at this delimiter
}
{
    if($1==p) {                 # first part equals first part of the previous line
        b=b " " $0              # append to the output buffer
    }
    else {                      # first part differs: write out the finished group
        if(NR>1) {              # nothing to print before the first line
            print b >> (t[n])
            # close(t[n])       # uncomment if needed
        }
        n=split($1,t,/<b>/)     # get the changing part
        b=$0                    # reset buffer
    }
    p=$1                        # remember for comparison on the next round
}
END {
    print b >> (t[n])           # flush the rest of the buffer
}' file
Output of cat Abc\ 1\:2:
<p><nsup></nsup> <b>Abc 1:2<sup>varied text <p><nsup></nsup> <b>Abc 1:2<sup>varied text
Depending on the awk flavor used, if you start running out of file descriptors, add a close(t[n]) after each print >> statement.
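For what it's worth, here is a rough sketch of the same idea with the close() calls already in place, so you can try it in a scratch directory. The sample lines (including the Def 3:4 one) and the variable name out are made up for illustration; the real input file isn't shown here. Remembering the file name in out, rather than reusing t[n], is just one way to keep the close() call next to the print.

```shell
#!/bin/sh
# Work in a throwaway directory so the appended (>>) output starts empty.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Hypothetical sample input: two lines share a prefix, the third does not.
cat > file <<'EOF'
<p><nsup></nsup> <b>Abc 1:2<sup>varied text
<p><nsup></nsup> <b>Abc 1:2<sup>varied text
<p><nsup></nsup> <b>Def 3:4<sup>other text
EOF

awk '
BEGIN { FS = "<sup>" }                 # split at this delimiter
$1 == p { b = b " " $0; next }         # same prefix: extend the buffer
{
    if (NR > 1) {                      # write out the finished group
        print b >> (out)
        close(out)                     # release the file descriptor right away
    }
    n = split($1, t, /<b>/)            # text after the last <b> is the file name
    out = t[n]                         # "out" is an illustrative name, not from the answer
    b = $0                             # start a new buffer
    p = $1                             # remember the prefix for the next line
}
END {
    if (NR > 0) {                      # skip the flush for empty input
        print b >> (out)
        close(out)
    }
}' file

cat 'Abc 1:2'
cat 'Def 3:4'
```

With this sample, each distinct prefix ends up in its own file, and a group's lines are joined with a single space, same as in the original script.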
solved Can it work together head, sed and regex into one bash script?