That’s XML, and parsing XML with regular expressions is a bad idea. The reason is that these XML snippets are semantically identical:
<ul type="disc">
<li
class="MsoNormal"
style="line-height: normal; margin: 0in 0in 10pt; color: black; mso-list: l1 level1 lfo2; tab-stops: list .5in; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto">
<span style="mso-fareast-font-family: 'Times New Roman'; mso-bidi-font-family: 'Times New Roman'">
<font size="3">
<font face="Calibri">Highlight the items you want to recover.</font>
</font>
</span>
</li>
</ul>
And:
<ul
type="disc"
><li
class="MsoNormal"
style="line-height: normal; margin: 0in 0in 10pt; color: black; mso-list: l1 level1 lfo2; tab-stops: list .5in; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"
><span
style="mso-fareast-font-family: 'Times New Roman'; mso-bidi-font-family: 'Times New Roman'"
><font
size="3"
><font
face="Calibri"
>Highlight the items you want to recover.</font></font></span></li></ul>
And:
<ul type="disc"><li class="MsoNormal" style="line-height: normal; margin: 0in 0in 10pt; color: black; mso-list: l1 level1 lfo2; tab-stops: list .5in; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><span style="mso-fareast-font-family: 'Times New Roman'; mso-bidi-font-family: 'Times New Roman'"><font size="3"><font face="Calibri">Highlight the items you want to recover.</font></font></span></li></ul>
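To make the problem concrete, here is a minimal sketch of how a regex that works on one layout silently fails on another. The pattern, the `naive_style` helper, and the shortened sample strings are all made up for illustration; they are not taken from your data.

```perl
use strict;
use warnings;

# Two spellings of the same element; only the whitespace inside the tag differs.
my $compact = q{<span style="color: black">text</span>};
my $split   = qq{<span\nstyle="color: black"\n>text</span>};

# Hypothetical naive pattern: assumes exactly one space after the tag name
# and the closing '>' immediately after the attribute value.
my $naive = qr{<span style="([^"]*)">};

# Hypothetical helper: returns the captured style, or undef when the regex misses.
sub naive_style {
    my ($xml) = @_;
    return $xml =~ $naive ? $1 : undef;
}

for my $xml ( $compact, $split ) {
    my $style = naive_style($xml);
    print defined $style ? "matched: $style\n" : "no match\n";
}
```

The first string matches and the second does not, even though an XML parser treats both as the same element. You can keep loosening the pattern, but you end up reimplementing a parser badly.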
So please, use a parser. Since you’ve tagged perl, I’m going to include a Perl solution:
use strict;
use warnings;
use XML::Twig;
XML::Twig->new(
    twig_handlers => {
        # Inside a handler, $_ is the element that was just parsed
        'span' => sub { print $_->att('style'), "\n" }
    }
)->parsefile('your_file.xml');
This will print, on a new line, the style attribute from each of your span elements. Once you’ve extracted this, you can turn it into key-value pairs by splitting on ; and using : as the key-value separator.
E.g.:
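use Data::Dumper;    # provides Dumper

# Inside the 'span' handler, where $_ is the current element:
my $style = $_->att('style');
my %styles = map { split( /\s*:\s*/, $_, 2 ) } split( /\s*;\s*/, $style );
print Dumper \%styles;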
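Putting the two pieces together, a self-contained sketch might look like the following. The sample XML is a trimmed-down version of your snippet, `parse_style` and `@spans` are names I have made up for illustration, and `parse` is used on a string only so the example runs without a file; with real data you would call `parsefile` as above.

```perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

# Hypothetical sample input, deliberately laid out with line breaks inside the tag.
my $xml = <<'XML';
<ul type="disc"><li class="MsoNormal"><span
style="mso-fareast-font-family: 'Times New Roman'; mso-bidi-font-family: 'Times New Roman'"
>Highlight the items you want to recover.</span></li></ul>
XML

# Hypothetical helper: turn "key: value; key: value" into a flat key/value list.
sub parse_style {
    my ($style) = @_;
    return map  { split( /\s*:\s*/, $_, 2 ) }
           grep { /:/ }                       # skip fragments with no colon
           split( /\s*;\s*/, $style );
}

my @spans;    # one hashref of declarations per <span> that has a style attribute

XML::Twig->new(
    twig_handlers => {
        span => sub {
            my ( $twig, $span ) = @_;
            my $style = $span->att('style');
            return unless defined $style;     # span without a style attribute
            my %declarations = parse_style($style);
            push @spans, \%declarations;
        },
    },
)->parse($xml);

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper \@spans;
```

Collecting the results into `@spans` rather than printing inside the handler is just one choice; it keeps the parsing separate from whatever you do with the values afterwards.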
But exactly what you do is as much a question of what you’re trying to accomplish.