Why couldn’t this regex
<post\\s*author=\"([^\"]+)\"[^>]+>[^</post>]*</post>
extract the author from the following text?
Because
[^</post>]*
represents a character class, not the literal closing tag. It matches any character except <, /, p, o, s, t, and >, zero or more times.
Your text doesn’t fit that: the content between the tags contains some of those characters, so the match fails before it can reach </post>. To fix it, consider using the following regex:
<post\s*author=\"([^\"]+?)\"[^>]+>(.|\s)*?<\/post>
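(Escape the backslashes and quotes appropriately in a Java String literal.)
Because (.|\s) already matches line breaks, this works on multi-line content without any flag. If you would rather write .*? for the body, compile the pattern with Pattern.DOTALL, which is the flag that lets . match line terminators; Pattern.MULTILINE only changes how ^ and $ behave.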
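To make the difference concrete, here is a minimal Java sketch. The sample element is made up for illustration (the text from the question is not reproduced here), and the class, constant, and method names are hypothetical. It runs the question's pattern, the pattern suggested above, and a DOTALL variant against the same input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PostAuthorDemo {
    // Hypothetical sample input: one extra attribute and a multi-line body.
    static final String SAMPLE =
        "<post author=\"alice\" id=\"1\">\nfirst line\nsecond line\n</post>";

    // Pattern from the question: [^</post>]* is a character class,
    // so it stops at the first '<', '/', 'p', 'o', 's', 't' or '>' in the body.
    static final Pattern BROKEN = Pattern.compile(
        "<post\\s*author=\"([^\"]+)\"[^>]+>[^</post>]*</post>");

    // Pattern from the answer: (.|\s)*? lazily consumes any character,
    // including line breaks, up to the first closing tag.
    static final Pattern FIXED = Pattern.compile(
        "<post\\s*author=\"([^\"]+?)\"[^>]+>(.|\\s)*?<\\/post>");

    // Alternative: plain .*? with DOTALL so that '.' also matches line terminators.
    static final Pattern FIXED_DOTALL = Pattern.compile(
        "<post\\s*author=\"([^\"]+?)\"[^>]+>.*?</post>", Pattern.DOTALL);

    // Returns the captured author, or null when the pattern does not match.
    static String extractAuthor(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println("broken: " + extractAuthor(BROKEN, SAMPLE));
        System.out.println("fixed:  " + extractAuthor(FIXED, SAMPLE));
        System.out.println("dotall: " + extractAuthor(FIXED_DOTALL, SAMPLE));
    }
}
```

On this sample the first pattern finds nothing, because the body contains the letter s (among others), while the other two capture the author. Note that [^>]+ requires at least one character between the closing quote of author and the >, so all three patterns assume the tag carries something after that attribute.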
solved extract information from xml using regular expression