[Solved] Extract information from XML using a regular expression

Question

Why couldn’t the regex
<post\\s*author=\"([^\"]+)\"[^>]+>[^</post>]*</post>
extract the author from the following text?

Accepted Answer

Why couldn’t the regex
<post\\s*author=\"([^\"]+)\"[^>]+>[^</post>]*</post>
extract the author from the following text?

Because

[^</post>]*

is a negated character class, not a negated string: it matches any single character except <, /, p, o, s, t, and >, repeated zero or more times.
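To see the effect, here is a minimal sketch that runs the pattern from the question against two made-up inputs. The class name, the helper method, and both sample strings are hypothetical (the original text is not shown here); only the regex comes from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CharClassDemo {
    // The pattern from the question, written as a Java string literal.
    static final Pattern ORIGINAL = Pattern.compile(
            "<post\\s*author=\"([^\"]+)\"[^>]+>[^</post>]*</post>");

    /** Returns the captured author, or null when the pattern does not match. */
    static String author(String xml) {
        Matcher m = ORIGINAL.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Body "12345" contains none of the excluded characters, so this matches.
        System.out.println(author("<post author=\"alice\" id=\"1\">12345</post>")); // alice
        // Body "hello" contains 'o', which the class excludes, so nothing matches.
        System.out.println(author("<post author=\"alice\" id=\"1\">hello</post>")); // null
    }
}
```

The second call fails because the class consumes "hell", stops at the 'o', and the literal </post> cannot match at that position.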

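Your post body almost certainly contains at least one of the excluded characters (any p, o, s, or t will do), so the class stops matching before it reaches </post> and the overall match fails. As for how to fix it, consider using the following regex: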

<post\s*author=\"([^\"]+?)\"[^>]+>(.|\s)*?<\/post>
// obviously, escape appropriate characters in Java String literals

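This pattern needs no flag to span line breaks, because (.|\s) already matches them. If you write .*? instead, compile the pattern with Pattern.DOTALL; Pattern.MULTILINE only changes how ^ and $ behave and has no effect on the dot.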
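The suggested pattern can be tried out with a short sketch like the one below. The class name, method name, and sample XML are hypothetical; the regex is the one from the answer, escaped for a Java string literal and compiled without flags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PostAuthorExtractor {
    // The answer's regex as a Java string literal: every backslash is doubled
    // and every double quote is escaped.
    static final Pattern POST = Pattern.compile(
            "<post\\s*author=\"([^\"]+?)\"[^>]+>(.|\\s)*?<\\/post>");

    /** Collects the author attribute (group 1) of every matching post element. */
    static List<String> authors(String xml) {
        List<String> result = new ArrayList<>();
        Matcher m = POST.matcher(xml);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input: one post body spans several lines, one does not.
        String xml = "<post author=\"alice\" id=\"1\">\nfirst post\n</post>\n"
                + "<post author=\"bob\" id=\"2\">second post</post>";
        System.out.println(authors(xml)); // [alice, bob]
    }
}
```

Note that [^>]+ requires at least one character between the closing quote of author and the >, so a tag with no other attribute, such as <post author="x">, would not match; [^>]* would relax that if your data needs it.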