First of all, you need to learn the basic syntax of NSRegularExpression patterns:
- The pattern does not contain delimiters.
- The pattern does not contain modifiers; you need to pass such info as options.
- When you want to use the meta-character \, you need to escape it as \\ in a Swift String literal.
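To make these three points concrete, here is a minimal sketch. It assumes Swift 5 or later for the raw string literal, and the sample input is made up for illustration:

```swift
import Foundation

// The pattern is a plain string: no /.../ delimiters and no trailing "i" modifier.
// In an ordinary Swift string literal, the regex escape \b has to be written as \\b.
let escaped = "\\bhref\\b"
// A raw string literal (Swift 5+) avoids the doubled backslash.
let raw = #"\bhref\b"#

// Case-insensitivity is requested through options, not through a modifier.
let regex = try! NSRegularExpression(pattern: escaped, options: .caseInsensitive)
let sample = "<A HREF=\"x\">"  // hypothetical input
let count = regex.numberOfMatches(in: sample, options: [],
                                  range: NSRange(sample.startIndex..., in: sample))
print(escaped == raw, count)
//-> true 1
```

Both literals produce the same pattern text, so which one you use is only a matter of readability.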
So, the line creating an instance of NSRegularExpression
should be something like this:
let regex = try NSRegularExpression(pattern: "<([a-z]*)\\b[^>]*>(.*?)</\\1>", options: .caseInsensitive)
But, as you may already know, your pattern does not contain any code to match href or capture its value.
Something like this would work with your example HTML:
import Foundation

// Group 1 captures the quoted href value; group 2 captures the inner text of the a-tag.
let pattern = "<a\\b[^>]*\\bhref\\s*=\\s*(\"[^\"]*\"|'[^']*')[^>]*>((?:(?!</a).)*)</a\\s*>"
let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)
let html = "<p>This is a simple text with some embedded <a\n" +
    "href=\"http://example.com/link/to/some/page?param1=77&param2=22\">links</a>.\n" +
    "This is a <a href=\"https://exmp.le/sample-page/?uu=1\">different link</a>."
let matches = regex.matches(in: html, options: [], range: NSRange(0..<html.utf16.count))
var resultDict: [String: String] = [:]
for match in matches {
    // range(at:) is the Swift 4+ spelling of rangeAt(_:).
    // Skip the quotes surrounding the href value (+1 at the start, -2 in length).
    let hrefRange = NSRange(location: match.range(at: 1).location+1, length: match.range(at: 1).length-2)
    let innerTextRange = match.range(at: 2)
    let href = (html as NSString).substring(with: hrefRange)
    let innerText = (html as NSString).substring(with: innerTextRange)
    resultDict[innerText] = href
}
print(resultDict)
//->["different link": "https://exmp.le/sample-page/?uu=1", "links": "http://example.com/link/to/some/page?param1=77&param2=22"] (key order may vary)
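If you would rather stay in native String than bridge to NSString, the same extraction could be sketched as below. The function name extractLinks and the tuple labels are my own, chosen for illustration, and the sketch assumes Swift 4 or later. It returns an array instead of a dictionary, so the links stay in document order and two links with the same text do not overwrite each other.

```swift
import Foundation

// Hypothetical helper: returns (text, href) pairs in document order.
func extractLinks(from html: String) -> [(text: String, href: String)] {
    let pattern = "<a\\b[^>]*\\bhref\\s*=\\s*(\"[^\"]*\"|'[^']*')[^>]*>((?:(?!</a).)*)</a\\s*>"
    // The pattern is a fixed literal known to compile, so try! is acceptable here.
    let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)
    let fullRange = NSRange(html.startIndex..., in: html)
    return regex.matches(in: html, options: [], range: fullRange)
        .compactMap { match -> (text: String, href: String)? in
            // Convert the UTF-16 based NSRange values back to String ranges.
            guard let hrefRange = Range(match.range(at: 1), in: html),
                  let textRange = Range(match.range(at: 2), in: html) else { return nil }
            // Group 1 includes the surrounding quotes, so drop the first and last character.
            let href = String(html[hrefRange].dropFirst().dropLast())
            return (text: String(html[textRange]), href: href)
        }
}

let links = extractLinks(from: "Go to <a href='https://exmp.le/a'>first</a> or <A HREF=\"https://exmp.le/b\">second</A>.")
for link in links {
    print(link.text, "->", link.href)
}
//-> first -> https://exmp.le/a
//-> second -> https://exmp.le/b
```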
Remember, my pattern above may mistakenly detect ill-formed a-tags or miss some nested structures, and it has no handling for HTML character entities…
If you want to make your code more robust and generic, you’d better consider adopting HTML parsers as suggested by ColGraff and Rob.