To follow up on Jens’ comment, we have to guess: what is your expected output when additional information appears, e.g. http://therealzenstar.blogspot.fr/somedata.html? Is it still blogspot.fr? Do such examples need to be addressed?
You said you want to replace “everything else” with "". Replace() will replace everything that is matched with what you want. So, to replace it with "", you’d need to match everything that you do not want. That is possible, but it’s much easier to capture what you DO want and replace the whole match with $1.
Assuming you always want only the domain.xx part, even if more information appears, something like this will work:
^(?:https?:\/\/)?[^\/\s]*\.([^.\s\/]*\.[^.\s\/]*)(?:$|\/.*)
as seen here: https://regex101.com/r/hN8iQ7/1
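As a rough illustration of the capture-and-replace idea, here is a minimal sketch in Python. The language is an assumption (the question's language is not shown here), and the name extract_domain is made up for the example. Python writes the backreference as \1 rather than $1; the pattern itself is copied unchanged from above.

```python
import re

# Pattern from the answer: group 1 captures "domain.xx"; everything
# around it (scheme, sub-domains, path) is part of the overall match.
DOMAIN_RE = re.compile(r"^(?:https?:\/\/)?[^\/\s]*\.([^.\s\/]*\.[^.\s\/]*)(?:$|\/.*)")

def extract_domain(url: str) -> str:
    # Replace the whole match with the captured group (\1 here, $1 in the answer).
    return DOMAIN_RE.sub(r"\1", url)

print(extract_domain("http://therealzenstar.blogspot.fr/somedata.html"))  # blogspot.fr
print(extract_domain("http://therealzenstar.blogspot.fr"))                # blogspot.fr
```

Because the pattern is anchored with ^ and consumes the path through the final alternative, the whole URL is matched and only the captured part survives the replacement.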
A problem arises if your domains also include ones with multi-part extensions, e.g. domain.co.uk. You’d need to address those specifically (by naming them), as it is very hard to find a general way to distinguish them.
^(?:https?:\/\/)?[^\/\s]*?\.([^.\s\/]*\.(?:co\.uk|[^.\s\/]*))(?:$|\/.*)
is the same pattern with the .co.uk option added: https://regex101.com/r/hN8iQ7/2
yourregex.Replace(yourstring, "$1") may do what you need.
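The .co.uk variant can be tried the same way. This is again a Python sketch under the same assumptions (the language is a guess, extract_domain_multi is a hypothetical name, and \1 stands in for $1); only co.uk is listed, as in the pattern above.

```python
import re

# Pattern from the answer with the co.uk alternative; further multi-part
# extensions would have to be added to the alternation by name.
MULTI_RE = re.compile(
    r"^(?:https?:\/\/)?[^\/\s]*?\.([^.\s\/]*\.(?:co\.uk|[^.\s\/]*))(?:$|\/.*)"
)

def extract_domain_multi(url: str) -> str:
    # Replace the whole match with the captured domain.
    return MULTI_RE.sub(r"\1", url)

print(extract_domain_multi("http://www.domain.co.uk/page"))                     # domain.co.uk
print(extract_domain_multi("http://therealzenstar.blogspot.fr/somedata.html"))  # blogspot.fr
```

Note that this pattern uses the lazy [^\/\s]*? for the sub-domain part. With the greedy form from the first pattern, www.domain.co.uk would be reduced to co.uk, because the greedy match takes as many labels as it can before the capture group starts.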