[Solved] Regex to extract only domain from sub-domains [duplicate]

Question

To reiterate Jens’ comment, we have to guess: what is your expected output when additional information appears, e.g. http://therealzenstar.blogspot.fr/somedata.html? Is it still blogspot.fr? Do such examples need to be addressed?

You said you want to replace “everything else” with "". Replace() replaces everything that is matched with the replacement you supply. So, to replace it with "", you’d need to match everything that you do not want. That is possible, but it is much easier to capture what you DO want and replace the whole match with $1.

Assuming you always want only the domain.xx, even if more information appears, something like this will work: ^(?:https?:\/\/)?[^\/\s]*\.([^.\s\/]*\.[^.\s\/]*)(?:$|\/.*), as seen at https://regex101.com/r/hN8iQ7/1
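As a rough illustration of the capture-and-replace idea, here is a minimal Python sketch using the pattern above. The question appears to target a .NET-style Replace(), so Python's re.sub is only a stand-in here, and the names DOMAIN_RE and extract_domain are made up for this example. Python writes the $1 back-reference as \1.

```python
import re

# Pattern from the answer: optional scheme, any leading subdomain labels,
# then capture the last two dot-separated labels of the host (group 1).
DOMAIN_RE = re.compile(
    r"^(?:https?:\/\/)?[^\/\s]*\.([^.\s\/]*\.[^.\s\/]*)(?:$|\/.*)"
)

def extract_domain(url: str) -> str:
    """Replace the whole match with capture group 1 (hypothetical helper)."""
    return DOMAIN_RE.sub(r"\1", url)

print(extract_domain("http://therealzenstar.blogspot.fr/somedata.html"))  # blogspot.fr
print(extract_domain("therealzenstar.blogspot.fr"))                       # blogspot.fr
```

Note that the pattern expects at least one subdomain label in front of the domain; a bare host such as blogspot.fr does not match and is returned unchanged.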

A problem arises if your domains also include those with multi-part extensions, e.g. domain.co.uk. You’d need to address them specifically (by naming them), as it is very hard to generalize a way to distinguish these cases.

With the .co.uk option added: ^(?:https?:\/\/)?[^\/\s]*?\.([^.\s\/]*\.(?:co\.uk|[^.\s\/]*))(?:$|\/.*) (https://regex101.com/r/hN8iQ7/2).

yourregex.Replace(yourstring, "$1") may do what you need.
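The .co.uk variant combined with the replace-with-$1 call might look like this in Python (again only a stand-in for the .NET-style call; MULTI_PART_SUFFIXES, SUFFIX_AWARE_RE and extract_registered_domain are hypothetical names). The sketch builds the alternation from an explicit list, which is one way of "naming" the multi-part extensions you care about.

```python
import re

# Hypothetical list of multi-part extensions to name explicitly; extend as needed.
MULTI_PART_SUFFIXES = ["co.uk"]

# Escape each suffix so the dots are matched literally, e.g. "co\.uk".
suffix_alternation = "|".join(re.escape(s) for s in MULTI_PART_SUFFIXES)

# Same shape as the second pattern in the answer: lazy subdomain part,
# then one label followed by either a named suffix or a single label.
SUFFIX_AWARE_RE = re.compile(
    r"^(?:https?:\/\/)?[^\/\s]*?\.([^.\s\/]*\.(?:"
    + suffix_alternation
    + r"|[^.\s\/]*))(?:$|\/.*)"
)

def extract_registered_domain(url: str) -> str:
    """Replace the whole match with capture group 1."""
    return SUFFIX_AWARE_RE.sub(r"\1", url)

print(extract_registered_domain("http://www.example.co.uk/page"))                    # example.co.uk
print(extract_registered_domain("http://therealzenstar.blogspot.fr/somedata.html"))  # blogspot.fr
```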

Accepted Answer