[Solved] how to remove all tag in c# using regex.replace [closed]


You should not use regex to parse HTML; use an HTML parser instead. Here is an example of how you can do it.

You need to add the HtmlAgilityPack NuGet package to your project (run this in the Package Manager Console):

Install-Package HtmlAgilityPack

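The code (it needs `using System.Collections.Generic;`, `using System.Linq;` and `using HtmlAgilityPack;` at the top of the file):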

    static void Main(string[] args)
        {
            string html = @"<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>

<p>My first paragraph.</p>

<table>
    <tr>
        <td>A!!</td>
        <td>te2</td>
        <td>2!!</td>
        <td>te43</td>
        <td></td>
        <td> !!</td>
        <td>.!!</td>
        <td>te53</td>
        <td>te2</td>
        <td>texx</td>
    </tr>
</table>

<h4 class=""nikstyle_title""><a rel=""nofollow"" target=""_blank"" href=""http://www.niksalehi.com/ccount/click.php?ref=ZDNkM0xuQmxjbk5wWVc1MkxtTnZiUT09&id=117""><span class=""text-matn-title-bold-black"">my text</span></a></h4>

</body>
</html>";

            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);

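            // Select every <h4> whose class attribute contains "nikstyle_title".
            List<HtmlNode> h4Nodes = doc.DocumentNode.Descendants().Where(x => x.Name == "h4" && x.Attributes.Contains("class") && x.Attributes["class"].Value.Contains("nikstyle_title")).ToList();

            // Empty each matching <h4>; the element itself (now empty) stays in the document.
            foreach (HtmlNode node in h4Nodes)
            {
                node.InnerHtml = "";
            }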

            string html2 = doc.DocumentNode.InnerHtml;
        }

EDIT:

For your second request, removing every <a></a> tag with `href="http://www.sample.com"`:

    static void Main(string[] args)
        {
            string html = @"<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>

<p>My first paragraph.</p>

<table>
    <tr>
        <td>A!!</td>
        <td>te2</td>
        <td>2!!</td>
        <td>te43</td>
        <td></td>
        <td> !!</td>
        <td>.!!</td>
        <td>te53</td>
        <td>te2</td>
        <td>texx</td>

    </tr>
</table>

<h4 class=""nikstyle_title""><a rel=""nofollow"" target=""_blank"" href=""http://www.sample.com""><span class=""text-matn-title-bold-black"">my text</span></a></h4>
<div><a rel=""nofollow"" target=""_blank"" href=""http://www.sample.com""><span class=""text-matn-title-bold-black"">my text</span></a></div>
</body>
</html>";

            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);

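            // Select every <a> whose href attribute contains the target URL.
            List<HtmlNode> anchorNodes = doc.DocumentNode.Descendants().Where(x => x.Name == "a" && x.Attributes.Contains("href") && x.Attributes["href"].Value.Contains("http://www.sample.com")).ToList();

            // Remove each matching <a> together with everything inside it.
            foreach (HtmlNode node in anchorNodes)
            {
                node.Remove();
            }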

            string html2 = doc.DocumentNode.InnerHtml;
        }
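
As a variation, the same removal can probably be written with an XPath query instead of the LINQ filter. This is only a sketch: `LinkRemover` and `RemoveSampleLinks` are made-up names for illustration, and it assumes your HtmlAgilityPack build exposes `SelectNodes` (the XPath extension is missing from some older portable builds).

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class LinkRemover
{
    // Hypothetical helper, not part of the original answer.
    public static string RemoveSampleLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath counterpart of the LINQ filter: every <a> whose href contains the URL.
        var anchors = doc.DocumentNode.SelectNodes("//a[contains(@href, 'http://www.sample.com')]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (anchors != null)
        {
            // Copy to a list first so removal never interferes with enumeration.
            foreach (HtmlNode node in anchors.ToList())
            {
                node.Remove();
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}

class Program
{
    static void Main()
    {
        string html = @"<div><a href=""http://www.sample.com""><span>my text</span></a></div>";
        Console.WriteLine(LinkRemover.RemoveSampleLinks(html));
    }
}
```

The null check matters here: unlike the `Descendants().Where(...)` version, `SelectNodes` gives you `null` when there is no match, so looping over the result directly would throw.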

Also, I personally prefer verbatim strings (prefixed with @) because they are more readable, as in my example. Inside a verbatim string you escape a double quote by doubling it, for example: class=""a"".
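
To make the quoting difference concrete, here is a small self-contained comparison. The class name `VerbatimStringDemo` is just for illustration; the HTML fragment is taken from the example above.

```csharp
using System;

class VerbatimStringDemo
{
    static void Main()
    {
        // Regular string literal: each embedded quote is escaped with a backslash.
        string regular = "<h4 class=\"nikstyle_title\">my text</h4>";

        // Verbatim string literal: prefix with @ and double each embedded quote.
        string verbatim = @"<h4 class=""nikstyle_title"">my text</h4>";

        // Both literals produce exactly the same characters.
        Console.WriteLine(regular == verbatim); // prints True
    }
}
```

A verbatim string can also span several lines, which is why it is convenient for pasting a whole HTML page into the source.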
