[Solved] Show each row that is a duplicate


Final update

After yet another goal post shifting, I’ve updated my CTE again.
This is the final update since even if you are going to change your demands again I’ve had enough.
Please take my advice for future questions:

  • Defind the problem as well as you can.
  • Provide the most accurate table structure and sample data as ddl+dml.
    Don’t link to sqlfiddle as it suffers from a lot of downtime.
  • Provide the most accurate expected output
  • Show your efforts to solve the problem.

Here is the last updated CTE, everything else remains as it was in the last update. sqlfiddle here.

;WITH CTE AS (

SELECT OriginalUrl, 
       NewUrl,
       RIGHT(OriginalUrl, 
             LEN(OriginalUrl) - 
                 CHARINDEX("https://stackoverflow.com/", OriginalUrl,
                     CASE WHEN PATINDEX('%://%/%', OriginalUrl) > 0 THEN
                         PATINDEX('%://%/%', OriginalUrl)+3
                     ELSE
                        0
                     END
                 )+1
             ) As Product
FROM MyTable 
)

Update

Updated sample data (again….)

-- Create sample table and data. 
-- DDL
create table MyTable (
  OriginalUrl varchar(50),
  NewUrl varchar(50)
)
-- DML
  insert into MyTable VALUES 
('/blog', '/blog'),
('http://gaming.corsair.com/blog', 'http://gaming.corsair.com/blog'),
('http://www.corsair.com/blog', 'http://www.corsair.com/blog'),
('http://www.corsair.com/es/blog', 'http://www.corsair.com/es/blog')

I’ve updated the CTE to include in ‘Product’ everything from the last “https://stackoverflow.com/” char, so product for the first 3 rows is now /blog, and for the last one it’s /es/blog. so everything after the protocol and domain parts of the URL is considered a product now. Note – this also should apply to https and any other protocol.

;WITH CTE AS (

SELECT OriginalUrl, 
       NewUrl,
       RIGHT(OriginalUrl, 
             LEN(OriginalUrl) - 
                 CHARINDEX("https://stackoverflow.com/", OriginalUrl,
                     PATINDEX('%://%', OriginalUrl)+3 -- get the location right after '://' 
                 )+1 -- get the location right after the first / after the patindex
             ) As Product
FROM MyTable 
)

My query on the new CTE remain as it was in the last attempt:

SELECT T1.OriginalUrl, T1.NewUrl,T2.Product 
FROM CTE T1 
INNER JOIN (
    SELECT Product
    FROM CTE
    GROUP BY Product 
    HAVING COUNT(1) > 1
) T2 ON(T1.Product = T2.Product)

Here is the output:

OriginalUrl                       NewUrl                            Product
---------------------------------------------------------------------------
/blog                             /blog                             /blog
http://gaming.corsair.com/blog    http://gaming.corsair.com/blog    /blog
http://www.corsair.com/blog       http://www.corsair.com/blog       /blog

You can play with it yourself on sqlFiddle.

Earlier versions

So, by duplicates you mean all rows that have the same string after the last / in OriginalUrl column?
If so, you might want to try this:

-- Create sample table and data. 
-- DDL
create table MyTable (
  OriginalUrl varchar(50),
  NewUrl varchar(50)
)
-- DML
  insert into MyTable VALUES 
('/blog', '/en-us/blog'),
('/blog', '/en-us/blog'),
('http://www.corsair.com/blog', 'http://www.corsair.com/blog'),
('http://gaming.corsair.com/blog', 'http://gaming.corsair.com/blog'),
('blablabla/blog', 'yadayada/blog'),
('I don''t see what is wrong with this/Answer', 'It seems to be working/Fine'),
('Unless my/Answer', 'assumes duplicates as something else then/you'),
('300r', '300r')

Note: You should use this way to provide sample data when asking questions about sql.
This way we can copy your DDL + DML to sqlfiddle or our own environments and actually test the answers we give.

I’ve Used a CTE to insulate the last word after the last / in your sample data,
so that I will only have to write the RIGHT expression once.

;WITH CTE AS (
SELECT OriginalUrl, 
        NewUrl,
        RIGHT(OriginalUrl, CASE WHEN CHARINDEX("https://stackoverflow.com/", OriginalUrl) > 0 THEN
                               CHARINDEX("https://stackoverflow.com/", REVERSE(OriginalUrl))-1
                           ELSE 
                               LEN(OriginalUrl)
                           END) As Product
FROM MyTable 
)

SELECT DISTINCT T1.OriginalUrl, T1.NewUrl,T1.Product 
FROM CTE T1 
INNER JOIN CTE T2
ON(T1.Product = T2.Product)
WHERE T1.OriginalUrl <> T2.OriginalUrl 

Update
Using the same CTE, try this:

SELECT T1.OriginalUrl, T1.NewUrl,T2.Product 
FROM CTE T1 
INNER JOIN (
    SELECT Product
    FROM CTE
    GROUP BY Product 
    HAVING COUNT(1) > 1
) T2 ON(T1.Product = T2.Product)

Results:

    OriginalUrl                                        NewUrl                                             Product
    -------------------------------------------------- -------------------------------------------------- --------------------------------------------------
    I don't see what is wrong with this/Answer         It seems to be working/Fine                        Answer
    Unless my/Answer                                   assumes duplicates as something else then/you      Answer
    /blog                                              /en-us/blog                                        blog
    /blog                                              /en-us/blog                                        blog
    http://www.corsair.com/blog                        http://www.corsair.com/blog                        blog
    http://gaming.corsair.com/blog                     http://gaming.corsair.com/blog                     blog
    blablabla/blog                                     yadayada/blog                                      blog

10

solved Show each row that is a duplicate