Final update
After yet another goal post shifting, I’ve updated my CTE again.
This is the final update since even if you are going to change your demands again I’ve had enough.
Please take my advice for future questions:
- Defind the problem as well as you can.
- Provide the most accurate table structure and sample data as ddl+dml.
Don’t link to sqlfiddle as it suffers from a lot of downtime. - Provide the most accurate expected output
- Show your efforts to solve the problem.
Here is the last updated CTE, everything else remains as it was in the last update. sqlfiddle here.
;WITH CTE AS (
SELECT OriginalUrl,
NewUrl,
RIGHT(OriginalUrl,
LEN(OriginalUrl) -
CHARINDEX("https://stackoverflow.com/", OriginalUrl,
CASE WHEN PATINDEX('%://%/%', OriginalUrl) > 0 THEN
PATINDEX('%://%/%', OriginalUrl)+3
ELSE
0
END
)+1
) As Product
FROM MyTable
)
Update
Updated sample data (again….)
-- Create sample table and data.
-- DDL
create table MyTable (
OriginalUrl varchar(50),
NewUrl varchar(50)
)
-- DML
insert into MyTable VALUES
('/blog', '/blog'),
('http://gaming.corsair.com/blog', 'http://gaming.corsair.com/blog'),
('http://www.corsair.com/blog', 'http://www.corsair.com/blog'),
('http://www.corsair.com/es/blog', 'http://www.corsair.com/es/blog')
I’ve updated the CTE to include in ‘Product’ everything from the last “https://stackoverflow.com/” char, so product for the first 3 rows is now /blog
, and for the last one it’s /es/blog
. so everything after the protocol and domain parts of the URL is considered a product now. Note – this also should apply to https
and any other protocol.
;WITH CTE AS (
SELECT OriginalUrl,
NewUrl,
RIGHT(OriginalUrl,
LEN(OriginalUrl) -
CHARINDEX("https://stackoverflow.com/", OriginalUrl,
PATINDEX('%://%', OriginalUrl)+3 -- get the location right after '://'
)+1 -- get the location right after the first / after the patindex
) As Product
FROM MyTable
)
My query on the new CTE remain as it was in the last attempt:
SELECT T1.OriginalUrl, T1.NewUrl,T2.Product
FROM CTE T1
INNER JOIN (
SELECT Product
FROM CTE
GROUP BY Product
HAVING COUNT(1) > 1
) T2 ON(T1.Product = T2.Product)
Here is the output:
OriginalUrl NewUrl Product
---------------------------------------------------------------------------
/blog /blog /blog
http://gaming.corsair.com/blog http://gaming.corsair.com/blog /blog
http://www.corsair.com/blog http://www.corsair.com/blog /blog
You can play with it yourself on sqlFiddle.
Earlier versions
So, by duplicates you mean all rows that have the same string after the last /
in OriginalUrl column?
If so, you might want to try this:
-- Create sample table and data.
-- DDL
create table MyTable (
OriginalUrl varchar(50),
NewUrl varchar(50)
)
-- DML
insert into MyTable VALUES
('/blog', '/en-us/blog'),
('/blog', '/en-us/blog'),
('http://www.corsair.com/blog', 'http://www.corsair.com/blog'),
('http://gaming.corsair.com/blog', 'http://gaming.corsair.com/blog'),
('blablabla/blog', 'yadayada/blog'),
('I don''t see what is wrong with this/Answer', 'It seems to be working/Fine'),
('Unless my/Answer', 'assumes duplicates as something else then/you'),
('300r', '300r')
Note: You should use this way to provide sample data when asking questions about sql.
This way we can copy your DDL + DML to sqlfiddle or our own environments and actually test the answers we give.
I’ve Used a CTE to insulate the last word after the last / in your sample data,
so that I will only have to write the RIGHT expression once.
;WITH CTE AS (
SELECT OriginalUrl,
NewUrl,
RIGHT(OriginalUrl, CASE WHEN CHARINDEX("https://stackoverflow.com/", OriginalUrl) > 0 THEN
CHARINDEX("https://stackoverflow.com/", REVERSE(OriginalUrl))-1
ELSE
LEN(OriginalUrl)
END) As Product
FROM MyTable
)
SELECT DISTINCT T1.OriginalUrl, T1.NewUrl,T1.Product
FROM CTE T1
INNER JOIN CTE T2
ON(T1.Product = T2.Product)
WHERE T1.OriginalUrl <> T2.OriginalUrl
Update
Using the same CTE, try this:
SELECT T1.OriginalUrl, T1.NewUrl,T2.Product
FROM CTE T1
INNER JOIN (
SELECT Product
FROM CTE
GROUP BY Product
HAVING COUNT(1) > 1
) T2 ON(T1.Product = T2.Product)
Results:
OriginalUrl NewUrl Product
-------------------------------------------------- -------------------------------------------------- --------------------------------------------------
I don't see what is wrong with this/Answer It seems to be working/Fine Answer
Unless my/Answer assumes duplicates as something else then/you Answer
/blog /en-us/blog blog
/blog /en-us/blog blog
http://www.corsair.com/blog http://www.corsair.com/blog blog
http://gaming.corsair.com/blog http://gaming.corsair.com/blog blog
blablabla/blog yadayada/blog blog
10
solved Show each row that is a duplicate