Using Keyword Clustering To Identify Keyword Cannibalization – Case Study No.2

Ever heard the term ‘content cannibalization’? It’s when your own web pages end up competing with each other, weakening their impact. This issue, though often missed, can drastically affect your site’s performance. It’s easy to fall into this trap, especially when your site has multiple writers adding content over time or if you’re in a franchise setup where everyone’s creating content without a centralized strategy. From my experience, it’s a common pitfall. In this guide, I’ll walk you through a straightforward way to identify and resolve these overlaps using keyword clustering.

Just to note: this method is especially useful for identifying keyword cannibalisation on editorial sites or blogs. If you’re dealing with e-commerce, travel, or property-type pages, we’ve got another guide tailored for that here.

Watch the video below or carry on reading, whichever you prefer. Note that the video doesn’t explain what to do once you’ve identified the keywords that are cannibalising each other. If you need this information, scroll to the bottom of the guide.

Step 1 – Choose Your Website for Inspection

For our demonstration, we’ll use Buzzoid. While perusing their blog section, I stumbled upon multiple posts targeting similar queries. Ever had that “Hmm, these seem very similar…” feeling?

I saw multiple posts that, over time, seemed to have been re-created rather than updated.

Step 2 – Gather the URLs for that site

There are multiple ways to extract URLs:

  • You can use tools like Screaming Frog or Sitebulb to crawl the site and extract the URLs.
  • You can grab the URLs from the website’s sitemap if available.

For this particular example, I used the sitemap method and the Scraper extension for Chrome to quickly copy all the URLs. If you’ve never accessed a website sitemap before, you can usually do so by appending “sitemap_index.xml” (or, on some sites, “sitemap.xml”) to the domain. For example, https://buzzoid.com/sitemap_index.xml.

Here is Buzzoid’s sitemap:

Then you can use the scraper extension to quickly copy all the URLs by right-clicking and selecting “scrape similar”.

Paste these into Google Sheets or Excel.
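If you would rather script this step than use a browser extension, the sketch below shows one possible way to pull every URL out of a sitemap with Python’s standard library. It is not part of the original workflow: the function name `extract_locs` and the sample XML are made up for illustration, and it assumes you have already downloaded the sitemap text.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> and <sitemapindex>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(sitemap_xml: str) -> list[str]:
    """Return every <loc> value from a sitemap or sitemap index document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample; a real sitemap would be downloaded first.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/how-to-get-more-followers/</loc></url>
  <url><loc>https://example.com/get-more-followers-fast/</loc></url>
</urlset>"""

urls = extract_locs(SAMPLE)
```

A sitemap index (such as sitemap_index.xml) lists child sitemaps rather than pages, so in practice you would run the same function on each child sitemap it returns.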

Step 3 – Filter only relevant URLs (optional)

For this exercise, I focused only on English URLs, so I removed all the other language variants.

But, of course, customize this step to fit your needs. The goal is to have a clean list of URLs to work with.
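As a rough illustration of this filtering, here is a small Python sketch. It assumes the language variants live under a language-code folder (for example /es/ or /de/), which may not match how your site is structured; the prefix list and function names are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical language folders; adjust to whatever your site actually uses.
LANGUAGE_PREFIXES = {"es", "de", "fr", "pt", "it"}


def is_english(url: str) -> bool:
    """Treat a URL as English unless its first path segment is a known language code."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return not segments or segments[0].lower() not in LANGUAGE_PREFIXES


def keep_english(urls: list[str]) -> list[str]:
    """Return only the URLs that are not under a language folder."""
    return [u for u in urls if is_english(u)]
```

If your translations sit on subdomains or use query parameters instead, the check inside `is_english` would need to change accordingly.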

Step 4 – Preparing the Data

Now, here’s where it gets fun:

  • Tokenize the URLs: Transform those URLs into keyword-like phrases by stripping the domain, slashes, and dashes. You can use the formula: =SUBSTITUTE(SUBSTITUTE(REPLACE(A2,1,LEN("https://buzzoid.com/"),""),"-"," "),"/","") but replace the domain with yours. Note that this formula deletes slashes outright, so a nested path such as blog/post-name becomes “blogpost name”. If your URLs have subfolders, swap the final "" for " " and wrap the whole formula in TRIM.
  • Add a ‘Search Volume’ Column: In a few steps’ time, you’ll be uploading this list into the Keyword Insights clustering tool, which requires a search volume input. You need to add this column, but the numbers can be made up, as they are purely for tool compatibility. In my example, I gave every row the arbitrary number “10”.
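The two bullets above can also be sketched in Python. This is an illustrative alternative to the spreadsheet formula, not the method used in the case study. It differs in that slashes become spaces (so nested folders stay as separate words), underscores are treated like dashes, and the result is lower-cased. The column names `url`, `keyword`, and `search_volume` are assumptions; map them to whatever the upload screen asks for.

```python
import csv
import io
import re
from urllib.parse import urlparse


def tokenize_url(url: str) -> str:
    """Turn a URL path into a keyword-like phrase, e.g. '/a-b/c/' -> 'a b c'."""
    path = urlparse(url).path
    # Replace slashes, dashes and underscores with spaces, then collapse whitespace.
    return re.sub(r"\s+", " ", re.sub(r"[/\-_]+", " ", path)).strip().lower()


def build_upload_csv(urls: list[str], placeholder_volume: int = 10) -> str:
    """Return CSV text with the URL, its tokenized keyword, and a dummy search volume."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "keyword", "search_volume"])
    for url in urls:
        writer.writerow([url, tokenize_url(url), placeholder_volume])
    return buffer.getvalue()
```

Keeping the original URL in the same file makes it easier to map clusters back to URLs later, although the clustering tool itself only needs the keyword and search volume columns.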

Step 5 – Upload to Keyword Insights

Download your refined list from Google Sheets or Excel. Ideally, download it in CSV format. Then:

  • Map and Match: Ensure you associate the tokenized keyword with its respective column. Remember the arbitrary search volume column? Yep, that goes in here too.
  • Generate: Hit that beautiful “Generate Report” button.

You might be scratching your head, thinking, “Why cluster these tokenised keywords?” Well, keyword clustering is all about grouping similar keywords together. When we talk about a “keyword cluster,” we mean a bunch of keywords that essentially mean the same thing and should ideally be targeted on one webpage. In our methodology, the tokenised keyword actually represents the URL. So, when we find two or more tokenised keywords (read: URLs) in the same cluster, it’s a clear sign that they should be merged into a single page. Put simply: multiple URLs doing the same thing? Not on our watch. We want just one.

Want a deep dive into keyword clustering? Check our guide here.

Step 6 – Pivot table time

Once the report is ready (you’ll receive an email notification), grab the Google Drive version and open it. Right away, you’ll spot content pieces stepping on each other’s toes, a classic sign of cannibalization.

Although the “keyword” in this report is just a tokenised version of the URL, I still prefer to convert it back to a URL, especially if I’m sending the report to a client to fix. To do that, we’ll use the VLOOKUP function to map each “keyword” back to its URL.

  • Open the Cluster, Context, and Rank tab: Open the hidden tab within the sheet.
  • Highlight and copy all the data: Then paste it into your other Google Sheet, the one where you originally pasted the URLs from the sitemap and tokenised them into keywords.
  • VLOOKUP Magic: Associate each URL with its cluster. To do this, use the VLOOKUP function to pull the cluster name into the same tab as your sitemap URLs and tokenised keywords. The data in Sheet11 below is simply the data I pasted in from the hidden tab in the step above.
  • Craft a Pivot Table: Highlight all the data and insert a pivot table.
  • Finally, make sure your pivot table has the following settings in the editor:

And your table should look like this:

As you can see, it’s now easy to see all the URLs that should be combined.
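For readers who prefer code to spreadsheets, the lookup and pivot steps can be approximated in a few lines of Python. This is only a sketch under assumptions: the two dictionaries stand in for the sitemap sheet (keyword to URL) and the exported report tab (keyword to cluster), and the function name is hypothetical.

```python
from collections import defaultdict


def find_cannibalising_urls(
    keyword_to_url: dict[str, str],
    keyword_to_cluster: dict[str, str],
) -> dict[str, list[str]]:
    """Group URLs by cluster and keep only clusters containing two or more URLs."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for keyword, url in keyword_to_url.items():
        cluster = keyword_to_cluster.get(keyword)  # plays the role of VLOOKUP
        if cluster is not None:
            clusters[cluster].append(url)
    # Plays the role of the pivot table: one row per cluster, URLs listed beneath it.
    return {name: sorted(urls) for name, urls in clusters.items() if len(urls) > 1}
```

Any cluster returned by this function contains at least two URLs, which is the same signal the pivot table surfaces visually.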

I once did this with a huge franchise company and found thousands of examples, purely because the content strategy wasn’t centralised. See below just one instance where they had SEVENTEEN pieces of cannibalising content.

Of course, this approach relies on your URLs being logically structured and carrying the actual keyword(s). If that’s not the case, circle back to Step 2. You can use tools like Screaming Frog to extract other relevant “keywords”, such as the <H1>, page title, or even the meta description. You’ll then have to do some slightly more complex VLOOKUP magic. Diving deeper into that is beyond this article’s scope, but if you need guidance, just give me a shout.

What do I do with cannibalising content?

Addressing cannibalizing content matters for SEO because it stops your own pages from competing with each other. Here’s what you can do:

  1. Identify the Stronger Piece: Before making changes, decide which content piece is more valuable or better-performing. Use metrics like organic traffic, user engagement, and backlinks to make this determination. If you used Screaming Frog in Step 2, you can connect its Google Analytics and Google Search Console API integrations to pull these metrics into the crawl. You can then include them in your VLOOKUP, so you know not only which URLs are competing with one another but also which one to keep.
  2. Merge Similar Articles: If you have multiple articles on the same topic, consider combining them into one comprehensive guide. This not only consolidates the value but can also be more useful for readers.
  3. 301 Redirect: After merging content, use a 301 redirect from the less valuable URL to the consolidated piece to retain SEO value and guide visitors to the right page.
  4. Re-optimize the Content: Update the title, meta description, and content itself to reflect the most relevant keywords and ensure it provides value.
  5. Update Internal Links: Ensure all internal links within your site that pointed to the old, cannibalizing content now point to the consolidated or chosen piece.
  6. Check External Backlinks: If you have significant backlinks to the pages you’re addressing, reach out to those sites and request they update the link to the new, consolidated URL.
  7. Avoid Future Cannibalization: Implement a content strategy that includes regular content audits. Before publishing new content, ensure it doesn’t overlap with what’s already present on your site.
  8. Monitor Performance: After making these changes, regularly check the performance of the consolidated content. Use tools like Google Analytics and Google Search Console to keep an eye on traffic, rankings, and user engagement.
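To make points 1 and 3 concrete, here is a minimal Python sketch that picks a “keeper” for one cluster and lists the redirects the other URLs would need. It is a deliberate simplification: it ranks pages by a single hypothetical `clicks` figure, whereas in practice you would weigh traffic, engagement, and backlinks together. The function name and inputs are assumptions for illustration.

```python
def plan_redirects(
    cluster_urls: list[str],
    clicks: dict[str, int],
) -> tuple[str, dict[str, str]]:
    """Pick the URL with the most clicks as the keeper and map the rest to it."""
    if not cluster_urls:
        raise ValueError("cluster_urls must not be empty")
    # URLs with no data count as zero clicks; ties are broken alphabetically
    # so the result is deterministic.
    keeper = min(cluster_urls, key=lambda u: (-clicks.get(u, 0), u))
    redirects = {url: keeper for url in cluster_urls if url != keeper}
    return keeper, redirects
```

The returned mapping of old URL to keeper is only a plan: the content still needs merging before the 301 redirects go live, and internal links should be updated to point at the keeper.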

Remember, the goal is to offer the best user experience and ensure search engines have a clear understanding of the subject and relevance of your content. By addressing cannibalization, you’re optimizing for both!

Wrapping up…

With these steps, you can quickly highlight potential content overlaps, suggesting which articles or pages might benefit from a merger.

Found lots of overlapping content? You’re not alone. In just a few minutes, I spotted numerous instances of cannibalization on Buzzoid. How does your site fare?

If any step got you tangled, drop a comment below. Always here to help!

Also – remember to check out our other case study on how we used keyword clustering to solve content cannibalisation. It uses a different methodology that you may find useful.

Andy Chadwick

Andy Chadwick is a digital marketing consultant, specializing in SEO. He has been in the industry since 2013 and worked with start-up companies (he grew his own start-up to a turnover of £2.5 million in 3 years) as well as international organizations. He’s also worked in-house as well as agency side. Andy runs a successful SEO consulting business in the UK as well as Snippet Digital SEO consultancy with Suganthan.
