
What is duplicate content? Questions and answers


Trip


In this post I try to clarify some issues that many people seem to be scared of.
I am referring to the "Learn More about the Canonical Link Element" post by Matt Cutts, which can be read here:
http://www.mattcutts.com/blog/canonical-link-tag-video/
I think the biggest duplicate content issue is that your domain can be reached under different URLs. If your internal/external linking is not consistent, this will surely have a negative impact.
Examples of duplicate content (a set of URLs that probably all lead to the same content):

Duplicate Content
These URLs are all different:
www.example.com
example.com
www.example.com/
example.com/
www.example.com/index.html
example.com/index.html
www.example.com/Home.aspx
example.com/Home.aspx
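As a rough illustration, here is a minimal PHP sketch of how all of the variants above could be collapsed into one canonical URL. It assumes www.example.com is the preferred host, index.html and Home.aspx are the only default documents, and plain http is used; the function name canonicalUrl is hypothetical and not part of PrestaShop.

```php
<?php
// Sketch: map the URL variants listed above onto one canonical URL.
// Assumptions (not from the post): "www." is the preferred host prefix,
// "index.html" and "Home.aspx" are the only default documents, scheme is http.
function canonicalUrl(string $url): string
{
    // Accept scheme-less input such as "example.com/index.html".
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }
    $parts = parse_url($url);
    $host = strtolower($parts['host'] ?? '');
    // Prefer the www host over the bare domain.
    if (strpos($host, 'www.') !== 0) {
        $host = 'www.' . $host;
    }
    // A missing path and "/" are treated as the same home page.
    $path = $parts['path'] ?? '/';
    // Strip default documents so "/index.html" and "/" collapse together.
    $path = preg_replace('#/(index\.html|Home\.aspx)$#i', '/', $path);
    return 'http://' . $host . $path;
}
```

With these assumptions, all eight variants return http://www.example.com/, while other paths keep their path and only get the www host.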

Matt also gives us solutions for handling this:
1. Change your Content Management System (CMS) to generate only the URLs you want ("normalize" URLs)
(built into PS; maybe not 100% perfect, but not that bad)
2. Pick one "canonical" URL and make sure you link to it consistently within your site
(this can be tricky, as the same product may appear under different categories. I personally rewrote the link generation of PS so that all links lead to the product page without the category in the URL. From my experience I am not quite sure whether this had a positive impact or not. It is also unclear whether extra keywords in the category part of the URL help. I think there is no clear yes or no: I have seen some shops that stuff keywords into the category part of the URL, but whether that really helps or hurts is hard to determine and probably also depends on the reputation of the website and many other factors)
3. Make all non-canonical URLs do a permanent (301) HTTP redirect to the canonical/preferred URL
(Matt's answer to most of these issues is the canonical URL tag. In older versions of PS it did not work, but I think they are improving it)
4. Google Webmaster Tools: specify www vs. non-www
(recommended and not so difficult to set up)
5. Break ties in Google by submitting your preferred URL in a Sitemaps file
(there are lots of modules out there that do this)
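Step 3 can be sketched in PHP as follows. The function redirectTarget and the preferred host www.example.com are hypothetical; the sketch only handles the www/non-www and /index.html cases from the list above, keeps any query string, and leaves the actual redirect (header() with status 301) commented out so the decision logic can be tested on its own.

```php
<?php
// Sketch of step 3: decide whether a request must be 301-redirected.
// Returns the canonical target URL, or null if the request is already canonical.
// The preferred host is a hypothetical default; replace it with your own domain.
function redirectTarget(string $host, string $requestUri, string $preferredHost = 'www.example.com'): ?string
{
    $path = parse_url($requestUri, PHP_URL_PATH) ?? '/';
    $query = parse_url($requestUri, PHP_URL_QUERY);
    // Collapse the default document onto the directory URL.
    $canonicalPath = preg_replace('#/index\.html$#i', '/', $path);
    if (strcasecmp($host, $preferredHost) === 0 && $canonicalPath === $path) {
        return null; // already canonical: serve the page normally
    }
    return 'http://' . $preferredHost . $canonicalPath . ($query !== null ? '?' . $query : '');
}

// In a front controller (not executed here), before any output is sent:
// $target = redirectTarget($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI']);
// if ($target !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }
```

Doing this check once, centrally, means every non-canonical variant answers with a single permanent redirect instead of serving a second copy of the page.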

Matt also shows some tough duplicate content issues:
Tough Duplicate Content Issues
Sometimes can't generate permanent/301 redirects
Can't help how people link to you
Uppercase/lowercase paths
Session IDs
Tracking codes, analytics, and landing pages
Sorting by ascending vs. descending
Breadcrumbs (the user's previous web page)
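For the session ID, tracking code and sorting cases, one option is to compute the canonical URL by dropping query parameters that do not change the content. This is a sketch under assumptions: the parameter names in NON_CANONICAL_PARAMS are hypothetical examples (check which ones your shop actually uses), and lowercasing the path is only safe if your server serves the same page regardless of case.

```php
<?php
// Hypothetical list of query parameters that do not change page content.
const NON_CANONICAL_PARAMS = ['sid', 'phpsessid', 'utm_source', 'utm_medium', 'utm_campaign', 'order'];

// Sketch: strip session IDs, tracking codes and sort parameters from a URL
// and lowercase its path, leaving all other parameters in place.
function stripNonCanonicalParams(string $url): string
{
    $parts = parse_url($url);
    $query = [];
    parse_str($parts['query'] ?? '', $query);
    foreach (array_keys($query) as $name) {
        if (in_array(strtolower((string) $name), NON_CANONICAL_PARAMS, true)) {
            unset($query[$name]);
        }
    }
    // Lowercase the path so "/Shoes.html" and "/shoes.html" map to one URL.
    $path = strtolower($parts['path'] ?? '/');
    $qs = http_build_query($query);
    return $parts['scheme'] . '://' . $parts['host'] . $path . ($qs !== '' ? '?' . $qs : '');
}
```

The result is what you would put in the canonical tag or redirect to; parameters that really select different content (such as a search term) are kept.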

Some of these concern PrestaShop and some do not.
Besides the techniques mentioned above, Matt again recommends using the canonical URL tag.
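The canonical URL tag is a link element with rel="canonical" placed in the page head. A minimal PHP sketch of emitting it, assuming you already know the preferred URL of the current page (how a PrestaShop module works that out is not shown here); the function name canonicalLinkTag is hypothetical:

```php
<?php
// Sketch: build the canonical link element for the <head> of a page.
// $canonicalUrl is assumed to be the already-normalized preferred URL.
function canonicalLinkTag(string $canonicalUrl): string
{
    // Escape the URL so characters such as & and " cannot break the attribute.
    return '<link rel="canonical" href="' . htmlspecialchars($canonicalUrl, ENT_QUOTES, 'UTF-8') . '" />';
}
```

For example, canonicalLinkTag('http://www.example.com/a.html') returns <link rel="canonical" href="http://www.example.com/a.html" />, which you would print inside <head> on every variant of that page.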
Before I go on with some other "facts", I would like to invite you to discuss the pros and cons of linking to the product page with or without the category in the URL.

Maybe there is someone out there who would like to share his or her experience?
My point of view is that, from a design perspective, it is easier to always link to product pages without the category in the URL, but you might also lose some keywords in the URL. What is your opinion?

So, contradicting some of my own thoughts in another post, duplicate content can have more severe negative impacts.
I'll quote some statements Matt Cutts made in another interview, which are summarized here:
http://www.marketingwords.com/blog/?p=769

1) Duplicate Content Pages Might Not Get Crawled
“Imagine we crawl three pages from a site, and then we discover that the two other pages were duplicates of the third page. We’ll drop two out of the three pages and keep only one, and that’s why it looks like [your site] has less good content. So we might tend to not crawl quite as much from that site.”
…”Typically, duplicate content is not the largest factor on how many pages will be crawled, but it can be a factor.”
In other words, duplicate content within your site can result in Google crawling fewer pages of your site. According to Matt, you have a certain “crawl budget”, an allotment of pages they are willing to crawl within your domain. Having the same content on multiple pages of your website means Google will likely crawl fewer pages of your site.
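To check this on your own shop, you could look for URLs that serve identical content. The PHP sketch below groups URLs by a SHA-1 hash of the page body; in Matt's example, three identical pages would end up in one group, of which only one would be kept. It is a simplification under assumptions: it only finds byte-identical pages (near-duplicates such as pages differing only in breadcrumb or session ID would need normalizing first), and fetching the pages is left out. The function name groupExactDuplicates is hypothetical.

```php
<?php
// Sketch: group URLs whose page bodies are byte-identical.
// $pagesByUrl maps URL => HTML body (fetched elsewhere).
// Returns a list of groups; each group lists URLs sharing the same content.
function groupExactDuplicates(array $pagesByUrl): array
{
    $groups = [];
    foreach ($pagesByUrl as $url => $html) {
        $groups[sha1($html)][] = $url;
    }
    // Keep only content hashes shared by more than one URL.
    return array_values(array_filter($groups, function (array $urls): bool {
        return count($urls) > 1;
    }));
}
```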

2) What About Ecommerce Product Pages that are Almost Identical?
Matt says the canonical tag is one answer. “There are a couple of things to remember here. If you can reduce your duplicate content using site architecture, that’s preferable. The pages you combine don’t have to be complete duplicates, but they really should be conceptual duplicates of the same product, or things that are closely related. People can now do cross-domain rel=canonical, which we announced last December.”

So I hope these thoughts and statements clarify and help with some issues. I also hope the PS team will take some of these issues into account, to build a better linking structure in the core files and perhaps further improve the canonical module. Although I have seen some improvements in the newest version, especially in international SEO, there is still room for improvement.
Greetz, trip

From what I've read in SEO articles so far, this issue depends entirely on your main URL (domain name).
The URL should fully describe your product/service in the shortest possible form, using single words when possible.
A good sign of such a URL is that you can read it as a normal sentence.

Good example:
Prestashop-free.com/modules/ogone-checkout.html
Prestashop-free.com/themes/greenhouse01.html




Bad example:
(if you decide to offer premium content on such a domain)

Prestashop-free.com/free-modules/ogone-checkout.html
Prestashop-free.com/premium-modules/ogone-checkout.html



The issue I am having with PrestaShop is that it removes the category from the product page URL when you have subcategories:

Prestashop-free.com/
Category ->        /modules/
Subcategory ->             /payment/ogone-checkout.html
                          /seo/etc.html



The product page URL would then be:
Prestashop-free.com/ogone-checkout.html

In this example it isn't that bad, because your main URL already partially covers the keywords (prestashop, free, ogone checkout).
Now imagine a site with a random domain such as 'prestaproject.com': in that case removing the category is not recommended.
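The two URL styles discussed here can be compared with a small PHP helper. productUrl is hypothetical and is not PrestaShop's Link class; passing a category path produces the nested form, and an empty path produces the flat form.

```php
<?php
// Sketch: build a product URL with or without its category path.
// Hypothetical helper for comparing the two styles, not PrestaShop code.
function productUrl(string $baseUrl, string $slug, array $categoryPath = []): string
{
    // Encode each category segment and the product slug separately.
    $segments = array_map('rawurlencode', $categoryPath);
    $segments[] = rawurlencode($slug) . '.html';
    return rtrim($baseUrl, '/') . '/' . implode('/', $segments);
}
```

For example, productUrl('http://prestashop-free.com/', 'ogone-checkout', ['modules', 'payment']) gives http://prestashop-free.com/modules/payment/ogone-checkout.html, and without the category array it gives http://prestashop-free.com/ogone-checkout.html. Whichever style you choose, link to that one form consistently and make it the canonical URL.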

trip wrote: "I personally rewrote the link generation of ps that all links are leading to the product page without category in the url."


I'm trying this too, but without any success. When I use the top navigation bar, the breadcrumb follows the path I navigated. When I click on a product from new products, search or last viewed products, the link also contains categories (all of them or only the last one). This is still driving me crazy.

How can I centrally get:

home > productname, without any categories?

The category links should stay in the breadcrumb as navigated; only the product link should be home > product name.

Can you please help me?

Hi cd2500,
I made this some months ago, so I am not sure if I am missing something. There might be smarter ways, but I changed the getProductLink() function in classes/Link.php:

   public function getProductLink($id_product, $alias = NULL, $category = NULL, $ean13 = NULL)
   {
       global $cookie;
       $lang_link = "";
       if (!isset($this->allow)) $this->allow = 0;
       // Prefix non-default languages with "lang-xx/" when friendly URLs are enabled
       if ($this->allow && $cookie->id_lang != Configuration::get('PS_LANG_DEFAULT'))
           $lang_link = "lang-".Language::getIsoById($cookie->id_lang)."/";
       // Product passed as an object: unchanged, so the category is still included here
       if (is_object($id_product))
           return ($this->allow == 1)?(_PS_BASE_URL_.__PS_BASE_URI__.$lang_link.(($id_product->category != 'home' AND !empty($id_product->category)) ? $id_product->category.'/' : '').intval($id_product->id).'-'.$id_product->link_rewrite.($id_product->ean13 ? '-'.$id_product->ean13 : '').'.html') :
           (_PS_BASE_URL_.__PS_BASE_URI__.'product.php?id_product='.intval($id_product->id));
       elseif ($alias)
           // Original version, which puts the category in the URL:
           //return ($this->allow == 1)?(_PS_BASE_URL_.__PS_BASE_URI__.$lang_link.(($category AND $category != 'home') ? ($category.'/') : '').intval($id_product).'-'.$alias.($ean13 ? '-'.$ean13 : '').'.html') :
           //(_PS_BASE_URL_.__PS_BASE_URI__.'product.php?id_product='.intval($id_product));
           // Modified version: the category is always left out ($category is ignored)
           return ($this->allow == 1)?(_PS_BASE_URL_.__PS_BASE_URI__.$lang_link.intval($id_product).'-'.$alias.($ean13 ? '-'.$ean13 : '').'.html') :
           (_PS_BASE_URL_.__PS_BASE_URI__.'product.php?id_product='.intval($id_product));
       else
           // No alias given: fall back to the plain product.php link
           return _PS_BASE_URL_.__PS_BASE_URI__.'product.php?id_product='.intval($id_product);
   }



In my case I have had no problems so far, but I would be careful, especially in a production environment. I have PS 1.2.5, but as far as I know Link.php did not change. You may also have to change your sitemap generation if you use one. To be honest, from what I have read over the last few days, a working canonical module would be the best practice, as modifying the PS core is a pain when it comes to updates: you have to remember all your modifications, transplant them into the new version and test them.
So use it at your own risk, trip

classes/link.php in 1.3.1.1 is completely different :-( and I have already commented out the categories there. But that does not solve my problem.

Duplicated content doesn't hurt your ranking; the only thing about it is that Google decides which of the duplicated pages to show :)

I read this somewhere in the Google Webmaster Help section.

This is controversial. If inbound links are spread across the duplicate URLs, you will split PR among them, so it hurts PR in this case, but not in general.
Earlier this URL was posted somewhere on this forum: http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicate-content-penalty.html
It is a common misunderstanding that DC (duplicate content) hurts your ranking.

PR and ranking are two different things: just because you have PR (PageRank) doesn't mean you rank well for your keywords.

Is it duplicate content if you advertise the same product on AdWords? Should I use nofollow on my website for the products I'm advertising, remove those links entirely, or what would be the right way?

Thank you,
Housy


I am facing a duplicate URL problem.

When I 301-redirect any friendly URL, it goes wrong, as shown in the following example:

URL I want to redirect to:
http://www.shoponbrowse.com/pakistan/nokia-mobiles/104-nokia-c3.html

URL it actually redirects to:
http://www.shoponbrowse.com/pakistan/nokia-mobiles/104-nokia-c3.html?id_product=104

It appends the extra characters "?id_product=104".

Because of that, Google indexes a duplicate URL.

How can I fix it?
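The post does not show the .htaccess, so the cause here is an assumption: a commonly reported one is mixing Apache's `Redirect 301` with the mod_rewrite rules that map friendly URLs to product.php?id_product=..., so the internal query string ends up on the redirect target; doing the redirect with a RewriteRule using the [R=301,L] flags is often suggested instead. If you generate redirects in PHP, a sketch of cleaning the target first (withoutQueryParam is a hypothetical helper; it ignores ports and fragments):

```php
<?php
// Sketch: remove one internal query parameter (e.g. "id_product") from a URL
// before using it as a redirect target. Other parameters are preserved.
function withoutQueryParam(string $url, string $param): string
{
    $parts = parse_url($url);
    if (!isset($parts['query'])) {
        return $url; // nothing to strip
    }
    parse_str($parts['query'], $query);
    unset($query[$param]);
    $base = $parts['scheme'] . '://' . $parts['host'] . ($parts['path'] ?? '/');
    $qs = http_build_query($query);
    return $base . ($qs !== '' ? '?' . $qs : '');
}
```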
