Scraping is the practice of copying content from other websites and republishing it in the hope of ranking in the search engines, possibly even higher than the original. The author of a scraper site indiscriminately copies large chunks of text (sometimes the whole thing) from an original source and places them on his or her own page. In doing so, they trust that the rank and popularity of the original page will carry over to their own. Very often they succeed. The method is an utter fraud, because it sidesteps the questions of authority and originality. It is not uncommon to see scraped sites overtake the original in search engine rankings, usually because the copycat has more presence on the web than the original.
Webmasters should not hesitate to submit a copyright claim if they believe a content scraper is outranking them. A large number of copyright claims against a website can lower its ranking in the SERPs.
Visit http://support.google.com/ to submit a copyright claim.
Tip: There are a few online services that can detect plagiarism, but if you want to find content theft quickly, copy key sentences with unique keywords from your text and paste them into Google. It’s possible some scrapers are outranking you already.
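The manual check in the tip can be partly automated. The Python sketch below picks the sentences richest in long, one-off words and turns each into an exact-match search URL. The function names, the word-length threshold and the scoring rule are illustrative assumptions, not an established method, and the sentence splitter is deliberately naive.

```python
import re
from collections import Counter
from urllib.parse import quote_plus


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def pick_key_sentences(text, count=3, min_word_len=7):
    """Return up to `count` sentences richest in long words that occur only once.

    The scoring rule is an assumption made for this sketch: a word counts as
    a "unique keyword" if it is at least `min_word_len` letters long and
    appears exactly once in the whole text.
    """
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(1 for t in tokens if len(t) >= min_word_len and freq[t] == 1)

    # sorted() is stable, so sentences with equal scores keep their order.
    return sorted(split_sentences(text), key=score, reverse=True)[:count]


def exact_match_url(sentence):
    """Build a search URL with the sentence in double quotes (exact phrase)."""
    return "https://www.google.com/search?q=" + quote_plus('"' + sentence + '"')
```

Open each URL in a browser and look for results that are not your own pages. Sending such queries automatically in bulk is a different matter and is best avoided.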
A disreputable practice
Some users get quite angry when they encounter scraped sites, even when those sites include references to the original source. AJ Kohn calls them “garbage” and “content pollution,” and thinks scraped sites are “the arterial plaque of the internet.” Google has set up a form that invites users to report scraper pages whenever they encounter them. However, this is far from solving the problem, since those who practice site scraping far outnumber those who are willing to report them. As a rule, internet users seem to be uninterested in the issue unless they are themselves affected.
What can be done
So far, scraping itself cannot be prevented, but you can fight back. One way of dealing with the issue of originality is to let the internet know who the original author of a particular piece of content is. To do so, the owner of the webpage should commit to updating their site as often as possible. This means posting new content regularly, which will increase the site’s visibility and help it outrun the scraper.
It’s well known that many site owners chase Google’s PageRank system. It is also widely held that Google takes into consideration not only the number of visitors but also the frequency of posting. So the more often relevant content is posted on the website, the more likely it is to appear prominently in the search results.
How to make scrapers look stupid
Fighting back is definitely worth trying, and in order to do so you don’t have to start a crusade or hire contract killers. You can take action by making scrapers look like what they really are: frauds. This can be achieved by moving part of your content to another page and linking your main content to it via a ‘read more’ link. The ‘read more’ prompt, followed by your URL, makes it necessary for readers to click the link to get the rest. Without the link, the remaining information can’t be accessed.
So if the user is a scraper, he or she will end up only with a truncated version of the content, which will indicate to their visitors that something’s wrong. The same goes if the scrapers acquire the content of your website from RSS feeds. By providing a ‘read more’ link after a cropped portion of text, you give the scraper only a slice of what you’ve produced, while the whole cake is only in your hands.
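As a rough illustration of the ‘read more’ approach, the Python sketch below cuts a post down to its first few words and appends a link back to the full article. The result could serve as the summary on an index page or in an RSS feed. The function name `make_excerpt`, the 40-word default and the HTML layout are assumptions made for this example. Most blogging platforms offer an equivalent built-in setting.

```python
from html import escape


def make_excerpt(body, permalink, max_words=40):
    """Return the first `max_words` words of `body` plus a 'Read more' link.

    Short posts are returned whole, still followed by the link to the source.
    """
    words = body.split()
    teaser = " ".join(words[:max_words])
    if len(words) > max_words:
        teaser += "..."  # signal that the text has been cut
    # Escape both the text and the URL so they are safe to embed in HTML.
    link = '<a href="{}">Read more</a>'.format(escape(permalink, quote=True))
    return "<p>{} {}</p>".format(escape(teaser), link)


if __name__ == "__main__":
    post = "By providing a link after a cropped portion of text you keep the whole cake."
    print(make_excerpt(post, "https://example.com/full-post", max_words=8))
```

A scraper that copies this summary gets only the teaser, and the link it carries points back to your own page.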