Famous enough to be Google Spammed

Is it now a measure of fame that our content is getting Google spammed?

Google spam is one of the ways sleazy operators game Google: they copy text from well-regarded websites onto their own sites and then cross-link those sites extensively. This apparently defeats Google's algorithms. Because the spam sites all link to each other, each one has plenty of incoming links (which is what PageRank rewards), and because the copied text is actually relevant to the query, Google ranks the site high. Of course, it's a bait-and-switch. The copied text is served only to the Google spider: when you or I click on the link, we get a page touting the kinds of things spammers tout.
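To see why cross-linking helps, here is a toy sketch of the textbook PageRank iteration. This is only my illustration of the idea, not what Google actually runs, and the page names ("original", "fan", "spam0" and so on), the damping factor and the iteration count are all made up for the example. One honest page gets a single incoming link, while five spam pages all link to each other.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical toy web: one honest page with a single fan, plus a link farm.
spam_pages = [f"spam{i}" for i in range(5)]
links = {"fan": ["original"], "original": []}
for page in spam_pages:
    # Each spam page links to every other spam page and to nothing else.
    links[page] = [other for other in spam_pages if other != page]

ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page:10s} {ranks[page]:.3f}")
```

In this toy setup every spam page ends up ranked above the original page, simply because the farm keeps passing its rank around inside itself instead of handing any of it to the rest of the web.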

Anyway, I was searching for a forum post from a while back, and since Google's search is better than the forum's search, I did what I usually do: typed the search into Google with "wdssii" prepended to it.

The first six results were all relevant: they all pointed to websites at Unidata or at NSSL talking about sparse grids, which is what I was searching for.  But starting at result #7, the Google spam took over.

The seventh site is Google spam. The quoted text is relevant: it was copied from somebody who built some sort of utility to process our data feed. Unfortunately, the link leads to a spammer.

The next result is good -- it's a citation of our work in a conference paper.

The ninth link is again Google spam. This time, the copied text comes from someone else who built a different utility. What does it say about Google's PageRank algorithm that the original websites of these people, who did reasonable work based on WDSS-II, are not listed, while the spammers who copied their text are shown as relevant?

The tenth link is again Google spam. In this case, the text is from a paper where I'm a coauthor. Our paper is not listed, but the spammer manages to become relevant by copying our text!
