MemeStreams | MemeStreams Discussion

This page contains all of the posts and discussion on MemeStreams referencing the following web page: Scraping and ad-stripping Google's results.

Scraping and ad-stripping Google's results
by Decius at 1:14 pm EST, Jan 11, 2005

] This step that we have taken has implications for all
] search engines. These engines crawl the public web
] without asking permission, and cache and reproduce the
] content without asking permission, and then use this
] information as a carrier for ads that generate private
] profit. We are convinced that if citizens scrape Google
] and strip the ads, and make the scraped results available
] as a nonprofit public service, that this is legal. This
] is especially the case if there are public policy
] concerns behind the scraping.
]
] Google Watch has been the most prominent critic of
] Google's outrageous privacy policies for more than two
] years. This is why we started the proxy, and it's why we
] continue the proxy. We invite Google to serve us with a
] cease and desist letter as a first step toward resolving
] this issue. So far, we have yet to hear from Google's
] lawyers. By releasing the source code for our proxy,
] we're trying to escalate the issue.

Google should not save all 4 octets of your IP address. There is no "good" use of that data. Of course, MemeStreams does this too, so yes, I'm being hypocritical, but I didn't just make a fortune in a public offering either. They should store a SHA1 hash of the last 2 octets so they can identify unique visitors without storing identifying data. And the cookie could use some end user control.

RE: Scraping and ad-stripping Google's results
by noteworthy at 10:39 pm EST, Jan 11, 2005

Decius wrote:
] Google should not save all 4 octets of your IP address.
] They should store a SHA1 hash of the last 2 octets so they
] can identify unique visitors without storing identifying data.

Of course, a SHA1 hash of two octets is completely reversible by brute force, so this doesn't really offer any strong form of protection, does it? It seems to be little more than obfuscation.
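To make the brute-force point concrete, here is a minimal sketch in Python. It assumes the two octets are hashed as two raw bytes with no salt; the post does not specify an encoding, so that detail and the function names are hypothetical. With only 256 × 256 = 65,536 possible inputs, the whole reverse table can be built in well under a second.

```python
import hashlib


def hash_last_two_octets(ip):
    """Hash the last two octets of a dotted-quad IPv4 address.

    Assumption: the octets are fed to SHA1 as two raw bytes, unsalted.
    """
    octets = [int(part) for part in ip.split(".")]
    return hashlib.sha1(bytes(octets[2:])).hexdigest()


def build_reverse_table():
    """Precompute the SHA1 digest of every possible pair of octets."""
    table = {}
    for third in range(256):
        for fourth in range(256):
            digest = hashlib.sha1(bytes([third, fourth])).hexdigest()
            table[digest] = (third, fourth)
    return table


# 65,536 entries: the entire input space of the proposed scheme.
REVERSE_TABLE = build_reverse_table()


def recover_last_two_octets(digest):
    """Look up the octets that produced a stored digest."""
    return REVERSE_TABLE[digest]
```

For example, `recover_last_two_octets(hash_last_two_octets("192.0.2.77"))` gives back the pair `(2, 77)`. A secret salt would block this particular table, but anyone holding the salt could rebuild it just as quickly.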

If this really mattered to us we would onion route our Google queries or pass them through a Crowd. Of course, "Tor" is the newest Swiss Army knife that everyone is talking about but no one is using.

RE: Scraping and ad-stripping Google's results
by Decius at 11:32 pm EST, Jan 11, 2005

noteworthy wrote:
] Of course, a SHA1 hash of two octets is completely reversible
] by brute force, so this doesn't really offer any strong form
] of protection, does it? It seems to be little more than
] obfuscation.

Well, I suppose so. How do they tabulate unique visitors without saving full IP information?

RE: Scraping and ad-stripping Google's results
by noteworthy at 1:22 am EST, Jan 12, 2005

Decius wrote:
] How do they tabulate unique visitors
] without saving full IP information?

This was (at least in part) the purpose of the HTTP cookie, was it not? The first time you visit, I give you a cookie. When you come back, you show me the cookie. I pay no attention to, and keep no record of, your IP address. Problem solved, win-win solution.
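A rough sketch of that cookie exchange, using only the Python standard library. The cookie name `visitor_id`, the in-memory set, and the `handle_request` function are all hypothetical, chosen for illustration; this is not a description of how Google or MemeStreams actually does it. Note that no IP address appears anywhere.

```python
import secrets
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"  # hypothetical cookie name
seen_visitors = set()       # stand-in for whatever store a real site would use


def handle_request(cookie_header):
    """Return (visitor_id, set_cookie_value) for one incoming request.

    cookie_header is the raw value of the request's Cookie header, or None.
    set_cookie_value is None when the visitor showed a cookie we issued.
    """
    cookie = SimpleCookie()
    if cookie_header:
        cookie.load(cookie_header)

    morsel = cookie.get(COOKIE_NAME)
    if morsel is not None and morsel.value in seen_visitors:
        # Returning visitor: they showed us the cookie, nothing new to record.
        return morsel.value, None

    # First visit (or an unrecognized cookie): issue a fresh random identifier.
    visitor_id = secrets.token_hex(16)
    seen_visitors.add(visitor_id)
    response_cookie = SimpleCookie()
    response_cookie[COOKIE_NAME] = visitor_id
    return visitor_id, response_cookie[COOKIE_NAME].OutputString()
```

The unique-visitor count is then just `len(seen_visitors)`. The count is only as good as the cookie: a visitor who clears or blocks cookies is counted again on every visit.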

Perversely, the advocates of "privacy" have effectively made the cookie crumble, leaving site operators with little alternative but to identify visitors (quite imperfectly, at that) on the basis of IP address.

And so it comes full circle. Is the cure worse than the disease?

Scraping and ad-stripping Google's results
by Acidus at 10:26 am EST, Jan 11, 2005

While I have been uncomfortable with the Google cookie and the saving of all 4 octets of an IP address since Decius pointed it out to me, this is not the best way to address it. It will be interesting to see the copyright implications if this escalates.