Google's web crawler linked to SQL Injection attempts

What do you do if a search engine's crawler is being used to attack your website?

In a recent blog post, Daniel Cid, CTO of Sucuri, a company that provides website security monitoring and related services, published details of a SQL Injection (SQLi) attempt. That in itself isn't anything major, since SQL Injection attempts happen quite frequently, but the source of the attempt certainly raises some eyebrows: it was Googlebot.

This raises an important question. Assuming you have logged proof that a search engine's crawler, such as Googlebot, was being used to attempt SQL Injection attacks, do you block the bot (thus preventing your domain from being indexed) or do you allow the malicious traffic?

"This is exactly what happened a few days ago on a client site; we began blocking Google’s IP addresses because of the structure of the requests which were in fact SQLi attacks. Yes, Google bots were actually attacking a website," Cid wrote.

- - [05/Nov/2013:00:28:40 -0500] "GET /url.php?variable=")%20declare%20@q% 20varchar(8000(%20select%20@q%20=%200x527%20exec(@q)%20-- HTTP/1.1" 403 4439 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"
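To see what the logged request is actually asking for, it helps to URL-decode the query value. The following is a minimal Python sketch using only the standard library. The encoded string is copied from the log entry above, with the stray space in `@q% 20varchar` removed on the assumption that it is a line-wrap artifact; nothing else has been altered.

```python
from urllib.parse import unquote

# Value of the "variable" parameter as it appears in the log entry
# (assumption: the space after "@q%" is a wrapping artifact and is dropped).
encoded = '")%20declare%20@q%20varchar(8000(%20select%20@q%20=%200x527%20exec(@q)%20--'

# Turn %20 and other percent-escapes back into literal characters.
decoded = unquote(encoded)
print(decoded)
# prints: ") declare @q varchar(8000( select @q = 0x527 exec(@q) --
```

Decoded, the value closes a quote and a parenthesis, declares a variable, assigns it a hex-encoded string, and passes it to exec(), followed by a trailing comment marker. The `declare @q` and `exec(@q)` syntax suggests the payload targets a T-SQL database such as Microsoft SQL Server.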

A quick dig on the IP and a few other checks confirmed it was in fact Google's bot, and not a spoofed IP. This wasn't the only case either; there were several request signatures in the logs, all of them from Google.
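The article does not say exactly which checks were run, but a common way to confirm that a request really came from Googlebot is a reverse DNS lookup on the IP followed by a forward lookup on the returned hostname. The sketch below assumes that approach. The function name `is_verified_googlebot` and the injectable `reverse_lookup` and `forward_lookup` parameters are illustrative choices, and the hostname suffixes are an assumption based on the domains Google documents for its crawler.

```python
import socket

# Assumption: hostname suffixes Google documents for Googlebot.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if ip reverse-resolves to a Google hostname
    that in turn resolves back to the same ip (IPv4 only)."""
    # Default to real DNS; callers may inject resolvers for offline testing.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        # Step 1: the PTR record must point into a Google-owned domain.
        if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: the hostname must resolve back to the original address,
        # otherwise the PTR record could simply have been forged.
        return ip in forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return False
```

The forward lookup matters because whoever controls an IP block also controls its reverse DNS records; only the round trip ties the address to a hostname Google actually serves.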

Google's bot, as well as those used by other search engines (Bing, Yahoo, Yandex, etc.), process your website in chunks. To these crawlers, it's all a big blob of text and metadata. As proven by the example above, the bots will follow links as directed, and they have no way of knowing if the link is malformed or if it is legitimate.

If they were able to tell the difference, then technically the bots would be sanitizing user input, which is out of scope for what they are designed to do, and augmenting their code to perform such checks would be impractical and nearly impossible. But how did this happen?

"Let’s assume we have an attacker, his name is John," Cid wrote, explaining one likely scenario for this type of attack.

John scans the Web and discovers enough data passively to create a list of possible weaknesses, including SQLi or Remote File Include vulnerabilities on Site B. From there, John goes to his site, Site A, and gets to work crafting his attack.

"...he adds all this awesome content about kittens and cupcakes, but in the process he adds a number of what appear to be benign links that are unsuspecting to the user reading, but very effective to the bot crawling the site. Those links are riddled with RFI and SQLi attacks that allow John to plead ignorance, also allowing him to stay two arms lengths away from Site B," Cid explained.

Again, blocking search engines directly is usually a bad idea. The only time it makes sense is when using robots.txt to deny access to various parts of the website. So if this type of issue were to occur on your domain today, what would you do?

Share your thoughts. Should the search engines do something to prevent this type of issue, or is this an issue for the webmaster to address?

Does your organization monitor search engine traffic when it comes to your WebAppSec defenses? Do you whitelist or ignore search engine traffic outright? If so, how would you deal with this type of attack vector?

I've reached out to Google and Sucuri for more information. I'll update if there's anything more to add.

Update 1:

In an email, Daniel Cid said that he hasn't seen any new examples of this issue taking place, but shortly after his blog was published, someone forwarded logs showing Google's bot being used to attempt Remote File Include attacks.

"We can't really blame Google, since they are just doing its job and crawling the sites," he said.

"The real problem is on the vulnerable sites without any form of protection. And the more I look at it, I think that attackers are doing it to try to bypass WAF's that allow Google IP addresses without any type of inspection (yes, they put Google IPs on their white lists).

"Overall, I don't blame Google for it and don't consider their fault. If the site is vulnerable, the bad guys would find another way to attack to exploit it, even if Google were blocking those SQL injection queries on their bots. And administrators have to be smart not to fully white list Google (or any search engine) IP address range."
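Cid's point about white lists can be illustrated with a small Python sketch in which crawler trust exempts a request from rate limiting but never from payload inspection. Everything here is hypothetical: the `decide` function, its parameters, and the handful of patterns are illustrative only and far narrower than a real WAF rule set.

```python
import re
from urllib.parse import unquote_plus

# Deliberately small, illustrative patterns; not a complete SQLi rule set.
SQLI_PATTERNS = [
    re.compile(r"\bdeclare\s+@\w+", re.IGNORECASE),
    re.compile(r"\bexec\s*\(", re.IGNORECASE),
    re.compile(r"\bunion\s+select\b", re.IGNORECASE),
]


def looks_like_sqli(query_string):
    """Return True if the URL-decoded query string matches any pattern."""
    decoded = unquote_plus(query_string)
    return any(pattern.search(decoded) for pattern in SQLI_PATTERNS)


def decide(query_string, trusted_crawler, over_rate_limit):
    """Return 'block', 'throttle', or 'allow' for a request."""
    # Payload inspection runs first and applies to everyone,
    # including verified search engine crawlers.
    if looks_like_sqli(query_string):
        return "block"
    # Trust only buys an exemption from rate limiting.
    if over_rate_limit and not trusted_crawler:
        return "throttle"
    return "allow"
```

With this ordering, a malicious link followed by a verified crawler is rejected just as it would be from any other client, while ordinary crawling remains unaffected and the site stays indexed.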
