As much as half of your site traffic may come from bots, and some are up to no good. Here are some ways to stop them.

Content scraping. Data theft. Comment spam. Every e-commerce site is under attack from criminals trying to steal pricing, customer identities, even your link juice, by posting links or comments on your site to take advantage of your domain authority. Automation makes the lives of honest citizens more convenient, but it also helps the bad guys. Hackers employ massive armies of hijacked computers, called zombies, to run malicious automated scripts, called bots, against your site, often from far across the globe. If you run an e-commerce site, this activity likely costs you real money.

Typically intended to perform simple and repetitive tasks, bots are scripts that enable their owner to do things quickly and at scale. For example, search engines like Google, Bing and Baidu use crawler bots to collect information from hundreds of millions of domains, and then index it to provide answers to our queries. Online stores favor these bots because they bring prominence to their sites, boosting them in search result pages. Other bots do good things, like those that routinely check website speed or report site availability problems.

But criminals use bots, too. They understand that bots can help them perform tedious and repetitive tasks at scale. Rather than trying to hack into one site at a time, or steal pricing and product information one page at a time, bots can do it all for them.

If you run an online store, how can you protect yourself?

There are ways to prevent online perpetrators from ripping you off. The first step is in understanding what the bad guys are up to.


It’s important to understand bot visits as a percentage of the overall traffic visiting your store. Here, I’ve got a dirty little secret to reveal: bots account for almost 50 percent of all traffic, according to Imperva research. This may come as a big surprise to e-commerce managers who thought their parking lot was full of shoppers, only to find half the cars empty. Worse, about half of those bot visitors are there to do harm to your business. We call these bad bots, and they fall into three broad categories: security probers, scrapers and spammers.

As the name suggests, the first type is there to probe your website for vulnerabilities. In particular, these bots perform actions like SQL injection attacks, which send bogus query commands masked in HTTP page requests. A common example pries into an e-commerce site’s user or customer database, trying to retrieve user names, passwords, account names and even credit card numbers.

SQL Injection Example [from https://www.incapsula.com/web-application-security/sql-injection.html]

An attacker wishing to execute SQL injection manipulates a standard SQL query to exploit non-validated input vulnerabilities in a database. There are many ways that this attack vector can be executed, several of which will be shown here to provide you with a general idea about how SQLI works.


For example, a benign input such as http://www.estore.com/items/items.asp?itemid=999, which pulls information for a specific product, can be altered to read http://www.estore.com/items/items.asp?itemid=999 or 1=1.

As a result, the corresponding SQL query looks like this:

SELECT ItemName, ItemDescription
FROM Items
WHERE ItemNumber = 999 OR 1=1

And since the statement 1 = 1 is always true, the query returns all of the product names and descriptions in the database, even those that you may not be eligible to access.
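To make the mechanics concrete, here is a minimal sketch in Python using an in-memory SQLite database (the table and sample data are invented for illustration). It shows how naive string concatenation lets the always-true condition return every row:

```python
import sqlite3

# Throwaway in-memory database mirroring the Items table in the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Items (ItemNumber INTEGER, ItemName TEXT, ItemDescription TEXT)"
)
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [(999, "Widget", "A widget"), (1000, "Gadget", "A gadget")],
)

# Naive string concatenation: the attacker-supplied query-string value
# is pasted straight into the SQL text.
item_id = "999 OR 1=1"  # attacker-controlled input
query = (
    "SELECT ItemName, ItemDescription FROM Items "
    "WHERE ItemNumber = " + item_id
)

rows = conn.execute(query).fetchall()
# Because 1=1 is always true, every row comes back, not just item 999.
print(rows)
```

Running this prints both products, even though the request nominally asked for item 999 alone.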

Attackers are also able to take advantage of incorrectly filtered characters to alter SQL commands, including using a semicolon to terminate one statement and append another.

For example, this input http://www.estore.com/items/items.asp?itemid=999; DROP TABLE Users would generate the following SQL query:


SELECT ItemName, ItemDescription
FROM Items
WHERE ItemNumber = 999; DROP TABLE USERS

As a result, the entire user database could be deleted.


Another way SQL queries can be manipulated is with a UNION SELECT statement, which combines two unrelated SELECT queries to retrieve data from different database tables.

For example, the input http://www.estore.com/items/items.asp?itemid=999 UNION SELECT username, password FROM USERS produces the following SQL query:

SELECT ItemName, ItemDescription
FROM Items
WHERE ItemID = '999' UNION SELECT Username, Password FROM Users;

Using the UNION SELECT statement, this query combines the request for item 999’s name and description with another that pulls names and passwords for every user in the database.
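The standard defense against all of the above is the parameterized query, where the driver sends user input separately from the SQL text so it can only ever be treated as a literal value, never as SQL. A minimal sketch, again with an invented SQLite table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Items (ItemID TEXT, ItemName TEXT, ItemDescription TEXT)"
)
conn.execute("INSERT INTO Items VALUES ('999', 'Widget', 'A widget')")

# The same UNION SELECT payload from the example above.
malicious = "999' UNION SELECT Username, Password FROM Users --"

# Parameterized query: the "?" placeholder binds the value as data,
# so the payload is compared as a plain string and the UNION never runs.
rows = conn.execute(
    "SELECT ItemName, ItemDescription FROM Items WHERE ItemID = ?",
    (malicious,),
).fetchall()
print(rows)  # [] - no item has that literal string as its ID
```

Every mainstream database API offers an equivalent placeholder mechanism; the key point is that the query text never changes based on user input.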

Next, the scraping bots look to steal your proprietary and valuable site content. These bots try to figure out what you’re selling so their operators can undercut your prices at their own stores. They’re typically run by competitors and fall into a legal gray area: while the technique is ubiquitous, it’s not clearly legal. In fact, a number of laws may apply to unauthorized scraping, including contract, copyright and trespass-to-chattels laws.

They might also be trying to steal your content outright, providing a shortcut in setting up their stores.


Making up the last category, spamming bots are more of a nuisance. If you have an area on your site where users can leave comments or reviews, you might find it inundated with spammers inserting links to their own sites. This both reduces the quality of your store experience and creates a lot more cleanup work for your team. The notorious Semalt bot is one well-known offender.
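As a toy illustration of how a store might screen submissions, here is a simple heuristic, invented for this sketch and nowhere near a production filter, that flags comments carrying an unusual number of links:

```python
import re

# Matches http:// or https:// anywhere in the comment text.
LINK_RE = re.compile(r"https?://", re.IGNORECASE)

def looks_like_link_spam(comment: str, max_links: int = 2) -> bool:
    """Flag comments with more links than a human review usually has."""
    return len(LINK_RE.findall(comment)) > max_links

print(looks_like_link_spam("Great product, works as described!"))  # False
print(looks_like_link_spam(
    "cheap deals http://a.example http://b.example http://c.example"
))  # True
```

Real spam defenses layer many such signals (rate limits, CAPTCHAs, reputation services) rather than relying on one rule.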

How can you stop these unwanted activities?

Knowing your traffic is a good place to start. Understanding the relative percentage of humans versus bots will provide a true picture of who’s visiting your site. In fact, you might discover your conversion rates increase if you only count human activity, versus bot visitors who never convert.
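As a rough sketch of how you might estimate that split from your own access logs (the log lines and keyword list below are fabricated for illustration), user-agent strings give a first approximation:

```python
from collections import Counter

# Toy log lines in the common Apache/Nginx "combined" format,
# where the user agent is the final quoted field.
LOG_LINES = [
    '1.2.3.4 - - [..] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    '5.6.7.8 - - [..] "GET / HTTP/1.1" 200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '9.9.9.9 - - [..] "GET / HTTP/1.1" 200 512 "-" "python-requests/2.31"',
]

# Crude keyword matching; commercial tools use far richer signals
# (IP reputation, behavior, JavaScript challenges).
BOT_HINTS = ("bot", "crawl", "spider", "python-requests", "curl")

counts = Counter()
for line in LOG_LINES:
    agent = line.rsplit('"', 2)[-2].lower()  # last quoted field
    counts["bot" if any(h in agent for h in BOT_HINTS) else "human"] += 1

print(counts)  # Counter({'bot': 2, 'human': 1})
```

Even this crude split can reshape your analytics: conversion rate per human visit is a far more honest metric than conversion rate per raw visit.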

You could try using robots.txt to shield your site, but it isn’t effective against bad bots: they don’t adhere to the rules and will simply ignore its directives. Worse, some bad bots are coded to look inside robots.txt for hidden gems (e.g., private folders, admin pages) the site owner wants to keep out of Google’s index.
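For reference, a minimal robots.txt looks like this (the paths are hypothetical examples); note that it is purely advisory:

```text
# Honored by well-behaved crawlers only; bad bots ignore these rules.
User-agent: *
Disallow: /cart/
Disallow: /checkout/
```

Because the file is public, never list genuinely sensitive paths in it; protect those with authentication instead.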


A better option might be the open source project ModSecurity, a web application firewall module that protects websites from attacks. It works in two ways: You can set up a whitelist that only allows certain bots on your site, or you can use a blacklist that blocks known bad bots. Using ModSecurity is free, but requires a web developer with reasonable skills. That person also has to maintain the list, which can be time-consuming. The biggest issue, however, pertains to false positives—where a legitimate bot or user is mistakenly flagged.
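For a sense of what a blacklist entry looks like, here is a sketch of a ModSecurity rule (the rule id and list file name are placeholders) that denies requests whose User-Agent matches a maintained list of known bad bots:

```apache
# Deny any request whose User-Agent matches an entry in bad-bots.txt
# (one pattern per line). Rule id and file name are hypothetical.
SecRule REQUEST_HEADERS:User-Agent "@pmFromFile bad-bots.txt" \
    "id:100001,phase:1,t:lowercase,deny,status:403,msg:'Known bad bot'"
```

The list file is exactly the part that needs ongoing maintenance, and overly broad patterns are how the false positives mentioned above creep in.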

Ultimately, if bots persist, or false positives are high, you may want to consider investing in a commercial bot-blocking product. These come in two forms: an appliance you install in your data center, or a cloud service that inspects traffic coming to your site. Some e-commerce hosting providers, like the Australian company Neto, also offer bot protection as part of their platform.

Friend or foe?

Bad bots are not going away and are projected to increase in activity. With the success criminals and competitors have had, we can only expect more and more bad bots probing, scraping, and spamming our sites.


If you’re running an e-commerce site, find out who’s visiting it and benchmark the experience your visitors are having. Unwanted bot traffic, in the extreme a DDoS attack, can slow down or crash your website. Bots can both benefit and harm your site; make sure yours attracts the good bots and human visitors, and keeps the bad ones out.

Imperva is a provider of cybersecurity technology and services.
