Hey WordPress Community, if you’re like me and run a series of what are called “self-hosted” WordPress blogs, then it’s a sure bet you’re dealing with visits from all kinds of “bots.”
Now, I’m by no stretch of the imagination the all-high-and-mighty expert on this, and there are many in the WordPress Community who are, but I do like to share my experiences, connect, and learn. So, here we go!
Now, if you have a free WordPress site, and don’t care about backend matters, then you’re probably not aware of bots, or think the term refers to something in the way of what journalists think are rogue Twitter accounts. (Those accounts are constantly mistaken for bots, even when real people are operating them!)
But, to get this out of the way, the basic definition of a “bot” in our context is as provided by the Google Dictionary:
“an autonomous program on a network (especially the Internet) that can interact with computer systems or users, especially one designed to respond or behave like a player in an adventure game.”
There Are A Lot Of WordPress Bad Bot Lists
But, in the narrow context of WordPress, we’re not worried about “a player in an adventure game” – we’re worried about any bot that may slow down our site or take it down entirely. Those are called “bad bots.” There are many lists of “bad bots” – take this one I found at “Blocking robots on your web page – the list of 1800 bad bots.”
Then, there are bad bots that have been presented to us in popular culture, and I’d bet you’d never know what they were unless you did the sort of thing we do as WordPress blog developers and managers, which is spend much of our days and weeks looking out for invasions by bots we don’t want.
Take this scene from The Social Network, the 2010 movie about how Facebook CEO Mark Zuckerberg came to start the website that’s become a world of its own, and the legal problems that surfaced as a result. But I’m not focusing on the Winklevi here (short-hand for the Winklevoss twins), so let’s continue.
Let’s take a look at a scene in The Social Network that’s one hell of a capture of college web programming life that everyone in the WordPress community should see, over and over again:
That scene is cool on so many levels, but I’ll pick two:
First, it accurately captures how students in colleges like Harvard and (my experience) Berkeley talk: fast. I have to be honest: I went to undergraduate school at the University of Texas at Arlington (I’m not from Texas, and just picked the place out of an interest in urban planning), and while there were scores of smart people there, Berkeley, where I attended grad school, is on another planet.
The one thing you get used to is having light-speed conversations about everything. Ideas fly! And the speed of that way of talking comes to define your life so much that you’re not aware of it until three years after you’re away from it.
The Social Network and Jesse Eisenberg, the actor who got a well-deserved Oscar nomination for his role as Mark Zuckerberg, got it right. So, the next step is what you need to do…
Second, the scene tells you in a very basic way what he did to gather the content needed to make something called “Facemash,” and considering how he got it, you should really focus here. The reason is the reference to that “Wget magic” – folks, Wget is a command-line download tool that can be used as a kind of scraping bot. So, unless you want to unwittingly help someone create the next Facemash using your site, you should block it.
In fact, you should block any hotlinking or scraping of your site. Build that fortress of solitude so you and your friends and colleagues can vlog and blog with freedom!
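To make the idea concrete, here is a minimal Python sketch of the kind of check a blocker runs: look at the User-Agent header a visitor sends and refuse it if it matches a blocklist. The list contents and the function name are my own assumptions for illustration (only Wget comes from the scene above), and keep in mind a scraper can fake its User-Agent, so treat this as a first fence and not a fortress.

```python
import re

# Hypothetical blocklist: substrings of the User-Agent header to refuse.
# "wget" comes from the discussion above; "httrack" is an assumed extra entry.
BLOCKED_AGENT_PATTERNS = ["wget", "httrack"]

# One case-insensitive pattern built from the escaped substrings.
_BLOCKED_RE = re.compile(
    "|".join(re.escape(p) for p in BLOCKED_AGENT_PATTERNS),
    re.IGNORECASE,
)


def is_blocked_agent(user_agent: str) -> bool:
    """Return True when the User-Agent header matches a blocked pattern."""
    if not user_agent:
        # An empty header is not on the list; decide separately how to treat it.
        return False
    return _BLOCKED_RE.search(user_agent) is not None
```

Calling is_blocked_agent("Wget/1.21.3") returns True, while a browser or search engine User-Agent that contains none of the listed substrings returns False.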
Block Wget But Yandex Bots Are Not As Bad As You Think
A lot of people scream about something called the Yandex Bots, and for good reason: that damn thing can beat the hell out of your site and weigh it down – in other words, slow down the servers, your servers, and thus your site, and pull down your search engine rankings in the process. But that doesn’t mean the Yandex Bot is all bad. It just means your site is being crawled by Yandex, the search engine over in Russia, and you’re popular there.
Plus, Yandex is helpful at showing you how to limit the beat-down its bots can exact on your server!
Personally, I’m all for having my site viewed in Russia, China, Africa, the Moon, Mars, and even Alpha Centauri! So, if you’re like me, and want to be universally popular, but don’t want Yandex slowing down your site, do one of three things: slow its crawl rate down, block the really bad bots that don’t read your content for search engines, or both.
I vote for both.
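Here is a rough Python sketch of what “both” can look like in a robots.txt file, checked with the standard library’s urllib.robotparser. The rules and the 10-second delay are assumptions I made up for the example, not recommended settings. Also, robots.txt is advisory: whether a crawler honors Crawl-delay, or the file at all, is up to the crawler, so check Yandex’s own webmaster documentation for how it currently wants its crawl rate limited.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow Yandex down, shut Wget out entirely.
# The 10-second delay is an assumed value for illustration only.
ROBOTS_TXT = """\
User-agent: Yandex
Crawl-delay: 10

User-agent: Wget
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawl_policy(agent: str, path: str = "/"):
    """Return (allowed, delay) for an agent according to the rules above."""
    return parser.can_fetch(agent, path), parser.crawl_delay(agent)


if __name__ == "__main__":
    for agent in ("YandexBot", "Wget/1.21.3", "Googlebot"):
        print(agent, crawl_policy(agent))
```

With these rules, the Yandex crawler is still allowed in but asked to wait between requests, Wget is refused, and any agent not named in the file is allowed with no delay.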
Not All Bots Are Bad Just The Ones You Don’t Want
In closing (because I could write more about this and not have any more time in the day) figuring out what bot to allow depends on how you want your site to behave and be seen – and how much bandwidth you can stand.
Don’t be afraid to save your server and thus your blog from being bombarded by a huge list of bots. I was half-joking when I blogged about being popular; don’t be stupid and harm your site’s performance.