Search Engines 101

I think the biggest "generation gap" I've noticed between myself and my students is that I am old enough to remember a world without Google. This is not as big as remembering a world without the web, but Google still has some important implications that change how people do research.

(It might be best for you to read the next paragraph aloud in your best "cranky old man" voice)

Back in my day, you see, search engines were bad! Nowadays you can just type words into Google all willy-nilly and 9 times out of 10 you get exactly what you're looking for. Back in my day, searching the web took finesse. Queries had to be crafted. You needed to know how a search engine worked to get anything meaningful out of it. The kids these days, they wouldn't know a Boolean operator from a hole in the ground.

I don't think people realize just how good Google is. It'd be like GM making a car that you just step into and say "take me to Bob's house" and you end up there, without even having to tell the car whether you wanted to see Bob Groven or Bob Ciborowski. You don't need to know about street names or steering wheels or braking distances. The thing about Google, though, is that if you do take the time to learn just a little bit about it, your results get even better.

Alright, everyone stand back, I'm going to attempt a metaphor. Imagine a fisherman. He catches fish for a living. His daily routine consists of three basic activities. He picks a lake to fish at; let's say he's got his own private lake that he stocks himself. When he's there he casts out a net and pulls in fish. From each cast, he's got to sort the results. He'll save the biggest, tastiest fish for his family, the next-biggest-and-tastiest he'll sell to fancy restaurants, the pretty-good fish he'll bring to the market, and the smallest fish he'll throw back in so that they can one day grow up to become big and tasty.

These three activities roughly correspond to what Google does. Google stocks the lake, Google casts the net, and Google sorts the fish. The fact that people don't even think of these three things separately anymore is a testament to just how good Google is.

Stocking the lake, as far as Google is concerned, is a process of crawling and indexing. Google has a computer program called Googlebot (no kidding) that runs 24/7 "crawling the web," grabbing web pages from the internet and keeping track of which words occur in those pages to build an "index." If you search for "Tilapia" on Google, you will find all the pages that Googlebot has visited and found to contain the word "Tilapia." If I write a new web page about Tilapia, but Googlebot has not found it yet, it won't be in the search results. You can't catch it, because it's not in the lake.
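For the curious, here is a minimal sketch of what "stocking the lake" might look like in Python. It is a toy, not how Googlebot actually works: the three pages, their made-up addresses, and the names `tokenize` and `build_index` are all assumptions invented for illustration. The idea is just that the index maps each word to the pages it appears on.

```python
import re
from collections import defaultdict

# Hypothetical toy "web": address -> page text (all made up for illustration).
PAGES = {
    "fish.example/tilapia": "Tilapia is a mild freshwater fish",
    "cook.example/blackened": "A blackened tilapia recipe for weeknights",
    "cars.example/hubcaps": "Old hubcaps for sale",
}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page addresses whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

index = build_index(PAGES)
# Look up the pages that mention "tilapia".
print(sorted(index["tilapia"]))
```

A page that was never passed to `build_index` can never show up in a lookup, which is the "not in the lake" problem in miniature.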

Casting the net is the first thing Google does when you do a search. If you type [Tilapia blackened recipe] into Google, Google will find all the pages in its index that contain the words "Tilapia," "blackened," and "recipe." If you type in ["Blackened Tilapia Recipe"], Google will find all the pages that contain the exact phrase "Blackened Tilapia Recipe," with the words together and in that order. The phrase is different from the separate words. This is the first big distinction in Google searches that some people don't know about.
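The difference between the two nets can be sketched in a few lines of Python. Again, this is a simplified illustration under my own assumptions (three invented pages, and helper names `match_all_words` and `match_phrase` that I made up), not a description of Google's actual matching.

```python
import re

# Hypothetical pages, invented for illustration.
PAGES = {
    "a": "My blackened tilapia recipe uses paprika",
    "b": "This recipe works for tilapia, blackened or grilled",
    "c": "Grilled salmon recipe",
}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_all_words(query, pages):
    """Pages containing every query word, anywhere and in any order."""
    wanted = set(tokenize(query))
    return sorted(url for url, text in pages.items()
                  if wanted <= set(tokenize(text)))

def match_phrase(phrase, pages):
    """Pages containing the query words side by side, in the same order."""
    words = tokenize(phrase)
    n = len(words)
    hits = []
    for url, text in pages.items():
        tokens = tokenize(text)
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            hits.append(url)
    return sorted(hits)

print(match_all_words("Tilapia blackened recipe", PAGES))  # ['a', 'b']
print(match_phrase("Blackened Tilapia Recipe", PAGES))     # ['a']
```

Page "b" has all three words, so the wide net catches it, but the words are not adjacent, so the phrase net lets it slip through.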

Remember, Google can only see the words on a page, not the ideas. If I write an article about Oreochromis niloticus niloticus and you search for "Tilapia," you won't find it. Same fish, different words. You'll need to use a different net to catch my article.
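One way to picture "a different net" is a search that accepts any of several names for the same fish. The sketch below assumes two invented pages and a made-up helper called `match_any`; it only illustrates the word-matching idea and is not how any real search engine is implemented.

```python
import re

# Hypothetical pages, invented for illustration.
PAGES = {
    "recipes.example/tilapia": "Blackened tilapia recipe",
    "journal.example/nile": "Growth rates of Oreochromis niloticus in ponds",
}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_any(terms, pages):
    """Pages containing at least one of the given words."""
    wanted = {term.lower() for term in terms}
    return sorted(url for url, text in pages.items()
                  if wanted & set(tokenize(text)))

# The common name alone misses the scientific article.
print(match_any(["tilapia"], PAGES))
# Adding the scientific name widens the net.
print(match_any(["tilapia", "oreochromis"], PAGES))
```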

The last thing Google does for you when you click "search" is sort the fish, and Google does it very well. The reason that Google has become so successful, I'd argue, is that it was the first web search engine to do a good job of separating the tasty fish from the mediocre fish (or even of separating the fish from the old hubcaps). I remember once being taught by a middle-school librarian how to narrow a search down to only the 100 most important pages, because paging through a pile of fish bigger than that was just unwieldy. Google has made this process totally irrelevant for most people. If it's not in the first 3 pages of results, it might as well not exist.

So how does Google sort the fish? Since this is the secret sauce that has made them billions, they don't share the exact recipe. But we do know that it considers how recent a page is, which pages (and how many) link to that page, and where your search terms appear in the article, among other factors. These factors may or may not correlate with the articles that are the most useful to debaters, though, so knowing how to massage the results can really save you some time cutting cards.
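To make the sorting idea concrete, here is a toy scoring function built from the three factors just mentioned. To be loud about it: the real recipe is secret, so every weight and formula here is something I invented, as are the `Page` fields and the three sample pages. This sketch also counts only how many pages link in, not which ones.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    title: str
    body: str
    inbound_links: int  # how many other pages link here (hypothetical data)
    age_days: int       # days since the page was published (hypothetical data)

def score(page, query_words):
    """Toy relevance score. All weights are made up, not Google's."""
    title = page.title.lower().split()
    body = page.body.lower().split()
    # Where the terms appear: a title match counts more than a body match.
    placement = (3.0 * sum(w in title for w in query_words)
                 + 1.0 * sum(w in body for w in query_words))
    # How many pages link to this one.
    links = 0.5 * page.inbound_links
    # How recent the page is: newer pages get a slightly bigger bonus.
    freshness = 2.0 / (1 + page.age_days / 365)
    return placement + links + freshness

def rank(pages, query):
    """Sort pages from highest to lowest toy score."""
    words = query.lower().split()
    return sorted(pages, key=lambda p: score(p, words), reverse=True)

PAGES = [
    Page("b", "My fishing diary", "caught a tilapia today", 0, 2000),
    Page("a", "Blackened Tilapia Recipe", "a tilapia recipe", 10, 30),
    Page("c", "Tilapia farming", "tilapia ponds and feed", 4, 365),
]

print([p.url for p in rank(PAGES, "tilapia recipe")])  # ['a', 'c', 'b']
```

Change the weights and the order changes too, which is why two search engines can hand you the same fish in a very different order.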

So far I've focused only on Google, but the point of this article is that every search engine does these three things: stocking the lake, casting the net, sorting the fish. The important thing, though, is that they all do them differently. Lexis' lake has totally different fish swimming in it than Google's, for instance. Knowing the difference between how JSTOR builds its nets and how Lexis does is pretty critical if you want to maximise your research potential.

This is getting pretty long, so I think I'll leave the introduction at that. Hopefully I'll have some more time to play with this metaphor so I can teach (and learn) more about the search engines we're always wasting time on.
