Monday, May 05, 2008

How Wikipedia stacked up against subscription databases

I finished up my quick comparison of Wikipedia, Encyclopedia Britannica, Gale Virtual Reference Library, and Oxford Reference that I blogged about earlier today.

My Plan
Do quick lookups of nineteen terms and concepts discussed in Clay Shirky's book Here Comes Everybody to see which reference sources would be more helpful to the students I work with.

Using quotation marks around search terms to force phrase searches, I looked in the following resources:
  • Wikipedia
  • Encyclopedia Britannica
  • Gale Virtual Reference Library
  • Oxford Reference
In any given set of search results, I would look first for main entries that mirrored my search terms exactly and record any such precise hits in a table. If there were no exact hits, then I looked for any entries in which most of my search terms were in the main entry (such as an entry on "social network services" when I searched for "social networks"). If none of my search words were in the main entry, then I looked for entries in which the search words appeared in the body of the entry and were adequately defined and explained (as opposed to simply cited or referenced in an offhand way).

If a particular resource had multiple main entries on the topic (as was the case in Gale Virtual Reference Library and Oxford Reference), then I made a note in my table that there were other entries as well.
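For anyone who wants to repeat this kind of check, the three-tier lookup described above can be roughly sketched in Python. This is only an illustration, not something I actually ran: the `Entry` type, the `classify` function, the "more than half of the words" reading of "most," and the crude plural-stripping are all assumptions of the sketch. It also cannot judge whether a body mention is an adequate definition, which still takes a human reader.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    title: str  # main entry heading (hypothetical structure)
    body: str   # text of the entry


def _stem(word: str) -> str:
    # Crude assumption: drop a trailing "s" so "networks" matches "network".
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def classify(phrase: str, entries: list[Entry]) -> str:
    """Return the best tier found: 'exact', 'partial', 'body', or 'none'."""
    words = phrase.lower().split()
    if not words:
        return "none"
    target = " ".join(words)

    # Tier 1: a main entry that mirrors the search terms exactly.
    if any(" ".join(e.title.lower().split()) == target for e in entries):
        return "exact"

    # Tier 2: a main entry containing most (here, more than half) of the words.
    for e in entries:
        title_stems = {_stem(w) for w in e.title.lower().split()}
        hits = sum(1 for w in words if _stem(w) in title_stems)
        if hits * 2 > len(words):
            return "partial"

    # Tier 3: the phrase appears somewhere in the body of an entry.
    # Whether it is adequately explained there is left to the searcher.
    if any(target in e.body.lower() for e in entries):
        return "body"

    return "none"


entries = [
    Entry("Social network services", "Sites that let people connect online."),
    Entry("Lemur", "The ring-tailed lemur has a long tail with bands."),
]
print(classify("social networks", entries))
print(classify("long tail", entries))
```

Note that the second lookup lands in the "body" tier even though the lemur entry has nothing to do with Shirky's long tail, which is exactly the kind of false hit that a person has to weed out by reading.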

  • Shirky's book is all about the ways that the latest technologies (especially on the web and on cell phones) have allowed people to organize in new ways that threaten the centrality of long-standing institutions and organizations. Wikipedia's entries can be created at any time by anyone, making it far more likely that this source will have entries on the latest tech developments. The other three sources all reproduce content that was first published in book form, which means that there is a long lag between the writing of the content and its appearance online, making it less likely that their coverage of technology topics will be as up to date as Wikipedia's.
  • Baruch College's subscription to Gale Virtual Reference Library includes nearly 1,100 reference sources originally published as printed editions. Each library's subscription to this database is likely to have a different set of sources in it.
  • The subscription to Oxford Reference I used has close to 270 sources in it. Most of the sources are subject dictionaries whose entries tend to be much briefer than the entries I was finding in Wikipedia and the other two databases.
  • Although I refrained from doing any fancy searching (no fielded searches or adjacency operators), I did use quotation marks around my terms, something that many students are unlikely to bother doing. Students who don't do phrase searching are more likely to find the resulting mishmash of results off-putting and to believe they're "not finding anything good."
  • I didn't analyze the quality of results in any deep way. I just wanted to see if there was at least an adequate overview or definition of the topic in each source.
  • There's no presentation of what the actual search results looked like. Some sources gave many false hits that would have frustrated most searchers (e.g., "long tail" turned up pages of entries in the Encyclopedia Britannica that were on individual animals that happened to have long tails, which is not quite the same concept that Shirky was referring to).
As you can see from the updated table of results, Wikipedia and Gale Virtual Reference Library both did pretty well and Encyclopedia Britannica fared the worst. With the Wikipedia entries, you have the added bonus of extensive linking to related Wikipedia entries, a reference list, and a set of links to external web sites. Gale Virtual Reference Library entries featured reference lists most of the time and some linking to other entries in the database.

While most of the topics were covered in all four sources, I found it interesting which sources had a main entry on the topic vs. which ones covered the topic in some other entry. For all but two topics, Wikipedia had a separate entry on each topic. This is not surprising, given that in Wikipedia, which is born digital, there is all the space in the world for yet another page, while the other three sources are all born in print, where each additional page added means greater printing costs.

For Further Research
My colleague Jerry Bornstein reminded me today that we had once discussed doing an analysis of the content of entries in Wikipedia and Encyclopedia Britannica. Some day, I hope to do that but also include the other two databases I looked at in this project.


At 5:55 AM , Blogger iblee said...

Good stuff!

At 6:01 AM , Blogger iblee said...

Wow. Good stuff. Thanks for doing something cool. In another vein, this would be a neat test to have students do and compare how they find stuff vs. how an experienced searcher finds stuff.

-sorry for the double comment -thought I lost this.

