English Wikipedia Quality Survey

From Meta, a Wikimedia project coordination wiki

This Wikipedia Quality Survey was carried out on 20 October 2003 by me (Dr Adam Carr), following a discussion at the Village Pump concerning the quality of articles at Wikipedia in relation to other encyclopaedias. I make no claim that it was particularly scientific, but I think it gives a general indication of the categories, length and standard of articles being posted at Wikipedia.

"Standard" is a subjective term, and I freely admit that I applied my own criteria about what was a "good" article. It is perhaps relevant to note here that I have contributed 109 articles to Wikipedia, that I have a PhD in history, and that I have worked as a journalist, subeditor and proofreader for more than 20 years, and have also worked (briefly) as a university academic.

To conduct the survey, I looked at 200 pages using the "Random Page" function of Wikipedia. I classified the articles into a number of categories, obtaining the following results:

Wikipedia pages by type

Category                                              Number              
----------------------------------------------------------------------------
"Bot" census data on American towns                    44                  
Lists of links or dates                                20                  
Stubs                                                   8                  
Copied from CIA handbook                                6                  
Copied from 1911 Britannica                             2                  
----------------------------------------------------------------------------
Total "non-articles"                                   80                  
----------------------------------------------------------------------------
Short articles (less than 1 screen)                    50                  
Medium articles (1-2 screens)                          44                  
Long articles (more than 2 screens)                    26                  
----------------------------------------------------------------------------
Total articles                                        120     
----------------------------------------------------------------------------

Extrapolating from this sample to the 166,000 articles claimed on the Wikipedia main page, only 60% (about 100,000) are actually articles written by Wikipedians, and only 35% (about 58,000) are more than one screen in length. That is not to say that the "short" articles are worthless: some of them are precise and well written. But many are mere outlines of topics that need much fuller treatment. The lack of substantial articles is striking.
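
The percentages and estimates above are simple proportional scaling of the sample counts to the main-page total. A minimal Python sketch of that arithmetic follows. The names used (SAMPLE, CLAIMED_TOTAL, extrapolate) are illustrative only, and the sketch assumes the 200 random pages are representative of the whole site.

```python
# Sample counts from the "Wikipedia pages by type" table (200 random pages).
SAMPLE = {
    "non-articles": 80,
    "short": 50,
    "medium": 44,
    "long": 26,
}
CLAIMED_TOTAL = 166_000  # article count claimed on the main page


def extrapolate(count, sample_size, population):
    """Return (share of the sample, that share scaled to the population)."""
    share = count / sample_size
    return share, share * population


sample_size = sum(SAMPLE.values())
articles = SAMPLE["short"] + SAMPLE["medium"] + SAMPLE["long"]
over_one_screen = SAMPLE["medium"] + SAMPLE["long"]

article_share, article_estimate = extrapolate(articles, sample_size, CLAIMED_TOTAL)
long_share, long_estimate = extrapolate(over_one_screen, sample_size, CLAIMED_TOTAL)

print(f"Articles: {article_share:.0%}, about {article_estimate:,.0f}")
print(f"More than one screen: {long_share:.0%}, about {long_estimate:,.0f}")
```

The unrounded estimates are 99,600 and 58,100, which correspond to the rounded figures of about 100,000 and about 58,000.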

I then discarded the "non-articles" and looked at further pages until I had a total of 200 "articles" as defined above. I then organised the articles by subject matter:

Wikipedia articles by subject

Subject                                                  Number             
----------------------------------------------------------------------------
Natural sciences                                         17
(Astronomy 4, Biology 2, Botany 1, Chemistry 4, 
 Geology 2, Physics 4)
----------------------------------------------------------------------------
Geography                                                34
----------------------------------------------------------------------------
Applied sciences                                         31
(Aeronautics 6, Astronautics 2, 
 Computer science 12, Engineering 2, Mathematics 6, 
 Medicine 1, Railways 2)
----------------------------------------------------------------------------
Social sciences                                          18
(Education 1, Anthropology 2, Economics 6, 
 History 8, Sociology 1)
----------------------------------------------------------------------------
Culture                                                  66
(Art 1, Cooking 1, Film 2, Literature 12, Music 16, 
 Mythology 12, Popular culture 8, Religion 10, 
 Science fiction 2, Sport 2)
----------------------------------------------------------------------------
Biography                                                19
(Law 4, Politics 2, Royalty 4, Science 1, Sports 8)
----------------------------------------------------------------------------
Military                                                 12
(descriptions of ships, aircraft etc)
----------------------------------------------------------------------------
Miscellaneous                                             3
----------------------------------------------------------------------------

Taking the subjects individually, much the largest was geography (34 articles): articles describing countries, mountains, cities, oceans etc. On the counts in the table, music, popular culture and science fiction taken together ("pop culture" in the broad sense) came next with 26, followed by religion and mythology taken together with 22.
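
The combined groups discussed above can be tallied directly from the subject table. The following Python sketch does so. The dictionary layout and variable names are illustrative assumptions; the counts are copied from the table.

```python
# Top-level subject counts from the "Wikipedia articles by subject" table.
SUBJECTS = {
    "Natural sciences": 17,
    "Geography": 34,
    "Applied sciences": 31,
    "Social sciences": 18,
    "Culture": 66,
    "Biography": 19,
    "Military": 12,
    "Miscellaneous": 3,
}

# Sub-counts listed under "Culture" in the same table.
CULTURE = {
    "Art": 1, "Cooking": 1, "Film": 2, "Literature": 12, "Music": 16,
    "Mythology": 12, "Popular culture": 8, "Religion": 10,
    "Science fiction": 2, "Sport": 2,
}

total = sum(SUBJECTS.values())

# The sub-counts should add up to the Culture row.
assert sum(CULTURE.values()) == SUBJECTS["Culture"]

# Combined groups used in the discussion of the table.
religion_and_mythology = CULTURE["Religion"] + CULTURE["Mythology"]
pop_culture_broad = (
    CULTURE["Music"] + CULTURE["Popular culture"] + CULTURE["Science fiction"]
)

print(f"Total articles: {total}")
print(f"Geography: {SUBJECTS['Geography']} ({SUBJECTS['Geography'] / total:.0%})")
print(f"Religion and mythology: {religion_and_mythology}")
print(f"Pop culture (broad sense): {pop_culture_broad}")
```

The check on the Culture sub-counts guards against transcription slips when copying figures out of the table.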

Categories that seemed to me to be lacking were the natural sciences, "high culture" (art, literature, classical music), architecture, medicine, zoology, palaeontology and botany, archaeology, and biography relating to these fields.

As to the "quality" of writing, a matter of extremely subjective judgement, I formed the view that about two-thirds of the "medium" and "long" articles were competently written in a technical sense (grammar, spelling etc), though perhaps only half met a standard of "good writing." As a general rule, the longer the article, the more likely it was to be well written. About a quarter (mostly "short" articles) were poor or very poor. The poor writing was noticeably concentrated in the "popular culture" areas.


See also: Wikipedia:Wikipedia commentary/Wikipedia quality