Research and Decision Science/Data glossary

From Meta, a Wikimedia project coordination wiki

This page collects definitions of essential and core metrics that teams in the Wikimedia Foundation use to guide tactical and strategic decisions. The sibling Data Dictionary page documents various data sources.

Definitions for core metrics and essential metrics

Core metrics

Core metrics (also called "Core Annual Plan metrics") are a small, well-defined set of measurements that we use to guide strategic decisions. These are the metrics the Foundation highlights in its annual plan. For FY23-24, the four core metric areas are "Content", "Contributors", "Effectiveness", and "Relevance". See the overview of the core metrics as they relate to the Foundation's FY23-24 Annual Plan. Core metrics are a special subset of essential metrics.

Essential metrics

Essential metrics are metrics maintained by the Wikimedia Foundation in production which meet the following requirements:

  1. The metric and its measurement are trustworthy;
  2. The metric is accessible to decision makers;
  3. The metric and its measurement are transparent to those affected by the decisions they inform;
  4. Frequent measurement of the metric is required for decisions of significant importance. This includes:
    • decision making in situations where erroneous decisions can have department, Foundation, or Movement-level impact;
    • decision making about WMF's strategic direction;
    • monitoring of important operations.
  5. The definition, measurement, storage, use, and potential publication of the metric must comply with relevant WMF policies and guidelines, specifically WMF's Privacy Policy, Data Retention Guidelines, Data Publication Guidelines, and Human Rights Policy.

See the FAQ and the detailed documentation to learn more about essential metrics.

Content metrics

(Note: This section covers the content metrics that are part of monthly key metrics reporting. They differ from the "Content" core metric.)

The code that calculates these metrics can be found in the movement-metrics repository on GitHub.

Content
Metric | Definition | Remarks
Net new content | The number of content pages added since the previous month, excluding deleted pages and redirects. | Produced by subtracting last month's total content pages metric from this month's.
Net new Wikipedia articles | The number of Wikipedia articles added since the previous month, excluding deleted articles and redirects. | A subset of net new content. Produced by subtracting last month's total Wikipedia articles metric from this month's.
Net new Commons files | The number of Wikimedia Commons content pages added since the previous month, excluding deleted pages and redirects. | A subset of net new content. Produced by subtracting last month's total media files metric from this month's.
Net new Wikidata entities | The number of Wikidata entities added since the previous month, excluding deleted entities and redirects. An entity is anything represented by a page in a content namespace, namely an item, property, lexeme, or entity schema. | A subset of net new content. Produced by subtracting last month's total Wikidata entities metric from this month's.
Mobile edits | The number of edits tagged as made using the mobile website (which can also be used on desktop computers) or the mobile apps. |
Content gap | A metric that quantifies knowledge gaps in content by estimating how pieces of content (e.g., Wikipedia articles, Wikidata items) are distributed across categories (e.g., gender, geographic distribution, cultural background). | Knowledge gaps are major differences in participation or coverage of a specific group of readers, contributors, or content. A complete list of mappings for content is published separately.
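The "produced by subtracting" remarks describe a plain month-over-month difference of running totals. A minimal Python sketch, assuming a hypothetical `monthly_totals` mapping from `(year, month)` to an end-of-month total that already excludes deleted pages and redirects; it illustrates the arithmetic only and is not the movement-metrics code:

```python
def previous_month(year: int, month: int) -> tuple[int, int]:
    """Return the (year, month) that precedes the given calendar month."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def net_new(monthly_totals: dict[tuple[int, int], int], year: int, month: int) -> int:
    """This month's total minus last month's total.

    Assumes (hypothetically) that each total already excludes deleted pages and
    redirects, so the result can be negative when more pages leave the total
    than enter it during the month.
    """
    prev = previous_month(year, month)
    if prev not in monthly_totals or (year, month) not in monthly_totals:
        raise KeyError("totals for both this month and the previous month are required")
    return monthly_totals[(year, month)] - monthly_totals[prev]


# Hypothetical totals for illustration only (not real Wikimedia figures).
totals = {(2023, 12): 1_000, (2024, 1): 1_150, (2024, 2): 1_120}
print(net_new(totals, 2024, 1))  # 150
print(net_new(totals, 2024, 2))  # -30
```

Differencing totals, rather than counting page creations, is what lets deletions and redirects net out without separate bookkeeping.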

Contributor metrics

The code that calculates these metrics can be found in the movement-metrics repository on GitHub.

Editors
Metric | Definition | Remarks
Active editors | The number of registered users who made at least 5 content edits across all projects in the given month. See meta:Research:Active editor for the full definition. | Unlike Wikistats, this metric includes edits to pages which were later deleted.
New active editors | Active editors who registered during the given month. |
Returning active editors | Active editors who registered before the given month. |
New editor retention | Of the users who registered in the month before the previous month and made at least one edit in their first 30 days, the proportion who also edited during their second 30 days. | Includes all edits (to content, talk, and other namespaces), whether or not they were reverted. Unlike Wikistats, this metric includes edits to pages which were later deleted.
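The editor definitions can be expressed as set operations over edit records. This Python sketch assumes hypothetical in-memory inputs: a list of `Edit` records keyed by a global user name (so edits on different projects add up) and a `registered` mapping from user name to registration time. It approximates "first 30 days" and "second 30 days" as the half-open intervals `[0, 30)` and `[30, 60)` days after registration; the production definitions on Meta may differ in such details.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Edit:
    user: str            # hypothetical global (cross-project) user name; registered users only
    timestamp: datetime
    content: bool        # True if the edit was made in a content namespace


def in_month(ts: datetime, year: int, month: int) -> bool:
    return ts.year == year and ts.month == month


def active_editors(edits, year, month, threshold=5):
    """Users with at least `threshold` content edits, summed across projects, in the month."""
    counts = Counter(e.user for e in edits
                     if e.content and in_month(e.timestamp, year, month))
    return {user for user, n in counts.items() if n >= threshold}


def split_new_returning(active, registered, year, month):
    """Split active editors into those registered during the month (new) and before it (returning)."""
    month_start = datetime(year, month, 1)
    new = {u for u in active if in_month(registered[u], year, month)}
    returning = {u for u in active if registered[u] < month_start}
    return new, returning


def new_editor_retention(edits, registered, cohort_year, cohort_month):
    """Share of the cohort who edited in days [0, 30) after registering and also in days [30, 60).

    Counts edits in every namespace (no content filter, unlike active_editors).
    """
    cohort = {u for u, reg in registered.items() if in_month(reg, cohort_year, cohort_month)}
    first_30, second_30 = set(), set()
    for e in edits:
        if e.user not in cohort:
            continue
        age = e.timestamp - registered[e.user]
        if timedelta(0) <= age < timedelta(days=30):
            first_30.add(e.user)
        elif timedelta(days=30) <= age < timedelta(days=60):
            second_30.add(e.user)
    if not first_30:
        return None  # nobody qualifies for the denominator
    return len(first_30 & second_30) / len(first_30)


# Hypothetical users and edits for illustration only.
registered = {
    "A": datetime(2024, 3, 2), "B": datetime(2024, 3, 10), "C": datetime(2023, 6, 1),
    "D": datetime(2024, 1, 3), "E": datetime(2024, 1, 20), "F": datetime(2024, 1, 25),
}
edits = (
    [Edit("A", datetime(2024, 3, 5 + i), True) for i in range(5)]
    + [Edit("B", datetime(2024, 3, 11 + i), True) for i in range(4)]
    + [Edit("C", datetime(2024, 3, 1 + i), True) for i in range(6)]
    + [Edit("D", datetime(2024, 1, 10), False), Edit("D", datetime(2024, 2, 15), False),
       Edit("E", datetime(2024, 1, 21), True)]
)
active = active_editors(edits, 2024, 3)                            # {"A", "C"}
new, returning = split_new_returning(active, registered, 2024, 3)  # {"A"}, {"C"}
print(new_editor_retention(edits, registered, 2024, 1))            # 0.5
```

With March 2024 as the reporting month, the retention cohort is January 2024, the month before the previous month. User F registered in January but never edited, so F is excluded from the denominator.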

Reader metrics

The code that calculates these metrics can be found in the movement-metrics repository on GitHub.

Readers
Metric | Definition | Remarks
Content interactions | Pageviews (all platforms) plus desktop previews (see definitions below). |
Pageviews | Full definition: m:R:Page view | Monthly pageviews based on calendar month. Corrected for spurious IE views from some countries. Data source: pageview_hourly. The current calculation includes both user agents and automated agents. We use the User Agent to decide whether to assign a mobile or desktop experience; the User Agents we consider mobile are listed in code on GitHub. Most tablets are classified as mobile.
Desktop pageviews | Same as pageviews, limited to the desktop domains (e.g. en.wikipedia.org). | Same remarks as pageviews.
Mobile web pageviews | Same as pageviews, limited to the mobile web domains (e.g. en.m.wikipedia.org); does not include apps. | Same remarks as pageviews.
Desktop previews | Seen page previews, defined as preview popups that remain visible for at least one second. | Data source: virtualpageview_hourly
Unique devices | Full definition: m:R:Unique Devices | Monthly unique devices for all Wikipedias (desktop and mobile web), based on calendar month. Data source: unique_devices_per_project_family_monthly
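A sketch of how the reader metrics combine, assuming hypothetical pageview rows with a hostname and view count, a hypothetical `from_app` flag for app traffic, and a list of desktop preview visibility durations in milliseconds. The hostname test for mobile web is only an illustration based on the `en.m.wikipedia.org` pattern above; the production pipeline relies on its own user-agent logic.

```python
from collections import Counter
from dataclasses import dataclass

PREVIEW_SEEN_MS = 1000  # a preview counts as "seen" once visible for at least one second


@dataclass(frozen=True)
class PageviewRow:
    hostname: str   # e.g. "en.wikipedia.org" (desktop) or "en.m.wikipedia.org" (mobile web)
    views: int
    from_app: bool  # hypothetical flag: app traffic is neither desktop nor mobile web


def site_of(row: PageviewRow) -> str:
    """Classify a row as "app", "mobile web", or "desktop" (illustrative heuristic)."""
    if row.from_app:
        return "app"
    # Mobile web hosts carry an "m" label before the registered domain,
    # e.g. en.m.wikipedia.org or m.wikidata.org.
    labels = row.hostname.split(".")
    return "mobile web" if "m" in labels[:-2] else "desktop"


def pageviews_by_site(rows):
    totals = Counter()
    for row in rows:
        totals[site_of(row)] += row.views
    return totals


def seen_previews(visible_ms):
    """Count preview popups that stayed visible for at least PREVIEW_SEEN_MS."""
    return sum(1 for ms in visible_ms if ms >= PREVIEW_SEEN_MS)


def content_interactions(rows, preview_visible_ms):
    """Pageviews on all platforms plus seen desktop page previews."""
    return sum(row.views for row in rows) + seen_previews(preview_visible_ms)


# Hypothetical data for illustration only.
rows = [
    PageviewRow("en.wikipedia.org", 100, False),
    PageviewRow("en.m.wikipedia.org", 250, False),
    PageviewRow("en.wikipedia.org", 40, True),
]
previews = [300, 1000, 2500, 999]
print(pageviews_by_site(rows))              # desktop 100, mobile web 250, app 40
print(content_interactions(rows, previews)) # 392
```

App views count toward content interactions but toward neither the desktop nor the mobile web pageview rows, matching the "does not include apps" note.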

Diversity

A set of contributor, content, or reader metrics (defined above), restricted to:

Financial Metric: Programmatic Ratio

The percentage of the budget spent on program expenses, called the Programmatic Ratio, is an important financial metric that we track. Each year, we set a target for our programmatic expense ratio that aligns with nonprofit sector best practices.

Independent charity assessment organizations such as Charity Navigator help establish these best practices. Charity Navigator's benchmark for its highest-scoring nonprofits is a program expense percentage above 70%. Charity Navigator provides the following explanation and formula for this ratio:

"Charities exist to provide programs and services. They fulfill the expectations of givers when they allocate a good portion of their budgets toward their stated missions. While administration expenses are necessary for efficient charity operations, organizations that grossly underspend on their programs and services will most likely not have as strong an impact on their charitable missions. We calculate the nonprofit's average program expense percentage over its three most recent fiscal years and then assign a numeric score based on an established scale.

Average Program Expense Percentage = Average Program Expenses ÷ Average Total Expenses (When Calculating Using Form 990) = Average of Part IX line 25B ÷ Average of Part IX line 25A"[1]
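The quoted formula divides the three-year average of program expenses by the three-year average of total expenses, which is not the same as averaging each year's ratio. A minimal Python sketch, assuming per-fiscal-year amounts listed oldest first; the function name and the input figures are hypothetical:

```python
def average_program_expense_percentage(program_expenses, total_expenses, years=3):
    """Average program expenses / average total expenses over the most recent `years`, in percent.

    Inputs are per-fiscal-year amounts, oldest first. On Form 990 these would be
    Part IX line 25, column B (program service expenses) and column A (total expenses).
    """
    if len(program_expenses) != len(total_expenses):
        raise ValueError("need the same number of fiscal years for both series")
    if len(program_expenses) < years:
        raise ValueError(f"need at least {years} fiscal years")
    avg_program = sum(program_expenses[-years:]) / years
    avg_total = sum(total_expenses[-years:]) / years
    return 100 * avg_program / avg_total


# Hypothetical amounts (e.g. in millions), oldest year first; only the last three are used.
program = [50, 70, 80, 90]
total = [100, 100, 110, 120]
pct = average_program_expense_percentage(program, total)
print(round(pct, 2))  # 72.73
print(pct > 70)       # True: above the 70% benchmark
```

Averaging the three yearly ratios instead would give about 72.58% here, because the ratio of averages gives years with larger budgets more weight.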