Edit check/Tone Check


This page holds the work the Editing Team is doing in collaboration with the Machine Learning Team to develop Tone Check (formerly Peacock Check).

Tone Check is an Edit Check that uses a language model to prompt people adding promotional, derogatory, or otherwise subjective language to consider "neutralizing" the tone of what they are writing.

A notable aspect of this project: Tone Check is the first Edit Check that uses artificial intelligence, in this case a BERT language model, to identify biased language within the new text people are attempting to publish to Wikipedia.

To participate in and follow this project's development, we recommend adding this page to your watchlist. 

Status

Currently being worked on

Last update:

  1. Evaluating model performance in English, Spanish, French, Japanese, and Portuguese.
  2. Defining which aspects of Tone Check will be configurable on-wiki.
  3. Instrumenting what information about Tone Check (and how people engage with it) will be logged.

Feedback opportunities

  1. User experience: in what ways do you think the Tone Check user experience could be improved? See talk page for testing instructions.
  2. Model performance: in what cases do the responses the model offers differ from Wikipedia policies? Sign up here to help evaluate the model.
  3. Configurability: what aspects of Tone Check will be configurable on-wiki? | T393820
  4. Logging: what information will be logged (and made available to volunteers on-wiki) about when Tone Check becomes activated? | T395166, T395175

Planning

An A/B test to evaluate the impact of Tone Check.

Please visit Edit check#Status to gain a more granular understanding of where the development stands.

Objectives

Tone Check is intended to simultaneously:

  1. Cause newer volunteers to write the new information they add to Wikipedia's main namespace in a neutral tone.
  2. Reduce the effort and attention experienced volunteers need to allocate towards ensuring text in the main namespace is written in a neutral tone.


Design

User experience

This section will contain a general description of the UX (e.g. what needs to be true for Tone Check to be shown, where it is presented, what choice it invites people to make, what design principles have shaped the current approach, etc.), screenshots of the proposed user experience, and a link to the latest prototype with instructions for people to try it.

Language selection

This section will include the languages we're prioritizing for the initial experiment, the languages we're planning to scale to next, and why we selected these languages. See phab:T388471.

Model

Tone Check uses a Small Language Model (SLM) to detect the presence of promotional, derogatory, or otherwise subjective language. The SLM we are using is a BERT model, which is open source and has openly available weights.

The model is fine-tuned on examples of Wikipedia revisions. It learns from instances where experienced editors applied a specific template ("peacock") to flag tone violations, as well as instances where that template was removed. This process teaches the BERT model to identify patterns associated with appropriate and inappropriate tone based on Wikipedia's editorial standards. Under the hood, the model transforms text into high-dimensional vectors; during fine-tuning, these vectors are compared against the labels, allowing the model to learn a decision boundary (a hyperplane) that separates positive from negative cases. A minimal sketch of this setup follows the dataset description below.

The model was trained using 20,000 data points from 10 languages consisting of:

  • Positive examples: Revisions on Wikipedia that were marked with the "peacock" template, indicating a tone policy violation.
  • Negative examples: Revisions where the "peacock" template had been removed (signifying no policy violation).
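
To make the training setup above more concrete, the sketch below shows what fine-tuning a BERT model for this kind of binary classification could look like using the Hugging Face transformers and datasets libraries. The model name, example texts, and hyperparameters are assumptions for illustration only; this is not the team's actual training pipeline.

```python
# Illustrative sketch only: fine-tuning a multilingual BERT model for binary
# "peacock tone" classification. Dataset contents, model name, and
# hyperparameters below are assumptions, not the production pipeline.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Hypothetical training data: text added in a revision plus a binary label
# (1 = revision was tagged with the "peacock" template, 0 = template removed).
examples = [
    {"text": "She is widely regarded as the greatest singer of all time.", "label": 1},
    {"text": "She released her first studio album in 1998.", "label": 0},
]
dataset = Dataset.from_list(examples)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2
)

def tokenize(batch):
    # Convert raw text into token IDs the model can consume.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tone-check-sketch", num_train_epochs=3),
    train_dataset=dataset,
)
trainer.train()
```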

Small Language Models (like the one being used for Tone Check) differ from Large Language Models (LLMs) in that the former are adapted to particular use cases by learning from a focused dataset. In the case of Tone Check, this means the SLM learns directly from the expertise of experienced Wikipedia volunteers. As a result, SLMs offer more explainability and flexibility than LLMs. SLMs also require significantly fewer computational resources than their larger counterparts.

LLMs, on the other hand, are designed for general-purpose use, typically with limited context and through a chat or prompting interface. LLMs require a huge amount of computational resources, and their behavior is difficult to explain due to the high number of parameters involved.

Evaluating impact

Model

Before evaluating the impact of the overall Tone Check experience through a controlled experiment in production, the team conducted two reviews.

Outlined below is information about the purpose of each review and what we found.

Internal Review

The first review we conducted was internal. This review was meant to identify the languages in which:

  1. Training data is accessible enough that evaluating the model would be relatively straightforward.
  2. Staff thought the model would perform with high enough precision for experienced volunteers to consider the feedback Tone Check would offer new(er) volunteers reliable.

Process

To assess the above, the team:

  1. #TODO
  2. #TODO
  3. #TODO

Findings

#TODO

Volunteer evaluation

An example of a diff volunteers used to review the predictions the Tone Check model makes.

The internal review found sufficient training data was available and accessible enough in English, French, Japanese, Portuguese, and Spanish.

This initial review also found the model was performant enough in these languages to proceed with an external review involving experienced volunteers.

Accordingly, this second review was meant to answer the question: do experienced volunteers think what the model identifies as promotional, derogatory, or otherwise subjective language aligns with Wikipedia policies?

Process

To assess the above, the team:

  1. #TODO
  2. #TODO
  3. #TODO

Findings

#TODO

User experience

The viability of Tone Check, like the broader Edit Check project, depends on the feature being able to simultaneously:

  1. Reduce the moderation workload experienced volunteers carry
  2. Increase the rate at which new(er) volunteers contribute constructively

To evaluate the extent to which Tone Check is effective at the above, the team will be conducting qualitative and quantitative experiments.

Below you will find:

  1. Impacts the features introduced as part of the Edit Check project are intended to cause and avert
  2. Data we will use to help[1] determine the extent to which a feature has/has not caused a particular impact
  3. Evaluation methods we will use to gather the data necessary to determine the impact of a given feature
Desired Outcomes

  1. Key performance indicator: The quality of new content edits newcomers and Junior Contributors make in the main namespace will increase because a greater percentage of these edits will not contain peacock language.
     Data:
       • Proportion of all new content edits published without biased language
       • Proportion of new content edits that are not reverted
     Evaluation method(s): A/B test, qualitative feedback (e.g. talk page discussions, false positive reporting)
  2. Key performance indicator: Newcomers and Junior Contributors will experience Peacock Check as encouraging because it will offer them more clarity about what is expected of the new information they add to Wikipedia.
     Data: proportion of new content edits started (defined as reaching the point at which Peacock Check was or would be shown) that are successfully published (not reverted)
     Evaluation method(s): A/B test, qualitative feedback (e.g. usability tests, interviews, etc.)
  3. New account holders will be more likely to publish an unreverted edit to the main namespace within 24 hours of creating an account because they will be made aware that the new text they're attempting to publish needs to be written in a neutral tone, when they don't first think/know to write in this way themselves.
     Data: proportion of newcomers who publish ≥1 constructive edit in the Wikipedia main namespace on a mobile device within 24 hours of creating an account (constructive activation)
     Evaluation method(s): A/B test
  4. Newcomers and Junior Contributors will be more aware of the need to write in a neutral tone when contributing new text because the visual editor will prompt them to do so in cases where they have written text that contains peacock language.
     Data: proportion of newcomers and Junior Contributors that publish at least one new content edit that does not contain peacock language
     Evaluation method(s): A/B test
  5. Newcomers and Junior Contributors will be more likely to return to publish a new content edit in the future that does not include peacock language because Peacock Check will have caused them to realize when they are at risk of this not being true.
     Data:
       • Proportion of newcomers and Junior Contributors that publish an edit Peacock Check was activated within and successfully return to make an unreverted edit to a main namespace page during the identified retention period
       • Proportion of newcomers and Junior Contributors that publish an edit Peacock Check was activated within and return to make a new content edit without non-neutral language to a page in the main namespace during the identified retention period
     Evaluation method(s): A/B test
Undesirable Outcomes

  1. Edit quality decreases.
     Data: proportion of published edits that add new content and are reverted within 48 hours. Note: will include a breakdown of the revert rate of published new content edits with and without non-neutral language.
     Evaluation method(s): A/B test and leading indicators analysis
  2. Edit completion rate drastically decreases.
     Data: proportion of new content edits started (defined as reaching the point at which Peacock Check was or would be shown) that are published. Note: will include a breakdown by the number of checks shown to identify whether a lower completion rate corresponds with a higher number of checks shown.
     Evaluation method(s): A/B test and leading indicators analysis
  3. Edit abandonment rate drastically increases.
     Data: proportion of edits that are started (event.action = init) that are successfully published (event.action = saveSuccess)
     Evaluation method(s): A/B test and leading indicators analysis
  5. People shown Tone Check are blocked at higher rates.
     Data: proportion of contributors blocked after publishing an edit where Tone Check was shown compared to contributors not shown Tone Check
     Evaluation method(s): A/B test and leading indicators analysis
  6. High false positive rates.
     Data: proportion of contributors that decline revising the text they've drafted and indicate that it was irrelevant
     Evaluation method(s): A/B test, leading indicators analysis, and qualitative feedback

Findings

This section will include the findings from the experiments described in #Evaluating impact.

Configurability

Tone Check will be implemented – like all Edit Checks – in a way that enables volunteers to explicitly configure how it behaves and who Tone Check is made available to.

Configuration happens on a per-project basis so that volunteers can ensure the Tone Check experience aligns with local policies and conventions.

The particular facets of Tone Check that will be community configurable are still being decided. If there are particular aspects of Tone Check that you think need to be configured on-wiki, we ask that you share what you are thinking in T393820 or on the talk page.

Tone Check On-wiki Configuration
ID Configurable facet Potential value(s) Default value Notes

Timeline

Time Activity Status Notes

Peter to populate this section with a high-level timeline of the project: background analysis, initial model development, community conversations/consultations, usability study, pre-mortem, internal model evaluation, volunteer model evaluation, development, pilot experiment, etc.

Background

Writing in a neutral tone is an important part of Wikipedia's neutral point of view policy.

Writing in a neutral tone is also a practice many new volunteers find to be unintuitive. An October 2024 analysis of the new content edits newer volunteers[2] published to English Wikipedia found:

  • 56% of the new content edits newer volunteers published contained peacock words.
  • 22% of the new content edits newer volunteers published that contained peacock words were reverted.
  • New content edits containing peacock words were 46.7% more likely to be reverted than new content edits without peacock words.

The above findings motivated the development of Tone Check.

History

Tone Check, and the broader Edit Check initiative, is a response to a range of community conversations and initiatives, some of which are listed below. For more historical context, please see Edit check#Background.

Edit Check

This initiative sits within the larger Edit Check project – an effort to meet people while they are editing with actionable feedback about Wikipedia policies.

Edit Check is intended to simultaneously deliver impact for two key groups of people.

Experienced volunteers who need:

  1. Relief from repairing preventable damage
  2. Capacity to confront complexity

New(er) volunteers who need:

  1. Actionable feedback
  2. Compelling opportunities to contribute
  3. Clarity about what is expected of them

FAQ

Why AI?
AI increases Wikipedia projects' ability to detect promotional/non-neutral language before people publish it.
Which AI model do you use?
We use an open-source model called BERT. The model we use is not a large language model (LLM); it is a smaller language model, which the Machine Learning team prefers because it reports how probable each of its predictions is and because it is easier to adapt to our custom data.
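As an illustration of what "how probable each of its predictions is" means in practice, the sketch below shows how a fine-tuned BERT classifier can return a probability for a piece of text. The model path is a placeholder, not the production model.

```python
# Illustrative sketch: obtaining a probability from a fine-tuned BERT
# classifier. "tone-check-sketch" is a placeholder path, not the real model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("tone-check-sketch")
model = AutoModelForSequenceClassification.from_pretrained("tone-check-sketch")

text = "He is the most visionary leader the industry has ever seen."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Softmax turns the two logits into probabilities; index 1 is assumed to be
# the "non-neutral tone" class from the fine-tuning sketch above.
probability = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"Probability of non-neutral tone: {probability:.2f}")
```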
What language(s) does/will Tone Check support?
Work to decide the languages Tone Check will support to start is underway in T388471 and T387925.
What – if any – on-wiki logging will be in place so volunteers can see when Tone Check was shown?
To start, we're planning for an edit tag to be appended to all edits in which ≥1 Tone Check is shown.
This approach follows what was implemented for Reference Check.
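As an illustration, once such an edit tag exists, volunteers or tools could list tagged edits using the standard MediaWiki API. The tag name below is a hypothetical placeholder; the actual tag name has not yet been decided (see T395166, T395175).

```python
# Illustrative sketch: listing recent changes carrying a (hypothetical)
# Tone Check edit tag via the MediaWiki action API.
import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "list": "recentchanges",
    "rctag": "editcheck-tone",  # hypothetical tag name
    "rcprop": "title|ids|user|timestamp",
    "rclimit": 10,
    "format": "json",
}

response = requests.get(API, params=params, timeout=30)
for change in response.json().get("query", {}).get("recentchanges", []):
    print(change["timestamp"], change["title"], change["user"])
```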
Why did you not implement Tone Check as an Abuse filter?
What will we do to ensure Tone Check does not cause people to publish more subtle forms of promotional, derogatory, or otherwise subjective language that is more difficult for the model and people to detect?
What control will volunteers have over how Tone Check behaves and who it is available to?

ADD questions from the internal pre-mortem we conducted.

What control do volunteers have over how the model behaves?

References

  1. Emphasis on "help", since all decisions will depend on a variety of data, all of which needs to be weighed and considered to make informed decisions.
  2. "Newer volunteers" refers to people who have published ≤100 cumulative edits.