FAQ

Methodology

Data is analysed using explainable artificial intelligence: every measurement our algorithm produces can be traced from start to finish. Each message analysed by the algorithm is assigned a toxicity score between 0 and 1 (0 = safe, 1 = extremely hateful).
The toxicity score is based on lexicons of hateful and otherwise problematic words and phrases. The lexicon entries were sourced from social media, and each was assigned a toxicity level and the toxicity categories it pertains to (sexism, racism, etc.). At least 2 native speakers reviewed each entry, and their opinions were combined to determine the entry’s final score and categories.
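The text does not say how reviewer opinions are combined or how entry scores become a message score. As a minimal sketch, assuming reviewer ratings are averaged and a message takes the score of its most toxic matching entry (both are assumptions, as are the names `LexiconEntry`, `combine_reviews` and `score_message`, the naive substring matching, and the placeholder entries):

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class LexiconEntry:
    phrase: str
    toxicity: float        # 0 = safe, 1 = extremely hateful
    categories: frozenset  # e.g. {"racism"}


def combine_reviews(phrase, ratings, categories):
    """Merge reviewer ratings into one entry (assumption: simple average)."""
    if len(ratings) < 2:
        raise ValueError("each entry needs at least 2 reviewers")
    return LexiconEntry(phrase.lower(), mean(ratings), frozenset(categories))


def score_message(message, lexicon):
    """Return (score, matched entries).

    Assumption: the message score is the highest toxicity among matched
    entries, and 0 when nothing matches.
    """
    text = message.lower()
    matches = [entry for entry in lexicon if entry.phrase in text]
    score = max((entry.toxicity for entry in matches), default=0.0)
    return score, matches


# Placeholder entries, not taken from any real lexicon.
lexicon = [
    combine_reviews("placeholder slur", [0.9, 0.8], {"racism"}),
    combine_reviews("placeholder insult", [0.25, 0.35], {"dehumanisation"}),
]
score, matches = score_message("A message with a Placeholder Insult", lexicon)
```

Returning the matched entries alongside the score is what keeps the result explainable: each score can be traced back to the lexicon entries that produced it.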
To give a better sense of what the toxicity score means, we have divided messages into 4 buckets:
- Safe: a toxicity score of 0. This means we have not found any problematic words or phrases in the message.
- Low: a non-zero toxicity score between 0 and 0.2. This means we have found minor instances of toxic language.
- Medium: a toxicity score between 0.2 and 0.8. This bucket contains messages found to be toxic without being extreme.
- High: a toxicity score between 0.8 and 1. This bucket contains the most hateful and toxic messages.
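The bucket rules can be sketched as a small function. The text does not say which bucket the boundary values 0.2 and 0.8 belong to; this sketch assumes they fall into the lower bucket, and the function name `toxicity_bucket` is hypothetical.

```python
def toxicity_bucket(score: float) -> str:
    """Map a toxicity score in [0, 1] to Safe, Low, Medium or High."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("toxicity score must be between 0 and 1")
    if score == 0.0:
        return "Safe"    # no problematic words or phrases found
    if score <= 0.2:     # assumption: 0.2 itself counts as Low
        return "Low"
    if score <= 0.8:     # assumption: 0.8 itself counts as Medium
        return "Medium"
    return "High"
```

For example, `toxicity_bucket(0.5)` returns `"Medium"` and `toxicity_bucket(0.95)` returns `"High"`.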
We detect all 24 official EU languages as well as Arabic and Russian.
However, the online landscape varies a lot across languages and this has implications and caveats that should be taken into consideration. First, English is the lingua franca of the Internet and, as such, accounts for the vast majority of online messages. While sampling can help mitigate the imbalance, English data is bound to have more diversity than data from other languages. This is also true to a lesser extent for other widely spoken languages like German or French.
Secondly, language analysis resources also differ across languages. Again, widely spoken languages benefit from a wider and more refined set of tools, while less represented or morphologically complex languages may not have the same level of support. We have worked to reduce that gap, but in practice some gap will remain.
We strictly process data as Data Processors under the instructions of our Data Controllers (e.g. law enforcement agents).
Freedom of speech is a cornerstone of free and democratic societies. While we want to uphold freedom of expression and diversity of opinion, we combat hate speech that can lead to incitement to discrimination, violence, and hostility, in line with the UN Rabat Plan of Action.
The European Observatory of Online Hate is a project supported by the European Commission’s Rights, Equality and Citizenship Programme Call, CERV-2023-CHAR-LITI-SPEECH.

Definitions

We follow the Digital Services Act (DSA) definition of illegal hate speech: ‘all conduct publicly inciting to violence or hatred directed against a group of persons or a member of such a group defined by reference to race, colour, religion, descent or national or ethnic origin.’ In addition to online hate speech, we also look at organised influence operations, information manipulation (disinformation, misinformation, malinformation), and breaches of the DSA.
Mainstream platforms
- Facebook (VLOP, i.e. a Very Large Online Platform under the DSA)
- Instagram (VLOP)
- Twitter (VLOP)
- YouTube (VLOP)
- TikTok (VLOP)
- Reddit
Fringe platforms
- 4chan
- Gab
- Minds
- 9gag
Fringe platforms position themselves as an alternative to the dominant ideology that governs the public sphere, providing a space for users whom the mainstream platforms do not include.
Baseline channels are special channels that gather data on topics that regularly attract hateful messages. The data is collected using neutral keywords (in all EU languages) from the semantic field of the channel’s topic, so as not to bias the data collection process. For instance, the keywords for the channel about antisemitism are: “jew”, “jews” and “jewish”.
- Anti-Roma
- Antisemitism
- Anti-Muslim
- Anti-Refugees/Migrants
- Anti-LGBTQ+
- Sexism
Most of our baselines refer to legally protected categories and bear societal relevance.
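Neutral-keyword collection for a baseline channel can be sketched as follows. This is a minimal illustration, assuming whole-word, case-insensitive matching; the names `BASELINE_KEYWORDS`, `build_pattern` and `matching_baselines` are hypothetical, and only the English keywords quoted above are shown.

```python
import re

# Hypothetical mapping from baseline channel to its neutral keywords.
BASELINE_KEYWORDS = {
    "antisemitism": ["jew", "jews", "jewish"],
}


def build_pattern(keywords):
    """Compile one case-insensitive, whole-word pattern for a keyword list."""
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


PATTERNS = {
    name: build_pattern(keywords)
    for name, keywords in BASELINE_KEYWORDS.items()
}


def matching_baselines(message):
    """Return the baseline channels whose keywords appear in the message."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(message)]
```

Because the keywords are neutral, a match only places a message in the channel; it says nothing yet about whether the message is toxic. Whole-word matching keeps unrelated words such as "jewellery" out of the channel.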
Hate speech categories refer to the semantic categories to which lexicon entries can belong, and subsequently, the messages in which they appear.
- Dehumanisation: defaming or dehumanising expressions that belittle someone’s inherent worth.
- Sexism: discrimination based on gender or sexual orientation (homophobia, sexism, etc.).
- Politics: expressions that relate to political ideology, in particular activism, extremism and propaganda (Infowars).
- Racism: expressions that relate to racism on the basis of race, ethnicity or nationality.
- Religion: words that relate to religious ideology, in particular islamophobia, jihadism and antisemitism.
- Untruth: words that relate to conspiracy and disinformation, including government cover-ups, doomsday and the occult.
- Violence and threats: words that relate to conflict, including violence, threats and extortion, as well as all names of weapons.
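The way a message inherits categories from the lexicon entries it contains can be sketched as a set union. This is an illustration only: the `LEXICON` contents are placeholders, the substring matching is a simplifying assumption, and the names `Category` and `message_categories` are hypothetical.

```python
from enum import Enum


class Category(Enum):
    DEHUMANISATION = "Dehumanisation"
    SEXISM = "Sexism"
    POLITICS = "Politics"
    RACISM = "Racism"
    RELIGION = "Religion"
    UNTRUTH = "Untruth"
    VIOLENCE = "Violence and threats"


# Placeholder lexicon: phrase -> categories the entry belongs to.
LEXICON = {
    "placeholder threat": {Category.VIOLENCE},
    "placeholder conspiracy": {Category.UNTRUTH, Category.POLITICS},
}


def message_categories(message):
    """Return the union of the categories of every entry found in the message."""
    text = message.lower()
    found = set()
    for phrase, categories in LEXICON.items():
        if phrase in text:
            found |= categories
    return found
```

An entry can belong to several categories, so a single match may place a message in more than one.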
