Gender Statistics
Explore the gender ratios of authors and instructors in the dataset
This dashboard shows aggregate statistics about the gender of instructors and authors in the Open Syllabus dataset, faceted by field, institution, and country.
Under the hood, these numbers are based on predictions from a gender classification model that is applied to the raw text of the instructor and author names that we extract from the syllabi. This model is trained on datasets from the following papers:
- Santamaría, Lucía and Helena Mihaljević. “Comparison and benchmark of name-to-gender inference services.” PeerJ Computer Science 4 (2018): n. pag.
- Bérubé, Nicolas et al. “Wiki-Gendersort: Automatic gender detection using first names in Wikipedia.” (2020).
- Menendez, David et al. “Damegender: Writing and Comparing Gender Detection Tools.” Seminar on Advanced Techniques and Tools for Software Evolution (2020).
As always, it's important to keep in mind that gender itself is not binary, and that these kinds of name-based models have a number of limitations. By providing aggregate statistics, our goal is to help inform discussions about broad patterns of gender imbalance that emerge at the level of fields and institutions, when pooled across thousands of names.
Predictions about the gender identity of specific people on the basis of their names can be incorrect and harmful; for this reason we don't publish any data at the level of individual names.
In order to provide a contemporary picture of author and instructor gender balance, the analysis is limited to institutions for which we have at least 2,000 syllabi from the last three years. In addition, time series views are limited to 2014-2023.
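The inclusion rule above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual pipeline: the function name `eligible_institutions`, the `(institution, year)` input shape, and the reading of "last three years" as the three most recent calendar years are all assumptions.

```python
from collections import Counter

MIN_SYLLABI = 2000   # threshold stated in the text
RECENT_YEARS = 3     # "the last three years", read here as calendar years (assumption)


def eligible_institutions(syllabi, current_year):
    """Return the institutions with enough recent syllabi to be included.

    `syllabi` is an iterable of (institution, year) pairs; this input
    shape is hypothetical and chosen only for illustration.
    """
    first_recent_year = current_year - RECENT_YEARS + 1
    # Count only syllabi that fall inside the recent window.
    counts = Counter(
        institution
        for institution, year in syllabi
        if first_recent_year <= year <= current_year
    )
    return {inst for inst, n in counts.items() if n >= MIN_SYLLABI}
```

Under this reading, an institution with a large archive of older syllabi but few recent ones would be excluded, which matches the stated goal of giving a contemporary picture.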
There are four distinct groups of names in the dashboard:
- Female-associated names
- Ambiguous names
- Unresolved names
- Male-associated names
Female-associated and male-associated names are those for which the classifier predicted female or male with high confidence. Ambiguous names are those for which the classifier predicted female or male with low confidence. Unresolved names are inputs that the classifier is not well suited to, such as initials and non-name entities.
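As a rough sketch of how the four groups relate to classifier output, the mapping might look like the following. The confidence cutoff `HIGH_CONFIDENCE`, the function names, and the convention of passing `None` for inputs the classifier cannot handle are all hypothetical; the actual threshold and interface are not stated here.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical cutoff; the dashboard's real threshold is not published here.
HIGH_CONFIDENCE = 0.9


def assign_group(label: Optional[str], confidence: float) -> str:
    """Map one classifier output to one of the four dashboard groups.

    `label` is "female", "male", or None when the input is something the
    classifier is not suited to (initials, non-name entities).
    """
    if label is None:
        return "unresolved"
    if confidence < HIGH_CONFIDENCE:
        return "ambiguous"
    return f"{label}-associated"


def group_shares(predictions: Iterable[Tuple[Optional[str], float]]) -> dict:
    """Aggregate predictions into per-group shares; no individual names are kept."""
    counts = Counter(assign_group(label, conf) for label, conf in predictions)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()} if total else {}
```

Only the pooled shares leave `group_shares`, which mirrors the choice to report aggregates rather than predictions for individual names.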