Gender Statistics

Explore the gender ratios of authors and instructors in the dataset

This dashboard shows aggregate statistics about the gender of instructors and authors in the Open Syllabus dataset, faceted by field, institution, and country.

Under the hood, these numbers are based on predictions from a gender classification model applied to the raw text of the instructor and author names that we extract from the syllabi. The model is trained on datasets from the following papers:

  • Santamaría, Lucía and Helena Mihaljević. “Comparison and benchmark of name-to-gender inference services.” PeerJ Computer Science 4 (2018).
  • Bérubé, Nicolas et al. “Wiki-Gendersort: Automatic gender detection using first names in Wikipedia.” (2020).
  • Menendez, David et al. “Damegender: Writing and Comparing Gender Detection Tools.” Seminar on Advanced Techniques and Tools for Software Evolution (2020).

As always, it's important to keep in mind that gender itself is not binary, and that these kinds of name-based models have a number of limitations. By providing aggregate statistics, our goal is to help inform discussions about broad patterns of gender imbalance that emerge at the level of fields and institutions, when pooled across thousands of names.

Predictions about the gender identity of specific people on the basis of their names can be incorrect and harmful; for this reason we don't publish any data at the level of individual names.

In order to provide a contemporary picture of author and instructor gender balance, the analysis is limited to institutions for which we have at least 2,000 syllabi from the last three years. In addition, time series views are limited to 2014-2023.
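The inclusion rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dashboard's actual pipeline: the function name, the (institution, year) input shape, and the reading of "last three years" as the current year plus the two before it are all hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

MIN_SYLLABI = 2000   # threshold stated in the text
RECENT_YEARS = 3     # "the last three years"; exact window is an assumption


def eligible_institutions(
    syllabi: Iterable[Tuple[str, int]],
    current_year: int,
    min_syllabi: int = MIN_SYLLABI,
) -> Set[str]:
    """Return institutions with at least `min_syllabi` recent syllabi.

    `syllabi` is a hypothetical stream of (institution, year) pairs.
    """
    # Assumed window: current_year and the two years before it, inclusive.
    first_recent_year = current_year - RECENT_YEARS + 1
    counts = Counter(
        institution
        for institution, year in syllabi
        if first_recent_year <= year <= current_year
    )
    return {inst for inst, n in counts.items() if n >= min_syllabi}
```

Lowering `min_syllabi` makes the function easy to try on a handful of rows; with the default of 2,000, only institutions with enough recent syllabi are kept.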

There are four distinct groups of names in the dashboard:

  • Female-associated names
  • Ambiguous names
  • Unresolved names
  • Male-associated names

Female-associated and male-associated names are those for which the classifier predicted female or male with high confidence. Ambiguous names are those for which the classifier predicted female or male, but with low confidence. Unresolved names are strings the classifier is not well suited to, such as initials and non-name entities.
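As a rough sketch of how a single prediction could be assigned to one of the four groups: the cutoff value, the function name, and the convention of passing None for inputs the classifier cannot handle are all assumptions made for illustration. The text says only "high" and "low" confidence and does not give a threshold.

```python
from typing import Optional

# Hypothetical cutoff; the source does not state the actual value.
HIGH_CONFIDENCE = 0.9


def name_group(label: Optional[str], confidence: float) -> str:
    """Map a classifier prediction to one of the four dashboard groups."""
    # Inputs the classifier is not suited to (initials, non-name entities)
    # are modeled here as having no label at all.
    if label is None:
        return "Unresolved names"
    if label not in ("female", "male"):
        raise ValueError(f"unexpected label: {label!r}")
    if confidence >= HIGH_CONFIDENCE:
        if label == "female":
            return "Female-associated names"
        return "Male-associated names"
    # A female or male prediction below the cutoff counts as ambiguous.
    return "Ambiguous names"
```

For example, under these assumptions `name_group("female", 0.97)` falls in the female-associated group, while `name_group("male", 0.6)` is counted as ambiguous.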


