


Since 2004, the Geena Davis Institute on Gender in Media at Mount Saint Mary’s University has advocated for greater inclusion in entertainment media through cutting-edge research and advocacy. The Institute is moving the needle on intersectional gender representation by working directly within the industry, with a particular focus on children’s entertainment. This report analyzes representations of gender, race, LGBTQ+, disability, age (over 60) and body size (large) in the top-grossing family films (rated G, PG, or PG-13) of 2019. We also include analysis of leading characters in live action and animated family films from over a decade (2007-2019). Here are our key findings:





AGE (60+)









This report examines representations of gender, race, LGBTQ+, disability, age (60+), and body size (large body type) in the 100 top-grossing family films (rated G, PG, or PG-13) of 2019. This is the first See Jane report to analyze all six major identity groups.

This study is critical because the stories that we choose to tell in entertainment media send a specific message about who matters most in our culture. To bring about a global culture change, it is especially important that children see diverse, intersectional representations of characters in media that reflect the population of the world (which is half female and very diverse), so that media does not unwittingly instill unconscious bias in them. As the historical findings of this report indicate, content creators in family film are making significant progress when it comes to inclusion.

Our research focuses on children’s and family programming in order to assess how media affects young people, because youth are the highest consumers of media and the group most impacted by media content. A report from Common Sense Media finds that tweens use an average of six hours of entertainment media per day, while teens use an average of nine hours per day.1 Children are particularly impacted by not seeing characters like themselves reflected in popular culture, as they are in the process of developing their identity and finding their place in the world.

Our research is unique in its approach and methods. First, we focus on family content. Second, our team of expert human coders applies a double-coding technique that increases the validity and reliability of the findings. Third, we are the only public data institute to consistently analyze representations of the six major marginalized identities: women, people of color, LGBTQ+ individuals, people with disabilities, older people (60+), and large-bodied individuals.

Additionally, our research organization uses the Geena Davis Inclusion Quotient (GD-IQ), a software tool that automates the measurement of screen and speaking time (see Appendix A). The GD-IQ, which incorporates machine learning technology, was developed by the Institute with outside funding. It was designed by Dr. Shrikanth Narayanan and his team of researchers at the University of Southern California’s Signal Analysis and Interpretation Laboratory (SAIL), along with Dr. Caroline Heldman, Vice President for Research and Insights for the Institute.


The methodology we used to produce the data in this report is content analysis, an approach that is ideal for systematically analyzing the content of communications. The unit of analysis for the automated coding tool is character gender and character race, and the unit of analysis for human coding is character. Throughout this report, we include differences that achieve statistical significance at the .05 level.

Our family film dataset includes 2,991 total characters in the 100 top-grossing family films of 2019 (rated G, PG, or PG-13). The top family films of 2019 were identified using data from Box Office Mojo and include live action and animation.

The most prominent characters who drive the unfolding storyline were classified as leads or co-leads. Characters who are not leads but contribute to the storyline were classified as supporting characters, and characters that appear only briefly were coded as minor characters. We identified 122 leading/co-leading (hereafter referred to as “leading”) characters, 1,032 supporting characters, and 1,837 minor characters. Most of our analysis in this report is based on the 1,154 leading/co-leading and supporting characters who were more prominently featured in the films.


In this section, we summarize our major findings for character representation by gender, race/ethnicity, sexuality (LGBTQ+), disability, age (60+), and body size (large).


In this section, we examine gender and representation in terms of character prominence, age, stereotypes, work and leadership, traits, and box office returns.


Percentage of Female Leads in Top Family Films, 2007-2019

GD-IQ: Screen Time and Speaking Time

Percentage of Female Screen and Speaking Time, 2014-2019


Objectification and sexualization are remarkably common in family film content:

We also examined characteristics that are typically associated with masculinity—factors such as aggression and risk taking:4

Work and Leadership

Representations of work and leadership in family films tend to reinforce stereotypes about gendered occupations and men as breadwinners:

Character Traits

We do not find gender differences in character traits that fit with gendered stereotypes:

Box Office

Domestic Box Office Revenue (in Millions) by Gender, 2007-2019

Worldwide Box Office Revenue (in Millions) by Gender, 2007-2019

Women make up 51% of film-going audiences.5 A recent study debunks the idea that movies with a female lead or a lead of color earn less money at the box office, and finds that box office returns are instead driven by distribution, the quality of the story, and marketing and production costs.6 We find few gender differences in box office returns for family films.

Intersectional Analysis

Prominence by Gender and Race, 2019


Percentage of Leads of Color in Top Family Films, 2007-2019

In this section, we examine race and representation in terms of character prominence, stereotypes, work and leadership, traits, and box office returns. For clear analysis, we group race into two categories—characters of color and white characters.


GD-IQ: Screen Time

Screen Time by Race, 2019


White characters are more likely to be shown with a higher socioeconomic status

Work and Leadership

Character Traits

Domestic Box Office Revenue (in Millions) by Race, 2007-2019

Worldwide Box Office Revenue (in Millions) by Race, 2007-2019

Box Office


Percentage of LGBTQ+ Leads in Top Family Films, 2007-2019

In this section, we examine representations of LGBTQ+ characters when it comes to prominence, stereotypes, work and leadership, and character traits.



Representations of LGBTQ+ characters in family films uphold stereotypes about over-sexualization, sexual promiscuity, and death:

Family films reinforce the stereotype that LGBTQ+ people are more sexually promiscuous

Work and Leadership

Character Traits


Percentage of Leads with Disabilities in Top Family Films, 2007-2019

In this section, we examine representations of characters with disabilities when it comes to prominence, stereotypes, work and leadership, and character traits.



People with disabilities are often characterized by two familiar tropes: the “super crip” and the “bitter crip.”11 In both types of portrayals, disability becomes the most prominent aspect of a character, and the storyline revolves around the individual’s relation to their disability. The “bitter crip” is a character who is overcome by their suffering, and they often become the villain or antagonist. The “super crip” is an individual whose life revolves around heroically and heartwarmingly overcoming their disability, which serves as a central motivator. Although this can initially seem complimentary, these storylines often reinforce the superiority of people without disabilities (and thus the importance of “overcoming” a disability) and glorify those with disabilities for being able to live a normal or successful life. They also support the notion that overcoming is a matter of personal character, rather than highlighting the institutional or structural barriers that can make it more difficult for those living with a disability to have the same resources and opportunities.

Characters with disabilities are twice as likely to die in family films as other characters

Work and Leadership

Character Traits

Age (60+)

Prominence by Age, 2019

This section analyzes representations of characters age 60 and older in terms of prominence, stereotypes, work and leadership, and character traits. We are interested in knowing whether older characters are included in family films, and when they appear, whether their representations reinforce or challenge ageist stereotypes.


Family films reinforce the stereotype that older people are not physically attractive


Family films reinforce the stereotype that people 60+ are non-sexual

Work and Leadership

Character Traits

Body Size (Large Body Type)

Prominence by Size, 2019

Forty percent of American adults have large body types, according to a recent report from the Centers for Disease Control and Prevention. The average man in the U.S. is 5’9” and weighs 196 pounds, while the average woman is just under 5’4” tall and weighs 169 pounds. In this report, we are interested in whether family films present body size diversity, and whether large characters are portrayed in ways that reinforce damaging stereotypes. For the purposes of this analysis, we group characters into large body type (somewhat or very large) and non-large body type.



A sizeable number of large characters are presented as negative stereotypes associated with their size:

One-in-ten large characters are used as a punchline in family films


Character Traits


The Geena Davis Institute on Gender in Media at Mount Saint Mary’s University was the first to focus on gender representation in media made for kids, and after many years of research-based advocacy, the major milestones of parity with female leads in family television (in 2011) and family films (in 2019) have been achieved. We focus on children’s programming because of the power of media images, and because youth are more vulnerable to negative media depictions than adults. By showing kids, from the beginning, fictitious worlds where women, people of color, LGBTQ+ individuals, people with disabilities, older people (60+), and people with a large body type are fairly represented, we begin to prevent the unintended consequence of creating unconscious biases through what should be harmless entertainment.


The fact that the percentage of family films with women leads reached parity in 2019, at 48%, is profoundly significant. Gender parity in leading characters in family films shows great progress, as does the increase in speaking time (from 31.3% in 2014 to just under 40.0%) and screen time (from 34.9% to 42.6%). But despite this progress, women are still vastly underrepresented as supporting characters and minor characters, female characters are more likely than male characters to be shown in revealing clothing, and male characters are more likely to be shown as violent. Content creators are making rapid improvements in including more women in family films, but the ways in which they are represented need improvement.


Furthermore, the percentage of leads of color in family films has increased dramatically in the past decade, reaching 30% for the first time, a historic high. However, racial stereotypes persist. White characters are more likely than characters of color to be portrayed as having higher socioeconomic status and to be shown as leaders.


Leads with disabilities have reached historic highs in the last two years (at 8%). Family films are now including more characters with disabilities than in the past, but portrayals reinforce stereotypes. Characters with disabilities are more likely than other characters to be rescued and to die in family films. On the other hand, characters with disabilities are more likely to be represented as hard working, in management positions, in STEM occupations, and as leaders than other characters.


When it comes to LGBTQ+ portrayals, family films reached a historic high in 2018, but declined in 2019. Also, portrayals reinforce stereotypes of LGBTQ+ characters as overly sexual and sexually promiscuous, and LGBTQ+ characters are more likely to die in family films. Additionally, heterosexual characters are shown as smarter and harder working than LGBTQ+ characters.


Characters ages 60+ are underrepresented in family films compared to the broader population, and the ways they are presented tend to reinforce stereotypes that elderly people are non-sexual and not working. On the positive side, characters who are 60+ are more likely to be portrayed as managers and leaders in the workplace.


Large people are underrepresented in family films compared to the U.S. population. When they are shown, it is too often in ways that reinforce damaging stereotypes of large people as lazy, physically slow, stupid, poorly dressed, clumsy, and as a punchline. Large people are shown as less smart and face verbal shaming.



Dr. Caroline Heldman

Soraya Giaccardi, Rebecca Cooper, Dr. Ian Breckenridge-Jackson, Dr. Meredith Conroy, Dr. Patricia Esparza, Dr. Linzi Juliano, Dr. Ninochka McTaggart, Dr. Rita Seabrook, Emma Burrows, Sofie Christensen, Nathan Cooper Jones, Romeo Perez, Hannah Phillips, Jenna Virgo


Dr. Shrikanth Narayanan

Dr. Naveen Kumar, Dr. Tanaya Guha, Krishna Somandepalli, Anil Ramakrishna, Nikolaos Malandrakis, Victor Martinez, Karan Singla, Rahul Sharma, Rajat Hebbar, Chloe Tanlimco, Digbalay Bose, Rimita Lahiri, Timothy Greer, Sabyasachee Baruah, Ming-Chang Chiu, Jamie Flores, Alex Young


Incorporating Google’s machine learning technology and the University of Southern California’s audio-visual processing technologies, the GD-IQ was co-developed by the Institute with outside funding. The work was led by Dr. Shrikanth (Shri) Narayanan and his team of researchers at the University of Southern California’s Signal Analysis and Interpretation Laboratory (SAIL), along with Dr. Caroline Heldman.

To date, most research investigations of media representations have been done manually. The GD-IQ revolutionizes this approach by using automated analysis, which is not only more precise but also lets researchers quickly analyze massive amounts of data and report findings in real time. Because the tool is automated, comparisons across data sets and researchers are possible, as is reproducibility. Automated analysis of media content gets around the limitations of human coding. Beyond the significant advantage of analyzing more films in less time, the GD-IQ can also calculate content detail with a level of accuracy that eludes human coders. This is especially true for factors such as screen and speaking time, where near-exact precision is possible.

An algorithm is a set of rules or calculations used in problem-solving. For this report, we employed two automated algorithms: one measures screen time by gender and race, and the other measures the speaking time of characters by their gender. Here is an overview of the procedures we used for each algorithm.


We compute the screen time of female characters by calculating the ratio of female faces to the total number of faces in the film’s visuals. Screen time is calculated using online face detection and tracking with tools provided by Google’s machine learning technology. In the interest of precision and time, we estimate screen time by computing statistics over face-tracks (boxes tracking the general outline of each face) instead of individual faces. The face-tracks returned by this technology include different attributes of the face with the corresponding time of occurrence in the video. Among the attributes returned for each detected face, we use two parameters: the confidence of the detected face and the system’s posterior probability for gender prediction. A threshold of 0.25 was empirically chosen for determining confident face detection.

Because multiple characters can appear on screen simultaneously, face-tracks can overlap. A gender label is assigned to each track using the average gender posterior associated with the confident faces in the track. If the average gender posterior probability of the track is greater than 0.5, the track is classified as a “female track”; otherwise, it is a “male track.” The number of frames with confident face detections in each track is summed across all tracks to get the total number of faces. The confident face detections in female tracks are aggregated to get the total number of faces predicted as female. Finally, the screen time is computed as the ratio of the number of female face detections to the total number of face detections across the length of the movie. Supplementary analysis shows that screen time estimated at frame level (individual faces) instead of using face-tracks was not significantly different. Furthermore, averaging the gender posterior over tracks has the added benefit of “smoothing out” some of the local gender prediction errors: face-tracking incorporates temporal contiguity information to reduce transient errors in gender prediction that may occur when analyzing individual faces independently. We performed a similar analysis for character race and screen time.
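The track-level procedure described above can be illustrated with a minimal sketch. It is not the GD-IQ implementation: the `FaceDetection` record, the function name, and the choice to treat a detection as confident when its confidence is at or above the 0.25 threshold are assumptions made for illustration. Only the 0.25 and 0.5 cutoffs come from the text.

```python
from dataclasses import dataclass
from typing import List

CONFIDENCE_THRESHOLD = 0.25     # confident-face threshold stated in the text
FEMALE_POSTERIOR_CUTOFF = 0.5   # track-level gender cutoff stated in the text


@dataclass
class FaceDetection:
    """One detected face in one frame (hypothetical record layout)."""
    confidence: float        # detector confidence for this face
    female_posterior: float  # posterior probability that the face is female


def female_screen_time_share(tracks: List[List[FaceDetection]]) -> float:
    """Share of confident face detections that belong to female tracks."""
    total_faces = 0
    female_faces = 0
    for track in tracks:
        # Keep only confident detections (assumption: threshold is inclusive).
        confident = [d for d in track if d.confidence >= CONFIDENCE_THRESHOLD]
        if not confident:
            continue
        # Label the whole track from the average posterior of its confident faces.
        mean_posterior = sum(d.female_posterior for d in confident) / len(confident)
        total_faces += len(confident)
        if mean_posterior > FEMALE_POSTERIOR_CUTOFF:
            female_faces += len(confident)
    return female_faces / total_faces if total_faces else 0.0


# Toy data: one female track, one male track with a low-confidence face,
# and one track with no confident detections at all.
example_tracks = [
    [FaceDetection(0.9, 0.8), FaceDetection(0.9, 0.7), FaceDetection(0.9, 0.4)],
    [FaceDetection(0.9, 0.1), FaceDetection(0.9, 0.2), FaceDetection(0.1, 0.9)],
    [FaceDetection(0.2, 0.9)],
]
print(female_screen_time_share(example_tracks))
```

Note how the third face in the first track (posterior 0.4) is still counted as female: averaging over the track smooths out a single frame-level misprediction, which is the benefit the text describes.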


Using movie audio, we compute the speaking time of male and female characters to obtain an objective indicator of gender representation. The algorithm for performing this analysis involves automatic voice activity detection, audio segmentation, and gender classification.


Movie audio typically contains many non-speech regions, including sound effects, background music, and silence. The first step is to eliminate non-speech regions from the audio using voice activity detection (VAD) and retain only speech segments. We used a recurrent neural network based VAD algorithm implemented in the open-source toolkit OpenSMILE to isolate speech segments.


We then break speech segments into smaller sections to ensure that each segment includes speech from only one speaker. This is performed using an algorithm based on the Bayesian Information Criterion (BIC), available in the KALDI toolkit. Thirteen-dimensional Mel Frequency Cepstral Coefficient (MFCC) features are used for the automatic speaker segmentation. This step essentially decomposes the continuous speech segments obtained in the VAD step into smaller segments so that no segment contains speech from two different speakers.
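To make the idea of BIC-based segmentation concrete, here is a simplified sketch of a single change-point test. It is not the KALDI implementation. It assumes diagonal-covariance Gaussians (a simplification chosen to keep the example dependency-free), and the function names, variance floor, minimum segment length, and penalty weight are all hypothetical. A positive score means that modelling the frames as two separate speakers fits better than one model, even after the complexity penalty.

```python
import math
from typing import List, Optional, Sequence

VAR_FLOOR = 1e-8  # hypothetical floor to avoid taking log(0)


def log_det_diag(frames: Sequence[Sequence[float]]) -> float:
    """Log-determinant of the diagonal covariance estimated from the frames."""
    n = len(frames)
    total = 0.0
    for j in range(len(frames[0])):
        column = [frame[j] for frame in frames]
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / n
        total += math.log(max(variance, VAR_FLOOR))
    return total


def delta_bic(frames: Sequence[Sequence[float]], split: int,
              penalty_weight: float = 1.0) -> float:
    """Gain from modelling frames[:split] and frames[split:] separately."""
    n, dims = len(frames), len(frames[0])
    left, right = frames[:split], frames[split:]
    gain = 0.5 * (n * log_det_diag(frames)
                  - len(left) * log_det_diag(left)
                  - len(right) * log_det_diag(right))
    # The two-model hypothesis adds one extra mean and variance per dimension.
    penalty = 0.5 * penalty_weight * (2 * dims) * math.log(n)
    return gain - penalty


def best_change_point(frames: Sequence[Sequence[float]], min_frames: int = 5,
                      penalty_weight: float = 1.0) -> Optional[int]:
    """Return the split index with the largest positive score, or None."""
    best_split, best_score = None, 0.0
    for split in range(min_frames, len(frames) - min_frames + 1):
        score = delta_bic(frames, split, penalty_weight)
        if score > best_score:
            best_split, best_score = split, score
    return best_split


# Toy one-dimensional "features": the statistics shift after frame 20.
two_speakers: List[List[float]] = (
    [[0.1 * (1 if i % 2 else -1)] for i in range(20)]
    + [[5.0 + 0.1 * (1 if i % 2 else -1)] for i in range(20)]
)
one_speaker: List[List[float]] = [[0.1 * (1 if i % 2 else -1)] for i in range(40)]

print(best_change_point(two_speakers))
print(best_change_point(one_speaker))
```

In practice the test would be applied repeatedly over a sliding window of 13-dimensional MFCC frames; the toy data here uses a single dimension only to keep the example short.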


The speech segment is then classified into two categories based on whether it was likely spoken by a male or female character. This is accomplished with acoustic feature extraction and feature normalization.


We use 13-dimensional MFCC features for gender classification because they can be reliably extracted from movie audio, unlike pitch or other high-level features where extraction is made unreliable by the diverse and noisy nature of movie audio.


Feature normalization is deemed necessary to address the issue of variability of speech across different movies and speakers, and to reduce the effect of noise present in the audio channel. Cepstral Mean Normalization (CMN) is a standard technique popular in Automatic Speech Recognition (ASR) and other speech technology applications. Using this method, the cepstral coefficients are linearly transformed to have the same segmental statistics (zero mean).

Classification of the speaker as either male or female is based on gender-specific Gaussian mixture models (GMMs) of the acoustic features. These models are trained on a gender-annotated subset of general speech databases used for developing speech technologies using frame-level features for each gender. The GMM we use in this system has 100 mixture components and is optimized by tuning the parameters in a held-out evaluation set. For a new input segment whose gender label is to be predicted, the likelihoods of the segment belonging to a male or female class are computed based on this pre-trained model. The class with higher likelihood is assigned to the segment as the estimated gender prediction. The total speaking time by gender is then computed by adding together the durations for each utterance classified as Male/Female. This gives us the male and female speaking time in a movie.
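The normalization, classification, and aggregation steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the system used for the report: the GMMs below are hand-written two-component toy models with diagonal covariances rather than trained 100-component models, the features are two-dimensional rather than 13-dimensional MFCCs, ties are resolved as “male,” and every class and function name is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]


def cepstral_mean_normalize(frames: Sequence[Frame]) -> List[List[float]]:
    """Subtract the per-dimension segment mean so each dimension has zero mean."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[j] for f in frames) / n for j in range(dims)]
    return [[f[j] - means[j] for j in range(dims)] for f in frames]


@dataclass
class DiagonalGMM:
    """Gaussian mixture with diagonal covariances (toy stand-in for a trained model)."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def frame_log_likelihood(self, frame: Frame) -> float:
        component_logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for x, m, v in zip(frame, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            component_logs.append(ll)
        peak = max(component_logs)  # log-sum-exp for numerical stability
        return peak + math.log(sum(math.exp(c - peak) for c in component_logs))

    def segment_log_likelihood(self, frames: Sequence[Frame]) -> float:
        return sum(self.frame_log_likelihood(f) for f in frames)


def classify_segment(frames: Sequence[Frame], female: DiagonalGMM,
                     male: DiagonalGMM) -> str:
    """Assign the class whose model gives the normalized segment higher likelihood."""
    normalized = cepstral_mean_normalize(frames)
    if female.segment_log_likelihood(normalized) > male.segment_log_likelihood(normalized):
        return "female"
    return "male"


def speaking_time_by_gender(segments: Sequence[Tuple[float, float, Sequence[Frame]]],
                            female: DiagonalGMM, male: DiagonalGMM) -> Dict[str, float]:
    """Sum segment durations (end - start, in seconds) per predicted gender."""
    totals = {"female": 0.0, "male": 0.0}
    for start, end, frames in segments:
        totals[classify_segment(frames, female, male)] += end - start
    return totals


# Toy models that differ only in how spread out the first feature is.
female_gmm = DiagonalGMM([0.5, 0.5], [[-2.0, 0.0], [2.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]])
male_gmm = DiagonalGMM([0.5, 0.5], [[-0.5, 0.0], [0.5, 0.0]], [[1.0, 1.0], [1.0, 1.0]])

wide = [[13.0, 1.0], [7.0, 1.0], [12.5, 1.0], [7.5, 1.0]]
narrow = [[20.4, 3.0], [19.6, 3.0], [20.6, 3.0], [19.4, 3.0]]
example_segments = [(0.0, 2.5, wide), (2.5, 4.0, narrow), (4.0, 5.0, wide)]

print(speaking_time_by_gender(example_segments, female_gmm, male_gmm))
```

Because CMN removes each segment’s mean, the constant offsets in the toy segments (around 10 and 20) have no effect on the decision; only the shape of the features around that mean matters, which is why normalization helps across movies with different recording conditions.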


1. Common Sense Media (2019). The Common Sense Census: Media Use by Tweens and Teens 2019.

2. The Geena Davis Institute on Gender in Media (2018). The Geena Benchmark Report: 2007-2017.

3. Sexual objectification is the act of treating a person as an instrument of sexual pleasure. Objectification more broadly means treating a person as a commodity or an object without regard to their personality or dignity. Panning refers to rotating a camera on its vertical or horizontal axis. In this instance, it refers to moving from one part of a body to another. Slow motion can be used to accentuate various aspects of the images on a screen. For this particular measure, we recorded instances when slow motion is used to accentuate a character’s physical form in a sexual way, for example, jiggling breasts. Verbal sexual objectification can come in many forms, including cat calling and comments a character makes about another character’s physicality to a third party.

4. See Levant, R. F., Hirsch, L. S., Celentano, E., & Cozza, T. M. (1992). “The Male Role: An Investigation of Contemporary Norms.” Journal of Mental Health Counseling, 14(3), 325-37. See also Parent, M. C., & Moradi, B. (2011). “An Abbreviated Tool for Assessing Conformity to Masculine Norms: Psychometric Properties of the Conformity to Masculine Norms Inventory,” Psychology of Men & Masculinity, 12(4), 339.

5. Motion Picture Association of America (2018). 2018 Theme Report.

6. See Smith, S.L., R. Weber, M. Choueiti, K. Pieper, A. Case, K. Yao, and C. Lee (2020). The Ticket to Inclusion: Gender & Race/Ethnicity of Leads and Financial Performance Across 1,200 Popular Films. The Annenberg Inclusion Initiative, University of Southern California.

7. United States Census (2020). U.S. Census Bureau Quickfacts.

8. Our research team coded for work ethic, defined as the principle that hard work is intrinsically virtuous or worthy of reward. We are measuring a character’s work ethic by how they perform their work. Work can be defined as schoolwork, paid labor, volunteer work (unpaid labor), housework, etc.

9. Newport, F. (2018). “In U.S., Estimate of LGBT Population Rises to 4.5%.” Gallup.

10. United States Census (2012). Nearly 1-in-5 People have a Disability in the U.S., Census Bureau Reports. July 25, census.gov/newsroom/releases/archives/miscellaneous/cb12-134.html.

11. Nelson, J.A. (1994). “Broken Images: Portrayals of Those With Disabilities in American Media.” In J.A. Nelson (Ed.) The Disabled, the Media, and the Information Age (pp.1-24). Westport, CT: Greenwood Press.


Heldman, Caroline, Rebecca Cooper, Shrikanth Narayanan, Krishna Somandepalli, Emma Burrows, Sofie Christensen, Nathan Cooper-Jones, Meredith Conroy, Soraya Giaccardi, Linzi Juliano, Ninochka McTaggart, Romeo Perez, Hannah Phillips, Rita Seabrook, and Jenna Virgo (2020). See Jane 2020 Report. The Geena Davis Institute on Gender in Media at Mount Saint Mary’s University.