
The See Jane 100

Gender and Race Representations in the Top Family Films of 2017

Research by the Geena Davis Institute on Gender in Media, funded by Google

Executive Summary

The See Jane 100 examines gender and race representation in the top 100 grossing family films of 2017 using the Geena Davis Inclusion Quotient (GD-IQ). The GD-IQ is the first automated tool designed to analyze character screen and speaking time with a precision and reliability that exceeds human analytic capabilities.

We find persistent gaps in gender and race representations in family films, but also progress when it comes to the percentage of films with female leads and higher box office returns for films led by women and racially diverse co-leads. More specifically, we find that:

1. Male characters outnumber female characters two-to-one when it comes to leads (59.0% compared to 26.0%), screen time (60.9% compared to 39.1%), and speaking time (63.7% compared to 36.3%).
2. Although the number of female leads is a long way from parity, the percentage of family films with female leads has more than doubled in the past four years.
3. Female-led family films grossed 38.1% more on average than male-led films, a pattern that has remained consistent over four years.
4. Three-in-four family films (73.0%) feature white actors in the leading roles, compared to just 17.0% of films with protagonists of color.
5. Family films with racially diverse co-leads earned 60.5% more on average than films with white protagonists.

Box Office Advantages

Films with female leads gross 38.1% more.
Films with racially diverse co-leads gross 60.5% more.

In summary, we find that male leads continue to outnumber female leads two-to-one, and protagonists of color rarely appear as leads. When it comes to screen time and speaking time, female characters receive far less face time and speak less often than male characters. On a positive note, more family films feature female leads than in the past, and at the current pace of progress, the industry could conceivably achieve protagonist parity in a decade. These gender and race gaps persist, despite the fact that films with female leads and racially diverse co-leads earn more at the box office than films with only male leads and white leads, respectively.

We recommend that film studios improve their gender and race parity by employing the GD-IQ as a “spellcheck” for gender and race bias at every stage of the production process; making gender and racial diversity more apparent in scene descriptions in scripts; actively rejecting the myth that films led by women and people of color are less bankable; and hiring more women and people of color in key storytelling positions behind the scenes.

The See Jane 100 examines gender and race representation in the top 100 grossing family films of 2017 using the GD-IQ, the first automated tool designed to analyze character screen and speaking time. The GD-IQ evaluates media content with a precision that exceeds what human coders can achieve through observation. We begin this report with a review of the methodology used to generate the data. We then summarize the major findings, first for gender and then for race representations. We conclude with recommendations based on our findings.

For this report, we analyzed the top 100 grossing (non-animated)¹ family films for 2017, as reported by Variety. We selected the top grossing films in order to evaluate the most watched family content at the movies, as well as to compare across years. We employed content analysis methodology, an approach to studying communication artifacts (such as movies) that describes their content through systematic observation of language use, images, and other elements.

We used the GD-IQ to conduct our content analysis. The GD-IQ is a tool for automated analysis of audio and visual content. The GD-IQ revolutionizes media content analysis by using algorithms, which make it possible for researchers to quickly analyze massive amounts of data. Algorithms are a set of rules or calculations that are used in problem solving. Beyond the significant advantage of being able to efficiently analyze more films in less time, the GD-IQ can also calculate content detail with a level of accuracy that eludes human coders. This is especially true for factors such as screen and speaking time, where near exact precision is possible. For this report, we employed two automated algorithms that measure the screen and speaking time of characters by their gender. For more specific information about our algorithms, please see Appendix A.

In this section, we summarize our major findings. We find persistent gaps in gender and race representations in family films, but also progress when it comes to the percentage of films with female leads and higher box office returns for films led by women and racially diverse co-leads. We begin with our gender analysis, followed by race. We conclude with a summary of our findings and recommendations for film studios.

Gender Representations: Plateaus and Progress

Here, we report our major findings for gender representation in terms of character prominence, screen time, speaking time, and box office returns. We also compare these measures across four years (2014-2017).

Leading Characters
Characters are assigned four different types of prominence in the See Jane 100: 1) leading/co-leading, 2) major, 3) minor, and 4) background character.² We focus on leading characters for this report and find that male leads outnumber female leads two-to-one (59% compared to 26%), while the remaining 15% are male-female co-leads.

Screen Time
We compare the percentage of time male and female characters appear on the screen and find that male characters receive significantly more screen time than female characters (60.9% compared to 39.1%). This means that when movie-goers see a face on the screen, the odds are two-to-one that the face is male.

Figure 1: Prominence by Gender

Speaking Time
Male characters speak twice as often as female characters in the top grossing films (63.7% compared to 36.3%). This means that male characters do most of the talking in family films.

Box Office Returns
Although women are less likely to play the lead in the top 100 family films, female-led films grossed more revenue than male-led films. On average, family films with female protagonists grossed $148,022,519, compared to $107,169,842 for male-led films and $59,522,260 for films with male-female co-leads. In other words, female-led films grossed 38.1% more on average than male-led films in 2017.
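The 38.1% figure follows directly from the two averages above; the short check below is illustrative only, and the variable names are ours rather than the report's.

```python
# Average 2017 grosses reported above (in dollars).
female_led_avg = 148_022_519
male_led_avg = 107_169_842

# Relative box office advantage of female-led films over male-led films.
lift = (female_led_avg - male_led_avg) / male_led_avg
print(f"{lift:.1%}")  # -> 38.1%
```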

On average, films with female protagonists grossed over $148 million compared to $107 million for male-led films.
Figure 2: Female Representation, 2014-2017
Figure 3: Female Advantage at Box Office, 2014-2017

Many factors determine the box office revenue of a given film, but these numbers are revealing. Our findings debunk the idea that female leads are not bankable: films with female leads actually earned more money than family films with male leads. For example, Wonder Woman with Gal Gadot, Pitch Perfect 3 with its ensemble cast of female leads, and Star Wars: The Last Jedi with Daisy Ridley were major box office draws in 2017. Producing female-led family films brings sound financial returns.

Gender Representations Over Time
In this section, we analyze gender representations in terms of leading characters, screen time, speaking time, and box office returns from 2014 to 2017.

The percentage of female leads has more than doubled in the past four years, rising from 11% in 2014 to 17% in 2015 and 32% in 2016, before dipping to 26% in 2017. After half a century of stagnation, Hollywood is finally producing more films with female leads.³ If this progress continues, gender parity among protagonists could be reached within a decade.

When it comes to screen time, the percentage of time female characters appear on the screen was stagnant in 2014 (34.9%), 2015 (35.9%), and 2016 (36.5%), but saw a moderate 3-point increase in 2017 (39.1%).

Female speaking time has fluctuated over the four years analyzed in this report, with no definite trend toward progress. Female speaking time increased from 31.3% in 2014 to 39.2% in 2015, but fell back to 33.5% in 2016. It moderately increased to 36.3% in 2017.

Female-led family films have consistently made more money than male-led films over the past four years. In 2014, family films with female leads made 35.2% more than films with male leads. The female-led advantage was 18.5% in 2015 and 7.3% in 2016. In 2017, female-led films grossed 38.1% more on average than films with male leads.


Race Representations: People of Color Missing in Action

Figure 4: Prominence by Race

In this section, we analyze prominence and box office by race.⁴ We report findings for leading characters and box office returns.⁵

Leading Characters
Three-in-four films (73.0%) feature white actors in the leading roles compared to just 17.0% of films with protagonists of color.

Box Office Returns
When it comes to box office returns, films co-led by people of color and white actors grossed the most revenue on average. Films with racially diverse co-leads earned an average of $80,847,647 compared to $50,348,738 for films with white leads and $11,543,365 for films with leads of color. In other words, films co-led by white actors and actors of color earned 60.5% more on average than films with only white protagonists. Jumanji: Welcome to the Jungle and Kong: Skull Island are examples of major box office draws led by racially diverse ensemble casts in 2017.

Conclusion

Using the revolutionary GD-IQ, we find persistent gaps in gender and race representations in family films. Male leads continue to outnumber female leads two-to-one, and protagonists of color rarely appear as leads. When it comes to screen time and speaking time, female characters receive far less face time and speak less often than male characters. On a positive note, more family films feature female leads than in the past, and at the current pace of progress, the industry could conceivably achieve protagonist parity in a decade.

Gender and race gaps persist, despite the fact that films with female leads and racially diverse co-leads earn more at the box office than films with only male leads and white leads, respectively. In other words, Hollywood studios continue to produce films that do not reflect the gender and race make-up of the population, even though such films make more money.

Recommendations

We recommend that family film studios engage in the following actions to improve their representations of women and people of color:

Storytellers can examine character prominence and dialogue at every stage of the production process by using the GD-IQ as a “spellcheck” for gender and race bias. This tool can provide invaluable information about cast parity, including character prominence, language use, speaking time, and other character aspects, at the development, pre-production, production, and post-production stages.

Writers can be more explicit in their character descriptions to ensure that casts are representative of the larger population. For example: “Scene: A crowd gathers in a bank. In the background, we see 50% female characters, 40% people of color, and 10% otherly abled characters.”

Studio executives can stop leaving money on the table by rejecting the myth that female leads and leads of color are not as bankable as white, male leads. Women constitute 52% of the movie-going audience,⁶ and Black Americans see movies 21% more often than other Americans,⁷ but the myth persists that leading white men make the most money. Several research studies over multiple years have found that this is simply not the case,⁸ and that films that attract a gender and racially diverse audience rake in three times the amount of money on opening weekend as films that attract nondiverse audiences.⁹ To take full advantage of the profits to be gained from producing more balanced content, studio executives can prioritize the greenlighting of films featuring women and people of color.

Studio executives can increase gender and racial diversity in their casts by hiring more diverse crews. Women currently constitute 18% of the key storytelling positions in film (directors, writers, producers, executive producers, editors, and cinematographers), a percentage that has stayed virtually the same for the past two decades.¹⁰ Previous research finds that hiring more women in key creative roles translates to more women on the screen,¹¹ and the top grossing films featuring leads of color in recent years have almost exclusively been helmed by directors of color (e.g., Black Panther, Get Out, Fences). In short, hiring a more diverse crew leads to a more diverse cast.

Appendix A: About the GD-IQ

The GD-IQ was funded by Google.org. Incorporating Google’s machine learning technology and the University of Southern California’s audio-visual processing technologies, this tool was co-developed by the Institute and led by Dr. Shrikanth (Shri) Narayanan and his team of researchers at the University of Southern California’s Signal Analysis and Interpretation Laboratory (SAIL), with additional analysis from Dr. Caroline Heldman.

To date, most research investigations of media representations have been done manually. The GD-IQ revolutionizes this approach by using automated analysis, which is not only more precise but also makes it possible for researchers to quickly analyze massive amounts of data and report findings in real time. Because the tool is automated, comparisons across data sets and researchers are possible, as is reproducibility. Automated analysis of media content gets around the limitations of human coding.

Beyond the significant advantage of being able to efficiently analyze more films in less time, the GD-IQ can also calculate content detail with a level of accuracy that eludes human coders. This is especially true for factors such as screen and speaking time, where near exact precision is possible. Algorithms are a set of rules or calculations that are used in problem-solving. For this report, we employed two automated algorithms that measure the screen and speaking time of characters by their gender. Here is an overview of the procedures we used for each algorithm.

Screen Time Analysis
We compute the screen time of female characters by calculating the ratio of female faces to the total number of faces in the film’s visuals. The screen time is calculated using online face detection and tracking with tools provided by Google’s machine learning technology. In the interest of precision and time, we estimate screen time by computing statistics over face-tracks (boxes tracking the general outline of each face) instead of individual faces. The face-tracks returned by the technology include different attributes of the face with the corresponding time of occurrence in the video. Among the attributes returned for each of the detected faces, we use two parameters: the confidence of the detected face and the system’s posterior probability for gender prediction. A threshold of 0.25 was empirically chosen for determining confident face detections.

Due to multiple characters appearing on screen simultaneously, the face-tracks can be overlapping. A gender label is then assigned to each track using the average gender posterior associated with the confident faces in the track. If the average gender posterior probability of the track is greater than 0.5, the track is classified as a “female track”; otherwise, it is a “male track.” The number of frames with confident face detections in each track is summed up across all tracks to get the total number of faces. The number of confident face detections in female tracks is aggregated to get the total number of faces predicted as female. Finally, the screen time is computed as the ratio of the number of female face detections to the total number of face detections across the length of the movie. Supplementary analysis shows that screen time estimated at the frame level (individual faces) instead of over face-tracks was not significantly different. Furthermore, computing the average gender posterior over tracks has the added benefit of “smoothing out” some of the local gender prediction errors: face-tracking incorporates temporal contiguity information to reduce transient errors in gender prediction that may occur when analyzing individual faces independently.
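The aggregation logic described above can be summarized in a short sketch. The Python below is illustrative only: the FaceTrack structure, field names, and example values are our assumptions rather than the GD-IQ implementation, but the thresholds (0.25 for confident detections, 0.5 for the female/male decision) follow the description above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FaceTrack:
    """One tracked face: per-frame detection confidences and female-gender posteriors."""
    confidences: List[float]        # detection confidence for each frame in the track
    female_posteriors: List[float]  # P(female) for each frame in the track

CONF_THRESHOLD = 0.25  # empirically chosen threshold for a confident face detection

def female_screen_time_ratio(tracks: List[FaceTrack]) -> float:
    """Ratio of female face detections to all confident face detections."""
    total_faces = 0
    female_faces = 0
    for track in tracks:
        # Keep only confidently detected faces within the track.
        confident = [p for c, p in zip(track.confidences, track.female_posteriors)
                     if c >= CONF_THRESHOLD]
        if not confident:
            continue
        # Average the gender posterior over the track to smooth local prediction errors.
        avg_posterior = sum(confident) / len(confident)
        total_faces += len(confident)
        if avg_posterior > 0.5:          # track classified as a "female track"
            female_faces += len(confident)
    return female_faces / total_faces if total_faces else 0.0

# Toy example: 3 of 7 confident detections fall in a female-classified track -> 42.9%.
tracks = [FaceTrack([0.9, 0.8, 0.3, 0.1], [0.7, 0.8, 0.6, 0.2]),
          FaceTrack([0.6, 0.9, 0.9, 0.7], [0.3, 0.2, 0.4, 0.1])]
print(f"{female_screen_time_ratio(tracks):.1%}")
```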

Speaking Time Analysis
Using movie audio, we compute the speaking time of male and female characters to obtain an objective indicator of gender representation. The algorithm for performing this analysis involves automatic voice activity detection, audio segmentation, and gender classification.

Voice Activity Detection: Movie audio typically contains many non-speech regions, including sound effects, background music, and silence. The first step is to eliminate non-speech regions from the audio using voice activity detection (VAD) and retain only speech segments. We used a recurrent neural network based VAD algorithm implemented in the open-source toolkit OpenSMILE to isolate speech segments.
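The VAD used in the report is a recurrent neural network implemented in openSMILE; as a conceptual stand-in only, the sketch below uses a much simpler energy threshold over short frames to illustrate what this step produces (speech spans with start and end times), not the model actually used.

```python
import numpy as np

def simple_energy_vad(audio: np.ndarray, sample_rate: int,
                      frame_ms: float = 25.0, threshold_db: float = -35.0):
    """Toy energy-based VAD: returns (start_sec, end_sec) spans flagged as speech.

    A simplified stand-in for the RNN-based VAD described in the report.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(audio) // frame_len
    spans, start = [], None
    for i in range(n_frames):
        frame = audio[i * frame_len:(i + 1) * frame_len]
        rms = np.sqrt(np.mean(frame ** 2)) + 1e-10       # frame energy
        is_speech = 20 * np.log10(rms) > threshold_db    # compare to a dB threshold
        t = i * frame_ms / 1000
        if is_speech and start is None:
            start = t                                    # speech span begins
        elif not is_speech and start is not None:
            spans.append((start, t))                     # speech span ends
            start = None
    if start is not None:
        spans.append((start, n_frames * frame_ms / 1000))
    return spans
```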

Segmentation: We then break speech segments into smaller sections in order to ensure each segment includes speech from only one speaker. This is performed using an algorithm based on the Bayesian Information Criterion (BIC), available in the KALDI toolkit. Thirteen-dimensional Mel Frequency Cepstral Coefficient (MFCC) features are used for the automatic speaker segmentation. This step essentially decomposes the continuous speech segments obtained in the VAD step into smaller segments to make sure no segment contains speech from two different speakers.
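For illustration, the ΔBIC test at the heart of this kind of speaker-change detection can be sketched as follows. The report relies on the KALDI implementation, so the NumPy code below, with an assumed penalty weight λ = 1.0 and search margin, is a conceptual sketch rather than the toolkit's algorithm.

```python
import numpy as np

def delta_bic(window: np.ndarray, i: int, lam: float = 1.0) -> float:
    """Delta-BIC for splitting an (N x 13) block of MFCC frames at frame i.

    Positive values favor modeling the two halves with separate Gaussians,
    i.e. a likely speaker change at frame i.
    """
    n, d = window.shape
    def logdet_cov(x):
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)  # regularized for stability
        return np.linalg.slogdet(cov)[1]
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)   # model-complexity penalty
    return (0.5 * n * logdet_cov(window)
            - 0.5 * i * logdet_cov(window[:i])
            - 0.5 * (n - i) * logdet_cov(window[i:])
            - lam * penalty)

def find_change_point(window: np.ndarray, margin: int = 20):
    """Best candidate change point in the window, or None if no split is supported."""
    if len(window) <= 2 * margin:
        return None
    best = max(range(margin, len(window) - margin), key=lambda i: delta_bic(window, i))
    return best if delta_bic(window, best) > 0 else None
```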

Gender Classification: The speech segment is then classified into two categories based on whether it was likely spoken by a male or female character. This is accomplished with acoustic feature extraction and feature normalization.

Acoustic Feature Extraction: We use 13-dimensional MFCC features for gender classification because they can be reliably extracted from movie audio, unlike pitch or other high-level features where extraction is made unreliable by the diverse and noisy nature of movie audio.
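As an illustration only, 13-dimensional MFCCs can be extracted from a speech segment with a general-purpose audio library; librosa and the placeholder file name below are our assumptions, not the toolchain named in the report.

```python
import librosa

# Load one speech segment at its native sample rate ("segment.wav" is a placeholder).
audio, sr = librosa.load("segment.wav", sr=None)

# 13-dimensional MFCCs: one column of coefficients per short-time analysis frame.
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)   # shape: (13, n_frames)
frames = mfcc.T                                          # shape: (n_frames, 13)
```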

Feature Normalization: Feature normalization is deemed necessary to address the issue of variability of speech across different movies and speakers, and to reduce the effect of noise present in the audio channel. Cepstral Mean Normalization (CMN) is a standard technique popular in Automatic Speech Recognition (ASR) and other speech technology applications. Using this method, the cepstral coefficients are linearly transformed to have the same segmental statistics (zero mean).

Classification of the speaker as either male or female is based on gender-specific Gaussian mixture models (GMMs) of the acoustic features. These models are trained on a gender-annotated subset of general speech databases used for developing speech technologies, using frame-level features for each gender. The GMM we use in this system has 100 mixture components and is optimized by tuning the parameters on a held-out evaluation set. For a new input segment whose gender label is to be predicted, the likelihoods of the segment belonging to the male or female class are computed based on this pre-trained model. The class with the higher likelihood is assigned to the segment as the estimated gender prediction. The total speaking time by gender is then computed by adding together the durations for each utterance classified as male or female. This gives us the male and female speaking time in a movie.
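A minimal sketch of the normalization and GMM scoring described above, using scikit-learn's GaussianMixture as a stand-in. The diagonal covariance type, training-data variables, and function names are our assumptions; the 100 mixture components and the higher-likelihood decision rule follow the text.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def cepstral_mean_normalize(frames: np.ndarray) -> np.ndarray:
    """CMN: subtract each coefficient's mean over the segment (zero-mean features)."""
    return frames - frames.mean(axis=0, keepdims=True)

def train_gender_models(male_frames: np.ndarray, female_frames: np.ndarray,
                        n_components: int = 100):
    """Fit one GMM per gender on frame-level MFCCs from gender-annotated speech."""
    male_gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    female_gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    male_gmm.fit(cepstral_mean_normalize(male_frames))
    female_gmm.fit(cepstral_mean_normalize(female_frames))
    return male_gmm, female_gmm

def classify_segment(segment_frames: np.ndarray, male_gmm, female_gmm) -> str:
    """Assign the gender whose model gives the segment the higher total log-likelihood."""
    frames = cepstral_mean_normalize(segment_frames)
    male_ll = male_gmm.score_samples(frames).sum()
    female_ll = female_gmm.score_samples(frames).sum()
    return "female" if female_ll > male_ll else "male"

# Speaking time per gender is then the sum of segment durations by predicted class:
# totals = {"male": 0.0, "female": 0.0}
# for frames, duration in segments:
#     totals[classify_segment(frames, male_gmm, female_gmm)] += duration
```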

Footnotes

1. Our dataset does not include animated films since the automated tool is not yet able to read animated characters. Future reports will include animated films.

2. Leading characters are those who drive the unfolding storyline. The co-lead category includes ensemble casts where both men and women are featured for roughly equal amounts of time. Films with multiple male leads or multiple female leads are folded into the male-lead and female-lead categories, respectively.

3. See Stacy L. Smith, Marc Choueiti, and Katherine Pieper, 2014. “Gender Bias Without Borders: An Investigation of Female Characters in Popular Films Across 11 Countries,” The Geena Davis Institute on Gender in Media and the Social Change Initiative at USC Annenberg, retrieved from seejane.org/wp-content/uploads/genderbias-without-borders-executive-summary.pdf

4. Our GD-IQ race analysis extends only two years, so we only include findings from 2017 in this report.

5. The findings for race are limited because the GD-IQ is not yet able to calculate screen time and speaking time by race with the precision necessary for publication.

6. See Rachel Montpelier, 2017. “MPAA Report 2016: 52% of Movie Audiences are Women & Other Takeaways,” Women and Hollywood, March 24, retrieved from womenandhollywood.com/mpaa-report-2016-52-of-movie-audiences-are-women-other-takeaways-12320da989b4/

7. See Karen Grisby Bates, 2011. “Minorities at the Movies Fill Seats, But Not Screens,” National Public Radio, June 23, retrieved from https://www.npr.org/2011/06/24/137374242/minorities-at-the-movies-fill-seats-but-not-screens

8. See Darnell Hunt, Ana-Christina Ramón, Michael Tran, Amberia Sargent, and Debanjan Roychoudhury, 2018. “Hollywood Diversity Report 2018: Five Years of Progress and Missed Opportunities,” UCLA College of Social Sciences, February 27, retrieved from socialsciences.ucla.edu/wp-content/uploads/2018/02/UCLA-Hollywood-Diversity-Report-2018-2-27-18.pdf

9. See Tre’vell Anderson, 2017. “New CAA Study Says Diverse Casting Increases Box Office Potential Across Budgets,” The Los Angeles Times, June 21, retrieved from latimes.com/entertainment/movies/la-et-mn-caa-diversity-study-exclusive-20170622-story.html

10. See Martha Lauzen, 2018. “The Celluloid Ceiling: Behind-the-Scenes Employment of Women on the Top 100, 250, and 500 Films of 2017,” retrieved from womenintvfilm.sdsu.edu/wp-content/uploads/2018/01/2017_Celluloid_Ceiling_Report.pdf

11. See Ana Defillo, 2016. “Meet Ashley Black, One of the Only Women of Color in Late Night,” Complex, May 20, retrieved from complex.com/pop-culture/2016/05/full-frontal-ashley-black-late-night-writer-interview

IF SHE CAN SEE IT, SHE CAN BE IT®