Do other Disney villains and heroes use similarly sophisticated language?

Last week, I published a post comparing the lexical sophistication of Aladdin and Jafar’s word use in the classic Disney film Aladdin. My analysis showed that the two characters’ average word frequency (according to the COCA magazine sub-corpus) differed when computed over lemmatized tokens, but not when computed over lemmatized word types. In other words, Aladdin repeats frequent lemmas more often than Jafar does, but both characters draw on similarly frequent/infrequent types of words (i.e., unique words).
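To make the token/type distinction concrete, here is a minimal Python sketch. It is not the code from the previous post: the frequency values are invented for illustration (they are not real COCA counts), and the two speakers and the helper names `token_mean` and `type_mean` are hypothetical.

```python
from statistics import mean

# Illustrative frequencies only; these are NOT real COCA counts.
toy_freq = {"go": 900, "want": 800, "lamp": 50, "sultan": 10}


def token_mean(lemmas, freq):
    """Average frequency over every lemma occurrence (repeats count each time)."""
    return mean(freq[lemma] for lemma in lemmas)


def type_mean(lemmas, freq):
    """Average frequency over unique lemmas (each lemma counts once)."""
    return mean(freq[lemma] for lemma in set(lemmas))


# Two hypothetical speakers who use the same four lemma types,
# but repeat different ones.
speaker_a = ["go", "go", "go", "want", "want", "lamp", "sultan"]
speaker_b = ["go", "want", "lamp", "lamp", "sultan", "sultan", "sultan"]

print("type means: ", type_mean(speaker_a, toy_freq), type_mean(speaker_b, toy_freq))
print("token means:", token_mean(speaker_a, toy_freq), token_mean(speaker_b, toy_freq))
```

Both speakers have the same type-level average because they share the same vocabulary, but speaker A's token-level average is higher because A repeats the frequent lemmas while B repeats the rare ones.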

Today I would like to give a brief report on two other Disney films I’ve analyzed over the past week: The Little Mermaid and The Hunchback of Notre Dame. The villains in these two films – the sea witch Ursula, and the cruel justice minister Frollo – are unmistakably evil, selfish characters who believe themselves to be superior to the heroes (or at least to their families). Driven by revenge and malice toward King Triton, Ariel’s father, Ursula sets out to become Queen of Atlantica by manipulating Ariel into giving up her voice. Although she has an American accent, she speaks with bravado and exudes a sense of superiority that echoes the mid-20th-century Hollywood elite. The dogmatic Frollo, who proclaims the abandoned Quasimodo a monster in his first few lines of the film, consistently positions himself as a savior of the weak (reflected in his reluctant rearing of Quasimodo) and a punisher of the wicked – in other words, an absolute controlling nightmare.

I ran a slightly modified version of the code reported in the previous post to:

  1. Load the script file
  2. Cut out the scene descriptions
  3. Separate and save the heroes/villains’ lines
  4. Lowercase, tokenize, remove stopwords/punctuation, and lemmatize the heroes/villains’ lines (i.e., preprocess)
  5. Save the tokens to an Excel file alongside frequency counts from the COCA Magazine Unigram Frequency List
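The steps above can be sketched roughly as follows. This is a hedged illustration rather than the actual code from the previous post: the script format (`NAME: dialogue`, with stage directions in brackets or parentheses), the tiny stopword set, the lookup-table "lemmatizer", and the frequency values are all assumptions made so that the example runs with the standard library alone. A real run would more likely use a full stopword list, a proper lemmatizer, and the actual COCA list, and would write the rows to a spreadsheet.

```python
import re
import string

# Hypothetical stand-ins for a real stopword list and lemmatizer.
STOPWORDS = {"a", "an", "the", "i", "you", "my", "to", "of", "and", "is", "it"}
TOY_LEMMAS = {"wanted": "want", "wants": "want", "souls": "soul", "voices": "voice"}
# Invented numbers, NOT real COCA frequencies.
TOY_FREQ = {"want": 800, "voice": 120, "poor": 150, "soul": 40, "unfortunate": 15}


def strip_scene_descriptions(script):
    """Step 2: drop bracketed or parenthesized stage directions (assumed format)."""
    return re.sub(r"\[[^\]]*\]|\([^)]*\)", "", script)


def lines_by_character(script, characters):
    """Step 3: collect dialogue from lines shaped like 'NAME: text' (assumed format)."""
    spoken = {name: [] for name in characters}
    for raw in script.splitlines():
        match = re.match(r"\s*([A-Za-z ]+?)\s*:\s*(.+)", raw)
        if match and match.group(1).upper() in spoken:
            spoken[match.group(1).upper()].append(match.group(2).strip())
    return spoken


def preprocess(line):
    """Step 4: lowercase, strip punctuation, tokenize, drop stopwords, lemmatize."""
    drop = str.maketrans("", "", string.punctuation + "‘’“”")
    tokens = line.lower().translate(drop).split()
    return [TOY_LEMMAS.get(tok, tok) for tok in tokens if tok not in STOPWORDS]


# Step 1 would read the script from a file; a short made-up sample is used here.
script = """[Ursula's lair, dimly lit]
URSULA: The poor unfortunate souls! (laughs)
ARIEL: I wanted my voice.
URSULA: You want it?
"""

dialogue = lines_by_character(strip_scene_descriptions(script), ["ARIEL", "URSULA"])
lemmas = {name: [lem for line in lines for lem in preprocess(line)]
          for name, lines in dialogue.items()}

# Step 5 (simplified): pair each lemma token with its frequency; None if unlisted.
rows = [(name, lem, TOY_FREQ.get(lem)) for name, lems in lemmas.items() for lem in lems]
for row in rows:
    print(row)
```

Keeping one row per token (rather than per unique lemma) means both the token-level and type-level averages can be derived later from the same table.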

Subsequently, I ran the same analysis in R using the ggstatsplot package, looking at (a) average lemma token frequency; and (b) average lemma type frequency. Here are the results:

ggstatsplot of average lemma type frequency by character (The Little Mermaid)


ggstatsplot of average lemma token frequency by character (The Little Mermaid)
ggstatsplot of average lemma type frequency by character (The Hunchback of Notre Dame)
ggstatsplot of average lemma token frequency by character (The Hunchback of Notre Dame)

Observing these four graphs, we see that the same pattern identified in Aladdin holds for the two films investigated here. In The Little Mermaid (the first two graphs), Ariel and Ursula produce 173 and 273 word types, respectively, and the average lemma frequencies for the two characters are statistically equivalent. In terms of word tokens, Ursula produces a whopping 890 to Ariel’s paltry 314 (which is not surprising considering that Ariel doesn’t speak for half of the movie). At the token level, however, the average lemma frequencies for the two characters are significantly different; that is, counting every occurrence, Ariel uses more common words than Ursula does.

We arrive at the same pattern for The Hunchback of Notre Dame: Quasimodo and Frollo use equally frequent lemma types on average, while Frollo’s lemma tokens are less frequent overall.

Overall, these findings lend greater support to my initial hypothesis that, although Disney villains come off as superior in many ways, this apparent sophistication does not extend to the unique word types they use!
