Tarantino Profanities

Methodology

Screenplay Markup

Structural Markup

For the structural markup, we divided each screenplay into its most important elements: the title, the screenwriters, the cast list, and the script. The title and screenwriters were straightforward because they are indicated on the script itself. We created the cast list based on the TEI guidelines, giving each role a unique xml:id so that the character could be linked to their lines in the script. The script was divided into two kinds of elements: a setup and scenes. Each screenplay included a setup before the script proper to establish the 'world' of the script. Each scene was further divided into location, action blocks, and dialogue. The location was always listed before any action block or dialogue to situate the scene. The action block indicated how the scene was set up and how the characters were arranged within it. To mark which dialogue belonged to which character, we pointed each dialogue to the unique xml:id of the role in the cast list (ref). We also noted whether the character was on or off screen (off_screen) and to whom the dialogue was directed (DOcharacter). Below is an example of our markup from Django Unchained:

<dialogue off_screen = "no" ref="dr.schultz" DOcharacter="dicky_speck">
    Yes, unless you find a talented physician
    very quickly, I'm afraid that will be the
    end result. But back to business, how much
    do you want for Django? </dialogue>
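
The linking described above can be checked mechanically. The following is a minimal Python sketch, not the project's own tooling, that resolves the ref and DOcharacter attributes of each dialogue against the xml:id values in a cast list. The dialogue attributes come from the example above; the screenplay, castList, castItem, and role element names and the role labels are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# xml:id lives in the reserved XML namespace; ElementTree exposes it under this key.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical miniature screenplay. The <dialogue> attributes follow the
# example above; the wrapper and cast list element names are assumptions.
SAMPLE = """
<screenplay>
  <castList>
    <castItem><role xml:id="dr.schultz">Dr. King Schultz</role></castItem>
    <castItem><role xml:id="dicky_speck">Dicky Speck</role></castItem>
  </castList>
  <scene>
    <dialogue off_screen="no" ref="dr.schultz" DOcharacter="dicky_speck">
      But back to business, how much do you want for Django?
    </dialogue>
  </scene>
</screenplay>
"""

def resolve_dialogue(xml_text):
    """Return one record per dialogue with speaker and addressee resolved."""
    root = ET.fromstring(xml_text)
    # Map each xml:id to the readable role label from the cast list.
    roles = {role.get(XML_ID): role.text.strip() for role in root.iter("role")}
    rows = []
    for dialogue in root.iter("dialogue"):
        rows.append({
            "speaker": roles.get(dialogue.get("ref")),
            "addressee": roles.get(dialogue.get("DOcharacter")),
            "off_screen": dialogue.get("off_screen") == "yes",
        })
    return rows
```

A dialogue whose ref has no matching xml:id comes back with a speaker of None, which makes broken links easy to spot.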

Contextual Markup

What interested us in the three selected Tarantino screenplays was the use of profanity. In the dialogue of each scene, we therefore tagged every profanity and categorized it as either a curse word or a slur. We defined a slur as a word with a derogatory undertone that labels a specific group of people as inferior. A further distinguishing factor is that all slurs were treated as nouns, since they refer to a person or a group of people. The profanity 'bitch' was tagged differently depending on whether it was directed at a specific person or group of people or used as a modifier. The slurs and curses we tagged are listed in the table below:

Curses	Slurs
Fuck	Nigger
Shit	Fag
Pussy	Cunt
Damn	Bitch
Goddamn	Sonsabitches
Bitch	Bastard
Ass	Slut
Motherfucker	Jew

There were also some curse words and slurs that were specific to one screenplay, such as pussyfootin' in Django Unchained. All variant forms of a curse word, such as forms with added suffixes, were lemmatized and assigned to a single type. For example, "fuckin'" and "fuckers" are both categorized as type 'fuck' even though they carry the suffixes -in' and -er respectively. Each curse word was then tagged for its part of speech. Below is an example of our markup from Pulp Fiction:

<dialogue off_screen="no" ref="jules" DOcharacter="vincent"> Look, just because I wouldn't give
    no man a foot massage, don't make it right for Marsellus to throw Antwan off a building into a
    glass- <curse type="motherfuck" pos="adj">motherfuckin</curse>-house, <curse type="fuck"
        pos="verb">fuckin'</curse> up the way the<slur type="nigger">nigger</slur> talks. That ain't
    right, man. <curse type="motherfuck" pos="noun">Motherfucker</curse> do that to me, he better
    paralyze my<curse type="ass" pos="noun">ass</curse> , 'cause I'd kill'a <curse type="motherfuck"
        pos="noun">motherfucker</curse>.</dialogue>
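
Because every variant spelling carries its lemma in the type attribute, counting can ignore the surface form entirely. The following Python sketch illustrates that idea under stated assumptions: the dialogue text is made up for the example, and only the element and attribute names (dialogue, curse, type, pos) are taken from the markup above. It is not the XSLT the project used.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up dialogue for illustration; only the element and attribute names
# come from the markup shown above.
SAMPLE = """
<dialogue off_screen="no" ref="jules" DOcharacter="vincent">
  That <curse type="fuck" pos="adj">fuckin'</curse> car again. Those
  <curse type="fuck" pos="noun">fuckers</curse> don't give a
  <curse type="damn" pos="noun">damn</curse>.
</dialogue>
"""

def count_curses(xml_text):
    """Return (counts per lemma type, counts per (type, pos) pair)."""
    root = ET.fromstring(xml_text)
    by_type = Counter()
    by_type_pos = Counter()
    for curse in root.iter("curse"):
        lemma = curse.get("type")  # the lemma, not the surface spelling
        by_type[lemma] += 1
        by_type_pos[(lemma, curse.get("pos"))] += 1
    return by_type, by_type_pos
```

Here "fuckin'" and "fuckers" both count toward the type 'fuck', while the (type, pos) pairs keep the adjective and noun uses apart.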

Analysis Methodology

Graph Analysis

For the graphs, we created SVG visualizations with XSLT. Most of the graphs come in a curse version and a slur version: for example, one graph counts the number of curse words used while another counts slurs. All graphs share the same dimensions and the same fixed guide lines, even when the values are low. Keeping the scale constant prevents the layout from distorting how the data is read and makes the graphs directly comparable.
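
The fixed-scale idea can be sketched in a few lines. The following Python function is an illustration only, since the project's graphs were produced with XSLT; the function name, dimensions, and labels are assumptions. It draws horizontal bars against a caller-supplied axis maximum so that guide lines sit at the same positions no matter how small the counts are, and it prints each value to the right of its bar.

```python
from xml.sax.saxutils import escape

def bar_chart_svg(counts, axis_max, plot_width=400, bar_height=20,
                  gap=10, label_width=120):
    """Build an SVG bar chart whose scale is fixed by axis_max, not by the data."""
    if axis_max <= 0:
        raise ValueError("axis_max must be positive")
    if any(v < 0 or v > axis_max for v in counts.values()):
        raise ValueError("every count must lie between 0 and axis_max")
    height = len(counts) * (bar_height + gap) + gap
    width = label_width + plot_width + 60  # extra room for the numeric value
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    # Guide lines at 0%, 25%, 50%, 75%, 100% of axis_max: identical in every chart.
    for i in range(5):
        x = label_width + plot_width * i / 4
        parts.append(f'<line x1="{x:.1f}" y1="0" x2="{x:.1f}" '
                     f'y2="{height}" stroke="#ccc"/>')
    for row, (label, value) in enumerate(counts.items()):
        y = gap + row * (bar_height + gap)
        bar = plot_width * value / axis_max
        parts.append(f'<text x="{label_width - 5}" y="{y + 15}" '
                     f'text-anchor="end">{escape(label)}</text>')
        parts.append(f'<rect x="{label_width}" y="{y}" '
                     f'width="{bar:.1f}" height="{bar_height}"/>')
        # Exact value printed to the right of the bar.
        parts.append(f'<text x="{label_width + bar + 5:.1f}" '
                     f'y="{y + 15}">{value}</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Calling the function with the same axis_max for a curse chart and a slur chart yields bars that can be compared by eye, because a given bar length always stands for the same count.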

Each page under the 'Results' tab above has multiple graphs for each film and then for all films combined. Every graph shows the exact numerical value to its right. The graphs focus on curses, slurs, the sex of the person using these words, and the part of speech.

Network Analysis

We were also interested in the co-occurrence of curse words and slurs within a single dialogue. To explore this co-occurrence, we consulted Elisa Beshero-Bondar's Network Analysis tutorial. We first compiled all three screenplays into a single XML document. We then followed the tutorial's instructions for preparing XML documents for import into Cytoscape, a network analysis tool, using XSLT to transform our compilation of screenplays into a format Cytoscape could read. Once the data was in Cytoscape, there were many options for viewing the network. We chose to display the graph using the Partners of Multi-Edge Nodes filter.
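
The co-occurrence step can be sketched as follows. This is a Python illustration under stated assumptions, not the XSLT from the tutorial: the screenplays wrapper element and the dialogue text are made up, and the source, target, and weight column names are our own choice. Each distinct pair of profanity types found in the same dialogue becomes one weighted edge, written as tab-separated text of the kind Cytoscape can import as a network table.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Made-up compilation for illustration; the <screenplays> wrapper is an assumption.
SAMPLE = """
<screenplays>
  <dialogue ref="a"><curse type="fuck" pos="adj">fuckin'</curse> and
    <curse type="shit" pos="noun">shit</curse>, then
    <curse type="fuck" pos="verb">fuck</curse>.</dialogue>
  <dialogue ref="b"><curse type="shit" pos="noun">shit</curse>,
    <curse type="fuck" pos="noun">fuckers</curse>, you
    <slur type="bastard">bastard</slur>.</dialogue>
</screenplays>
"""

def cooccurrence_edges(xml_text):
    """Count how many dialogues contain each unordered pair of profanity types."""
    root = ET.fromstring(xml_text)
    edges = Counter()
    for dialogue in root.iter("dialogue"):
        # A set, so a type repeated within one dialogue is counted once.
        types = sorted({f"{el.tag}:{el.get('type')}"
                        for el in dialogue.iter()
                        if el.tag in ("curse", "slur")})
        for a, b in combinations(types, 2):
            edges[(a, b)] += 1
    return edges

def to_tsv(edges):
    """Serialize the edges as tab-separated source/target/weight rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target", "weight"])
    for (a, b), n in sorted(edges.items()):
        writer.writerow([a, b, n])
    return buf.getvalue()
```

Prefixing each node with curse: or slur: keeps a word such as 'bitch', which appears in both categories, as two separate nodes in the network.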