Methodology
Screenplay Markup
Structural Markup
For the structural markup, we divided each screenplay into its most important elements: the title, the screenwriters, the cast list, and the script. The title and screenwriters were straightforward, as they are indicated on the script itself. We created the cast list based on the TEI guidelines. In the cast list, we gave each role a unique xml:id so that the character could be linked to its lines in the script. The script was broken down into two elements: setup and scenes. Each screenplay included a setup before the script proper to establish the 'world' of the script. Each scene was further broken down into location, action block, and dialogue. In the script, the location was always listed before any action block or dialogue to situate the scene. The action block indicated how the scene was set up and how the characters were arranged within it. To mark which dialogue belonged to which character, we pointed each dialogue element to the unique xml:id of the role in the cast list (the ref attribute), and also noted whether the character was on or off screen (off_screen) and whom the dialogue was directed towards (DOcharacter). Below is an example of our markup from Django Unchained:
<dialogue off_screen="no" ref="dr.schultz" DOcharacter="dicky_speck"> Yes, unless you find a talented physician very quickly, I'm afraid that will be the end result. But back to business, how much do you want for Django? </dialogue>
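Because every dialogue element points back to the cast list, the linkage can be checked mechanically. The following is a minimal sketch in Python, not the tooling we used: the element names screenplay, castList, castItem, role, script, and scene are assumptions modeled loosely on TEI, and the second dialogue line is a hypothetical placeholder added only to show an unresolved reference. Only the dialogue element and its ref and DOcharacter attributes come from the markup above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical miniature document: wrapper element names are assumptions,
# and the second <dialogue> is a placeholder with a deliberately missing role.
SAMPLE = """
<screenplay>
  <castList>
    <castItem><role xml:id="dr.schultz">Dr. King Schultz</role></castItem>
    <castItem><role xml:id="dicky_speck">Dicky Speck</role></castItem>
  </castList>
  <script>
    <scene>
      <dialogue off_screen="no" ref="dr.schultz" DOcharacter="dicky_speck">But back to business, how much do you want for Django?</dialogue>
      <dialogue off_screen="no" ref="django" DOcharacter="dr.schultz">(placeholder line)</dialogue>
    </scene>
  </script>
</screenplay>
""".strip()


def unresolved_references(root):
    """Return (attribute, value) pairs that do not match any role's xml:id."""
    known = {role.get(XML_ID) for role in root.iter("role")}
    problems = []
    for dialogue in root.iter("dialogue"):
        for attr in ("ref", "DOcharacter"):
            value = dialogue.get(attr)
            if value is not None and value not in known:
                problems.append((attr, value))
    return problems


root = ET.fromstring(SAMPLE)
problems = unresolved_references(root)
print(problems)
```

A check of this kind would flag any speaker or addressee whose identifier was mistyped or never added to the cast list.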
Contextual Markup
Our interest in the three selected Tarantino screenplays was their use of profanity. Therefore, in the dialogue of each scene, we tagged every profanity and categorized it as either a curse word or a slur. We distinguished the two as follows: a slur must carry a derogatory undertone, labeling a specific group of people as inferior. In addition, all slurs were treated as nouns, since they refer to a person or group of people. The specific profanity 'bitch' was tagged differently depending on whether it was directed towards a specific person or group of people or used as a modifier. The slurs and curses we tagged are listed in the table below:
Curses | Slurs |
---|---|
Fuck | Nigger |
Shit | Fag |
Pussy | Cunt |
Damn | Bitch |
Goddamn | Sonsabitches |
Bitch | Bastard |
Ass | Slut |
Motherfucker | Jew |
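The table can be read as a simple lookup. The sketch below is only an illustration in Python, not part of our tagging workflow: the function name category_of and the directed_at_person flag are hypothetical, and the rule for 'bitch' (slur when aimed at a person or group, curse otherwise) is our reading of the distinction described above, stated here as an assumption.

```python
# Word lists copied from the table, lowercased for lookup.
CURSES = {"fuck", "shit", "pussy", "damn", "goddamn", "bitch", "ass", "motherfucker"}
SLURS = {"nigger", "fag", "cunt", "bitch", "sonsabitches", "bastard", "slut", "jew"}


def category_of(word, directed_at_person=False):
    """Return 'curse', 'slur', or None for a word from the table.

    Assumption: a word listed in both columns ('bitch') counts as a slur
    only when it is directed at a person or group of people.
    """
    w = word.lower()
    if w in CURSES and w in SLURS:
        return "slur" if directed_at_person else "curse"
    if w in SLURS:
        return "slur"
    if w in CURSES:
        return "curse"
    return None


print(category_of("Damn"))
print(category_of("bitch", directed_at_person=True))
```

The flag stands in for a judgment that was made by hand while reading each line of dialogue; it cannot be derived from the word alone.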
There were also some curse words and slurs that were screenplay-specific, such as pussyfootin' in Django Unchained. All variants of a curse word, such as forms with added suffixes, were lemmatized and assigned to a single curse type. For example, "fuckin'" and "fuckers" are both categorized as type 'fuck' even though they carry the suffixes -in' and -er respectively. Each curse word was then tagged for its part of speech. Below is an example markup from Pulp Fiction:
<dialogue off_screen="no" ref="jules" DOcharacter="vincent"> Look, just because I wouldn't give no man a foot massage, don't make it right for Marsellus to throw Antwan off a building into a glass-<curse type="motherfuck" pos="adj">motherfuckin</curse>-house, <curse type="fuck" pos="verb">fuckin'</curse> up the way the <slur type="nigger">nigger</slur> talks. That ain't right, man. <curse type="motherfuck" pos="noun">Motherfucker</curse> do that to me, he better paralyze my <curse type="ass" pos="noun">ass</curse>, 'cause I'd kill'a <curse type="motherfuck" pos="noun">motherfucker</curse>.</dialogue>
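Since each tagged word carries its lemmatized type, the markup can be tallied directly. The following Python sketch shows one possible way to count tagged words per speaker; it is an illustration rather than the analysis code itself, the function name tally is hypothetical, and the sample is a shortened excerpt of the Pulp Fiction line above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened excerpt of the example above, used as self-contained input.
SAMPLE = """<dialogue off_screen="no" ref="jules" DOcharacter="vincent">That ain't right, man. <curse type="motherfuck" pos="noun">Motherfucker</curse> do that to me, he better paralyze my <curse type="ass" pos="noun">ass</curse>, 'cause I'd kill'a <curse type="motherfuck" pos="noun">motherfucker</curse>.</dialogue>"""


def tally(dialogue):
    """Count tagged words in one dialogue element, keyed by (tag, type)."""
    counts = Counter()
    for element in dialogue.iter():
        if element.tag in ("curse", "slur"):
            counts[(element.tag, element.get("type"))] += 1
    return counts


dialogue = ET.fromstring(SAMPLE)
speaker = dialogue.get("ref")  # resolves to an xml:id in the cast list
counts = tally(dialogue)
print(speaker, dict(counts))
```

Because the count is keyed on the type attribute rather than on the surface form, "Motherfucker" and "motherfucker" fall into the same bucket, which is the purpose of the lemmatization described above.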