Visualizing Word Frequency In Julius Caesar

Yesha Patel
3 min read · Mar 6, 2021

A picture is worth a thousand words.

William Shakespeare is arguably the most famous writer in the English language, known for both his plays and his sonnets. One of his plays, Julius Caesar, combines several genres, most importantly history and tragedy. That made me wonder…

What are Shakespeare’s most common words in Julius Caesar, and how frequently do they occur?

William Shakespeare

To answer this question, we will scrape Julius Caesar from Project Gutenberg’s website (which hosts a large collection of books) using the Python package requests. Then we will extract the text from this web data using BeautifulSoup. Next, we will analyze the distribution of words using the Natural Language Toolkit (nltk) and Counter. Finally, we will visualize the result as a word cloud, a graphic in which the more frequent words in a text are drawn in larger, bolder fonts and the less frequent words in smaller, thinner fonts, using wordcloud, matplotlib.pyplot and PIL.
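Before the full script, here is a minimal sketch of the counting step on a single quote from the play. It uses only the standard library, and the small stop-word set and the helper name `count_words` are assumptions made for illustration; the script itself relies on nltk's English stop-word list.

```python
import re
from collections import Counter

# Hypothetical mini stop-word set for this example only;
# the full script uses nltk's English stop words instead.
SAMPLE_STOP_WORDS = {
    "and", "what", "should", "be", "in", "that", "why", "more", "than", "yours",
}

def count_words(text, stop_words):
    """Lowercase the text, split it into word tokens, drop stop words, count the rest."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)

quote = ("Brutus and Caesar: what should be in that Caesar? "
         "Why should that name be sounded more than yours?")
counts = count_words(quote, SAMPLE_STOP_WORDS)

# The most frequent remaining word and its count
print(counts.most_common(1))
```

The same three moves (tokenize, filter, count) are what the script applies to the whole play.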

Code:

# Importing requests, BeautifulSoup, nltk, Counter, pandas, matplotlib, wordcloud, re, numpy and PIL
import requests
from bs4 import BeautifulSoup
import nltk
from collections import Counter
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import re
import numpy as np
from PIL import Image
# Getting the Julius Caesar book in HTML format
r_url = requests.get('https://www.gutenberg.org/files/1522/1522-h/1522-h.htm')
r_url.encoding = 'utf-8'
html = r_url.text
# Creating a BeautifulSoup object from the HTML
soup = BeautifulSoup(html,'html.parser')
# Getting the text out of the soup
text = soup.get_text().lower()
# Remove URLs and any tokens that contain digits
text = re.sub(r"https?\S+", "", text)
text = re.sub(r'\S*\d+\S*', '', text)
# Creating a tokenizer that keeps runs of word characters
tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+')
# Tokenizing the text
tokens = tokenizer.tokenize(text)
# Getting the English stop words from nltk (downloaded on first use)
nltk.download('stopwords', quiet=True)
sw = nltk.corpus.stopwords.words('english')
# Create a list containing all words that are in tokens but not in sw
words_ns = [word for word in tokens if word not in sw]
# Initialize a Counter object from our processed list of words
count = Counter(words_ns)
# Convert the Counter into a DataFrame with one row per word
word_dataframe = pd.DataFrame.from_dict(count, orient='index').reset_index()
# Rename the columns
word_dataframe.columns = ['Word', 'Frequency']
# Sort by most frequent word and display the result
word_dataframe = word_dataframe.sort_values(by='Frequency', ascending=False)
print(word_dataframe)
# Visualizing word frequency: the more frequent a word, the larger it is drawn
stopwords = set(STOPWORDS)
mask = np.array(Image.open("william-shakespeare.jpg"))
wordcloud_fra = WordCloud(stopwords=stopwords, background_color="white", mode="RGBA", max_words=1000, mask=mask).generate(text)
# create coloring from image
image_colors = ImageColorGenerator(mask)
plt.figure(figsize=[10,15])
plt.imshow(wordcloud_fra.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")
plt.show()
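One caveat with the script above: a Project Gutenberg page wraps the play in a header and a license footer, so words such as "project" and "gutenberg" may end up in the counts. Below is a minimal sketch of trimming them, assuming the page marks the book with lines of the form "*** START OF ... ***" and "*** END OF ... ***". The helper name `strip_gutenberg_boilerplate` is hypothetical and not part of the original script.

```python
import re

# Assumed marker format, e.g. "*** START OF THE PROJECT GUTENBERG EBOOK ... ***".
# Matching is case-insensitive so it also works after text.lower().
START_MARKER = re.compile(r"\*\*\*\s*start of .*?\*\*\*", re.IGNORECASE)
END_MARKER = re.compile(r"\*\*\*\s*end of .*?\*\*\*", re.IGNORECASE)

def strip_gutenberg_boilerplate(text):
    """Return the text between the two markers, or the text unchanged if either is missing."""
    start = START_MARKER.search(text)
    if start is None:
        return text
    end = END_MARKER.search(text, start.end())
    if end is None:
        return text
    return text[start.end():end.start()].strip()

sample_page = (
    "The Project Gutenberg eBook of Julius Caesar\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK JULIUS CAESAR ***\n"
    "Et tu, Brute?\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK JULIUS CAESAR ***\n"
    "Project Gutenberg license text\n"
)
body = strip_gutenberg_boilerplate(sample_page)
print(body)
```

In the script, this would be applied to `text` right after `soup.get_text()`, before the URL and number cleanup.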

Output:

Word Cloud of Julius Caesar

Github:

https://github.com/yeshapatel356/Visualizing-Word-Frequency-In-Julius-Caesar

Conclusion:

Our analysis shows that the most common word Shakespeare uses in Julius Caesar is Brutus, which occurs 371 times.

About Brutus

He is a supporter of the republic who believes strongly in a government guided by the votes of senators. While Brutus loves Caesar as a friend, he opposes the ascension of any single man to the position of dictator, and he fears that Caesar aspires to such power. Brutus’s inflexible sense of honour makes it easy for Caesar’s enemies to manipulate him into believing that Caesar must die in order to preserve the republic. While the other conspirators act out of envy and rivalry, only Brutus truly believes that Caesar’s death will benefit Rome. Unlike Caesar, Brutus is able to separate completely his public life from his private life; by giving priority to matters of state, he epitomizes Roman virtue. Torn between his loyalty to Caesar and his allegiance to the state, Brutus becomes the tragic hero of the play.

Thanks for reading. Happy Coding!💻😊
