Creating a Wordcloud with the Twitter API in RStudio
May 14, 2018 By Pascal Schmidt Other R
In this blog post, we show how to pull tweets into R with the Twitter API and turn them into wordclouds of hashtags, handles, and ordinary words.
Connecting with the Twitter API
In order to get data from Twitter into R, we need the API key, the API secret, the Access token and the Access token secret. So first, sign up for a Twitter account if you haven’t already and make sure that your mobile phone number is associated with your account. Then go to https://apps.twitter.com/ and sign in. Afterwards, click on “Create New App” and fill out the form.
The form asks for a website, so you must have one of your own; a free WordPress website works, for example. Click on the “Yes, I agree” box and then click on “Create your Twitter application”. Click on the “Permissions” tab and change the permission to “Read, Write and Access direct messages”. Click on the “Keys and Access Tokens” tab to generate your Consumer Key (API key), Consumer Secret (API secret), Access Token and Access Token Secret…
… and you are done. Now back to R.
library(twitteR)
library(ROAuth)
library(stringr)
library(wordcloud)

consumer_key = "your consumer key"
consumer_secret = "your consumer secret"
access_token = "your access token"
access_secret = "your access secret"

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
## [1] "Using direct authentication"
# put 2 in the R console
Replace the placeholders with your own consumer key, consumer secret, access token, and access secret, and you are ready to analyse Twitter data!
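If you would rather not paste the four secrets into a script, one option is to read them from environment variables. The following is only a sketch: the helper `read_twitter_credentials()` and the variable names `TWITTER_CONSUMER_KEY` and so on are made up for this example and are not part of twitteR.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables. The variable names are assumptions, not twitteR conventions.
read_twitter_credentials = function() {
  vars = c(consumer_key    = "TWITTER_CONSUMER_KEY",
           consumer_secret = "TWITTER_CONSUMER_SECRET",
           access_token    = "TWITTER_ACCESS_TOKEN",
           access_secret   = "TWITTER_ACCESS_SECRET")
  creds = Sys.getenv(vars, unset = "")   # "" for every variable that is not set
  names(creds) = names(vars)
  missing = names(creds)[creds == ""]
  if (length(missing) > 0) {
    stop("Missing credentials: ", paste(missing, collapse = ", "))
  }
  as.list(creds)
}

# Usage, once the variables are set (for example in ~/.Renviron):
# creds = read_twitter_credentials()
# setup_twitter_oauth(creds$consumer_key, creds$consumer_secret,
#                     creds$access_token, creds$access_secret)
```

The function stops with a message naming the missing entries, so a forgotten key shows up before the call to setup_twitter_oauth().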
Pulling Data From Twitter
Tweets = searchTwitter(searchString = "worldcup -filter:retweets", n = 2000, lang = "en")
In the code above, we search for tweets that include the word “worldcup”. The “-filter:retweets” part of the search string excludes retweets from our data. We ask for 2000 tweets and specify the language to be English.
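To make the structure of the search string explicit, here is a tiny sketch of a helper that assembles it. The function name `build_search_string()` is hypothetical; only the resulting string is what gets passed to searchTwitter().

```r
# Hypothetical helper that assembles the search string for searchTwitter().
build_search_string = function(term, exclude_retweets = TRUE) {
  if (exclude_retweets) paste(term, "-filter:retweets") else term
}

build_search_string("worldcup")
# [1] "worldcup -filter:retweets"

# Tweets = searchTwitter(searchString = build_search_string("worldcup"),
#                        n = 2000, lang = "en")
```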
TweetsDF = twListToDF(Tweets)
Now, we have to convert the tweets we got into a data frame.
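If you want to try the cleaning and plotting steps without API access, a small stand-in data frame can be built by hand. The three tweets below are invented for illustration, and only the text column is filled in because it is the only column used in the rest of this post; a real twListToDF() result contains further columns.

```r
# Invented sample tweets, used only as a stand-in for real API results.
sample_tweets = c(
  "Great match today! #WorldCup #GER @FIFAcom https://t.co/abc123",
  "Can't wait for the final #WorldCup @FIFAWorldCup",
  "What a goal!!! #WorldCup #Russia2018 @FIFAcom"
)

# One row per tweet, with the tweet text kept as character strings.
TweetsDF = data.frame(text = sample_tweets, stringsAsFactors = FALSE)
str(TweetsDF)
```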
text = gsub("http[s]?://[[:alnum:].\\/]+", "", TweetsDF$text) # remove urls
text = gsub("(?!(#|@))[[:punct:]]", "", text, perl = T) # remove all punctuation except # and @
text = gsub("[[:cntrl:]]", "", text) # remove control characters
words = unlist(strsplit(text, " ")) # split into words
# ^ matches the start of a string
# \\w+ matches one or more word characters
hashtags = grep("^#\\w+", words, value = T)
handles = grep("^@\\w+", words, value = T)
hashtags.freq = table(hashtags)
handle.freq = table(handles)
After this data cleaning, we are ready to create two wordclouds: one with only hashtags and one with only handles.
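To see what the cleaning steps do, it can help to run them on a couple of short strings first. The two tweets below are invented for this sketch; the regular expressions are the ones used in the post.

```r
# Two invented tweets for illustration.
tweets = c("Great match today! #WorldCup #GER @FIFAcom https://t.co/abc123",
           "Can't wait for the final, #WorldCup! @FIFAWorldCup")

clean = gsub("http[s]?://[[:alnum:].\\/]+", "", tweets)       # drop urls
clean = gsub("(?!(#|@))[[:punct:]]", "", clean, perl = TRUE)  # keep only # and @
tokens = unlist(strsplit(clean, " "))                         # split into words

hashtags = grep("^#\\w+", tokens, value = TRUE)  # tokens starting with #
handles  = grep("^@\\w+", tokens, value = TRUE)  # tokens starting with @

table(hashtags)  # #GER appears once, #WorldCup twice
handles          # "@FIFAcom" "@FIFAWorldCup"
```

Note that the exclamation mark after #WorldCup in the second tweet is stripped before splitting, so both occurrences count as the same hashtag.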
Creating Wordclouds
wordcloud(names(hashtags.freq), hashtags.freq, min.freq = 4, colors = rainbow(8), random.order = FALSE)
wordcloud(names(handle.freq), handle.freq, min.freq = 4, colors = rainbow(8), random.order = FALSE)
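In both calls, min.freq = 4 means that terms occurring fewer than four times are not plotted, and random.order = FALSE plots the terms in decreasing frequency. To preview which terms will make it into the cloud, the frequency table can be filtered by hand. The counts below are invented for this sketch.

```r
# Invented hashtag counts for illustration.
hashtags = c(rep("#WorldCup", 6), rep("#Russia2018", 4), rep("#GER", 2))
hashtags.freq = table(hashtags)

# Keep only the terms that reach min.freq = 4, most frequent first.
plotted = sort(hashtags.freq[hashtags.freq >= 4], decreasing = TRUE)
names(plotted)
# [1] "#WorldCup"   "#Russia2018"
```

If the preview comes back empty, the wordcloud has nothing to draw, and lowering min.freq or pulling more tweets is the usual remedy.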
text = gsub("@\\w+", " ", TweetsDF$text) # removes all the handles in the tweets
text = gsub("(?!')[[:punct:]]", "", text, perl = T) # removes all the punctuation except apostrophes
text = gsub("[[:cntrl:]]", "", text) # removes all the control characters, like \n or \r
text = gsub("[[:digit:]]", "", text) # removes numbers
text = gsub("http\\w+", "", text) # removes url links
text = gsub("[ \t]{2,}", " ", text) # collapses repeated spaces and tabs
text = gsub("^\\s+|\\s+$", "", text) # removes leading and trailing spaces
words = strsplit(text, " ") # split into words
words = unlist(words)
words = words[!words %in% tm::stopwords(kind = "english")]
wordcloud(names(table(words)), table(words), min.freq = 15, colors = rainbow(8))
We used the tm package to exclude stop words from our tweets before we created the wordcloud.
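One detail worth knowing: %in% compares strings exactly, so a capitalised word such as “The” is not matched by a lowercase stop word. The sketch below illustrates this with a tiny hand-written stop word list, which is an assumption standing in for tm::stopwords(kind = "english"), and shows a variant that lowercases the words only for the comparison.

```r
words = c("The", "world", "cup", "is", "the", "best", "world", "cup")

# Tiny stand-in stop word list (an assumption for this sketch).
stop_words = c("the", "is", "a", "an", "of")

# Exact comparison, as in the post: "The" survives because of its capital T.
kept_as_is = words[!words %in% stop_words]

# Case-insensitive variant: lowercase only for the comparison.
kept_lowered = words[!tolower(words) %in% stop_words]

kept_as_is    # "The" "world" "cup" "best" "world" "cup"
kept_lowered  # "world" "cup" "best" "world" "cup"
```

Whether to lowercase is a design choice: it removes more stop words, but if you lowercase the words themselves rather than just the comparison, proper nouns lose their capitalisation in the plot.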