Tracking Geographical Origin of Twitter Posts
A Python script has been developed to analyse public sentiment towards K-12 learning in the United States during the COVID-19 pandemic. The script uses Natural Language Processing (NLP) and supervised machine learning, and it filters Twitter data by account location so that the analysis focuses on sentiment within the US.
The filtering process begins with a script that listens for relevant Tweets using tweepy.StreamListener. When a Tweet is received, it is further processed in the script and stored in a MongoDB collection for analysis.
To filter Tweets by account location, the script checks whether the Account Location contains a US identifier. This could be a valid US state, given either as a full name or as a two-character abbreviation, or the term 'USA'. To cover these possibilities, two regular expressions are employed: one checks for US state two-character abbreviations, while the other checks for US state full names and the term 'USA'.
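The two patterns described above might be built along the following lines. This is a minimal sketch, not the script's actual expressions: the names ABBREV_RE and NAME_RE, the choice of the 50 states, and the rule that an abbreviation must be the last token of the location (as in 'Austin, TX') are all assumptions made for illustration. Both patterns expect text that has already been converted to upper case.

```python
import re

# Assumption: the 50 states; the original script's exact lists are not shown.
STATE_ABBREVIATIONS = [
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
    "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
    "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
]
STATE_NAMES = [
    "ALABAMA", "ALASKA", "ARIZONA", "ARKANSAS", "CALIFORNIA", "COLORADO",
    "CONNECTICUT", "DELAWARE", "FLORIDA", "GEORGIA", "HAWAII", "IDAHO",
    "ILLINOIS", "INDIANA", "IOWA", "KANSAS", "KENTUCKY", "LOUISIANA",
    "MAINE", "MARYLAND", "MASSACHUSETTS", "MICHIGAN", "MINNESOTA",
    "MISSISSIPPI", "MISSOURI", "MONTANA", "NEBRASKA", "NEVADA",
    "NEW HAMPSHIRE", "NEW JERSEY", "NEW MEXICO", "NEW YORK",
    "NORTH CAROLINA", "NORTH DAKOTA", "OHIO", "OKLAHOMA", "OREGON",
    "PENNSYLVANIA", "RHODE ISLAND", "SOUTH CAROLINA", "SOUTH DAKOTA",
    "TENNESSEE", "TEXAS", "UTAH", "VERMONT", "VIRGINIA", "WASHINGTON",
    "WEST VIRGINIA", "WISCONSIN", "WYOMING",
]

# Assumed rule: an abbreviation only counts as the final token, preceded by
# the start of the string, a comma, or whitespace. This avoids matching
# ordinary upper-cased words such as "IN" or "OR" in the middle of a location.
ABBREV_RE = re.compile(
    r"(?:^|[,\s])(?:" + "|".join(STATE_ABBREVIATIONS) + r")\s*$"
)

# Full state names and the term 'USA', matched as whole words anywhere.
NAME_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(n) for n in STATE_NAMES + ["USA"]) + r")\b"
)

if __name__ == "__main__":
    for loc in ["AUSTIN, TX", "SUNNY CALIFORNIA", "USA", "LIVING IN LONDON"]:
        print(loc, bool(ABBREV_RE.search(loc) or NAME_RE.search(loc)))
```

Matching on free-text profile fields is inherently approximate. Even with the end-of-string rule, a location such as 'GEORGIA' (the country) or one that happens to end in a word like 'ME' would still be counted as a US location.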
The filtering logic lives in the script's on_status function. If Account Location is populated in the Tweet, it is converted to upper case and checked against the two regular expressions. This check is carried out by a helper function, get_is_usa_loc, which on_status calls to determine whether a populated Account Location contains a US identifier.
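The flow inside on_status can be sketched as follows. The helper name get_is_usa_loc comes from the script, but its signature and body here are assumptions, as are the function handle_status, the stored field names, and the InMemoryCollection stand-in. Deliberately shortened three-state patterns stand in for the full state lists, and the status objects only mimic the id_str, text and user.location attributes of a tweepy Status. In the real script, a MongoDB collection would take the place of the stand-in, since it also offers an insert_one method.

```python
import re
from types import SimpleNamespace

# Shortened, illustrative patterns; the real script would cover every state.
ABBREV_RE = re.compile(r"(?:^|[,\s])(?:CA|NY|TX)\s*$")
NAME_RE = re.compile(r"\b(?:CALIFORNIA|NEW YORK|TEXAS|USA)\b")


def get_is_usa_loc(location):
    """Return True if the profile location contains a US identifier.

    Hypothetical signature: the original helper's arguments are not shown.
    """
    if not location:  # None or empty string: Account Location not populated
        return False
    loc = location.strip().upper()  # the patterns expect upper-case text
    return bool(ABBREV_RE.search(loc) or NAME_RE.search(loc))


def handle_status(status, collection):
    """Logic an on_status method could delegate to (hypothetical helper).

    Stores the Tweet and returns True only when the account location
    looks like a US location; otherwise returns False and stores nothing.
    """
    location = getattr(status.user, "location", None)
    if not get_is_usa_loc(location):
        return False
    collection.insert_one({
        "tweet_id": status.id_str,
        "text": status.text,
        "account_location": location,
    })
    return True


class InMemoryCollection:
    """Stand-in for a MongoDB collection, exposing only insert_one."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)


if __name__ == "__main__":
    collection = InMemoryCollection()
    tweet = SimpleNamespace(
        id_str="1",
        text="Schools are reopening next week",
        user=SimpleNamespace(location="Austin, tx"),
    )
    print(handle_status(tweet, collection), len(collection.docs))
```

Keeping the decision in a small pure function such as get_is_usa_loc makes the location rule testable without a live stream or database connection.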
It's important to note that Account Location is based on the 'home' location provided by the user in their public profile. However, since Account Location is not guaranteed to be populated, some relevant Tweets may be missed.
The Twitter Developer site provides guidance on the available options for filtering Tweets by location. The script's creator drew on this guidance when designing the account-location filter.
Once the Tweets have been filtered, a text classifier is employed to predict the sentiment of each Tweet on this topic. These predictions are the basis for gauging public sentiment towards K-12 learning during the COVID-19 pandemic in the United States.
The project's objective is to provide insights that can inform decisions and policies related to K-12 learning during the pandemic. The results could help educators, policymakers, and parents make informed decisions to support students.