Our approach to obtaining data has a few limitations. The first is our use of the REST API. Although it can retroactively retrieve tweets along with all of their associated data (text, favorites, retweets, user, profile color, and so on), the API only returns tweets from the past 7-10 days, because Twitter does not keep tweets indexed beyond that window for free users. A paid premium license would allow retrieval of all tweets without this limit.
Our second limitation is the use of web scraping to retrieve tweets. This method can retrieve any tweet that is still public and available, that is, any tweet that has not been deleted. We therefore probably missed deleted tweets that could have contributed to our insights. In addition, web scraping only retrieves user-facing data: tweet text, number of favorites, number of retweets, geotag, and permalink. No other metadata is available with this method.
Our end analysis is severely limited by the nature of the tweets associated with this topic. Tweets are predominantly short, fragmented texts made up largely of hashtags and links, sometimes paired with images whose captions are not always descriptive of the image. As such, our text and sentiment analysis is limited by the lack of descriptive text. This is seen in the sentiment analysis above, where the majority of tweets have neutral sentiment. Analyzing the images themselves would be difficult, since any image recognition model requires a vast amount of training.
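To illustrate how little descriptive text can remain in a tweet, the following is a minimal sketch rather than the code used in our analysis. The function names `descriptive_text` and `descriptive_word_count` and the sample tweet are hypothetical, and the patterns only cover links that start with `http://` or `https://`.

```python
import re

# Assumption: links start with http:// or https://; bare links such as
# "pic.twitter.com/..." would need an extra pattern.
URL = re.compile(r"https?://\S+")
# Hashtags and @-mentions: a '#' or '@' followed by word characters.
TAG_OR_MENTION = re.compile(r"[#@]\w+")


def descriptive_text(tweet):
    """Return the tweet with links, hashtags, and mentions removed."""
    no_urls = URL.sub(" ", tweet)
    no_tags = TAG_OR_MENTION.sub(" ", no_urls)
    # Collapse the whitespace left behind by the removals.
    return " ".join(no_tags.split())


def descriptive_word_count(tweet):
    """Count the words left once links, hashtags, and mentions are gone."""
    return len(descriptive_text(tweet).split())


if __name__ == "__main__":
    # Hypothetical tweet, not taken from our data set.
    sample = "obsessed #glossier #skincare https://example.com/abc"
    print(descriptive_text(sample), descriptive_word_count(sample))
```

A tweet that is reduced to one or two words in this way gives a sentiment model very little to work with, which is consistent with the large share of neutral results.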
Another limitation of our end analysis is our bias toward the keywords we chose to link with Glossier and to analyze. Perhaps the words we chose are not as impactful as we believe them to be, and perhaps other variations would yield more effective findings. Related to this limitation is our coverage of variations in keyword spelling, nicknames, and spacing, which was not comprehensive: we did not take misspellings or foreign languages into account. Had we included languages other than English, the results would likely have been different and possibly more accurate.
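One way the spelling and spacing variations could have been handled is approximate string matching. The sketch below is an illustration under our own assumptions, not the method used in this study: `normalize`, `matches_keyword`, and `find_variants` are hypothetical helpers, the 0.85 threshold is an arbitrary choice, and the similarity score comes from `difflib.SequenceMatcher` in the Python standard library.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and drop everything except ASCII letters and digits.

    This removes spaces, '#' signs, and punctuation, so "gloss ier" and
    "#GLOSSIER" both become "glossier". Non-English scripts are discarded,
    so this sketch does not address foreign-language keywords.
    """
    return re.sub(r"[^a-z0-9]", "", text.lower())


def matches_keyword(candidate, keyword, threshold=0.85):
    """Return True if candidate is close enough to keyword after normalizing.

    The threshold is an assumed value and would need tuning on real data.
    """
    a, b = normalize(candidate), normalize(keyword)
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold


def find_variants(tweet, keyword, threshold=0.85):
    """Return the whitespace-separated tokens of a tweet that match keyword."""
    return [tok for tok in tweet.split()
            if matches_keyword(tok, keyword, threshold)]


if __name__ == "__main__":
    # Hypothetical tweet, not taken from our data set.
    print(find_variants("Love my new Glosier haul #glossier", "glossier"))
```

Because `find_variants` scans single tokens, a keyword split by a space is only caught when the whole phrase is passed to `matches_keyword` directly. A lower threshold catches more misspellings but also more unrelated words, so any threshold would have to be checked against a hand-labeled sample.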
An overarching limitation, and the one we perceive to be the greatest, is our lack of access to the API for Instagram, which is Glossier's main social media outlet. By limiting our API access to Twitter, we miss out on the visual, content-rich space of Instagram, where the company's main marketing takes place and where follower engagement is at its peak.
One possible ethical issue in our research is our inclusion of sample tweet text in the section "Text Analysis." Although we did not provide any other identifying values, such as screen name, geotag, or timestamp, a specific tweet might still be identified if someone were to paste the text we quoted into the search bar on Twitter.