Developed natural language processing (NLP) machine learning models in Python to filter unrelated text from iterated reference game transcripts containing 100,000+ messages, improving data precision and processing efficiency. I also standardized datasets in R by participant demographics to enable further analysis of description patterns.
Compiled and unified precinct-level voting data from 20+ elections in 10 countries so that each election's data would be compatible with Professor Mebane's eForensics package. Using this package, I analyzed datasets with 100,000+ precincts in R to assess the legitimacy of vote counts. Additionally, I created visuals using shapefiles and documented methods for reproducibility.
Examined social media analytics of Major League Baseball teams. I first processed and cleaned around 20,000 tweets using ScrapeHero and Excel, and then analyzed the dataset.