Rebuilt GPT-2 with David Stutz using dynamic attention head routing, improving FLOP efficiency by more than 50% while maintaining overall model performance.
[Poster]
Used probabilistic models and recursive simulations to identify the situations in which teams should foul.
Worked with Daniel Rashes to build Markov matrices and analyze which sports leagues allow bad teams to improve most quickly.