Showing posts from May, 2020

Estimating the view count distribution on TikTok

TikTok is unusual in that the views a video gets are fairly unpredictable. Unlike on other social media platforms such as Facebook or Instagram, most views on TikTok are driven by a recommendation algorithm (rather than by who you follow), so views can vary dramatically based on unpredictable nuances of the algorithm. Within the same week, I've had videos whose success differs by a factor of 10,000.

I want to get some sense of how likely my next video is to be successful. Merely looking at the average of my most recent videos is not very predictive, because of how random the view counts are. This post details a method for creating these predictions, as well as a calculator so that you can predict your own future success.

Methodology

The view counts that videos get on TikTok are reasonably well approximated by a gamma distribution: (Note that the x-axis is logged.) This gives us a reasonable prior to use for a Bayesian update. Recalling that the conjugate prior of a ga
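As a rough illustration of the "approximated by a gamma distribution" step, here is a minimal sketch that fits a gamma distribution to a list of view counts. It uses the method of moments, which is my stand-in for illustration and not necessarily the fitting procedure behind the plot or the calculator. The function name `fit_gamma_moments` and the synthetic sample are assumptions made up for this example; no real TikTok data is used.

```python
import random
from statistics import fmean, pvariance


def fit_gamma_moments(views):
    """Estimate gamma (shape, scale) from data by the method of moments.

    For a gamma distribution, mean = shape * scale and
    variance = shape * scale**2, so:
        shape = mean**2 / variance
        scale = variance / mean
    """
    mean = fmean(views)
    var = pvariance(views, mu=mean)
    if mean <= 0 or var <= 0:
        raise ValueError("need positive mean and variance to fit a gamma")
    return mean * mean / var, var / mean


# Synthetic, made-up sample (NOT real view counts): a gamma with a small
# shape parameter gives the kind of heavy right skew described above.
random.seed(0)
sample = [random.gammavariate(0.5, 1000.0) for _ in range(20000)]

shape, scale = fit_gamma_moments(sample)
print(f"estimated shape={shape:.3f}, scale={scale:.1f}")
```

The method of moments is the simplest estimator to write down, but it is sensitive to the largest values in the sample, so a maximum-likelihood fit may be preferable on real, heavy-tailed view counts.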

TikTok Is Impressively Unequal

Credit to @lilweehag for help with the scraper.

I’ve created a representative sample of ~20,000 TikTok videos. I’ve only found one paper analyzing the view counts of videos, which claims that the most popular videos were Zipf-distributed. That may be true of the most popular videos, but at least in my sample they are not perfectly power-law distributed:

Regardless of the exact distribution, it’s easy to see that the distribution is fairly fat-tailed, so I was curious what the Gini coefficient and Lorenz curve look like. Here’s the Lorenz curve for view counts in 2020:

The black line indicates perfect equality. The green x’s indicate actual data. As you can see, it is almost perfectly unequal. Indeed, the Gini coefficient is around 0.93. (0 indicates perfect equality, 1 indicates perfect inequality.)

To put that into perspective, here are the Gini coefficients of some other things:

Income in the United States: 0.41
Income in South Africa (the world’s most unequal c
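For anyone who wants to reproduce this kind of number on their own data, here is a minimal sketch of how a Lorenz curve and Gini coefficient can be computed from a list of view counts. The function names `lorenz_curve` and `gini` are my own for this example, and the inputs shown are toy values, not the scraped sample.

```python
def lorenz_curve(values):
    """Return Lorenz curve points as (share of videos, share of total views).

    Videos are sorted from least to most viewed, so each point says
    "the bottom x of videos received y of all views".
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini coefficient: 1 minus twice the area under the Lorenz curve."""
    pts = lorenz_curve(values)
    area = 0.0
    # Trapezoid rule between consecutive Lorenz points.
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return 1 - 2 * area


# Toy inputs (not real data):
print(gini([5, 5, 5, 5]))    # every video gets the same views
print(gini([0, 0, 0, 100]))  # one video gets all the views
```

With a finite sample of n videos, this estimate tops out at 1 - 1/n rather than exactly 1, which is negligible for a sample of around 20,000 videos.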