Guest Post: Homophily in TikTok Recommendations

This is a guest post from @lilweehag.

Summary: Earlier this year, Marc Faddoul reported that TikTok’s “recommended profile” feature seemed to recommend demographically similar profiles (e.g., a user visiting the profile of a bearded white man would be recommended the profiles of other bearded white men). His results were suggestive but small-scale. I therefore repeated his research on a larger set of 7,285 recommendation pairs, with gender and race classified by a machine learning system. I find that TikTok recommendations are homophilic by gender (p < 1e-43) and race (p < 1e-79). Consistent with TikTok’s stated goal of “elevating black creators,” I find that profiles containing a picture of a black or African-American person appear ~60% more often among recommended profiles than among the profiles I visited (p < 1e-11).
Background 
It is well understood that people connect with others who are similar to them. This is sometimes referred to as “the homophily principle” and is described by McPherson et al. 2001 as:

Similarity breeds connection. This principle—the homophily principle—structures network ties of every type, including marriage, friendship, work, advice, support, information transfer, exchange, comembership, and other types of relationship. The result is that people's personal networks are homogeneous with regard to many sociodemographic, behavioral, and intrapersonal characteristics.

There is a wide body of research investigating homophily in social media, such as on Twitter and Instagram. TikTok has received relatively little investigation, presumably because of its newness (although see e.g. Serrano et al. 2020).
Methods
I visited the profiles of the 500 most-followed TikTok creators. For each one, I recorded the username and profile picture, as well as the usernames and profile pictures of the accounts TikTok recommended. I used the Clarifai demographic API to classify each profile by gender and race. Any classification with a confidence below 75% was discarded. I also manually classified 70 profiles that appeared frequently but that the API could not classify.

This left me with 1,506 recommendation pairs where both the source and recommended profiles had a classified race, and 4,622 pairs with classified genders. 
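
The filtering step described above can be sketched as follows. This is a minimal illustration, not the code used for this post: the `Classification` record, its field names, and the example usernames are hypothetical and do not reflect the Clarifai API's response format. Only the 75% threshold and the "both profiles classified" rule come from the text.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.75  # from the post: classifications below 75% are discarded


@dataclass
class Classification:
    """Hypothetical record for one classified profile picture."""
    username: str
    label: str         # e.g. "Female" or "Male"
    confidence: float  # assumed to be a fraction in [0, 1]


def confident_label(c: Classification,
                    threshold: float = CONFIDENCE_THRESHOLD) -> Optional[str]:
    """Return the label if it meets the threshold, otherwise None (discarded)."""
    return c.label if c.confidence >= threshold else None


def classified_pairs(pairs, labels):
    """Keep (source, recommended) pairs where both usernames have a label."""
    return [
        (labels[src], labels[rec])
        for src, rec in pairs
        if labels.get(src) is not None and labels.get(rec) is not None
    ]


# Made-up example data, purely for illustration.
example_classifications = [
    Classification("creator_a", "Female", 0.90),
    Classification("creator_b", "Male", 0.60),   # below threshold: discarded
    Classification("creator_c", "Male", 0.75),   # exactly at threshold: kept
]
example_labels = {c.username: confident_label(c) for c in example_classifications}
example_pairs = classified_pairs(
    [("creator_a", "creator_b"), ("creator_a", "creator_c"), ("creator_c", "creator_d")],
    example_labels,
)
```

Only the pair whose two profiles both clear the threshold survives, which is why the number of usable pairs is smaller than the number of pairs collected.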

In this post, the profiles I originally visited (of the 500 most followed creators) are referred to as “source” profiles, and the profiles TikTok recommended me to follow are referred to as “recommended” profiles.

Please note that I was not logged into TikTok while collecting this data, but TikTok still (presumably) had access to various pieces of information about me, such as my IP address. I believe this may explain why Asian profiles were underrepresented in my recommendations: many of the top Asian creators are Indian, and TikTok’s recommendation algorithm may reasonably have declined to recommend these profiles to me because I live in the US.
Results
Gender
Rows: gender of source profile. Columns: gender of recommended profile.

Gender of source profile | Female | Male
Female                   | 1188   | 921
Male                     | 907    | 1606

There appears to be homophily in these recommendations. For example, 56% of the recommendations from female profiles are female, versus 36% of the recommendations from male profiles, a difference of 20 percentage points (a relative difference of 56%). A chi-square test rejects the null hypothesis that the recommended gender is independent of the source gender at p < 1e-43.
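
For readers who want to check the arithmetic, here is a minimal sketch using only the Python standard library and the counts from the gender table. It uses the plain Pearson chi-square statistic without a continuity correction, which is an assumption: the exact p-value depends on the test variant, so this sketch need not reproduce the reported figure exactly.

```python
import math

# Counts from the gender table: outer key is source gender,
# inner key is recommended gender.
gender_counts = {
    "Female": {"Female": 1188, "Male": 921},
    "Male": {"Female": 907, "Male": 1606},
}


def share_female(row):
    """Fraction of a source gender's recommendations that are female."""
    return row["Female"] / (row["Female"] + row["Male"])


female_from_female = share_female(gender_counts["Female"])
female_from_male = share_female(gender_counts["Male"])


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and p-value
    for the 2x2 table [[a, b], [c, d]].

    With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


gender_stat, gender_p = chi_square_2x2(1188, 921, 907, 1606)

print(f"female recs from female sources: {female_from_female:.1%}")
print(f"female recs from male sources:   {female_from_male:.1%}")
print(f"chi-square = {gender_stat:.1f}, p = {gender_p:.1e}")
```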

Men make up a similar fraction of both source and recommended profiles (54.4% versus 54.7%).

Race
Rows: race of source profile. Columns: race of recommended profile.

Race of source profile | Asian | Black or African American | Hispanic, Latino, or Spanish origin | Middle Eastern or North African | White
Asian | 102 | 38 | 36 | 5 | 54
Black or African American | 7 | 93 | 23 | 2 | 92
Hispanic, Latino, or Spanish origin | 13 | 49 | 42 | 2 | 92
Middle Eastern or North African | 0 | 0 | 0 | 1 | 5
White | 36 | 168 | 117 | 9 | 520

(Note: I am using the race categories defined by Clarifai; I have no strong opinion about whether these are the most useful categories. Homophilic recommendations are the entries along the diagonal, where the source and recommended race match.)

A chi-square test rejects the null hypothesis that the recommended race is independent of the source race at p < 1e-79. As one example, 43% of the profiles recommended from an Asian profile are themselves Asian, versus only 4% of the profiles recommended from a white profile.
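
The same kind of check can be sketched for the race table with the standard library alone. The helper names below are my own, and the p-value uses the closed-form upper tail that exists for a chi-square distribution with an even number of degrees of freedom (here 16). Note that several expected counts in the "Middle Eastern or North African" row and column are well below 5, so the chi-square approximation is rough for those cells; treat this as an illustration rather than a reproduction of the reported figure.

```python
import math

# Rows: race of source profile; columns: race of recommended profile.
# Order: Asian, Black or African American, Hispanic/Latino/Spanish origin,
# Middle Eastern or North African, White.
race_table = [
    [102, 38, 36, 5, 54],
    [7, 93, 23, 2, 92],
    [13, 49, 42, 2, 92],
    [0, 0, 0, 1, 5],
    [36, 168, 117, 9, 520],
]


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable when df is even:
    exp(-x/2) * sum over i < df/2 of (x/2)**i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form only applies to positive even df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


race_stat, race_df = chi_square_independence(race_table)
race_p = chi2_sf_even_df(race_stat, race_df)

# Share of recommendations that match the source race, per the examples above.
asian_to_asian = race_table[0][0] / sum(race_table[0])
white_to_asian = race_table[4][0] / sum(race_table[4])

print(f"chi-square = {race_stat:.1f}, df = {race_df}, p = {race_p:.1e}")
```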

Unlike with gender, there does seem to be a substantial difference in the racial representation of source versus recommended profiles:

Race | Source # | Source % | Rec # | Rec % | Relative change (Rec # / Source #)
Asian | 235 | 16% | 158 | 10% | 0.67
Black or African American | 217 | 14% | 348 | 23% | 1.60
Hispanic, Latino, or Spanish origin | 198 | 13% | 218 | 14% | 1.10
Middle Eastern or North African | 6 | 0% | 19 | 1% | 3.17
White | 850 | 56% | 763 | 51% | 0.90

“Black or African American” profiles appear 60% more often as recommended profiles than as source profiles, while Asian and White profiles are recommended somewhat less frequently than their source makeup would predict. Again, a chi-square test rejects the null hypothesis at p < 1e-11.
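
The relative changes are simply the recommended count divided by the source count for each category. The sketch below recomputes them and also runs one possible chi-square test. The post does not say which test produced the p-value above; a test of homogeneity on the 2x5 table of source and recommended counts is my assumption, and it treats the two sets of counts as independent samples, which is a simplification.

```python
import math

races = [
    "Asian",
    "Black or African American",
    "Hispanic, Latino, or Spanish origin",
    "Middle Eastern or North African",
    "White",
]
source_counts = [235, 217, 198, 6, 850]  # "Source #" column
rec_counts = [158, 348, 218, 19, 763]    # "Rec #" column

# Ratio of recommended to source appearances for each category.
relative_change = {
    race: rec / src for race, src, rec in zip(races, source_counts, rec_counts)
}

# Assumed test: chi-square test of homogeneity on the 2 x 5 table
# [source_counts, rec_counts].
n_source, n_rec = sum(source_counts), sum(rec_counts)
n = n_source + n_rec
homogeneity_stat = 0.0
for src, rec in zip(source_counts, rec_counts):
    category_total = src + rec
    expected_src = n_source * category_total / n
    expected_rec = n_rec * category_total / n
    homogeneity_stat += (src - expected_src) ** 2 / expected_src
    homogeneity_stat += (rec - expected_rec) ** 2 / expected_rec

homogeneity_df = len(races) - 1  # (2 - 1) * (5 - 1) = 4

# For df = 4 the chi-square upper tail has the closed form exp(-x/2) * (1 + x/2).
homogeneity_p = math.exp(-homogeneity_stat / 2) * (1 + homogeneity_stat / 2)

for race, ratio in relative_change.items():
    print(f"{race}: {ratio:.2f}")
print(f"chi-square = {homogeneity_stat:.1f}, df = {homogeneity_df}, p = {homogeneity_p:.1e}")
```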

One possible explanation is that TikTok is intentionally promoting the profiles of black creators. This is consistent with broad marketing claims made by TikTok, although I am not aware of them ever describing this particular detail.
Conclusion
Homophily is an important driver of social networks, whose implications we do not yet fully understand. I hope these results contribute to our body of knowledge about this phenomenon.

As always, code and raw data can be found on GitHub. This is a relatively straightforward experiment, and I encourage others to do their own investigations. In particular, I would be interested in whether these results replicate if the source profiles are randomly selected creators, instead of the most popular creators.

I would like to thank Marc for noticing this phenomenon.


Comments

  1. This is entirely consistent with what I've discovered with my own research. It's nice to see the numbers behind this and to compare and contrast it to my own findings. I do think there's an area of study that you could cover for future blogs. Aligning with this theory, there's a bit of a rift between homosexual and heterosexual tiktok. But, I suspect a lot of that algorithm is still muddied, since many people watch the same gender for comparison reasons, not sexual ones. I have found that the algorithm tends to confuse comparison with sexual attraction and probably overly conflates the number of homosexual/bisexual/whatever.

  2. Would it be possible to reach out to one or both of you over email or a call? I've been teaching myself programming for a couple of years but would still consider myself an overall beginner-intermediate. I'm about to head to college and would love to talk to some people doing cool things with data. I don't really know anyone in the field, and you two both seem chill and approachable. Please let me know either way.

    Replies
    1. Also, I'll comment my email if you reply. I just don't love putting it random places without a reason yet.
