Personalization and Polarization on Social Media

Social media platforms are a central part of how people consume media content, from short videos and memes to news and political information. More than 80% of U.S. adults report using at least one form of social media, and more than 50% regularly use social media platforms to consume news. Content on platforms such as Twitter/X, Facebook, Instagram or Reddit is often sorted by algorithms that aim to maximize users’ engagement. This can have adverse side effects: users may see more unreliable and polarizing content when their feeds are sorted algorithmically. Ranking content is not the only place where algorithms play a role on social media platforms, however. A second important aspect is recommender algorithms that suggest “people you might know” or communities “you might also like”. These personalization helpers may contribute to filter bubbles and echo chambers by making it easier for similar users to link up with each other. Understanding the role of algorithms in both steps is highly relevant for policymakers, who have recently moved to regulate large social media platforms.

In this paper I study how algorithmically aided personalization can affect network formation and, consequently, the content viewed and posted on Reddit, a large social media platform. I exploit a platform-wide change in the registration procedure for new users that replaced universal subscriptions to 50 default communities with personalized network building. I follow a regression discontinuity design, comparing users who registered slightly after the change to users who registered slightly before and were therefore not affected by it. I study three distinct aspects.

  1. Network Formation: How do personal recommendations affect who users connect and interact with?
  2. Content Exposure: How does algorithmic personalization change the type of content users see?
  3. Political Impact: How do these changes influence users' political engagement and polarization?
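
To make the estimation concrete, below is a minimal sketch of a local-linear regression discontinuity estimate around the May 1, 2017 cutoff. The data frame, variable names, bandwidth and kernel are illustrative assumptions, not the paper's exact implementation.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative local-linear RD around the May 1, 2017 cutoff.
# `users` is assumed to hold one row per user with a datetime
# registration date and an outcome (e.g., excess exposure to own type).
CUTOFF = pd.Timestamp("2017-05-01")
BANDWIDTH_DAYS = 30  # illustrative bandwidth, not the paper's choice

def rd_estimate(users: pd.DataFrame, outcome: str) -> float:
    df = users.copy()
    # Running variable: days since the cutoff (negative = registered before).
    df["days"] = (df["registration_date"] - CUTOFF).dt.days
    df["treated"] = (df["days"] >= 0).astype(int)
    # Keep only users registering within the bandwidth (uniform kernel).
    df = df[df["days"].abs() <= BANDWIDTH_DAYS]
    # Local-linear fit with separate slopes on each side of the cutoff.
    model = smf.ols(f"{outcome} ~ treated + days + treated:days", data=df).fit()
    return model.params["treated"]  # estimated jump at the cutoff

# Example (hypothetical data frame):
# effect = rd_estimate(users, outcome="excess_exposure")
```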

Network Formation

Users on Reddit do not form links with each other directly; instead, they join communities. To understand who they are likely interacting with, I focus on potential interactions due to membership in the same communities. I develop three neural network classifiers that predict the gender, age group and political leaning of Reddit users from their activity on the platform (see the full paper for details). Using these predictions, I calculate for each user and month the excess exposure to their own type: how much higher (or lower) the probability of interacting with another user of the same type through joint community membership is, compared to a random network.
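
As a rough illustration of the excess exposure measure, the sketch below compares the own-type share among a user's community co-members with the own-type share in the population at large, used here as a stand-in for the random-network benchmark. The data structures and the use of a difference (rather than a ratio) are assumptions for illustration, not the paper's exact construction.

```python
import pandas as pd

def excess_exposure(user_id: str,
                    memberships: pd.DataFrame,
                    types: pd.Series) -> float:
    """Illustrative excess exposure of one user to their own type.

    memberships: DataFrame with columns ["user_id", "subreddit"].
    types: Series mapping user_id -> predicted type (e.g., political leaning),
           assumed to cover all users in `memberships`.
    """
    own_type = types[user_id]
    # Communities the focal user is a member of.
    own_subs = memberships.loc[memberships["user_id"] == user_id, "subreddit"]
    # Other users sharing at least one of those communities (potential interactions).
    peers = memberships.loc[
        memberships["subreddit"].isin(own_subs)
        & (memberships["user_id"] != user_id),
        "user_id",
    ].unique()
    if len(peers) == 0:
        return float("nan")
    observed = (types.loc[peers] == own_type).mean()
    # Random-network benchmark: own-type share among all other users.
    expected = (types.drop(user_id) == own_type).mean()
    return observed - expected
```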

Binned scatter plot of excess exposure to own type for users who signed up around May 1, 2017. Pooled data between May 2017 and November 2017 for users for whom an exact date of registration is known. Orange lines are 4th order polynomial fits. Vertical black lines are 95% confidence intervals.

The figure shows the average excess exposure for users who registered shortly before and shortly after the personalization change. Users who received the personalized onboarding were significantly more likely to interact with similar users: their excess exposure is 24% higher than that of the control group.

Content Exposure

Users form more homophilic networks, but what does this mean for the content they consume? I proxy the type of content users are exposed to using the categories of the communities they posted in and thus likely subscribe to (again, see the full paper for the methodology and details). I focus on two measures. First, I calculate the topic HHI for each user, a measure of specialization: a larger HHI indicates that the user is active in fewer categories and concentrates on fewer topics. Second, I study the probability that a user is active in any subreddit that is classified as political.
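
For reference, the topic HHI is simply the sum of squared category shares of a user's posts: a user who posts in a single category has an HHI of 1, while a user spread evenly across many categories has an HHI close to 0. A minimal sketch, assuming a hypothetical post-level data frame with `user_id` and `category` columns:

```python
import pandas as pd

def topic_hhi(posts: pd.DataFrame) -> pd.Series:
    """Topic HHI per user: sum of squared shares of posts across categories.

    posts: DataFrame with columns ["user_id", "category"], one row per post.
    Returns a Series indexed by user_id.
    """
    counts = posts.groupby(["user_id", "category"]).size()
    shares = counts / counts.groupby(level="user_id").transform("sum")
    return (shares ** 2).groupby(level="user_id").sum()

# Example: posts split 50/50 across two categories give HHI = 0.5;
# all posts in one category give HHI = 1.
```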

Binned scatter plots of the Herfindahl-Hirschman index over the shares of subreddit categories in which a user posted between May 1 and December 2017 (a) and of the probability of posting in any political subreddit (b), for users who signed up around May 1, 2017. Pooled data between May 2017 and November 2017 for users for whom an exact date of registration is known. Orange lines are 4th order polynomial fits. Vertical black lines are 95% confidence intervals.

As the figure shows, users who receive the personalized onboarding have a higher topic HHI and are thus more specialized. More importantly, they are less likely to post in political communities. Overall, users appear to avoid political content when they can.

Own Posts and Political Polarization

Finally, I want to understand how personalization affected the content that users post and share. I use two specialized large language models to identify whether a post or comment is political and, if so, its political leaning. The next figure shows the regression discontinuity plot for these outcomes.

Binned scatter plots of the share of political content (a) and the share of conservative minus liberal content (b) that a user posted between May 2017 and November 2017, for users who signed up around May 1, 2017. Pooled data for users for whom an exact date of registration is known. Orange lines are 4th order polynomial fits. Vertical black lines are 95% confidence intervals.

Consistent with the results on content exposure, I find that users are less likely to post political content, but there is no difference in the political leaning of what they post.
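
The paper's two specialized models are not reproduced here; as a purely illustrative stand-in, a generic zero-shot classifier (here `facebook/bart-large-mnli` via the Hugging Face `transformers` pipeline) could flag political content and score its leaning along these lines:

```python
from transformers import pipeline

# Illustrative stand-in only: the paper uses two specialized LLMs,
# which are not described here.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def classify_post(text: str) -> dict:
    # Step 1: is the post political at all?
    political = classifier(text, candidate_labels=["political", "not political"])
    is_political = political["labels"][0] == "political"
    leaning = None
    if is_political:
        # Step 2: if political, assign the more likely leaning.
        result = classifier(text, candidate_labels=["conservative", "liberal"])
        leaning = result["labels"][0]
    return {"is_political": is_political, "leaning": leaning}

# Example:
# classify_post("The new tax bill will hurt working families.")
```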

Policy Relevance

The findings are directly relevant for the design of legislation aimed at curbing the negative consequences of algorithmic content selection, such as the European Union's Digital Services Act. I show that algorithmic personalization of the network-building stage allows users to exercise greater choice over their content environment, enabling them to avoid specific categories such as politics, an outcome many users desire. Regulatory interventions targeting the network formation stage must therefore weigh the goal of mitigating homophily against the potential cost of restricting user autonomy in shaping their information feeds and avoiding unwanted content.

Read the full paper.