Analyze discussions on Twitter

Social media data can shed light on all sorts of questions, such as how users discuss important topics. One such topic is gender-specific medicine, the consideration of gender differences in healthcare. So far, it has mainly been discussed by experts, but social media such as Twitter are believed to be suitable platforms for broadening the conversation. Is this really true? We analyzed real-world Twitter data to find out.

By Katharina Batzel

Social media data can be used to generate insights on all sorts of things. It can, for example, help an organization spot trends that are relevant to its business or find out how customers feel about its products and services. It also enables researchers to understand conversations about societally important topics: with the help of social media data, they can analyze what is being discussed, by whom, and in what way.

In our research, for example, we used data from Twitter to examine how users talk about gender-specific medicine. Let me show you why this is important and how we did it.

The Topic: Gender-specific medicine

Gender-specific medicine is the practice of taking differences between men and women into account when conducting medical research and treating patients. In both areas, these differences still play only a minor role, thereby putting individuals’ health at risk.

Historically, biomedical studies, clinical trials, and drug development have used male subjects (Clayton 2016). Assuming that human cells are identical, researchers applied the conclusions from their findings to both sexes. But medicine is neither sex- nor gender-neutral (Regitz-Zagrosek 2012). The latest example of this is the Covid-19 pandemic: studies show that the virus is deadlier for men than for women, with an increased mortality rate of 0.9% in Chinese men and more severe cases in elderly European men (Gebhard et al. 2020). This difference is caused by sex-specific factors, such as hormone-driven immune response, as well as gender-specific factors such as lifestyle, stress, and socioeconomic conditions (Gebhard et al. 2020).

Unfortunately, knowledge of gender-specific medicine is limited, and so far the topic has been discussed almost exclusively by experts. Meanwhile, the wider public remains largely excluded from the discussion. The question therefore is: how can this be changed?

The question: How does information on gender-specific medicine flow on Twitter?

Nowadays, both individuals and healthcare providers increasingly use social media such as Twitter to exchange health-related information. But who is interacting with whom? How does information flow? How diverse is knowledge across the Twitter network? And how could it potentially be improved? To find out, we analyzed real-world data from Twitter discussions around gender-specific medicine.

The method: Social network analysis

To examine the quality of discussion and the information flow around gender-specific medicine on Twitter, we took the following approach:

  1. Data selection (4-step search term selection)
  2. Data extraction via Twitter API
  3. Network structure classification
  4. Community detection

First, we defined our database: publicly available tweets posted from January to May 2021 that contained one of 15 different search terms. In a second step, we extracted this data via the Twitter API.

To build the network, we used the Python package NetworkX (Hagberg et al. 2020). We included all 12,603 users and removed self-loops from the data. In our network, users serve as the nodes, and their interactions in the form of retweets, quotes, replies and mentions are the edges. For the graphs, we used the spring layout, which is based on the Fruchterman-Reingold force-directed algorithm. It places connected nodes closer to one another than to disconnected ones (Fruchterman and Reingold 1991): initially, nodes are pushed apart, then connected nodes are pulled closer together. Layouts of this type have the advantage of accommodating large networks and of clearly revealing community structure.
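The network construction described above can be sketched in a few lines of NetworkX. This is a minimal illustration rather than our actual pipeline: the user names and interaction records are invented, and the choice of a directed graph (`nx.DiGraph`) is an assumption made for the sketch.

```python
import networkx as nx

# Hypothetical interaction records: (acting user, addressed user, interaction type).
INTERACTIONS = [
    ("alice", "bob", "retweet"),
    ("carol", "bob", "mention"),
    ("bob", "dave", "reply"),
    ("erin", "erin", "reply"),   # self-loop: a user replying to their own tweet
    ("frank", "alice", "quote"),
]


def build_interaction_graph(records):
    """Build a directed user graph (an assumption) and drop self-loops."""
    graph = nx.DiGraph()
    for source, target, kind in records:
        graph.add_edge(source, target, kind=kind)
    # Remove the self-loop edges but keep the users themselves as nodes.
    graph.remove_edges_from(list(nx.selfloop_edges(graph)))
    return graph


graph = build_interaction_graph(INTERACTIONS)

# spring_layout implements the Fruchterman-Reingold force-directed algorithm.
# The seed only makes this illustrative layout reproducible.
positions = nx.spring_layout(graph, seed=42)

print(graph.number_of_nodes(), graph.number_of_edges())
```

Note that a user whose only interaction was a self-loop stays in the graph as an unconnected node, which matters later when the share of isolates is calculated.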

Then followed the network classification. Four characteristics were particularly important for assessing the quality of discussion in the network on gender-specific medicine (see also Figure 1):

  1. The degree of centralization of the network structure: In highly centralized networks, only a few users contribute most of the content and therefore dominate the information flow (Barabási 2009; 2016). We calculated the centralization as the sum of all nodes’ degree centrality divided by the number of nodes.

  2. The level of density: If the centralization of the network is lower than 0.59, the second step is to measure its density. In dense networks, individuals maintain close ties with others and form one or several strongly concentrated communities (Himelboim et al. 2017).

  3. The distribution of connections: If density is high, network modularity is measured, which indicates how strongly the network divides into separate groups.

  4. The share of isolates: In a last step, the share of isolates, meaning the proportion of users without any interaction with other users, is calculated. This is done to distinguish between sparse networks with a few connected communities (clustered) and networks with a large share of isolates and a few clusters (fragmented) (Himelboim et al. 2017).

Figure 1: Network classification process after Smith et al. (2014)
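The decision process above can be written down as a short sketch. The thresholds 0.59 (centralization) and 0.12 (density) come from the text; the modularity and isolate thresholds are placeholders chosen for illustration, and the labels other than "clustered" and "fragmented" are shorthand used only in this sketch. The toy graph is invented.

```python
import networkx as nx

CENTRALIZATION_THRESHOLD = 0.59  # stated in the text
DENSITY_THRESHOLD = 0.12         # stated in the text
MODULARITY_THRESHOLD = 0.5       # placeholder assumption, not from the text
ISOLATE_THRESHOLD = 0.2          # placeholder assumption, not from the text


def network_metrics(graph):
    """Return the graph-level measures that need no community partition."""
    n = graph.number_of_nodes()
    centrality = nx.degree_centrality(graph)
    return {
        # Sum of all nodes' degree centrality divided by the number of nodes.
        "centralization": sum(centrality.values()) / n,
        "density": nx.density(graph),
        "isolate_share": nx.number_of_isolates(graph) / n,
    }


def classify(centralization, density, isolate_share, modularity=None):
    """Walk through the four steps in order and return a structure label."""
    if centralization >= CENTRALIZATION_THRESHOLD:
        return "centralized"
    if density >= DENSITY_THRESHOLD:
        if modularity is None:
            raise ValueError("dense networks also need a modularity value")
        if modularity >= MODULARITY_THRESHOLD:
            return "dense and modular"
        return "dense and unified"
    return "fragmented" if isolate_share >= ISOLATE_THRESHOLD else "clustered"


# Toy graph: three chains of nine users each, plus three users without any tie.
toy = nx.Graph()
for start in (0, 10, 20):
    nx.add_path(toy, range(start, start + 9))
    toy.add_node(start + 9)

metrics = network_metrics(toy)
label = classify(**metrics)
print(metrics, label)
```

Because the branches are checked in order, modularity is only needed for dense networks, which is why it is an optional argument here.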


The result: Low centralization, density and share of isolates

Based on the data that we extracted from Twitter, we created a graph comprising 12,603 nodes and 16,704 edges (replies: 2,240; mentions: 5,243; retweets: 9,221) to capture the network structure on gender-specific medicine conversations from January to May 2021. We found that on average, 3.22 interactions on the topic of gender-specific medicine take place between individuals over five months.

With regard to the characteristics of the network on gender-specific medicine, we discovered the following:

  1. Network centralization is very low (0.0002103). This indicates that users in the network do not rely on central actors for information. The boundaries and multitude of the groups create knowledge silos, revealing a landscape of different opinions and perspectives on the same topic.

  2. Graph density is low (0.0001052) and in fact far below the threshold of 0.12. The network is thus not only decentralized but also weakly connected, which points to a slow and vulnerable information flow. Moreover, users can only be reached through a few routes and strongly depend on the users that connect them and provide access to the information network. This sparsity highlights a lack of coordinated activity by official sources or pioneering actors in the field.

  3. Due to the low density of the network, Step 3 was not performed. Instead, we calculated the fraction of isolates.

  4. The share of isolates is low (3.99%).

Therefore, we conclude that the network on gender-specific medicine presents some form of group connectivity, where a few moderately sized communities form around hubs (Himelboim et al. 2017). It is “clustered”, as is shown in Figure 2:

Figure 2: Result of the network classification process after Smith et al. (2014)
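As a plausibility check, the reported density and centralization can be reproduced from the node and edge counts alone. The sketch below assumes that the graph is treated as directed and that centralization is the mean degree centrality, as described in the method section. It illustrates the arithmetic under those assumptions and is not necessarily the exact computation that was run.

```python
# Counts reported for the network on gender-specific medicine.
nodes = 12_603
replies, mentions, retweets = 2_240, 5_243, 9_221
edges = replies + mentions + retweets  # 16,704

# In a directed graph without self-loops, n * (n - 1) edges are possible.
possible_directed_edges = nodes * (nodes - 1)

# Density: share of possible edges that actually exist.
density = edges / possible_directed_edges

# Mean degree centrality: every edge adds one to the degree of two nodes,
# and each degree is normalized by (n - 1).
mean_degree_centrality = 2 * edges / possible_directed_edges

print(f"density:        {density:.7f}")                 # 0.0001052
print(f"centralization: {mean_degree_centrality:.7f}")  # 0.0002103
```

Under these assumptions the centralization is exactly twice the density, so both figures express the same finding: only a tiny share of all possible ties between users is actually present.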


This network analysis showed us that discussion of gender-specific medicine on Twitter occurs within and among a few different communities. The communities are highly interconnected and denser than the network overall, which enables easy and fast information exchange between their members. To understand even better how gender-specific medicine is discussed on Twitter, we decided to take a closer look at these communities.

A step further: Community detection

To further examine the different communities within the network on gender-specific medicine, we applied the Clauset-Newman-Moore algorithm using NetworkX (NetworkX 2021). We identified five main communities, each with a different topical focus:

  • In Community 1 (251 nodes), individuals critically report on the role of gender in varying realms of life.

  • Community 2 (231 nodes) is characterized by a few broadcasters and their larger audience. Content-wise, the community is concerned with the biological aspects of gender-specific medicine. This can be traced back to hashtags such as #SABV or #SexDifferences.

  • Community 3 (157 nodes) centers around the NGO Women’s Brain Project, which advocates for the recognition of sex and gender differences in mental health and neuromedicine.

  • Community 4 (129 nodes) focuses on cardiovascular diseases in women, with the top 5 users being cardiologists.

  • Community 5 (129 nodes) includes mainly Canadian users, centering around the #WearRedCanada campaign.

Because these individual groups revolve around very specific subtopics of gender-specific medicine, we can say that the network is characterized by what we call “homophily”: users tend to surround themselves with like-minded people. This further implies that knowledge is structured in silos, with little variation in information within the communities.
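The community detection step can be sketched with NetworkX's `greedy_modularity_communities`, the function documented in the cited NetworkX reference, which implements Clauset-Newman-Moore greedy modularity maximization. The toy graph below is an invented stand-in for the Twitter network: two tightly knit groups joined by a single tie.

```python
import networkx as nx
from networkx.algorithms.community import (
    greedy_modularity_communities,
    modularity,
)

# Toy graph (an assumption for illustration): two fully connected groups of
# five users each, linked by exactly one edge between users 4 and 5.
graph = nx.barbell_graph(5, 0)

# Greedy modularity maximization returns a list of node sets.
communities = greedy_modularity_communities(graph)

# Rank communities by size, as in the list of main communities above.
by_size = sorted(communities, key=len, reverse=True)
for index, members in enumerate(by_size, start=1):
    print(f"Community {index} ({len(members)} nodes): {sorted(members)}")

# Modularity of the detected partition; higher values mean clearer separation.
score = modularity(graph, communities)
print(f"modularity: {score:.3f}")
```

On real data, the content focus of each community still has to be interpreted by hand, for example by inspecting the most frequent hashtags and the most active users per community.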

By analyzing the way gender-specific medicine is discussed on Twitter, we found that information exchange is limited. This is due to the restricted information circulation, the decentralization and the sparsity of the network. The network on gender-specific medicine consists of individual communities which create knowledge silos and thus a landscape of different opinions and perspectives. These are not exactly ideal conditions for a holistic public discussion about the importance of gender-specific medicine.

Let us therefore return to our initial question: how could this situation be improved? Indeed, there are several measures that, for example, public health managers and marketing agencies from health institutes can take to make Twitter a place of truly beneficial discussion on gender-specific medicine:

  1. Use communities for group-specific information: The fact that there are individual communities can be used to provide group-specific and tailored information to users. For example, Community 3 is concerned with neuroscience and the effect of sex and gender variables on the brain. Here, public health managers could seed information on the topic of cardiology to broaden information diversity for members of that community.

  2. Use influencers to spread information: The data could also be used to map influential users. They could act as multipliers of information as they are able to diffuse it much faster and more efficiently than other users in the network.

  3. Unite content with a hashtag: To support coordinated interaction, online marketing agencies from national health institutes could launch a top-down distribution of a single hashtag that bundles gender medicine content on Twitter.
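Mapping influential users, as suggested in the second measure, could start from a simple centrality ranking. The sketch below uses in-degree centrality as one possible proxy for influence, which is an assumption: other measures may suit the purpose better. The user names and edges are invented, with each edge pointing from the acting user to the user who is retweeted, mentioned or replied to.

```python
import networkx as nx

# Hypothetical interactions: (acting user, addressed user).
graph = nx.DiGraph()
graph.add_edges_from([
    ("ann", "cardio_doc"), ("ben", "cardio_doc"), ("cat", "cardio_doc"),
    ("ann", "ngo"), ("ben", "ngo"), ("cardio_doc", "ngo"), ("dan", "ngo"),
    ("ngo", "ann"),
])


def top_users(graph, k=3):
    """Return the k users with the highest in-degree centrality."""
    scores = nx.in_degree_centrality(graph)
    # Sort by score (descending), then by name to make ties deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


for user, score in top_users(graph, k=2):
    print(f"{user}: {score:.2f}")
```

Users at the top of such a ranking receive the most attention from others and are therefore natural candidates to act as multipliers.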

As you can see, analyzing social media data can lead to unexpected insights – be it about people’s conversation behavior regarding a certain topic, as in our example, or about an organization’s target group or customers. From this, you can derive specific measures and recommendations for action.


Barabási, A.-L. 2009. “Scale-Free Networks: A Decade and Beyond,” Science (325:5939), pp. 412–413.
Barabási, A.-L. 2016. Network Science, Cambridge University Press.
Clayton, J. A. 2016. “Studying Both Sexes: A Guiding Principle for Biomedicine,” FASEB Journal (30:2), pp. 519–524.
Fruchterman, T. M. J., and Reingold, E. M. 1991. “Graph Drawing by Force-Directed Placement,” Software: Practice and Experience (21:11), pp. 1129–1164.
Gebhard, C., Regitz-Zagrosek, V., Neuhauser, H. K., Morgan, R., and Klein, S. L. 2020. “Impact of Sex and Gender on COVID-19 Outcomes in Europe,” Biology of Sex Differences (11:29), pp. 1–13.
Hagberg, A., Schult, D., and Swart, P. 2020. NetworkX. (https://networkx.org/documentation/networkx1.10/download.html, accessed February 18, 2022).
Himelboim, I., Smith, M. A., Rainie, L., Shneiderman, B., and Espina, C. 2017. “Classifying Twitter Topic-Networks Using Social Network Analysis,” Social Media + Society (3:1), pp. 1–13.
NetworkX. 2021. “Networkx.Algorithms.Community.Modularity_max.Greedy_modularity_communities — NetworkX 2.6.2 Documentation,” NetworkX 2.6.2 Documentation.
Regitz-Zagrosek, V. 2012. “Sex and Gender Differences in Health,” EMBO Reports (13:7), pp. 596–603.
