
Introduction

COVID-19 is one of the biggest events of the 21st century, and it will go down in the annals of history. The virus has transformed our habits, as well as how we interact with others and perceive the outside world. Each country handled the crisis differently and responded with different measures: some governments, more concerned about the virus, imposed early mobility restrictions; others, less concerned, simply issued public warnings without taking further action. Particularly at the start of 2020, each decision had a significant impact on how COVID-19 spread and how many people became infected. Consequently, this is the main question we attempt to address in this study:

"Did high attention towards COVID-19 in the early stage of the pandemic help people fight it, or is it simply a sign that a bad situation existed in a country when the virus first started spreading?"


In other words, the idea is to see whether attention towards COVID-19 was an essential tool to fight the virus or a natural reaction of people realizing that it was a dangerous threat.

To answer our questions, we use datasets containing mobility and Wikipedia data, as well as tweets posted by common and influential people at the beginning of the pandemic in each of the following countries: France, Denmark, Germany, Italy, Netherlands, Norway, Sweden, Serbia, Finland, Republic of Korea, Canada, and Japan. Unlike Wikipedia, which is mainly consulted for educational purposes, Twitter is a more dynamic platform that lets people interact and express opinions whenever they want. These qualities make it a well-suited tool for extracting insights about people's beliefs and thus inferring the situation in a country. Since we focus on the early stage of the pandemic, the data used for each country refers to the 3 weeks preceding the official date of the lockdown (or mobility restrictions) in that country.

What was our reaction?

First, we investigate how people responded to the initial outbreak of the COVID-19 pandemic by examining the pages they searched for on Wikipedia and the topics they tweeted about. This analysis not only provides fascinating insights into the behavior of the population, but is also essential for determining which nations were the quickest to realize the true risks posed by the virus.

"Are Twitter and Wikipedia data similar? Can they give interesting insights into how people faced the early stage of the pandemic?"

We start our analysis by applying a topic detection algorithm to tweets posted by "common people" in each country. To do so, we first clean the data carefully (removing stopwords and emojis, normalizing whitespace, lowercasing, etc.) and then use the Empath library to extract the topics.
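The cleaning step described above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the exact pipeline used in this study: the stopword list, the regular expressions, and the function name clean_tweet are illustrative assumptions, and the Empath step itself is not shown.

```python
import re

# Tiny illustrative stopword list (an assumption; a real pipeline would use a fuller one).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "on", "for"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Keep only ASCII letters, digits, '#' and whitespace; this also drops emojis and punctuation.
NON_TEXT_RE = re.compile(r"[^a-z0-9#\s]")


def clean_tweet(text: str) -> list[str]:
    """Lowercase a tweet, strip URLs, mentions, emojis and punctuation, drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_TEXT_RE.sub(" ", text)
    # str.split() with no argument also collapses repeated whitespace.
    return [tok for tok in text.split() if tok not in STOPWORDS]


print(clean_tweet("The trains are CANCELLED 😷 in Milan!! https://example.com @user #covid"))
# ['trains', 'cancelled', 'milan', '#covid']
```

The ASCII-only filter is a simplification that assumes tweets have already been translated to English; text in other scripts would need a different character filter.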

*Please note that, to compare topic trends in the same plot, we applied min-max scaling to the Twitter and Wikipedia data separately.
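For readers unfamiliar with min-max scaling, the sketch below shows the idea: each series is rescaled independently to the range [0, 1]. The numbers are toy values chosen for illustration, not data from this study, and the function name is our own.

```python
def min_max_scale(values: list[float]) -> list[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no trend; map everything to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy values (illustrative only): the two sources live on very different scales.
twitter_topic_scores = [2.0, 4.0, 6.0]
wikipedia_pageviews = [100.0, 300.0, 500.0]

# Scaling each source separately makes the trends comparable in one plot.
print(min_max_scale(twitter_topic_scores))  # [0.0, 0.5, 1.0]
print(min_max_scale(wikipedia_pageviews))   # [0.0, 0.5, 1.0]
```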














In general, the charts above show that transportation is one of the most discussed topics on Twitter in the analysed countries. This may be related to the early social distancing policies imposed by some governments, or to the mobility restrictions put into effect in the countries most affected by COVID-19. Moreover, COVID-19 itself starts to make its way into people's conversations at this very early stage of the pandemic.
Let's compare Twitter results with the visits the same topics received on Wikipedia.

Covid attention is higher on Twitter than on Wikipedia

Wikipedia's COVID-related pages received very few visits during the period preceding the lockdown (most of these pages were still being created at that time). Therefore, combining Twitter and Wikipedia data seems a good choice, since the two sources complement each other.

Importance of mobility restrictions

While it was largely ignored on Wikipedia, transportation received a lot of attention on Twitter. This may indicate that people began discussing potential issues and limitations relating to mobility, in order to prevent infections and deaths.

Similarity between Twitter and Wikipedia topics

As seen for the majority of the countries in the charts above, health appears to be a topic that is discussed both on Twitter and Wikipedia. Additionally, sports and government are key topics on Twitter and Wikipedia.

Did they warn us?

The attention shown by a country cannot be quantified through raw numbers alone (e.g., the number of visits that health-related Wikipedia pages received or the number of tweets containing the word 'covid'). The words people choose to use online can also be a useful tool for understanding a country's concerns. Therefore, to better understand how each country responded to this exceptional situation, we analyze tweets from influential people (influencers, politicians, celebrities). Influential people in each nation either used COVID-19-related hashtags and terms to alert the population or chose to remain silent and disregard the crisis. Did the language they chose have an impact on people's behavior?

We can see from the charts above that, in most of the countries, the topics most discussed by influential people were government and health (the highest health score is found in Japan). A very curious case is Korea, which is geographically very close to Japan, but whose influential people talked about everything but COVID-19. Note that the influential people we chose to analyze also include politicians. Therefore, for countries showing low health and COVID-19 scores, the fact that politicians rarely discussed COVID-19 may suggest that, at the dawn of the pandemic, institutions were not yet actively intervening to limit the spread and the consequences of the virus.

Moreover, we note the prominent use of COVID-19-related hashtags (such as #coronavirus and #covid), probably intended to raise awareness of the pandemic among the population. It is also interesting to note that in some countries, hashtags expressing resilience and a virtual fight against the pandemic were created: for instance, in Germany we observe #usagainstvirus, while in Serbia we notice #stayathome. This suggests that influential people might have tried to use their popularity to convey messages alerting people to the imminent danger.

"What do we know about different countries? "

#coronADA

Italy is clearly a country where influential people cared deeply about expressing their thoughts and opinions on COVID-19. This is understandable, since Italy was in an extremely difficult situation during the early stage of the pandemic, and both the discussed topics and the hashtags confirm it. A similar trend in topics and hashtags can be seen in Germany, France, Canada, and Serbia. The Netherlands showed less interest in COVID-19 and health-related topics, but the transportation topic, which may be a sign of discussion about mobility, is very prominent.

Influential people from the northern countries (Norway, Sweden, Denmark and Finland) show great interest in topics related to government, transportation, and health. However, sports and technology still make up a substantial part of the discussed topics.

In Korea, most of the hashtags used are related to boy bands and music, which matches the most discussed topics. The vast majority of the influential people there work in entertainment, so it is harder to observe them sharing messages about COVID-19. Note that this is not a biased result; rather, the great attention these people received in that period is useful information for understanding the situation in Korea at the beginning of the pandemic. Finally, in Japan, we do not notice any meaningful hashtags even though COVID-19 seems to be a widely discussed topic. This might be due to computational reasons (remember that each tweet was translated before being used), but it does not affect the observed interest of influential people in health and COVID-19 topics.

How was it handled?

We now know that influential people talked about COVID-19 and that the general population did express interest in the topic. However, the level of attention towards COVID-19 differs between countries. Hence, we would like to examine: did higher attention raise awareness about the issue and prevent an even more extensive spread, or was it just a mirror of the current situation in a country?

The idea is to quantify the overall interest in COVID-19 (Wikipedia) and its presence on social media (Twitter) in the period preceding the official lockdowns through a measure we call the attention score. To determine the key topics discussed by influential people on Twitter, we use Latent Dirichlet Allocation (LDA). This lets us identify a topic related to the pandemic and compute the percentage of tokens associated with it, which we use to quantify the presence of COVID-19 on social media. We then compare the number of infections between similar countries with different attention scores and try to draw conclusions from the results.
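One plausible way to turn an LDA result into a "percentage of tokens associated with a topic" is sketched below. It assumes that a fitted LDA model has already produced, for each document, a probability distribution over topics; the function name, the input layout, and the toy numbers are assumptions made for illustration, not the exact computation used in this study.

```python
def topic_token_share(
    doc_topic_probs: list[list[float]],
    doc_lengths: list[int],
    topic_id: int,
) -> float:
    """Expected percentage of all tokens attributed to one topic.

    doc_topic_probs[d][k] is the probability of topic k in document d
    (as produced by a fitted LDA model), and doc_lengths[d] is the number
    of tokens in document d.
    """
    total_tokens = sum(doc_lengths)
    if total_tokens == 0:
        return 0.0
    # Each document contributes its length times the topic's weight in it.
    expected_tokens = sum(
        n * probs[topic_id] for probs, n in zip(doc_topic_probs, doc_lengths)
    )
    return 100.0 * expected_tokens / total_tokens


# Toy example (illustrative only): two documents, two topics, topic 0 is the pandemic topic.
probs = [[0.5, 0.5], [0.0, 1.0]]
lengths = [10, 30]
print(topic_token_share(probs, lengths, topic_id=0))  # 12.5
```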

The period we consider when computing the attention score is the same as in the previous analysis: the 3 weeks before the lockdown starts in each country, since during this period the spread of COVID-19 reached its peak across countries. Furthermore, to compare the number of COVID-19 infections, we use the 45 days following the start of the lockdown and compare the number of new infections per day for each country. We selected this interval because at the start of the lockdown, when the coronavirus reaches its peak, the effects of the virus's spread, people's behavior, and the attention paid to COVID-19 during the pre-lockdown period are most apparent. On the other hand, if the interval is too long, various confounders, such as the strictness of government measures or a country's competence in handling crisis situations, could strongly affect the results.
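The daily infection series compared later can be derived as in the sketch below. It assumes, purely for illustration, that case counts are available as a cumulative daily series and that the lockdown date is given as an index into that series; the function name and the toy numbers are hypothetical.

```python
def daily_new_cases_pct(
    cumulative_cases: list[int],
    population: int,
    lockdown_index: int,
    window: int = 45,
) -> list[float]:
    """New infections per day, as a percentage of the population,
    for each of the `window` days following the lockdown day."""
    # One extra point is needed at the start to compute the first difference.
    series = cumulative_cases[lockdown_index : lockdown_index + window + 1]
    return [
        100.0 * (today - yesterday) / population
        for yesterday, today in zip(series, series[1:])
    ]


# Toy example (illustrative only): lockdown on day 1, 3-day window, population of 1000.
cumulative = [0, 10, 30, 60, 100]
print(daily_new_cases_pct(cumulative, population=1000, lockdown_index=1, window=3))
# [2.0, 3.0, 4.0]
```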

* Each bubble corresponds to one topic. By clicking on a bubble, you will see on the right side of the plot the words related to the selected topic and their importance, as well as the presence of that topic in the analysed text.

When computing the covid attention score for each country, we rely on two sources:

1. Twitter data of influential people - represents the presence of COVID-19 in the media
2. Wikipedia pageviews - represent people's general interest in COVID-19

To compute the COVID-19 attention score, we calculate the harmonic mean of the score obtained from LDA and the average Wikipedia COVID-19 pageview score. We use the harmonic mean to weigh the two percentage scores evenly. Based on the resulting attention score, we can divide the countries into two groups: countries with a score below 1 (low score) and countries with a score above 1 (high score).
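The combination step can be sketched as follows, using harmonic_mean from Python's statistics module. The function names and input values are illustrative assumptions; in particular, the text does not say how a score of exactly 1 is treated, so assigning it to the low group here is our own choice.

```python
from statistics import harmonic_mean


def attention_score(lda_covid_share: float, wiki_covid_share: float) -> float:
    """Harmonic mean of the two percentage scores (Twitter/LDA and Wikipedia)."""
    return harmonic_mean([lda_covid_share, wiki_covid_share])


def attention_group(score: float, threshold: float = 1.0) -> str:
    """Label a country as 'high' or 'low' attention.

    A score exactly equal to the threshold is labeled 'low' (an assumption).
    """
    return "high" if score > threshold else "low"


# Toy values (illustrative only, not results from this study).
score = attention_score(3.0, 1.5)
print(score, attention_group(score))
print(attention_group(attention_score(0.5, 0.5)))
```

Note that the harmonic mean is pulled toward the smaller of its two inputs, so a country obtains a high score only when both sources show attention.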

*Features plot

To obtain meaningful results when comparing the two countries in a pair, and to reduce the influence of confounders, it is essential to make sure that the two selected countries have similar features. That is why we describe each country with various features (see the plot*) that could capture its ability to deal with the COVID-19 crisis and the manner in which it does so. On top of these features, we add the average change in people's mobility during the 3 weeks before the lockdown started in each country, to see how people responded to the news that the virus was spreading. Note that this is the same period for which we observe the attention to and interest in COVID-19.
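One simple way to check that two countries have similar features is to standardize each feature and compare countries by Euclidean distance, as sketched below. This is only an illustration of the idea under our own assumptions: the distance measure, the threshold, the function names, and the placeholder countries "A", "B", and "C" with their values are all hypothetical and not the procedure or data of this study.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(features: dict[str, list[float]]) -> dict[str, list[float]]:
    """Z-score every feature column so that no single feature dominates the distance."""
    names = list(features)
    n_dims = len(features[names[0]])
    columns = [[features[c][j] for c in names] for j in range(n_dims)]
    # Fall back to a scale of 1.0 for constant columns to avoid dividing by zero.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return {
        c: [(features[c][j] - m) / s for j, (m, s) in enumerate(stats)]
        for c in names
    }


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidate_pairs(features, low, high, max_distance):
    """Pair each low-attention country with every sufficiently similar high-attention one."""
    z = standardize(features)
    return [
        (lo, hi)
        for lo in low
        for hi in high
        if distance(z[lo], z[hi]) <= max_distance
    ]


# Placeholder countries and made-up feature values, for illustration only.
toy_features = {"A": [1.0, 10.0], "B": [1.1, 11.0], "C": [5.0, 50.0]}
print(candidate_pairs(toy_features, low=["A"], high=["B", "C"], max_distance=1.0))
# [('A', 'B')]
```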

In the following paired t-tests, conducted on the pairs we obtained, we tested the null hypothesis H0: "The average daily infections (in % of population) of the low-attention country and the high-attention country are the same".
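The mechanics of a paired t-test can be sketched with the standard library as shown below. The sketch computes the t statistic on the day-by-day differences and an approximate confidence interval for their mean. Two simplifications are our own assumptions: the difference is taken as high-attention minus low-attention, and the interval uses a normal quantile instead of the Student's t quantile with n - 1 degrees of freedom. An exact p-value would normally come from a statistics package (for example scipy.stats.ttest_rel). The input values are toy numbers.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_t_summary(low: list[float], high: list[float], confidence: float = 0.95):
    """Return the paired t statistic and an approximate CI for the mean difference.

    The difference is computed as high - low for each day (an assumption).
    The CI uses a normal quantile, which is only an approximation of the
    Student's t quantile and is slightly too narrow for small samples.
    """
    if len(low) != len(high) or len(low) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    diffs = [h - l for l, h in zip(low, high)]
    mean_diff = mean(diffs)
    std_err = stdev(diffs) / sqrt(len(diffs))
    if std_err == 0:
        raise ValueError("all differences are identical; t statistic is undefined")
    t_stat = mean_diff / std_err
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return t_stat, (mean_diff - z * std_err, mean_diff + z * std_err)


# Toy daily infection series (illustrative only).
t_stat, (ci_low, ci_high) = paired_t_summary(
    low=[1.0, 2.0, 3.0, 4.0], high=[2.0, 4.0, 3.0, 7.0]
)
print(round(t_stat, 3), round(ci_low, 3), round(ci_high, 3))
```

A confidence interval that lies entirely above or below zero points in the same direction as a small p-value: the mean daily difference between the two countries is unlikely to be zero.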

Finland vs. Sweden: p-value = 2.541e-07, 95% CI: [-3e-05, -1e-05]

Netherlands vs. Sweden: p-value = 0.003, 95% CI: [0.0, 2e-05]

France vs. Canada: p-value = 0.004, 95% CI: [-0.0, 2e-05]

France vs. Denmark: p-value = 0.012, 95% CI: [-0.0, 2e-05]

Germany vs. Canada: p-value = 0.102, 95% CI: [-0.0, 2e-05]

Germany vs. Denmark: p-value = 0.271, 95% CI: [-1e-05, 1e-05]

Germany vs. Norway: p-value = 3.277e-05, 95% CI: [0.0, 2e-05]

Italy vs. Denmark: p-value = 5.817e-15, 95% CI: [2e-05, 5e-05]

Italy vs. Canada: p-value = 1.749e-11, 95% CI: [3e-05, 5e-05]

The results of the paired t-tests and the confidence intervals suggest that, in most cases, countries that paid more attention to COVID-19 in the early stages of the pandemic had more daily infections relative to their population than those that paid less. Talking about COVID-19 and giving it a lot of interest and attention was not the way to fight it in the early stages of the pandemic; rather, the Internet (represented in our case by Twitter and Wikipedia) served as people's natural response to the imminent danger of the virus. In other words, people may have talked and searched about COVID-19 more out of general fear and panic at its fast spread, trying to find ways to protect themselves, than to warn others and prevent its spread.

However, we have to emphasize that there could be many potential confounders that we cannot observe or quantify and that influence the results quite significantly. For instance, the mentality of a country's people could be one such confounder (people in some countries tend to be more rebellious towards government instructions than others, more prone to panic in a crisis, etc.). In addition, the results are a direct consequence of the parameters we set and the data we obtained manually.

By defining this new measure (the attention score) and by looking at human traces on the Internet, we were able to dive deep into the reactions and behavior induced by the pandemic. Finally, we can conclude that high online interest in COVID-19 was not a relevant element in fighting the pandemic; rather, it was an indicator of a reactive response to the course of the pandemic itself.

Conclusion

Internet as a tool to fight the pandemic?

Unfortunately, the Internet is not a tool to fight a pandemic, but it is an incredible tool for drawing insights into what is happening in a country and into people's behavior.

I really believe that the virtual world mirrors the physical world.
Marissa Mayer - American businesswoman

Are we learning from the past?

The results seem to show that, once again, human beings perceive danger only when they are already in the midst of it, and that we often struggle to be forward-looking.

Human nature will not change.
Abraham Lincoln - American lawyer and politician