
Initial spread of the epidemic

Keywords: statistics, data processing, arithmetic mean, geometric mean, regression analysis, GeoGebra

The most famous recent epidemic is the COVID-19 pandemic, caused by the SARS-CoV-2 coronavirus, which broke out in December 2019 in the city of Wuhan in Hubei province in central China.

The COVID-19 epidemic in Europe began in January 2020. The first confirmed cases were reported in France on January 24, 2020. These were three patients who had recently returned from China, where the epidemic was already in full swing.

The disease had reached the Czech Republic by March 1, 2020, when the first three cases were confirmed. By March 18, 2020, 464 cases had already been confirmed in our country.

The beginning of an epidemic is typically modeled by exponential growth. Only later does the spread usually slow down and follow a different pattern (linear, logistic, etc.).

Typical scenario of an epidemic

Based on the data, we will try to model the number of infected persons as a function of the number of days since the beginning of the epidemic. The table shows the number of infected persons on each day.

Day \((n)\) The number of infected people (\(a_n\))
\(1\) \(3\)
\(2\) \(3\)
\(3\) \(5\)
\(4\) \(6\)
\(5\) \(9\)
\(6\) \(20\)
\(7\) \(27\)
\(8\) \(33\)
\(9\) \(39\)
\(10\) \(64\)
\(11\) \(95\)
\(12\) \(117\)
\(13\) \(142\)
\(14\) \(190\)
\(15\) \(299\)
\(16\) \(384\)

Note. This is real data from the Czech Republic, starting on March 1, 2020 (day 1).

Exercise 1. Calculate the ratio of the number of infected people in one day to the previous day.

Solution. If we denote by \(q_n\) the ratio of the number of infected people on the \(n\)th day to the number of infected people on the \((n-1)\)th day, we get the formula \[ q_n=\frac{a_n}{a_{n-1}}, \qquad n\geq2. \] So we can calculate all the values and complete the table.

Day \((n)\) The number of infected people (\(a_n\)) Ratio of infected people
\(1\) \(3\) \(-\)
\(2\) \(3\) \(1{,}000\)
\(3\) \(5\) \(1{,}667\)
\(4\) \(6\) \(1{,}200\)
\(5\) \(9\) \(1{,}500\)
\(6\) \(20\) \(2{,}222\)
\(7\) \(27\) \(1{,}350\)
\(8\) \(33\) \(1{,}222\)
\(9\) \(39\) \(1{,}182\)
\(10\) \(64\) \(1{,}641\)
\(11\) \(95\) \(1{,}484\)
\(12\) \(117\) \(1{,}232\)
\(13\) \(142\) \(1{,}214\)
\(14\) \(190\) \(1{,}338\)
\(15\) \(299\) \(1{,}574\)
\(16\) \(384\) \(1{,}284\)
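The ratios in the table can also be checked with a few lines of code. The following is a minimal Python sketch; the variable names `cases` and `ratios` are our own and only illustrate the formula for \(q_n\).

```python
# Number of infected people a_1, ..., a_16 from the table.
cases = [3, 3, 5, 6, 9, 20, 27, 33, 39, 64, 95, 117, 142, 190, 299, 384]

# q_n = a_n / a_(n-1) for n = 2, ..., 16 (day 1 has no previous day).
ratios = [cases[i] / cases[i - 1] for i in range(1, len(cases))]

for day, q in enumerate(ratios, start=2):
    print(f"day {day:2d}: q = {q:.3f}")
```

The printed values use a decimal point instead of the decimal comma used in the table.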

The ratio of the number of infected people on one day to the number on the previous day can be interpreted as the speed of the spread of the disease. For strictly exponential growth or decline, this ratio would be constant. We work only with “measured” data, so the value of the ratio is only approximate. Let’s take a closer look at this ratio.

Exercise 2. Calculate the arithmetic and geometric means of the ratios \(q_n\). Which one is more appropriate in this case?

Solution. The arithmetic mean \(\bar{q}\) is given by the formula \[ \bar{q}=\frac{q_2+\cdots+q_{16}}{15}=1{,}40731. \] Similarly, for the geometric mean \(G\), \[ G=\sqrt[15]{q_2\cdot \cdots \cdot q_{16}}=1{,}38191. \] It is better to use the geometric mean in this case.

The arithmetic mean can be affected by extreme values (for us, for example, the value \(q_6\)), which can distort the interpretation of average growth or decline. The geometric mean, on the other hand, mitigates this effect.

Moreover, in the case of exponential growth (if we restrict ourselves to the basic form \(k\cdot a^x\)), the value is multiplied by a constant factor over each fixed interval (i.e. the ratio of two measured values a fixed interval apart is constant). The geometric mean directly represents this multiplicative nature of the changes. For example, if we have two consecutive ratios \[ q_1=2, \qquad q_2=0{,}5, \] then the arithmetic mean \(\bar{q}=1{,}25\) would suggest growth, while the geometric mean \(G=\sqrt{2\cdot 0{,}5}=1\) correctly reflects zero net growth.
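Both means can be recomputed from the data in the table. The sketch below (Python, with our own variable names) also checks a useful observation: the product \(q_2\cdot\cdots\cdot q_{16}\) telescopes to \(a_{16}/a_1=384/3=128\), so the geometric mean depends only on the first and the last value.

```python
import math

# Number of infected people a_1, ..., a_16 from the table.
cases = [3, 3, 5, 6, 9, 20, 27, 33, 39, 64, 95, 117, 142, 190, 299, 384]
ratios = [cases[i] / cases[i - 1] for i in range(1, len(cases))]

arithmetic_mean = sum(ratios) / len(ratios)
geometric_mean = math.prod(ratios) ** (1 / len(ratios))

# The product of all ratios collapses to a_16 / a_1, so the geometric
# mean equals the 15th root of 384 / 3 = 128.
telescoped = (cases[-1] / cases[0]) ** (1 / (len(cases) - 1))

print(f"arithmetic mean:     {arithmetic_mean:.5f}")
print(f"geometric mean:      {geometric_mean:.5f}")
print(f"(a_16 / a_1)^(1/15): {telescoped:.5f}")
```

The last two printed numbers should agree up to floating-point rounding, which is another way to see that the geometric mean describes the overall growth from day 1 to day 16.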

Exercise 3. Design a function that would approximate the number of infected people on each day. Create a graph in GeoGebra (or another program).

Solution. We use trial and error to find an exponential function that represents our data well, starting, for example, from the calculated means. We will use the usual notation for writing the function, which is also used in GeoGebra and in the figure below. In the graph, \(x\) denotes the time in days and \(y\) denotes the number of infected people.

With the arithmetic mean we get the function \(y=1{,}40731^x\), with the geometric mean \(y=1{,}38191^x\). The graphs of these functions lie quite far from the plotted data points, so we modify them by multiplying by 2: \(y=2\cdot1{,}40731^x\), \(y=2\cdot1{,}38191^x\). The graphs of the proposed functions are shown in the figure. Of course, you can get even more accurate exponential functions.

Figure 1. Proposed exponential functions by the method of trial and error
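The effect of the factor 2 can also be checked numerically. The following Python sketch compares \(y=k\cdot1{,}38191^x\) for \(k=1\) and \(k=2\) with the data; the helper `model` and the use of the sum of absolute deviations as a measure of closeness are our own choices for illustration, not part of the original solution.

```python
# Number of infected people a_1, ..., a_16 from the table.
cases = [3, 3, 5, 6, 9, 20, 27, 33, 39, 64, 95, 117, 142, 190, 299, 384]
G = 1.38191  # geometric mean of the daily ratios from Exercise 2


def model(k, base, day):
    """Exponential model y = k * base**day."""
    return k * base ** day


totals = {}
for k in (1, 2):
    # Sum of absolute deviations from the data over all 16 days.
    totals[k] = sum(
        abs(model(k, G, day) - a) for day, a in enumerate(cases, start=1)
    )
    print(
        f"k = {k}: y(16) = {model(k, G, 16):.0f}, "
        f"total absolute deviation = {totals[k]:.0f}"
    )
```

With \(k=1\) the function reaches only about half of the measured value on day 16, which is why a multiplicative correction is a natural first step in the trial-and-error search.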

Exercise 4. Using regression analysis of the data, we can obtain a more suitable function describing the number of infected people, namely \(y=1{,}9466\cdot \mathrm{e}^{0{,}3376x}\). Compare your function with this function in GeoGebra (or other suitable software). Calculate the values of the proposed functions for days 14 to 16, rounding the results to integers. Compare them with the values in the table.

Note. The exponential function obtained by regression analysis can be created using either a spreadsheet or GeoGebra. In GeoGebra, the points from the table are entered using the following command: RegreseExponencialni({(1,3), (2,3), (3,5),...,(16,384)}). This is the Czech-language command name; in the English version of GeoGebra the corresponding command should be FitExp.
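For readers without GeoGebra, a comparable fit can be sketched in plain Python. The sketch assumes that the exponential regression is computed as an ordinary least-squares line fitted to the logarithms \(\ln a_n\); whether GeoGebra or a given spreadsheet uses exactly this procedure is an assumption, although the resulting coefficients come out close to those quoted in Exercise 4. The names `A` and `b` for the coefficients of \(y=A\cdot\mathrm{e}^{bx}\) are our own.

```python
import math

# Number of infected people a_1, ..., a_16 from the table.
cases = [3, 3, 5, 6, 9, 20, 27, 33, 39, 64, 95, 117, 142, 190, 299, 384]
days = list(range(1, len(cases) + 1))

# Linearize: ln(y) = ln(A) + b * x, then fit a straight line by least squares.
logs = [math.log(a) for a in cases]
n = len(days)
x_mean = sum(days) / n
y_mean = sum(logs) / n

b = sum((x - x_mean) * (y - y_mean) for x, y in zip(days, logs)) / sum(
    (x - x_mean) ** 2 for x in days
)
A = math.exp(y_mean - b * x_mean)

print(f"y = {A:.4f} * e^({b:.4f} x)")
```

Note that this procedure minimizes the squared differences of the logarithms over all 16 days, not the squared differences of the values themselves.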

Solution. All three functions are shown in the figure below. At first glance, we can see that the functions are similar at the beginning, but for larger values of \(x\) the function \(y=1{,}9466\cdot \mathrm{e}^{0{,}3376x}\) seems to fit the given data best.

Figure 2. Comparison of the proposed functions with the function according to regression analysis

We can also quantify this observation using the data from the table.

Day \((n)\) The number of infected people (\(a_n\)) Number according to the function \(y=2\cdot1{,}40731^x\) Number according to the function \(y=2\cdot1{,}38191^x\) Number according to the function \(y=1{,}9466\cdot \mathrm{e}^{0{,}3376x}\)
\(14\) \(190\) \(239\) \(185\) \(220\)
\(15\) \(299\) \(336\) \(256\) \(308\)
\(16\) \(384\) \(473\) \(354\) \(432\)
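The values in the table can be reproduced with a short Python sketch. The dictionary keys are only labels of our own choosing; the three functions are exactly those from Exercises 3 and 4.

```python
import math

# Measured numbers of infected people on days 14 to 16.
observed = {14: 190, 15: 299, 16: 384}

models = {
    "2 * 1.40731^x": lambda x: 2 * 1.40731 ** x,
    "2 * 1.38191^x": lambda x: 2 * 1.38191 ** x,
    "1.9466 * e^(0.3376 x)": lambda x: 1.9466 * math.exp(0.3376 * x),
}

# Evaluate every function on days 14 to 16 and round to an integer.
predictions = {
    name: {day: round(f(day)) for day in observed} for name, f in models.items()
}

for name, values in predictions.items():
    print(name, values)
```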

We can use the size of the differences between the function values and the measured values as an indicator of how accurately our functions represent the given data. Summing the absolute differences for the three functions in turn, we get

\[ \begin{align*} |239-190|+|336-299|+|473-384|&=175\\ |185-190|+|256-299|+|354-384|&=78\\ |220-190|+|308-299|+|432-384|&=87. \end{align*} \]

However, it is usually the squares of the differences that are used to assess accuracy. This is because squaring reduces small differences (less than 1) and increases large ones even more.

In this case we get \[ \begin{align*} (239-190)^2+(336-299)^2+(473-384)^2&=11691\\ (185-190)^2+(256-299)^2+(354-384)^2&=2774\\ (220-190)^2+(308-299)^2+(432-384)^2&=3285. \end{align*} \] By both measures, the function based on the geometric mean is the closest on days 14 to 16, followed closely by the regression curve, while the function based on the arithmetic mean is far worse. This does not contradict the regression result: the regression curve is fitted to all 16 data points, not only to the last three days, so it need not be the closest on every subset of days.
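The two error measures are easy to recompute. The following Python sketch takes the rounded values from the table for days 14 to 16; the labels and variable names are our own.

```python
# Measured values and rounded function values for days 14, 15 and 16.
observed = [190, 299, 384]
predicted = {
    "arithmetic mean": [239, 336, 473],
    "geometric mean": [185, 256, 354],
    "regression": [220, 308, 432],
}

errors = {}
for name, values in predicted.items():
    absolute = sum(abs(p - o) for p, o in zip(values, observed))
    squared = sum((p - o) ** 2 for p, o in zip(values, observed))
    errors[name] = (absolute, squared)
    print(f"{name:16s} sum of |differences| = {absolute:4d}, "
          f"sum of squares = {squared:6d}")
```

Both measures use only three days; repeating the calculation over all 16 days would give a fuller picture of how well each function fits.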

Exercise 5. Will the epidemic continue to spread according to the designed function? What might influence its future behaviour?

Solution. In order for the epidemic to continue to grow exponentially, the conditions for the spread of the disease must remain unchanged. In practice, people will start to protect themselves with protective equipment, minimize contact with others or get vaccinated. This will reduce the rate at which the number of infected people increases. The epidemic will reach its peak and the number of infected people will start to decrease. For sustained exponential growth, there would also have to be an unlimited number of individuals who can become infected with the disease.
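The effect of a limited number of people who can become infected can be illustrated with a simple logistic model. The Python sketch below is purely illustrative: the limit `K`, the number of simulated days and the discrete logistic update rule are our own assumptions, not values derived from the data. It describes only how the total count levels off and does not capture the later decrease in the number of infected people.

```python
G = 1.38191   # average daily growth factor from Exercise 2
K = 10_000    # HYPOTHETICAL number of people who can become infected
DAYS = 40     # hypothetical length of the simulation

exponential = [3.0]  # unlimited growth: a_(n+1) = G * a_n
logistic = [3.0]     # limited growth: the increase shrinks as a_n approaches K

for _ in range(DAYS - 1):
    exponential.append(exponential[-1] * G)
    current = logistic[-1]
    logistic.append(current + (G - 1) * current * (1 - current / K))

for day in (5, 10, 20, 30, 40):
    print(f"day {day:2d}: exponential {exponential[day - 1]:12.0f}, "
          f"logistic {logistic[day - 1]:8.0f}")
```

In the first days the two models are almost indistinguishable, which is why the beginning of an epidemic can be modeled exponentially; later the logistic values stay below `K` while the exponential ones keep growing.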

Literature