How I Attempted to Mathematically Model the Spread of Memes on the Internet, pt 3 (The Final Chapter)
Welcome back! If you have not read the earlier parts of this series, you can find them here:
Part 1 (Intro to Markov Chains)
Part 2 (Setting Up The Model)
This will be the last part of this series and will primarily involve ~actually~ using the model. Finally, amirite?
Things I'm Going To Touch On:
First, let's recall the equations we obtained in the last post.
We can figure out the long-term behavior of I(t) by taking the limit as t approaches $\infty$. Since $\alpha$, $\beta$, and $\gamma$ are probabilities strictly between 0 and 1, the exponential terms go to zero in the limit. So, $\lim_{t \to \infty} I(t) = \frac{\gamma}{\beta+\gamma}$.
What does this mean in context?
This $\frac{\gamma}{\beta+\gamma}$ tells us the probability of someone being in the infected state in the long term (that is, after the initial "outbreak" of the meme). We can also verify this from our steady state distribution, but I'll leave that as an exercise.
Additionally, I think it's interesting to note that this long-term value does not depend whatsoever on $\alpha$. This means that the probability of moving from susceptible to infected has no effect on the long-term probability of being in the infected state.
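For anyone who wants to check this numerically, here is a minimal Python sketch. It is not the code used for the fits in this post. It assumes a three-state chain where susceptible moves to infected with probability $\alpha$, infected moves to recovered with probability $\beta$, and recovered moves back to infected with probability $\gamma$, starting with everyone susceptible. That structure is an assumption chosen to be consistent with the long-term value $\frac{\gamma}{\beta+\gamma}$, and the name `simulate_infected` is made up for illustration.

```python
def simulate_infected(alpha, beta, gamma, steps, s0=1.0, i0=0.0):
    """Iterate the ASSUMED chain S -> I (alpha), I -> R (beta), R -> I (gamma).

    Returns the list [I(0), I(1), ..., I(steps)].
    """
    s, i = s0, i0
    r = 1.0 - s - i
    history = [i]
    for _ in range(steps):
        # The right-hand side is evaluated with the old s, i, r before assignment.
        s, i, r = (
            (1 - alpha) * s,
            alpha * s + (1 - beta) * i + gamma * r,
            beta * i + (1 - gamma) * r,
        )
        history.append(i)
    return history


# Illustrative parameter values only (not fitted values from this post).
beta, gamma = 0.3, 0.1
for alpha in (0.2, 0.8):
    final = simulate_infected(alpha, beta, gamma, steps=500)[-1]
    print(alpha, final, gamma / (beta + gamma))
```

Under these assumptions, changing $\alpha$ only changes how fast the curve rises, while the value it settles at stays at $\frac{\gamma}{\beta+\gamma}$.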
I obtained my data from Google Trends. Here's what I did:
First, I looked up a particular meme as a search term on Google Trends and got a spreadsheet of popularity values. Google plots the data so that the search term's peak popularity is given a value of 100 and then the rest is scaled in relation to that value. Since I'm concerned with probabilities, I can't use the values given by Google. So, I scaled the values down so that they were between 0 and 1.
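As a small sketch of that scaling step in Python: since the peak is 100, the simplest way to get values between 0 and 1 is to divide everything by 100. That exact choice is an assumption (the post only says the values were scaled down), and `scale_trends` is a made-up helper name.

```python
def scale_trends(values, peak=100):
    """Scale Google Trends popularity values (0..peak) into the range 0..1.

    Dividing by the peak value is an assumed scaling, not a confirmed one.
    """
    return [v / peak for v in values]


# Made-up popularity values for illustration, not real Trends data.
print(scale_trends([0, 25, 100]))
```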
Now that I have my adjusted values, I want to try and fit my I(t) equation to them. I did this by performing regression analysis using SageMath to find the $\alpha$, $\beta$, and $\gamma$ values that best fit I(t) to the data (will provide my code upon request!).
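To give a rough idea of what such a fit does, here is a sketch in plain Python. To be clear, this is not the SageMath regression: it swaps in a brute-force grid search that minimizes the sum of squared errors. It also assumes a chain where susceptible moves to infected with probability $\alpha$, infected to recovered with $\beta$, and recovered back to infected with $\gamma$, starting with everyone susceptible. Both function names are hypothetical.

```python
def infected_curve(alpha, beta, gamma, n):
    """Return [I(0), ..., I(n-1)] for the ASSUMED chain, starting all-susceptible."""
    s, i, r = 1.0, 0.0, 0.0
    curve = []
    for _ in range(n):
        curve.append(i)
        # Old values of s, i, r are used on the right-hand side.
        s, i, r = (
            (1 - alpha) * s,
            alpha * s + (1 - beta) * i + gamma * r,
            beta * i + (1 - gamma) * r,
        )
    return curve


def fit_grid(data, step=0.05):
    """Grid search for the (alpha, beta, gamma) minimizing squared error.

    Tries every combination on a grid strictly between 0 and 1. This is a
    stand-in for a proper regression routine, chosen for simplicity.
    """
    grid = [k * step for k in range(1, round(1 / step))]
    best = None
    for a in grid:
        for b in grid:
            for g in grid:
                curve = infected_curve(a, b, g, len(data))
                sse = sum((m - d) ** 2 for m, d in zip(curve, data))
                if best is None or sse < best[0]:
                    best = (sse, a, b, g)
    return best[1:]
```

Calling `fit_grid(scaled_values)` on a list of scaled popularity values returns the best grid point. A finer `step` gives a closer fit at the cost of many more combinations to try.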
Below is the Google Trends graph for "dat boi". I do want to note that each data point obtained from this graph is for a one-week time span.
And here's a graph of the best fit I(t) (*drumroll*):
This is awesome! Look how closely I(t) matches the actual data. In this case, the $\alpha$, $\beta$, and $\gamma$ that best fit I(t) are:
$\gamma \approx 0.03285$
We can also see that $\frac{\gamma}{\beta+\gamma} \approx 0.09$.
I performed this analysis with 10 different memes/search terms. I'll present the results in the table below, just listing the $\alpha$, $\beta$, and $\gamma$ values.
Quick note: those terms with an asterisk had data points that were taken daily, those with a cross were monthly, and those with nothing were taken weekly.
I included "Flint, Michigan" because although it isn't a meme, it does follow the same viral pattern that the memes do. This is because of the increased media coverage of the Flint Water Crisis. I do want to stress that I am in no way trying to diminish the significance of the Flint Water Crisis or the struggle that those involved are facing, but rather I want to show that this model can be applied to more things than just memes.
It's neat that we can find these probabilities with memes that have already died down from their "outbreaks," but is there a way we can try and predict the $\alpha$, $\beta$, and $\gamma$ values for up-and-coming memes? This brings us to the next section of this post.
I want to try to predict the $\alpha$, $\beta$, and $\gamma$ values for a particular meme given an initial set of data points. Let's assume that we have the data points up to and including the peak popularity. We'll talk about why we need to assume this in a little bit.
Let's start by attempting to predict $\alpha$. Briefly recall that $\alpha$ is the probability of moving from the susceptible state to the infected state. I know that $\alpha$ has to do with how quickly the meme rises in popularity. Because of this, I approximate $\alpha$ using slopes in the following process:
Take the set of points up to and including the peak point.
Find all of the slopes between consecutive pairs of points.
Take the maximum over all these slopes. This is the estimated $\alpha$.
This process is why we have to assume the peak is included in our data set. If we didn't have the peak, we wouldn't know if there was a slope that would be higher than the maximum we found over the points we have.
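The three steps above can be sketched in Python as follows. The sketch assumes the data points are evenly spaced one time step apart (so each slope is just the difference between neighbors) and that the values are already scaled to lie between 0 and 1. The function name `estimate_alpha` is made up for illustration.

```python
def estimate_alpha(values):
    """Estimate alpha as the largest slope between consecutive points up to the peak.

    `values` holds scaled popularity values, one per time step (assumed spacing of 1).
    """
    peak_index = values.index(max(values))  # first occurrence of the peak
    rising = values[: peak_index + 1]       # points up to and including the peak
    if len(rising) < 2:
        raise ValueError("need at least two points up to the peak")
    # With unit spacing, slope = (y2 - y1) / 1.
    slopes = [later - earlier for earlier, later in zip(rising, rising[1:])]
    return max(slopes)


# Made-up values for illustration; points after the peak are ignored.
print(estimate_alpha([0.0, 0.1, 0.4, 1.0, 0.5, 0.9]))
```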
Using this method over all 10 examples gives the following results for $\alpha$.
This seems promising at first, but with a closer look, the differences for some of the $\alpha$ values are pretty large. Is there a way to make it an even better approximation?
Using SPSS, I performed various regressions with this estimated $\alpha$ as the independent variable and the actual $\alpha$ value obtained from the best fit I(t) as the dependent variable. See the table below:
Let's quickly note that for every regression performed, the p-value is < 0.05, and so the relationship between $\alpha$ and our approximation of $\alpha$ is statistically significant (via F-Test of Overall Significance). We want to pick the equation that gives us the highest R-Square value. The quadratic and the cubic equations have the same R-Square value, so I'll pick the quadratic just so I have one less term to deal with.
where $\alpha_{est}$ is that largest slope value.
Using this new way to estimate $\alpha$, we find our new approximate $\alpha$ values by plugging the maximum slope into the equation above:
We can see that the differences between the estimated and the actual $\alpha$ values are much smaller than in the previous estimate! The bold values are those that have decreased significantly.
Now we have a pretty nice approximation for $\alpha$!
Can we do the same for $\beta$?
Let's try! Since $\beta$ is the probability of moving from the infected state to the recovered state, we can't really use the slopes as an approximation. Instead, we'll perform a regression analysis with $\alpha$ as the independent variable and $\beta$ as the dependent variable to see if there's a connection.
In this case, we will pick the cubic equation. Thus,
Let's look at how this approximation compares to the actual $\beta$ values obtained from this model.
The average difference between the estimated and the actual $\beta$ with this approximation is 0.06214. I believe it's safe to say that provided we have an accurate $\alpha$ value, we can obtain an estimated $\beta$ value that is somewhat close to the actual $\beta$.
Of course, there is probably a method that will better predict $\beta$, but for now, this is the best we've got.
Well, here's where we have problems. Unfortunately, there is no significant relationship between $\alpha$ and $\gamma$ or $\beta$ and $\gamma$, as determined by performing various regressions. Because of this, we can't use the same methods of approximation that we used for $\alpha$ and $\beta$. This brings me to the final section of this post.
We've done a lot with this already, but there is always more to be done! There are several things that my model does not account for:
1. If a meme has two peaks instead of just one (i.e. it died and then became popular again)
2. Extremely sharp increase followed by an extremely sharp decrease (say, in the matter of a couple days)
3. If $\beta + \gamma = \alpha$ or if $\beta + \gamma > 1$. This is because of issues with our I(t) equation: $\beta + \gamma = \alpha$ gives a denominator of zero, and $\beta + \gamma > 1$ makes $1-\beta-\gamma$ negative, which causes issues when raising it to the power t.
Problem 3 is an issue because initially our Markov chain has no restrictions on what $\alpha$, $\beta$, or $\gamma$ could be, yet when I come up with the I(t) equation, we somehow have these problems. It'd be nice to have a way to circumvent this issue!
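Until there is a real fix, one practical option is to at least detect these cases before plugging values into I(t). Here is a minimal Python sketch of such a check. The helper name `parameter_issues` and the tolerance `tol` are assumptions for illustration, not part of the model.

```python
def parameter_issues(alpha, beta, gamma, tol=1e-12):
    """Return a list of reasons why the closed-form I(t) would misbehave.

    An empty list means neither problem case applies.
    """
    issues = []
    # Case 1: beta + gamma = alpha makes a denominator in I(t) zero.
    # A small tolerance is used because floating-point sums are rarely exact.
    if abs(beta + gamma - alpha) < tol:
        issues.append("beta + gamma equals alpha: zero denominator")
    # Case 2: beta + gamma > 1 makes the base 1 - beta - gamma negative.
    if beta + gamma > 1:
        issues.append("beta + gamma exceeds 1: negative base raised to t")
    return issues


# Illustrative values only.
print(parameter_issues(0.3, 0.1, 0.2))
print(parameter_issues(0.5, 0.3, 0.1))
```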
Additionally, we still cannot predict what $\gamma$ will be, and until we can do so, we can't create an equation given just an initial data set. Finding a way to predict $\gamma$ is the first thing on the list for future work.
These are all things to consider more in the future, and I'd (of course) love to hear any ideas on how to combat these issues.
As always, if there are any questions/comments/concerns or even suggestions for another post, please feel free to send an ask!
Hope all is well and (as always) stay positive!