The "geodetector" is a method for identifying patterns in spatial data. While it's been around for a while, the literature developing and extending it has been very internally oriented. By this, I mean that it's not well integrated into the broader spatial analysis literature.
So, when I try to read into the literature, it puts me off when an "axiom" (from the new Wang et al. (2024) paper published in the Annals) is not true:
Our axiom is that if X causes Y, then their spatial patterns (spatial stratified heterogeneities) would tend to be coupled, in addition to displaying a significant Pearson's correlation.
A, spatial patterns aren't equivalent to "spatial stratified heterogeneities"; and B, if X causes Y, it's not necessarily true that X is even correlated with Y. This is a really basic point, and you hit it in nearly any introduction to causal inference, usually through the idea that "faithfulness" is a convenient but exceptionally fragile assumption. But let's repeat this one more time (louder) for the geographers in the back:
It's very easy to create a case where a variable causes an outcome, but is uncorrelated with it. This is not unique to the spatial case, but is easy to illustrate with spatially-patterned variables. The simplest case is the humble mediator variable.
Python:
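import numpy, matplotlib.pyplot as plt
numpy.random.seed(112112)
# X increases along the (flattened) grid index, plus noise
X = numpy.arange(100) + numpy.random.normal(scale=5, size=100)
# the mediator M is the mirror image of X's pattern: high where X is low
M = numpy.arange(100)[::-1] + numpy.random.normal(scale=5, size=100)
# the outcome depends on X through its product with the mediator
y = X*M + numpy.random.normal(size=100)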
We can make these spatially patterned, or we can ignore spatial patterning. Either way, a mediator and a "causal" variable together can force the correlation between the cause and the outcome to be arbitrarily far from the "true" causal effect relating the two.
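To make "arbitrarily far from the causal effect" concrete, here is a minimal sketch that, unlike the simulation above, *assumes* M is generated from X (M = 99 - X + noise). Under that assumption, intervening on X also moves M. The helper `effect_of_unit_increase` is hypothetical. It sets X to x and to x + 1 with the same noise draws and compares the average outcomes.

```python
import numpy as np


def effect_of_unit_increase(x, rng, n=100_000):
    """Average change in y when X is set to x + 1 instead of x (hypothetical helper).

    Assumes M := 99 - X + noise and y := X * M + noise, so an intervention
    on X changes y both directly and through the mediator M.
    """
    noise_m = rng.normal(scale=5, size=n)  # reused under both settings of X
    noise_y = rng.normal(size=n)

    def mean_outcome(x_set):
        M = 99 - x_set + noise_m
        return np.mean(x_set * M + noise_y)

    return mean_outcome(x + 1) - mean_outcome(x)


rng = np.random.default_rng(0)
effects = {x: effect_of_unit_increase(x, rng) for x in (10, 50, 90)}
print(effects)
```

Under these assumptions, \(E[y \mid do(X = x)] = x(99 - x)\), so a unit increase in X changes y by \(98 - 2x\) on average. That is about +78 at x = 10 and about -82 at x = 90: the causal effect is large and sign-changing, yet no single correlation coefficient summarizes it.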
In this case, X causes y through a mediator variable. However, the mediator "filters" the pattern of X, resulting in an outcome that is quite distinct from the pattern of X:
Python:
# lay the 100 observations out on a 10 x 10 grid
Xm = X.reshape(10,10)
Mm = M.reshape(10,10)
ym = y.reshape(10,10)
f,ax = plt.subplots(1,3,figsize=(8,4))
ax[0].imshow(Xm)
ax[1].imshow(Mm)
ax[2].imshow(ym)
ax[0].set_title("X")
ax[1].set_title("Mediator")
ax[2].set_title("Outcome")
plt.show()
You can see in this case that the mediator is the inverse pattern of X, the observed cause. Given the causal diagram above, this is akin to a generating function like \(y = -x^2 + x\). In the outcome map, the maximum forms a horizontal band toward the middle of the vertical axis. Regardless, the nonlinear (inverted-U) structure in the relationship yields a near-zero correlation between X and y.
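A quick numerical check of that near-zero correlation, as a sketch. It regenerates the data with the same recipe and seed as the first code block. It then compares the Pearson correlation of X with y against the correlation of y with the squared deviation of X from its mean, which captures the inverted-U shape. The name `X_dev_sq` is just for illustration.

```python
import numpy

# same data-generating recipe and seed as the first code block
numpy.random.seed(112112)
X = numpy.arange(100) + numpy.random.normal(scale=5, size=100)
M = numpy.arange(100)[::-1] + numpy.random.normal(scale=5, size=100)
y = X*M + numpy.random.normal(size=100)

# Pearson correlation between the cause and the outcome: close to zero
r_linear = numpy.corrcoef(X, y)[0, 1]
# correlation of y with the squared deviation of X: strongly negative (inverted U)
X_dev_sq = (X - X.mean())**2
r_quadratic = numpy.corrcoef(X_dev_sq, y)[0, 1]
print(f"corr(X, y) = {r_linear:.2f}; corr((X - mean)^2, y) = {r_quadratic:.2f}")
```

The linear correlation misses a dependence that is obvious once the nonlinear form is allowed for, which is exactly why "X causes Y" does not imply "X is correlated with Y."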
Hence, it's pretty frustrating to see an "axiom" that is so simple to falsify. Yes, it's a reasonable inference to make; things that are causally related should be associated with one another. But! We need to have a *full specification* of the system at hand. In this mediation example, X causes Y, but not only directly: X is a parent of both the mediator and Y.
I think it's pretty important to think through this stuff carefully, rather than making claims to "axioms" that are easily falsified. If we're going to have any clarity on how to do causal inference in geography, we'll need to be more careful about this. Indeed, this was a big theme from our Dagstuhl seminar on spatial causal inference!
Want to draw spatially-correlated values without all the fuss?
I think I've figured out a simple way to sample from a spatially-correlated normal distribution without needing to invert the large precision matrix directly.
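The post's own method isn't included here, so the following is only a sketch of one standard technique, not necessarily the author's. Factor the precision matrix as \(Q = LL^\top\) with a Cholesky decomposition, draw standard-normal z, and solve \(L^\top x = z\). Then \(\mathrm{Cov}(x) = (LL^\top)^{-1} = Q^{-1}\), and Q is never inverted. The following are all assumptions for illustration:

* the simultaneous autoregressive (SAR) precision \(Q = (I - \rho W)^\top (I - \rho W)\);
* a rook-contiguity grid;
* the helper names `rook_weights` and `sample_sar`.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular


def rook_weights(n):
    """Row-standardized rook-contiguity weights for an n-by-n grid (hypothetical helper)."""
    W = np.zeros((n * n, n * n))
    for r in range(n):
        for c in range(n):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n:
                    W[r * n + c, rr * n + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def sample_sar(n, rho, rng):
    """One draw of an SAR field on an n-by-n grid without forming Q^{-1} (hypothetical helper)."""
    A = np.eye(n * n) - rho * rook_weights(n)
    Q = A.T @ A                      # precision matrix (unit innovation variance)
    L = cholesky(Q, lower=True)      # Q = L @ L.T
    z = rng.standard_normal(n * n)
    # x = L^{-T} z has covariance L^{-T} L^{-1} = (L L^T)^{-1} = Q^{-1}
    return solve_triangular(L.T, z, lower=False)


rng = np.random.default_rng(0)
field = sample_sar(20, 0.8, rng).reshape(20, 20)
```

Dense matrices keep the sketch short. For large grids, a sparse Q with a sparse Cholesky factorization is where avoiding the explicit inverse really pays off.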
As I'm having to wind down on Summer of Code work and ramp up on dissertation & grant work, I'm really digging this strategy for traceplot visualization. The only thing I'd like a little more is a plot of a smoothed moving average in the trace.
By default the PyMC3 traceplot kde plots are horizontal, not vertical. But if you align the traceplot axis to a vertical kde plot, it's like the trace "fills into" the cup formed by the kde curve. I'm not sure why this is so pleasing for me to use to understand what's going on. I think it helps me intuitively connect the trace to the distribution if you think of the trace as pouring its mass into the open kernel density curve. This is especially true for really long chains, where it can be kinda hard to connect the trace visual to a distributional picture.
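Here is a minimal matplotlib sketch of this layout. It is not PyMC3's own `traceplot`, whose options I won't guess at. The trace is on the left and a KDE is drawn vertically on a shared y-axis on the right, so the trace "pours" into it. A trailing moving average, the extra wished for above, is overlaid on the trace. The following are assumptions:

* the helper name `trace_with_vertical_kde`;
* the 200-draw window;
* a synthetic AR(1) chain standing in for real MCMC draws.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde


def trace_with_vertical_kde(samples, window=200):
    """Trace plot with a vertical KDE on a shared y-axis (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    fig, (ax_trace, ax_kde) = plt.subplots(
        1, 2, sharey=True, figsize=(8, 3),
        gridspec_kw={"width_ratios": [4, 1]},
    )
    # raw trace
    ax_trace.plot(samples, lw=0.5, alpha=0.6)
    # trailing moving average: mean of each run of `window` consecutive draws
    smooth = np.convolve(samples, np.ones(window) / window, mode="valid")
    ax_trace.plot(np.arange(window - 1, len(samples)), smooth, color="k", lw=1.5)
    ax_trace.set_xlabel("iteration")
    # KDE evaluated over the sampled range, drawn with parameter values on the y-axis
    grid = np.linspace(samples.min(), samples.max(), 200)
    density = gaussian_kde(samples)(grid)
    ax_kde.plot(density, grid)
    ax_kde.fill_betweenx(grid, density, alpha=0.3)
    ax_kde.set_xlabel("density")
    return fig, smooth


# synthetic, autocorrelated "chain" standing in for real posterior draws
rng = np.random.default_rng(0)
chain = np.zeros(5000)
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()
fig, smooth = trace_with_vertical_kde(chain)
```

Call `plt.show()` (or `fig.savefig(...)`) to view it. Sharing the y-axis is what makes the trace line up with the kde "cup"; the box-kernel moving average is the simplest smoother and could be swapped for something fancier.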
Anyway, just thought I'd show these pretty well-converged traces from some multilevel linear modeling work I'm moving to as I try to also wind down the rest of the work in my GSOC!
Howard Chang, a former PhD student of mine now at Emory, just published a paper on a measurement error model for estimating the health effects of coarse particulate matter (PM). This is a cool paper that deals with the problem that coarse PM tends to be very spatially heterogeneous. Coarse PM is a bit of a hot topic now because there is currently no national ambient air quality standard for coarse PM specifically. There is a standard for fine PM, but compared to fine PM, the scientific evidence for health effects of coarse PM is relatively less developed.
When you want to assign a coarse PM exposure level to people in a county (assuming you don't have personal monitoring), there is a fair amount of uncertainty about the assignment because of the spatial variability. This is in contrast to pollutants like fine PM or ozone, which tend to be more spatially smooth. Standard approaches essentially ignore this uncertainty, which may lead to some bias in estimates of the health effects.
Howard developed a measurement error model that uses observations from multiple monitors to estimate the spatial variability and correct for it in time series regression models estimating the health effects of coarse PM. Another nice thing about his approach is that it avoids any complex spatial-temporal modeling to do the correction.
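The post doesn't spell out the model, so this is not Howard's method. It's a generic sketch of the problem such a correction addresses, under these simplifying assumptions:

* a linear regression rather than a Poisson time-series model;
* classical additive error at each of J monitors;
* the textbook regression-calibration correction.

Averaging noisy monitors attenuates the naive slope by the reliability ratio \(\lambda = \sigma_x^2 / (\sigma_x^2 + \sigma_u^2 / J)\). Disagreement between monitors on the same day lets you estimate \(\sigma_u^2\) and undo that attenuation. All names and parameter values are hypothetical.

```python
import numpy as np


def naive_and_corrected_slopes(y, monitors):
    """Slope of y on the monitor average, before and after regression calibration.

    monitors: array of shape (n_days, J), one column per monitor (hypothetical layout).
    """
    J = monitors.shape[1]
    w_bar = monitors.mean(axis=1)
    var_wbar = np.var(w_bar, ddof=1)
    # naive OLS slope, treating the monitor average as the true exposure
    naive = np.cov(w_bar, y)[0, 1] / var_wbar
    # within-day spread across monitors estimates the error variance sigma_u^2
    sigma_u2 = monitors.var(axis=1, ddof=1).mean()
    # reliability ratio: share of the averaged exposure's variance that is signal
    reliability = (var_wbar - sigma_u2 / J) / var_wbar
    return naive, naive / reliability


rng = np.random.default_rng(42)
n_days, J, beta = 20_000, 3, 2.0
x = rng.normal(size=n_days)                                       # true exposure
monitors = x[:, None] + rng.normal(scale=2.0, size=(n_days, J))  # sigma_u = 2
y = beta * x + rng.normal(size=n_days)
naive, corrected = naive_and_corrected_slopes(y, monitors)
```

With these settings, \(\lambda = 1 / (1 + 4/3) \approx 0.43\). The naive slope therefore lands near 0.86 instead of 2, while the calibrated slope recovers roughly 2. This is the kind of bias that ignoring exposure uncertainty can introduce.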