Yet Another Geographer @yetanothergeographer - Tumblr Blog

A short technical post on shapes

I love when I can get it together to work with code about shapes :)

But, Tumblr's really bad with math and code! Check it out on my website.

#gis #geography #geographic data science

Causality and the Geodetector

I don't think it does what they think it does.

The "geodetector" is a new method for identifying patterns in spatial data. While it's been around for a long time, the literature developing/extending it has been very internally-oriented. By this, I mean that it's not really well-integrated broadly into spatial analysis literature.

So, when I try to read into the literature, it puts me off when an “axiom” (from the new Wang et al. (2024) paper published in the Annals) is not true:

Our axiom is that if X causes Y, then their spatial patterns (spatial stratified heterogeneities) would tend to be coupled, in addition to displaying a significant Pearson’s correlation.

A, spatial patterns aren't equivalent to "spatial stratified heterogeneities"; and B, if X causes Y, it doesn't follow that X is even correlated with Y. This is a really basic point, and you hit it in nearly any introduction to causal inference, usually through the idea of "faithfulness" being a convenient but exceptionally fragile assumption. But, let's repeat this one more time (louder) for the geographers in the back---
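To make point B concrete, here's a minimal sketch with made-up data (my own toy setup, not anything from the Wang et al. paper). Y is built directly from X, so X causes Y by construction, yet Pearson's correlation between them is essentially zero because the relationship isn't linear. Cancelling causal paths, the classic faithfulness failure, can produce the same thing in a purely linear world.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# X is symmetric around zero; Y is determined by X (plus a little noise),
# so X causes Y by construction.
n = 100_000
x = rng.normal(size=n)
y = x**2 + rng.normal(scale=0.1, size=n)

# Pearson's r only picks up *linear* association, which is absent here.
r_linear = np.corrcoef(x, y)[0, 1]

# The dependence is obvious once we look at a nonlinear summary of X.
r_squared_term = np.corrcoef(x**2, y)[0, 1]

print(f"corr(X, Y)   = {r_linear:+.3f}")
print(f"corr(X^2, Y) = {r_squared_term:+.3f}")
```

The first number hovers around zero and the second sits close to one: a "significant Pearson's correlation" is not something causation guarantees.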

#spatially stratified heterogeneity #spatial statistics #geography #causality

I've liked being 33, but NYT says it sucks:

This podcast came out a year ago, describing why being 33 sucks.

It's not really about the state of being 33, but rather that being a millennial born in 1990-91 in the US puts you at the crest of a relative baby boom---when you're trying to go to college, more people than ever before (or since) are going to college; when you're trying to buy a house, more people than ever before (or since) are trying to buy houses.

You can see this in the US population pyramid below, which records a country's population size by age. There, you can see that the Millennial peak looks about 32/33 as of the end of 2023. You can *also* see the tail end of the boomers in the 60s, a little peak around 54/55, and the next crest of the Zoomer wave around 24.

This macro demographic way of thinking is a really interesting perspective.

By the NYT argument, we should see these peaks coming, and one could, if they were equipped with enough demographic know-how, identify periods of strain/loosening in college admissions or the housing market. Basically, when you see one of these peaks coming, you better build. When you're past the peak, it's time to start drawing things down.

Relative population size by generation is an interesting way to understand the dynamics of housing or higher education (two things near and dear to my heart), but it's clearly not the only thing.

Housing markets everywhere are tight, even when the population pyramids don't show this trend. For instance, take the Irish population pyramid:

It's got a GenX peak among 40-45-year-olds, and then Millennials are missing. Relative populations decline precipitously from ages 40-18: there is no "Millennial crest" in Ireland, but we're certainly getting similar pressures on house prices, higher education places, etc.

Maybe, instead of "Millennials being the problem, even if it's not their fault," we might identify quite a different problem: maybe housing shouldn't be a speculative store of wealth and form the largest passive contributor to an individual's wealth mobility? And maybe, just maybe, we shouldn't expect higher education to hyper-scale?

Just a thought.

#millennials #demography #being 33 has been great! #Youtube

Pushing Past the Competence Ceiling

I’ve been programming for a few years now, and I wonder if this is a common thing, or if this is just me getting a little burnt out.

I started learning to program after living with a Chemical Engineer and a Computer Systems Engineer. Pursuing a dual degree in Geography and Political Science, I would often joke about how I was the “only scientist around.” My training in Political Science was the same as I understand many experiences are at a state school. I realized that I was getting drawn more and more to quantitative work in political geography (stuff by Ron Johnston & John O’Loughlin stands out as initial passions). But, I felt that there was more quantitative modelling that could be done. I was vaguely aware of Simon Jackman’s work, but I didn’t have the quantitative skills to handle it.

This was partially because I spent all of my Political Statistics class making jokes with a close friend. But, that class was also bunk, and I think the teacher (herself a very knowledgeable prof), knew it. The point was to credential the student in basic statistics, which I’d understood since high school. There was no way I could walk out of that course and understand hierarchical linear modelling, or generalized linear regression. To make things worse, after talking with my roommates, I realized that there was no way that I could solve the math problems I wanted to solve by manually entering data into the TI-83 I used in that intro statistics course.

And, at some point it dawned on me: my undergraduate training in statistics gave me tools and skills useful to solve problems that weren’t applicable or interesting to me. Indeed, the tools and skills it gave me were those that could never actually solve the applicable or interesting problems...

That was tough.

Dejected, I wandered around the internet, through a few IRC channels, and ended up starting to learn Haskell. I also started getting into spatial Operations Research in geography, really hitting linear algebra and some basic vector calculus on the way. But, I couldn’t do what I wanted to do in Haskell (or the LP-solvers I started using). The ecosystem for spatial data analysis wasn’t there. I loved using it, but it was way too much work to get to the point where there were things I could use it on.

I ping around R and Python now, mostly trying to keep engaged with the cutting edge, as the furrow I till as an academic gets deeper and narrower. But, that core idea never really left: we should strive to teach undergraduates using tools that are capable of solving the problems they’re interested in—no toy environments, no “at this level, it’s simple enough to do in Excel.”

It’s not quite the two-language problem, but it kind of is---academics have an ethical obligation to students (and society) to teach open, free, and accessible frameworks that are capable of solving real-world problems. No toys, no "the first hit is free."

I'm moving my blog back to Tumblr.

I'm going on sabbatical next year, and I'm finding that it's really tough to update my ljwolf.org blog with content.

This is for a few reasons. First, while I think I'd like to imagine that it's easy to post on a website using git+Hugo, the user experience is absolutely cursed relative to the simple markdown editing experience on tumblr. It's basically impossible to post to my website from mobile using Hugo.

Second, having ljwolf.org hosted separately from blog.ljwolf.org makes it easy to separate talks+research outputs from blogposts, which should be more frequent, less "big", and more tied to what I'm thinking about in the current moment.

Anyway, now I'm back here on (work) Tumblr. Maybe I'll keep going with that other blog too.

This is an interesting little exploration I did while finishing up my GISRUK data challenge analysis. It's not going into my analysis, but I like the concept, so I'll post it here (cause that's the entire point of this blog!)

I don't really have a good idea for what many places in the UK are like, nor for what the structure of some of this data is when considering its joint structure. So, while my model fits quite well and yields some interesting results, I'm a bit limited because I don't really know what a place like Barrow-in-Furness is like, without looking into it.

In general, it's more difficult to get a sense of what the model's telling me from the conditional estimates because I don't really have a sense of the joint picture: I don't really intuit how they covary across places, like I might in US counties or states.

So, I found myself wanting a kind of "joint" marginal effect, something I could use to work out how my model predictions vary from "places like A" to "places like B" but define those generically, in terms of typical combinations of attributes in my sample.

I started by shifting things linearly along my data's midranges, but this doesn't account for the fact that some attributes may be negatively correlated with other attributes in my design matrix, and so I would expect it to be more typical in my data that Xj increases as Xk decreases, on average. This isn't just a linear shifting using each conditional effect... it's something else.

So, eigenspaces. I strung some code together to:

1. Grab the sklearn.decomposition.PCA of my model design matrix.
2. Extract the most relevant dimension.
3. Sort my data by this dimension and grab the names of observations.
4. Plot the predicted Brexit % against these names.
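Those steps can be sketched roughly as follows. This is a minimal illustration, not the code from the analysis: the place names, covariates, and response are all synthetic, and a plain `LinearRegression` stands in for whatever model actually produced the predictions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Hypothetical stand-ins: 50 "places", 4 correlated covariates, one response.
n_places, n_features = 50, 4
names = np.array([f"place_{i:02d}" for i in range(n_places)])
latent = rng.normal(size=(n_places, 1))
loadings = np.array([[1.0, -0.8, 0.5, 0.3]])
X = latent @ loadings + 0.3 * rng.normal(size=(n_places, n_features))
y = 50 + X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(size=n_places)

# A plain linear model stands in for the real one (an assumption).
model = LinearRegression().fit(X, y)
predicted = model.predict(X)

# Steps 1 and 2: PCA of the design matrix, keeping only the leading dimension.
pca = PCA(n_components=1)
scores = pca.fit_transform(X)[:, 0]

# Step 3: sort observations by their score on that dimension.
order = np.argsort(scores)
sorted_names = names[order]
sorted_predicted = predicted[order]
sorted_observed = y[order]

# Step 4: these sorted arrays are what would go on the plot's axes.
for name, pred, obs in zip(sorted_names[:5], sorted_predicted[:5], sorted_observed[:5]):
    print(f"{name}: predicted={pred:.1f}, observed={obs:.1f}")
```

Passing `sorted_names` as tick labels and `sorted_predicted` / `sorted_observed` as lines to any plotting library gives the kind of figure described here.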

Above is the plot of my data's main dimension, the one that explains the most variance in my design matrix. The lines are the predicted % Brexit, observed % Brexit, and "breakeven point," along with the names of places sorted by this dimension on the vertical axis.

Now, I can get a sense of how these types of places (sort of like area profiles) relate to one another in my data. This gives me an idea of what happens when I change from "places like Kensington and Chelsea" to "places like Cornwall," without having to specify the precise covariance structure of my attribute data.

I can slice one dimension off the PCA decomposition, check how varying it changes my model, and see what covariates are related to that dimension.

In a way, this gives me the "joint" marginal effect I want: what happens when you move your mean response along many different features, but in a way that reflects how these features covary in your source data.
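One way to see what that "joint" effect amounts to, sketched with synthetic data and a linear model (both assumptions on my part): walk along the leading principal component, map each step back into covariate space with `inverse_transform`, and predict along that path. For a linear model, the slope of that path is just the conditional effects weighted by the component loadings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)

# Hypothetical data: two negatively correlated covariates and a response.
n = 200
x1 = rng.normal(size=n)
x2 = -0.7 * x1 + 0.5 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

model = LinearRegression().fit(X, y)
pca = PCA(n_components=1).fit(X)
scores = pca.transform(X)[:, 0]

# Walk from the low end to the high end of the leading dimension and map
# each step back into covariate space, so all features move together.
steps = np.linspace(scores.min(), scores.max(), 11)
X_path = pca.inverse_transform(steps.reshape(-1, 1))
path_predictions = model.predict(X_path)

# For a linear model, the "joint" effect per unit of score is the vector of
# conditional effects weighted by the component loadings.
joint_effect = pca.components_[0] @ model.coef_
path_slope = (path_predictions[-1] - path_predictions[0]) / (steps[-1] - steps[0])

print(f"joint effect per unit of PC1: {joint_effect:.3f}")
print(f"slope along the path:         {path_slope:.3f}")
```

The loadings on the two covariates have opposite signs here, which is exactly the "Xj increases as Xk decreases" behaviour that shifting each covariate independently misses. With a nonlinear model the path predictions would still work, but the single-number summary would not.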

#statistics #simulation #machine learning #geography #brexit #analysis

I often find myself looking for spaces like those I used to like in Brooklyn or Phoenix, rather than taking what I like about places I find in Bristol at face value.

I guess sometimes it's hard to relinquish that inherent comparative streak.

Regardless, I'm still seeking that best pub that does good coffee and serves beer by 1.

This is a quick exploration/exposition of the fact that estimated substantive effects (fixed effects in H&R's terms) can and sometimes will change in models with spatially dependent error terms when (and only when) the dependence in the error term is collinear with one or more covariates in the design matrix.

Intuitively, this makes sense. In a thought experiment, estimate a regression. Then, introduce a covariate that's collinear with the other covariates. If there's no structure used in the model to keep track of which are the "original" effects and which are the newcomers (e.g. a partial regression structure), then the effect estimates must change, since each is conditional on the other/they're made jointly over the covariates.
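The thought experiment is easy to run. The sketch below uses plain least squares on simulated data rather than an actual spatial error model, so treat the newcomer covariate as a stand-in for the collinear part of a spatially dependent error term; all the variable names are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

x1 = rng.normal(size=n)
x_collinear = 0.8 * x1 + 0.6 * rng.normal(size=n)  # correlated with x1
x_orthogonal = rng.normal(size=n)                   # unrelated to x1
y = x1 + x_collinear + x_orthogonal + rng.normal(size=n)


def ols(y, *columns):
    """Least-squares coefficients (intercept first) for y on the given columns."""
    design = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Coefficient on x1 under three specifications.
b_alone = ols(y, x1)[1]
b_with_orthogonal = ols(y, x1, x_orthogonal)[1]
b_with_collinear = ols(y, x1, x_collinear)[1]

print(f"x1 alone:               {b_alone:.2f}")
print(f"x1 + orthogonal term:   {b_with_orthogonal:.2f}")
print(f"x1 + collinear term:    {b_with_collinear:.2f}")
```

Adding the orthogonal term barely moves the estimate for `x1`, while adding the collinear one shifts it substantially, since the two now have to share the variation they have in common.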

#statistics #geography #data science #spatial models

This brexit analysis for the GISRUK competition's making some headway!

That is, of course, after I decided to focus on data other than the data in the competition.

this should probably be a tweet

Modern "nerd" media (rick & morty, bojack, "deep" analyses of pop media like the insufferable "man drones over film clip" channels on youtube or "Scar No One Else Can See" by "Fairy Lives Matter" movie writer Max Landis) & the politically-opposed alt-right adjacent podcast sphere (J. Peterson, Sam Harris, etc) have really demonstrated that the first stanza of Hot Fries is way too real:

talking fast with a dense amount of facile textual references really makes people think you're deep.

A brief overview of one way you might extract the new "Amenity Zones" from Google Maps. I was intrigued by their presence, and wanted to see if there was a way to pull them out of the API for a giv...

I've been really interested in these amenity zones Google added to their mapping service ever since I read this blogpost by Justin O'Beirne detailing their rollout.

I'm not so interested in the comparisons with Apple, but down in the Data Alchemy Section, O'Beirne talks about how these pink zones might just correspond to people's conceptual notions of where commerce corridors (which I've now learned are called "high streets" over here in the UK) are.

I'm willing to bet urban accessibility and desirability is often a function of these "ritz" zones. I know that, when I lived in San Francisco (before these were rolled out), the commercial zone along Cortland down in Bernal Heights was a spot my girlfriend and I frequented.

Jumping forward to when I moved to Brooklyn, I found these pink zones invaluable in judging where I might want to live close to, without ever really targeting any specific amenities within them.

So, I'm curious for a whole lotta reasons. Do they correspond to streetscores? Do they correspond to price-clusters in Airbnb data? Are they really accessible (both temporally and monetarily) to the people who live around them? Is there a way to distinguish them from commute-zones, like the pink clusters around every major intersection in Tempe, AZ?

This kind of algorithmic designation of commerce zones is super interesting to me, and I'd love to keep integrating it into my scholarship.

#geography #urban studies #urban data science #amenity zones

Never thought I'd be complaining about only having 128 GB of memory (half swap).

Not sure where this happens, since I'm just using sparse matrices here. I guess they might get densified somewhere below me & then that overhead allocation doesn't get returned to other processes, it's just claimed by the jupyter kernel.

Which.... means that my two kernels compete for space.
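For a sense of scale, here's a rough sketch of why accidental densification hurts so much. The matrix is hypothetical (random entries, arbitrary size), and the dense footprint is computed rather than allocated.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# A hypothetical 5,000 x 5,000 matrix with roughly 25,000 nonzero entries.
n, nnz = 5_000, 25_000
rows = rng.integers(0, n, size=nnz)
cols = rng.integers(0, n, size=nnz)
vals = rng.normal(size=nnz)
A = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))

# CSR stores only the nonzero values plus two index arrays.
sparse_bytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes

# What the same matrix would cost as a dense array (computed, not allocated).
dense_bytes = n * n * A.dtype.itemsize

print(f"sparse: {sparse_bytes / 1e6:.2f} MB")
print(f"dense:  {dense_bytes / 1e6:.2f} MB")
print(f"ratio:  {dense_bytes / sparse_bytes:.0f}x")

# Densifying even a slice hands back an ordinary NumPy array.
block = A[:10].toarray()
print(type(block).__name__, block.shape)
```

If some routine further down the stack calls something like `.toarray()` on the full matrix, the footprint jumps by that ratio in one go.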

It's neat that this new algorithm is working correctly. Here, a few cluster solutions for spatially-contiguous clustering are shown in Brooklyn using some price data for some Airbnbs. I'm pretty excited about this new technique, so hopefully I'll get it all prepped soon.

I find that the hardest part for me to be motivated about is the stage right after I finish the proof of concept. It's like... once the new stuff is done, I get much less interested than when I'm still trying to figure it out.

I guess that's probably pretty common among researchers, but the thrill of wrestling with a genuinely new problem is interesting, enjoyable, and difficult. Sometimes, it feels like documenting that wrestling is a bit tedious. But, if you don't document it, it essentially never occurred...

#ml #stats #clustering #brooklyn #neighbourhoods #neighborhoods #airbnb #price #economics #geography #gis

I gave a talk at the CARTO spatial data science conference back in December, and it was pretty neat. Check it out!

#data science #brooklyn #NYC #geography #GIS #statistics #neighborhood #neighbourhood

I really liked this paper. It's great when someone so clearly and thoroughly writes about the pitfalls of believing a better procedure will cure inherently theoretical issues with analyses or study design.

A similar argument appears in King's "How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About It" from 2014's Political Analysis, and I feel it matters in the same way here: Geographical Analysis & Spatial Statistics cannot simply be about fixing specifications to account for correlated error.

What's your theory for why the misspecification exists? Why is the error correlated? Is there a way you can directly account for that fact (e.g. competing destinations from interaction modelling) instead of applying a theory-free treatment for it?

I'm not sure if spatial confounding & corrections are necessarily as fundamental as the principles behind RCT, but I'm sure that, as statistics struggles to define itself alongside (or inside of) data science, spatial stats will have to carve a similar niche.

#papers-2018

Hacking out the "Sociological [Statistical] Gobbledegook" in Partisan Gerrymandering

Here’s the first of a few blogs I hope to write at the end of my dissertation synthesizing (in as plain language as possible) the work I’ve done and some insight for the layperson on what to think and how to trust these partisan gerrymandering statistics.

I hope to keep them involving statistics & computation, so that you, too, can explore this stuff if you program.

If you don’t (or aren’t too comfortable with math), I still aim to keep the main text accessible. So, skip the equations & distributions if they’re too much :)

#partisan gerrymandering #geography #gis #python #politics #gerrymandering #advantage #partisan bias

I forgot how killer the builtin matplotlib backends are. I've been (foolishly) writing each image to file and viewing using a cli viewer.

Instead, why not just use their frontends and plt.ion()? Unix + tiling WM is truly a system-wide IDE, but I never learned the "old" tooling for working with matplotlib at the cli.

The first time I learned matplotlib I learned it in a notebook. Now that I'm wedded to tiling & strongly prefer the terminal ipython to the notebook, I've gotta learn all the older tooling for matplotlib.
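For anyone in the same boat, a minimal sketch of that workflow from a terminal IPython session. The data is made up, and which window pops up depends on whichever GUI backend matplotlib picks on your system.

```python
import matplotlib.pyplot as plt

plt.ion()  # interactive mode: figures update without a blocking plt.show()

fig, ax = plt.subplots()
(line,) = ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.set_title("first draft")

# Later, from the same prompt, tweak the figure in place.
line.set_ydata([0, 1, 8, 27])
ax.relim()            # recompute data limits from the updated line
ax.autoscale_view()   # rescale the axes to fit the new limits
ax.set_title("updated in place")
fig.canvas.draw_idle()
```

With a tiling WM, the figure window just sits next to the terminal and redraws as you go, with no round trip through image files.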
