seen from Türkiye
seen from Malaysia

seen from Italy
seen from China

seen from Italy
seen from China

seen from United States

seen from Türkiye
seen from Türkiye

seen from China

seen from Italy
seen from Russia

seen from France

seen from United States
seen from United States

seen from Türkiye
seen from Czechia
seen from Malaysia
seen from China
seen from United States

Anya is live and ready to show you everything. Watch her strip, dance, and perform exclusive shows just for you. Interact in real-time and make your fantasies come true.
Free to watch • No registration required • HD streaming
Order tickets via Eventbrite: http://phillyyouthmedia-efbevent.eventbrite.com/ On November 9th, PYMC is hosting the Mashed Media Conference. This day-long event will be filled with dynamic speakers, performers, discussion sessions, networking opportunities, media/art making activities and information for both youth producers, adult educators, community leaders, entrepreneurs, non profit organizations and anyone interested in how media arts education can be used to strength youth voices and leadership skills in Philadelphia. Interested in presenting? Read the full call for submissions here: http://tinyurl.com/mashedmediasubmission Bring your innovative ideas, experience, and energy; together, we'll make this an amazing event! Since 2011 Philadelphia Youth Media Collaborative (PYMC) has been convening organizations and youth media makers, facilitating new partnerships and expanding opportunities for educators and youth working in the areas of education, media arts and youth development. Recognized nationally for its organizing work in the field, the Mashed Media Conference will showcase both local and regional examples of innovative and inspiring youth media work. For more information about PYMC, please visit the website:www.phillyyouthmedia.org
Exponential random graph models in python
I have had some recent experience estimating exponential random graph models (ERGMs) in python using PyMC, and I thought it could be useful to put some notes about it up here.
For readers unfamiliar with ERGM, it is a modeling framework for network or graph data. An unfortunate fact of statistical inference on networks is that the independence assumptions of ordinary least squares are violated in deep, complex and interconnected ways (one of the core insights of social network analysis is that whether or not I am friends with you is tightly related to whether or not I am friends with your friends). ERGMs attempt to bring a linear modeling approach to network estimation by assuming that these inherent interdependencies depend mostly on network characteristics that a savvy modeler can explicitly specify. If you have reason to believe that the major network dependencies in your data can be controlled for by reciprocity and transitivity, for instance, you can simply include those terms in your model specification and hope (read: assume) that the rest of your errors are independent. While they are not perfect, ERGMs have become a widely accepted method among social network analysts.
Most often, ERGMs are estimated using the ergm package for the R statistical environment. This is a wonderful package put together by many of the top researchers in the field of network inference. Recently I needed to estimate a model that allowed more low-level control of the modelling than this package allowed however, so I turned to PyMC to see if I could implement ERGM estimation myself. PyMC is an invaluable python package that makes Markov-chain Monte-Carlo (MCMC) estimation straightforward and, importantly, very fast to implement. Getting estimates for pretty much any model in PyMC takes barely any more work than just specifying that model, and it’s well designed enough that writing your own step methods (and even sampling algorithms) in python is a snap.
Gagnon & MacRae prison friendship network
As a sanity check on my work, I decided to estimate the same model on the same data with ergm in R and then with PyMC in python. The data I used is the prison friendship network from Gagnon & MacRae1, which I got from the UCINET IV datasets page. I was only interested in directed friendship ties between inmates, so I discarded any information about the prisoners themselves; you can download my cleaned up adjacency matrix.
The Model
I specify a pretty simple model for the network. I use \(Y\) to refer to the adjacency matrix of the network and \(y_{i,j}\) to refer to the \((i,j)\mathrm{th}\) element of \(Y\). The value of \(y_{i,j}\) is one if prisoner \(i\) listed prisoner \(j\) as a friend, and zero otherwise. The ERGM is specified as
\[\mathrm{Pr}(Y|\theta)\propto\exp\left[\theta'\Psi(Y)\right]\]
where \(\theta\) is our vector of coefficients, and \(\Psi(Y)\) is a vector of network statistics on \(Y\). With ERGMs, \(\Psi\) is where the modeler includes all of the statistics she deems relevant to the network ties. I include a few straightforward statistics:
mutual tie count:
The number of pairs in the network for which both of the directed edges \((i,j)\) and \((j,i)\) exist. (How many pairs of prisoners mutually nominated each other as friends)
density:
The total number of edges present in the graph. (How many total friendship nominations were made)
2-instar count:
The total number of nodes with exactly two incoming edges. (How many prisoners were listed as friends by exactly two other prisoners)
3-instar count:
The total number of nodes with exactly three incoming edges. (How many prisoners were listed as friends by exactly three other prisoners)
2-outstar count:
The total number of nodes with exactly two outgoing edges. (How many prisoners listed exactly two other prisoners as friends)
Estimating in R
Estimating this model in R using ergm is dead simple:
library(ergm) # load the data prisonAdj <- matrix(scan('prison.dat'),nrow=67,byrow=T) # create a 'network' object prisonNet <- network(prisonAdj) # estimate the model prisonEst <- ergm(prisonNet ~ mutual+istar(1:3)+ostar(2))
In the final line there, mutual is the mutual edge count, ostar(2) indicates the 2-outstar count, and istar(1:3) tells ergm to include a term for 1-, 2-, and 3-instars. The number of 1-instars is the same thing as the density.
Estimating in python
Estimating the model in python will require a bit more work. A standard result from ERGMs is that the model can be reduced to a simple logistic regression with a case for each edge in the network. In this case, each covariate represents the change in a particular network statistic if the edge in question were present versus not present. Formally, let \(Y_{i,j}^+\) be the network obtained by setting \(y_{i,j}=1\) in \(Y\), and let \(Y_{i,j}^-\) be the network obtained by setting \(y_{i,j}=0\). For a particular network statistic \(\Phi_k(Y)\) the corresponding change statistic for edge \((i,j)\) is defined as \(\phi_k(Y,i,j)=\Phi_k(Y_{i,j}^+)-\Phi_k(Y_{i,j}^-)\).
So for edge \((i,j)\) the change in the mutual edge count, \(\phi_M(Y,i,j)\) would depend on the value of \(y_{j,i}\). If \(y_{j,i}=0\), then holding the rest of the network fixed edge \((i,j)\) would be unable to change the total mutual edge count, so \(\phi(Y,i,j)=0\). If \(y_{j,i}=1\), then \(y_{i,j}\) would determine whether the edge was mutual and \(\phi(Y,i,j)=1\).
The first step in the python implementation, then, is to construct these \(\phi\) covariates for each of our network statistics.
import scipy as sp from scipy.misc import comb from itertools import product from pymc import Normal, Bernoulli, InvLogit, MCMC,deterministic # functions to get delta matrices def mutualDelta(am): return(am.copy().transpose()) def istarDelta(am,k): if k == 1: # if k == 1 then this is just density res = sp.ones(am.shape) return(res) res = sp.zeros(am.shape,dtype=int) n = am.shape[0] for i,j in product(xrange(n),xrange(n)): if i!=j: nin = am[:,j].sum()-am[i,j] res[i,j] = comb(nin,k-1,exact=True) return(res) def ostarDelta(am,k): if k == 1: # if k == 1 then this is just density res = sp.ones(am.shape) return(res) res = sp.zeros(am.shape,dtype=int) n = am.shape[0] for i,j in product(xrange(n),xrange(n)): if i!=j: nin = am[i,:].sum()-am[i,j] res[i,j] = comb(nin,k-1,exact=True) return(res)
The mutualDelta function takes an input adjacency matrix and returns an indicator of whether the reversed tie is present in the network. This is simply the transpose of the original adjacency matrix. The istarDelta and ostarDelta functions are a little more complicated, but in essence return a matrix that how many \(k\)-instars or -outstars depend on each edge in the network.
From here it is a simple matter to define the model in PyMC.
def makeModel(adjMat): # define and name the deltas termDeltas = { 'deltamutual':mutualDelta(adjMat), 'deltaistar1':istarDelta(adjMat,1), 'deltaistar2':istarDelta(adjMat,2), 'deltaistar3':istarDelta(adjMat,3), 'deltaostar2':ostarDelta(adjMat,2) } # create term list with coefficients termList = [] coefs = {} for dName,d in termDeltas.items(): tName = 'theta'+dName[5:] coefs[tName] = Normal(tName,0,0.001,value=sp.rand()-0.5) termList.append(d*coefs[tName]) # get individual edge probabilities @deterministic(trace=False,plot=False) def probs(termList=termList): probs = 1./(1+sp.exp(-1*sum(termList))) probs[sp.diag_indices_from(probs)]= 0 return(probs) # define the outcome as outcome = Bernoulli('outcome',probs,value=adjMat,observed=True) return(locals())
(This is a little bit kludgey with the renaming of the coefficient keys in the coefs dictionary). The definition of probs contains one important step that defines the probability along the diagonal to be zero, because inmates were not allowed to list themselves.
Finally, we load the data and estimate the model:
if __name__ == '__main__': # load the prison data with open('prison.dat','r') as f: rowList = list() for l in f: rowList.append([int(x) for x in l.strip().split(' ')]) adjMat = sp.array(rowList) # make the model as an MCMC object m = makeModel(adjMat) mc = MCMC(m) # estimate mc.sample(30000,1000,50)
From there PyMC handles sampling and collecting traces—remarkably easy to get use.
All the python code is combined in this python script.
Comparison
We can examine the estimates made by ergm and my PyMC approach to verify that the python version is working. Although the two methods are not formally comparable—ergm approximates the maximum-likelihood estimates while PyMC generates samples from a posterior distribution—they should yield relatively similar point estimates and ranges.
Estimates and 95-percent intervals from ergm (top, orange) and PyMC (bottom, red).
The above figure shows the estimates and the comparable 95-percent intervals for the two methods. For each model term the ergm estimate is on top (with an orange dot) and the PyMC result is on the bottom (with a red dot). In the case of ergm the central dot is the maximum-likelihood estimate of the coefficient and the interval is derived from 1.96 times the estimated standard error. For PyMC the bars indicate the 2.5, 50 and 97.5 percentiles of the MCMC sample. As expected, the results are not identical. The point estimates and the ranges differ noticeably for each term. However both methods produce estimates of the same sign, approximate magnitude, and ‘confidence’. The table below lists the point estimates and standard error/deviation for the methods.
statistic ergm (standard error) PyMC (standard deviation) mutual ties 3.51 (0.34) 3.57 (0.19) density -4.09 (0.33) -4.50 (0.28) 2-instars 0.35 (0.16) 0.47 (0.13) 3-instars -0.06 (0.05) -0.10 (0.04) 2-outstars -0.13 (0.09) -0.06 (0.06)
Conclusion
From these tests we can say that PyMC should provide a straightforward alternative to ergm for estimating exponential graph models. The model I used here was deliberately simplistic and arbitrary—I only needed to verify my work. In this case it probably would have been easier to just stick with the ergm estimates and move on.
But there are situations where ergm won’t work quite so easily. If you need to define a specialized network statistic not included in the package, for instance (I believe there is a way to incorporate user-defined network statistics in ergm, though I’ve never tried). Or you might need to extend the model (as I recently did) to include multiple networks or time-dependence. In each of these cases the PyMC approach is a simple matter of defining the ‘delta’ matrices that indicate how individual edges affect the statistics of interest.
Has anybody else had experience using PyMC for network statistics like this? I’d love to hear how you went about it.
MacRae J. (1960). Direct factor analysis of sociometric data. Sociometry, 23, 360-371↩
My table at A Mashed Art Experience Event!
Inner Power Records wins "Most Philly" award at Philadelphia Youth Media Collaborative (PYMC) Media SLAM! for their video "Forgive and Forget"

Anya is live and ready to show you everything. Watch her strip, dance, and perform exclusive shows just for you. Interact in real-time and make your fantasies come true.
Free to watch • No registration required • HD streaming
Next youth committe meeting-FRIDAY, feb 1
looking forward to getting to work!
An iPython notebook containing a short PyMC tutorial on Gaussian Processes. by Chris Fonnesbeck
http://nbviewer.ipython.org/2352771