App Engine: How to do an efficient OR query?
Let's say you want to find all of the current user's facebook friends who are using your app. You've stored the facebook id of each connected user on a User entity and the Facebook API returns to you a large list of facebook ids of the friends of the current user. To look them up in your datastore you basically need a giant OR or IN query on the User entity. Here's a traditional SQL example of how you might do this:
SELECT * FROM User WHERE facebook_id = '100213' OR facebook_id = '104413' OR facebook_id = '150273' ... etc.
The way the App Engine documentation says to query for multiple possible values is to use an IN query. Example:
employees = User.gql('WHERE facebook_id IN :1', ['100213', '104413', '150273'])
The problem with this is that it quickly becomes slow and maxes out at 30 values. The reason is that behind the scenes it's actually running an equals query for each possible value. This of course won't work with thousands of facebook ids you might need to look up.
One way to handle this efficiently in App Engine is to create a separate entity in your datastore for just facebook id lookups. You can call it something like FacebookIdIndex. You will need a FacebookIdIndex entity for each User that has an assigned facebook_id. Here's what the model might look like:
class FacebookIdIndex(db.Model) user = db.ReferenceProperty(User, required=True)
Then, when you create each FacebookIdIndex you set the key_name to the facebook id and set the reference property of the corresponding app User entity. Like so:
facebook_index = FacebookIdIndex(key_name=str(facebook_id), user=user) facebook_index.put()
This creates an an entity with the key name of the facebook id and one property with a reference back to the User entity we care about. (NOTE: Make sure you convert the facebook_id to a string just in case you get it as an integer. Key names can not be integer values.)
Now, to look up all Users who are friends with the current user in your app, you can do a bulk key name lookup. This is extremely fast compared to separate queries and there is no limit on the number of key names you can lookup. Given a list of facebook ids, here's how it's done:
# Get all matching FacebookIdIndex's in one quick bulk key name lookup. indexes = FacebookIdIndex.get_by_key_name(facebook_ids) # Now get a list of all of the User entity keys that match the facebook ids. user_keys = [] for index in indexes: # Check if a match. The bulk get by key name will return None for non matches. if index: # Get the key of the User entity we want. user_key = FacebookIdIndex.user.get_value_for_datastore(index) # Check if entity exists just to make sure User entity still exists. if user_key: user_keys.append(user_key) # If any users matched, bulk get all of them. if user_keys: users = User.get(user_keys) # Found all friends of the current user using the app! return users
The get_value_for_datastore() call is used to check for non matches and also to prevent App Engine from making individual get calls for each User entity. We want to delay the get for each User entity for the bulk get() call, which is much faster than doing individual gets for each User entity.
I hope that all makes sense. We've now gone from multiple slow GQL queries to two quick bulk get calls.