Using memcache to lower the load on the App Engine Datastore is a well-known technique. Here is an approach to versioned caching of datastore entities on App Engine. The Model is designed primarily for cases where an entity is updated very frequently. The idea for this kind of entity came up while working on a multiuser chat room for App Engine using the Channel API.
Expectations
The Model is designed with the following factors in mind:
- Very frequent read/write access to entities
- Effective usage of memcache to reduce load on datastore operations
- Strong consistency between datastore and memcache
Design Plan
The Entities have an internal version number. This version number is incremented every time the entity is updated.
    from google.appengine.ext import db

    # The maximum difference in revisions acceptable at any instant between
    # the memcached values and the datastore values. The higher it is, the
    # greater the catastrophe when memcache goes down, but the lower the
    # datastore usage. The lower it is, the more consistent your datastore
    # and memcache are, at the cost of more datastore operations. A value
    # of 4~6 is reasonable.
    FAULT_TOLERANCE = 4

    class GlobalVersionedCachingModel(db.Model):
        """
        The Model uses internal versioning of information with prime focus
        on very high read/writes and consistency. Every entity carries a
        datastore version number and a memcache version number. When the
        entity is updated, the update happens in memcache only and the
        memcached version number increases. If this number exceeds the
        datastore version number by a certain amount called the "fault
        tolerance", the datastore entity is sync'd with the memcache entity.
        """
        _db_version = db.IntegerProperty(default=0, required=True)
        _cache_version = db.IntegerProperty(default=0, required=True)
        _fault_tolerance = db.IntegerProperty(default=FAULT_TOLERANCE)
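The snippets in this post call serialize_entities and deserialize_entities without defining them. A minimal sketch, assuming the usual protobuf round-trip recipe for caching db models, would be:

    from google.appengine.datastore import entity_pb
    from google.appengine.ext import db

    def serialize_entities(models):
        # Encode a model (or list of models) to protobuf strings for memcache.
        if models is None:
            return None
        elif isinstance(models, db.Model):
            return db.model_to_protobuf(models).Encode()
        return [db.model_to_protobuf(m).Encode() for m in models]

    def deserialize_entities(data):
        # Inverse of serialize_entities: rebuild models from protobuf strings.
        if data is None:
            return None
        elif isinstance(data, str):
            return db.model_from_protobuf(entity_pb.EntityProto(data))
        return [db.model_from_protobuf(entity_pb.EntityProto(d)) for d in data]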
Approach #1
Initially, both versions are set to 0 when the entity is created. _fault_tolerance is the maximum allowed difference between the cache version and the datastore version. Since the entity is primarily read from and written to memcache, and only later persisted to the datastore, the datastore version number can lag behind the memcache version number. When the difference between the two reaches the fault tolerance, the datastore is updated with the memcached state, as the trace below shows.
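Concretely, with the default fault tolerance of 4 and a hypothetical ChatRoom subclass, the put() override shown under Methods behaves like this:

    room = ChatRoom(key_name='lobby')  # hypothetical subclass of the model
    room.put()  # _cache_version=1: the very first put is written through to the datastore
    room.put()  # _cache_version=2, _db_version=1: memcache only
    room.put()  # _cache_version=3: memcache only
    room.put()  # _cache_version=4: memcache only
    room.put()  # _cache_version=5, gap 5-1=4 >= fault tolerance: datastore sync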
The downside of this approach is that when memcache is flushed or the entry is evicted, the entity is fetched from the datastore. That entity, in the worst case, can lag behind the actual entity by up to _fault_tolerance versions. The fault tolerance can be decreased to improve consistency between the memcached entity and the datastore entity, but that results in more datastore read/write operations.
Approach #2
In this approach, instead of writing into the datastore inline, we enqueue a task on a task queue and let it perform the datastore write for us. The good part of this approach is that we can keep smaller values of _fault_tolerance and still keep put() fast, since the datastore write happens outside the request. This approach suits cases where stronger consistency is required between the two copies while latency must remain minimal.
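A minimal sketch of this variant, assuming the deferred library (google.appengine.ext.deferred) is enabled; the names flush_entity_to_datastore and TaskQueueVersionedCachingModel are illustrative, not part of the original recipe:

    from google.appengine.api import memcache
    from google.appengine.ext import db, deferred

    def flush_entity_to_datastore(keyname):
        # Task body: persist the memcached state of the entity, if still present.
        entity = deserialize_entities(memcache.get(keyname))
        if entity is not None:
            entity._db_version = entity._cache_version
            db.Model.put(entity)  # bypass the caching put() override

    class TaskQueueVersionedCachingModel(GlobalVersionedCachingModel):
        def put(self):
            self._cache_version += 1
            memcache.set(self.keyname, serialize_entities(self))
            if self._cache_version - self._db_version >= self._fault_tolerance or \
                    self._cache_version == 1:
                # Hand the datastore write to a background task instead of
                # doing it inline in the request.
                deferred.defer(flush_entity_to_datastore, self.keyname)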
Methods
    import logging

    from google.appengine.api import datastore, memcache
    from google.appengine.ext import db

    def get2(keys, **kwargs):
        keys, multiple = datastore.NormalizeAndTypeCheckKeys(keys)
        getted_cache = memcache.get_multi(map(str, keys))
        ret = map(deserialize_entities, getted_cache.values())
        # Fetch from the datastore only the keys that missed in memcache.
        keys_to_fetch = [key for key in keys
                         if getted_cache.get(str(key)) is None]
        getted_db = db.get(keys_to_fetch)
        memcache_to_set = dict((k, v) for k, v in
                               zip(map(str, keys_to_fetch),
                                   map(serialize_entities, getted_db)))
        ret.extend(getted_db)
        # Backfill memcache so subsequent reads are cache hits.
        memcache.set_multi(memcache_to_set)
        if multiple:
            return ret
        return ret[0] if ret else None

    class GlobalVersionedCachingModel(db.Model):
        """
        The Model uses internal versioning of information with prime focus
        on very high read/writes and consistency. Every entity carries a
        datastore version number and a memcache version number. When the
        entity is updated, the update happens in memcache only and the
        memcached version number increases. If this number exceeds the
        datastore version number by a certain amount called the "fault
        tolerance", the datastore entity is sync'd with the memcache entity.
        """
        _db_version = db.IntegerProperty(default=0, required=True)
        _cache_version = db.IntegerProperty(default=0, required=True)
        _fault_tolerance = db.IntegerProperty(default=FAULT_TOLERANCE)
        created = db.DateTimeProperty(auto_now_add=True)
        updated = db.DateTimeProperty(auto_now=True)

        @property
        def keyname(self):
            return str(self.key())

        def remove_from_cache(self, update_db=False):
            """
            Removes the cached instance of the entity. If update_db is True,
            then updates the datastore before removing from cache so that no
            data is lost.
            """
            if update_db:
                self.update_to_db()
            memcache.delete(self.keyname)

        def update_to_db(self):
            """
            Updates the current state of the entity from memcache to the
            datastore.
            """
            self._db_version = self._cache_version
            logging.info('About to write into db. Key: %s' % self.keyname)
            self.update_cache()
            return super(GlobalVersionedCachingModel, self).put()

        def update_cache(self):
            """ Updates the memcache for this entity. """
            memcache.set(self.keyname, serialize_entities(self))

        def put(self):
            self._cache_version += 1
            memcache.set(self.keyname, serialize_entities(self))
            # Sync to the datastore on the very first put and whenever the
            # version gap reaches the fault tolerance.
            if self._cache_version - self._db_version >= self._fault_tolerance or \
                    self._cache_version == 1:
                self.update_to_db()

        def delete(self):
            self.remove_from_cache()
            return super(GlobalVersionedCachingModel, self).delete()

        @classmethod
        def get_by_key_name(cls, key_names, parent=None, **kwargs):
            try:
                parent = db._coerce_to_key(parent)
            except db.BadKeyError, e:
                raise db.BadArgumentError(str(e))
            rpc = datastore.GetRpcFromKwargs(kwargs)
            key_names, multiple = datastore.NormalizeAndTypeCheck(key_names,
                                                                  basestring)
            logging.info(key_names)
            keys = [datastore.Key.from_path(cls.kind(), name, parent=parent)
                    for name in key_names]
            if multiple:
                return get2(keys)
            return get2(keys[0], rpc=rpc)
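A quick usage sketch tying the pieces together (ChatRoom is again a hypothetical subclass):

    class ChatRoom(GlobalVersionedCachingModel):
        topic = db.StringProperty()

    room = ChatRoom(key_name='lobby', topic='general')
    room.put()                                # first put syncs to the datastore
    same = ChatRoom.get_by_key_name('lobby')  # served from memcache on a hit
    room.remove_from_cache(update_db=True)    # flush pending versions before evicting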
Similar Readings
- Models Caching at the App Engine Cookbook
- This article at the Cookbook: http://appengine-cookbook.appspot.com/recipe/entity-caching-with-internal-version-for-high-readwrite-ops/