Anyone who has spent even a small amount of time on any form of website development knows that scalability is always a massive concern. Anything you can do to bring down CPU usage and I/O as your number of concurrent users increases is of great value.
A website running on the Sitecore CMS is no different. CPU load from work such as Lucene searches can quickly add up. Unless you make a concerted effort to keep this CPU load down, you soon hit a hard cap on how many concurrent users you can handle, and your site crashes.
If you have multiple content delivery servers serving your website behind some sort of load balancing strategy, you'll find that your frontends end up duplicating work. For example, if a new item is published, the Lucene indexes update, and this can change the result of a Lucene search that populates some part of your site. Each frontend then has to clear its own HTML cache and perform the Lucene search itself to obtain the new result.
In cases like this, it would be preferable to have only one of the content delivery servers do this work and place the result in a cache where the other servers can look for it before duplicating the work.
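The "look in the shared cache first, only do the work on a miss" pattern is usually called cache-aside. Below is a minimal sketch of it, assuming a hypothetical `ISharedCache` interface; `InMemorySharedCache` is a single-process stand-in so the example runs without a memcached server, whereas in production the interface would wrap a memcached client.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over a shared cache such as memcached.
public interface ISharedCache
{
    bool TryGet(string key, out string value);
    void Set(string key, string value);
}

// Single-process stand-in, used only so this sketch runs without a memcached server.
public sealed class InMemorySharedCache : ISharedCache
{
    private readonly Dictionary<string, string> _store = new Dictionary<string, string>();

    public bool TryGet(string key, out string value) => _store.TryGetValue(key, out value);

    public void Set(string key, string value) => _store[key] = value;
}

public static class CacheAside
{
    // Return the cached value if any frontend has already computed it;
    // otherwise compute it here and publish it for the other frontends.
    public static string GetOrCompute(ISharedCache cache, string key, Func<string> compute)
    {
        if (cache.TryGet(key, out string cached))
        {
            return cached;
        }

        string result = compute();
        cache.Set(key, result);
        return result;
    }
}
```

Two frontends calling `GetOrCompute` with the same key run the expensive search only once. If two servers miss at exactly the same moment, both will still do the work, so the pattern greatly reduces duplication rather than eliminating it entirely.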
The memcached shared cache framework works extremely well for this sort of sharing. The memcached service (https://code.google.com/p/memcached/) handles the caching of serialised data, and the Enyim Memcached library (https://github.com/enyim/EnyimMemcached) lets you insert into and retrieve from this cache in .Net code.
It's worth noting that the memcached server is happiest running in a Linux environment. The stable .Net implementations I've found so far have all been a few versions behind their Linux counterparts, and take some effort to get working. So while I have been able to get it running as a Windows service, it's easier (and perhaps even recommended) to run it on a cluster of Linux servers instead.
Memcached is essentially a very simple key-value store, similar to the ASP.Net cache, with a few small (but important) differences:
1. Memcached can only store data that can be serialised, either as binary or as text. Do not be fooled into thinking that any object can be serialised; Sitecore Item objects are a case in point.
2. Memcached is independent of ASP.Net and IIS, so it is not cleared when an app pool recycles or IIS is reset.
3. With a cluster of memcached servers shared between a number of application web servers, the cache is shared between those frontends, rather than each having its own isolated (and usually duplicated) cache. The RAM requirement is also split between the memcached servers, rather than being multiplied by the number of frontends. For example, with 5 Sitecore Content Delivery frontends, 5 memcached servers and 10GB of data to cache, each memcached server stores about 2GB (assuming keys are spread evenly), rather than every frontend holding the full 10GB. This is where memcached is dramatically superior to the built-in Sitecore caches and the ASP.Net cache.
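The RAM split in point 3 comes from the client side: memcached servers do not talk to each other, and the client hashes each key to pick exactly one server to hold it. The sketch below illustrates the idea with a 32-bit FNV-1a hash taken modulo the number of servers. The class and method names are hypothetical, and production clients typically use consistent hashing instead, so that adding or removing a server remaps only a fraction of the keys.

```csharp
using System;
using System.Text;

public static class KeyDistribution
{
    // 32-bit FNV-1a. A custom hash is needed because string.GetHashCode()
    // is randomised per process in .NET Core, so frontends would disagree.
    public static uint Fnv1a(string key)
    {
        uint hash = 2166136261;
        foreach (byte b in Encoding.UTF8.GetBytes(key))
        {
            hash ^= b;
            hash = unchecked(hash * 16777619u);
        }
        return hash;
    }

    // Every frontend computes the same server index for the same key,
    // so each cached value lives on exactly one memcached server.
    public static int ServerFor(string key, int serverCount) =>
        (int)(Fnv1a(key) % (uint)serverCount);

    // With an even spread, each server holds roughly total / serverCount.
    public static double PerServerGb(double totalGb, int serverCount) =>
        totalGb / serverCount;
}
```

For the example in point 3, `PerServerGb(10, 5)` gives 2GB per server, provided the keys spread evenly.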
The primary issue is the serialisation of data. The point here is to figure out exactly what you’re trying to cache. What exactly are you trying to save on? Do you really need the exact Sitecore Item object, or is the intention more to save on the CPU time required to do Lucene searches? Sitecore’s Item cache works extremely well to cache the underlying DB Item data, so generally DB I/O isn’t an issue. I’ve spent a great amount of time over the last year or so trying to solve the issue of Sitecore Item serialisation, and I eventually came to the conclusion that while it should be technically possible, the performance gain from it would not be enough to make the stability risk worthwhile.
So, you don't need to cache the Sitecore Item objects themselves. What you need instead is a representation of them, stored in text, that you can use to retrieve them later. Essentially, you need to know which Sitecore database each item came from, and its Item ID. The Sitecore Item URI contains both of these, so you just need to store the value of the Uri property of the Sitecore Item class. In fact, it is this value that is stored in Lucene indexes, and it is exactly how Sitecore retrieves items from the database, since Lucene indexes do not store the raw item data:
StoreRawObjectInCache(key, item.Uri);
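If you would rather not depend on how your memcached client serialises an ItemUri object, the same two pieces of information can be kept as a plain string. The sketch below is a hypothetical illustration, not Sitecore's API: `CachedItemRef` and its `database|{guid}` format are assumptions, language and version are left out for brevity, and in real code the values would come from `item.Database.Name` and `item.ID.Guid`.

```csharp
using System;

// Hypothetical, Sitecore-free stand-in for what an ItemUri identifies:
// the source database name and the item ID (language and version omitted).
public sealed class CachedItemRef
{
    public string DatabaseName { get; }
    public Guid ItemId { get; }

    public CachedItemRef(string databaseName, Guid itemId)
    {
        DatabaseName = databaseName;
        ItemId = itemId;
    }

    // Plain text is trivially serialisable, e.g. "web|{<guid>}".
    public string ToCacheString() => DatabaseName + "|" + ItemId.ToString("B");

    // Returns null instead of throwing if the cached text is malformed.
    public static CachedItemRef Parse(string text)
    {
        if (string.IsNullOrEmpty(text)) return null;
        string[] parts = text.Split('|');
        if (parts.Length != 2 || !Guid.TryParse(parts[1], out Guid id)) return null;
        return new CachedItemRef(parts[0], id);
    }
}
```

On the reading side, `Parse` gives back the database name and ID needed to look the item up again.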
To retrieve the item from this value on the other side, take the ItemUri object from memcached, get the database using its DatabaseName property, and then call GetItem, passing in the DataUri form of the ItemUri:
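// GetRawObjectFromCache is your own wrapper around the memcached client (not shown).
// "as" returns null instead of throwing if the key is missing or holds another type.
Sitecore.Data.ItemUri uri = GetRawObjectFromCache(key) as Sitecore.Data.ItemUri;
if (uri != null)
{
    // Resolve the database the item came from (e.g. "web").
    Sitecore.Data.Database db = Sitecore.Data.Database.GetDatabase(uri.DatabaseName);
    if (db != null)
    {
        // Item lives in the Sitecore.Data.Items namespace.
        // GetItem returns null if the item no longer exists.
        Sitecore.Data.Items.Item item = db.GetItem(uri.ToDataUri());
        if (item != null)
        {
            return item;
        }
    }
}
return null;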
You can also use this for your Lucene hits, which are likewise mostly just representations of Sitecore Items.
You can also extend this to store collections of item URIs, which is how you can cache the results of Lucene queries.
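Caching a query's results follows the same idea: store the list of item references under a key derived from the query, and resolve each reference again on the way out. The sketch below makes several assumptions: the `SearchResultCache` name, the newline-delimited format and the resolver delegate are all hypothetical, and a Dictionary stands in for memcached. The query text is hashed because memcached keys are limited to 250 characters and may not contain spaces or control characters, which raw Lucene queries often do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public sealed class SearchResultCache
{
    // Stand-in for memcached; values are plain strings, so they serialise trivially.
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

    // Hash the query so the key is short and free of spaces, as memcached requires.
    public static string KeyFor(string query)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(query));
            return "lucene:" + BitConverter.ToString(digest).Replace("-", "");
        }
    }

    // Run the search only on a cache miss and store the hit references one per line;
    // resolve each reference on the way out, dropping any that no longer exist.
    public List<T> GetResults<T>(string query, Func<string, IEnumerable<string>> search,
                                 Func<string, T> resolve) where T : class
    {
        string key = KeyFor(query);
        if (!_cache.TryGetValue(key, out string stored))
        {
            stored = string.Join("\n", search(query));
            _cache[key] = stored;
        }

        return stored.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
                     .Select(resolve)
                     .Where(item => item != null)
                     .ToList();
    }
}
```

Resolving on read means an item removed since the search ran simply drops out of the list; clearing the cached list after a publish is still up to you.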
Once we set this up in our Content Delivery environment, we immediately noticed a drop in CPU usage across all CD servers, as well as a drop in RAM usage. This allowed us to be a lot more aggressive about what we let our code store in memcached (thanks to all the extra available RAM), which in turn led to further performance gains.