Thursday, September 5, 2013

Understanding the Sitecore DB structure

By now you should be familiar with how to work with Sitecore data via the various APIs, be it Sitecore query, fast query, or via Lucene. However, there are times that it helps to have some knowledge of what is actually going on in the underlying Database structure. Specifically, working via the API can be extremely slow, especially in dev environments, where every time you want to make a change to a query, it means restarting your Sitecore app, which can take some time.


In contrast, a DB query can be quickly changed and re-run. When figuring out how to retrieve data, or even more so when debugging an issue, working directly with the DB can be dramatically faster.


The Sitecore Database structure isn’t particularly complicated, so building your own queries isn’t too difficult.


Commonly used Sitecore DB tables


In master/web/core, you can find the following tables (among others) :
Items : Stores all basic Item data. Name, ID, TemplateID, parentID, created/updated timestamps.
History : Shows create/delete/saved history. Comparing this from master to web can help understand when items were published.
VersionedFields/UnversionedFields/SharedFields : Stores basic field data. ItemID, FieldID (which links to Items), Value. Note that the field name exists in the linked Item representing the field, while the value of the field is in these tables.
There’s also a view named “Fields” which is merely a union of these 3 tables.


So, there’s a few uses for this information, and most of the time it comes down to doing things faster directly with the DB, than they would happen via the Sitecore API.


Direct query vs Fast query


For example, you could find items of a specific template, with a specific value contained within the contents of a specific field (very helpful for finding an id in a pipe-delimited list of item ids in a Multilist field, for example) :


select VF.Value from Items I inner join Items T on I.TemplateID = T.ID inner join Fields F on F.ItemId = I.ID inner join Items F on F.FieldId = F.ID where I.templateID = @templateID and F.Name = @fieldName and F.Value like '%' + @fieldValue + '%'

I’ve found that if data isn’t cached, which it usually isn’t in a dev environment, querying the database directly like this is dramatically faster.


In a live environment, I’ve found that a direct query is dramatically faster than a “fast” query.


For example, if I wanted to do a fast query for the case detailed above, for all items under the home item, with template [[templatename]], with [[fieldvalue]] anywhere in a field called [[fieldname]], this is the query that Sitecore runs on your database :


exec sp_executesql N'SELECT DISTINCT [i].[ID] [ID], [i].[ParentID] [ParentID] FROM [Items] [i] LEFT OUTER JOIN (SELECT [Fields].* from [Fields] INNER JOIN [Items] ON [Fields].[FieldID] = [Items].[ID] AND lower([Items].[Name]) = ''[[fieldname]]'') [Fields2] ON [i].[ID] = [Fields2].[ItemId] INNER JOIN [Descendants] ON [i].[ID] = [Descendants].[Descendant] INNER JOIN (SELECT DISTINCT [i].[ID] [ID], [i].[ParentID] [ParentID] FROM [Items] [i] INNER JOIN (SELECT DISTINCT [i].[ID] [ID], [i].[ParentID] [ParentID] FROM [Items] [i] INNER JOIN (SELECT DISTINCT [i].[ID] [ID], [i].[ParentID] [ParentID] FROM [Items] [i] WHERE LOWER([i].[Name]) = ''sitecore'' AND [i].[ParentID] = @value1) [a] ON [i].[ParentID] = [a].[ID] WHERE LOWER([i].[Name]) = ''content'') [a] ON [i].[ParentID] = [a].[ID] WHERE LOWER([i].[Name]) = ''home'') [a] ON [Descendants].[Ancestor] = [a].[ID], (SELECT [ID] FROM [Items] WHERE [TemplateID] = @value3 AND LOWER([Name]) LIKE @value2) [Templates1] WHERE (([i].[TemplateID] LIKE [Templates1].[ID]) and (coalesce([Fields2].[Value], '''') LIKE @value4))',N'@value1 uniqueidentifier,@value2 nvarchar(11),@value3 uniqueidentifier,@value4 nvarchar(38)',@value1='00000000-0000-0000-0000-000000000000',@value2=N'[[templatename’]]',@value3='templateid',@value4=N'%[[fieldvalue]]%'


Just at first glance, you can tell that’s a bad idea. Well, what if I told you the “fast” query takes 18 seconds to run, but the direct query takes just 500ms?


But what about Lucene?


Also, bear in mind that wildcard searches in Lucene can be extremely expensive. So I’ve used this example deliberately. This sort of search, via Lucene, will cripple a content delivery server’s CPU.


Hopefully this post gives you all something to think about. Don’t blindly trust the Data Access Layer of any framework, even Sitecore. Always be sure of what your code is actually doing on a database level, and if necessary, override the behaviour so the results and performance are more in line with what you expect.

I’ve found a combination of Lucene, and direct DB query, to be the best way to balance out the load so that no single server gets overloaded and crashes.

Thursday, August 1, 2013

Sitecore - The deep end

I suppose I should start off introducing myself, and explaining how I got into Sitecore. My name is Bruce Thwaits, I’m 31 years old, married with a 1 year old daughter. I live in Cape Town, South Africa. I’ve been in dev for around 7 years, having worked for banks, investment houses, and even other media companies. I currently work for Primedia Broadcasting, a media company with 4 large radio stations, and a growing network of CMS-driven websites.


The first of those sites was a news site for Eyewitness News (EWN) - the project I have primarily been involved in since day one. The original site was built by a team of 3, including myself, but for the better part of the last year, I’ve been the sole dev on the project, so I’ve had to delve into the depths of Sitecore without much interference. I’ve learned some interesting lessons along the way, and I hope I can share some of them with others, to perhaps save some of you from some of the mistakes I’ve made along the way to where I am now.



Anyone that has ever worked in Sitecore knows that the learning curve can be pretty intense, especially if you are getting into it without co-workers who already have experience and can guide you as you learn.


In this post I'm going to take you through my general learning process from day one to now, and explain how I had to find most of the answers for myself, with little assistance. Even if you’re in a team with more experienced Sitecore developers, I still encourage you to follow these steps to figure your way around Sitecore, as I still think these are vital parts of the process I followed to where I am now, where I feel like I finally have a solid understanding of how Sitecore works.


Google
I can’t express enough how valuable this is to any developer, whether fresh out of school, or a 20 year veteran. Whatever problem you’re encountering, whatever complicated construct you’re trying to build, someone else has done it before. If you find the right words to enter into google, the answer will jump out at you. With a wealth of Sitecore blogs, sometimes you get extremely lucky, and there’s a step-by-step guide on how to do exactly what you need to do.



Poking through the existing Sitecore code
This is one that’s not a natural thing for most developers. You’re used to using 3rd party tools that come packaged in black box dlls that you are sure you should never mess with. Well, that’s not how Sitecore dev works. Most of the time the logic you want to change is deep inside a Sitecore dll, and rather than writing your own logic from scratch, it’s merely a change to one line of code retrieved from a decompiler. If you’re a Sitecore dev and you don’t have a decompiler like ILSpy, dotPeek, or Reflector on your pc, you’re going to struggle more than you need to.


In fact, the best way to look at it, you’re not writing a new app from scratch. Think of it like you’re extending the functionality of an existing solution. Then you’ll be more willing to reverse engineer the existing code, as if it was merely a dev working for the same company as you that wrote the existing code, rather than a mysterious team of devs from another company.





Don’t be scared to experiment
A lot of what you want to do in Sitecore isn’t obvious, and the final solution isn’t what you think it will be. Rather take a step back, break the problem down, and experiment with each piece of the puzzle until you have it all working. You’ll surprise yourself with the solutions you come up with. I recently solved a particularly tricky CMS UI problem by using some custom javascript embedded into a XAML control to call codebeside methods using parameters calculated client-side in the js. If you’re expecting for everything to be nicely packaged into codebehind c#, you might find that issues are harder than they need to be.