Tim
Life is good
Joined: Nov 2007
Location: Kalamazoo
Changing the tag cloud
Maximizing it's usage, by displaying less?
It dawned on me the other day that the tag cloud is in serious need of being updated. As more and more content is being added, more and more tags are being used, so the tag cloud is only displaying a small subset of the tags used on the site. So here's my plan to change it.
Currently there are two steps to generate the data in the database. Issues will be counted and discussed later on in the post. Step one is a database View (problem 1) that selects the top 60 tags (problem 2) and orders them by "Weight" (problem 3). Wow, 3 problems and I'm not even half done!
Here's the view:
A stored procedure is then called and executes the following code to use that view (problem 3-4):
SELECT *
FROM vwTopTags
ORDER BY Tag
The primary problem here is that I'm doing a SELECT *. This is by far the worst problem. Though it's also how I Handel the problem in the business layer that help with this. But I'm not going to go into that here. First let me explain some of the issues. First of all using a view is all well and good, but it can be simplified to a single stored procedure. Selecting the TOP 60 is not a huge deal but should have been done in the stored procedure.
The main problem using this method is the Weight column. Here I'm actually incrementing a counter every time a tag is used. The counter is then decremented every time a file is deleted if it's used that tag. If I ever get around to deleting tags the same would apply. What I should be doing is using something like this:
Additional problem 3 and 4 are that I'm using two different items to do the job of one and that I"m doing the SELECT *.
So here's how I'm going to fix the problems. The fix really is as simple as combining the two items together and getting rid of the weight column.
SELECT COUNT(1) [Weight], Tags.TagId, Tag, TagDate
FROM Tags
INNER JOIN UsedTags ON UsedTags.TagId = Tags.TagId
WHERE TagType = 0
OR TagType IS NULL
GROUP BY Tags.TagId, Tag, TagDate
By doing it this way I'm not really doing to much to improve the overall performance of the tag cloud. But I am saving a lot of process time by not having to increase and decrease that counted every time there is a new tag used. But I think the single stored procedure with a join and count operation is going to be cleaner and ultimately less overhead than multiple updates to a large table.
Now on to the next issue; the ever increasing number of tags and the lack of rotation in the tag cloud. Based on the current cloud there are a lot of heavy hitters that will be big and gold forever. That's why I'm thinking to only count the tags that were used in the last X number of days.
SELECT COUNT(1) [Weight], Tags.TagId, Tag, TagDate
FROM Tags
INNER JOIN UsedTags ON UsedTags.TagId = Tags.TagId
WHERE DATEDIFF(DAY, UsedDate, GETDATE()) <= 72
AND (TagType = 0 OR TagType IS NULL)
GROUP BY Tags.TagId, Tag, TagDate
Of course X would be definable by me, but set at something like 180 days. So if the tags not been used in that long it may just drop from the list.
Now it's time for some bench marking to see what one takes less time.