Triple Tags: Introduction
Triple tags (another popular name is machine tags) is a way to create very flexible taxonomy of everything without introduction of many entities.
Triple tags mechanic was created by Flickr to classify user’s photos in a different ways and to store this classification as simple as possible.
Minute of Theory
I general, triple tag contains three pieces of data:
namespace
. The way to group tags. For example, we want to store geo-location data in tags. We can use “geo”, “location” or something like this as namespace.name
. Name of tag, obviously. As for our example with geo-location, there will be two names: “latitude” and “longitude”.value
. Optional, value of tag. For geo-location, there will be actual coordinates.
In some cases, we don’t need a value
at all. “license:MIT” — is absolutely valid triple tag and this string contains enough information to understand that tagged object is licensed with MIT license.
Textual represenation of triple tag looks like this: namespace:name=value
. Of course, you can not use symbols :
and =
in your names or values.
Real World Example
We’re building software for parents to help them raise their kids here in Wachanga. The most important entity of app is Task
. User can complete a task by creating a post about it. There are lots of ways to classify the whole bunch of tasks we have:
- Task can belong to several groups (groups have different meaning, but this meaning is hidden deeply in code)
- Task has category
- Task has conditions, who can complete this task: for example, we have tasks only for girls of age 3 months or elder.
Migration of all this stuff to triple tags looks pretty attractive: it can help us to throw away tons of code and simplify many assertions. BTW, we’re only thinking of it now :)
Conditions
Let’s assume we have a task, which should be shown only at autumn for girls elder than 3 months but younger than 1 year.
In terms of triple tags we can encode this condition with something like this: condition:gender=girl
, condition:age_min:3
, condition:age_max:12
, condition:season=autumn
.
How to store the data?
First of all, we need the table for possible tags:
CREATE TABLE `triple_tags` (
id INT(10) NOT NULL AUTO_INCREMENT,
namespace VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
)
Actually, I’d prefer to add some additional columns to store restrictions for value
part: is value required, value type, etc…
The next step, we need to create a table with values:
CREATE TABLE `tasks_tags` (
id INT(10) NOT NULL AUTO_INCREMENT,
task_id INT(10) NOT NULL,
tag_id INT(10) NOT NULL,
value VARCHAR(255) NULL
)
That’s it! Now we will consider some examples how to retrieve the data from these tables.
Retrieve the whole list of tags with values
SELECT if(v.task_id, 1, 0)as is_set, t.namespace, t.tagname, v.value
FROM triple_tags t
LEFT JOIN tags_values v ON v.tag_id=t.id AND v.task_id=$task_id
This query will return something like this:
is_set | namespace | name | value |
---|---|---|---|
0 | condition | season | null |
1 | condition | gender | girl |
Retrieve the list of tasks for set of conditions
This task is a bit more complicated than it looks: if tag condition:gender
is not set for task, it means that child’s gender does not matter and all tasks whould be retrieved.
First step to get this result is to multiply tasks, tags and values into one table:
SELECT t.name, triple_tags.namespace, triple_tags.tagname, IF(v.task_id, 1, 0) as is_set, v.value
FROM tasks t
CROSS JOIN triple_tags
LEFT JOIN tags_values v ON t.id=v.task_id AND triple_tags.id=v.tag_id
Yep, cross join
is simple multiplication of tables without conditions. Ok, now we need to get only tasks that fit our our condition set: gender is “girl” or is not set.
SELECT distinct name
FROM (
SELECT t.name, triple_tags.namespace, triple_tags.tagname, IF(v.task_id, 1, 0) as is_set, v.value
FROM tasks t
CROSS JOIN triple_tags
LEFT JOIN tags_values v ON t.id=v.task_id AND triple_tags.id=v.tag_id
) as tasks_tag_values
WHERE namespace='condition'
AND tagname='gender'
AND (is_set=0 OR value='girl')
All we need to do now is a bit of profiling and adding some indexes.
Obvious optimization
We can reduce first multiplication by getting only necessary tags:
SELECT distinct name
FROM (
SELECT t.name, triple_tags.namespace, triple_tags.tagname, IF(v.task_id, 1, 0) as is_set, v.value
FROM tasks t
CROSS JOIN triple_tags
LEFT JOIN tags_values v ON t.id=v.task_id AND triple_tags.id=v.tag_id
WHERE namespace='condition' AND tagname='gender'
) as tasks_tag_values
WHERE is_set=0 OR value='girl'