Triple Tags: Introduction

Triple tags (another popular name is machine tags) is a way to create very flexible taxonomy of everything without introduction of many entities.

Triple tags mechanic was created by Flickr to classify user’s photos in a different ways and to store this classification as simple as possible.

Minute of Theory

I general, triple tag contains three pieces of data:

  1. namespace. The way to group tags. For example, we want to store geo-location data in tags. We can use “geo”, “location” or something like this as namespace.
  2. name. Name of tag, obviously. As for our example with geo-location, there will be two names: “latitude” and “longitude”.
  3. value. Optional, value of tag. For geo-location, there will be actual coordinates.

In some cases, we don’t need a value at all. “license:MIT” — is absolutely valid triple tag and this string contains enough information to understand that tagged object is licensed with MIT license.

Textual represenation of triple tag looks like this: namespace:name=value. Of course, you can not use symbols : and = in your names or values.

Real World Example

We’re building software for parents to help them raise their kids here in Wachanga. The most important entity of app is Task. User can complete a task by creating a post about it. There are lots of ways to classify the whole bunch of tasks we have:

  1. Task can belong to several groups (groups have different meaning, but this meaning is hidden deeply in code)
  2. Task has category
  3. Task has conditions, who can complete this task: for example, we have tasks only for girls of age 3 months or elder.

Migration of all this stuff to triple tags looks pretty attractive: it can help us to throw away tons of code and simplify many assertions. BTW, we’re only thinking of it now :)

Conditions

Let’s assume we have a task, which should be shown only at autumn for girls elder than 3 months but younger than 1 year.

In terms of triple tags we can encode this condition with something like this: condition:gender=girl, condition:age_min:3, condition:age_max:12, condition:season=autumn.

How to store the data?

First of all, we need the table for possible tags:

CREATE TABLE `triple_tags` (
    id INT(10) NOT NULL AUTO_INCREMENT,
    namespace VARCHAR(255) NOT NULL,
    name VARCHAR(255) NOT NULL,
)

Actually, I’d prefer to add some additional columns to store restrictions for value part: is value required, value type, etc…

The next step, we need to create a table with values:

CREATE TABLE `tasks_tags` (
    id INT(10) NOT NULL AUTO_INCREMENT,
    task_id INT(10) NOT NULL,
    tag_id INT(10) NOT NULL,
    value VARCHAR(255) NULL 
)

That’s it! Now we will consider some examples how to retrieve the data from these tables.

Retrieve the whole list of tags with values

SELECT if(v.task_id, 1, 0)as is_set, t.namespace, t.tagname, v.value 
FROM triple_tags t 
LEFT JOIN tags_values v ON v.tag_id=t.id AND v.task_id=$task_id

This query will return something like this:

is_set namespace name value
0 condition season null
1 condition gender girl

Retrieve the list of tasks for set of conditions

This task is a bit more complicated than it looks: if tag condition:gender is not set for task, it means that child’s gender does not matter and all tasks whould be retrieved.

First step to get this result is to multiply tasks, tags and values into one table:

SELECT t.name, triple_tags.namespace, triple_tags.tagname, IF(v.task_id, 1, 0) as is_set, v.value 
FROM tasks t 
CROSS JOIN triple_tags 
LEFT JOIN tags_values v ON t.id=v.task_id AND triple_tags.id=v.tag_id

Yep, cross join is simple multiplication of tables without conditions. Ok, now we need to get only tasks that fit our our condition set: gender is “girl” or is not set.

SELECT distinct name 
FROM ( 
    SELECT t.name, triple_tags.namespace, triple_tags.tagname, IF(v.task_id, 1, 0) as is_set, v.value 
    FROM tasks t 
    CROSS JOIN triple_tags 
    LEFT JOIN tags_values v ON t.id=v.task_id AND triple_tags.id=v.tag_id 
) as tasks_tag_values 
WHERE namespace='condition' 
AND tagname='gender' 
AND (is_set=0 OR value='girl')

All we need to do now is a bit of profiling and adding some indexes.

Obvious optimization

We can reduce first multiplication by getting only necessary tags:

SELECT distinct name 
FROM ( 
    SELECT t.name, triple_tags.namespace, triple_tags.tagname, IF(v.task_id, 1, 0) as is_set, v.value 
    FROM tasks t 
    CROSS JOIN triple_tags 
    LEFT JOIN tags_values v ON t.id=v.task_id AND triple_tags.id=v.tag_id
    WHERE namespace='condition' AND tagname='gender'
) as tasks_tag_values 
WHERE is_set=0 OR value='girl'
Alex Panshin avatar
About Alex Panshin
Software Engineer from Russia. Interested in PHP, Scala, microservices and Big/Fast Data stuff.
comments powered by HyperComments