MySQL Tip: Using Bitwise Operations for Multivalued Attributes

Recently, I was working on an Application when I decided to use bitwise operations on a multi-valued attribute. I haven’t done any performance benchmarks, but I would think this should improve the performance on making Queries.

Here’s an example: Suppose we want to store Users and their Interests. We should have the following 3 Tables:

  1. Users – Stores User Information
  2. Interests – Various Interests
  3. UserInterests – Stores the Relationship between Users and Interests

Relationship Tables in Database: How it is normally stored

Table 1: Users

Id Name Phone

Table 2: Interests

Id Name
1 Movies
2 Soccer
3 Swimming
4 Running

Table 3: UsersInterests

UserId InterestId

 

Making Queries:

Query 1: Find all users whose interests are either swimming, running, soccer (one or all of them)?

One would query them as:

SELECT * FROM
users WHERE
Id IN (
  SELECT UserId FROM
  UsersInterests WHERE 
  InterestId IN (2, 3, 4)
)

Query 2: Find all users whose interests are swimming AND running AND soccer (All of them)?

Ans: One could query them as:

SELECT * FROM USERS
WHERE ID IN
(
  SELECT USERID FROM UsersInterests
  WHERE InterestId IN  (2, 3, 4)
  GROUP BY UserId
  HAVING COUNT(UserId) = 3
)

Improvement: Storing Interests as Powers of 2

If we decide to store InterestId’s as Powers of 2, then we can easily apply bitwise AND, OR operations as each of the interests Ids would be mutually exclusive to each other.

So, we add another field in UsersTable (sort of Cached) called UserInterests as follows:

Table 1: Users

Id Name Phone UserInterests

Table 2: Interests [Id’s stored as Powers of 2], also I added another column in it’s bit representation

Id Name Bit Representation [not added in DB]
1 Movies 00001
2 Travelling 00010
4 Soccer 00100
8 Swimming 01000
16 Running 10000

Thus, for the following interest combinations, UsersInterests in Users Table could be stored as:

5  // = Movies and Soccer = Bitwise (1 | 4)
13 // = Movies and Swimming and  Soccer  = Bitwise (1 | 4 | 8)

Making the same Queries now:

Now that we have UserInterests already pre-cached/stored in Users Table, we can do the same queries easily

Query 1: Find all users whose interests are either swimming, running, soccer (one or all of them)?

Ans: One could query them as:

SELECT * FROM users 
WHERE (UserInterests & 28) != 0  // 28 = (4 | 8 | 16)

Query 2: Find all users whose interests are swimming AND running AND soccer (All of them)?

Ans: One could query them as:

SELECT * FROM USERS
WHERE UserInterests = 28

No more nested inner queries. It all happens in one shot!

Conclusions & Limitations

This approach has the major limitation of being able to store a maximum of 32 types of multi-valued attributes or 64 (if bigint) is used.

I still need to do benchmarking to see how much of a performance improvements it offers.