Recently, I was working on an Application when I decided to use bitwise operations on a multi-valued attribute. I haven’t done any performance benchmarks, but I would think this should improve the performance on making Queries.
Here’s an example: Suppose we want to store Users and their Interests. We should have the following 3 Tables:
- Users – Stores User Information
- Interests – Various Interests
- UserInterests – Stores the Relationship between Users and Interests
Relationship Tables in Database: How it is normally stored
Table 1: Users
Id | Name | Phone |
---|
Table 2: Interests
Id | Name |
---|---|
1 | Movies |
2 | Soccer |
3 | Swimming |
4 | Running |
Table 3: UsersInterests
UserId | InterestId |
---|
Making Queries:
Query 1: Find all users whose interests are either swimming, running, soccer (one or all of them)?
One would query them as:
SELECT * FROM users WHERE Id IN ( SELECT UserId FROM UsersInterests WHERE InterestId IN (2, 3, 4) )
Query 2: Find all users whose interests are swimming AND running AND soccer (All of them)?
Ans: One could query them as:
SELECT * FROM USERS WHERE ID IN ( SELECT USERID FROM UsersInterests WHERE InterestId IN (2, 3, 4) GROUP BY UserId HAVING COUNT(UserId) = 3 )
Improvement: Storing Interests as Powers of 2
If we decide to store InterestId’s as Powers of 2, then we can easily apply bitwise AND, OR operations as each of the interests Ids would be mutually exclusive to each other.
So, we add another field in UsersTable (sort of Cached) called UserInterests as follows:
Table 1: Users
Id | Name | Phone | UserInterests |
---|
Table 2: Interests [Id’s stored as Powers of 2], also I added another column in it’s bit representation
Id | Name | Bit Representation [not added in DB] |
---|---|---|
1 | Movies | 00001 |
2 | Travelling | 00010 |
4 | Soccer | 00100 |
8 | Swimming | 01000 |
16 | Running | 10000 |
Thus, for the following interest combinations, UsersInterests in Users Table could be stored as:
5 // = Movies and Soccer = Bitwise (1 | 4) 13 // = Movies and Swimming and Soccer = Bitwise (1 | 4 | 8)
Making the same Queries now:
Now that we have UserInterests already pre-cached/stored in Users Table, we can do the same queries easily
Query 1: Find all users whose interests are either swimming, running, soccer (one or all of them)?
Ans: One could query them as:
SELECT * FROM users WHERE (UserInterests & 28) != 0 // 28 = (4 | 8 | 16)
Query 2: Find all users whose interests are swimming AND running AND soccer (All of them)?
Ans: One could query them as:
SELECT * FROM USERS WHERE UserInterests = 28
No more nested inner queries. It all happens in one shot!
Conclusions & Limitations
This approach has the major limitation of being able to store a maximum of 32 types of multi-valued attributes or 64 (if bigint) is used. [p>
I still need to do benchmarking to see how much of a performance improvements it offers.