Why Bit Masks?

Publishing is one of those industries where the possibilities are endless. This translates into 'endless combinations' as well. To sell a book you need 4 pieces of information. Those are:

  1. Customer

  2. Book

    1. Format
  3. Sales Rep

Now in the fictional PrintItNow publishing company here are some base stats

  • Number of Customers: 20,000

  • Number of Titles: 100,000

    • Number of Formats: 5
  • Number of Reps: 1,000

The rules are as follows:

  • A rep can be assigned the following:

    • 1 or all customers

    • 1 or all titles

    • 1 or all formats

  • Two reps cannot sell the same book+format to the same customer. That means I can sell the hardcover and you could sell the paperback to the same client.

  • Reps can be realigned at any time across all formats, books, and customers. This realignment must be retroactively applied to all previous sales. -- GOTCHA!

To put that into perspective if I am allowed to sell all of the titles, but hardcover only, to all customers I have a 'territory' of 20,000*100,000=2 billion entries in my sales table. How often do reps get realigned? The two most common times are when a new rep gets hired and takes over a portion of territory from 1 or more other reps or if a rep leaves and that territory needs to be spread out over other reps. No matter what the case all the sales entries (which could be billions) need to be changed. Adding to that is every day you sell more books, so the universe of sales grows, and you publish new books, so the territory grows. I had tried to tackle this problem a few years ago with the use of bit masks, but the functions were not up to the task. With SQL Server 2022, they released two new functions; SET_BIT and GET_BIT.

Now that we have the background of our publishing company, let's get into what a bit mask is.

First, we need a table to manage the reps themselves.

Create table SalesRep
(RepName varchar(256) not null,
RepId int not null)

This gives me plenty of space for reps (just under 2 million for a 4-byte int)

Now you need to have a table that would manage the sales territory

Create table BookRep
(RepId int not null,
Book bigint not null)

The line is concise (8 bytes), but you could have tens of millions or more based upon the combination of what book a rep can sell. For this example, we have every rep able to sell every book and divide it up by customers. In that case it would be 1000*100000=100 million and it will grow for every rep you hire and every book you publish. Here's a better way.

Bit Masks

A bit mask works much like the lockers we had in school. Imagine a set of lockers numbered 0-7. Now let's take students in alphabetical order by first name: Alice, Bob, Jessica, and Joshua. It is important that you pick some ordering system for your data to insure a consistent mapping.

Now let's assign these students to lockers, starting at 0 for the first student, 1 for the second and so on: (MaskPosition = Locker #)

RepNameEmployeeIdMaskPosition
AliceS1230
BobS2521
JessicaS1102
JoshuaS8713

This would be represented in the code below

Create table SalesRep
(RepName varchar(256) not null,
EmployeeId char(4) not null,
MaskPosition int not null) -- new column to represent the assigned position

Now we will need a data type to represent the lockers. If you haven't already guessed, you need a binary data type and you will need 1 byte for every 8, or fraction of 8, sales reps (lockers) you will employ. In our example, we need a binary(1).

The table would look like this.

Create table BookRepMap
(Book bigint not null, --ISBN13 can be represented as a bigint 
 RepMap binary(1) not null)

This will only ever add a record for each Book, and you add 1 byte to the RepMap column for every 8 reps (starting with the first). Now before we end the first article in the series, let me explain just how powerful this is.

If we take 8000 bytes as the maximum row and subtract 8 bytes for the bigint, that leaves us with 7982 bytes for the RepMap column. You get 8 positions for each byte: 7982*8=63,856 total reps(positions) for single row!

In the next installment of this series, I will show the code used to fill this table.