All posts
Published in mysql

Twitter snowflake approach is cool

Profile image of Atakan Demircioğlu
By Atakan Demircioğlu
Fullstack Developer

I was researching a solution to generate unique IDs and I liked the Twitter snowflake approach. These are my notes about this approach.

Twitter snowflake approach is cool image 1

What is Twitter’s snowflake approach?

It is a solution to generate unique IDs in distributed systems. Twitter uses this approach in Tweets, DM’s, Lists and etc.

  • IDs are unique and sortable
  • IDs include time. (ordered by date)
  • IDs fit 64-bit unsigned integers.
  • Only numerical values.

Sign bit (1 bit): Reserved bit (It is always 0). This can be reserver for future requests. It can be potentially used to make the overall number positive.

Timestamp(41 bit): Epoch timestamp in a millisecond (Snowflake’s default epoch is equal to Nov 04, 2010, 01:42:54 UTC)

Machine ID(10-bit): accommodates 1024 machines

Sequence number(12-bit): It is a local counter per each machine and increments by 1. The number reset to 0 in every millisecond. Theoretically, a machine can support a max of 4096 (2¹²) new IDs per second.

Advantages & Disadvantages of the Twitter Snowflake Approach

  • It is 64-bit long, it is half the size of UUIDs
  • Scalable (it can accommodate 1024 machines)
  • Highly available (Each machine can generate 4096 unique IDs each millisecond)
  • Some of the UUID versions do not include a timestamp. In this case, Twitter Snowflake has a sortable advantage.
  • Design requires Zookeeper (disadvantage)
  • The generated IDs are not random like UUIDs. Future IDs can predictable.
  • The maximum timestamp that can be represented in 41 bits is (~ 69 years). Need a solution after this :)

Usage Notes

  • Discord uses snowflakes, with their epoch set to the first second of the year 2015.
  • Instagram uses a modified version of the format, with 41 bits for a timestamp, 13 bits for a shard ID, and 10 bits for a sequence number.
  • Mastodon’s modified format has 48 bits for a millisecond-level timestamp, it uses the UNIX epoch. The remaining 16 bits are for sequence data.

References