Back-of-envelope calculations
Here are my notes on back-of-envelope calculations and how they support system design.
Table Of Contents:
∘ What are back-of-envelope calculations?
∘ Useful Calculations (approx)
∘ Load Estimates
∘ Database Storage estimate
∘ Cache Estimate
∘ Bandwidth estimates
∘ Numbers Everyone Should Know
∘ Availability numbers
∘ Estimate A Ticket System QPS and storage requirements
∘ Some Tips
∘ References
What are back-of-envelope calculations?
- it is a technique used in software engineering to help determine how a system should be designed.
- it is about the process of estimating, not about exact answers.
- it produces approximate numbers.
- those numbers help in choosing suitable configurations and technologies for the system.
- generally, it is good for estimating the scale of the system before starting any high-level and low-level design.
- it helps to identify request and response sizes, database size, cache size, number of microservices, load balancers, etc.
Useful Calculations (approx)
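1 B = 8 bits
1 KB = 1,000 B
1 MB = 1,000 KB
1 GB = 1,000 MB
(Binary units are slightly larger, e.g. 1 KiB = 1,024 B; decimal units are close enough for estimates.)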
— — — — — — — — — — — — — — —
B: Byte: One: 1
K: Kilo: Thousand: 1,000 (10^3)
M: Mega: Million: 1,000,000 (10^6)
G: Giga: Billion: 1,000,000,000 (10^9)
T: Tera: Trillion: 1,000,000,000,000 (10^12)
P: Peta: Quadrillion: 1,000,000,000,000,000 (10^15)
each step up from thousand adds three more zeros (a factor of 1,000)
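A small formatter makes these decimal prefixes concrete when reading estimates later in these notes. This is a minimal sketch that assumes decimal (power-of-1,000) units as in the table. The function name `format_bytes` is hypothetical.

```python
# Decimal (SI) prefixes from the table above: each step multiplies by 1,000.
PREFIXES = ["B", "KB", "MB", "GB", "TB", "PB"]


def format_bytes(n: float) -> str:
    """Render a byte count using the largest decimal prefix that keeps it >= 1."""
    value = float(n)
    for unit in PREFIXES:
        # Stop once the value drops below 1,000, or at the largest prefix.
        if value < 1000 or unit == PREFIXES[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000
    raise AssertionError("unreachable: the loop always returns")


print(format_bytes(365 * 10**9))  # 365.0 GB
print(format_bytes(4 * 10**12))   # 4.0 TB
```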
— — — — — — — — — — — — — — —
char: 1B (8 bits)
char (Unicode): 2B (16 bits)
Short: 2B (16 bits)
Int: 4B (32 bits)
Long: 8B (64 bits)
UUID: 16B
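Summing these type sizes gives a rough per-record size. The record layout below is hypothetical and only illustrates the arithmetic: a UUID id, two `Long` timestamps, an `Int` status, and a 200-character text field at 1 byte per character. Row and index overhead are ignored.

```python
# Approximate sizes in bytes, taken from the list above.
SIZES = {"char": 1, "short": 2, "int": 4, "long": 8, "uuid": 16}

# Hypothetical record layout: (field name, type, count).
RECORD = [
    ("id", "uuid", 1),
    ("created_at", "long", 1),
    ("updated_at", "long", 1),
    ("status", "int", 1),
    ("description", "char", 200),  # assumes 1 byte per character
]


def record_size(fields) -> int:
    """Sum the field sizes, ignoring per-row and index overhead."""
    return sum(SIZES[kind] * count for _, kind, count in fields)


print(record_size(RECORD))  # 236
```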
Load Estimates
- the volume of requests a system is going to process
- usually measured per second
- 1 million requests per day ≈ 12 requests per second (1,000,000 / 86,400 seconds ≈ 11.6)
- understanding whether the system is read-heavy or write-heavy is important
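Converting requests per day to requests per second is a division by the 86,400 seconds in a day. The sketch below also applies a peak multiplier and splits the load into reads and writes. The peak factor of 2 is an assumption, matching the ticket example later in these notes. `read_write_ratio` is a hypothetical parameter.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(requests_per_day: float) -> float:
    """Average requests per second, assuming traffic is spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


def peak_qps(requests_per_day: float, peak_factor: float = 2.0) -> float:
    """Peak load as a multiple of the average; the factor is an assumption."""
    return average_qps(requests_per_day) * peak_factor


def split_reads_writes(qps: float, read_write_ratio: float) -> tuple[float, float]:
    """Split total QPS into (reads, writes) for a reads:writes ratio such as 10.0."""
    writes = qps / (read_write_ratio + 1)
    return qps - writes, writes


qps = average_qps(1_000_000)
print(round(qps, 2))                   # 11.57
print(round(peak_qps(1_000_000), 2))   # 23.15
reads, writes = split_reads_writes(qps, 10.0)
print(round(reads, 2), round(writes, 2))  # 10.52 1.05
```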
Database Storage estimate
- the database is an integral component of any application
- Application response time directly depends on the data source and underlying database response time
- structured data usually belongs in a relational database (RDBMS)
- unstructured data usually belongs in a NoSQL data store (like MongoDB)
- 1 million requests per day at 1 KB each is 1 GB per day, or 365 GB per year. For X years of storage: 1 GB * 365 * X.
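The storage rule of thumb above is a single multiplication. This sketch reproduces the 1 GB/day and 365 GB/year figures. It counts raw data only; replication, indexes, and compression are assumptions it leaves out.

```python
KB, GB = 10**3, 10**9  # decimal units, as in the table above


def storage_bytes(requests_per_day: int, bytes_per_request: int, days: int) -> int:
    """Raw storage for `days` of writes, ignoring replication, indexes and compression."""
    return requests_per_day * bytes_per_request * days


per_day = storage_bytes(1_000_000, 1 * KB, days=1)
per_year = storage_bytes(1_000_000, 1 * KB, days=365)
five_years = storage_bytes(1_000_000, 1 * KB, days=365 * 5)
print(per_day / GB, per_year / GB, five_years / GB)  # 1.0 365.0 1825.0
```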
Cache Estimate
- there are no hard rules for cache sizing
- a common rule of thumb is to size the cache at 10–30% of database storage; others size it at 20–30% of the recently or frequently accessed (hot) data
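Applying the percentage rule of thumb to the one-year database from the storage estimate (365 GB) gives a cache range. This is a minimal sketch. The fractions are the rule-of-thumb values above, not a recommendation.

```python
GB = 10**9


def cache_size(db_bytes: float, fraction: float = 0.2) -> float:
    """Cache capacity as a fraction of database size (a rule of thumb, not a law)."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    return db_bytes * fraction


one_year_db = 365 * GB  # 1 GB/day for a year, from the storage estimate
for f in (0.1, 0.2, 0.3):
    print(f"{f:.0%} -> {cache_size(one_year_db, f) / GB:.1f} GB")
# 10% -> 36.5 GB
# 20% -> 73.0 GB
# 30% -> 109.5 GB
```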
Bandwidth estimates
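- bandwidth is the amount of data moving into (ingress) and out of (egress) the system per second, and it is bounded by network (internet) speed
- both upstream and downstream speeds matter: ingress ≈ write requests per second * request size, egress ≈ read requests per second * response size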
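Bandwidth follows from request volume multiplied by payload size. The workload below is hypothetical: 1 million 1 KB writes per day and ten times as many 1 KB reads. Traffic is assumed to be spread evenly over the day, so peaks will be higher.

```python
SECONDS_PER_DAY = 86_400


def bandwidth_bytes_per_sec(requests_per_day: float, bytes_per_request: float) -> float:
    """Average bandwidth for a request stream, assuming an even spread over the day."""
    return requests_per_day * bytes_per_request / SECONDS_PER_DAY


# Hypothetical workload: 1M writes/day and 10M reads/day, 1 KB each.
ingress = bandwidth_bytes_per_sec(1_000_000, 1_000)
egress = bandwidth_bytes_per_sec(10_000_000, 1_000)
print(f"ingress ~{ingress / 1_000:.1f} KB/s, egress ~{egress / 1_000:.1f} KB/s")
# ingress ~11.6 KB/s, egress ~115.7 KB/s

# Network links are quoted in bits per second, so multiply bytes by 8.
print(f"egress ~{egress * 8 / 10**6:.2f} Mbps")
```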
Numbers Everyone Should Know
- L1 cache reference 0.5 ns
- Branch mispredict 5 ns
- L2 cache reference 7 ns
- Mutex lock/unlock 100 ns
- Main memory reference 100 ns
- Compress 1K bytes with Zippy 10,000 ns
- Send 2K bytes over 1 Gbps network 20,000 ns
- Read 1 MB sequentially from memory 250,000 ns
- Round trip within same data center 500,000 ns
- Disk seek 10,000,000 ns
- Read 1 MB sequentially from network 10,000,000 ns
- Read 1 MB sequentially from disk 30,000,000 ns
- Send packet CA->Netherlands->CA 150,000,000 ns
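These latencies are most useful when combined into derived numbers. The sketch below uses only values from the list to estimate how long 1 GB (1,000 MB) takes to read from memory versus disk. It also estimates how many strictly sequential seeks or round trips fit in one second, assuming no parallelism.

```python
# Latencies in nanoseconds, copied from the list above.
NS = {
    "read_1mb_memory": 250_000,
    "read_1mb_disk": 30_000_000,
    "disk_seek": 10_000_000,
    "round_trip_same_dc": 500_000,
    "round_trip_ca_nl": 150_000_000,
}
SECOND_NS = 1_000_000_000


def seconds_to_read(megabytes: int, source: str) -> float:
    """Sequential read time for `megabytes` MB from 'memory' or 'disk'."""
    return megabytes * NS[f"read_1mb_{source}"] / SECOND_NS


print(seconds_to_read(1_000, "memory"))      # 0.25 (1 GB from memory)
print(seconds_to_read(1_000, "disk"))        # 30.0 (1 GB from disk)
print(SECOND_NS // NS["disk_seek"])          # 100 seeks per second, one at a time
print(SECOND_NS // NS["round_trip_same_dc"]) # 2000 round trips per second in one DC
print(SECOND_NS // NS["round_trip_ca_nl"])   # 6 CA<->Netherlands round trips per second
```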
Availability numbers
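- high availability is the ability of a system to stay operational continuously for a long period of time
- availability is measured as the percentage of time a system is up, often quoted in "nines" (99.9% is "three nines")
- most services fall between 99% and 100%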
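Each extra "nine" cuts the allowed downtime by a factor of ten. For example, 99.9% allows about 8.76 hours per year and 99.99% allows about 52.6 minutes. The sketch computes this directly and assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; ignores leap years


def downtime_per_year_hours(availability_pct: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_per_year_hours(pct)
    print(f"{pct}%: {hours:.2f} h/year ({hours * 60:.1f} min)")
```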
Estimate A Ticket System QPS and storage requirements
These are rough, illustrative estimates, not real numbers.
Assumptions:
- 50 million monthly active users.
- 2% of users use your ticketing system daily.
- Users post 2 tickets per day on average.
- 20% of tickets contain media.
Estimations:
Estimate QPS (Queries Per Second):
- DAU (Daily Active Users) = 50 million * 2% = 1 million
- Tickets QPS = 1 million * 2 tickets / 24 hours / 3600 seconds = ~23.15 (round up to ~25)
- Peak QPS = 2 * QPS = ~50 (assuming peak traffic is twice the average)
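The QPS estimate above can be reproduced from the stated assumptions. The peak factor of 2 comes from the estimate itself. Without first rounding the average up to 25, the peak works out to about 46, which still rounds to ~50.

```python
MAU = 50_000_000
DAILY_USAGE_PCT = 2       # 2% of users use the system daily
TICKETS_PER_USER = 2      # tickets posted per user per day
PEAK_FACTOR = 2           # assumption: peak traffic is twice the average
SECONDS_PER_DAY = 24 * 3600

dau = MAU * DAILY_USAGE_PCT // 100                      # 1,000,000
tickets_qps = dau * TICKETS_PER_USER / SECONDS_PER_DAY  # ~23.15
peak_qps = tickets_qps * PEAK_FACTOR                    # ~46.3, i.e. ~50 rounded up

print(dau, round(tickets_qps, 2), round(peak_qps, 1))  # 1000000 23.15 46.3
```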
Estimate media storage:
- post_id (16 bytes)
- description (200 bytes)
- etc...
- media (8 MB)
- Media Storage = 1 million * 2 * 20% * 10 MB = 4 TB per day (rounding each media ticket up to 10 MB)
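Checking the media estimate in code helps catch unit slips. 400,000 media tickets per day at 10 MB each is 4,000,000 MB, which is 4 TB. As in the estimate, this sketch rounds the 8 MB media size up to 10 MB per ticket, presumably to leave headroom. It ignores replication.

```python
DAU = 1_000_000
TICKETS_PER_USER = 2
MEDIA_SHARE_PCT = 20          # 20% of tickets contain media
MEDIA_BYTES = 10 * 10**6      # 10 MB per media ticket (8 MB media rounded up)
TB = 10**12

media_tickets_per_day = DAU * TICKETS_PER_USER * MEDIA_SHARE_PCT // 100  # 400,000
per_day = media_tickets_per_day * MEDIA_BYTES
per_year = per_day * 365

print(per_day / TB, per_year / TB)  # 4.0 1460.0
```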
Some Tips
- Memory is fast and disks are slow.
- Writes are 40 times more expensive than reads.
- Optimize for low write contention.
- Simple compression algorithms are faster.
- Compress data before sending it over the network, where possible.
- Sending data between data centers in different regions takes time (a CA to Netherlands round trip is about 150 ms).
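The "compress before sending" tip can be sketched with Python's standard `zlib` module. `zlib` is a stand-in here, not the Zippy (Snappy) codec from the latency list. The payload is hypothetical: repetitive JSON records, which compress well. Level 1 trades compression ratio for speed.

```python
import json
import zlib

# Hypothetical payload: a batch of similar JSON records.
records = [{"ticket_id": i, "status": "open", "priority": "low"} for i in range(1_000)]
payload = json.dumps(records).encode("utf-8")

# Level 1 favors speed over ratio, in the spirit of "simple compression is faster".
compressed = zlib.compress(payload, level=1)
print(len(payload), len(compressed))

# The receiver decompresses before use; the round trip is lossless.
assert zlib.decompress(compressed) == payload
```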
References
- https://matthewdbill.medium.com/back-of-envelope-calculations-cheat-sheet-d6758d276b05
- https://medium.com/@saurabh.engg.it/software-system-design-back-of-envelope-calculations-8f2d9d0f4edd
- https://sre.google/sre-book/availability-table/
- http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html