Back-of-envelope calculations

By Atakan Demircioğlu
Fullstack Developer

Here are my notes on back-of-envelope calculations for system design.

Table Of Contents:

What are Back-of-envelope calculations?
 ∘ Useful Calculations (approx)
 ∘ Load Estimates
 ∘ Database Storage estimate
 ∘ Cache Estimate
 ∘ Bandwidth estimates
 ∘ Numbers Everyone Should Know
 ∘ Availability numbers
 ∘ Estimate A Ticket System QPS and storage requirements
 ∘ Some Tips
 ∘ References

What are Back-of-envelope calculations?

  • It is a technique used in software engineering to determine how a system should be designed.
  • It is all about the process.
  • It is an approximate calculation.
  • It helps in choosing good configurations and technologies for the system.
  • It is generally good for estimating the scale of the system before starting any high-level or low-level design.
  • It helps to identify the request and response sizes, DB size, cache size, number of microservices, load balancers, etc.

Useful Calculations (approx)

1B = 8bits
1KB = 1000B
1MB = 1000KB
1GB = 1000MB

— — — — — — — — — — — — — — —

B: Byte: base unit: 1
K: Kilo: Thousand: 1 000
M: Mega: Million: 1 000 000
G: Giga: Billion: 1 000 000 000
T: Tera: Trillion: 1 000 000 000 000
P: Peta: Quadrillion: 1 000 000 000 000 000

Each step up from kilo adds three more zeros.
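To make the prefix ladder concrete, here is a minimal sketch that converts a raw byte count into the largest fitting decimal unit. The function name `humanize` is made up for this example, and it assumes the decimal (power-of-1000) units used in these notes.

```python
# Decimal unit ladder: each step is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]

def humanize(num_bytes):
    """Return a byte count as a short string in the largest fitting decimal unit."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value fits below 1000, or when we run out of units.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

print(humanize(1500))               # 1.5 KB
print(humanize(4_000_000_000_000))  # 4 TB
```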

— — — — — — — — — — — — — — —

char: 1 B (8 bits)
char (Unicode, UTF-16 code unit): 2 B (16 bits)
short: 2 B (16 bits)
int: 4 B (32 bits)
long: 8 B (64 bits)
UUID: 16 B (128 bits)
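As a rough illustration of how these sizes add up, the sketch below sums the field sizes of one record. The record layout and field names are hypothetical, chosen only for this example.

```python
# Sizes in bytes, taken from the list above.
TYPE_SIZES = {"char": 1, "short": 2, "int": 4, "long": 8, "uuid": 16}

def record_size(fields):
    """Sum the size in bytes of a record described as {field_name: (type, count)}."""
    return sum(TYPE_SIZES[type_name] * count for type_name, count in fields.values())

# Hypothetical record: a UUID key, a 64-bit timestamp, a 32-bit status code,
# and a 100-character single-byte name.
user_record = {
    "id": ("uuid", 1),
    "created_at": ("long", 1),
    "status": ("int", 1),
    "name": ("char", 100),
}

size = record_size(user_record)
print(size)  # 128
```

Real storage engines add per-row overhead and indexes on top of this, so treat the result as a lower bound.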

Load Estimates

  • Load is the volume of requests a system is going to process.
  • It is usually measured per second.
  • 1 million requests per day = ~12 requests per second.
  • Understanding whether the system is read-heavy or write-heavy is important.
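The per-day to per-second conversion is just a division by the number of seconds in a day. A minimal sketch (the function name is made up for this example):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def requests_per_second(requests_per_day):
    """Convert a daily request volume into an average per-second rate."""
    return requests_per_day / SECONDS_PER_DAY

qps = requests_per_second(1_000_000)
print(round(qps, 2))  # 11.57, which rounds up to ~12
```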

Database Storage estimate

  • The database is an integral component of any application.
  • Application response time depends directly on the response time of the data source and the underlying database.
  • Structured data usually fits a relational database (RDBMS).
  • Unstructured data usually fits a NoSQL data store (like MongoDB).
  • 1 million requests per day at 1 KB each is 1 GB per day and 365 GB per year. Then, we can quickly calculate for X years of storage: 1 GB * 365 * X.
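The same storage arithmetic can be written as a small function. This is a sketch that assumes decimal units (1 GB = 1,000,000 KB) and ignores replication, indexes, and compression; the function and parameter names are made up for this example.

```python
def storage_gb(requests_per_day, request_size_kb, years):
    """Estimate raw storage in GB for a constant daily write volume."""
    kb_per_day = requests_per_day * request_size_kb
    gb_per_day = kb_per_day / 1_000_000  # 1 GB = 1,000,000 KB in decimal units
    return gb_per_day * 365 * years

print(storage_gb(1_000_000, 1, 1))  # 365.0
print(storage_gb(1_000_000, 1, 5))  # 1825.0
```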

Cache Estimate

  • There are no hard rules for these requirements.
  • A common rule of thumb is to size the cache at 10–30% of the database storage; some go with 20–30% of the currently or frequently accessed data instead.
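A quick sketch of the 10–30% rule of thumb, applied to a database size in GB. The function name is made up, and the 365 GB input is only an example value.

```python
def cache_size_range_gb(db_storage_gb, low_percent=10, high_percent=30):
    """Return the (low, high) cache size in GB for a percentage range of DB storage."""
    low = db_storage_gb * low_percent / 100
    high = db_storage_gb * high_percent / 100
    return low, high

low_gb, high_gb = cache_size_range_gb(365)
print(low_gb, high_gb)  # 36.5 109.5
```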

Bandwidth estimates

  • Internet speed is important.
  • Both upstream and downstream speeds matter.
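One common way to put a number on bandwidth is to multiply the request rate by the payload size. The sketch below assumes that approach; the request rate and payload sizes are made-up inputs, not measurements.

```python
def bandwidth_mbps(requests_per_second, payload_kb):
    """Bandwidth in megabits per second for a given request rate and payload size."""
    kilobytes_per_second = requests_per_second * payload_kb
    megabytes_per_second = kilobytes_per_second / 1000
    return megabytes_per_second * 8  # 1 B = 8 bits

# Assumed traffic: 12 requests/s with 1 KB request bodies and 50 KB responses.
ingress = bandwidth_mbps(12, 1)
egress = bandwidth_mbps(12, 50)
print(f"{ingress:.3f} Mbps in, {egress:.1f} Mbps out")
```

Estimating the two directions separately shows quickly whether the system is dominated by incoming or outgoing traffic.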

Numbers Everyone Should Know

  • L1 cache reference 0.5 ns
  • Branch mispredict 5 ns
  • L2 cache reference 7 ns
  • Mutex lock/unlock 100 ns
  • Main memory reference 100 ns
  • Compress 1K bytes with Zippy 10,000 ns
  • Send 2K bytes over 1 Gbps network 20,000 ns
  • Read 1 MB sequentially from memory 250,000 ns
  • Round trip within same data center 500,000 ns
  • Disk seek 10,000,000 ns
  • Read 1 MB sequentially from network 10,000,000 ns
  • Read 1 MB sequentially from disk 30,000,000 ns
  • Send packet CA->Netherlands->CA 150,000,000 ns
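These numbers are most useful when scaled up. As a sketch, the code below uses the per-MB figures from the list above to estimate how long it takes to read 1 GB sequentially from each medium. The dictionary keys and the function name are made up for this example.

```python
# Per-MB sequential read latencies in nanoseconds, from the list above.
READ_1MB_NS = {
    "memory": 250_000,
    "network": 10_000_000,
    "disk": 30_000_000,
}

def read_seconds(megabytes, medium):
    """Estimate the time in seconds to read `megabytes` sequentially from `medium`."""
    return megabytes * READ_1MB_NS[medium] / 1_000_000_000

for medium in READ_1MB_NS:
    print(medium, read_seconds(1000, medium))  # 1 GB = 1000 MB

# How many times slower a sequential disk read is than a memory read.
disk_vs_memory = READ_1MB_NS["disk"] // READ_1MB_NS["memory"]
print(disk_vs_memory)  # 120
```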

Availability numbers

  • High availability is the ability of a system to be continuously operational for a long period of time.
  • Most services fall between 99% and 100%.
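An availability percentage translates directly into an allowed downtime budget. A minimal sketch, assuming a 365-day year (the function name is made up for this example):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760

def downtime_hours_per_year(availability_percent):
    """Return the yearly downtime in hours implied by an availability percentage."""
    return (100 - availability_percent) / 100 * HOURS_PER_YEAR

for pct in (99, 99.9, 99.99):
    print(f"{pct}% -> {downtime_hours_per_year(pct):.2f} hours/year")
```

Each extra nine cuts the budget by a factor of ten: roughly 87.6 hours at 99%, 8.76 hours at 99.9%, and under an hour at 99.99%.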

Estimate A Ticket System QPS and storage requirements

This is only an illustrative estimate; the numbers are not real.

Assumptions:

  • 50 million monthly active users.
  • 2% of users use your ticketing system daily.
  • Users post 2 tickets per day on average.
  • 20% of tickets contain media.

Estimations:

Estimate QPS (queries per second):

  • DAU (daily active users) = 50 million * 2% = 1 million
  • Tickets QPS = 1 million * 2 tickets / 24 hours / 3600 seconds = ~23.15 (rounded up to ~25)
  • Peak QPS = 2 * QPS = ~50
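The QPS steps can be checked in a few lines. The variable names are made up for this example, and the peak factor of 2 is the same assumption used in the list above.

```python
monthly_active_users = 50_000_000
daily_usage_percent = 2        # share of users active on a given day
tickets_per_user_per_day = 2
peak_factor = 2                # assumed ratio of peak to average traffic

daily_active_users = monthly_active_users * daily_usage_percent // 100
tickets_per_day = daily_active_users * tickets_per_user_per_day
average_qps = tickets_per_day / (24 * 3600)
peak_qps = average_qps * peak_factor

print(daily_active_users)     # 1000000
print(round(average_qps, 2))  # 23.15
print(round(peak_qps, 2))     # 46.3
```

Rounding the average up to 25 before doubling gives the ~50 peak figure, which leaves a little headroom over the unrounded 46.3.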

Estimate media storage:

  • post_id (16 bytes)
  • description (200 bytes)
  • etc...
  • media (8 MB, rounded up to 10 MB for the estimate)
  • Media Storage = 1 million * 2 * 20% * 10 MB = 4 TB per day
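Unit conversion is the easiest place to slip in this kind of estimate, so it is worth checking in code. With 400,000 media tickets per day at 10 MB each, the daily total is 4,000,000 MB, which is about 4 TB in decimal units. The variable names below are made up for this example.

```python
daily_active_users = 1_000_000
tickets_per_user_per_day = 2
media_ticket_percent = 20   # share of tickets that contain media
media_size_mb = 10          # assumed per-file size used in the estimate

media_tickets_per_day = (
    daily_active_users * tickets_per_user_per_day * media_ticket_percent // 100
)
media_mb_per_day = media_tickets_per_day * media_size_mb
media_tb_per_day = media_mb_per_day / 1_000_000  # 1 TB = 1,000,000 MB

print(media_tickets_per_day)  # 400000
print(media_mb_per_day)       # 4000000
print(media_tb_per_day)       # 4.0
```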

Some Tips

  • Memory is fast and disks are slow.
  • Writes are 40 times more expensive than reads.
  • Optimize for low write contention.
  • Simple compression algorithms are faster.
  • Compress data before sending it over the network when possible.
  • Data centers are usually in different regions, so sending data between them takes time.

References