Published in system design
How to design a URL Shortener?
By Atakan Demircioğlu
Fullstack Developer
How to design a URL Shortener?
These are my notes about how to design a URL shortener.
What is a URL Shortener Service?
- It is for generating shorter aliases for long URLs.
- Common system interview question
Imagine you have an URL like this;
https://www.example.com/ex/testtestestest1.php?a=5&b=4
You will convert to like
https://shorturl.com/y73g2zjl
What are the Requirements
- 1 million URLs are generated per day
- URLs are the combination of [0–9] [a-z, A-Z]
- Generate possibly the shortenest version
Back-of-the-envelope estimation
- Write per second: 1 million / 24 / 3600 = 12
- Read per second: 12 * 10 = 120 (Assuming 10:1)
- After 10 years: 1 million * 365 * 10 = 3.6 billion (3.650.000.000)
- Assume the average URL length is 100 bytes.
- After 10 years: 3.6 billion * 100 bytes = 339.93 Gigabytes
Creating High-Level Design
- We need basically two endpoints. One is for creating a short URL and one is for redirecting to the long version of the URL.
Example Routes
POST /api/url-shortener/
- long-url: Parameter
- create and return a short URL
GET /url-shortener/{slug}
- Check storage, if exists redirect to the corresponding URL
How about 301 redirections and 302 redirection ?
- 301 means permanently moved to the long URL. The browser can cache the response and knows this URL is permanently moved. It will not ask for the URL Shortener Server, it will directly redirect to the URLs server.
- 302 means temporarily moved to the long URL. Subsequent requests for the same URL will be sent to the URL Shortener Server.
- All things in software are about a trade-off. If you want to reduce server load, using 301 is better. If analytics is more important for you, calculating clicks and etc. you need to use 302.
What is the Data Model?
We need to store these values in a storage engine. We can store them in MySQL for this example.
We need a table structured like this;
id (PK)
short-url
long-url
Of course, you can add more fields like creation date, update date, and so on. But for now, we are just creating the initials of this design.
What is the strategy to create short URLs?
Hash function approach
We need a hash function. We will have a long URL and we will store it in a relational database.
- In our requirements, we are allowing [0–9, a-z, A-Z].
- So this means 62 possible chars.
- We need to calculate how long can be our urls. For 10 years approximately we need 3.6 billion URLs. 62⁶ = 56,800,235,584 is totally fine for us. So our short URLs will include 6 fixed chars.
- For a hash function, we can use CRC32, MD5, and SHA-1. These approaches have different lengths. We can cut the first 6 chars but it can cause hash collisions. For this, we need to check from a database. But this solution gets expensive because every time you need to check from the database.
Base 62 conversion approach
- We can covert IDs from base 10 to base 62 and save them as short URLs.
- Collision is not possible
- URL length is not fixed.
- We need to generate a unique ID each time.
- If the ID increments by 1, it can possible to guess the other one. It can be a security problem.