System Design 101: Designing a Scalable URL Shortener (Part 1)
In the last post, we unpacked the core concepts of system design — latency vs throughput, load balancing, caching, the CAP theorem, and scaling strategies. Today, we’ll apply those ideas to a real-world scenario: building a URL shortener, just like Bitly or TinyURL.
At first glance, it sounds simple:
Paste a long URL
Get a short link
Click the short link
Get redirected to the original URL
But as with most systems at scale, simplicity on the surface hides complex decisions underneath. Let’s peel back the layers.
Step 1: What Are We Building?
Before we think about servers or databases, we need to clearly define what the system should do.
Functional Requirements
Accept a long URL and return a short version
Redirect short URLs to the original long URL
Optionally track clicks (e.g. count or timestamp)
Non-Functional Requirements
Low latency: Redirection must be nearly instant
High availability: Links should work reliably 24/7
Scalability: Support millions (or billions) of URLs and redirects
Fault tolerance: The system should continue operating even if parts fail
These are the real drivers of good system design — and where our core concepts come into play.
Step 2: The Naive Version
Let’s say we’re building a prototype.
One web server handles requests
One database table stores mappings like:
short_code | long_url
abc123 | https://example.com/my-very-long-url
A form submits the long URL → a short code is generated and saved
A redirect endpoint matches the code and redirects the user
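Here's roughly what that prototype could look like as a minimal sketch, assuming Flask and SQLite (the route names, the 6-character random code, and the urls.db file are illustrative choices, not a prescribed design):

```python
# Minimal prototype: one app process, one SQLite table (not production code)
import secrets
import sqlite3
import string

from flask import Flask, redirect, request

app = Flask(__name__)
ALPHABET = string.ascii_letters + string.digits  # a-z, A-Z, 0-9

def db() -> sqlite3.Connection:
    conn = sqlite3.connect("urls.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls (short_code TEXT PRIMARY KEY, long_url TEXT)"
    )
    return conn

@app.post("/shorten")
def shorten():
    long_url = request.form["url"]
    # Generate a random 6-character code (collision handling omitted for brevity)
    short_code = "".join(secrets.choice(ALPHABET) for _ in range(6))
    with db() as conn:
        conn.execute("INSERT INTO urls VALUES (?, ?)", (short_code, long_url))
    return {"short_code": short_code}

@app.get("/<short_code>")
def follow(short_code: str):
    row = db().execute(
        "SELECT long_url FROM urls WHERE short_code = ?", (short_code,)
    ).fetchone()
    if row is None:
        return "Unknown short code", 404
    return redirect(row[0], code=301)
```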
This works fine… until it doesn’t.
Too many users? The server crashes.
Database slow? Redirects lag.
One machine dies? The whole system goes down.
Time to scale.
Step 3: Enter System Design Principles
Now, let’s level up our architecture using the fundamentals from our last post.
Load Balancing
Instead of one app server, we place a load balancer (like NGINX, HAProxy, or AWS ELB) in front of multiple app servers.
It distributes incoming traffic evenly
Prevents any single server from becoming overwhelmed
Increases fault tolerance by rerouting requests
This improves throughput and protects against downtime.
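Under the hood, the idea is simple: cycle requests across the healthy servers. A toy round-robin sketch (the backend URLs and the healthy set are made up for illustration; in practice NGINX, HAProxy, or ELB does this for you):

```python
import itertools

# Hypothetical pool of app servers sitting behind the balancer
BACKENDS = ["http://app1:8000", "http://app2:8000", "http://app3:8000"]
_rotation = itertools.cycle(BACKENDS)

def pick_backend(healthy: set[str]) -> str:
    """Round-robin across backends, skipping any that are marked unhealthy."""
    for _ in range(len(BACKENDS)):
        candidate = next(_rotation)
        if candidate in healthy:
            return candidate
    raise RuntimeError("no healthy backends available")

# Example: app2 has died, so its traffic is rerouted to app1 and app3
print(pick_backend({"http://app1:8000", "http://app3:8000"}))
```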
Caching
URL redirection is read-heavy. Why hit the database for every redirect?
Enter caching:
Store frequently used short_code → long_url mappings in Redis or Memcached
Keep cache TTL (time-to-live) reasonably high
Hit the cache first, fall back to the database on a miss, and write the result back to the cache (the cache-aside pattern)
This significantly improves latency and reduces database load.
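Here's a minimal sketch of that flow using the redis-py client, assuming a one-hour TTL and a stand-in database lookup (both are illustrative, not prescribed values):

```python
import redis  # redis-py client

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL_SECONDS = 3600  # assumed one-hour TTL; tune to your traffic

def lookup_in_database(short_code: str) -> str | None:
    # Stand-in for the real database query (illustrative only)
    return {"abc123": "https://example.com/my-very-long-url"}.get(short_code)

def resolve_short_code(short_code: str) -> str | None:
    # 1. Hit the cache first
    long_url = cache.get(short_code)
    if long_url is not None:
        return long_url
    # 2. Fall back to the database on a miss
    long_url = lookup_in_database(short_code)
    if long_url is not None:
        # 3. Write the result back so the next redirect skips the database
        cache.set(short_code, long_url, ex=CACHE_TTL_SECONDS)
    return long_url
```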
CAP Theorem
Let’s say a database node becomes unreachable. Should the service:
Refuse to redirect (prioritize Consistency), or
Redirect with possibly outdated data (prioritize Availability)?
In most real-world cases, Availability + Partition Tolerance (AP) wins. Slightly stale metrics are okay — but broken links aren’t.
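In code, that AP-leaning choice can be as small as a fallback path: if the database read fails, answer from the cache even though it might be stale. A self-contained sketch (the in-memory cache dict and the simulated ConnectionError are placeholders):

```python
# A tiny in-memory stand-in for the cache layer
cache: dict[str, str] = {"abc123": "https://example.com/my-very-long-url"}

def db_lookup(short_code: str) -> str:
    # Simulate a partition: the database node is unreachable
    raise ConnectionError("database node unreachable")

def resolve_with_ap_bias(short_code: str) -> str | None:
    """Prefer availability: serve a possibly stale mapping rather than fail the redirect."""
    try:
        return db_lookup(short_code)  # authoritative read when the database is up
    except ConnectionError:
        # AP choice: during the outage, answer from the (possibly stale) cache
        return cache.get(short_code)

print(resolve_with_ap_bias("abc123"))  # still redirects even with the database down
```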
Scaling Up
We scale horizontally:
Add more app servers behind the load balancer
Add more cache nodes for high-speed reads
Consider a distributed database (like DynamoDB or Cassandra) to handle enormous write volumes
Vertical scaling (adding CPU/RAM to one machine) hits a ceiling fast. Distributed systems are how you grow sustainably.
What’s Next?
We’ve laid the foundation. In the next part, we’ll dig deeper into:
Short code generation: How to create unique, collision-free short links (base62, hashing, randomization)
Database design: How to store, retrieve, and scale your URL mappings
Analytics tracking: How to record and report on clicks with minimal performance impact
Stay tuned — this is where the real system design decisions begin.
📩 Enjoying this series? Share it with a friend or colleague preparing for interviews or designing their own system. And feel free to reply with your questions — I might include them in the next post.

