System Design 101: Designing a Scalable URL Shortener (Part 1)
In the last post, we unpacked the core concepts of system design — latency vs throughput, load balancing, caching, the CAP theorem, and scaling strategies. Today, we’ll apply those ideas to a real-world scenario: building a URL shortener, just like Bitly or TinyURL.
At first glance, it sounds simple:
Paste a long URL
Get a short link
Click the short link
Get redirected to the original URL
But as with most systems at scale, simplicity on the surface hides complex decisions underneath. Let’s peel back the layers.
Step 1: What Are We Building?
Before we think about servers or databases, we need to clearly define what the system should do.
Functional Requirements
Accept a long URL and return a short version
Redirect short URLs to the original long URL
Optionally track clicks (e.g. count or timestamp)
Non-Functional Requirements
Low latency: Redirection must be nearly instant
High availability: Links should work reliably 24/7
Scalability: Support millions (or billions) of URLs and redirects
Fault tolerance: The system should continue operating even if parts fail
These are the real drivers of good system design — and where our core concepts come into play.
Step 2: The Naive Version
Let’s say we’re building a prototype.
One web server handles requests
One database table stores mappings like:
short_code | long_url
abc123 | https://example.com/my-very-long-url
A form submits the long URL → a short code is generated and saved
A redirect endpoint matches the code and redirects the user
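Here's roughly what that prototype could look like as a minimal sketch, assuming Flask and SQLite (the route names, the 6-character random code, and the urls.db file are illustrative choices, not a prescribed design):

```python
# Minimal prototype: one app process, one SQLite table (not production code)
import secrets
import sqlite3
import string

from flask import Flask, redirect, request

app = Flask(__name__)
ALPHABET = string.ascii_letters + string.digits  # a-z, A-Z, 0-9

def db() -> sqlite3.Connection:
    conn = sqlite3.connect("urls.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls (short_code TEXT PRIMARY KEY, long_url TEXT)"
    )
    return conn

@app.post("/shorten")
def shorten():
    long_url = request.form["url"]
    # Generate a random 6-character code (collision handling omitted for brevity)
    short_code = "".join(secrets.choice(ALPHABET) for _ in range(6))
    with db() as conn:
        conn.execute("INSERT INTO urls VALUES (?, ?)", (short_code, long_url))
    return {"short_code": short_code}

@app.get("/<short_code>")
def follow(short_code: str):
    row = db().execute(
        "SELECT long_url FROM urls WHERE short_code = ?", (short_code,)
    ).fetchone()
    if row is None:
        return "Unknown short code", 404
    return redirect(row[0], code=301)
```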
This works fine… until it doesn’t.
Too many users? The server crashes.
Database slow? Redirects lag.
One machine dies? The whole system goes down.
Time to scale.
Step 3: Enter System Design Principles
Now, let’s level up our architecture using the fundamentals from our last post.
Load Balancing
Instead of one app server, we place a load balancer (like NGINX, HAProxy, or AWS ELB) in front of multiple app servers.
It distributes incoming traffic evenly
Prevents any single server from becoming overwhelmed
Increases fault tolerance by rerouting requests
This improves throughput and protects against downtime.
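Under the hood, the idea is simple: cycle requests across the healthy servers. A toy round-robin sketch (the backend URLs and the healthy set are made up for illustration; in practice NGINX, HAProxy, or ELB does this for you):

```python
import itertools

# Hypothetical pool of app servers sitting behind the balancer
BACKENDS = ["http://app1:8000", "http://app2:8000", "http://app3:8000"]
_rotation = itertools.cycle(BACKENDS)

def pick_backend(healthy: set[str]) -> str:
    """Round-robin across backends, skipping any that are marked unhealthy."""
    for _ in range(len(BACKENDS)):
        candidate = next(_rotation)
        if candidate in healthy:
            return candidate
    raise RuntimeError("no healthy backends available")

# Example: app2 has died, so its traffic is rerouted to app1 and app3
print(pick_backend({"http://app1:8000", "http://app3:8000"}))
```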
Caching
URL redirection is read-heavy. Why hit the database for every redirect?
Enter caching:
Store frequently used short_code → long_url mappings in Redis or Memcached
Keep cache TTL (time-to-live) reasonably high
Hit the cache first, fall back to the database on a miss, and write the result back to the cache (the cache-aside pattern)
This significantly improves latency and reduces database load.
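Here's a minimal sketch of that flow using the redis-py client, assuming a one-hour TTL and a stand-in database lookup (both are illustrative, not prescribed values):

```python
import redis  # redis-py client

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL_SECONDS = 3600  # assumed one-hour TTL; tune to your traffic

def lookup_in_database(short_code: str) -> str | None:
    # Stand-in for the real database query (illustrative only)
    return {"abc123": "https://example.com/my-very-long-url"}.get(short_code)

def resolve_short_code(short_code: str) -> str | None:
    # 1. Hit the cache first
    long_url = cache.get(short_code)
    if long_url is not None:
        return long_url
    # 2. Fall back to the database on a miss
    long_url = lookup_in_database(short_code)
    if long_url is not None:
        # 3. Write the result back so the next redirect skips the database
        cache.set(short_code, long_url, ex=CACHE_TTL_SECONDS)
    return long_url
```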
CAP Theorem
Let’s say a database node becomes unreachable. Should the service:
Refuse to redirect (prioritize Consistency), or
Redirect with possibly outdated data (prioritize Availability)?
In most real-world cases, Availability + Partition Tolerance (AP) wins. Slightly stale metrics are okay — but broken links aren’t.
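In code, that AP-leaning choice can be as small as a fallback path: if the database read fails, answer from the cache even though it might be stale. A self-contained sketch (the in-memory cache dict and the simulated ConnectionError are placeholders):

```python
# A tiny in-memory stand-in for the cache layer
cache: dict[str, str] = {"abc123": "https://example.com/my-very-long-url"}

def db_lookup(short_code: str) -> str:
    # Simulate a partition: the database node is unreachable
    raise ConnectionError("database node unreachable")

def resolve_with_ap_bias(short_code: str) -> str | None:
    """Prefer availability: serve a possibly stale mapping rather than fail the redirect."""
    try:
        return db_lookup(short_code)  # authoritative read when the database is up
    except ConnectionError:
        # AP choice: during the outage, answer from the (possibly stale) cache
        return cache.get(short_code)

print(resolve_with_ap_bias("abc123"))  # still redirects even with the database down
```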
Scaling Up
We scale horizontally:
Add more app servers behind the load balancer
Add more cache nodes for high-speed reads
Consider a distributed database (like DynamoDB or Cassandra) to handle enormous write volumes
Vertical scaling (adding CPU/RAM to one machine) hits a ceiling fast. Distributed systems are how you grow sustainably.
What’s Next?
We’ve laid the foundation. In the next part, we’ll dig deeper into:
Short code generation: How to create unique, collision-free short links (base62, hashing, randomization)
Database design: How to store, retrieve, and scale your URL mappings
Analytics tracking: How to record and report on clicks with minimal performance impact
Stay tuned — this is where the real system design decisions begin.
📩 Enjoying this series? Share it with a friend or colleague preparing for interviews or designing their own system. And feel free to reply with your questions — I might include them in the next post.

