If someone handed you a running backend system and asked you to optimize it, how would you go about it? What steps would you take? Here’s my approach.

Overview of the Steps I Apply:

  1. Measure the System’s Current Metrics
    → The goal is to understand how the system is performing and which metrics are hitting their limits.

  2. Identify the Bottleneck
    → Pinpoint the choke point in the system: hardware layer, application layer, or database layer.

  3. Address the Bottleneck with Two Principles:

    • Increase processing capacity at the bottleneck.
    • Reduce pressure on the bottleneck.

Example: How Would I Solve Traffic Congestion?

Suppose I’m a traffic engineer, and the city is facing severe traffic jams.

  1. Measure the Current Situation
    First, I collect data on the number of vehicles per hour, red light wait times, vehicle density at intersections, etc.

  2. Identify the Bottleneck
    The data shows a specific intersection overloading during peak hours, causing prolonged traffic jams.

  3. Address the Bottleneck

    • Increase Capacity:

      • Widen the road, add more lanes. Build overpasses or tunnels (this gets pricey 😂, much like upgrading server specs or running multiple instances in a backend system).
    • Reduce Pressure on the Bottleneck:

      • Reroute traffic, tweak traffic lights to optimize flow.
      • Encourage public transport usage: this is basically batch processing =)) I’ll share an example of this technique at the end of this post.
      • On a broader scale: Implement policies to decentralize population or relocate industrial zones, universities... away from the city center.

I always start optimization by asking a simple question: "Where is my system struggling?" Without knowing where the problem lies, all optimization efforts are like groping in the dark.

First, I Identify Key Metrics:

  • Throughput: How many requests can the system handle per second?
  • Latency: How long from sending a request to receiving a response?
  • Resource Consumption: CPU, RAM, disk, network bandwidth, etc.
  • Concurrent Connections: How many connections can the system sustain at once without crashing?

Tools for Measuring Metrics

  • Throughput & Latency: Use Apache JMeter, k6, or Locust to load-test the system (a minimal k6 sketch follows this list).
  • Resource Consumption: Leverage Prometheus + Grafana to monitor system resources in real time.
  • Concurrent Connections: Use wrk or siege to test the maximum connections the system can handle.
  • Scalability: Run scaling tests with Kubernetes HPA (Horizontal Pod Autoscaler), or run scale-out tests on your cloud provider to see how performance changes as instances are added.
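To make the measurement step concrete, here is a minimal k6 script for the throughput and latency check. It is only a sketch: the endpoint, virtual-user count, and latency threshold are placeholders you would swap for your own values.

// A minimal sketch: the URL, user count, and threshold below are placeholders, not values from this post.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 50,                 // 50 concurrent virtual users
  duration: '30s',         // hold the load for 30 seconds
  thresholds: {
    http_req_duration: ['p(95)<500'], // flag the run if p95 latency goes above 500 ms
  },
};

export default function () {
  const res = http.get('https://api.example.com/posts'); // placeholder endpoint
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}

Running it with k6 run (against a file named, say, load-test.js) reports the achieved request rate and latency percentiles, which are exactly the numbers the next two steps need.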

Identifying the Bottleneck

Once I have the data, I hunt for the bottleneck. It could be in the hardware, application layer, or database.

1. Hardware Layer

If the server hardware can only handle 1000 requests per second max, no amount of code optimization will break that ceiling. Common issues include:

  • Too little RAM
  • Slow disk read/write speeds
  • Network bandwidth bottlenecks
  • CPU overload from too many simultaneous requests

If the problem’s here, the easiest fix might be… splashing cash on upgrades. Think more RAM, CPU, storage, or deploying a cluster with load balancing (essentially adding servers or instances). The endgame is pushing that 1000 limit higher. But I won’t focus on hardware here—the real excitement is in the application and database layers.

2. Application Layer

Issues might stem from:

  • Overloaded Requests: Each request should focus on one main task, not juggle too much at once. For example, if Messi or Ronaldo posts on Instagram, I can’t wait to notify millions of followers before responding—I’d push notifications to a queue and reply once the essentials are saved.
  • Too Many Synchronous Tasks: These can clog the system.
  • Suboptimal Code: Are there middlewares every request runs through? That’s where I’d dig in.
  • Third-Party Services: If a third-party service is slow or jammed, it can drag you down. Caching cuts down on those calls; if caching’s not an option, I’d handle it asynchronously to trim latency—queue the task and respond after saving the key data.

I always ask myself: Is there a way to offload some of the application’s work?
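As a sketch of that offloading idea, here is roughly how the Instagram-style example above could look with a queue: the handler saves the post, enqueues the fan-out, and responds immediately. I am assuming BullMQ backed by Redis here; savePost and notifyFollowers are hypothetical stand-ins for the real persistence and notification code.

// A sketch only: assumes BullMQ + Redis; savePost/notifyFollowers are hypothetical stubs.
import { Queue, Worker } from 'bullmq';

const connection = { host: 'localhost', port: 6379 };
const notificationQueue = new Queue('notifications', { connection });

// Hypothetical persistence call standing in for the real one.
async function savePost(authorId: string, content: string): Promise<{ id: string }> {
  return { id: `${authorId}-${Date.now()}` };
}

// Hypothetical fan-out standing in for paging through followers and pushing notifications.
async function notifyFollowers(postId: string): Promise<void> {
  console.log(`fanning out notifications for post ${postId}`);
}

// Request handler: save the essentials, enqueue the heavy work, respond right away.
export async function createPost(authorId: string, content: string) {
  const post = await savePost(authorId, content);
  await notificationQueue.add('fanOut', { postId: post.id }); // heavy work leaves the request path
  return { message: 'Posted', postId: post.id };
}

// A separate worker process drains the queue at its own pace.
new Worker('notifications', async (job) => notifyFollowers(job.data.postId), { connection });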

3. Database Layer

This is where problems love to pop up. Say my database maxes out at 1000 queries per second, and each request needs 2 queries—that caps the system at 500 requests per second.

How do I solve this? Two options:

  1. Increase That Max Number

    • Connection Pools: Reuse connections to cut setup latency.
    • Database Replicas: Offload reads to replicas, boosting read capacity without hurting write performance.
    • Query Optimization: Simpler queries mean more requests handled; effective indexing speeds things up (see the sketch after this list).
    • Sharding: Split data into smaller chunks for faster retrieval.
  2. Reduce Queries Per Request

    • Caching: Speeds up data access for the app and eases database load, cutting queries per request.
    • Batch Processing: Group requests to process them together instead of firing off individual queries. It’s like encouraging people to take buses or trains to ease traffic infrastructure strain.
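To make the indexing and connection-pool points concrete, here is a minimal sketch assuming Mongoose/MongoDB, mirroring the likeModel used in the example later in this post; the connection string and pool size are placeholders.

// A sketch only: assumes Mongoose/MongoDB; the URI and pool size are placeholders.
import mongoose, { Schema } from 'mongoose';

// A bigger pool lets more queries run in parallel before callers start queuing up.
await mongoose.connect('mongodb://localhost:27017/app', { maxPoolSize: 50 });

const likeSchema = new Schema({
  userId: { type: String, required: true },
  postId: { type: String, required: true },
  createdAt: { type: Date, default: Date.now },
});

// Compound unique index: "has this user liked this post?" becomes an index lookup, not a scan.
likeSchema.index({ userId: 1, postId: 1 }, { unique: true });
// Index on postId alone supports counting or paging the likes of a single post.
likeSchema.index({ postId: 1 });

export const LikeModel = mongoose.model('Like', likeSchema);

With the compound index in place, both the "has this user already liked this post?" check and the upserts in the batch sync below hit an index instead of scanning the collection.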

Real-World Case: Handling Like Requests on a Social Media Platform

A classic problem: Messi or Ronaldo, with hundreds of millions of Instagram followers, posts something. Say 1,000,000 people hit "like" in the first minute—that’s 17,000 like requests per second on average. Each request needs 2 queries:

  • One to create the like record
  • One to update the like count in the post table

That’s 34,000 queries per second—a big number.

Solution: Batch processing. Instead of writing directly to the database, I’d store likes in Redis first, then sync them to the database in batches.

async toggleLike(userId: string, postId: string) {
  const likeKey = `likes:${postId}`;
  const userLikeKey = `user_like:${userId}:${postId}`;
  const lockKey = `lock:${userId}:${postId}`;

  // Per-user lock so rapid double taps don't race each other
  const lock = await this.redis.set(lockKey, 'locked', 'NX', 'EX', 2);
  if (!lock) return { message: 'Processing like request, try again.' };

  try {
    // Read the user's like state from Redis, falling back to the database once
    let hasLiked = await this.redis.get(userLikeKey);
    if (hasLiked === null) {
      const existingLike = await this.likeModel.exists({ userId, postId });
      hasLiked = existingLike ? 'true' : 'false';
      await this.redis.set(userLikeKey, hasLiked, 'EX', 3600);
    }

    if (hasLiked === 'false') {
      await this.redis.incr(likeKey);
      await this.redis.set(userLikeKey, 'true', 'EX', 3600);
      await this.redis.rpush(`pending_likes:${postId}`, JSON.stringify({ userId, action: 'like' }));
    } else {
      await this.redis.decr(likeKey);
      await this.redis.del(userLikeKey);
      await this.redis.rpush(`pending_likes:${postId}`, JSON.stringify({ userId, action: 'unlike' }));
    }

    // Schedule a delayed batch sync; the shared jobId de-duplicates repeated schedules for the same post
    await this.likeQueue.add('syncLikes', { postId }, { jobId: postId, delay: 5000 });

    return { message: hasLiked === 'true' ? 'Unliked' : 'Liked' };
  } finally {
    await this.redis.del(lockKey);
  }
}

async syncLikes(postId: string) {
  const pendingLikesKey = `pending_likes:${postId}`;
  const processingLikesKey = `processing_likes:${postId}`;
  const likeKey = `likes:${postId}`;
  const lockKey = `lock:syncLikes:${postId}`;

  // Acquire lock; if another sync is running, exit
  const lock = await this.redis.set(lockKey, 'locked', 'NX', 'EX', 10);
  if (!lock) return;

  try {
    try {
      // Move pending actions to a separate processing queue
      await this.redis.rename(pendingLikesKey, processingLikesKey);
    } catch (error) {
      if (error.message.includes('no such key')) return; // No likes to process
      throw error;
    }

    const actions = await this.redis.lrange(processingLikesKey, 0, -1);
    if (actions.length === 0) return;

    // Keep only each user's latest action, so a like + unlike in the same window cancel out
    const userActions = new Map();
    for (const entry of actions) {
      const { userId, action } = JSON.parse(entry);
      userActions.set(userId, action);
    }

    const bulkOps = [];
    for (const [userId, action] of userActions) {
      if (action === 'like') {
        bulkOps.push({
          updateOne: {
            filter: { userId, postId },
            update: { $set: { userId, postId, createdAt: new Date() } },
            upsert: true,
          },
        });
      } else {
        bulkOps.push({ deleteOne: { filter: { userId, postId } } });
      }
    }

    if (bulkOps.length) {
      // TODO: handle batch size if needed
      await this.likeModel.bulkWrite(bulkOps);
    }

    const redisLikeCount = await this.redis.get(likeKey);
    if (redisLikeCount) {
      await this.postModel.updateOne({ _id: postId }, { $set: { likeCount: parseInt(redisLikeCount, 10) } });
    }

    // Finally, clear processed queue
    await this.redis.del(processingLikesKey);
  } finally {
    await this.redis.del(lockKey); // Release lock
  }
}

Coordinating with the Frontend to Address Congestion

Everything I’ve covered so far is from a backend developer’s perspective. But as a full-stack developer, I’ve also applied a few techniques to ease the pressure on the backend. Specifically:

  • Lazy Load: Only call the API when needed instead of fetching everything upfront—think virtual scroll, for example.
  • Calling Queue: Limit the number of concurrent requests from the frontend to avoid overwhelming the server with a flood of requests right after a page reload.
  • Debounce & Throttling: When users type into a search input, I only send a request after a brief pause with no changes (debounce), or I cap the number of requests sent within a set time frame (throttling); a small sketch of this and the calling queue follows below.
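Here is a minimal, framework-agnostic sketch of the last two items; the helper names and the 300 ms / 4-request limits are illustrative, not taken from a specific library.

// A sketch only: names and limits are illustrative.
// Debounce: send the search request only after the user stops typing for `delay` ms.
function debounce<T extends (...args: any[]) => void>(fn: T, delay: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: Parameters<T>) => {
    if (timer) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delay);
  };
}

const search = debounce((term: string) => {
  void fetch(`/api/search?q=${encodeURIComponent(term)}`); // one request per pause, not per keystroke
}, 300);

// Calling queue: run at most `limit` requests at a time instead of firing them all on page load.
async function runLimited<T>(tasks: Array<() => Promise<T>>, limit = 4): Promise<T[]> {
  const results: T[] = [];
  let next = 0;
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, async () => {
    while (next < tasks.length) {
      const current = next++;
      results[current] = await tasks[current](); // results keep their original order
    }
  });
  await Promise.all(workers);
  return results;
}

For example, wrapping a dashboard's widget fetches in runLimited loads them four at a time after a reload instead of hammering the server with everything at once.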

Conclusion

Every optimization challenge starts with identifying the bottleneck and ends with choosing the best fit for the real-world situation. There’s no universal formula—just trade-offs between performance, cost, and implementation complexity. But once you pinpoint the choke point, the solution starts to take shape.

Plus, every solution spawns smaller problems. Take caching, for instance: it can bring up issues like cache penetration or cache avalanche on the Redis layer. I might dive into those and discuss solutions in future posts.