
System design interview


System design interviews are a critical component of technical interviews, particularly for software engineering roles. In these interviews, candidates are asked to design a system that meets a particular set of requirements and constraints. The goal is to evaluate the candidate's ability to think systematically and to design solutions that are scalable, fault-tolerant, and performant. In this article, we'll explore the different aspects of a system design interview.

What qualities companies are looking for in candidates

Problem Exploration

This refers to how well the candidate can analyze the problem and identify the key requirements and constraints. Companies are looking for candidates who can ask the right questions, clarify assumptions, and propose creative solutions.

For example, if the problem is to design a social media platform, the candidate should explore the different features that the platform needs to support, such as user profiles, friend connections, news feeds, and messaging. They should also consider the scalability and performance requirements of the platform, as well as any security or privacy constraints.

Proxy Evaluation:

  • Functional requirements
  • Non-functional requirements
  • Assumptions

Handling Data

This refers to the candidate's ability to handle data efficiently at scale, which includes skills such as data modeling, data storage, and data retrieval.

For example, if the problem is to design a search engine, the candidate should consider how to store and retrieve a large index of web pages efficiently. They should also consider how to rank the search results based on relevance, which requires sophisticated data modeling and algorithms.

Proxy Evaluation:

  • Data API
  • Data Storage

Component Responsibilities

This refers to how well the candidate can identify and define the responsibilities of each component of the system. Companies are looking for candidates who can think systematically and have a strong understanding of the underlying architecture.

For example, if the problem is to design an e-commerce platform, the candidate should consider the different components of the system, such as the user interface, the database, the payment gateway, and the shipping logistics. They should also consider how these components interact with each other and how they can be scaled and optimized for performance.

Proxy Evaluation:

  • Separation of concerns between services
  • Separation of concerns between databases
  • API Gateway/Load Balancer

Completeness of Solution

Provide a complete solution that satisfies all the requirements and constraints of the problem. This includes considerations such as scalability, fault tolerance, and performance.

For example, if the problem is to design a streaming video platform, the candidate should consider how to handle the storage and delivery of video content at scale. They should also consider how to ensure high availability and fault tolerance in case of network or server failures.

Proxy Evaluation:

  • Addressing all functional requirements
  • Addressing all non-functional requirements
  • API and system completeness

Tradeoffs

Identify and explain the tradeoffs of different design decisions. This includes tradeoffs such as complexity vs. simplicity, consistency vs. availability, and cost vs. performance.

For example, if the problem is to design a chat application, the candidate should consider the tradeoff between message delivery latency and message consistency. They should also consider the tradeoff between using a centralized or decentralized architecture for the application.

Proxy Evaluation:

  • Coming up with at least two solutions
  • SQL vs NoSQL, depending on data throughput
  • REST vs gRPC vs GraphQL
  • Pull vs Push
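The pull vs push tradeoff is easiest to see in a feed, such as the news feed of the social media example. This toy sketch (all class and field names are hypothetical) contrasts fan-out on write, where a post is copied to every follower when it is written, with fan-out on read, where a reader's feed is merged from the authors they follow when it is read. The operation counters show where each approach pays its cost.

```python
from collections import defaultdict
from itertools import count

_sequence = count()  # global ordering of posts, standing in for timestamps


class PushFeed:
    """Fan-out on write: each post is copied into every follower's inbox."""

    def __init__(self, followers: dict[str, list[str]]):
        self.followers = followers  # author -> followers
        self.inbox: dict[str, list[str]] = defaultdict(list)
        self.write_ops = 0

    def post(self, author: str, message: str) -> None:
        for follower in self.followers.get(author, []):
            self.inbox[follower].append(message)
            self.write_ops += 1  # cost grows with the author's follower count

    def read(self, user: str) -> list[str]:
        return list(self.inbox[user])  # cheap: the feed is already assembled


class PullFeed:
    """Fan-out on read: posts stay with the author and are merged when read."""

    def __init__(self, following: dict[str, list[str]]):
        self.following = following  # user -> authors they follow
        self.outbox: dict[str, list[tuple[int, str]]] = defaultdict(list)
        self.read_ops = 0

    def post(self, author: str, message: str) -> None:
        self.outbox[author].append((next(_sequence), message))  # cheap: one write

    def read(self, user: str) -> list[str]:
        merged: list[tuple[int, str]] = []
        for author in self.following.get(user, []):
            merged.extend(self.outbox[author])
            self.read_ops += 1  # cost grows with the number of accounts followed
        return [message for _, message in sorted(merged)]  # chronological order
```

Push keeps reads cheap but makes a post by an account with millions of followers expensive; pull does the opposite. Hybrid designs that push for most accounts and pull for very popular ones are a common compromise.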

Quantitative Analysis

This refers to how well the candidate can analyze and quantify the performance of the system. Companies are looking for candidates who can use metrics and benchmarks to evaluate the effectiveness of the solution.

For example, if the problem is to design a recommendation engine, the candidate should consider how to measure the accuracy and relevance of the recommendations. They should also consider how to optimize the recommendation algorithms for performance and scalability.

Proxy Evaluation:

  • Reads per second, writes per second, read/write ratio
  • Storage consumption
  • Bandwidth consumption

Deep dive

Demonstrate a deep understanding of a particular aspect of the system. This includes being able to explain the underlying technology or algorithms, or being able to optimize a specific component for performance or scalability.

For example, if the problem is to design a content delivery network (CDN), the candidate should be able to explain the different caching and routing strategies that can be used to optimize content delivery. They should also be able to explain how to measure the effectiveness of the CDN and identify areas for optimization.

Proxy Evaluation:

  • Database sharding and partitioning
  • System scalability
  • Authentication
  • Security

Hands-on

  1. Functional requirements
  2. Non-functional requirements
  3. Quantitative analysis
  4. High-level design and data flow
  5. Data API (REST, GraphQL)
  6. Data schema and data store
  7. Optimization
  8. Core puzzle

1. Functional requirements

  • Understand the problem domain and user needs to identify the key functional requirements
  • Prioritize requirements based on their importance to users and the system's goals
  • Be able to explain trade-offs between different requirements and how they impact the system's design and implementation
  • Be careful with assumptions: a wrong assumption can lead you to solve a different problem from the one being asked

Example:

- Use cases
- Stakeholders
- "As a user, I can post a comment on a post"

2. Non-functional requirements

  • Be familiar with common non-functional requirements such as performance, scalability, security, and usability
  • Understand how non-functional requirements affect the system's architecture and design
  • Be able to propose solutions for meeting non-functional requirements and explain trade-offs between different options

Example:

- Reliability: 99.99% system availability
- Scalability: handles traffic as it scales up and down
- Security: only one public endpoint; code is executed safely
- Durability: data is stored for 10 years
- Latency: p95 of 200 ms
- High availability vs strong consistency
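To make the reliability and latency targets above concrete, this sketch converts an availability percentage into a yearly downtime budget and computes a p95 from latency samples. The samples are made up, and the nearest-rank method used here is only one of several common percentile definitions; many monitoring systems interpolate instead.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_budget_minutes(availability: float) -> float:
    """Maximum downtime per year permitted by an availability target (e.g. 0.9999)."""
    return (1 - availability) * MINUTES_PER_YEAR


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    print(f"99.99% availability -> {downtime_budget_minutes(0.9999):.1f} min/year")
    latencies_ms = [120, 95, 180, 210, 150, 130, 110, 170, 190, 140]  # made-up samples
    print(f"p95 latency = {percentile(latencies_ms, 95)} ms")
```

A 99.99% target leaves about 52.6 minutes of downtime per year. With these made-up samples the p95 is 210 ms, which would miss a 200 ms p95 target even though most requests are well under it.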

3. Quantitative analysis

  • Be comfortable with basic statistics and data analysis techniques
  • Understand how to collect and analyze data to inform system design and optimization
  • Be able to explain how quantitative analysis can help identify bottlenecks and areas for improvement in the system

Example:

- Number estimation (how many users, how often each use case occurs, read-heavy vs write-heavy, read to write ratio)
- Reads per second and writes per second; read to write ratio
- Storage consumption
- Bandwidth (not always important)

Example with numbers:

Assumptions:
- Active users: 400 million/day
- Only 20% of users write a comment (one comment each per day)
- Read to write ratio: 100 : 1
- Use cases:
  - post comment
  - read comment

Numbers:
- 1 day = 86,400 sec ≈ 10^5 sec
- Writes per second:
  400 million/day × 20% = 8 × 10^7 writes/day
  8 × 10^7 / 10^5 = 800 ≈ 10^3 writes/sec
- Reads per second:
  100 × writes per second
  10^2 × 10^3/sec
  10^5 reads/sec
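The same back-of-the-envelope arithmetic can be scripted so that the assumptions are easy to vary. This is a minimal sketch: the user count, writer share, and read to write ratio mirror the assumptions listed above, while one comment per writing user per day and 200 bytes per comment are hypothetical values added for illustration. Passing `seconds_per_day=100_000` reproduces the ≈ 10^5 rounding; the default uses the exact 86,400.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    writes_per_sec: float
    reads_per_sec: float
    storage_bytes_per_year: float


def estimate(daily_active_users: int,
             writer_fraction: float,
             comments_per_writer_per_day: int,
             read_write_ratio: int,
             bytes_per_comment: int,
             seconds_per_day: int = 86_400) -> Estimate:
    """Back-of-the-envelope load and storage numbers for the comment service."""
    writes_per_day = daily_active_users * writer_fraction * comments_per_writer_per_day
    writes_per_sec = writes_per_day / seconds_per_day
    return Estimate(
        writes_per_sec=writes_per_sec,
        reads_per_sec=writes_per_sec * read_write_ratio,
        storage_bytes_per_year=writes_per_day * 365 * bytes_per_comment,
    )


if __name__ == "__main__":
    # 1 comment per writer per day and 200 bytes per comment are hypothetical.
    rough = estimate(400_000_000, 0.2, 1, 100, 200, seconds_per_day=100_000)
    exact = estimate(400_000_000, 0.2, 1, 100, 200)
    print(f"rough: {rough.writes_per_sec:.0f} writes/s, {rough.reads_per_sec:.0f} reads/s")
    print(f"exact: {exact.writes_per_sec:.0f} writes/s, {exact.reads_per_sec:.0f} reads/s")
    print(f"storage: {rough.storage_bytes_per_year / 1e12:.2f} TB/year")
```

With these inputs the rounded run gives 800 writes/s and 80,000 reads/s (the exact day length gives about 926 and 92,593), and storage grows by roughly 5.84 TB per year, or about 58 TB over a 10-year retention period before replication.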

4. High-level design and data flow

  • Be able to create a high-level architecture diagram that shows the key components and their interactions
  • Understand how to break down the system into smaller components that can be developed and tested independently
  • Be able to explain how data flows through the system and identify potential bottlenecks or areas for optimization

Example:

- Draw the main components: User → Load Balancer → API Gateway → Service → Database
- Draw the arrows between them
- If data flows through multiple systems, number each step to show the order of the data flow
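The data flow can also be written down as an ordered list of hops. The sketch below is purely illustrative: the gateway instance names and `comment_service` are hypothetical, and the load balancer is modeled as simple round-robin over API gateway instances, which is only one of many balancing strategies.

```python
from itertools import cycle
from typing import Callable


def make_router(gateway_instances: list[str]) -> Callable[[], list[str]]:
    """Build a router that sends each request along
    User -> Load Balancer -> API Gateway -> Service -> Database
    and returns the numbered hops, so the order of the data flow is explicit."""
    if not gateway_instances:
        raise ValueError("need at least one gateway instance")
    gateways = cycle(gateway_instances)  # round-robin load balancing

    def route() -> list[str]:
        gateway = next(gateways)  # the load balancer picks a gateway instance
        hops = ["user", "load_balancer", gateway, "comment_service", "database"]
        return [f"{step}. {hop}" for step, hop in enumerate(hops, start=1)]

    return route
```

With `make_router(["gateway-a", "gateway-b"])`, the first request's third hop is `3. gateway-a` and the second request's is `3. gateway-b`; every other hop stays in the same numbered order.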

5. Data API

  • Understand the differences between REST and GraphQL and when to use each one
  • Be able to design a data API that meets the system's requirements and is easy to use for other developers
  • Understand how to handle errors and edge cases in the API and ensure it is secure and scalable

Example:

- REST vs GraphQL
- POST vs GET
- Define both the request and the response

REST:
POST /:post_id/comment
request = {
  auth_token: string,
  post_id: UUID,
  comment: string
}

response = {
  comment_id: UUID
}
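Here is a framework-free sketch of a handler for the `POST /:post_id/comment` endpoint, assuming the request and response shapes shown above. `VALID_TOKENS` and `COMMENTS` are hypothetical in-memory stand-ins for an auth service and a database, and the status codes (201, 400, 401) are a conventional choice rather than something the design prescribes.

```python
import uuid

# Hypothetical stand-ins for a real auth service and comment database.
VALID_TOKENS = {"token-123": "user-1"}
COMMENTS: dict[uuid.UUID, dict] = {}


def post_comment(path_post_id: str, request: dict) -> tuple[int, dict]:
    """Handle POST /:post_id/comment and return (HTTP status, response body)."""
    user_id = VALID_TOKENS.get(request.get("auth_token", ""))
    if user_id is None:
        return 401, {"error": "invalid or missing auth_token"}

    try:
        post_id = uuid.UUID(path_post_id)
        body_post_id = uuid.UUID(str(request.get("post_id", "")))
    except ValueError:
        return 400, {"error": "post_id must be a UUID"}
    if post_id != body_post_id:
        return 400, {"error": "post_id in the path and the body differ"}

    comment = request.get("comment")
    if not isinstance(comment, str) or not comment.strip():
        return 400, {"error": "comment must be a non-empty string"}

    comment_id = uuid.uuid4()
    COMMENTS[comment_id] = {"post_id": post_id, "user_id": user_id, "comment": comment}
    return 201, {"comment_id": str(comment_id)}
```

Because `post_id` appears in both the path and the body, the handler rejects requests where the two disagree. Sending the token in an `Authorization` header instead of the request body is a common alternative.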

6. Data schema and data store

  • Understand the trade-offs between different types of data stores such as relational databases, NoSQL databases, and file systems
  • Be able to design a data schema that meets the system's requirements and is easy to maintain and extend
  • Understand how to ensure data consistency and integrity in the data store

Example:

- SQL vs NoSQL
- Data types
- Object storage
- In-memory storage, caching, message queues
- Database partitioning/sharding
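As one possible relational layout for the comment example, this sketch uses Python's built-in `sqlite3` with one in-memory database per shard. The table, column, and index names are hypothetical. Rows are hash-partitioned by `post_id` so that all comments of a post live on the same shard, and a composite index on `(post_id, created_at)` serves the "read comment" use case.

```python
import hashlib
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE IF NOT EXISTS comments (
    comment_id TEXT PRIMARY KEY,   -- UUID stored as text
    post_id    TEXT NOT NULL,
    user_id    TEXT NOT NULL,
    body       TEXT NOT NULL,
    created_at INTEGER NOT NULL    -- Unix epoch seconds
);
-- Serves reading all comments of a post without a full table scan.
CREATE INDEX IF NOT EXISTS idx_comments_post_created
    ON comments (post_id, created_at);
"""


def shard_for(post_id: str, num_shards: int) -> int:
    """Hash partitioning: every comment of one post maps to the same shard."""
    digest = hashlib.sha256(post_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def open_shards(num_shards: int) -> list[sqlite3.Connection]:
    """One in-memory SQLite database per shard, each with the same schema."""
    shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
    for conn in shards:
        conn.executescript(SCHEMA)
    return shards


def insert_comment(shards: list[sqlite3.Connection], post_id: str,
                   user_id: str, body: str, created_at: int) -> str:
    comment_id = str(uuid.uuid4())
    conn = shards[shard_for(post_id, len(shards))]
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO comments (comment_id, post_id, user_id, body, created_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (comment_id, post_id, user_id, body, created_at),
        )
    return comment_id


def comments_for_post(shards: list[sqlite3.Connection], post_id: str) -> list[str]:
    conn = shards[shard_for(post_id, len(shards))]
    rows = conn.execute(
        "SELECT body FROM comments WHERE post_id = ? ORDER BY created_at",
        (post_id,),
    )
    return [body for (body,) in rows]
```

Plain modulo hashing remaps most keys whenever the number of shards changes; consistent hashing is the usual way to limit that movement. A stable hash such as SHA-256 is used instead of Python's built-in `hash()`, which is randomized per process for strings.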

7. Optimization

  • Understand common optimization techniques such as caching, indexing, and load balancing
  • Be able to identify bottlenecks in the system and propose solutions for improving performance and scalability
  • Understand how to measure the impact of optimizations and balance the cost and benefit of each one

Example:

- Authentication and authorization
- Monitoring
- Mobile-specific knowledge
  - Battery
  - Offline
- Tradeoffs
  - SQL vs NoSQL
  - Read-heavy vs write-heavy
  - REST vs GraphQL vs gRPC (Protobuf)
  - Pull vs Push
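Caching is usually the first optimization for a read-heavy workload such as one with 100 reads per write. The following is a minimal cache-aside sketch with least-recently-used eviction; the `loader` callback stands in for a hypothetical database query, and the hit and miss counters show one way to measure whether the cache pays off.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUCache:
    """Cache-aside helper with least-recently-used eviction and hit counters."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], object]) -> object:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = loader(key)  # e.g. a database query on a cache miss
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a stale entry, e.g. after a new comment is written for this post."""
        self._entries.pop(key, None)

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

After a new comment is written, calling `invalidate(post_id)` keeps readers from seeing a stale list until the entry would otherwise be evicted; choosing between invalidation, expiry times, and write-through caching is itself a tradeoff.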

8. Core puzzle

  • Be able to identify the key features or components of the system that are critical to its success
  • Understand how to prioritize development efforts based on the importance of each core puzzle component
  • Be able to explain how the core puzzle fits into the overall system design and how it contributes to meeting user needs.
