How WhatsApp scaled to 1 billion users with only 50 engineers

https://www.quastor.org/p/how-whatsapp-scaled-to-1-billion

Hey Everyone,

Today we’ll be talking about

  • How WhatsApp scaled to 1 billion users with only 50 engineers.
    • WhatsApp’s Engineering Culture
    • WhatsApp’s Tech Stack
      • Erlang
      • FreeBSD
      • SoftLayer
  • How to quickly learn new programming languages and frameworks
    • This is from a recent discussion on Hacker News. We summarized some of the best comments.
  • Plus, some tech snippets on
    • An in-depth dive into the Hadoop Distributed File System (HDFS)
    • Concurrency in Java vs. Go
    • An amazing free textbook on math for programmers (covers Calculus, Linear Algebra and Discrete Math)

Plus, we have a solution to our last coding interview question on creating a Spiral Matrix and a new interview question from Facebook.

Quastor Daily is a free Software Engineering newsletter that sends out Technical Deep Dives, summaries of Engineering Blog Posts, and FAANG Interview questions (with detailed solutions).


How WhatsApp served 1 billion users with only 50 engineers.

In 2016, WhatsApp reached more than a billion users and had the following load stats

  • 42 billion messages sent daily
  • 1.6 billion pictures sent daily
  • 250 million videos sent daily

They managed to serve this scale with only 50 engineers.

Here’s a dive into the engineering culture and tech stack that made this possible.

Engineering Culture

WhatsApp’s Engineering culture consists of 3 main principles

  1. Keep Things Small
  2. Keep Things Simple
  3. Have a Single Minded Focus on the Mission

Keep Things Small

WhatsApp consciously keeps the engineering staff small to only about 50 engineers.

Individual engineering teams are also small, consisting of 1 – 3 engineers, and each team is given a great deal of autonomy.

In terms of servers, WhatsApp prefers to use a smaller number of servers and vertically scale each server to the highest extent possible.

Their goal was previously to have 1 million users for every server (but that’s become more difficult as they’ve added more features to the app and as users are generating more activity on a per-user basis).

Having fewer servers means fewer things can break down, which makes operations easier for the team to handle.

The same goes for the software side where they limit the total number of systems and components in production.

That means fewer systems that have to be developed, deployed and supported.

There aren’t many systems/components that are developed and then put into maintenance mode (to eventually become orphans until something goes wrong).

Keep Things Simple

WhatsApp uses the mantra Just Enough Engineering.

They avoid over-investing in systems and components.

Instead, they focus on building just enough for scalability, security and reliability.

One of the key factors when they make technical choices is “what is the simplest approach?”

Also, they avoid investing in automation unless it’s completely necessary.

Have a Single Minded Focus on the Mission

Product Design at WhatsApp is incredibly focused.

It’s dedicated to delivering a core communications app with a great UI.

They avoid extra bells and whistles and don’t implement features that aren’t exclusively focused on core communications.

The simpler product makes it much easier to maintain and scale.

Tech Stack

The tech stack revolves around 3 core components: Erlang, FreeBSD and SoftLayer.

Erlang

Erlang is the programming language of choice for WhatsApp’s backend systems.

Erlang was designed for concurrency from the start, and fault tolerance is a first class feature of the language.

You can read more about Erlang’s fault tolerance here.

Developer productivity with Erlang is also extremely high. However, it is a functional language, so it takes a little getting used to if you’re not familiar with the paradigm.

The language is very concise and it’s easy to do things with very few lines of code.

The OTP (Open Telecom Platform) is a collection of open source middleware, libraries and tools for Erlang.

WhatsApp tries to avoid dependencies as much as possible, but they do make use of Mnesia, a distributed database that’s part of OTP.

Erlang also brings the ability to hotswap code. You can take new application code and load it into a running application without restarting the application.

This makes the iteration cycle very quick and allows WhatsApp to release quick fixes and have extremely long uptimes for their services.

To see exactly how WhatsApp’s backend is built with Erlang, you can watch this talk from 2018.

FreeBSD

FreeBSD is the OS WhatsApp uses for their servers.

The decision to use FreeBSD was made by the founders of WhatsApp, based on their previous experience at Yahoo!

The founders (and a lot of the early team) all used to be part of Yahoo!, where FreeBSD was used extensively.

To see exactly how WhatsApp uses FreeBSD, you can watch this talk.

Just note, the talk is from 2014, so some things may be out of date now.

SoftLayer

SoftLayer is the hosting platform that WhatsApp was using in 2016.

They picked SoftLayer for two main reasons

  1. The availability of FreeBSD as a first class operating system.
  2. The ability to order and operate bare metal servers.

However, SoftLayer is owned by IBM (part of IBM public cloud), and WhatsApp has since moved off SoftLayer to use Facebook’s infrastructure.

They made the transition in 2017.

Tools and Techniques Used to Increase Scalability

You can view the full talk on WhatsApp’s engineering here.

Get more specific details from this High Scalability post on WhatsApp Engineering.




How to learn new programming languages and frameworks quickly

This is an interesting discussion on Hacker News about how to learn new things quickly.

Here are some of the top answers summarized.

  • For a new programming language, there are a few standard things you should try to implement. Write several small programs that
    • Read from and write to a file
    • Turn a structured object into JSON
    • Parse JSON into an object
    • Run as a basic script from the CLI, parsing flags/args and reading stdin
    • Send an HTTP request
    • Implement a very basic web server
    When you’re doing these exercises, do not copy/paste any code. You should be typing out the code yourself.
  • Try to bias towards reading (blogs, documentation, code, books, etc.) over watching video tutorials. Reading is a lot faster than watching video and you can be far more efficient with your time.
  • A 3 step process for learning new frameworks is to divide it up into a week’s work (10 – 15 hour total commitment)
    • Prepare – Spend 1 hour Monday through Thursday reading documentation, books, and watching content. This way you’ll have time to sleep on the concepts.
    • Plan – On Friday, spend 1 hour preparing a small project idea for the weekend that will use all the concepts you learned over the week.
    • Project – Spend 4 – 6 hours building the project over the weekend.
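As a rough illustration of the starter exercises listed above, here is a minimal Python sketch that combines several of them: parsing CLI flags, reading from an input stream, turning an object into JSON, writing it to a file, and parsing it back. The `User` type, the `--name` and `--out` flags, and the helper names are all hypothetical and chosen only for this example.

```python
import argparse
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class User:
    """Hypothetical record type used only for this exercise."""
    name: str
    languages: list


def to_json(user):
    # Exercise: turn a structured object into JSON.
    return json.dumps(asdict(user))


def from_json(text):
    # Exercise: parse JSON back into an object.
    return User(**json.loads(text))


def main(argv, stdin):
    # Exercise: a basic CLI script that parses flags and reads stdin.
    parser = argparse.ArgumentParser(description="Save a user record as JSON.")
    parser.add_argument("--name", required=True)
    parser.add_argument("--out", required=True)
    args = parser.parse_args(argv)

    # One language per input line; blank lines are skipped.
    languages = [line.strip() for line in stdin if line.strip()]
    user = User(name=args.name, languages=languages)

    # Exercise: write to a file, then read it back.
    out_path = Path(args.out)
    out_path.write_text(to_json(user), encoding="utf-8")
    return from_json(out_path.read_text(encoding="utf-8"))
```

To run it as a real script, you could call `main(sys.argv[1:], sys.stdin)` after importing `sys`. Passing the argument list and input stream as parameters is just one design choice; it keeps the function easy to exercise without a terminal.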



Tech Snippets

  • A Programmer’s Introduction to Mathematics – If you’re looking to learn more about areas like machine learning, you might have to brush up on your math background. Jeremy Kun wrote a fantastic textbook on math for programmers that covers topics in Calculus (Single and Multivariable), Linear Algebra, and Discrete Math. You can set your own price when you purchase the e-book, so you can pay anything from $0 to $100.
  • Why you can have millions of Goroutines but only thousands of Java Threads – If you’ve been working with JVM based languages for a while, you’ve probably come across a situation where you’ve reached the limit of the number of concurrent threads you can have. On your personal computer, this limit is usually around 10,000 threads. On the other hand, you can have more than a hundred million goroutines on a laptop with Go. This article explores why you can have so many more Goroutines than threads. There are two main reasons why
    1. The JVM delegates threading to operating system threads. Go implements its own scheduler that allows many Goroutines to run on the same OS thread.
    2. The JVM defaults to a 1 MB stack per thread. Go’s stacks are dynamically sized and a new goroutine will have a stack of about 4 KB.
  • An in-depth dive into HDFS (Hadoop Distributed File System) – The article describes the architecture of HDFS and also goes into the experience of using HDFS to manage 40 petabytes of data at Yahoo!
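To put the stack sizes from the Goroutines snippet above into perspective, here is a back-of-the-envelope calculation in Python. It uses only the figures quoted there (a 1 MB default JVM thread stack, a roughly 4 KB initial goroutine stack, and about 10,000 threads) and assumes binary units. The helper name `stack_bytes` is made up for this sketch, and the JVM number is best read as reserved address space, which is typically not all committed memory.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

JVM_THREAD_STACK = 1 * MIB   # default JVM stack per thread, per the snippet
GOROUTINE_STACK = 4 * KIB    # approximate initial goroutine stack, per the snippet


def stack_bytes(count, stack_size):
    """Total stack space (in bytes) for `count` tasks of `stack_size` each."""
    return count * stack_size


threads = 10_000
jvm_total = stack_bytes(threads, JVM_THREAD_STACK)
go_total = stack_bytes(threads, GOROUTINE_STACK)

print(f"{threads} JVM threads: {jvm_total / GIB:.2f} GiB of stack")
print(f"{threads} goroutines:  {go_total / MIB:.2f} MiB of stack")
print(f"ratio: {JVM_THREAD_STACK // GOROUTINE_STACK}x")
```

Under these assumptions the per-task stack cost differs by a factor of 256, which is one reason the same machine can hold far more goroutines than threads.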

Interview Question

Implement a BST Iterator class that represents an iterator over the in-order traversal of a Binary Search Tree.

Implement the following methods

  • BSTIterator – constructor. The root of the BST will be passed in as a parameter. The pointer should be initialized to a non-existent number smaller than any number in the BST.
  • hasNext – Returns true if there exists a number in the traversal to the right of the pointer, otherwise returns false.
  • next – Moves the pointer to the right, then returns the number at the pointer.



Previous Solution

As a reminder, here’s our last question

Given a positive integer i, generate an i x i matrix filled with elements from 1 to i^2 in spiral order.

Example

Input: i = 3

Output: [[1,2,3],[8,9,4],[7,6,5]]

Here’s the question in LeetCode.

Solution

We can solve this question by creating an i x i matrix and then looping through it in a spiral fashion and setting each element to a value from 1 to i^2.

In order to loop through in spiral fashion, we’ll first create 4 variables: left, right, top and bot.

These 4 variables represent pointers to the 4 different sides of our matrix (left column, right column, top row, bottom row).

Left will point to the 0th column, right will point to the last column, top will point to the 0th row and bottom will point to the last row.

When we iterate through our matrix in spiral order, we’ll do it in the following way

  1. Iterate through the top row from left to right. Now, increment the top pointer.
  2. Iterate through the right column from top to bottom. Now, decrement the right pointer.
  3. Iterate through the bottom row from right to left. Now, decrement the bottom pointer.
  4. Iterate through the left column from bottom to top. Now, increment the left pointer.

We’ll iterate through these steps while the left pointer is less than or equal to the right pointer and the top pointer is less than or equal to the bottom pointer.

When our loop terminates, we can return the completed matrix.

Here’s the Python 3 code.
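The following is a minimal sketch of the approach described above. The function name `generate_matrix` and the counter variable `value` are assumptions for illustration; the pointer names follow the description.

```python
def generate_matrix(i):
    """Return an i x i matrix filled with 1..i^2 in clockwise spiral order."""
    matrix = [[0] * i for _ in range(i)]

    # Pointers to the four sides of the region that is still unfilled.
    left, right = 0, i - 1
    top, bot = 0, i - 1
    value = 1

    while left <= right and top <= bot:
        # 1. Top row, left to right.
        for col in range(left, right + 1):
            matrix[top][col] = value
            value += 1
        top += 1

        # 2. Right column, top to bottom.
        for row in range(top, bot + 1):
            matrix[row][right] = value
            value += 1
        right -= 1

        # 3. Bottom row, right to left.
        for col in range(right, left - 1, -1):
            matrix[bot][col] = value
            value += 1
        bot -= 1

        # 4. Left column, bottom to top.
        for row in range(bot, top - 1, -1):
            matrix[row][left] = value
            value += 1
        left += 1

    return matrix


print(generate_matrix(3))  # [[1, 2, 3], [8, 9, 4], [7, 6, 5]]
```

Because the matrix is square, the ranges in steps 2 to 4 are simply empty once the pointers cross, so no extra bounds checks are needed inside the loop.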



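There are many stories about WhatsApp, and one of the more interesting is how it reached such enormous scale with such a small team.

In 2014, Facebook acquired the messaging app WhatsApp for $16 billion: $4 billion in cash and $12 billion worth of Facebook stock, plus an additional $3 billion in restricted stock for WhatsApp’s founders and team.

When it was acquired, WhatsApp had only 35 engineers and more than 450 million users. Today it employs about 50 engineers, and although the number of WhatsApp users has more than doubled, that huge user base still runs on this small group of engineers. Asked about the reasons for its success, WhatsApp software engineer Jamshid Mahdavi says that part of the trick is that the company builds its service in a programming language called Erlang. Although Erlang is not very popular in the wider coding community, it is particularly well suited to handling communications from a large number of users, and it lets engineers deploy new code on the fly. But Mahdavi says attitude matters as much as technology. It comes down to two things: a pragmatic, minimalist engineering culture and the right technology choices made WhatsApp what it is today.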

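WhatsApp’s engineering culture consists of three main principles: 1. Keep things small; 2. Keep things simple; 3. Have a single minded focus on the mission.

Keep Things Small

WhatsApp deliberately keeps its engineering staff small, at only about 50 engineers. Individual engineering teams are also small, consisting of 1 – 3 engineers, and each team has a great deal of autonomy.

In terms of servers, WhatsApp prefers to use a smaller number of servers and to scale each one vertically as far as possible.

Previously, their goal was 1 million users per server (but this has become harder as they have added more features to the app and as per-user activity has grown).

Fewer servers mean fewer failures, which makes things easier for the team to handle. The same goes for software: they limit the total number of systems and components in production. That means fewer systems to develop, deploy and support. Few systems or components are developed and then put into maintenance mode (eventually becoming orphans until something goes wrong).

Keep Things Simple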

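WhatsApp’s mantra is “don’t over-engineer”. They avoid over-investing in systems and components. Instead, they focus on building just enough for scalability, security and reliability.

When making technical choices, one of the key factors is “what is the simplest approach?” They also avoid investing in automation unless it is completely necessary.

Have a Single Minded Focus on the Mission

WhatsApp’s product design is very focused. Its main purpose is to deliver a core communications app with a good user interface. They avoid extra bells and whistles and do not implement features that are not focused squarely on core communications. WhatsApp believes that a simpler product is easier to maintain and scale.

WhatsApp’s Tech Stack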

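Beyond the minimalist engineering culture, picking the right technical direction and building a sensible tech stack is another of the secrets to WhatsApp’s success. The tech stack revolves around three core components: Erlang, FreeBSD and SoftLayer.

Erlang

Erlang is the programming language of choice for WhatsApp’s backend systems. Erlang was designed for concurrency from the start, and fault tolerance is a key feature of the language.

To learn more about Erlang’s fault tolerance, see:

https://stackoverflow.com/questions/3172542/are-erlang-otp-messages-reliable-can-messages-be-duplicated/3176864#3176864

Developer productivity with Erlang is also very high. However, it is a functional language, so it takes some getting used to if you are not familiar with the paradigm. The language is very concise, and a few lines of code go a long way.

OTP (Open Telecom Platform) is a collection of open source middleware, libraries and tools for Erlang.

WhatsApp tries to avoid dependencies as much as possible, but they do use Mnesia, a distributed database that is part of OTP.

Erlang also brings the ability to hotswap code. New application code can be loaded into a running application without restarting it. This makes the iteration cycle very quick, lets WhatsApp release fixes quickly, and gives their services very long uptimes.

To see exactly how WhatsApp’s backend is built with Erlang, see:

https://www.youtube.com/watch?v=LJx6mUEFAqQ

More importantly, Erlang lets coders work at high speed, which is another important part of modern software development. It provides a way to deploy new code to an application while the application keeps running. In an era of constant change, this is more useful than ever.

Erlang does have its drawbacks. One of the main ones is that relatively few programmers know Erlang, and it does not necessarily mesh with much of the code that today’s internet companies have already built. Facebook built its original Facebook Chat application in Erlang but eventually rebuilt it to fit better with the rest of its infrastructure.

FreeBSD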

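FreeBSD is the operating system WhatsApp uses for its servers.

The decision to use FreeBSD was made by WhatsApp’s founders, based on their previous experience at Yahoo!. The founders (and much of the early team) all used to be part of Yahoo!, where FreeBSD was used extensively.

To see exactly how WhatsApp uses FreeBSD, see the talk below (note that it is from 2014, so some of its content may be out of date now):

https://www.youtube.com/watch?v=TneLO5TdW_M

SoftLayer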

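SoftLayer is the hosting platform that WhatsApp was using in 2016. They picked SoftLayer for two main reasons:

  1. The availability of FreeBSD as a first class operating system.
  2. The ability to order and operate bare metal servers.

However, SoftLayer is owned by IBM (part of IBM public cloud), and WhatsApp has since moved off SoftLayer to use Facebook’s infrastructure. They began the transition in 2017.

Conclusion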

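WhatsApp’s low-profile engineers have not revealed many more of the company’s secrets to the outside world. When asked to explain the secret of the company’s success, Mahdavi’s answer was simple.

He said:

“The company succeeded because it hired engineers who are highly adaptable, in more ways than one. The most important thing is to be very focused on what you need to do. Don’t spend time on other activities, other technologies, or even things around the office, such as meetings. At WhatsApp, employees almost never attend meetings, and that is key too.”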
