Wednesday, July 8, 2020

How to Design Twitter (Part 1)

From last week's survey, most of our subscribers want to know more about system design interviews. It's hard to provide tips without an example, which is why in this post I'll use a very popular system design question to show you how to solve it.

It's also worth noting that system design interview questions can be extremely open-ended, so there's no such thing as a standard answer. Even for the same question, you'll have a totally different discussion with different interviewers. As a result, don't focus too much on the question or the solution itself. Instead, try to generalize patterns and strategies from the analysis.

We'll start with this “simple” question: how would you design Twitter?

Note: I have never worked at Twitter, and the whole article is only meant to illustrate system design ideas. Also, thanks to the whole Gainlo team, as we worked together to provide all these analyses.

Common misunderstandings

Before we jump into the analysis, I'd like to clarify one of the most common misunderstandings among job seekers.

When asked to design Twitter, many people tend to dive into technical details immediately. One common behavior is to list a bunch of tools or frameworks like MongoDB, Bootstrap, MapReduce, etc. and try to explain which particular technology we should use.

What interviewers really want is high-level ideas about how you will solve the problem. It doesn't matter which tools you use; what truly matters is how you define the problem, how you design the solution, and how you analyze the issue step by step. These are exactly what separate junior developers from senior engineers. If you haven't checked this post, you should also take a look.

Define the problem

Let's get started with the problem: how to design Twitter. System design questions are usually very general and thus not well-defined. That's why many people don't know how to get started. This is similar to real-life projects.
Therefore, the first thing we should do is to define the problem further, making it clearer and more concrete. In this problem, what I would do first is compress Twitter to its MVP (minimum viable product). In other words, we will only design the core features of Twitter instead of everything.

So the whole product should allow people to follow each other and view others' feeds. It's as simple as that. (If any other feature is needed, the interviewer should be able to clarify.) Anything else, like registration, moments, security, etc., is out of the scope of the discussion.

High-level solution

As we said before, don't jump into all the details immediately, which will confuse both the interviewer and yourself. The common strategy I would use here is to divide the whole system into several core components. There are quite a lot of ways to divide it; for example, you can divide by frontend/backend, offline/online logic, etc.

In this question, I would design solutions for the following two things:

1. Data modeling.
2. How to serve feeds.

Data modeling

If we want to use a relational database like MySQL, we can define a user object and a feed object. Two relations are also necessary: one is that users can follow each other, and the other is that each feed has a user as its owner.

Serve feeds

The most straightforward way is to fetch feeds from all the people you follow and render them by time.

The interview won't stop here, as there are tons of details we haven't covered yet. It's totally up to the interviewer to decide what will be discussed next.

Detail questions

With that in mind, there can be infinite extensions from the high-level idea, so I'll only cover a few follow-up questions here.

1. When users follow a lot of people, fetching and rendering all their feeds can be costly. How can we improve this?

There are many approaches. Since Twitter has the infinite scroll feature, especially on mobile, each time we only need to fetch the most recent N feeds instead of all of them.
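
To make the data model and the "most recent N feeds" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: it uses SQLite from the standard library as a stand-in for MySQL, and the table names, column names, and the helpers recent_feeds and build_demo are hypothetical, not how Twitter actually does it. It shows the two objects (user, feed), the two relations (follow, owner), and one possible way to fetch a page of feeds using a (created_at, id) cursor.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not Twitter's.
SCHEMA = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE follows (              -- relation 1: users follow each other
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followee_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (follower_id, followee_id)
);
CREATE TABLE feeds (                -- relation 2: each feed has one owner
    id         INTEGER PRIMARY KEY,
    owner_id   INTEGER NOT NULL REFERENCES users(id),
    body       TEXT NOT NULL,
    created_at INTEGER NOT NULL     -- Unix timestamp
);
CREATE INDEX feeds_by_owner_time ON feeds (owner_id, created_at);
"""


def recent_feeds(conn, user_id, limit, before=None):
    """Return up to `limit` feeds from the people `user_id` follows, newest first.

    `before` is an optional (created_at, id) cursor taken from the last row
    of the previous page; only strictly older feeds are returned.
    """
    sql = """
        SELECT f.id, f.owner_id, f.body, f.created_at
        FROM feeds AS f
        JOIN follows AS fo ON fo.followee_id = f.owner_id
        WHERE fo.follower_id = ?
    """
    params = [user_id]
    if before is not None:
        # Ties on created_at are broken by id so no feed is skipped or repeated.
        sql += " AND (f.created_at < ? OR (f.created_at = ? AND f.id < ?))"
        params += [before[0], before[0], before[1]]
    sql += " ORDER BY f.created_at DESC, f.id DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


def build_demo():
    """Create an in-memory database with three users and four feeds."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "alice"), (2, "bob"), (3, "carol")])
    # alice follows bob and carol
    conn.executemany("INSERT INTO follows VALUES (?, ?)", [(1, 2), (1, 3)])
    conn.executemany("INSERT INTO feeds VALUES (?, ?, ?, ?)", [
        (10, 2, "bob: first", 100),
        (11, 3, "carol: hello", 200),
        (12, 2, "bob: second", 300),
        (13, 1, "alice: own post", 400),
    ])
    return conn


if __name__ == "__main__":
    conn = build_demo()
    page1 = recent_feeds(conn, user_id=1, limit=2)
    last = page1[-1]
    # The cursor for the next page is the (created_at, id) of the last row seen.
    page2 = recent_feeds(conn, user_id=1, limit=2, before=(last[3], last[0]))
    print(page1)
    print(page2)
```

In this sketch the first page for alice contains the two newest feeds from bob and carol, and the second page contains the remaining older one. A cursor is used here instead of a numeric offset so that new feeds arriving while the user scrolls do not shift the pages, but that is a design choice, not the only option.
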
Then there will be many details about how the pagination should be implemented. You may also consider a cache, which might help speed things up.

2. How to detect fake users?

This can be related to machine learning. One way to do it is to identify several related features, like registration date, the number of followers, the number of feeds, etc., and build a machine learning system to detect whether a user is fake.

3. Can we order feeds by other algorithms?

There has been a lot of debate about this topic over the past few weeks. If we want to order feeds based on users' interests, how do we design the algorithm?

I would say there are a few things we should clarify with the interviewer.

How do we measure the algorithm? Maybe by the average time users spend on Twitter, or by user interactions like favorites and retweets.
What signals do we use to evaluate how likely the user is to like the feed? The user's relation with the author, the number of replies/retweets of this feed, the number of followers of the author, etc. might be important.
If machine learning is used, how do we design the whole system?

4. How to implement the @ feature and the retweet feature?

For the @ feature, we can simply store a list of user IDs inside each feed. So when rendering your feeds, you should also include feeds that have your ID in their @ list. This adds a little bit of complexity to the rendering logic.

For the retweet feature, we could do a similar thing. Inside each feed, a feed ID (pointer) is stored, which indicates the original post if there is any.

But be careful: when a user retweets a tweet that is itself a retweet, you need to decide on the correct logic. Whether to keep many layers of retweets or only point to the original feed is a product decision.

Conclusion

There are just too many things I'd like to cover in this topic, so I have to divide it into two parts. It's worth clarifying here that there's no such thing as the standard solution for each of the questions I've mentioned.
In other words, there are many alternative solutions that are as good as or better than the one above. Also, the problem has been simplified a lot; I'm pretty sure that in production Twitter has developed many more things.

To emphasize again, the solution is not what you should care about most. Instead, try to understand how I approach and analyze the problem.

If you find this post helpful, I would really appreciate it if you shared it with your friends. You can also check out more system design interview questions and analyses here. Check Part 2 for further discussion.
