Commit
0 parents
commit 7fb625e
Showing 88 changed files with 3,440 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
.DS_Store |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
# Machine Learning Systems Design | ||
|
||
This booklet covers four main steps of designing a machine learning system: | ||
|
||
1. Project setup | ||
2. Data pipeline | ||
3. Modeling: selecting, training, and debugging | ||
4. Serving: testing, deploying, and maintaining | ||
|
||
It comes with links to practical resources that explain each aspect in more detail. It also suggests case studies written by machine learning engineers at major tech companies who have deployed machine learning systems to solve real-world problems. | ||
|
||
At the end, the booklet contains 27 open-ended machine learning systems design questions that might come up in machine learning interviews. The answers to these questions will be published in the book **Machine Learning Interviews**. You can look at and contribute to community answers to these questions on GitHub [here](https://github.com/chiphuyen/machine-learning-systems-design/tree/master/answers). You can read more about the book and sign up for the book's mailing list [here](https://huyenchip.com/2019/07/21/machine-learning-interviews.html). | ||
|
||
## Read | ||
To read the booklet, you can clone the repository and find the [HTML](https://github.com/chiphuyen/machine-learning-systems-design/tree/master/build/build1/consolidated.html) and [PDF](https://github.com/chiphuyen/machine-learning-systems-design/tree/master/build/build1/consolidated.pdf) versions in the folder `build`. | ||
|
||
## Contribute | ||
This is a work in progress, so any type of contribution is very much appreciated. Here are a few ways you can contribute: | ||
|
||
1. Improve the text by fixing any lexical, grammatical, or technical errors | ||
1. Add more relevant resources to each aspect of the machine learning project flow | ||
1. Add/edit questions | ||
1. Add/edit answers | ||
1. Other | ||
|
||
This book was created using the wonderful [`magicbook`](https://github.com/magicbookproject/magicbook) package. For detailed instructions on how to use the package, see their GitHub repo. The package requires that you have `node`. If you're on Mac, you can install `node` using: | ||
|
||
``` | ||
brew install node | ||
``` | ||
|
||
Install `magicbook` with: | ||
|
||
``` | ||
npm install -g magicbook | ||
``` | ||
|
||
Clone this repository: | ||
|
||
``` | ||
git clone https://github.com/chiphuyen/machine-learning-systems-design.git | ||
cd machine-learning-systems-design | ||
``` | ||
|
||
After you've made changes to the content, you can build the booklet with the following command: | ||
|
||
``` | ||
magicbook build | ||
``` | ||
|
||
You'll find the HTML and PDF files in the folder `build`. | ||
|
||
## Citation |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# Answer to question 1 | ||
|
||
**Question 1** | ||
|
||
Duolingo is a platform for language learning. When a student is learning a new language, Duolingo wants to recommend increasingly difficult stories to read. | ||
- How would you measure the difficulty level of a story? | ||
- Given a story, how would you edit it to make it easier or more difficult? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 10 | ||
|
||
**Question 10** | ||
|
||
Autocompletion: how would you build an algorithm to finish your sentence when you text? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 11 | ||
|
||
**Question 11** | ||
|
||
When you type a question on StackOverflow, you're shown a list of similar questions to make sure that your question hasn't been asked before. How do you build such a system? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 12 | ||
|
||
**Question 12** | ||
|
||
How would you design an algorithm to match pool riders for Lyft or Uber? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 13 | ||
|
||
**Question 13** | ||
|
||
On social networks like Facebook, users can choose to list their high schools. Can you estimate what percentage of high schools listed on Facebook are real? How do we find out, and deploy at scale, a way of finding invalid schools? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 14 | ||
|
||
**Question 14** | ||
|
||
How would you build a trigger word detection algorithm to spot the word “activate” in a 10-second audio clip? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 15 | ||
|
||
**Question 15** | ||
|
||
If you were to build a Netflix clone, how would you build a system that predicts, when a user stops watching a TV show, whether they are tired of that show or just taking a break? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 16 | ||
|
||
**Question 16** | ||
|
||
Facebook would like to develop a way to estimate the month and day of people’s birthdays, regardless of whether people give us that information directly. What methods would you propose, and what data would you use, to help with that task? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 17 | ||
|
||
**Question 17** | ||
|
||
Build a system to predict the language a text is written in. | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 18 | ||
|
||
**Question 18** | ||
|
||
Predict the house price for a property listed on Zillow. Use that system to predict whether we should invest in buying more properties in a certain city. | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 19 | ||
|
||
**Question 19** | ||
|
||
Imagine you were working on the iPhone. Every time users open their phones, you want to suggest the one app they are most likely to open first, with 90% accuracy. How would you do that? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 2 | ||
|
||
**Question 2** | ||
|
||
Given a dataset of credit card purchase information in which each record is labelled as fraudulent or safe, how would you build a fraud detection algorithm? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 20 | ||
|
||
**Question 20** | ||
|
||
How do you map nicknames (Pete, Andy, Nick, Rob, etc.) to real names? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 21 | ||
|
||
**Question 21** | ||
|
||
An e-commerce company is trying to minimize the time it takes customers to purchase their selected items. As a machine learning engineer, what can you do to help them? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 22 | ||
|
||
**Question 22** | ||
|
||
Build a chatbot to help people book hotels. | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 23 | ||
|
||
**Question 23** | ||
|
||
How would you design a question answering system that can extract an answer from a large collection of documents given a user query? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 24 | ||
|
||
**Question 24** | ||
|
||
How would you train a model to predict whether the word “jaguar” in a sentence refers to the animal or the car? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 25 | ||
|
||
**Question 25** | ||
|
||
Suppose you’re building software to manage the stock portfolios of your clients. You manage X amount of money. Imagine that you’ve converted all of that money into stocks, and then find a stock that you definitely must buy. How do you decide which of your currently owned stocks to drop so that you can buy this new stock? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 26 | ||
|
||
**Question 26** | ||
|
||
How would you create a model to recognize whether an image is a triangle, a circle, or a square? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 27 | ||
|
||
**Question 27** | ||
|
||
Given only the CIFAR-10 dataset, how would you build a model to recognize whether or not an image belongs to one of the 10 CIFAR-10 classes? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 3 | ||
|
||
**Question 3** | ||
|
||
You run an e-commerce website. Sometimes, users want to buy an item that is no longer available. Build a recommendation system to suggest replacement items. | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 4 | ||
|
||
**Question 4** | ||
|
||
For any user on Twitter, how would you suggest who they should follow? What do you do when that user is new? What are some of the limitations of data-driven recommender systems? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 5 | ||
|
||
**Question 5** | ||
|
||
When you enter a search query on Google, you’re shown a list of related searches. How would you generate a list of related searches for each query? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 6 | ||
|
||
**Question 6** | ||
|
||
Build a system that returns images associated with a query, like in Google Images. | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 7 | ||
|
||
**Question 7** | ||
|
||
How would you build a system to suggest trending hashtags on Twitter? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 8 | ||
|
||
**Question 8** | ||
|
||
Each question on Quora often gets many different answers. How do you create a model that ranks all these answers? How computationally intensive is this model? | ||
|
||
**Answer** |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Answer to question 9 | ||
|
||
**Question 9** | ||
|
||
How do you build a system to display the top 10 results when a user searches for rental listings in a certain location on Airbnb? | ||
|
||
**Answer** |
Binary file not shown.