For your responses to the questions below, I’m looking for something thoughtful, but not necessarily long. A single sentence is probably not enough to express thoughtfulness, but a well-written paragraph can be.

You may submit your responses any time until Friday, April 29 at noon.


No data scientist is the same

I found an interestingly written series of blog posts about XGBoost that also covers some “big picture” ideas about working as a data scientist. The series includes nine blog posts, each with an associated Python notebook. Here are direct links to the nine posts (they also appear on each post):

  1. Introducing our data science rock stars
  2. Data to predict which employees are likely to leave
  3. Good model by default using XGBoost
  4. Hyperparameter tuning for hyperaccurate XGBoost model
  5. Beat dirty data
  6. The case of high cardinality kerfuffles
  7. Guide to manage missing data
  8. Visualise the business value of predictive models
  9. No data scientist is the same!

The Python notebooks are available at

Rather than have you code up an XGBoost model yourself, I’d like you to read some of the blog posts and answer a few questions. You are welcome to read all 9 posts (and I hope that you do), but you are only required to read enough to answer the questions below.

Side note: I found this because I subscribe to https://medium.com/ and they send me a daily digest of things they think I might be interested in. Since I tend to click on data science-related things, I get lots of data science-related suggestions. You might find this to be a good resource for yourself.

  1. Read the first blog post. Based on the descriptions there, which “data scientist rock star” do you think you are most/least like?

  2. What is the most important thing you learned about XGBoost from these blog posts that we either didn’t cover in class or that didn’t “click” in class?

  3. What is the most important thing you learned about doing data science from these blog posts? How does it align with, complement, or run counter to what you have been learning in your data science courses at Calvin?

  4. Look through some of the Python code (in the blog posts or on GitLab) and comment on at least one thing that is interesting because it uses a Python feature you didn’t know about or does something in a way you would not have thought of.

  5. Now read the last post. Does this change your answer to or the way you think about question 1?

  6. Just because I’m curious: Which of the blog posts did you read? (Just list off the numbers.)

Course Wrap-Up

  1. Thinking back on the book reports we have been hearing, give one example of something you heard that got you thinking.

  2. If you were going to read one of the books that someone else reported on, which one would it be? Why?

  3. Think back on our course as a whole. What do you think you will remember most about it 3 years from now?