

Why does XGBoost win competitions? - DSBoost #18
Welcome to the 18th issue of DSBoost, the weekly newsletter where you can discover interesting people in the ML/AI world, get the main takeaways of a relevant podcast, and stay up to date with the latest news in the field!
💬 Interview of the week
This week we interviewed David Miller, who is a Data Scientist. Enjoy:
What did you study/are you studying (if your background is different from DS, how did you end up in the field)?
I studied accounting and finance in college, but recently completed a master’s degree in data science. I made the transition over several years by changing departments at work and learning analytical tools and coding languages on the job.
What are your favorite resource sites and books (ML/AI)?
As a beginner, I loved DataCamp because of how easy it was. These days, I get most of my value from GitHub, Kaggle, and Medium.
What got you into your current role (portfolio, certification, etc.)?
I recently moved back into consulting after working as a data scientist at American Express. Previous work experience + my formal schooling are what helped me land these positions.
What do you enjoy the most in your work?
I like data modeling and presenting findings. To be honest, I think fitting ML models can be a bit boring! Going from raw data to data that is ready to be used for analysis and modeling is always a challenging and unique task, and I love the puzzle-solving nature of it. I moved back into consulting because I missed how frequently I got to “present findings”. I like the pressure of knowing that my work is meaningful. Delivering confident and concise findings that get stakeholders excited is a very rewarding feeling.
What tools do you use the most / favorite tools?
I use Python and Excel 99% of the time. I should really be better at Tableau or PowerBI, but I can’t find the time to level up my skills. Because of my accounting/finance background, I’m very strong in Excel and I love how flexible it is for creating unique and nuanced analyses.
Do you use ChatGPT or other AI tools during your work? If so, how do they help you? Do they change your approach to problems?
ChatGPT has made me SO much better at coding in Python. Disclaimer: you definitely need a strong foundation. ChatGPT will not take you from 0. But it is incredibly helpful for writing boilerplate code, debugging issues, and following plain-language instructions when I’m not sure which functions I need to execute the task.
What is your favorite topic within the field?
I’m business-oriented, so I like the common use cases for ML in business: forecasting, lead scoring, marketing optimization, and customer retention.
Which one of the recent AI/ML models will have the most significant impact on the industry in your opinion?
I think ChatGPT will remain a winner because of how valuable it is to integrate into the Microsoft Office stack. Bard may surpass it for individual/personal use, though!
What are you currently learning or improving (topics you are interested in nowadays)?
I’m currently focused on forecasting and automation.
What is the biggest mistake you've made? (preferably DS related)
It took me way too long to learn in public, make connections with others doing the same, and build (and publish) portfolio projects. I probably could’ve saved a ton of money on grad school if I had done these things from the start.
What is your most significant achievement? (preferably DS related)
Getting the job at American Express was the culmination of years and years of hard work.
Can you share a fun fact about yourself?
I’m 4 months away from being a first-time father!
🎙️ Podcast of the week
SDS 681: XGBoost: The Ultimate Classifier, with Matt Harrison
Key takeaways:
XGBoost is a tree-based algorithm; the name stands for Extreme Gradient Boosting.
Matt shares the golf analogy:
A decision tree is like playing golf alone with only one chance to hit the hole.
A random forest is like you and 20 good golfer friends each taking a shot, and then averaging the hits.
A gradient-boosted decision tree is like being allowed to hit the ball as many times as you want. Each time, you correct the error of the previous shot and get closer to the hole.
XGBoost tends to do a decent job out of the box, and with a little bit of fine-tuning it is super powerful. It performs especially well in online competitions, where it often tops the leaderboard because it generalizes well to unseen data!
XGBoost slightly overfits out of the box but still gives better results than many other models.
It works well with missing data and categorical values!
It captures non-linear relationships really well.
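The golf analogy above can be sketched in a few lines of plain Python. This is a toy gradient-boosting loop under squared-error loss, not XGBoost itself (which adds regularization, second-order gradient information, and many engineering optimizations). The function names (`fit_stump`, `boost`, `predict`), the sine-curve data, and the learning rate of 0.5 are illustrative assumptions, not something taken from the podcast.

```python
import math


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def fit_stump(xs, residuals):
    """Fit a one-split tree (a "stump") to the residuals by least squares.

    Returns (threshold, left_mean, right_mean).
    """
    best = None
    # Skip the largest x as a threshold so the right side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds, learning_rate=0.5):
    """Each round is one more "shot": fit a stump to what is still wrong."""
    base = sum(ys) / len(ys)           # first shot: predict the mean
    preds = [base] * len(xs)
    stumps = []
    history = [mse(ys, preds)]         # training error after each shot
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # distance to the hole
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for x, p in zip(xs, preds)]
        history.append(mse(ys, preds))
    return base, stumps, history


def predict(base, stumps, x, learning_rate=0.5):
    """Sum the base prediction and every stump's scaled correction."""
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Made-up data: a sine curve that a single split cannot capture.
xs = [i / 10 for i in range(30)]
ys = [math.sin(x) for x in xs]

base, stumps, history = boost(xs, ys, rounds=20)
print("training MSE after 0 shots: ", history[0])
print("training MSE after 20 shots:", history[-1])
```

The training error shrinks with every round because each new stump is fitted only to the remaining error. That also hints at why boosted models can overfit if the number of rounds is left unchecked.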
🧵 Featured threads
🤖 What happened this week?
Brad Smith, President and Vice Chair of Microsoft, emphasized the need for faster government regulation of AI, given its immense potential to benefit humanity. Smith highlighted the widespread use of AI in fields such as medicine, drug discovery, disease diagnosis, and disaster response. He dispelled the notion that AI is mysterious and stated that it is becoming more powerful.