

"You don't need any math to get started in ML" - DSBoost #12
Welcome to the 12th issue of DSBoost, the weekly newsletter where you can discover interesting people in the ML/AI world, get the main takeaways of a relevant podcast, and stay up to date with the latest news in the field!
💬 Interview of the week
This week we interviewed Tivadar, a mathematician and the author of Mathematics of Machine Learning. Enjoy:
What did you study/are you studying (if your background is different from DS, how did you end up in the field)?
I am a pure mathematician. I started as a researcher of arcane topics such as orthogonal polynomials but moved on early toward computer vision and its applications in biology. This was around the time convolutional networks started to dominate image-related tasks. Naturally, this led me toward deep learning, and the rest is history.
What are your favorite resource sites and books (ML/AI)?
An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
Grokking Deep Learning by Andrew Trask
Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka, Yuxi Liu, and Vahid Mirjalili
What got you into your current role (portfolio, certification, etc.)?
My insatiable craving for huge risks and making the world a better place at the same time :D Currently, I am all in as a machine learning content creator.
What do you enjoy the most in your work?
Writing my Mathematics of Machine Learning book. When I can put 100% focus on writing my book, I am a happy person.
Please tell us more about your book.
Around two years ago, when my last project failed, I started posting machine learning math threads on Twitter for fun. Instead of the usual arcane approach, I aimed to explain concepts from the perspective of a machine learning engineer. My threads quickly found a large audience. Since I really enjoy talking about math and was looking for a new project, I realized that compiling my threads into a book was what I wanted to do. This became the Mathematics of Machine Learning book.
Currently, the book is out in early access, and slowly nearing its completion. I plan to fully cover all the linear algebra, calculus, probability theory, and statistics that are used in machine learning.
There is controversy around the level of math required for Machine Learning. The question always is: "What level of math do I need to start?" What is your opinion?
In 2023, I don’t think you need any math to get started. I would even recommend getting your hands dirty with projects such as handwritten digit recognition or one of the simpler Kaggle competitions. Nowadays, the technical complexities of a machine learning algorithm are hidden behind abstractions such as libraries and APIs. I find this extremely beneficial, as it supercharges the development of machine learning-based solutions.
Math becomes useful when you have to pop the hood and look inside the black boxes; and you’ll have to do this eventually. Even then, you’ll only need basic linear algebra, calculus, and probability.
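To make both points concrete, here is a minimal sketch, assuming scikit-learn and NumPy are installed. The dataset (scikit-learn's bundled 8x8 digits, a small stand-in for MNIST), the logistic regression model, and the train/test split are our own illustrative choices, not the interviewee's. The first half needs no math at all; the second half "pops the hood" and recomputes the model's class probabilities with one matrix product and a softmax, which holds for scikit-learn's multinomial logistic regression (the default for the lbfgs solver on multiclass data).

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the 8x8 handwritten digit images bundled with scikit-learn.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel intensities range from 0 to 16; scale them to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The "no math needed" part: the library hides the optimization entirely.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")

# Popping the hood: each prediction is a matrix-vector product plus a softmax.
logits = X_test @ clf.coef_.T + clf.intercept_   # shape (n_samples, 10)
logits -= logits.max(axis=1, keepdims=True)      # subtract row max for numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Our hand-rolled probabilities should agree with the library's.
matches = np.allclose(probs, clf.predict_proba(X_test))
print("matches predict_proba:", matches)
```

The linear algebra involved is exactly the "basic" kind mentioned above: a weight matrix, a bias vector, and a normalization step.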
Math is a hard topic both to learn and teach. How do you learn efficiently and what is your approach to teaching concepts?
The best way to learn is to solve problems. This applies not just to mathematics, but to every other field as well.
My approach is to layer clear, application-oriented explanations with practical problem-solving, like implementing what you’ve just learned. Check out the first two chapters of my book: we start with the mathematical concept of vectors, then immediately dive into how they are represented inside a computer.
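As a taste of that "concept first, then representation" approach, here is our own minimal sketch (not code from the book), assuming NumPy. It contrasts the same mathematical vector stored as a plain Python list and as a NumPy array, and shows why the representation matters: `+` means different things for the two.

```python
import numpy as np

# The same mathematical vector v = (1, 2, 3) in two common representations.
v_list = [1.0, 2.0, 3.0]              # a Python list of separate float objects
v_array = np.array([1.0, 2.0, 3.0])   # one contiguous buffer of 64-bit floats

# "+" on lists concatenates; on arrays it is componentwise vector addition.
print(v_list + v_list)    # [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
print(v_array + v_array)  # [2. 4. 6.]


def add(u, w):
    """Componentwise vector addition written out by hand for lists."""
    return [a + b for a, b in zip(u, w)]


print(add(v_list, v_list))        # [2.0, 4.0, 6.0]
print(np.dot(v_array, v_array))   # 14.0, i.e. 1*1 + 2*2 + 3*3

# Under the hood, the array is 3 values of 8 bytes each.
print(v_array.dtype, v_array.itemsize, v_array.nbytes)  # float64 8 24
```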
What makes math fun for you, and how do you make it fun for others?
As a pure mathematician, I find mathematics fun because of its arcane beauty. However, this is exactly why it is frustrating for engineers and machine learning practitioners.
Thus, I have to highlight other facets of math. I have two approaches that can make end-users appreciate math: “look how mind-blowingly useful this concept is in your work”, and “look, here is what’s behind this simple formula you have been using for years”.
What tools do you use the most / favorite tools?
Jupyter Notebook for writing
Jupyter Book for turning notebooks into a beautiful book
ChatGPT for overcoming writer’s block
Spotify to keep me fueled with music
Do you use ChatGPT or other AI tools during your work? If so, how do they help you? Do they change your approach to problems?
What do you think answered this question? :D In all seriousness, I use ChatGPT all the time. Not just in my work, but also in my personal life.
I never use language models to write, but I always use them for research. They are excellent at providing a large amount of information in short bursts, unlike, say, Google, where you have to sift through pages of results and might not even find what you want.
What is your favorite topic within the field?
Neural networks and convolutional networks. As I am coming from computer vision, I am biased towards them, but damn, they are powerful and amazing.
Which one of the recent AI/ML models will have the most significant impact on the industry in your opinion?
I am not going out on a limb here if I assume that almost everyone answers either Stable Diffusion or GPT-4 here. So, let’s talk about something more niche, yet insanely impactful.
The recent Segment Anything Model (SAM) from Meta AI is going to be crazy. Segmentation and object detection are pretty much the two fundamental problems of computer vision, and these have thousands of industrial applications. Self-driving vehicles. Industrial quality control. Medical diagnostics. Microscopy. I could go on for hours: SAM is going to be a game-changer.
What are you currently learning or improving (topics you are interested in nowadays)?
I am more of a content creator than a machine learning practitioner now, so I’ll focus on soft skills here that are useful for everyone.
Right now, I am working on optimizing my exercise and diet. Trust me, the right lifestyle can enhance your career more than any data science hard skill. I have recently introduced intermittent fasting, and my focus has never been better. I feel like I am 20 again. (And I also lost the remaining excess fat I had.)
What is the biggest mistake you've made? (preferably DS related)
Instead of big mistakes, I tend to make many small mistakes that eventually pile up and trainwreck the project. I am notorious for getting in over my head and setting goals that are too big. Like writing a book about all the mathematics of machine learning :)
What is your most significant achievement? (preferably DS related)
Convincing machine learning engineers that mathematics is useful and interesting.
Can you share a fun fact about yourself?
When I was five, I had the idea that licking the ice off the inside of our fridge would be as good as ice cream. My tongue instantly got glued to the freezer, and my parents were out gardening. It took them a while to hear my muffled screams and melt the ice around my tongue. Twenty-seven years later, I find this story extremely funny.
🎙️ Podcast of the week
DataFramed #132 “The Past, Present, and Future of the Data Science Notebook”
Key takeaways:
Jupyter notebooks are a critical aspect of the data science ecosystem, and they have become the de facto tool for doing data science work.
Tooling fragmentation is a major challenge in the data space, and notebooks have helped democratize insights by making it easier to access resources.
Jupyter notebooks have become a direct launch pad into production, with AWS SageMaker prediction endpoints being one of the first examples of this.
Large Language Models and chat mean that notebooks will become even more important in the future of data science.
Collaborative and cloud-based notebooks have made it easier for people to learn data science and get to models much faster.
🧵 Featured threads
🤖 What happened this week?
Elon Musk, who previously expressed his opposition to the development of language models such as GPT-4, has now introduced X.AI as a competing project to OpenAI.
Why is there so much hype about Auto-GPT? This is an autonomous AI system that uses GPT-4 to write and execute Python scripts, with the ability to recursively debug, develop, and self-improve. It employs a feedback loop to improve its behaviours.
Discover HuggingGPT, which uses ChatGPT to plan tasks upon receiving a user request, selects appropriate models based on their function descriptions on Hugging Face, executes each subtask with the chosen AI model, and finally summarizes the response based on the execution results!
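The four stages described above can be sketched as a toy pipeline. This is a hypothetical illustration, not HuggingGPT's actual code: every model ID, function name, and keyword rule below is made up, and the ChatGPT calls and real model inference are replaced with simple stubs so the control flow is visible and the script runs offline.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str      # task type, e.g. "image-to-text"
    payload: str   # the part of the user request this task works on


# Hypothetical stand-in for Hugging Face model cards:
# model id -> (task type it solves, short function description).
MODEL_REGISTRY = {
    "hypothetical/image-classifier": ("image-classification", "assigns labels to an image"),
    "hypothetical/captioner": ("image-to-text", "describes the contents of an image"),
    "hypothetical/tts": ("text-to-speech", "turns text into audio"),
}


def plan_tasks(request: str) -> list[Task]:
    """Stage 1, task planning. HuggingGPT asks ChatGPT to decompose the
    request; this stub fakes that step with keyword rules."""
    text = request.lower()
    tasks = []
    if "what is in" in text:
        tasks.append(Task("image-to-text", request))
    if "read" in text:
        tasks.append(Task("text-to-speech", request))
    return tasks


def select_model(task: Task) -> str:
    """Stage 2, model selection. Here: the first registered model whose
    declared task type matches (HuggingGPT lets ChatGPT choose from descriptions)."""
    for model_id, (task_type, _description) in MODEL_REGISTRY.items():
        if task_type == task.name:
            return model_id
    raise LookupError(f"no model registered for {task.name!r}")


def execute(model_id: str, task: Task) -> str:
    """Stage 3, task execution (stub): a real system would run the model."""
    return f"{model_id} -> result for {task.name}"


def summarize(request: str, results: list[str]) -> str:
    """Stage 4, response generation (stub): ChatGPT would write this answer."""
    return "\n".join([f"Request: {request}", *(f"- {r}" for r in results)])


def answer(request: str) -> str:
    tasks = plan_tasks(request)
    results = [execute(select_model(t), t) for t in tasks]
    return summarize(request, results)


print(answer("What is in this photo? Then read the description aloud."))
```

Keeping each stage as its own function mirrors the plan, select, execute, summarize structure, so any stub could later be swapped for a real language model or inference call.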