Get your hands dirty if you want to learn ML - DSBoost #6

David Andrés and Levi
Mar 7, 2023

Welcome to the sixth issue of DSBoost, the weekly newsletter where you can discover interesting people in the ML/AI world, get the main takeaways of a relevant podcast, and stay up to date with the latest news in the field!

💬 Interview of the week

This week we interviewed Pau Labarta, who is a Machine Learning Engineer. Enjoy:

  • What did you study/are you studying (if your background is different from DS, how did you end up in the field)?

I studied Mathematics and Finance. I started my professional career as a quant analyst in a bank, developing mathematical models to price and hedge financial derivatives. That was 10 years ago, and “data science” was still not such a big thing.

After that, I got my first “proper” data science job in a mobile gaming company. There I learned how to build dashboards with Tableau, mine raw datasets using Hadoop and SQL, and build automation and ML with Python. I was very fortunate to work closely with a great team of data engineers, and told myself “when I grow up I wanna be a data engineer”.

Eventually, I decided it was time to build my thing, and started freelancing for international clients doing Data Science and ML development, and creating educational content.

  • What are your favorite resource sites and books (ML/AI)?

Kaggle is a great place to learn from the best. I used it intensively at the beginning when I was focused on building ML model prototypes.

Nowadays I base my learning on building projects, and when I get stuck, I google. I truly believe the only way to learn ML is to get your hands dirty and read code (not arXiv articles), so I go to GitHub a lot.

  • What got you into your current role (portfolio, certification, etc.)?

At the beginning of my DS career, I always had to pass some kind of screening process, where I had to build some kind of project of my own.

Nowadays, my approach is different. I build a reputation online, by sharing lots of free content on ML, and clients directly come to me. It takes time, true, but it is really worth it if you plan on being a solopreneur.

  • What do you enjoy the most in your work?

I love seeing the final product WORKING. Every project is a challenge, where you face many potential blockers (data quality, deployment challenges, communication with non-technical stakeholders…). Making it to the end makes you feel great.

  • What tools do you use the most / favorite tools?

VSCode, git and a bit of GitHub Copilot.

  • Do you use ChatGPT or other AI tools during your work? If so, how do they help you? Do they change your approach to problems?

I recently started using GitHub Copilot inside my VSCode. I use it for:

  • Boilerplate code generation, when I develop in Python and I wanna move fast.

  • Code suggestion, when I develop in Rust, as I am still not comfortable with it, and the system helps me level up my understanding of the language.

  • What is your favorite topic within the field?

Machine Learning in production (aka MLOps).

  • Which one of the recent AI/ML models will have the most significant impact on the industry in your opinion?

Generative models of all sorts (text, image, video) are here to stay. And things will get better, as we improve the interfaces to use them (aka prompt engineering).

  • What are you currently learning or improving (topics you are interested in nowadays)?

I started learning Rust, as a way to expand my development skills. I am also interested in creating production-grade MLOps platforms using serverless stacks.

  • What is the biggest mistake you've made? (preferably DS related)

My first ML project in a company was an absolute bummer… because I spent the first 4 weeks training models without even looking once at the data… And the data was shit…

It was very frustrating to tell my team lead, 2 months into the project, that the project was not going anywhere.

  • What is your most significant achievement? (preferably DS related)

When I was working in a mobile gaming company, I developed an ML system using GANs (Generative Adversarial Networks) that generated soccer/football player profile pictures at a massive scale. That was an incredibly fulfilling and enriching learning experience.

  • Can you share a fun fact about yourself?

My wife is a stand-up comedian in Serbia, where we live. And I started doing stand-up, as well, in Serbian! My next gig is in 3 weeks, so wish me luck.

Follow Pau

🎙️ Podcast of the week

How to Learn Data Engineering by Super Data Science

Key takeaways:

  • What is Data Engineering?

It gives data scientists access to refined data. Data engineers process and clean very large amounts of data, sitting between the data source and the data scientists.

  • What tools should Data Engineers use?

It depends on the pipeline. Data usually comes from relational databases.

Engineers also need to be comfortable with APIs.

Then, in the data processing phase: AWS, Spark, and NoSQL databases.

  • A path to becoming a Data Engineer:

There is no single path.

You need to understand the flow of data: where the data comes from and how to preprocess it. Learn the tools that fit the pipeline.

AWS or Azure are usually the best starting points.
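
To make the flow in these takeaways concrete (relational source, a cleaning step, refined data handed to data scientists), here is a minimal sketch using only Python's built-in `sqlite3` module. The table names `raw_events` and `clean_events`, the columns, and the cleaning rules are assumptions made up for illustration; they do not come from the podcast, and a real pipeline would more likely use the cloud and Spark tooling mentioned above.

```python
import sqlite3


def extract(conn):
    """Read raw rows from the (hypothetical) relational source table."""
    query = "SELECT user_id, country, amount FROM raw_events ORDER BY rowid"
    return conn.execute(query).fetchall()


def transform(rows):
    """Drop incomplete rows, normalise country codes, cast amounts to float."""
    cleaned = []
    for user_id, country, amount in rows:
        if user_id is None or amount is None:
            continue  # illustrative rule: skip rows missing key fields
        code = (country or "unknown").strip().upper()
        cleaned.append((user_id, code, float(amount)))
    return cleaned


def load(conn, rows):
    """Write the refined rows to the table data scientists would query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clean_events "
        "(user_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO clean_events VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Extract, transform, and load in one pass; returns the cleaned rows."""
    rows = transform(extract(conn))
    load(conn, rows)
    return rows


def make_demo_source():
    """Build an in-memory database with a few messy, made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_events (user_id INTEGER, country TEXT, amount TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?, ?)",
        [(1, " es ", "9.99"), (2, None, "4.50"), (None, "rs", "1.00"), (3, "RS", None)],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_source()
    print(run_pipeline(demo))
```

Keeping extract, transform, and load as separate functions mirrors the advice to understand where the data comes in and how it is preprocessed: each stage can be swapped for a different tool without touching the others.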


🧵 Featured threads

Santhosh Kumar @SanthoshKumarS_
Harvard University offers Free online Machine Learning courses. No Application or Fee is required. Here are 5 FREE courses you don't want to miss:
3:00 PM ∙ Mar 3, 2023
1,391 Likes ∙ 506 Retweets

Akshay 🚀 @akshay_pachaar
Machine Learning cheatsheets for Stanford's CS 229! 🔥 Topics covered:
- Supervised & unsupervised ML
- Deep Learning
- Probability & statistics
- Algebra & Calculus
Keep them handy! 🧵👇
12:30 PM ∙ Mar 4, 2023
1,569 Likes ∙ 444 Retweets

🤖 What happened this week?

  1. A few days ago, Meta released LLaMA, a new family of large language models similar to GPT-3 but smaller in size, with up to 65B parameters, yet still performing well on many tasks.

  2. OpenAI has just enabled access to ChatGPT and Whisper models in its API.

  3. Microsoft has made progress in the field of multimodality by training KOSMOS-1, a model that can perceive and combine textual and visual information.

  4. A new experimental version of DALLE-2 has been released, which enhances the quality and fidelity of the model to the given prompt.

👥 Under the radar

Words from Sachin:

I'm Sachin Kumar, a Business Analyst at Octaculus Company. As a passionate Data Science enthusiast, I enjoy tweeting about a wide range of topics including Python, SQL, Tableau, Excel, Power BI, and Data Science in general.

Additionally, I've recently appeared for the Civil Services Examination in India. Outside of work and academics, I have a keen interest in reading news related to geopolitics.

Follow Sachin

© 2023 David & Levi