

"Every Data Scientist should read this book!"- DSBoost #28
💬 Interview of the week
This week we interviewed Tomi, who is a Data Analyst & Author of Data36. Enjoy:
What did you study/are you studying (if your background is different from DS, how did you end up in the field)?
I studied architecture… which is completely different from data science. Or well, it has some similarities. For instance, both fields require a keen eye for detail and an understanding of complex systems. They both also demand creative problem-solving skills and a strong ability to visualize abstract concepts. But apart from that, I came to data with no real background, especially in coding.
My path toward the field was quite unusual, although in this field having an unusual career journey… is pretty much usual, as I've learned from many colleagues. Anyway: when I studied architecture, I realized that I didn't really like it. So at the same university I started taking classes at the Economics Faculty instead. Luckily, our university allowed this for anyone, for free. So I tried to make the most of my originally bad choice. It went so well that I was hired as a tutor at one of the departments of the Economics Faculty. There, I participated in many extracurricular activities; the most important one was a research contest that a friend and I won. As a consequence of that, and with the help of a few awesome university professors, I got my first internship opportunity at a startup company. It started as an unpaid internship, but I took it anyway because I wanted to learn.
What are your favorite resource sites and books (ML/AI)?
I have many, but the one that had the most influence on my data mindset was Lean Analytics by Croll & Yoskovitz. I read this book early in my career and have reread it at least 4-5 times since. While most data books are about coding and statistics, this one is about the crucial business thinking aspect… which is very, very important, yet rarely taught or written about in learning materials. That's why I love this book. I really think every data scientist should read it.
What got you into your current role (portfolio, certification, etc.)?
During the internship that I mentioned above, a few simpler data tasks landed on my desk (building some charts and cleaning some spreadsheets), and I really enjoyed them. My manager was really happy with the outcomes of those tasks. He was also extremely helpful and supportive: basically, he started to give me more and more data-related tasks. When he saw that I continuously delivered value through these mini-projects, he helped me grow into a full-time junior data analyst role, too. He found me mentors and sent me to workshops and conferences (on the company budget). Of course, it wasn't an easy path. I didn't know any coding at the beginning, for instance, so next to the full-time job, all my free time went into learning more about data science (through hobby projects, books and online tutorials). Thanks to my manager, my mentors and my own hard work, in ~1-2 years I transitioned my career 100% into data science.
But again, I was also lucky that I got the chance to learn 100% by doing, via real-life projects (surrounded by senior data scientists).
What do you enjoy the most in your work?
Every part. I guess the beauty of it is its complexity. I love to code, which can really put me into a flow state. But I also love when I have to share my findings with people.
If I really have to pick one thing though, I'd say I love the unique experience of the moment when you find something new in the data set. Something game-changing that no one (yet) knows about, only you… That moment of discovery is priceless. Then you share it with others and see the impact it makes on the business, which is also really, really great.
What tools do you use the most / favorite tools?
I don't have any favorite tools. I use the tools that get the job done. For now, it's Python (with all its cool libraries: pandas, numpy, sklearn, etc.), SQL, bash, a few dataviz tools and, of course, data scientists' new best friend: ChatGPT.
Do you use ChatGPT or other AI tools during your work? If so, how do they help you? Do they change your approach to problems?
Absolutely. ChatGPT is almost constantly open in a tab on an external monitor nowadays. I still write most of my code by myself, especially the core logic.
But I usually use ChatGPT to:
finish my code
find and fix bugs
refactor my code
write regexp 🙂
and for most of the things that I used to Google
I don't use ChatGPT to write my full code – because it performs really poorly with that. (Especially in the niche fields that I work in.)
I also don't upload huge datasets to ChatGPT and ask it to find something useful for me… For two reasons: it's not really good with these types of tasks either (it's not trained for that) – but most importantly, it's because of my NDAs. :-)
I think I should use GitHub Copilot as well. So far I haven't tested it, though, because it doesn't have an integration with my current work environment… But I have heard so many good things about it from colleagues that I'm seriously considering changing my coding environment just because of Copilot.
What is your favorite topic within the field?
I don't really have a main domain. I came from the online world (I worked with startups, e-commerce businesses and similar), so obviously I have more experience in these fields. But occasionally, I end up on projects in a whole different domain like investment, manufacturing, etc. These are also exciting for me, exactly because they are new to me.
Which one of the recent AI/ML models will have the most significant impact on the industry in your opinion?
100% deep neural networks. I mean… after all the recent buzz regarding AI, what else. ;-)
What are you currently learning or improving (topics you are interested in nowadays)?
I try to continuously learn about every part of DS: coding, business and statistics. For now, I'm trying to dig deeper into neural networks (of course) and understand the concept down to the tiniest details. I find the free learning materials of Andrej Karpathy extremely useful. (He has a YouTube channel.)
But I'm still a firm believer in learning by doing, so most of my learning happens through my projects. (Currently, one of my public data-related projects is whiskyreturns.com)
What is the biggest mistake you've made? (preferably DS related)
I've made many. One of the most memorable ones happened during my junior years. I wrote a script that analyzed a 2-dimensional csv table. In the script, I referred to the columns not by their name but by their number. That's a huge mistake, and I learned that the hard way.
The short story: the script automatically pulled the data and created a chart from it. My manager used this chart to monitor a key metric of our product. When the company released a new feature, the chart showed a 10% increase in that key metric. Everyone was happy, until I learned that the 10% increase was not real. It was not because of the improved user experience, but because of my mistake in the code. When the new feature was released, the data table I worked with was expanded with one more column. Because my script referred to columns by number rather than by name, it started to pull data from the wrong column. When I fixed the issue, we saw that the key metric had in fact changed by 0%. Ouch.
My manager had to email his manager… So it was a bit embarrassing.
Luckily, our company culture supported failures, so it was all OK – but I'll never forget this mistake.
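To make the column-number mistake concrete, here is a minimal sketch in Python using only the standard library. Everything in it is hypothetical and made up for illustration (the column names, the numbers, and the helper functions `metric_by_position` and `metric_by_name`); it is not Tomi's original script, just one way the same failure can happen.

```python
import csv
import io

# Hypothetical exports: the second one has a new column inserted
# after a feature release, which shifts everything to its right.
BEFORE = """date,signups,active_users
2023-08-01,120,1000
2023-08-02,130,1010
"""

AFTER = """date,signups,feature_clicks,active_users
2023-08-03,125,1100,1005
2023-08-04,128,1120,1008
"""


def metric_by_position(csv_text, position=2):
    """Fragile: read the metric from a fixed column number."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return [int(row[position]) for row in rows[1:]]  # rows[0] is the header


def metric_by_name(csv_text, column="active_users"):
    """Sturdier: look the metric up by its header name, fail loudly if absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in {reader.fieldnames}")
    return [int(row[column]) for row in reader]


if __name__ == "__main__":
    # Position 2 silently switches from active_users to feature_clicks.
    print("by position:", metric_by_position(BEFORE), metric_by_position(AFTER))
    # The name-based lookup keeps reading active_users in both files.
    print("by name:    ", metric_by_name(BEFORE), metric_by_name(AFTER))
```

In this made-up data, the position-based version reports a jump of roughly 10% after the release, while the name-based version shows the metric barely moved. The explicit `KeyError` is a design choice: if a column is ever renamed or dropped, the script stops instead of charting the wrong numbers.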
What is your most significant achievement? (preferably DS related)
I really-really hope that my most significant achievement in data science is yet to come. But I'm really proud of what I created in data science education – especially data36.com with all the free articles, and with my unique online course: The Junior Data Scientist's First Month.
Please tell us more about Data36!
It's my website where I teach data science. I guess my headline sentence really summarizes it all: "Learn Data Science the Hard Way!" If that makes you interested, just check out the website itself: data36.com
🎙️ Podcast of the week
What is LangChain and Why Will it Change the World? (Greg Kamradt) - KNN Ep. 160
Key takeaways:
Success is simple but not easy; doing simple tasks and overcoming difficult challenges leads to success.
Find something you have energy for and build on top of it. Follow the edge of what excites you but also makes you uncomfortable.
Build in public and share your projects with others for feedback and improvement.
Be clear about what you want to get out of a role when joining a company and be open to exploring different career paths.
Proof of work and personal projects are compelling arguments when seeking job opportunities.
Cold emails to recruiters and hiring managers can lead to higher response rates than applying through traditional job portals.
Different strategies are required for job hunting at different career levels.
Building a personal brand and sharing in public can lead to more opportunities.
Build a personal brand, share your work, and experiment with different projects to create opportunities.
To transition to a decision-making role, be intentional about your career path and seek guidance from those ahead of you.
Use tools like ChatGPT to brainstorm and find ideas for transitioning your career.
🤖 What happened this week?
OpenAI has launched GPTBot, a web crawler aimed at enhancing future AI models like GPT-4 and GPT-5, with the ability to filter out sources violating privacy policies; website owners can choose to restrict or customize GPTBot's access, sparking debates on ethical and legal concerns regarding scraped data use in AI training.
AudioCraft by Meta AI introduces a streamlined approach to building generative models for audio, using a single autoregressive language model to effectively capture long-term dependencies and generate high-quality audio sequences, leveraging the EnCodec neural audio codec for token-based representation and decoding. The platform includes AudioGen for text-to-sound and MusicGen for text-to-music generation tasks.
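On the GPTBot item above: website owners restrict or customize the crawler's access through robots.txt rules addressed to the `GPTBot` user agent. The sketch below uses Python's standard `urllib.robotparser` to check what a given set of rules would allow. The rules, the `example.com` URLs and the `is_allowed` helper are hypothetical, chosen only for illustration, and Python's parser may not match every detail of how a real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GPTBot may read the blog, nothing else.
# Other crawlers are not mentioned, so they are not restricted here.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /blog/
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/blog/post"),
        ("GPTBot", "https://example.com/pricing"),
        ("SomeOtherBot", "https://example.com/pricing"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

To block the crawler entirely, the `Allow` line can be dropped so that only `Disallow: /` remains under `User-agent: GPTBot`.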