Small incremental changes can compound into substantial knowledge growth - DSBoost #31
💬 Interview of the week
This week we interviewed Egor Howell, a Data Scientist. Enjoy!
What did you study/are you studying (if your background is different from DS, how did you end up in the field)?
When I was a child, I enjoyed maths and science, which eventually guided me towards studying Physics at university. I always wanted to do a PhD and conduct research in fields like quantum mechanics or astrophysics. However, during my master's research year at university, I quickly realised that this route was not for me. The pace of current Physics research was a lot slower than I expected. I had hoped it would be like the rapid succession of discoveries in the early 20th century, but reality proved otherwise.
As fate would have it, around the same time, DeepMind’s documentary on AlphaGo kept appearing in my YouTube recommendations. That documentary single-handedly got me interested in data science and machine learning. I subsequently enrolled in online courses, learned Python, and began applying to data science graduate programs, ultimately securing a role. My pathway wasn't too unique; I'd say it's rather traditional, all things considered.
How do you seek out opportunities for professional growth and learning?
For professional growth, I actively seek out volunteer opportunities in my daily job, such as presenting my work, coordinating events, or trying new projects even if they're outside my day-to-day role. Long story short, I try to put myself outside my comfort zone, which pushes me to improve many soft and professional skills.
My learning strategy is twofold. Outside of work, I'm an avid self-learner. This generally involves identifying a topic I am interested in and dedicating time to study it. I often take things a bit further and document what I learn through blog posts on Medium and YouTube videos. Articulating my insights not only reinforces my understanding but also creates a 'personal learning journal' for future me to look back on. Within my job, I try to experiment with new tools and techniques. For instance, I might opt for Polars or Spark over Pandas for my next exploratory data analysis. Small incremental changes like this can compound into substantial knowledge growth over time.
What do you enjoy the most in your work?
I love maths, especially when it comes to deriving models or concepts mathematically. The beauty lies in implementing these mathematical results into code, leading to tangible benefits for a company. As data scientists, the creativity and freedom we possess to explore solutions is, without a doubt, the most rewarding aspect.
How do you handle feedback, especially when it's critical of your analysis or models?
Luckily, in the organisations I have worked in, I've never faced harsh criticism of my work. However, feedback is an integral part of growth as a Data Scientist. I make an effort to remove emotion from the feedback I receive and process it analytically. The person offering feedback likely has both my best interests and the company's at heart.
What are you currently learning or improving (topics you are interested in nowadays)? And what resources do you use?
As I said in my earlier answers, the maths and theory behind data science and machine learning are what I enjoy the most. However, in the real world, you also have to be able to deploy your model. So, I am currently learning some software engineering best practices and how to productionize models using AWS. For most of this stuff, I generally just ask the software engineers in my company, and then I go research it and write about it. There is also this great course from freeCodeCamp. I am also tinkering a bit with reinforcement learning through David Silver’s old UCL lectures.
What is the biggest mistake you've made?
This might be familiar to many junior Data Scientists, but I once presented data analysis to senior stakeholders that was entirely incorrect. I mistakenly performed a left join between two datasets, confusing the datatypes of some of the columns. Sending the subsequent apology email was certainly not one of my proudest moments.
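For readers who have not been bitten by this yet, here is a minimal sketch of how a left join can silently go wrong when the key columns have different types. It is not Egor's actual code or data: the `orders` and `customers` structures and the `left_join` and `unmatched` helpers are hypothetical, and the join is written in plain Python so the mechanics stay visible.

```python
# Hypothetical data: order rows parsed from a CSV (ids arrive as strings)
# and a customer lookup keyed by integer ids.
orders = [
    {"customer_id": "1", "amount": 20.0},
    {"customer_id": "2", "amount": 35.5},
    {"customer_id": "3", "amount": 12.0},
]
customers = {1: "Alice", 2: "Bob"}


def left_join(rows, lookup, key, cast=None):
    """Left join: keep every row and attach the lookup value, or None if absent."""
    joined = []
    for row in rows:
        k = cast(row[key]) if cast else row[key]
        joined.append({**row, "name": lookup.get(k)})
    return joined


def unmatched(joined):
    """Count rows for which the join found no partner."""
    return sum(1 for row in joined if row["name"] is None)


# The string "1" never equals the integer 1, so nothing matches and no error is raised.
naive = left_join(orders, customers, "customer_id")
# Casting the key to a common type first restores the expected matches.
fixed = left_join(orders, customers, "customer_id", cast=int)

print(unmatched(naive))  # 3
print(unmatched(fixed))  # 1
```

Counting unmatched rows right after a join is a cheap sanity check: a sudden jump in that number usually points to a key problem rather than a real gap in the data.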
How do you adapt when a model or approach you've invested time in doesn't yield the expected results?
Building a model that doesn't deliver the anticipated results is just a step in the learning and iteration process. As the cliché goes, it's not a failure; it's a step closer to success. I typically respond by finding the model's weaknesses, such as the subsets of predictions it performs poorly on, and diving into those. I can then refine the model and continue this iterative process until the model meets a certain standard.
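One simple way to look for the weak subsets mentioned above is to break an error metric down by segment. The sketch below is only an illustration under our own assumptions: the `records` data, the segment names, and the choice of mean absolute error (MAE) are hypothetical, not something described in the interview.

```python
from collections import defaultdict

# Hypothetical (segment, actual, predicted) records from a validation set.
records = [
    ("weekday", 100.0, 98.0),
    ("weekday", 120.0, 123.0),
    ("weekend", 80.0, 60.0),
    ("weekend", 90.0, 115.0),
    ("holiday", 50.0, 49.0),
]


def mae_by_segment(records):
    """Return the mean absolute error for each segment."""
    errors = defaultdict(list)
    for segment, actual, predicted in records:
        errors[segment].append(abs(actual - predicted))
    return {segment: sum(errs) / len(errs) for segment, errs in errors.items()}


scores = mae_by_segment(records)
# Sort so the worst-performing segment comes first.
worst_first = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for segment, mae in worst_first:
    print(f"{segment}: MAE = {mae:.2f}")
```

In this toy data the weekend predictions are far worse than the rest, which is the kind of signal that tells you where to dig next.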
You are currently developing a YouTube course about Time Series. Tell us a bit about it. Also, what do you find more challenging about doing this? Is it preparing the content, talking to the camera…?
About two years ago, I started my data science blog on Medium. It was primarily for me to solidify my understanding of the topics I was exploring and to document my learning journey. Somehow, it resonated with quite a few people, and I now have over 2.1k followers!
I've always enjoyed chatting and teaching technical topics as well as writing, and I had this content backlog from my blog. So, I had a perfect recipe to start a YouTube channel!
The course I'm currently creating focuses on time series analysis, with an emphasis on the intuition behind the concepts. While there's some maths, I try to avoid overwhelming viewers with endless equations. Instead, I aim to make the content more digestible and engaging than a traditional textbook or lecture.
The most challenging aspect? Speaking with clarity and precision. I naturally speak quickly and don't explain myself articulately. But YouTube is helping me address these tendencies. Still, I recognise there's much more to learn and improve upon!
How do you balance the demands of a data science career with personal life and self-care?
I have tried to master the art of focus; it's majorly underrated nowadays. If I give something my undivided attention for an hour (plus some lo-fi beats!), it's amazing what I can get done. Luckily for me, my work doesn’t go into overtime that much. I often clock off at 6 pm, which isn’t too bad. The hard part is managing all my other things outside of work: learning, blog, newsletter, and YouTube. I normally prioritise self-care and personal commitments around those things; that's what works best for me.
🧵 Featured content
Interested in becoming a Data Engineer? This job title has become increasingly relevant in recent years. Today we share tips and a roadmap to transition into this trendy and promising role!
In this Reddit post 16 tips are shared:
Data Engineering is fundamentally just moving data around, and reshaping it.
Be curious. Learn how things work. Try stuff out. Experiment.
Become intimately familiar with data types, sources, and structures.
Learn a General Purpose Programming Language. It doesn't really matter which one, it's the fundamentals that are important—everything else is just syntax.
If you don't know which one to pick, start with Python
Get good at SQL. It's nearly 50 years old and you're probably going to retire before it does.
No matter what systems and tools you use there's a good chance that it probably uses SQL or something pretty similar.
Even more modern data stores, like data lakes, are still queried using SQL
Learn to use the command line (PowerShell, CMD or Bash). There are so many problems that can be solved much faster in the terminal.
Learn how computers work, at least a little bit. How do they communicate? How do they process and store information? What does a server do?
Do as many personal projects as you can. Sign up for a GitHub account and publish them there.
Get really comfortable using APIs and parsing JSON data. Outside of databases this is probably how you're going to interact with most of your data.
Get really good at your tools, and then get better. But, also be at least familiar with what else is out there.
Understand the differences between a Database, Data Lake, and Data Warehouse. What's the difference between OLTP and OLAP?
Learn to use a Cloud Platform. AWS has a pretty good Free Tier, try it out and learn what the different services do.
Strong business knowledge is extremely valuable in both Data Engineering and Data Analytics.
Understand different business metrics and how they're calculated.
Learn to find the grain (level of detail) of data. How is it structured? What is the smallest unit? What exactly is a "row" in this table?
When it comes to data, everything (almost) is either a JSON, XML, CSV/*SV, SQLite, or Database.
Even proprietary files with different extensions are probably one of these. Tableau and Alteryx files are just XML files, and many applications store data in .db files (SQLite).
Sometimes a file is just a zipped folder of files. Excel for example is just a zipped folder of XML files.
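To make the tips on SQL and on finding the grain a bit more concrete, here is a minimal sketch that checks whether a set of columns uniquely identifies every row in a table. It uses Python's built-in `sqlite3` module with an in-memory database; the `order_lines` table, its columns, and the `is_grain` helper are hypothetical names we made up for illustration.

```python
import sqlite3

# Hypothetical table: one row per product within an order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines (order_id INTEGER, product_id INTEGER, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(1, 10, 2), (1, 11, 1), (2, 10, 5), (3, 12, 1)],
)


def is_grain(conn, table, columns):
    """Return True if the given columns uniquely identify every row.

    Table and column names are interpolated directly into the SQL,
    so only pass trusted identifiers.
    """
    cols = ", ".join(columns)
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    distinct = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM {table})"
    ).fetchone()[0]
    return total == distinct


print(is_grain(conn, "order_lines", ["order_id"]))                # False
print(is_grain(conn, "order_lines", ["order_id", "product_id"]))  # True
```

Here a "row" is one product within one order, not one order, so summing or joining at the order level without aggregating first would double count.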
These are all great pieces of advice! You probably have some of them covered already, but not all of them yet. Here is a roadmap shared by Avi Kumar Talaviya with specific courses you can take to achieve your goal! 👇
When you finally achieve it, remember, never stop learning… Especially in this field, everything is moving so fast!