

Companies still don't realize how useful data can be - DSBoost #16
Welcome to the 16th issue of DSBoost, the weekly newsletter where you can discover interesting people in the ML/AI world, get the main takeaways of a relevant podcast, and stay up to date with the latest news in the field!
💬 Interview of the week
This week we interviewed Josep, who is a full-time analytics engineer and part-time content creator on Medium. Enjoy:
What did you study/are you studying (if your background is different from DS, how did you end up in the field)?
I studied for a Bachelor of Physics Engineering, followed by a Master's in Big Data. Although my background wasn't directly related to Data Science, I have always enjoyed data-related subjects the most. Knowing that Data Science can be applied in a variety of fields, together with my curious nature, led me to this fascinating domain.
What are your favorite resource sites and books (ML/AI)?
That's a tough question because there are so many online resources, especially with the recent growth in interest. I usually like to stick to well-known sources like O'Reilly's books. But I also stay up-to-date by visiting websites that specialize in the topic, like KDnuggets and Fast.ai.
I use Medium a lot because it has great tips and easy-to-follow guides from other data experts. In my opinion, Medium is a super helpful place to learn!
What got you into your current role (portfolio, certification, etc.)?
Both my bachelor's and my master's were required to get into my current job. However, it required more than just computer work and looking at numbers. I enjoy seeing the whole picture of the projects I'm working on. This is why having both good technical skills and being able to support people with data-driven decision making helped me get the job I have now.
What do you enjoy the most in your work?
I really enjoy looking at data by myself and discovering hidden patterns in it. But I also like to do more than just computer stuff - I want to see the whole picture. That's why I love participating in end-to-end projects that start with complex technical deployments and end up with intelligence products. My ultimate purpose is to make data available for everyone and show people the importance of using data in their daily work. A lot of jobs and companies still don't realize how useful data can be, and I want to help change that.
What tools do you use the most / favorite tools?
In terms of programming languages, Python holds a special place as my one and only. Additionally, I regularly employ SQL in my day-to-day tasks.
As for working in cloud environments, I have a strong preference for the Google Cloud Platform. Its intuitiveness consistently impresses me, especially in comparison to Microsoft's Azure, which I currently use in my professional work.
When it comes to local development, I frequently rely on Jupyter Notebooks for quick analyses, as they offer remarkable flexibility and ease of use. However, for more complex coding tasks, I turn to PyCharm. And of course, GitHub to store all my projects! :)
Do you use ChatGPT or other AI tools during your work? If so, how do they help you? Do they change your approach to problems?
Ever since ChatGPT was launched, I have made it part of my daily routine. I use ChatGPT for a wide variety of goals:
Enhancing my writing: I utilize ChatGPT as a valuable resource for inspiration, creating article outlines, and overcoming the challenge of staring at a blank page. It has greatly improved my writing process.
Efficiently searching for information: Rather than relying solely on Google, I now turn to ChatGPT first for quick answers. It not only saves time but also presents information in a natural, easily digestible manner, further optimizing my learning experience.
Staying updated on data science news: I actively use AutoGPT to stay informed about the latest developments in the field of artificial intelligence.
Regarding my professional life, I still do not use ChatGPT much. There is a lot of concern about privacy regulations. As it is still not clear how ChatGPT handles chat history and submitted prompts, I prefer to keep it away from any sensitive info.
However, I do think ChatGPT allows us to think outside the box and optimize our daily work. It will change our way of working for sure.
What is your favorite topic within the field?
I am deeply passionate about all aspects of geographical data. During my Master's thesis, I concentrated on the study of traffic flow, and my two most recent job positions have been centered around human mobility, specifically in urban mobility and tourism. As for data science, I am fascinated by nearly every domain within the field. However, I recognize the need to further develop my expertise in areas such as Natural Language Processing and Computer Vision.
Which one of the recent AI/ML models will have the most significant impact on the industry in your opinion?
Undoubtedly, OpenAI's GPT (Generative Pretrained Transformer) stands out as a highly promising AI/ML model with the capacity to revolutionize multiple industries.
Indeed, I am pretty sure it has already changed most companies' and people’s way of working. As a cutting-edge language model, GPT excels in natural language understanding, generation, and transfer learning, showcasing remarkable capabilities. This completely changes the way we interact with technology.
The influence of GPT is anticipated to be substantial, spanning a diverse array of industries and applications, further transforming the landscape of artificial intelligence.
What are you currently learning or improving (topics you are interested in nowadays)?
I am currently focused on enhancing my programming skills by consistently participating in advanced Python courses. In addition, I am actively working to improve my GitHub abilities, which allows me to showcase my coding projects and collaborate with other developers. As I continue to grow as a programmer, I am considering expanding my skill set by exploring new languages, such as Julia, known for its high performance and versatility in various fields.
What is the biggest mistake you've made? (preferably DS related)
I've made some big mistakes, like accidentally deleting production databases or making a warning system that sent thousands of messages to a Slack chat group. But, hey, if you don't mess up sometimes, it means you're not trying hard enough - or at least I hope so! ;)
What is your most significant achievement? (preferably DS related)
For now, my most significant achievement - and the one I am proudest of - has been creating my own personal brand as a data science content creator. I had never thought I could share insightful info with other professionals, but ever since I published my first Medium article I have realized it is completely feasible.
Now, I'm putting all my energy into getting even better at it! :D
Can you share a fun fact about yourself?
I love traveling, and one of my dreams is to backpack around Southeast Asia. I lived in Taiwan for several months before Covid and I would love to go back for a while in the coming years!
As an additional fun fact, I am learning Chinese but my partner - with Chinese roots - does not speak it fluently… it is a sad story indeed :(
🎙️ Podcast of the week
SuperDataScience 675: Pandas for Data Analysis and Visualization
Key takeaways:
Pandas is a key library for Data Scientists:
It has a market share advantage due to its popularity and extensive resources available in the community.
It is a well-tested library, making it reliable and trustworthy.
It is great for prototyping and exploring data analysis, especially for working with two-dimensional tables of data.
Pandas 2.0 can use Apache Arrow as its backend, making it extensible over multiple devices, which is advantageous when working with large datasets.
Whether someone should start with statistics or programming depends on their learning style and interests. It's important to do what works for you and stay positive about it.
It's great to learn statistics and programming together, as it allows for hands-on learning and experimentation.
When working with pandas, it's important to read the documentation carefully as different libraries treat metrics like standard deviation differently.
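To make the last takeaway concrete, here is a minimal sketch of the standard deviation difference. It is our own illustration, not something shown in the podcast, and the sample values are made up. By default, NumPy's `np.std` computes the population standard deviation (`ddof=0`), while pandas' `Series.std` computes the sample standard deviation (`ddof=1`), so the same data gives two different numbers unless you set `ddof` yourself.

```python
import numpy as np
import pandas as pd

# Made-up sample values, chosen only for illustration.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# NumPy defaults to the population standard deviation (ddof=0).
numpy_std = np.std(values)

# pandas defaults to the sample standard deviation (ddof=1).
pandas_std = pd.Series(values).std()

# Passing ddof explicitly makes the two libraries agree.
pandas_population_std = pd.Series(values).std(ddof=0)
numpy_sample_std = np.std(values, ddof=1)

print(f"NumPy default:  {numpy_std:.4f}")
print(f"pandas default: {pandas_std:.4f}")
print(f"pandas, ddof=0: {pandas_population_std:.4f}")
print(f"NumPy, ddof=1:  {numpy_sample_std:.4f}")
```

If you compare results across libraries, setting `ddof` explicitly in both places is probably the safest habit.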
🧵 Featured threads
🤖 What happened this week?
Google made several AI-related announcements during the Google I/O event in 2023. The company has now positioned itself as an "AI-first" company and has "reimagined all of its core products."
The top five announcements are:
Gmail's Help Me Write feature allows users to auto-generate entire emails, expanding on the Smart Reply and Smart Compose features.
Google Maps is introducing an AI-powered Immersive View for Routes, which will transport users into a digitally created model of their exact route, allowing them to spot landmarks and stops along the way.
Google Photos is adding Magic Editor, which can move individual elements in a photo and, when a user wants to center an element that is cut off by the frame, auto-generate the missing part to make it whole.
Google Search will respond to users' search queries in a conversational way, find web results for multifaceted queries, and suggest additional follow-up search query questions.
Google's AI chatbot Bard, now powered by PaLM 2, Google's newest large language model, will support 40 new languages and integrate with third-party apps and platforms. Additionally, generative AI features will soon be coming to Google Workspace products.