

4 reasons why your portfolio project sucks - DSBoost #23
💬 Interview of the week
This week we interviewed Alberto Gonzalez Rosales, who is a Software Developer. Enjoy:
What did you study/are you studying (if your background is different from DS, how did you end up in the field)?
I studied Computer Science at the University of Havana.
What are your favorite resource sites and books (ML/AI)?
I started with all the basic tutorials at Kaggle and I think they are great to get you started in the fields of Machine Learning and Artificial Intelligence in general. I was able to produce satisfactory results in a real-life work environment just by passing their Introduction to Geospatial Data Science course. Nowadays, I'm trying to finish the course at course.fast.ai, which is a great way to learn about Machine Learning using a Top-Down approach.
What got you into your current role (portfolio, certification, etc.)?
My current role (which I don't know how to define) is the result of applying to multiple companies selling myself as the ultimate team player. I won't say I'm an expert in any field but I am an expert in adaptability, which is very valuable if you are part of any team. I've been a Backend Developer, a Frontend Developer, a Tester, and a Geospatial Data Scientist, and now I think I am a DevOps engineer but I do a lot of Data Analysis from time to time. Especially network analysis lately.
What do you enjoy the most in your work?
I enjoy the problem-solving part of any task. I like to be part of creating practical solutions from the point of view of Software Engineering. I also like tasks in which I need to optimize time and/or memory consumption. Lately, I have been doing most of my proofs-of-concept in Jupyter notebooks, and it has been a real challenge to integrate the code created for what I call "the exploratory mode" into real applications.
The "exploratory mode" is usually a lot of handling and visualization of data to validate some ideas. And that code is usually written quickly and without much of the best practices of software engineering. Moving that to a production-ready environment can be tricky, but it is definitely enjoyable.
What tools do you use the most / favorite tools?
I don't think I use that many tools apart from an editor (VSCode in my case) and some extensions to improve the coding experience. I do a lot of Python lately, so I might give PyCharm a try.
Do you use ChatGPT or other AI tools during your work? If so, how do they help you? Do they change your approach to problems?
Like most people, I used ChatGPT a lot in the beginning, but I don't use it that much anymore. I do use AI tools that generate images, basically just for fun. Tools like ChatGPT definitely have the potential to change the way we approach problems. If they ever get to a point where you never have to code again, that would be a great milestone in the history of Computer Science. That won't mean the end of programmers; it will only mean that your ideas can be implemented faster and in a different way (by talking to a machine with your voice, for example). But still, all the heavy thinking for a major breakthrough will have to be done by us humans.
What is your favorite topic within the field?
I don't think I have a favorite topic, but if I had to pick one, it would most likely be Reinforcement Learning.
Which one of the recent AI/ML models will have the most significant impact on the industry in your opinion?
Large Language Models are already having a huge impact on society, in my opinion. The fact that most of the content you might see online could have been automatically generated is something to consider every time you are reading a publication of your interest. The fact that there are no good results (yet) in detecting AI-generated content makes it even more worrisome. Misinformation is an issue that has grown with the use of social media, and now it has a powerful ally in AI-generated content. On the other hand, I have seen some great uses of AI in the automotive industry focused on driver assistance, automatic braking, and safety in general, which I think is a good way to push our society forward.
What are you currently learning or improving (topics you are interested in nowadays)?
Lately, I have been interested in network analysis. I have a solid foundation in graph theory, and seeing how some of its most basic concepts can yield insightful information about networks of your interest has been a game-changer for me. It is a topic with many theoretical concepts and even more practical applications, so nowadays it is perfect for me, since it keeps me close to both academia and industry.
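As an illustration of "basic graph-theory concepts yielding insight", here is a minimal sketch of degree centrality computed from scratch on a made-up friendship network (all names and edges are hypothetical, chosen just for the example):

```python
from collections import defaultdict

# A tiny undirected network, given as edge pairs (hypothetical friendships)
edges = [("ana", "ben"), ("ana", "cat"), ("ana", "dan"),
         ("ben", "cat"), ("dan", "eve")]

# Build adjacency sets for both endpoints of each edge
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Degree centrality: a node's degree divided by the maximum possible degree (n - 1)
n = len(adj)
centrality = {node: len(neigh) / (n - 1) for node, neigh in adj.items()}

# "ana" is connected to 3 of the 4 other nodes, so her centrality is 0.75
print(max(centrality, key=centrality.get))  # → ana
```

In practice a library such as NetworkX provides this (and far more) out of the box, but even this one-liner ratio already answers a real question: who is the most connected node in the network?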
What is the biggest mistake you've made? (preferably DS related)
A mistake that I think practically every person with an interest in Data Science makes is not taking full advantage of the top-tier libraries built for the purpose. If you do Data Science with Python, for example, I can guarantee that it is worth trying to master libraries such as pandas and NumPy. The performance impact can be significant when you write custom code for tasks that are already built into such libraries. Performance issues mean wasted time, and time is your most valuable resource. I struggled with this (and I still do occasionally), and it cost me double the time and double the work. What I'm trying to say is: use the tools built by those who came before you, use them right, and focus on building on top of them.
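To make the point concrete, here is a small sketch (the standardization task is just an invented example) contrasting a hand-rolled loop with the equivalent built-in NumPy operations; both produce the same numbers, but the vectorized version is shorter and typically orders of magnitude faster on large arrays:

```python
import numpy as np

# Hypothetical task: standardize a column of 100k values
values = np.random.default_rng(0).normal(size=100_000)

def standardize_loop(xs):
    """What beginners often write: plain-Python loops over every element."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    std = var ** 0.5
    return [(x - mean) / std for x in xs]

def standardize_numpy(xs):
    """The built-in vectorized equivalent: one line, runs in compiled code."""
    return (xs - xs.mean()) / xs.std()

slow = standardize_loop(values.tolist())
fast = standardize_numpy(values)
print(np.allclose(slow, fast))  # → True: same result, very different runtimes
```

The lesson generalizes: before writing a custom loop over your data, check whether pandas or NumPy already exposes the operation as a built-in.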
What is your most significant achievement? (preferably DS related)
If we talk about a Data Science-related achievement, then it is probably the Geospatial Data Science work I did a couple of years ago, which was finished (by other developers) and is now being used to detect and positively impact impoverished regions in South America.
Can you share a fun fact about yourself?
Maybe. Let's go for a safe answer and list some of my hobbies instead. Exercising and doing sports, watching anime, and playing guitar... See? Fun stuff. At least for me :).
🎙️ Podcast of the week
[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
Guest: Nick Singh
Key takeaways:
Why are portfolio projects important?
Many entry-level jobs require some kind of experience. Needing experience for an entry-level job? That is a catch-22! But portfolio projects let you overcome it: you gain experience and increase your chances of being noticed by recruiters.
Make your own experience with portfolio projects!
Data Science job descriptions are usually recommendations and not requirements.
Even if you have 10 years of experience, you need to keep up with the latest tech trends, and the best way to do that is by doing projects.
There are content-based projects (like writing blog articles and creating tutorials) and end-to-end analysis projects. Content-based projects are good for showing that you can break down concepts and communicate them clearly. But for landing jobs, end-to-end projects are more helpful.
To level up in Data Science, communication skills are a must!
4 big reasons why your portfolio project sucks:
Boring idea. The Iris dataset? Who cares! Pick a story that is interesting to you and others as well
A topic that you cannot visualize well. To show off your work you need a visually appealing portfolio. People love great graphs
The portfolio is not finished!
Cannot show that the project has an impact. Quantify impact - this is the hardest one. Show how many people use your stuff.
🤖 What happened this week?
The biggest news of this week is that Stable Diffusion XL 0.9 from Stability AI is available for you to try on Clipdrop!
Stable Diffusion XL 0.9 is an advanced development in the Stable Diffusion text-to-image suite of models. This new model offers significantly improved image and composition detail compared to its predecessor. It enables the generation of hyper-realistic imagery for various creative purposes. SDXL 0.9 includes functionalities like image-to-image prompting, inpainting, and outpainting.
It achieves this advancement through a substantial increase in parameter count, using a 3.5 billion parameter base model and a 6.6 billion parameter model ensemble pipeline. The model is currently available for research purposes and in Clipdrop, and an open release is planned for mid-July with SDXL 1.0.
Let’s see some examples!