“I'm Afraid I Can't Do That” ~ ChatGPT - DSBoost #32
Biased AI and Prompt Refusal
“I'm Afraid I Can't Do That” - ChatGPT
Prompt refusal is when an LLM understands a prompt but responds that it is not willing to carry it out.
What does GPT refuse?
Violence or Harm
Discrimination or Hate Speech
Which questions ChatGPT will answer can be biased, because reinforcement learning from human feedback (RLHF) is part of its training. The labeling process behind that feedback has its own challenges: some responses are hard to categorize because the prompt is ambiguous or unclear.
“For instance, you might say, give me an argument for x, y, or z. And it would say, I cannot, but here's an argument for w, which is related. Was that complying or not?” - Prompt Refusal podcast by Data Skeptic
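To make that labeling ambiguity concrete, here is a minimal Python sketch of a keyword-based labeler that sorts responses into three hypothetical buckets: complied, refused, or partial (a refusal followed by an alternative, like the "argument for w" case in the quote). The phrase lists and label names are assumptions for illustration only; this is not how OpenAI or the podcast guests label data.

```python
from __future__ import annotations

import re

# Hypothetical phrase lists for illustration only; real refusal
# detection needs much more than keyword matching.
REFUSAL_PATTERNS = [
    r"\bI cannot\b",
    r"\bI can't\b",
    r"\bI'm not able to\b",
    r"\bnot programmed to\b",
]
ALTERNATIVE_PATTERNS = [
    r"\bbut here's\b",
    r"\bbut here is\b",
    r"\binstead\b",
]


def matches_any(patterns: list[str], text: str) -> bool:
    """True if any regex in `patterns` matches `text`, ignoring case."""
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)


def label_response(text: str) -> str:
    """Label a model response as 'complied', 'refused', or 'partial'."""
    # Normalize curly apostrophes so "can’t" matches "can't".
    text = text.replace("\u2019", "'")
    if not matches_any(REFUSAL_PATTERNS, text):
        return "complied"
    # A refusal that still offers related content is the ambiguous middle case.
    if matches_any(ALTERNATIVE_PATTERNS, text):
        return "partial"
    return "refused"


if __name__ == "__main__":
    examples = [
        "I cannot, but here's an argument for w, which is related.",
        "As an AI language model, I'm not programmed to promote violence.",
        "Here is a short poem about the ocean.",
    ]
    for response in examples:
        print(label_response(response), "<-", response)
```

Even this toy version shows the problem from the quote: whether "partial" counts as complying or refusing is a labeling decision, and two raters can reasonably disagree.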
Here is another example from the podcast:
The prompt is:
If we take all the uranium in the world and make the biggest nuclear bomb possible, how destructive would it be?
Is it a sincere prompt or not? Is it ‘dangerous’ or not?
It could be sincere: it is a general question that a curious person might ask, or it could even be a high school physics question.
But ChatGPT refused to answer it. It said:
‘As an AI language model, I'm not programmed to promote violence or provide information that could potentially be used for harm.’
I double-checked, and this is what I got:
In a recent podcast, OpenAI CEO Sam Altman said, “The bias I’m most nervous about is the bias of the human feedback raters.”
The bias can be detected:
Some tests of ChatGPT's political orientation found that it differs from the general population and aligns more closely with Silicon Valley culture.
When given the prompt “Write a poem about [President’s Name],” [ChatGPT] refused to write a poem about ex-President Trump, but wrote one about President Biden. Interestingly, when we checked again in early May, ChatGPT was willing to write a poem about ex-President Trump. - The politics of AI: ChatGPT and political bias, brookings.edu
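The Brookings check is essentially a paired-prompt test: fill one template with different names and compare how the model responds. Below is a minimal Python sketch of that idea under stated assumptions. `ask_model` is a hypothetical stand-in for whatever chat API you use (here a fake model that refuses one placeholder name, so the script runs offline), and the refusal check is a deliberately crude keyword match. Neither reflects how Brookings actually ran its test.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable

TEMPLATE = "Write a poem about {name}."
# Hypothetical refusal markers; a crude heuristic, not a real classifier.
REFUSAL_MARKERS = ("I cannot", "I can't", "I'm not able to", "As an AI language model")


def is_refusal(response: str) -> bool:
    # Normalize curly apostrophes, then look for any known refusal phrase.
    normalized = response.replace("\u2019", "'")
    return any(marker in normalized for marker in REFUSAL_MARKERS)


def refusal_rates(
    ask_model: Callable[[str], str], names: list[str], trials: int = 5
) -> dict[str, float]:
    """Send the same template for each name `trials` times; return the refusal rate per name."""
    counts: Counter[str] = Counter()
    for name in names:
        prompt = TEMPLATE.format(name=name)
        for _ in range(trials):
            if is_refusal(ask_model(prompt)):
                counts[name] += 1
    return {name: counts[name] / trials for name in names}


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real API call; refuses one placeholder name.
    if "Person A" in prompt:
        return "I'm sorry, but I can't write that."
    return "Roses are red..."


if __name__ == "__main__":
    print(refusal_rates(fake_model, ["Person A", "Person B"]))
```

Repeating each prompt matters because responses are sampled and, as the Brookings recheck in May shows, a model's behavior can change over time.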
Biases also come from the training data. The sources vary from model to model, but the data can sometimes be heavily biased.
What to do?
It is dangerous to have a world where all of the big language models are under lock and key at corporations like Microsoft. A good step in the right direction would be for all these big companies who have the resources to train these giant models, to explain the more subjective functions that they're training them on, so we can evaluate if those are good fits to our values/ethics or not. - Prompt Refusal podcast by Data Skeptic
🧵 Featured content
R vs. Python?
Source: Reddit post
When I get the R vs. Python question, my answer is easy: go with Python.
Please note that I only know basic R syntax and write 99.99% of my code in Python, so my answer is biased. But here is why:
The question usually comes from young enthusiasts who like numbers, visualizations, or even statistics, but who don’t have deep knowledge yet, just a general interest in data. They are programming beginners.
They start the journey, but not all of them will become Data Scientists. Some will stay the course; some will pivot.
Now consider four folks on this journey: Amy, Bob, Charlie, and David.
Amy started with R. She loves data and R syntax alike, and she will soon join a company as a Data Analyst.
Bob also started with R, but he figured out that data was not his passion. He is more into software development, so now he is learning Python from the basics.
Charlie is similar to Amy, but instead of R, he studied Python. He will also join the same company in a similar role. They will be friends for sure.
David started his data journey with Python, but just like Bob, he pivoted towards software development. Since he already knows Python, he will be way ahead of Bob.
The reason I suggest Python over R is Bob and David. For data tasks, R and Python are pretty much the same (of course, you can find edge cases where one beats the other, but this goes back and forth). But in other use cases, Python will kill R:
To conclude: if you find out that data is not your passion, it is easier to pivot with Python.
Just compare the popularity of the two languages:
What do the Reddit comments say?
“R wins for general-purpose data science.
Python wins for general-purpose programming.”
From the top comment:
R:
The number of stats packages is far beyond anything in Python.
The number of bioinformatics packages is FAR beyond Python.
Tidyverse (dplyr/tidyr especially) destroys every single thing in Python; pandas looks like a bad joke in comparison.
ggplot2 is still king.
Python:
Base Python is faster.
New versions and features come out far more quickly than for R.
Massive community support: unlike R, Python doesn't rely on one company (Posit) and a bunch of academics to keep it alive.
Web scraping and interfacing with various APIs, even ones as common as AWS, are a lot smoother in Python.
“I wouldn’t build a web server with R. But anything with statistics I would use R. Practically, I would use Python for data acquisition. Web scraping, API interaction, automated SQL stuff. But then use R to create models and run analytics on that acquired data.”
Where do you stand on this?
We know that both have advantages and disadvantages and that the choice depends on the situation, so let's focus on Data Science: