Is GPT-4 getting dumber? Someone has written a paper confirming it
**Your guess was right: the large models are getting dumber!**
In recent months, two rumors have been circulating about OpenAI. One is that ChatGPT's traffic has begun to decline; the other is that GPT-4 has become "stupid".
The former has been proven true: according to statistics from the data company SimilarWeb, ChatGPT's global traffic dropped 9.7% from May to June, and its US traffic dropped 10.3%.
The latter has gradually become a running legend on Twitter, debated as fervently as the speculation about GPT-4's architecture, to the point that OpenAI's VP of Product publicly declared: no! We didn't make it dumber!
Now a paper from researchers at Stanford and Berkeley tries to pin down why GPT's performance feels so unstable and inconsistent, evaluating GPT-3.5 and GPT-4 along four capability dimensions: math problems, sensitive questions, coding ability, and visual reasoning.
By comparing the March 2023 and June 2023 versions of the two models, the paper found the following.
First, the performance of both models changed significantly within this short window, especially on math problems, where GPT-4's accuracy dropped sharply. For example, on determining whether a number is prime, GPT-4's success rate plunged from 97.6% to 2.4% in three months!
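A probe of this kind is easy to reproduce in spirit. The sketch below is a minimal illustration, not the paper's exact protocol; it assumes the `openai` Python client and the pinned snapshot names `gpt-4-0314` and `gpt-4-0613`, and uses trial division as ground truth.

```python
# Minimal sketch of a prime-classification probe across model snapshots.
# Assumes the `openai` Python client and access to the dated snapshots
# "gpt-4-0314" (March) and "gpt-4-0613" (June); adjust names as needed.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_prime(n: int) -> bool:
    """Ground truth via simple trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def ask_model(model: str, n: int) -> str:
    """Ask one snapshot whether n is prime; expect a bare yes/no."""
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[{
            "role": "user",
            "content": f"Is {n} a prime number? Answer with only 'yes' or 'no'.",
        }],
    )
    return resp.choices[0].message.content.strip().lower()


def accuracy(model: str, numbers: list[int]) -> float:
    correct = 0
    for n in numbers:
        expected = "yes" if is_prime(n) else "no"
        correct += ask_model(model, n).startswith(expected)
    return correct / len(numbers)


if __name__ == "__main__":
    test_numbers = [101, 103, 105, 107, 111, 113]  # tiny demo set
    for snapshot in ("gpt-4-0314", "gpt-4-0613"):
        print(snapshot, accuracy(snapshot, test_numbers))
```

Running the same fixed question set against each dated snapshot is what makes the before/after comparison meaningful.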
Second, on sensitive questions, the authors prepared a dataset of 100 sensitive questions to test the models. In principle, a large model should flatly refuse to answer these.
In the tests, GPT-4 performed better overall: the June version of GPT-4 answered only 5% of sensitive questions, while GPT-3.5's answer rate rose from 2% to 8%. The authors speculate that GPT-4's updates may have deployed a stronger safety layer, but this does not necessarily mean large models are becoming safer.
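Measuring an "answer rate" like this requires deciding, reply by reply, whether the model refused. A minimal sketch of one common approach, a keyword heuristic; the marker list and classification rule here are illustrative assumptions, not the paper's method.

```python
# Minimal sketch: classify model replies as "refusal" vs "answer" with a
# keyword heuristic, then compute an answer rate. The phrase list and the
# is_refusal() rule are illustrative assumptions, not the paper's method.
REFUSAL_MARKERS = (
    "i'm sorry",
    "i cannot",
    "i can't",
    "as an ai",
    "i am not able",
)


def is_refusal(reply: str) -> bool:
    text = reply.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def answer_rate(replies: list[str]) -> float:
    """Fraction of replies that answer the question instead of refusing."""
    answered = sum(not is_refusal(r) for r in replies)
    return answered / len(replies)


# Example: two refusals and one answer -> answer rate of 1/3
print(answer_rate([
    "I'm sorry, but I can't help with that.",
    "As an AI language model, I cannot provide that information.",
    "Sure! Here is how you ...",
]))
```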
Because when the authors went a step further and used the AIM method to trick the models (AIM stands for Always Intelligent and Machiavellian; think of it simply as a jailbreak prompt that induces the model to abandon its moral principles), GPT-3.5 answered almost all of the sensitive questions, and GPT-4, even after its upgrade, still answered nearly a third of them.
The challenges concerning the ethics and safety of large models still appear to be serious.
**What does it mean when a large model gets dumber?**
Besides Stanford's Chinese professor James Zou and his student Lingjiao Chen, the paper's authors include Matei Zaharia, a computer science professor at Berkeley whose other identity is CTO of the AI data company Databricks.
Their interest in large models getting dumber is, of course, not simply about debunking rumors. A large model's key capabilities are closely tied to its prospects for commercialization: if the AI services deployed in real-world environments swing wildly in capability as the underlying model iterates, that is clearly bad news for putting large models into production.
The term "longitudinal drifts" is used in the paper to describe the instability of the model capability as it changes with iterations and time. Although the paper itself does not give a specific reason, this paper has caused widespread discussion on Twitter. , Many people think that this actually responds to one of the main conspiracy theories in the rumors about the big model being stupid-OpenAI is not actually making the model stupid on purpose for cost-saving purposes!
It also seems to lose control over model ability stability and progression cadence.
Some say that once this finding is confirmed, it effectively sounds the death knell for large models, because what people need is a stable AI, not a model that changes drastically in the short term.
Others said that GPT-4's poor performance on math problems raises the suspicion that there is some mechanism inside the large model actively steering its output toward wrong answers.
In short, this paper draws attention to the tracking and evaluation of model capabilities. After all, no one wants their AI assistant to be smart at times and stupid at other times!
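For anyone building on a hosted model, that tracking can be as simple as running a fixed evaluation suite against each dated snapshot and flagging big swings. A minimal sketch, where the task name, the threshold, and the alert format are hypothetical (the scores shown are the paper's reported prime-classification numbers):

```python
# Minimal sketch of longitudinal drift tracking: compare a fixed evaluation
# suite across model snapshots and flag large accuracy swings.
# Task names and the 10-point threshold are hypothetical choices.
from dataclasses import dataclass


@dataclass
class EvalResult:
    snapshot: str       # e.g. "gpt-4-0314"
    task: str           # e.g. "prime_classification"
    accuracy: float     # percentage, 0..100


DRIFT_THRESHOLD = 10.0  # flag swings larger than 10 percentage points


def detect_drift(old: list[EvalResult], new: list[EvalResult]) -> list[str]:
    """Compare two evaluation runs task-by-task and report big swings."""
    baseline = {r.task: r.accuracy for r in old}
    alerts = []
    for r in new:
        if r.task in baseline:
            delta = r.accuracy - baseline[r.task]
            if abs(delta) > DRIFT_THRESHOLD:
                alerts.append(f"{r.task}: {delta:+.1f} points ({r.snapshot})")
    return alerts


# Example with the paper's reported prime-classification accuracies:
march = [EvalResult("gpt-4-0314", "prime_classification", 97.6)]
june = [EvalResult("gpt-4-0613", "prime_classification", 2.4)]
print(detect_drift(march, june))
# -> ['prime_classification: -95.2 points (gpt-4-0613)']
```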