Some ChatGPT users have reportedly found the chatbot getting a little lazy over the last few days, with some suggesting it may be imitating the way humans slow down as winter sets in.
If you’ve been using ChatGPT lately, you may have had to prod it a little more than usual to get it to do what you want. Some users have found its responses shorter than usual, and at times ChatGPT explains how to do something instead of actually doing it for you.
Most ChatGPT users live in the northern hemisphere and know all too well that enthusiasm for work can flag as the year winds down and winter sets in. Could ChatGPT have learned this behavior from us?
OpenAI has been its usual opaque self and hasn’t offered any explanations. But there have been enough complaints to elicit a “we’re looking into fixing it” response from the company.
we’ve heard all your feedback about GPT4 getting lazier! we haven’t updated the model since Nov 11th, and this certainly isn’t intentional. model behavior can be unpredictable, and we’re looking into fixing it
— ChatGPT (@ChatGPTapp) December 8, 2023
Seasonal affective disorder is a real phenomenon, but could it affect a machine running code just because of its system date? ChatGPT reflects our biases and cultural references in its responses, so might it also imitate the way we slow down in December?
Rob Lynch posted results of an experiment on X that seemed to suggest there may be something to the AI “winter break” hypothesis. Running the same code completion task via the API, he found that ChatGPT delivered statistically significantly shorter responses with a December system date than with a May one.
@ChatGPTapp @OpenAI @tszzl @emollick @voooooogel Wild result. gpt-4-turbo over the API produces (statistically significant) shorter completions when it “thinks” its December vs. when it thinks its May (as determined by the date in the system prompt).
I took the same exact prompt… pic.twitter.com/mA7sqZUA0r
— Rob Lynch (@RobLynch99) December 11, 2023
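As described, Lynch’s setup boils down to sending an identical prompt twice, changing only the claimed date in the system message, and comparing completion lengths. A rough sketch against the OpenAI chat completions HTTP endpoint is below; the model name, prompt wording, and character-count metric are assumptions for illustration, and the real experiment sampled many completions per condition rather than one.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_payload(user_prompt: str, date_str: str) -> dict:
    """Chat completions request body; only the claimed date varies between runs."""
    return {
        "model": "gpt-4-turbo",  # assumed model name
        "messages": [
            {"role": "system",
             "content": f"You are a helpful assistant. The current date is {date_str}."},
            {"role": "user", "content": user_prompt},
        ],
    }

def completion_length(user_prompt: str, date_str: str) -> int:
    """Character length of one completion (requires OPENAI_API_KEY to be set)."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(user_prompt, date_str)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return len(body["choices"][0]["message"]["content"])
```

Collecting `completion_length` over many repetitions for a May date and a December date yields the two samples whose lengths Lynch compared.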
Ian Arawjo, a postdoctoral fellow at Harvard University, pointed out that the token and character lengths of ChatGPT’s responses aren’t normally distributed, so Lynch’s test, which assumes a normal distribution, was invalid.
Without going into the math, Arawjo reran the comparison with a test suited to this kind of data and found no statistically significant difference in response length. In his view, there’s no evidence of ChatGPT experiencing some form of seasonal affective disorder.
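The statistical point is the crux: skewed data like response lengths call for a nonparametric test such as the Mann-Whitney U test, which compares two samples without assuming normality (whether this was Arawjo’s exact choice is an assumption here). A minimal pure-Python sketch, using the normal approximation without tie correction and entirely made-up length data:

```python
import math

def mann_whitney_u(a, b):
    """Mann-Whitney U statistic: count of pairs (x, y) with x > y; ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)

def mann_whitney_p(a, b):
    """Two-sided p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    u = mann_whitney_u(a, b)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = abs(u - mu) / sigma
    # Standard normal survival function, doubled for a two-sided test.
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

# Fabricated response lengths (characters) for two date conditions;
# the point is only that the comparison doesn't assume normality.
may_lengths = [512, 610, 475, 1400, 530, 980, 450, 720]
dec_lengths = [498, 590, 470, 1350, 510, 940, 460, 700]
print(f"p = {mann_whitney_p(may_lengths, dec_lengths):.3f}")
```

A large p-value here would mean the length difference is consistent with noise, which is essentially what Arawjo reported for the real data.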
Is ChatGPT getting lazy, or are we imagining it? Even the engineers who build AI models find them inscrutable at times, so it’s hard to say. We’ll have to wait and see whether OpenAI provides further feedback on the issue.