Technology

Study Finds Advanced AI Experiences ‘Complete Accuracy Collapse’ When Confronted With Complex Problems


The race to develop artificial general intelligence (AGI), machines capable of performing any intellectual task as well as a human, has captivated the tech world. But a new study by Apple researchers raises serious concerns about the direction of current AI development. According to the paper, even the most advanced large reasoning models (LRMs) show alarming limitations when tasked with solving complex problems. These findings suggest the industry may be approaching an impasse, challenging long-held beliefs about the scalability and potential of AI systems.


I. Apple’s AI Study: Exposing the Cracks

1. Complete Performance Collapse in Complex Tasks

In their recently published research, Apple scientists examined the capabilities of LRMs, an advanced type of AI designed to solve difficult problems by breaking them down into smaller, logical steps. Surprisingly, they discovered that these models not only underperformed but experienced a “complete accuracy collapse” when handling high-complexity tasks. In contrast, standard models often outperformed their more sophisticated counterparts on low-complexity tasks.

2. Reasoning Effort Diminishes With Greater Challenge

One of the most troubling findings was that as LRMs approached tasks beyond their capability, they actually reduced their reasoning effort. This counterintuitive behavior indicated that instead of ramping up logical processing under pressure, the models scaled back their analysis precisely when deeper thinking was needed. Apple researchers described this decline in effort as particularly concerning, suggesting an underlying weakness in how these models are trained to reason.


II. Industry Reactions and Broader Implications

1. Experts Warn of Overconfidence in AI Progress

Gary Marcus, a well-known critic of unchecked optimism in AI, described Apple’s findings as “devastating.” Writing in his Substack newsletter, he emphasized that the results raise serious questions about the feasibility of achieving AGI with current methods. Marcus challenged the belief that large language models (LLMs), which power tools like ChatGPT, represent a clear path toward AGI. “Anyone who thinks LLMs will lead directly to AGI is deluding themselves,” he stated.

2. Inefficient Use of Computational Resources

The study also found that LRMs often wasted computational resources. On straightforward problems, the models frequently found the correct answer early in their reasoning but then squandered further effort exploring incorrect alternatives. Once the complexity increased slightly, they tended to explore wrong answers first before arriving at the correct one. For even harder problems, the systems failed entirely, even when given an algorithm capable of producing the right solution.

3. Critical Thresholds Signal Model Failure

According to the research, there is a critical threshold where the models’ performance breaks down completely. Instead of pushing harder, the AI inexplicably begins to apply less reasoning effort. This behavior was identified as a “scaling limitation,” highlighting the inability of current models to effectively generalize or adapt their logic as challenges grow more difficult.


III. Testing the Models: Puzzle-Based Evaluation

1. Puzzle Challenges as Benchmarks

To evaluate the reasoning ability of AI systems, Apple’s researchers presented the models with classic logic puzzles such as the Tower of Hanoi and River Crossing problems. These tests are designed to measure logical progression, problem-solving, and cognitive agility. While the puzzles provided valuable insight, the researchers acknowledged that using only puzzles may have limited the scope of their findings.
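
To make the benchmark concrete: the Tower of Hanoi has a short, provably optimal recursive solution, and its difficulty can be dialed up simply by adding disks, since an n-disk puzzle requires 2^n - 1 moves. The sketch below is purely illustrative; the paper's exact prompt format and harness are not reproduced here.

```python
def hanoi(n, src="A", aux="B", dst="C"):
    """Return the optimal move list for the n-disk Tower of Hanoi."""
    if n == 0:
        return []
    return (hanoi(n - 1, src, dst, aux)     # park n-1 disks on the spare peg
            + [(src, dst)]                  # move the largest disk to the target
            + hanoi(n - 1, aux, src, dst))  # stack the n-1 disks back on top

# Difficulty scales exponentially: an n-disk puzzle needs 2**n - 1 moves.
for n in (3, 7, 10):
    print(f"{n} disks -> {len(hanoi(n))} moves")
```

This exponential growth is what makes the puzzle a convenient complexity dial: each extra disk roughly doubles the length of a correct solution.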

2. AI Systems Tested

The study covered a range of leading models, including OpenAI’s o3, Google’s Gemini Thinking, Anthropic’s Claude 3.7 Sonnet-Thinking, and DeepSeek-R1. Despite their diverse architectures and creators, all models demonstrated similar vulnerabilities when confronted with increased problem complexity. Apple’s findings imply that these weaknesses are systemic across platforms, rather than isolated to one developer or approach.
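
As a rough illustration of how such a complexity sweep could be reproduced, the sketch below verifies a model's answer by replaying its proposed moves and logs reasoning effort at each difficulty level. The ask_model helper is hypothetical, a stand-in for whichever model API is under test, and the prompt wording is an assumption rather than the paper's.

```python
from typing import List, Tuple

Move = Tuple[str, str]  # (source peg, destination peg)

def is_valid_solution(n: int, moves: List[Move]) -> bool:
    """Replay a proposed move list and check it legally solves n-disk Hanoi."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for src, dst in moves:
        if not pegs[src]:
            return False  # illegal: moving from an empty peg
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False  # illegal: larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))

def ask_model(prompt: str) -> Tuple[List[Move], int]:
    """Hypothetical helper: query some LRM, return (moves, reasoning-token count)."""
    raise NotImplementedError("wire this up to the model API under test")

def sweep(max_disks: int = 12) -> None:
    # Ratchet difficulty upward; a collapse threshold would show up as
    # accuracy dropping to zero while reasoning-token spend also falls.
    for n in range(1, max_disks + 1):
        moves, tokens = ask_model(
            f"Solve the {n}-disk Tower of Hanoi; answer as (from, to) move pairs.")
        print(f"{n} disks: solved={is_valid_solution(n, moves)}, "
              f"reasoning_tokens={tokens}")
```

Checking answers by replaying the moves, rather than comparing against one canonical sequence, matters because any legal move list that reaches the goal state counts as a solution.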


IV. Future of AI: Rethinking the Road to AGI

1. Questioning the Current Trajectory

The paper’s conclusion delivers a stark warning: the present trajectory in AI development may be approaching its limits. The observed inability to maintain reasoning under pressure suggests that existing training and model architectures are not equipped to scale with increasing task complexity. In short, the road to AGI may not be as direct—or achievable—as many in the industry have assumed.

2. Expert Views on a Dead End

Andrew Rogoyski from the Institute for People-Centred AI at the University of Surrey echoed the paper’s implications, suggesting the industry might be facing a developmental “cul-de-sac.” He pointed out that while LRMs handle low and medium complexity tasks well, their collapse under higher cognitive load indicates a need to rethink current strategies. “This might be a sign we’ve reached a dead end with the current approach,” Rogoyski warned.

3. Challenges in Generalizable Reasoning

A core issue raised by the study is the failure of AI to perform “generalizable reasoning”—the ability to apply a solution from one context to a broader set of problems. This is a critical component of human intelligence and a major hurdle for AI models seeking to reach AGI. The research calls into question the assumption that simply scaling existing models will eventually lead to machines that can reason as humans do.
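
The gap is easiest to see against the study's other benchmark: a fully general River Crossing solver is a few lines of breadth-first search that handles any valid instance, which is exactly the kind of transferable procedure the models struggled to apply. Below is a minimal sketch of the classic farmer/wolf/goat/cabbage variant; the paper's own puzzle configurations may differ.

```python
from collections import deque

# Each state records which bank (0 = near, 1 = far) each actor is on.
ACTORS = ("farmer", "wolf", "goat", "cabbage")

def is_safe(state):
    farmer, wolf, goat, cabbage = state
    # The wolf eats the goat, and the goat eats the cabbage,
    # whenever the farmer is on the opposite bank.
    return not (wolf == goat != farmer or goat == cabbage != farmer)

def solve():
    """Breadth-first search from everyone on the near bank to the far bank."""
    start, goal = (0, 0, 0, 0), (1, 1, 1, 1)
    queue, seen = deque([(start, [start])]), {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        farmer = state[0]
        # The farmer crosses alone (i == 0) or ferries one item from his bank.
        for i in range(len(ACTORS)):
            if i > 0 and state[i] != farmer:
                continue
            nxt = list(state)
            nxt[0] = 1 - farmer
            if i > 0:
                nxt[i] = 1 - state[i]
            nxt = tuple(nxt)
            if is_safe(nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None

for step, state in enumerate(solve()):
    crossed = [name for name, bank in zip(ACTORS, state) if bank]
    print(f"step {step}: on far bank -> {crossed or ['nobody']}")
```

The point of the example is generality: the same search handles every legal configuration without special-casing, which is the kind of reasoning transfer the paper found current models unable to sustain.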


Conclusion

Apple’s research delivers a sobering reality check for the AI industry. Despite the enormous progress made in developing large language and reasoning models, fundamental weaknesses remain—especially when it comes to handling complex, abstract problems. These findings suggest that the current path toward AGI may require a dramatic shift in approach rather than incremental advancements. If the industry continues to push forward without addressing these foundational flaws, the dream of machines matching human cognition may remain elusive for far longer than anticipated.
