AT2k Design BBS Message Area
Casually read the BBS message area using an easy-to-use interface. Messages are categorized exactly as they are on the BBS. You may post new messages or reply to existing messages!

Area:    Slashdot (Local Database)  [78 / 112]  RSS
From:    VRSS
To:      All
Subject: LLMs' 'Simulated Reasoning' Abilities Are a 'Brittle Mirage,' Researchers Find
Date:    August 11, 2025 10:40 PM

Feed: Slashdot
Feed Link: https://slashdot.org/
---

Title: LLMs' 'Simulated Reasoning' Abilities Are a 'Brittle Mirage,'
Researchers Find

Link: https://slashdot.org/story/25/08/11/2253229/l...

An anonymous reader quotes a report from Ars Technica: In recent months, the
AI industry has started moving toward so-called simulated reasoning models
that use a "chain of thought" process to work through tricky problems in
multiple logical steps. At the same time, recent research has cast doubt on
whether those models have even a basic understanding of general logical
concepts or an accurate grasp of their own "thought process." Similar
research shows that these "reasoning" models can often produce incoherent,
logically unsound answers when questions include irrelevant clauses or
deviate even slightly from common templates found in their training data. In
a recent pre-print paper, researchers from the University of Arizona
summarize this existing work as "suggest[ing] that LLMs are not principled
reasoners but rather sophisticated simulators of reasoning-like text."

To pull on that thread, the researchers created a carefully controlled LLM
environment in an attempt to measure just how well chain-of-thought reasoning
works when presented with "out of domain" logical problems that don't match
the specific logical patterns found in their training data. The results
suggest that the seemingly large performance leaps made by chain-of-thought
models are "largely a brittle mirage" that "become[s] fragile and prone to
failure even under moderate distribution shifts," the researchers write.
"Rather than demonstrating a true understanding of text, CoT reasoning under
task transformations appears to reflect a replication of patterns learned
during training." [...]

Rather than showing the capability for generalized logical inference, these
chain-of-thought models are "a sophisticated form of structured pattern
matching" that "degrades significantly" when pushed even slightly outside of
its training distribution, the researchers write. Further, the ability of
these models to generate "fluent nonsense" creates "a false aura of
dependability" that does not stand up to a careful audit. As such, the
researchers warn heavily against "equating [chain-of-thought]-style output
with human thinking," especially in "high-stakes domains like medicine,
finance, or legal analysis." Current tests and benchmarks should prioritize
tasks that fall outside of any training set to probe for these kinds of
errors, while future models will need to move beyond "surface-level pattern
recognition to exhibit deeper inferential competence," they write.
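
The paper's controlled environment is not reproduced here, but the core
evaluation idea (compare performance on tasks that match the training
patterns against tasks shifted slightly away from them) can be illustrated
with a toy Python sketch. Everything below is an assumption-laden
illustration, not the researchers' setup: the word list and the helper names
(shift, make_task, pattern_matcher) are hypothetical. It "trains" a pure
lookup-table pattern matcher on single letter-shift tasks, then shows its
accuracy collapsing on composed shifts it never saw, the kind of brittleness
under distribution shift the article describes.

# Toy sketch only -- NOT the paper's actual evaluation environment.
# A "reasoner" that merely memorizes prompt->answer pairs from its training
# distribution, tested in-distribution vs. out-of-distribution.
import string

ALPHABET = string.ascii_lowercase

def shift(word, k):
    # Cyclic alphabet shift of k positions applied to a lowercase word.
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26] for c in word)

def make_task(word, shifts):
    # A task: given a word and a sequence of shifts, produce the final word.
    prompt = word + " | " + " then ".join("shift+%d" % k for k in shifts)
    answer = word
    for k in shifts:
        answer = shift(answer, k)
    return prompt, answer

words = ["cat", "dog", "sun", "map", "ice", "owl", "fox", "bee"]

# "Training" distribution: single shifts of +1 or +2 only.
train = [make_task(w, [k]) for w in words for k in (1, 2)]
memorized = dict(train)  # pure pattern matching: exact prompt lookup

def pattern_matcher(prompt):
    # Answers correctly only if this exact prompt appeared in training.
    return memorized.get(prompt, "???")

def accuracy(tasks):
    return sum(pattern_matcher(p) == a for p, a in tasks) / len(tasks)

in_dist = [make_task(w, [k]) for w in words for k in (1, 2)]  # same format
out_dist = [make_task(w, [1, 2]) for w in words]              # composed shifts, never seen

print("in-distribution accuracy:     %.2f" % accuracy(in_dist))   # 1.00
print("out-of-distribution accuracy: %.2f" % accuracy(out_dist))  # 0.00

A model that had learned the shift rule itself, rather than the surface form
of the prompts, would score the same on both sets; that gap is what probing
with "out of domain" tasks is designed to expose.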

Read more of this story at Slashdot.

---
VRSS v2.1.180528