aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
aster.cloud aster.cloud
  • /
  • Platforms
    • Public Cloud
    • On-Premise
    • Hybrid Cloud
    • Data
  • Architecture
    • Design
    • Solutions
    • Enterprise
  • Engineering
    • Automation
    • Software Engineering
    • Project Management
    • DevOps
  • Programming
    • Learning
  • Tools
  • About
  • Research
  • Technology

Researchers discover a shortcoming that makes LLMs less reliable

  • aster.cloud
  • December 4, 2025
  • 5 minute read

Large language models can learn to mistakenly link certain sentence patterns with specific topics — and may then repeat these patterns instead of reasoning.

Adam Zewe | MIT News
https://news.mit.edu/2025/shortcoming-makes-llms-less-reliable-1126


Partner with aster.cloud
for your next big idea.
Let us know here.



From our partners:

CITI.IO :: Business. Institutions. Society. Global Political Economy.
CYBERPOGO.COM :: For the Arts, Sciences, and Technology.
DADAHACKS.COM :: Parenting For The Rest Of Us.
ZEDISTA.COM :: Entertainment. Sports. Culture. Escape.
TAKUMAKU.COM :: For The Hearth And Home.
ASTER.CLOUD :: From The Cloud And Beyond.
LIWAIWAI.COM :: Intelligence, Inside and Outside.
GLOBALCLOUDPLATFORMS.COM :: For The World's Computing Needs.
FIREGULAMAN.COM :: For The Fire In The Belly Of The Coder.
ASTERCASTER.COM :: Supra Astra. Beyond The Stars.
BARTDAY.COM :: Prosperity For Everyone.

Caption:An LLM might learn that a question like “Where is Paris located?” is structured as adverb/verb/proper noun/verb. If the model is given a new question with the same grammatical structure but nonsense words, like “Quickly sit Paris clouded?” it might answer “France” even though that answer makes no sense. Credits:Image: MIT News; iStock

Large language models (LLMs) sometimes learn the wrong lessons, according to an MIT study.

Rather than answering a query based on domain knowledge, an LLM could respond by leveraging grammatical patterns it learned during training. This can cause a model to fail unexpectedly when deployed on new tasks.

The researchers found that models can mistakenly link certain sentence patterns to specific topics, so an LLM might give a convincing answer by recognizing familiar phrasing instead of understanding the question.

Their experiments showed that even the most powerful LLMs can make this mistake.

This shortcoming could reduce the reliability of LLMs that perform tasks like handling customer inquiries, summarizing clinical notes, and generating financial reports.

It could also have safety risks. A nefarious actor could exploit this to trick LLMs into producing harmful content, even when the models have safeguards to prevent such responses.

After identifying this phenomenon and exploring its implications, the researchers developed a benchmarking procedure to evaluate a model’s reliance on these incorrect correlations. The procedure could help developers mitigate the problem before deploying LLMs.

“This is a byproduct of how we train models, but models are now used in practice in safety-critical domains far beyond the tasks that created these syntactic failure modes. If you’re not familiar with model training as an end-user, this is likely to be unexpected,” says Marzyeh Ghassemi, an associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), a member of the MIT Institute of Medical Engineering Sciences and the Laboratory for Information and Decision Systems, and the senior author of the study.

Read More  GraphCast: AI Model For Faster And More Accurate Global Weather Forecasting

Ghassemi is joined by co-lead authors Chantal Shaib, a graduate student at Northeastern University and visiting student at MIT; and Vinith Suriyakumar, an MIT graduate student; as well as Levent Sagun, a research scientist at Meta; and Byron Wallace, the Sy and Laurie Sternberg Interdisciplinary Associate Professor and associate dean of research at Northeastern University’s Khoury College of Computer Sciences. A paper describing the work will be presented at the Conference on Neural Information Processing Systems.

Stuck on syntax

LLMs are trained on a massive amount of text from the internet. During this training process, the model learns to understand the relationships between words and phrases — knowledge it uses later when responding to queries.

In prior work, the researchers found that LLMs pick up patterns in the parts of speech that frequently appear together in training data. They call these part-of-speech patterns “syntactic templates.”

LLMs need this understanding of syntax, along with semantic knowledge, to answer questions in a particular domain.

“In the news domain, for instance, there is a particular style of writing. So, not only is the model learning the semantics, it is also learning the underlying structure of how sentences should be put together to follow a specific style for that domain,” Shaib explains.   

But in this research, they determined that LLMs learn to associate these syntactic templates with specific domains. The model may incorrectly rely solely on this learned association when answering questions, rather than on an understanding of the query and subject matter.

For instance, an LLM might learn that a question like “Where is Paris located?” is structured as adverb/verb/proper noun/verb. If there are many examples of sentence construction in the model’s training data, the LLM may associate that syntactic template with questions about countries.

Read More  Introducing AI-Powered Insights In Threat Intelligence

So, if the model is given a new question with the same grammatical structure but nonsense words, like “Quickly sit Paris clouded?” it might answer “France” even though that answer makes no sense.

“This is an overlooked type of association that the model learns in order to answer questions correctly. We should be paying closer attention to not only the semantics but the syntax of the data we use to train our models,” Shaib says.

Missing the meaning

The researchers tested this phenomenon by designing synthetic experiments in which only one syntactic template appeared in the model’s training data for each domain. They tested the models by substituting words with synonyms, antonyms, or random words, but kept the underlying syntax the same.

In each instance, they found that LLMs often still responded with the correct answer, even when the question was complete nonsense.

When they restructured the same question using a new part-of-speech pattern, the LLMs often failed to give the correct response, even though the underlying meaning of the question remained the same.

They used this approach to test pre-trained LLMs like GPT-4 and Llama, and found that this same learned behavior significantly lowered their performance.

Curious about the broader implications of these findings, the researchers studied whether someone could exploit this phenomenon to elicit harmful responses from an LLM that has been deliberately trained to refuse such requests.

They found that, by phrasing the question using a syntactic template the model associates with a “safe” dataset (one that doesn’t contain harmful information), they could trick the model into overriding its refusal policy and generating harmful content.

Read More  What Not To Miss At CES 2024

“From this work, it is clear to me that we need more robust defenses to address security vulnerabilities in LLMs. In this paper, we identified a new vulnerability that arises due to the way LLMs learn. So, we need to figure out new defenses based on how LLMs learn language, rather than just ad hoc solutions to different vulnerabilities,” Suriyakumar says.

While the researchers didn’t explore mitigation strategies in this work, they developed an automatic benchmarking technique one could use to evaluate an LLM’s reliance on this incorrect syntax-domain correlation. This new test could help developers proactively address this shortcoming in their models, reducing safety risks and improving performance.

In the future, the researchers want to study potential mitigation strategies, which could involve augmenting training data to provide a wider variety of syntactic templates. They are also interested in exploring this phenomenon in reasoning models, special types of LLMs designed to tackle multi-step tasks.

“I think this is a really creative angle to study failure modes of LLMs. This work highlights the importance of linguistic knowledge and analysis in LLM safety research, an aspect that hasn’t been at the center stage but clearly should be,” says Jessy Li, an associate professor at the University of Texas at Austin, who was not involved with this work.

This work is funded, in part, by a Bridgewater AIA Labs Fellowship, the National Science Foundation, the Gordon and Betty Moore Foundation, a Google Research Award, and Schmidt Sciences.

Reprinted with permission of MIT News
http://news.mit.edu/

Source: zedreviews.com


For enquiries, product placements, sponsorships, and collaborations, connect with us at [email protected]. We'd love to hear from you!

Our humans need coffee too! Your support is highly appreciated, thank you!

aster.cloud

Related Topics
  • AI
  • LLM
  • ML
  • Science
You May Also Like
View Post
  • Gears
  • Technology

Samsung Art Store Brings Art Basel to Homes Worldwide With New Curated Collection

  • June 15, 2026
View Post
  • Technology

The consequences of relying on AI for accurate news

  • June 10, 2026
View Post
  • Gears
  • Technology

WWDC26: Apple unveils next generation of Apple Intelligence, Siri AI, powerful parental controls, and an expansive set of software improvements

  • June 8, 2026
View Post
  • Technology

IBM and Google Cloud Announce Strategic Partnership to Scale AI with Human Expertise and AI‑Powered Delivery

  • June 4, 2026
View Post
  • Technology

Banks race to patch new cyber vulnerabilities, and other cybersecurity news

  • May 25, 2026
pope-leo-xiv-cq5dam-1500.844
View Post
  • Technology

Pope Leo XIV to Publish First Encyclical on Artificial Intelligence and Human Dignity on 25 May

  • May 22, 2026
View Post
  • Technology

Portfolio to Clients, and is Strengthened by Ongoing Project Glasswing Work

  • May 20, 2026
reMarkable Paper Pure
View Post
  • Gears
  • Technology

Everything The reMarkable Paper Pure Actually Does

  • May 14, 2026

Stay Connected!
LATEST
  • digital-nomad-freelancer-worker-2151205464 1
    One paperwork problem – Get your Digital Nomad Visa employment documents fast from UK, EU or Singapore
    • June 16, 2026
  • 2
    Samsung Art Store Brings Art Basel to Homes Worldwide With New Curated Collection
    • June 15, 2026
  • 3
    You Do Not Need to Invest in the IPO of SpaceX, Anthropic, and OpenAI
    • June 10, 2026
  • 4
    The consequences of relying on AI for accurate news
    • June 10, 2026
  • 5
    Connecting AI agents with unstructured data using Google Cloud Storage MCP Servers
    • June 10, 2026
  • 6
    WWDC26: Apple unveils next generation of Apple Intelligence, Siri AI, powerful parental controls, and an expansive set of software improvements
    • June 8, 2026
  • 7
    IBM and Google Cloud Announce Strategic Partnership to Scale AI with Human Expertise and AI‑Powered Delivery
    • June 4, 2026
  • Data center 8
    Data Sovereignty in Spain. It’s Not Just About the Law, It’s About Efficiency
    • June 3, 2026
  • 9
    Ink vs Pixels. What you miss versus what you are actually missing.
    • June 1, 2026
  • 10
    Banks race to patch new cyber vulnerabilities, and other cybersecurity news
    • May 25, 2026
about
Hello World!

We are aster.cloud. We’re created by programmers for programmers.

Our site aims to provide guides, programming tips, reviews, and interesting materials for tech people and those who want to learn in general.

We would like to hear from you.

If you have any feedback, enquiries, or sponsorship request, kindly reach out to us at:

[email protected]
Most Popular
  • pope-leo-xiv-cq5dam-1500.844 1
    Pope Leo XIV to Publish First Encyclical on Artificial Intelligence and Human Dignity on 25 May
    • May 22, 2026
  • 2
    Portfolio to Clients, and is Strengthened by Ongoing Project Glasswing Work
    • May 20, 2026
  • reMarkable Paper Pure 3
    Everything The reMarkable Paper Pure Actually Does
    • May 14, 2026
  • 4
    Scaling cloud and AI: Microsoft Azure’s commitment to Europe’s digital future
    • May 11, 2026
  • Anthropic Institute 5
    Introducing The Anthropic Institute
    • March 11, 2026
  • /
  • Technology
  • Tools
  • About
  • Contact Us

Input your search keywords and press Enter.