New Edge Times

A.I.’s Black Boxes Just Got a Little Less Mysterious

by New Edge Times Report
May 21, 2024
in Tech

One of the weirder, more unnerving things about today’s leading artificial intelligence systems is that nobody — not even the people who build them — really knows how the systems work.

That’s because large language models, the type of A.I. systems that power ChatGPT and other popular chatbots, are not programmed line by line by human engineers, as conventional computer programs are.

Instead, these systems essentially learn on their own, by ingesting vast amounts of data and identifying patterns and relationships in language, then using that knowledge to predict the next words in a sequence.
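The idea of learning patterns from data and using them to predict the next word can be illustrated at toy scale. The sketch below is only an analogy: it counts which word follows which in a tiny made-up corpus, whereas real large language models learn far richer patterns with neural networks. The function names and the corpus are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Made-up training text; nothing here is programmed as an explicit rule.
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept"
model = train_bigram(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" most often in the corpus
print(predict_next(model, "dog"))  # None: the word never appeared in training
```

Even in this toy, the prediction comes from counts absorbed from data rather than from a rule someone wrote, which hints at why such behavior is hard to trace back to a specific line of code.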

One consequence of building A.I. systems this way is that it’s difficult to reverse-engineer them or to fix problems by identifying specific bugs in the code. Right now, if a user types “Which American city has the best food?” and a chatbot responds with “Tokyo,” there’s no real way of understanding why the model made that error, or why the next person who asks may receive a different answer.

And when large language models do misbehave or go off the rails, nobody can really explain why. (I encountered this problem last year when a Bing chatbot acted in an unhinged way during an interaction with me. Not even top executives at Microsoft could tell me with any certainty what had gone wrong.)

The inscrutability of large language models is not just an annoyance but a major reason some researchers fear that powerful A.I. systems could eventually become a threat to humanity.

After all, if we can’t understand what’s happening inside these models, how will we know if they can be used to create novel bioweapons, spread political propaganda or write malicious computer code for cyberattacks? If powerful A.I. systems start to disobey or deceive us, how can we stop them if we can’t understand what’s causing that behavior in the first place?

To address these problems, a small subfield of A.I. research known as “mechanistic interpretability” has spent years trying to peer inside the guts of A.I. language models. The work has been slow going, and progress has been incremental.

There has also been growing resistance to the idea that A.I. systems pose much risk at all. Last week, two senior safety researchers at OpenAI, the maker of ChatGPT, left the company amid conflict with executives about whether the company was doing enough to make its products safe.

But this week, a team of researchers at the A.I. company Anthropic announced what they called a major breakthrough — one they hope will give us the ability to understand more about how A.I. language models actually work, and to possibly prevent them from becoming harmful.

The team summarized its findings in a blog post called “Mapping the Mind of a Large Language Model.”

The researchers looked inside one of Anthropic’s A.I. models — Claude 3 Sonnet, a version of the company’s Claude 3 language model — and used a technique known as “dictionary learning” to uncover patterns in how combinations of neurons, the mathematical units inside the A.I. model, were activated when Claude was prompted to talk about certain topics. They identified roughly 10 million of these patterns, which they call “features.”

They found that one feature, for example, was active whenever Claude was asked to talk about San Francisco. Other features were active whenever topics like immunology or specific scientific terms, such as the chemical element lithium, were mentioned. And some features were linked to more abstract concepts, like deception or gender bias.
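To make the notion of a "feature" more concrete, here is a minimal sketch of one half of the idea: given a dictionary of feature directions, describe a pattern of neuron activity as a combination of just a few of them. It assumes the dictionary already exists (learning it from data is the hard part and is omitted), uses a simple greedy method called matching pursuit, and is not a description of Anthropic's actual procedure. The three-neuron space, the feature names and all numbers are hypothetical.

```python
# Hypothetical dictionary: each named feature is a unit-length direction in a
# 3-neuron activation space. There are more features (4) than neurons (3).
FEATURES = {
    "san_francisco": [1.0, 0.0, 0.0],
    "immunology":    [0.0, 1.0, 0.0],
    "lithium":       [0.0, 0.0, 1.0],
    "deception":     [0.6, 0.8, 0.0],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sparse_code(activation, features, max_active=2, tol=1e-9):
    """Greedy matching pursuit: explain `activation` using at most `max_active` features."""
    residual = list(activation)
    code = {}
    for _ in range(max_active):
        # Pick the feature most aligned with whatever is still unexplained.
        name, direction = max(
            features.items(), key=lambda kv: abs(dot(residual, kv[1]))
        )
        strength = dot(residual, direction)
        if abs(strength) < tol:
            break  # nothing left to explain
        code[name] = code.get(name, 0.0) + strength
        residual = [r - strength * d for r, d in zip(residual, direction)]
    return code, residual

# A made-up activation that mixes two of the features above.
activation = [2.0, 0.0, 1.5]
code, residual = sparse_code(activation, FEATURES)
print(code)      # which features are active, and how strongly
print(residual)  # the part of the activation left unexplained
```

In this toy case the raw neuron values are re-expressed as "San Francisco at strength 2.0 plus lithium at strength 1.5," a description a person can read. Greedy selection is not guaranteed to find the best combination when feature directions overlap.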

They also found that manually turning certain features on or off could change how the A.I. system behaved, or could even get the system to break its own rules.

For example, they discovered that if they forced a feature linked to the concept of sycophancy to activate more strongly, Claude would respond with flowery, over-the-top praise for the user, including in situations where flattery was inappropriate.
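One simple way to picture forcing a feature to activate more strongly is to shift the model's internal activity along that feature's direction. The sketch below does this for a single made-up activation vector. It is an assumption about the general shape of such an intervention, not Anthropic's implementation, and the direction, values and function names are hypothetical.

```python
def feature_strength(activation, direction):
    """How strongly the activation points along a unit-length feature direction."""
    return sum(a * d for a, d in zip(activation, direction))

def steer(activation, direction, target):
    """Shift `activation` so that its strength along `direction` equals `target`."""
    current = feature_strength(activation, direction)
    return [a + (target - current) * d for a, d in zip(activation, direction)]

sycophancy = [0.0, 1.0, 0.0]   # hypothetical unit-length feature direction
activation = [0.5, 0.2, -0.3]  # hypothetical internal state of the model

boosted = steer(activation, sycophancy, target=5.0)     # force the feature on
suppressed = steer(activation, sycophancy, target=0.0)  # switch the feature off

print(feature_strength(boosted, sycophancy))
print(feature_strength(suppressed, sycophancy))
```

Only the component along the chosen direction changes and the rest of the activation is left alone, which is what makes this kind of edit targeted.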

Chris Olah, who led the Anthropic interpretability research team, said in an interview that these findings could allow A.I. companies to control their models more effectively.

“We’re discovering features that may shed light on concerns about bias, safety risks and autonomy,” he said. “I’m feeling really excited that we might be able to turn these controversial questions that people argue about into things we can actually have more productive discourse on.”

Other researchers have found similar phenomena in small- and medium-size language models. But Anthropic’s team is among the first to apply these techniques to a full-size model.

Jacob Andreas, an associate professor of computer science at M.I.T., who reviewed a summary of Anthropic’s research, characterized it as a hopeful sign that large-scale interpretability might be possible.

“In the same way that understanding basic things about how people work has helped us cure diseases, understanding how these models work will both let us recognize when things are about to go wrong and let us build better tools for controlling them,” he said.

Mr. Olah, the Anthropic research leader, cautioned that while the new findings represented important progress, A.I. interpretability was still far from a solved problem.

For starters, he said, the largest A.I. models most likely contain billions of features representing distinct concepts — many more than the 10 million or so features that Anthropic’s team claims to have discovered. Finding them all would require enormous amounts of computing power and would be too costly for all but the richest A.I. companies to attempt.

Even if researchers were to identify every feature in a large A.I. model, they would still need more information to understand the full inner workings of the model. There is also no guarantee that A.I. companies would act to make their systems safer.

Still, Mr. Olah said, even prying open these A.I. black boxes a little bit could allow companies, regulators and the general public to feel more confident that these systems can be controlled.

“There are lots of other challenges ahead of us, but the thing that seemed scariest no longer seems like a roadblock,” he said.

© 2025 New Edge Times or its affiliated companies. All rights reserved.
