---
title: "Humanity's Last Exam (and Why DeepSeek Changes Everything)"
date: "2025-12-01"
category: "AI & TECH"
readTime: "5 min"
excerpt: "Three years ago, OpenAI launched ChatGPT. Today, Chinese company DeepSeek released a model comparable to the best American ones – and it's completely free. What does this mean?"
---

Humanity's Last Exam. In reality, it's not as dramatic as it sounds – in short, it's a very (emphasis on *very*) difficult test used to compare AI models.

Today it's relevant for two reasons: three years ago OpenAI launched ChatGPT, and today the Chinese company DeepSeek released a model that achieves results comparable to the best American models on that "last exam" – and it's completely free.

## DeepSeek

DeepSeek is a Chinese AI model that anyone can use for free. As long as you don't ask it "what happened at Tiananmen Square" or similar politically sensitive questions, it can be a very useful assistant.

But never share personal data there (and I'd consider whether you even want that app on your phone). It's a Chinese model and no one really knows where the data is processed.

What's interesting isn't that another AI exists. What's interesting is how much it cost and how good it is.

Training DeepSeek (at least according to public information) cost about $5.5 million. OpenAI probably spent hundreds of millions on GPT-4, maybe over a billion. The result? DeepSeek is comparable or better, cheaper, and open-source (in theory, you can run it yourself at home).

It's like someone built a Ferrari for the price of a Škoda and ran it for the cost of a bus ticket.

## Humanity's Last Exam

In January 2025, scientists from the Center for AI Safety, together with Scale AI, created a benchmark called Humanity's Last Exam. The reason was simple: existing tests had become too easy – AI models were achieving 90%+ success rates, so it was no longer possible to measure how good they actually are.

So they created the hardest possible test:

- 2,500 questions from dozens of fields (mathematics, physics, biology, chemistry, computer science, humanities)
- Expert-level difficulty – created by PhD students and scientists from around the world
- Each question first had to stump the best AI models – if any model answered it correctly, the question wasn't included in the test

The result is a test where 76% of questions require an exact-match answer (either you know it precisely or you don't), 14% require understanding both text and an image, and the answers can't simply be looked up on the internet.

And how do individual AIs perform? Currently the best is Google Gemini with 37.7%, followed by GPT-5 Pro with 31.64%, and third is DeepSeek with 30.6%. For context on how big a leap this is: exactly one year ago, ChatGPT scored 2.72%.

## A Crazy Paradox

And here comes a crazy paradox that complicates the whole situation.

While AI models struggle on HLE, other AI systems are discovering new physical laws. A team from Emory University published a study in which AI analyzed "dusty plasma" – a charged gas containing dust particles (found everywhere from Saturn's rings to forest fires). The AI described the forces between particles with 99% accuracy and discovered an asymmetric interaction that had been theoretically predicted but never modeled before.

And most importantly: the AI corrected two assumptions physicists had held for decades. It turned out that a particle's charge doesn't grow proportionally with its radius (it also depends on the temperature and density of the surrounding plasma), and that the force between particles doesn't decrease exponentially with distance alone (it also depends on particle size).

## How Can You Fail Tests and Discover Physics?

Because these are two completely different types of abilities.

**Discovering physical laws = finding patterns in massive data.** AI models can detect thyroid, breast, or lung cancer with higher accuracy than human radiologists. They're shortening new drug development from 10+ years to months, and AlphaFold has predicted the structures of 200 million proteins. All of this is possible because AI processes more data at once than any human could handle.

**Solving complex math-physics problems = flexible reasoning:**

- Ability to combine different concepts in new ways
- Recognizing when you know something and when you don't
- Applying knowledge in unknown contexts
- Common sense and "feel" for the problem

In this, AI is still at the level of "sophisticated bullshit generator with absolute confidence."

## What's Next?

I can't say whether the world has changed more or less in these 3 years than I expected. And I can't really say how much of what "already exists" affects humanity and individuals.

Replacing people with AI usually ends in failure – sometimes spectacularly (looking at you, Deloitte). Meanwhile, individuals are starting to replace human relationships with AI, and several startups are creating digital "copies" of dead people...

In short, pretty wild times where I have no idea what things will look like in another 3 years.

When I train teachers on AI, I emphasize three things above all:

1. Always read the AI's output before you release it
2. Always remember that it's still a program – use it as a tool, an assistant, not a life partner
3. Don't be afraid to discuss anything with AI – from health to relationships to songwriting. But when it's "about life" (health, law, finances), always consult a person in that field as well
