2026-06-22 00:06:38

AI beats human in Turing test for the first time

Artificial intelligence has reached a milestone in simulating human behavior. Neural networks used to impress us mainly with their encyclopedic knowledge; now they have mastered a subtler art: imitating the way people actually talk. A recent large-scale study found that a large language model given a well-designed persona was judged to be the human more often than the real person taking part in the same dialogue.

The milestone in question is the classic Turing test, a thought experiment proposed by the British mathematician Alan Turing in 1950. The idea is simple: if a person holds a text conversation with an unseen interlocutor and cannot reliably tell whether a machine or a human is on the other end, then, by Turing's criterion, the program is indistinguishable from a human in conversation.

How neural networks were tested

The scientific experiment was organized and conducted by researchers from the University of California, San Diego. Nearly 500 people served as judges, selected both from the student community and from specialized crowdsourcing platforms for scientific research.

The methodology followed the three-party version of the Turing test. Each participant had two independent chat windows open on screen and was told in advance that one chat contained a live human and the other a program. Over the course of several minutes, the volunteer talked freely in both windows, asking tricky questions and paying attention to tone and response speed, and then had to deliver a final verdict on which of the two was the human.

Several generations of technology took part in the testing: the flagship models GPT-4.5 and LLaMa-3.1-405B, GPT-4o in its base configuration without a persona prompt, and the historical chatbot ELIZA (released in the 1960s), which works by applying simple paraphrasing templates. For the advanced models, the researchers wrote a detailed system prompt that assigned the AI a specific psychological type, character, speech habits, and everyday social status.
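To make "simple paraphrasing templates" concrete, here is a minimal sketch of an ELIZA-style responder. The rules, the reflection table, and the function names are hypothetical and chosen only for illustration; they are not taken from ELIZA's original script or from the study.

```python
import re

# Hypothetical rules for illustration only; not ELIZA's original script.
# Each rule pairs a pattern with a reply template that echoes the captured text.
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (.+)", re.IGNORECASE), "Tell me more about your {0}."),
]

# Swap first-person words for second-person ones so the echo reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

FALLBACK = "Please go on."


def reflect(fragment: str) -> str:
    """Lowercase the fragment and replace first-person words word by word."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(message: str) -> str:
    """Return the reply of the first matching rule, or a stock fallback."""
    cleaned = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    return FALLBACK


print(respond("I am tired of my job."))
print(respond("Hello there"))
```

Nothing in this sketch models meaning: the program only rearranges the user's own words, which is roughly why such a bot tends to be unmasked quickly in a free conversation.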

Research results

The results prompted many experts to look at the problem of AI from a different angle. The clear leader was GPT-4.5: playing a specially constructed "human" persona, it was judged to be the human in 73% of its dialogues. Because each judge had to choose between the model and a real person, this means the model was picked as the live conversationalist far more often than the actual humans it was paired with. In this setup, the AI proved to be a more convincing "human" than the humans themselves.

The full ranking, by how often each system was taken for a human:

  • GPT-4.5 with an assigned persona: judged human in 73% of dialogues,
  • LLaMa-3.1-405B with an assigned persona: 56% (a result statistically comparable to that of real people),
  • LLaMa-3.1-405B in base configuration (without a special prompt): only 38%,
  • GPT-4.5 in base configuration (without a special prompt): 36%,
  • The ELIZA chatbot from the 1960s: 23%,
  • GPT-4o in base configuration: the most modest result at 21%.
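The arithmetic behind these figures can be sketched in a few lines. The sketch assumes a forced choice, meaning the judge names exactly one of the two chats as the human, so the real person's share is simply 100 minus the model's share; the function and variable names are illustrative. It does not test statistical significance, which would require the number of dialogues per condition, a figure the article does not give.

```python
# Share of dialogues (in percent) in which each AI witness was judged to be
# the human, as reported in the list above.
AI_JUDGED_HUMAN = {
    "GPT-4.5 (persona)": 73,
    "LLaMa-3.1-405B (persona)": 56,
    "LLaMa-3.1-405B (no persona)": 38,
    "GPT-4.5 (no persona)": 36,
    "ELIZA": 23,
    "GPT-4o (no persona)": 21,
}

CHANCE = 50  # a judge guessing at random picks either chat half the time


def human_picked(ai_rate: int) -> int:
    """Assumption: forced choice, so the human gets the remaining share."""
    return 100 - ai_rate


def summarize(rates):
    """Return (name, AI share, human share, margin vs chance), best first."""
    ordered = sorted(rates.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, ai_rate, human_picked(ai_rate), ai_rate - CHANCE)
        for name, ai_rate in ordered
    ]


for name, ai_rate, human_rate, margin in summarize(AI_JUDGED_HUMAN):
    print(f"{name:<30} AI {ai_rate:>3}%  human {human_rate:>3}%  vs chance {margin:+d}")
```

Read this way, a score near 50% means judges could not tell the two apart, while a score well below 50% means the real person was usually identified correctly.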

Why the system prompt decided everything

A vast training corpus and flawless grammar alone do not guarantee success in the Turing test. Without the persona prompt, even the most advanced systems did poorly, being taken for a human in only 36% to 38% of dialogues: they gave themselves away to the judges through excessive helpfulness, encyclopedic verbosity, and a sterile bureaucratic tone.

Everything changed when the AI received a detailed role-playing instruction. The special prompt directed the language model to simulate natural social behavior:

  • Reply at an unhurried, human pace and make occasional typos,
  • Use lively, informal slang and situational humor,
  • Display doubt, hesitation, or moderate emotionality,
  • Deliberately avoid pretentious phrasing, appearing conspicuously ordinary and down-to-earth.
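As a rough illustration of how instructions like these could be packed into a system prompt, consider the sketch below. It is not the prompt used in the study: the `Persona` class, the `build_system_prompt` function, the example persona, and the wording of every rule are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    # All values passed in here are hypothetical placeholders.
    name: str
    age: int
    occupation: str
    behaviors: list = field(default_factory=list)


# Paraphrases of the four behaviors listed above; the wording is illustrative.
BEHAVIORS = [
    "Reply at an unhurried pace and let an occasional typo through.",
    "Use informal slang and humor that fits the situation.",
    "Show doubt, hesitation, or mild emotion where it feels natural.",
    "Avoid polished or pretentious phrasing; sound ordinary and down-to-earth.",
]


def build_system_prompt(persona: Persona) -> str:
    """Assemble a plain-text system prompt from the persona and its rules."""
    lines = [
        f"You are {persona.name}, a {persona.age}-year-old {persona.occupation}.",
        "You are chatting with a stranger in a text window. Stay in character.",
        "Style rules:",
    ]
    lines += [f"- {rule}" for rule in persona.behaviors]
    return "\n".join(lines)


example = Persona(name="Sam", age=24, occupation="barista", behaviors=BEHAVIORS)
print(build_system_prompt(example))
```

The resulting string would be supplied as the system message of whatever chat interface is in use; the same model without such a message corresponds to the "base configuration" rows in the ranking.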

Neural networks have learned to masterfully mimic our weaknesses and imperfections — and it is precisely this ability to seem imperfect that made their personas unsettlingly convincing in the eyes of ordinary people.

Why passing the test is not a triumph of consciousness

The main trap for an enthusiastic public is to equate a successful textual simulation with the emergence of self-awareness in a machine. Passing the Turing test says nothing about feelings or inner mental experience. The test was conceived as a purely behavioral marker, not as an instrument for measuring cognitive depth. It answers the practical question "Can a machine convincingly imitate human conversation?" but is powerless before the question "Does the machine understand what it is talking about?"

Modern generative architectures are brilliant mathematical predictors of textual patterns. They masterfully calculate the most probable sequence of words, hitting the stylistic and contextual mark perfectly, creating a flawless hologram of a personality — but inside that hologram there is still no real observer. AI has not become human, but it must be acknowledged that it has transformed into an unrivaled actor capable of playing a human from behind a textual curtain.
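The idea of "calculating the most probable next word" can be shown with a toy bigram counter. This is a drastic simplification offered only as an illustration: real language models use neural networks over subword tokens and much longer contexts, and the corpus and function names here are made up.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, for illustration only.
CORPUS = "i think so . i think not . i guess so . you think so ."


def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    tokens = text.split()
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def most_probable_next(counts, word):
    """Return (next_word, probability) for the likeliest follower, or None."""
    followers = counts.get(word)
    if not followers:
        return None
    best, best_count = followers.most_common(1)[0]
    return best, best_count / sum(followers.values())


model = train_bigrams(CORPUS)
print(most_probable_next(model, "i"))
print(most_probable_next(model, "banana"))
```

The counter picks "think" after "i" purely because that pairing is the most frequent in its data, with no notion of what thinking is, which is the point the paragraph above makes about prediction without an inner observer.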
