In 1950, the scientist Alan Turing proposed a test with which developers can assess an artificial intelligence's ability to think and reason like a human. In the test, a human judge poses questions in separate chats, each answered by either a real human respondent or an AI bot, and after a short conversation the judge must determine which is which.
In a recent large-scale run of the Turing test, the latest version of GPT-4.5 passed it repeatedly, confusing judges who picked the AI bot as the human participant. More than 300 people served as judges, while only four AI systems took part: GPT-4.5 itself, the previously mentioned LLaMa-3.1-405B, ELIZA, and GPT-4o.
To make the task harder, the developers gave each AI bot a persona prompt in advance, so the bots were not just bare models but played roles such as a teenager or a pop-culture fan. Perhaps it was this condition that helped the artificial intelligence fool the judges: with a persona, the new chatbot was judged to be human as much as 73% of the time, versus only 36% when it had no persona prompt.
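The study's exact prompts are not reproduced here, but the idea of a persona prompt is simple: a system message that keeps the model in character for the whole conversation. Below is a minimal sketch using the OpenAI Python client; the model identifier and the persona text are illustrative assumptions, not the ones used in the study.

```python
# Minimal sketch of a persona prompt: a system message instructing the model
# to stay in character. The model name and persona wording are hypothetical,
# not taken from the study itself.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PERSONA = (
    "You are a 19-year-old student who is into indie games and internet memes. "
    "Write casually, keep replies short, and stay in character."
)

response = client.chat.completions.create(
    model="gpt-4.5-preview",  # hypothetical model identifier
    messages=[
        {"role": "system", "content": PERSONA},
        {"role": "user", "content": "hey, what did you get up to this weekend?"},
    ],
)
print(response.choices[0].message.content)
```

The point of the persona is that a bot answering in a consistent, human-sounding voice gives the judge fewer stylistic cues to latch onto than a model responding in its default assistant register.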