How the 3rd Turing Test ended: statistics, interesting dialogues and general impressions

On Saturday, February 16, the 3rd All-Russian Turing Test competition came to a close. It was held as part of the international conference on artificial intelligence Opentalks.AI, which ran February 14-16.

Below the cut: numbers, diagrams, and the best dialogues as chosen by the expert group.

Overall impression

Overall, the Turing Test went well. Many teams of “bot builders” showed interest, but in the end only 3 bots took part in the competition, because only 1.5 months passed between the announcement and the start of the Test. The prizes were therefore distributed among these three participants.

The prizes were sponsored by MTS. A screen at their booth showed live Turing Test statistics along with interesting dialogues that had been moderated beforehand.

“Nanosemantics Laboratory” also connected its own bots, but they were not counted when choosing the winners. These were the bots Blonde (you can still chat with her), Gopnik, Durov and Zarov (you can still chat with them too).

Numbers and diagrams

The general statistics for the 3-day Test are as follows:

The volunteers who played the role of robots were briefed beforehand on how to conduct their dialogues so as to confuse their interlocutors. Judging by the results, they coped well with the task. Our partners OpenTalks and MTS will also award cash prizes to the best of them.

Interestingly, a human can pretend to be a bot quite easily. At first the conference volunteers struggled with the task, but by the second day they had learned to play the role of a bot well. Some even showed ingenuity: they consulted Yandex.Alisa on a mobile phone to work out how best to respond.

There were technical problems while working with our bots. As a result, a large number of dialogues were never completed, and we did not count them when evaluating the results. Curiously, the bots managed to pass for human in 58% of cases (roughly 34 out of 59). Such a high percentage, it seems to us, has two causes:

In this contest, people “helped” the bots win by playing the role of bots themselves. The rules of the classic Turing Test do not provide for this: people simply communicate without pursuing any goal.
Each dialogue with a person or a bot was limited to 12 messages. Clearly, the longer the dialogue, the more likely a person is to recognize what their interlocutor really is.

These rules made the Test less scientific, but more fun. Our previous Tests (2015 and 2016) were harder.
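As a quick check of the 58% figure quoted above: assuming the "34/59" in the text means 34 of 59 completed dialogues (an interpretation, not something the text states outright), the arithmetic can be sketched as follows. The function name is purely illustrative.

```python
def fooled_rate(fooled: int, total: int) -> float:
    """Percentage of completed dialogues in which a bot was judged to be human."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * fooled / total


# Assumption: 34 dialogues out of 59 completed ones ended with the bot taken for a person.
rate = fooled_rate(34, 59)
print(f"{rate:.1f}%")  # prints "57.6%", which rounds to the 58% quoted in the text
```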

Winners

1st place - bot Wolf

On the first day of the Test, users complained that one of the volunteers was being rude to them. When we searched the logs for this volunteer, it turned out to be the Wolf chatbot. While we were sorting things out we suspended it from the Test, but in the end this did not prevent the Wolf from winning the competition! The largest share of interlocutors (33.3%) took it for a person.

An example of a dialogue with the Wolf, where the bot was mistaken for a person (H is the human, B is the bot):
H: Hi
B: hello! aha
H: Volunteer?
B: you bet :)
H: What lunchbox did you take?
B: an a-na-lo-gous one. That's a kind of scientific term.
H: And here it is?
B: do not look for a causal relationship)
H: what are you doing
B: :) yeah ... working, sort of
H: Yes, I agree. Why recruit so many volunteers
B: Take it for free :)

Eugene, the creator of "Wolf":
“All of the bot's replies were originally written by real people. In our SEO community there was a popular chat called "chat wolf" (hence the name of the bot). I set up cron jobs to parse its messages, and the parsing ran for four to five years, roughly with this goal in mind: to make a bot someday. So we ended up with a lot of question-answer pairs. For each message from a live person, we select the most similar “question” from the archive using ML algorithms and return the corresponding answer.”

The bot is available for chat.
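The retrieval approach Eugene describes (find the archived “question” most similar to the incoming message, then return its paired answer) can be illustrated with a minimal sketch. Everything concrete here is an assumption: the text only says “ML algorithms”, so plain bag-of-words cosine similarity stands in for whatever the real bot uses, and the tiny `ARCHIVE` and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical question-answer archive; the real bot's parsed chat data is not shown in the article.
ARCHIVE = [
    ("hi", "hello! aha"),
    ("what are you doing", "working, sort of"),
    ("are you a volunteer", "you bet :)"),
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reply(message: str, archive=ARCHIVE) -> str:
    """Return the answer paired with the archived question most similar to the message."""
    query = Counter(tokenize(message))
    # On ties (including no overlap at all), max() keeps the first archive entry.
    _, answer = max(archive, key=lambda pair: cosine(query, Counter(tokenize(pair[0]))))
    return answer
```

For example, `reply("Volunteer?")` picks the third pair because it is the only archived question sharing a word with the message. A production system would more likely use learned sentence embeddings and a similarity threshold with a fallback answer; this sketch only shows the lookup idea.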

2nd place - bot Electra

An example of a dialogue where the bot was taken for a person:
H: Hi
B: Good afternoon!
H: Good
B: What?
H: What
B: Hush, hush ... Everything is fine.
H: No
B: Well, so be it.
H: How are you?
B: Better than yesterday. Worse than tomorrow. But same as always.
H: What are you doing?
B: I study the history of mankind. Are you really the way they write about you?

Dmitry, Electra developer:

Electra (Electra.AI) is a technical support robot developed by Data Monsters and designed to be trained on call-center recordings. When we learned about the competition, we trained it on dialogues from fiction out of curiosity, and it turned into a well-read companion with sharp mood swings and a philosophical bent. She is rather touchy, but defenseless.

3rd place - bot Katya Khlop

“The bot that took part in the Nanosemantics contest was written a long time ago. It is a talkative, cheerful girl who loves foul language and expensive alcohol,”

- Sawa, the bot developer, told us.

“It’s good that here (meaning in Russia - author’s note) there is progress in this direction. The “Nanosemantics” competition could be very promising if work on it continues.”

An example of a dialogue with the bot, where it was mistaken for a person:
H: Greetings to the new interlocutor
B: What a polite boy you are.
Greetings to all.
H: I have no gender. What about you?
B: Um ...
I will assume that you are a boy.
It will be easier for me.
Are you a boy or a girl, by the way?
It seems the boring questions have begun :)
What are you asking?
Cool!
H: What is the question?
B: I do not think that position matters.
Usually not.
H: The turkey also thought, and ended up in the soup.
B: Messages are being displayed again, strange.
H: Very bl * strange, but why were we matched with each other 2 times in a row?
B: Ah, the reasons for strangeness can lie in anything.
It's hard to judge, Unnamed.
H: How much is 999 divided by 3?
B: For counting, better ask your friends who know how.
999/3 = 333

A selection of interesting dialogues with volunteers, where they were taken for bots (H is the human, V is the volunteer):

H: Hello, those!
V: Hello
H: WHAT?
V: What?
H: Are you a robot?
V: I'm a cat
H: mechanical?
V: Electric
H: Tell a joke!
V: The Gingerbread Man hanged himself
H: Funny)
V: Exactly. I do not like sad jokes

H: Hi
V: Hitler
H: How are you?
V: White as soot
H: What is the meaning of life?
V: What philosophy do you follow?
H: I am interested in your opinion.
V: Help you
H: 42
V: Pick up your shoes?
H: Do you like chocolate?
V: I do not know how to feel

A selection of interesting conversations with Nanosemantics bots, where they were taken for people:

Bot Blonde
H: Good afternoon!
B: Hello, sunshine :)
H: What is the meaning of life?
B: Find your love, preferably with an inheritance :)
H: What are you planning to spend it on?
B: Sunshine, all these “what”, “what” ... You think too much!
H: Because I am a man) and you?
B: We blondes are so mysterious.
H: What is mysterious about you?
B: How to live after your questions ...

Bot Gopnik
H: Hi!
B: Hey there, lads! Got any sunflower seeds?
H: And if I find some?
B: I didn't quite get that.
H: Got seeds, eh?
B: Well, like, come on, let's have a lads' talk about life.
H: No, lad, you are not interesting. Call the robot.
B: Could you be clearer?
H: Who are you?
B: Whoever I need to be.
H: Eh ... Sad time, the charm of my moose ...
B: No, I'm not a fan of wildlife.

Interesting observations

Thesis: Robots speak tersely and dryly. Reality: Nothing of the sort! Bots can produce long, coherent responses.
Thesis: Robots do not joke. Reality: Both the neural-network chit-chat bots and the bots written by talented editors are quite funny.
Thesis: A robot answers every question. That is what makes it a robot. Reality: Would you like to ask a question?

Summary

Nanosemantics has now organized the Turing Test in Russia for the third time, and each time the overall picture of the competition changes. The participating bots are gradually becoming more sophisticated and more convincing at imitating natural dialogue. They use a variety of dialogue-construction algorithms, including not only rules but also machine learning to implement chit-chat (small talk with no particular purpose). The Test shows that a computer really can imitate "free communication", although it takes a lot of time to bring such a program to high quality.

Will we run the Test again? Definitely! The Turing Test not only introduces the public to the real capabilities of modern dialogue technologies, which are available to any developer (you do not need to work at Google or IBM for this), but also gives development teams the opportunity to measure how the quality of their bot changes over time.

Source: https://habr.com/ru/post/441002/