
How to make a racist AI without really trying

A cautionary tale

Let's build a sentiment classifier!

Sentiment analysis is a very common task in natural language processing (NLP), and that is no surprise. It is important for business to understand whether the opinions people express are positive or negative. Sentiment analysis is used to monitor social networks and customer feedback, and even in algorithmic stock trading (with the result that bots buy Berkshire Hathaway shares after positive reviews of Anne Hathaway's role in her latest film).

The method is sometimes oversimplified, but it is one of the easiest ways to get measurable results. Just feed in text, and out come positive and negative scores. There is no need to deal with parse trees, build a graph, or use any other complex representation.

And that is what we will do. We'll take the path of least resistance and build the simplest possible classifier, one that should look very familiar to anyone following current work in NLP. A similar model can be found, for example, in the paper Deep Averaging Networks (Iyyer et al., 2015). We are not trying to challenge their results or criticize their model; we are just borrowing a well-known way of representing words.

Work plan:

- Obtain pre-trained word embeddings to represent the meanings of words.
- Take a gold-standard lexicon of positive and negative words.
- Train a classifier that predicts the sentiment of a word from its embedding.
- Compute the sentiment of a text by averaging the sentiment of its words.

And then we will see "how to make a racist AI without really trying." Of course, you can't leave the system in such a monstrous state, so we are then going to:

- Measure the problem statistically.
- Improve the result by switching to better word embeddings.

Software dependencies


This tutorial is written in Python and relies on a typical Python machine-learning stack: numpy and scipy for numerical computing, pandas for data management, and scikit-learn for machine learning. Towards the end we will also use matplotlib and seaborn for plotting.

In principle, scikit-learn could be replaced by TensorFlow or Keras or something similar: they, too, can train a classifier by gradient descent. But we don't need their abstractions, because here training happens in a single stage.

    import numpy as np
    import pandas as pd
    import matplotlib
    import seaborn
    import re
    import statsmodels.formula.api

    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Configuration for displaying plots
    %matplotlib inline
    seaborn.set_context('notebook', rc={'figure.figsize': (10, 6)}, font_scale=1.5)

Step 1. Word embeddings


Word embeddings are a common way to represent text input. Words become vectors in a multidimensional space, where nearby vectors represent similar meanings. With embeddings, you can compare words by their (rough) meaning, not just by exact matches.

Training good embeddings requires hundreds of gigabytes of text. Fortunately, various research teams have already done this work and published pre-trained embedding models that are available for download.

The two best-known data sets for English are word2vec (trained on Google News text) and GloVe (trained on Common Crawl web pages). Either would give a similar result; we'll take the GloVe model, because it has a more transparent data source.

GloVe comes in three sizes, trained on 6 billion, 42 billion, and 840 billion tokens. The largest model is the most powerful but needs significant resources to process. The 42-billion version is quite good, and its vocabulary is neatly capped at 1 million words. We are on the path of least resistance, so let's take the 42-billion version.

- Why is it so important to use a “well-known” model?

- I'm glad you asked, hypothetical interlocutor! At every step we are trying to do something utterly typical, and for whatever reason the field has not yet settled on a single best word-embedding model. I hope this article makes you want to use modern, high-quality models, especially ones that account for algorithmic bias and try to correct it. But more on that later.

Download glove.42B.300d.zip from the GloVe website and extract it to data/glove.42B.300d.txt. Next, we define a function for reading word vectors in this simple format.

    def load_embeddings(filename):
        """
        Load a DataFrame from the simple text format used by word2vec,
        GloVe, fastText and ConceptNet Numberbatch. Their main difference
        is whether there is an initial line giving the dimensions of the matrix.
        """
        labels = []
        rows = []
        with open(filename, encoding='utf-8') as infile:
            for i, line in enumerate(infile):
                items = line.rstrip().split(' ')
                if len(items) == 2:
                    # This is a header row giving the shape of the matrix
                    continue
                labels.append(items[0])
                values = np.array([float(x) for x in items[1:]], 'f')
                rows.append(values)
        arr = np.vstack(rows)
        return pd.DataFrame(arr, index=labels, dtype='f')

    embeddings = load_embeddings('data/glove.42B.300d.txt')
    embeddings.shape

(1917494, 300)
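
To get a feel for what "nearby vectors represent similar meanings" means in practice, here is a minimal sketch, assuming the embeddings DataFrame loaded above; the word pairs are just illustrative:

    def cosine_similarity(word_a, word_b):
        # Cosine of the angle between two embedding rows: a rough
        # measure of how related the two words are.
        va = embeddings.loc[word_a]
        vb = embeddings.loc[word_b]
        return np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb))

    cosine_similarity('happy', 'glad')   # expected: relatively high
    cosine_similarity('happy', 'table')  # expected: noticeably lower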

Step 2. A gold-standard sentiment lexicon


Now we need to know which words are considered positive and which negative. There are many such lexicons, but we will take a very simple one (Hu and Liu, 2004), which is used in the Deep Averaging Networks paper.

Download the lexicon from Bing Liu's website and extract the data into data/positive-words.txt and data/negative-words.txt.

Next, we define how to read these files and load them into the variables pos_words and neg_words:

    def load_lexicon(filename):
        """
        Load Bing Liu's sentiment lexicon
        (https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html)
        of English words in Latin-1 encoding. One file lists positive
        words, the other negative words. The files contain comment
        lines starting with ';' and blank lines, which should be skipped.
        """
        lexicon = []
        with open(filename, encoding='latin-1') as infile:
            for line in infile:
                line = line.rstrip()
                if line and not line.startswith(';'):
                    lexicon.append(line)
        return lexicon

    pos_words = load_lexicon('data/positive-words.txt')
    neg_words = load_lexicon('data/negative-words.txt')

Step 3. Train a model to predict word sentiment


Using the Pandas .loc[] operation, we look up the embeddings of all the positive and negative words.

Some words are missing from the GloVe vocabulary, most often typos like "fancinating". These show up as rows of NaN, indicating a missing vector, so we delete them with .dropna().

pos_vectors = embeddings.loc[pos_words].dropna()
neg_vectors = embeddings.loc[neg_words].dropna()


Now we create arrays for the inputs (the embeddings) and the outputs (1 for positive words, -1 for negative ones). We also keep track of which word each row corresponds to, so we can interpret the results.

vectors = pd.concat([pos_vectors, neg_vectors])
targets = np.array([1 for entry in pos_vectors.index] + [-1 for entry in neg_vectors.index])
labels = list(pos_vectors.index) + list(neg_vectors.index)


- Wait a minute. Some words are neither positive nor negative, they're neutral. Shouldn't there be a third class for neutral words?

- I think it would come in handy. Later we'll see what problems arise from assigning sentiment to neutral words. If we could reliably identify neutral words, it would be quite reasonable to grow the classifier to three classes. But we would need a lexicon of neutral words, because Liu's lexicon has only positive and negative ones.

I did try a version of this with my own list of 800 neutral example words and an increased weight on predicting the neutral class, but the end results were not very different from what you see here.
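
For the curious, here is a hypothetical sketch of what that three-class variant could look like. neutral_words is an assumed, hand-collected list (Liu's lexicon has no neutral entries), and the class weights are illustrative:

    # Hypothetical: neutral_words is an assumed list of neutral words
    neu_vectors = embeddings.loc[neutral_words].dropna()
    vectors3 = pd.concat([pos_vectors, neg_vectors, neu_vectors])
    targets3 = np.array([1] * len(pos_vectors) +
                        [-1] * len(neg_vectors) +
                        [0] * len(neu_vectors))

    # Upweight the neutral class so borderline words lean towards 0
    model3 = SGDClassifier(loss='log', random_state=0, n_iter=100,
                           class_weight={-1: 1.0, 0: 2.0, 1: 1.0})
    model3.fit(vectors3, targets3)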

- How does this list decide which words are positive and which are negative? Doesn't it depend on context?

- Good question. General-purpose sentiment analysis is not as simple as it looks, and the boundary is somewhat arbitrary in places. In this list, "impudent" is marked as bad and "ambitious" as good. "Comical" is bad, and "funny" is good. "Refund" is good, even though it usually comes up in a bad context, when you owe someone money or they owe you.

Everyone understands that sentiment depends on context, but in a simple model we have to ignore context and hope that the average sentiment comes out right.

Using the train_test_split function, we simultaneously split the input vectors, the output values, and the labels into training and test data, holding out 10% for testing.

    train_vectors, test_vectors, train_targets, test_targets, train_labels, test_labels = \
        train_test_split(vectors, targets, labels, test_size=0.1, random_state=0)

Now we create the classifier and train it on the vectors for 100 iterations. We use a logistic loss function, so the resulting classifier can output the probability that a word is positive or negative.

    model = SGDClassifier(loss='log', random_state=0, n_iter=100)
    model.fit(train_vectors, train_targets)

    SGDClassifier(alpha=0.0001, average=False, class_weight=None, epsilon=0.1,
           eta0=0.0, fit_intercept=True, l1_ratio=0.15,
           learning_rate='optimal', loss='log', n_iter=100, n_jobs=1,
           penalty='l2', power_t=0.5, random_state=0, shuffle=True, verbose=0,
           warm_start=False)

We evaluate the classifier on the test vectors. It achieves 95% accuracy. Not bad.

accuracy_score(model.predict(test_vectors), test_targets)
0.95022624434389136


We define a function that predicts the sentiment of given words, then apply it to some examples from the test data.

    def vecs_to_sentiment(vecs):
        # predict_log_proba gives the log-probability of each class
        predictions = model.predict_log_proba(vecs)

        # To combine the positive and negative classifications into one number,
        # subtract the log-probability of negative sentiment from positive.
        return predictions[:, 1] - predictions[:, 0]

    def words_to_sentiment(words):
        vecs = embeddings.loc[words].dropna()
        log_odds = vecs_to_sentiment(vecs)
        return pd.DataFrame({'sentiment': log_odds}, index=vecs.index)

    # Show 20 examples from the test set
    words_to_sentiment(test_labels).ix[:20]

                     sentiment
    fidget           -9.931679
    interrupt        -9.634706
    bravely           1.466919
    imaginary        -2.989215
    taxation          0.468522
    world-famous      6.908561
    inexpensive       9.237223
    disappointment   -8.737182
    totalitarian    -10.851580
    warlike          -8.328674
    freezes          -8.456981
    sin              -7.839670
    fragile          -4.018289
    fooled           -4.309344
    unsolved         -2.816172
    cleverly          2.339609
    demonizes        -2.102152
    carefree          8.747150
    unpopular        -7.887475
    sympathize        1.790899

You can see that the classifier works: it has learned to generalize sentiment to words outside the training data.

Step 4. Get a sentiment score for text


There are many ways to combine the sentiment of individual words into an overall score. Again, we follow the path of least resistance and just take the mean.

    import re

    TOKEN_RE = re.compile(r"\w.*?\b")
    # The regex matches tokens that start with a word character (\w) and
    # keep matching characters (.*?) up to a word boundary (\b). It is a
    # relatively simple expression for extracting words from text.

    def text_to_sentiment(text):
        tokens = [token.casefold() for token in TOKEN_RE.findall(text)]
        sentiments = words_to_sentiment(tokens)
        return sentiments['sentiment'].mean()

A lot here could be optimized, but every improvement requires extra code and would not fundamentally change the results. At least now we can roughly compare different sentences:

 text_to_sentiment("this example is pretty cool") 3.889968926086298 

 text_to_sentiment("this example is okay") 2.7997773492425186 

 text_to_sentiment("meh, this example sucks") -1.1774475917460698 

Step 5. Behold the monster we created


Not every sentence carries obvious sentiment. Let's see what happens with neutral sentences:

 text_to_sentiment("Let's go get Italian food") 2.0429166109408983 

 text_to_sentiment("Let's go get Chinese food") 1.4094033658140972 

 text_to_sentiment("Let's go get Mexican food") 0.38801985560121732 

I had already run into this phenomenon when analyzing restaurant reviews with word embeddings: for no apparent reason, all Mexican restaurants ended up with lower scores.

Word embeddings capture subtle shades of meaning from context. And so they also capture the prejudices of our society.

Here are some other neutral sentences:

 text_to_sentiment("My name is Emily") 2.2286179364745311 

 text_to_sentiment("My name is Heather") 1.3976291151079159 

 text_to_sentiment("My name is Yvette") 0.98463802132985556 

 text_to_sentiment("My name is Shaniqua") -0.47048131775890656 

Well damn…

The system associates completely different sentiments with people's names. Looking at these and many other examples, you can see that sentiment tends to be higher for stereotypically white names and lower for stereotypically black names.

This test was used by Caliskan, Bryson, and Narayanan in their paper published in Science in April 2017, which shows that the semantics of a language corpus contain societal prejudices. We will use their method.

Step 6. Assessing the problem


We want to understand how to avoid such mistakes. Let's run more data through the classifier and statistically measure its bias.

Here are four lists of names reflecting different ethnic backgrounds, mainly in the United States. The first two are lists of predominantly "white" and "black" names adapted from the paper by Caliskan et al. I also added Hispanic names, and Muslim names from Arabic and Urdu.

This data is used to test for bias during the ConceptNet build process; it can be found in the conceptnet5.vectors.evaluation.bias module. There is an idea to extend it to other ethnic groups, taking into account not only first names but surnames too.

Here are the listings:

    NAMES_BY_ETHNICITY = {
        # The first two lists are from the appendix of the paper by Caliskan et al.
        'White': [
            'Adam', 'Chip', 'Harry', 'Josh', 'Roger', 'Alan', 'Frank', 'Ian',
            'Justin', 'Ryan', 'Andrew', 'Fred', 'Jack', 'Matthew', 'Stephen',
            'Brad', 'Greg', 'Jed', 'Paul', 'Todd', 'Brandon', 'Hank',
            'Jonathan', 'Peter', 'Wilbur', 'Amanda', 'Courtney', 'Heather',
            'Melanie', 'Sara', 'Amber', 'Crystal', 'Katie', 'Meredith',
            'Shannon', 'Betsy', 'Donna', 'Kristin', 'Nancy', 'Stephanie',
            'Bobbie-Sue', 'Ellen', 'Lauren', 'Peggy', 'Sue-Ellen', 'Colleen',
            'Emily', 'Megan', 'Rachel', 'Wendy'
        ],

        'Black': [
            'Alonzo', 'Jamel', 'Lerone', 'Percell', 'Theo', 'Alphonse',
            'Jerome', 'Leroy', 'Rasaan', 'Torrance', 'Darnell', 'Lamar',
            'Lionel', 'Rashaun', 'Tyree', 'Deion', 'Lamont', 'Malik',
            'Terrence', 'Tyrone', 'Everol', 'Lavon', 'Marcellus', 'Terryl',
            'Wardell', 'Aiesha', 'Lashelle', 'Nichelle', 'Shereen', 'Temeka',
            'Ebony', 'Latisha', 'Shaniqua', 'Tameisha', 'Teretha', 'Jasmine',
            'Latonya', 'Shanise', 'Tanisha', 'Tia', 'Lakisha', 'Latoya',
            'Sharise', 'Tashika', 'Yolanda', 'Lashandra', 'Malika', 'Shavonn',
            'Tawanda', 'Yvette'
        ],

        # The list of Hispanic names is compiled from US census data.
        'Hispanic': [
            'Juan', 'José', 'Miguel', 'Luís', 'Jorge', 'Santiago', 'Matías',
            'Sebastián', 'Mateo', 'Nicolás', 'Alejandro', 'Samuel', 'Diego',
            'Daniel', 'Tomás', 'Juana', 'Ana', 'Luisa', 'María', 'Elena',
            'Sofía', 'Isabella', 'Valentina', 'Camila', 'Valeria', 'Ximena',
            'Luciana', 'Mariana', 'Victoria', 'Martina'
        ],

        # The following list conflates religion and ethnicity, I know.
        # So do the names themselves.
        #
        # It was compiled from baby-name sites for Muslim parents written
        # in English. I have not drawn a line between Arabic, Urdu and
        # other languages.
        #
        # I would be happy to update it with more authoritative data.
        'Arab/Muslim': [
            'Mohammed', 'Omar', 'Ahmed', 'Ali', 'Youssef', 'Abdullah',
            'Yasin', 'Hamza', 'Ayaan', 'Syed', 'Rishaan', 'Samar', 'Ahmad',
            'Zikri', 'Rayyan', 'Mariam', 'Jana', 'Malak', 'Salma', 'Nour',
            'Lian', 'Fatima', 'Ayesha', 'Zahra', 'Sana', 'Zara', 'Alya',
            'Shaista', 'Zoya', 'Yasmin'
        ]
    }

Using Pandas, we compile a table of names, their predominant ethnic background, and their sentiment score:

    def name_sentiment_table():
        frames = []
        for group, name_list in sorted(NAMES_BY_ETHNICITY.items()):
            lower_names = [name.lower() for name in name_list]
            sentiments = words_to_sentiment(lower_names)
            sentiments['group'] = group
            frames.append(sentiments)

        # Combine the data from all ethnic groups into one big table
        return pd.concat(frames)

    name_sentiments = name_sentiment_table()

Sample data:

    name_sentiments.ix[::25]

              sentiment        group
    mohammed   0.834974  Arab/Muslim
    alya       3.916803  Arab/Muslim
    terryl    -2.858010        Black
    josé       0.432956     Hispanic
    luciana    1.086073     Hispanic
    hank       0.391858        White
    megan      2.158679        White

Let's plot the distribution of sentiment for each group of names.

    plot = seaborn.swarmplot(x='group', y='sentiment', data=name_sentiments)
    plot.set_ylim([-10, 10])

(-10, 10)



Or as a bar plot with 95% confidence intervals for the group means.

 plot = seaborn.barplot(x='group', y='sentiment', data=name_sentiments, capsize=.1) 



Finally, we bring out the serious statistics package, statsmodels. It will tell us how large the algorithm's bias is (along with a bunch of other statistics).
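
The call that produces the table below is not shown as a separate step in this write-up, but it matches the one used inside retrain_model further down:

    ols_model = statsmodels.formula.api.ols('sentiment ~ group',
                                            data=name_sentiments).fit()
    ols_model.summary()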


                            OLS Regression Results
    ==============================================================================
    Dep. Variable:              sentiment   R-squared:                       0.208
    Model:                            OLS   Adj. R-squared:                  0.192
    Method:                 Least Squares   F-statistic:                     13.04
    Date:                Thu, 13 Jul 2017   Prob (F-statistic):           1.31e-07
    Time:                        11:31:17   Log-Likelihood:                -356.78
    No. Observations:                 153   AIC:                             721.6
    Df Residuals:                     149   BIC:                             733.7
    Df Model:                           3
    Covariance Type:            nonrobust

The F-statistic is the ratio of the variation between groups to the variation within groups; we can take it as an overall measure of the bias.

Immediately below it is the probability of seeing an F-statistic at least this large under the null hypothesis, that is, if there were no real difference between the groups. That probability is very, very low. In a scientific paper, we would call the result "highly statistically significant."
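
As a sanity check, here is a minimal sketch, assuming the name_sentiments table from above, that computes the same one-way ANOVA F-statistic by hand; it should agree with ols_model.fvalue:

    groups = [g['sentiment'].values
              for _, g in name_sentiments.groupby('group')]
    k = len(groups)                      # number of ethnic groups
    n = sum(len(g) for g in groups)      # total number of names
    grand_mean = np.concatenate(groups).mean()

    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    f_value = (ss_between / (k - 1)) / (ss_within / (n - k))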

We want to improve this F-value: the lower, the better.

ols_model.fvalue
13.041597745167659


Step 7. Try other data.


Now we can numerically measure the harmful bias of the model. Let's try to correct it. To do that, we will need to repeat a bunch of things that until now were just separate steps in the Python notebook.

If I were writing good, maintainable code, I would not use global variables such as model and embeddings. But the current spaghetti code lets you examine each step and understand what is happening. We reuse some of the code and at least define a function for repeating the steps:

    def retrain_model(new_embs):
        """
        Repeat the steps above with a new set of word embeddings.
        """
        global model, embeddings, name_sentiments
        embeddings = new_embs
        pos_vectors = embeddings.loc[pos_words].dropna()
        neg_vectors = embeddings.loc[neg_words].dropna()
        vectors = pd.concat([pos_vectors, neg_vectors])
        targets = np.array([1 for entry in pos_vectors.index] +
                           [-1 for entry in neg_vectors.index])
        labels = list(pos_vectors.index) + list(neg_vectors.index)

        train_vectors, test_vectors, train_targets, test_targets, train_labels, test_labels = \
            train_test_split(vectors, targets, labels, test_size=0.1, random_state=0)

        model = SGDClassifier(loss='log', random_state=0, n_iter=100)
        model.fit(train_vectors, train_targets)

        accuracy = accuracy_score(model.predict(test_vectors), test_targets)
        print("Accuracy of sentiment: {:.2%}".format(accuracy))

        name_sentiments = name_sentiment_table()
        ols_model = statsmodels.formula.api.ols('sentiment ~ group',
                                                data=name_sentiments).fit()
        print("F-value of bias: {:.3f}".format(ols_model.fvalue))
        print("Probability given null hypothesis: {:.3}".format(ols_model.f_pvalue))

        # Plot the results with a consistent Y axis
        plot = seaborn.swarmplot(x='group', y='sentiment', data=name_sentiments)
        plot.set_ylim([-10, 10])

We try word2vec


You might assume the problem is specific to GloVe. The Common Crawl corpus probably contains plenty of dubious sites, plus at least 20 copies of the Urban Dictionary slang dictionary. Maybe it's better on another corpus: how about good old word2vec, trained on Google News?

The most authoritative source for the word2vec data seems to be this file on Google Drive. Download it and save it as data/word2vec-googlenews-300.bin.gz.

    # Use a ConceptNet function to load word2vec into a Pandas frame from its binary format
    from conceptnet5.vectors.formats import load_word2vec_bin
    w2v = load_word2vec_bin('data/word2vec-googlenews-300.bin.gz', nrows=2000000)

    # The word2vec model is case-sensitive, so fold the case
    w2v.index = [label.casefold() for label in w2v.index]

    # Remove the duplicates, which occur less frequently
    w2v = w2v.reset_index().drop_duplicates(subset='index', keep='first').set_index('index')

    retrain_model(w2v)

Accuracy of sentiment: 94.30%
F-value of bias: 15.573
Probability given null hypothesis: 7.43e-09


So word2vec turned out even worse, with an F-value above 15.

In hindsight, it was naive to expect news text to be any better insulated from bias.

We try ConceptNet Numberbatch


Finally, I can talk about my own project on the vector representation of words.

ConceptNet is the knowledge graph I work on; its embedding build normalizes the vector representations during training, identifying and removing some sources of algorithmic racism and sexism in the process. This bias-correction method is based on the paper "Debiasing Word Embeddings" by Bolukbasi et al., generalized to eliminate several kinds of bias at once. As far as I know, it is the only semantic system that has anything like this.
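
ConceptNet's actual pipeline is more involved, but the core idea of this family of methods fits in a few lines: estimate a bias direction and project it out of every embedding. This is a rough illustration of that idea, not the Numberbatch implementation:

    def bias_direction(embs, group_a, group_b):
        # Estimate a bias direction as the normalized difference of
        # the two groups' mean vectors.
        mean_a = embs.loc[[w.lower() for w in group_a]].dropna().mean()
        mean_b = embs.loc[[w.lower() for w in group_b]].dropna().mean()
        d = (mean_a - mean_b).values
        return d / np.linalg.norm(d)

    def project_out(embs, direction):
        # Remove each embedding's component along the bias direction
        mat = embs.values
        mat = mat - np.outer(mat @ direction, direction)
        return pd.DataFrame(mat, index=embs.index)

    d = bias_direction(embeddings, NAMES_BY_ETHNICITY['White'],
                       NAMES_BY_ETHNICITY['Black'])
    debiased = project_out(embeddings, d)  # could then be fed to retrain_model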

From time to time, we export precomputed vectors from ConceptNet; these releases are called ConceptNet Numberbatch. The April 2017 release was the first to include the bias correction, so let's load the English-only vectors and retrain our model.

Download numberbatch-en-17.04b.txt.gz, save it in the data/ directory, and retrain the model:

 retrain_model(load_embeddings('data/numberbatch-en-17.04b.txt')) 

Accuracy of sentiment: 97.46%
F-value of bias: 3.805
Probability given null hypothesis: 0.0118




So did ConceptNet Numberbatch eliminate the problem completely? Is algorithmic racism gone? No.

Has the racism become much smaller? Definitely.

The sentiment ranges for the ethnic groups overlap far more than they did with the GloVe or word2vec vectors. Compared to GloVe, the F-value dropped by more than a factor of three; compared to word2vec, by more than a factor of four. And in general, we see much smaller sentiment differences between names, which is how it should be: names really shouldn't affect the result of the analysis.

But a slight correlation remains. I could probably pick data and training parameters that make the problem look solved. That would be the wrong move, because the problem would still be there: we simply have not identified and compensated for every cause of algorithmic racism in ConceptNet. But it is a good start.

There's no catch


Note that the accuracy of sentiment prediction improved with the switch to ConceptNet Numberbatch.

Some might have expected that correcting for algorithmic racism would make the results worse in some other way. It doesn't. You can have data that is both better and less racist. The data genuinely gets better with this correction. The racism that word2vec and GloVe acquired from people has nothing to do with the algorithm's accuracy.

Other approaches


Of course, this is just one approach to sentiment analysis. Some details could be implemented differently.

Instead of, or in addition to, changing the embeddings, you could try to fix the problem directly in the output; for example, stop producing sentiment scores for names and groups of people at all, as in the sketch below.
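
Here is a hypothetical sketch of that idea, reusing NAMES_BY_ETHNICITY as a stand-in for a real name list (a production system would need something far more complete, such as a named-entity recognizer):

    all_names = {name.lower()
                 for names in NAMES_BY_ETHNICITY.values()
                 for name in names}

    def text_to_sentiment_ignoring_names(text):
        tokens = [t.casefold() for t in TOKEN_RE.findall(text)]
        # Drop known names so they contribute nothing to the score
        tokens = [t for t in tokens if t not in all_names]
        return words_to_sentiment(tokens)['sentiment'].mean()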

Another option is to give up computing sentiment for all words and score only the words on the list. This is probably the most common form of sentiment analysis: one with no machine learning at all. Its results will contain no more bias than the author of the list put in. But dropping machine learning means lower recall, and the only way to adapt the model to your data set is to edit the list by hand.
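
A minimal sketch of that lexicon-only approach, assuming the pos_words and neg_words lists and the TOKEN_RE tokenizer from above:

    pos_set, neg_set = set(pos_words), set(neg_words)

    def lexicon_sentiment(text):
        tokens = [t.casefold() for t in TOKEN_RE.findall(text)]
        # +1 for each positive-list word, -1 for each negative-list word
        hits = [(t in pos_set) - (t in neg_set) for t in tokens]
        return sum(hits) / len(tokens) if tokens else 0.0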

As a hybrid approach, you could generate a large number of inferred sentiment scores for words and have a person patiently review them, drawing up a list of exception words whose sentiment is pinned to zero. That is extra work, but in return you actually see what your model is doing. I think that is worth striving for in any case.

Source: https://habr.com/ru/post/436506/