Scikit-learn: what is the probability that a text belongs to the predicted group?

Question

    for doc, category in zip(docs_new, predicted):
        print('%r => %s' % (doc, twenty_train.target_names[category]))

How can I find out the probability with which the algorithm decided that the text belongs to this group?

Here is the full code:

    from sklearn.datasets import load_files
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.naive_bayes import MultinomialNB

    # Load the training texts from the 'db' folder, one subfolder per category
    categories = ['first', 'second', 'third']
    twenty_train = load_files('db', categories=categories, shuffle=False, encoding='utf-8')

    # Convert the texts to token counts, then to TF-IDF features
    count_vect = CountVectorizer()
    X_train_counts = count_vect.fit_transform(twenty_train.data)
    tfidf_transformer = TfidfTransformer()
    X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
    print(X_train_tfidf.shape)

    # Train the classifier
    clf = MultinomialNB().fit(X_train_tfidf, twenty_train.target)

    # Classify new documents (str1 and str2 are strings defined elsewhere)
    docs_new = [str1, str2]
    X_new_counts = count_vect.transform(docs_new)
    X_new_tfidf = tfidf_transformer.transform(X_new_counts)
    predicted = clf.predict(X_new_tfidf)

    for doc, category in zip(docs_new, predicted):
        print('%r => %s' % (doc, twenty_train.target_names[category]))

It would be better to show the relevant part of the code ... What is `docs_new`?

MaxU · Accepted Answer · 2018-02-01T19:26:52

Use the predict_proba() method.

Example:

Initial data:

    In [19]: X = np.random.randint(5, size=(6, 100))
    In [20]: y = np.array([1, 2, 3, 4, 5, 6])
    In [21]: clf = MultinomialNB()

Train the model:

    In [22]: clf.fit(X, y)
    Out[22]: MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

Predict the class:

    In [23]: clf.predict(X[2:3])
    Out[23]: array([3])

List all classes:

    In [24]: clf.classes_
    Out[24]: array([1, 2, 3, 4, 5, 6])

Predict the probabilities for all classes:

    In [25]: clf.predict_proba(X[2:3])
    Out[25]:
    array([[ 4.69205412e-31, 9.16479809e-30, 1.00000000e+00,
             2.47492746e-28, 2.13947776e-31, 2.04949820e-34]])
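
The columns of the predict_proba() output follow the order of clf.classes_, so the two can be paired to see which probability belongs to which class. A minimal sketch on the same kind of random data as above; the fixed seed and the names proba, by_class and best are assumptions added for illustration, not part of the original answer:

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Random count data, as in the example above (seed fixed here only for repeatability)
rng = np.random.RandomState(0)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])

clf = MultinomialNB().fit(X, y)

# predict_proba returns one row per sample; take the row for the single sample
proba = clf.predict_proba(X[2:3])[0]

# Column i of the row corresponds to the label clf.classes_[i]
by_class = dict(zip(clf.classes_, proba))
for label, p in by_class.items():
    print('class %s: %.3g' % (label, p))

# The class with the highest probability is the one predict() returns
best = clf.classes_[np.argmax(proba)]
print('most likely class:', best)
```

Looking the label up through clf.classes_ rather than using the column index directly matters here, because the labels start at 1 while the columns start at 0.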

Are you saying that this cannot be applied to my code?
@user277248, why not? Just use clf.predict_proba(...) instead of clf.predict(...).
Could you also recommend some reference material?
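
As a rough sketch of the suggestion in the comments, here is how predict_proba() might be added to the pipeline from the question. The tiny in-memory corpus (train_docs, train_target, target_names) and the two new documents are hypothetical stand-ins for load_files('db', ...) and for str1 and str2, which are not shown in the question:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus standing in for the data loaded from 'db'
train_docs = [
    'the cat sat on the mat', 'a cat chased the mouse',
    'the car needs new tyres', 'he drove the car to the garage',
    'bake the bread in the oven', 'fresh bread and butter',
]
train_target = [0, 0, 1, 1, 2, 2]
target_names = ['first', 'second', 'third']

# Same feature extraction steps as in the question
count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(count_vect.fit_transform(train_docs))
clf = MultinomialNB().fit(X_train_tfidf, train_target)

# Hypothetical new documents in place of str1 and str2
docs_new = ['the cat is asleep', 'bread fresh from the oven']
X_new_tfidf = tfidf_transformer.transform(count_vect.transform(docs_new))

predicted = clf.predict(X_new_tfidf)
probabilities = clf.predict_proba(X_new_tfidf)  # shape: (n_docs, n_classes)

for doc, category, proba in zip(docs_new, predicted, probabilities):
    # proba.max() is the probability of the predicted (most likely) class
    print('%r => %s (p=%.3f)' % (doc, target_names[category], proba.max()))
```

Each row of probabilities sums to 1, and its columns are ordered like clf.classes_, so the full distribution over the groups is available for every document, not only the winning value.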
