    for doc, category in zip(docs_new, predicted):
        print('%r => %s' % (doc, twenty_train.target_names[category]))

How can I find out the probability with which the algorithm decided that a text belongs to this group?

Here is the full code:

    from sklearn.datasets import load_files
    categories = ['first', 'second', 'third']
    twenty_train = load_files('db', categories=categories, shuffle=False, encoding='utf-8')

    from sklearn.feature_extraction.text import CountVectorizer
    count_vect = CountVectorizer()
    X_train_counts = count_vect.fit_transform(twenty_train.data)

    from sklearn.feature_extraction.text import TfidfTransformer
    tfidf_transformer = TfidfTransformer()
    X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
    print(X_train_tfidf.shape)

    from sklearn.naive_bayes import MultinomialNB
    clf = MultinomialNB().fit(X_train_tfidf, twenty_train.target)

    docs_new = [str1, str2]
    X_new_counts = count_vect.transform(docs_new)
    X_new_tfidf = tfidf_transformer.transform(X_new_counts)
    predicted = clf.predict(X_new_tfidf)

    for doc, category in zip(docs_new, predicted):
        print('%r => %s' % (doc, twenty_train.target_names[category]))
  • Please specify which classifier you used. It would be better to include the relevant part of the code ... What is docs_new? What does twenty_train look like? - MaxU
  • @MaxU Added the full code above - user277248

1 answer

Use the predict_proba() method.

Example:

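Initial data (assuming numpy has been imported as np and MultinomialNB has been imported from sklearn.naive_bayes):

    In [19]: X = np.random.randint(5, size=(6, 100))

    In [20]: y = np.array([1, 2, 3, 4, 5, 6])

    In [21]: clf = MultinomialNB()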

Train the model:

    In [22]: clf.fit(X, y)
    Out[22]: MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

Predict the class:

    In [23]: clf.predict(X[2:3])
    Out[23]: array([3])

All classes:

    In [24]: clf.classes_
    Out[24]: array([1, 2, 3, 4, 5, 6])

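Predict the probabilities for all classes:

    In [25]: clf.predict_proba(X[2:3])
    Out[25]:
    array([[ 4.69205412e-31, 9.16479809e-30, 1.00000000e+00,
             2.47492746e-28, 2.13947776e-31, 2.04949820e-34]])

The columns follow the order of clf.classes_, so the third value (about 1.0) is the probability of class 3, which matches the class returned by predict.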
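Applied to a pipeline like the one in the question, this might look roughly like the sketch below. Since the 'db' directory and str1, str2 are not available here, the training texts, labels, target_names and docs_new are small made-up stand-ins (assumptions, not the asker's data); only the vectorizer, transformer and classifier calls are taken from the question.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-in for load_files('db', ...): a tiny in-memory corpus.
train_docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell on the market today",
    "the bank raised interest rates",
]
train_target = np.array([0, 0, 1, 1])
target_names = ["animals", "finance"]  # plays the role of twenty_train.target_names

# Same steps as in the question: counts -> tf-idf -> MultinomialNB.
count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(count_vect.fit_transform(train_docs))
clf = MultinomialNB().fit(X_train_tfidf, train_target)

# Hypothetical new documents standing in for [str1, str2].
docs_new = ["my cat chased the dog", "interest rates and the market"]
X_new_tfidf = tfidf_transformer.transform(count_vect.transform(docs_new))

predicted = clf.predict(X_new_tfidf)
probabilities = clf.predict_proba(X_new_tfidf)  # shape: (n_docs, n_classes)

for doc, category, proba in zip(docs_new, predicted, probabilities):
    # The predicted class is the one with the highest probability in its row.
    print('%r => %s (p = %.3f)' % (doc, target_names[category], proba.max()))
```

Each row of probabilities sums to 1, and its columns are ordered like clf.classes_, so the full row can be printed instead of proba.max() if the probabilities of the other groups are also of interest.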
  • Unfortunately, I'm just starting out. Are you saying this can't be applied to my code? - user277248
  • @user277248, why not? Just use clf.predict_proba(...) instead of clf.predict(...) - MaxU
  • Great, thanks! Could you also recommend some reference material? Something for absolute beginners :) - user277248