The task is to load a text dataset, convert it into a numerical representation, and select the C parameter with a grid search. I have:

    import numpy as np
    import sklearn
    from sklearn import svm, datasets
    from sklearn.feature_extraction.text import TfidfVectorizer as tfv
    from sklearn.model_selection import KFold, GridSearchCV

    newsgroups = datasets.fetch_20newsgroups(
        subset='all', categories=['alt.atheism', 'sci.space'])
    X = newsgroups.data    # raw text documents
    y = newsgroups.target  # target class indices

    vector = tfv()                    # TF-IDF vectorizer
    data_X = vector.fit_transform(X)  # TF-IDF feature matrix

    grid = {'C': np.power(10.0, np.arange(-5, 6))}            # parameter grid
    cv = KFold(n_splits=5, shuffle=True, random_state=241)    # CV splitter
    clf = sklearn.svm.SVC(kernel='linear', random_state=241)  # classifier
    gs = GridSearchCV(clf, grid, scoring='accuracy', cv=cv)   # grid search
    gs.fit(data_X, y)

I want to get:

    for a in gs.grid_scores_:
        a.mean_validation_score  # cross-validation quality score
        a.parameters             # parameter values

But grid_scores_ is the old API; in newer scikit-learn versions there is only something like this:

    gs.best_estimator_
    gs.best_params_
    gs.best_score_

    1 answer

    The most convenient way is to convert gs.cv_results_ (the results of the grid search) into a pandas.DataFrame.
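    If all you need is the analogue of the old grid_scores_ loop, you can also read gs.cv_results_ directly, without pandas. A minimal sketch, assuming gs has already been fitted as in the question:

        # cv_results_ is a plain dict of parallel arrays, so the old loop
        # becomes a zip over 'params' and 'mean_test_score'
        for params, mean_score in zip(gs.cv_results_['params'],
                                      gs.cv_results_['mean_test_score']):
            print(params, mean_score)  # parameter values and mean CV accuracy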

    Example of the DataFrame approach (here with a slightly different grid, over both C and kernel):

        grid = {
            'C': np.power(10.0, np.arange(-1, 2)),
            'kernel': ['rbf', 'linear'],
        }
        gs = GridSearchCV(clf, grid, scoring='accuracy', cv=cv,
                          n_jobs=-1, verbose=1,
                          return_train_score=True)  # parameter search
        gs.fit(data_X, y)

    The parameter grid:

        In [22]: grid
        Out[22]: {'C': array([ 0.1,  1. , 10. ]), 'kernel': ['rbf', 'linear']}

    Create a DataFrame from gs.cv_results_:

        import pandas as pd

        res = pd.DataFrame(gs.cv_results_)

    The result:

        In [19]: res
        Out[19]:
           mean_fit_time  std_fit_time  mean_score_time  std_score_time  ...  split3_train_score  split4_train_score  mean_train_score  std_train_score
        0       3.368227      0.116977         0.814210        0.058673  ...            0.565430            0.549335          0.552632         0.007027
        1       2.908841      0.158428         0.727257        0.025806  ...            0.961512            0.967810          0.964305         0.004109
        2       3.504162      0.136986         0.799417        0.056557  ...            0.565430            0.549335          0.552632         0.007027
        3       1.824952      0.080142         0.426784        0.016244  ...            0.999300            1.000000          0.999720         0.000343
        4       3.601515      0.124061         0.748441        0.045909  ...            0.565430            0.549335          0.552632         0.007027
        5       1.842038      0.092328         0.407099        0.009595  ...            1.000000            1.000000          1.000000         0.000000

        [6 rows x 22 columns]

        In [20]: res.columns
        Out[20]:
        Index(['mean_fit_time', 'std_fit_time', 'mean_score_time', 'std_score_time',
               'param_C', 'param_kernel', 'params', 'split0_test_score',
               'split1_test_score', 'split2_test_score', 'split3_test_score',
               'split4_test_score', 'mean_test_score', 'std_test_score',
               'rank_test_score', 'split0_train_score', 'split1_train_score',
               'split2_train_score', 'split3_train_score', 'split4_train_score',
               'mean_train_score', 'std_train_score'],
              dtype='object')
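    Since rank_test_score already orders the parameter combinations, the best one(s) can be pulled out with ordinary pandas indexing; a small sketch on the same res frame:

        # row(s) with the best cross-validated accuracy
        best = res.loc[res['rank_test_score'] == 1,
                       ['params', 'mean_test_score', 'std_test_score']]
        print(best)

        # or sort all combinations from best to worst
        res_sorted = res.sort_values('rank_test_score')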

    We select only the columns of interest:

        In [21]: res.filter(regex='^(?:mean|param)')
        Out[21]:
           mean_fit_time  mean_score_time param_C param_kernel                          params  mean_test_score  mean_train_score
        0       3.368227         0.814210     0.1          rbf     {'C': 0.1, 'kernel': 'rbf'}         0.552632          0.552632
        1       2.908841         0.727257     0.1       linear  {'C': 0.1, 'kernel': 'linear'}         0.950168          0.964305
        2       3.504162         0.799417       1          rbf     {'C': 1.0, 'kernel': 'rbf'}         0.552632          0.552632
        3       1.824952         0.426784       1       linear  {'C': 1.0, 'kernel': 'linear'}         0.993281          0.999720
        4       3.601515         0.748441      10          rbf    {'C': 10.0, 'kernel': 'rbf'}         0.552632          0.552632
        5       1.842038         0.407099      10       linear {'C': 10.0, 'kernel': 'linear'}         0.993281          1.000000
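
    To compare the parameter values at a glance, the same mean test scores can also be reshaped into a pivot table; a sketch, assuming the C/kernel grid above:

        # mean CV accuracy for every C / kernel combination
        summary = res.pivot(index='param_C', columns='param_kernel',
                            values='mean_test_score')
        print(summary)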