The task is to train a random forest with the number of trees varying from 1 to 50 and, for each setting, to evaluate the quality of the resulting forest using 5-fold cross-validation (scored with sklearn.metrics.r2_score). I wrote this loop:

 from sklearn.ensemble import RandomForestRegressor
 from sklearn.cross_validation import KFold
 from sklearn.metrics import r2_score

 P_scores = []
 p = np.linspace(1.0, 50.0, num=50)
 p1 = np.array(p)
 kf = KFold(4176, n_folds=5, random_state=1, shuffle=True)
 P = 1
 while P < len(p1):
     regressor = RandomForestRegressor(n_estimators=P, random_state=1)
     regressor.fit(X, Y)
     predictions = regressor.predict(X)
     r2_score(Y, predictions)
     P_scores.append(r2_score)
     print(P_scores)
     P += 1

The result is a list in which every element is the same function object:

 print(P_scores)
 [<function r2_score at 0x0000023304775BF8>, <function r2_score at 0x0000023304775BF8>,...,<function r2_score at 0x0000023304775BF8>]

I expected numeric values instead, like the result of this call:

 y_true = [3, -0.5, 2, 7]
 y_pred = [2.5, 0.0, 2, 8]
 r2_score(y_true, y_pred)
 0.948...

The only difference should be that the values are collected into a column vector. Naturally, I cannot find the minimum:

 min(P_scores)
 TypeError: unorderable types: function() < function()

Why am I getting non-numeric data? How do I get the numeric values of the scores?

    1 answer

    Try replacing:

     r2_score(Y, predictions)
     P_scores.append(r2_score)

    with:

     P_scores.append(r2_score(Y, predictions)) 

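    P_scores.append(r2_score) appends a reference to the function itself to the list rather than the value it returns. The call r2_score(Y, predictions) on the line before it does compute the score, but the result is never stored, so it is discarded.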

    Demonstration:

     In [38]: y_true = [3, -0.5, 2, 7]
     In [39]: y_pred = [2.5, 0.0, 2, 8]
     In [40]: from sklearn.metrics import r2_score

    This is a reference to the function itself:

     In [41]: r2_score
     Out[41]: <function sklearn.metrics.regression.r2_score>

    And here we get the result of calling the function with arguments:

     In [42]: r2_score(y_true, y_pred)
     Out[42]: 0.94860813704496794
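Beyond the append fix, the loop in the question has three other problems. It creates kf but never uses it, so each forest is scored on the same data it was trained on rather than by cross-validation. The condition P < len(p1) stops at 49 trees instead of 50. The name P_scores is also bound only once, so every print shows the same growing list. Below is a minimal sketch of how the stated task might look. It is not the asker's code: it assumes a recent scikit-learn, where KFold and cross_val_score are imported from sklearn.model_selection, and it uses synthetic data from make_regression as a stand-in for the real X and Y. The helper name forest_cv_scores is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def forest_cv_scores(X, Y, max_trees=50, n_splits=5, seed=1):
    """Return the mean cross-validated R^2 for forests of 1..max_trees trees.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for n_trees in range(1, max_trees + 1):  # inclusive upper bound
        regressor = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        # scoring="r2" computes the R^2 score on each held-out fold
        fold_scores = cross_val_score(regressor, X, Y, cv=kf, scoring="r2")
        scores.append(fold_scores.mean())  # store the number, not a function
    return scores


# Assumption: synthetic data standing in for the asker's X and Y.
X, Y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

# Kept to 10 trees here so the example runs quickly; use max_trees=50 for the full task.
P_scores = forest_cv_scores(X, Y, max_trees=10)
print(min(P_scores), max(P_scores))

# Number of trees with the highest mean R^2 (list index 0 corresponds to 1 tree).
best_n = int(np.argmax(P_scores)) + 1
print(best_n)
```

Because P_scores now holds plain numbers, min(P_scores) and max(P_scores) work without a TypeError.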