The task is to train a random forest with a varying number of trees, from 1 to 50, and for each setting to evaluate the quality of the resulting forest with 5-fold cross-validation (using sklearn.metrics.r2_score). I wrote this loop:
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.cross_validation import KFold
    from sklearn.metrics import r2_score

    P_scores = []
    p = np.linspace(1.0, 50.0, num=50)
    p1 = np.array(p)
    kf = KFold(4176, n_folds=5, random_state=1, shuffle=True)
    P = 1
    while P < len(p1):
        regressor = RandomForestRegressor(n_estimators=P, random_state=1)
        regressor.fit(X, Y)
        predictions = regressor.predict(X)
        r2_score(Y, predictions)
        P_scores.append(r2_score)
        print(P_scores)
        P += 1
The result is a list consisting of elements like these:
    print(P_scores)
    [<function r2_score at 0x0000023304775BF8>, <function r2_score at 0x0000023304775BF8>,...,<function r2_score at 0x0000023304775BF8>]
However, I expected numeric scores, like the result of this call:
    y_true = [3, -0.5, 2, 7]
    y_pred = [2.5, 0.0, 2, 8]
    r2_score(y_true, y_pred)
    0.948...
That is, I wanted such numbers collected into a column vector. Naturally, I cannot take the minimum of what I actually got:
    min(P_scores)
    TypeError: unorderable types: function() < function()
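For reference, the same behavior can be reproduced without scikit-learn. This is only a minimal sketch: `r2` is a hypothetical plain-Python stand-in for `r2_score`, and the lists `wrong` and `right` are names made up for illustration. It contrasts appending the function itself with appending the value the function returns.

```python
def r2(y_true, y_pred):
    """Plain-Python coefficient of determination (hypothetical stand-in for r2_score)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

# Same pattern as in the loop: the score is computed and discarded,
# then the function object (not its result) is appended.
wrong = []
for _ in range(2):
    r2(y_true, y_pred)
    wrong.append(r2)

# Functions cannot be ordered, so min() raises TypeError.
try:
    min(wrong)
    min_failed = False
except TypeError:
    min_failed = True

# Keeping the return value and appending it gives a list of floats.
right = []
for _ in range(2):
    score = r2(y_true, y_pred)
    right.append(score)

print(callable(wrong[0]), min_failed)
print(min(right))
```

In this sketch `wrong` holds two references to the same function, while `right` holds two floats that `min` can compare.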
Why am I getting non-numeric data? How do I get the numeric values of the scores?
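One possible way to get numeric scores is sketched below, under several assumptions. The data here is synthetic (`make_regression`), since the real `X` and `Y` are not shown. The sketch uses `sklearn.model_selection`, which replaced `sklearn.cross_validation` in newer scikit-learn releases, and it lets `cross_val_score` do the 5-fold evaluation instead of scoring on the training data. It also loops over 1 to 50 inclusive, whereas `while P < len(p1)` stops at 49. Treat it as a sketch, not as the required solution.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (an assumption; the question's X and Y are not shown).
X, Y = make_regression(n_samples=100, n_features=4, noise=10.0, random_state=1)

# 5-fold splitter with shuffling, as described in the task.
kf = KFold(n_splits=5, shuffle=True, random_state=1)

P_scores = []
for n_trees in range(1, 51):  # 1..50 inclusive
    regressor = RandomForestRegressor(n_estimators=n_trees, random_state=1)
    # One R^2 value per fold; each is computed on held-out data.
    fold_scores = cross_val_score(regressor, X, Y, cv=kf, scoring="r2")
    # Append the numeric mean, not a function object.
    P_scores.append(fold_scores.mean())

P_scores = np.array(P_scores)
# Index 0 corresponds to 1 tree, so shift by one.
best_n_trees = int(np.argmax(P_scores)) + 1
print(P_scores.min(), P_scores.max(), best_n_trees)
```

Because `P_scores` now holds floats, `min`, `max`, and `np.argmax` all work on it; `P_scores.reshape(-1, 1)` would give the column-vector form if that layout is needed.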