I am participating in the Titanic: Machine Learning from Disaster competition on Kaggle. My kernel used to work, but suddenly the Random Forest and Gradient Boosting Classifier methods started raising errors; I don't remember making any particular changes.

    X_train = train_df
    Y_train = targets
    X_test = test_df.copy()
    X_train.shape, Y_train.shape, X_test.shape
    # Result: ((891, 14), (891,), (418, 14))

    # Random Forest
    rf = RandomForestClassifier(n_estimators=350)
    rf.fit(X_train, Y_train)
    Y_pred = random_forest.predict(X_test)
    rf.score(X_train, Y_train)
    acc_random_forest = round(random_forest.score(X_train, Y_train) * 100, 2)
    acc_random_forest

    # Gradient Boosting Classifier
    gb = GradientBoostingClassifier()
    gb.fit(X_train, Y_train)
    Y_pred = gbk.predict(X_test)
    acc_gbk = round(gbk.score(X_train, Y_train) * 100, 2)
    acc_gbk

Error:

 ValueError: Number of features of the model must match the input. Model n_features is 12 and input n_features is 14 

I would be very grateful if someone could tell me what the problem might be. If necessary, I can provide a link to the kernel and share other details.

Closed as off-topic by mkkik, Yaant, Vadizar, 0xdb, aleksandr barakin on 25 Apr at 13:38.

This question appears to be off-topic for this site. The users who voted to close it gave the following reason:

  • "The question is caused by a problem that is no longer reproduced or typed . Although similar questions may be relevant on this site, solving this question is unlikely to help future visitors. You can usually avoid similar questions by writing and researching a minimum program to reproduce the problem before publishing the question. " - mkkik, Yaant, Vadizar, 0xdb, aleksandr barakin
If the question can be reworded to fit the rules in the help center, please edit it.

  • It would be better to give a link to the kernel ... Which of the two models raises the error - RandomForest? Can you print the dimensions X_train.shape, Y_train.shape immediately before rf.fit(X_train, Y_train) and before gb.fit(X_train, Y_train)? - MaxU
  • @MaxU, Logistic Regression, K-Nearest Neighbors, Support Vector Machines, Gaussian Naive Bayes, and Decision Tree all work; the two shown in the question do not. The dimensions are printed immediately before use. Link to the kernel: kaggle.com/distherion/titanic-prediction - smlcrm
  • @MaxU, I printed the dimensions and the dataset info right before the calls. Judging by the error, the dimensions of X_train and X_test should not match, but that is not actually the case ... - smlcrm
  • In your kernel you train the rf object but predict using a random_forest object, which is not declared anywhere. Also, I did not see the error from the question in the kernel. - MaxU
  • @MaxU, what a silly mistake - now it all works. Thank you so much; apparently I am so tired that I no longer notice anything! Write it up as an answer and I will upvote it. The error was exactly the one in the question; I could change it back and take a screenshot, but that is hardly necessary ... - smlcrm

1 answer

In the following code block:

    rf = RandomForestClassifier(n_estimators=350)
    rf.fit(X_train, Y_train)
    Y_pred = random_forest.predict(X_test)

the rf object is trained, but the prediction uses the random_forest object, which is not declared anywhere.

Try replacing random_forest with rf.
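For reference, a minimal corrected sketch of both blocks (assuming X_train, Y_train, and X_test are prepared as in the question). Note that the gradient boosting block in the question has the same kind of naming mismatch: it fits gb but predicts and scores with gbk.

    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    # Random Forest: fit, predict, and score with the same object (rf)
    rf = RandomForestClassifier(n_estimators=350)
    rf.fit(X_train, Y_train)
    Y_pred = rf.predict(X_test)
    acc_random_forest = round(rf.score(X_train, Y_train) * 100, 2)

    # Gradient Boosting: use gb consistently instead of gbk
    gb = GradientBoostingClassifier()
    gb.fit(X_train, Y_train)
    Y_pred = gb.predict(X_test)
    acc_gbk = round(gb.score(X_train, Y_train) * 100, 2)

This also explains the original error message: the old random_forest object had been fitted in an earlier run on data with 12 features, so calling predict on the current 14-feature X_test reported the mismatch.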