It is necessary to implement one of the methods for classifying statistical data: either the support vector machine (SVM) or the k-nearest-neighbours (k-NN) method. The work requires the following steps:

• visualize the initial data as a scatter plot
• build a data model
• train the model
• test the model

A sketch of the scatter-plot step is shown right after this list; the k-NN code follows below.
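A minimal sketch of the scatter-plot step, assuming the DataFrame df exactly as it is loaded in the code below; the choice of the two sepal columns and the colour mapping are illustrative, not part of the original code:

import matplotlib.pyplot as plt

# Sketch: sepal length vs. sepal width, coloured by class.
# Assumes `df` as loaded in the code below; the colours are arbitrary.
colours = {'Iris-setosa': 'red',
           'Iris-versicolor': 'green',
           'Iris-virginica': 'blue'}
for cls, group in df.groupby(u'Class'):
    plt.scatter(group.iloc[:, 0], group.iloc[:, 1],
                color=colours.get(cls, 'gray'), label=cls)
plt.xlabel(df.columns[0])
plt.ylabel(df.columns[1])
plt.legend()
plt.show()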

from __future__ import division

import operator
from collections import Counter
from math import sqrt

import numpy as np
import pandas as pd
import pylab as pl
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris data set from the UCI repository
url = ('https://archive.ics.uci.edu/ml/'
       'machine-learning-databases/iris/iris.data')
df = pd.read_csv(url, header=None)
df.columns = [u'Π§Π°ΡˆΠ΅Π»ΠΈΡΡ‚ΠΈΠΊ Π΄Π»ΠΈΠ½Π°, см', u'Π§Π°ΡˆΠ΅Π»ΠΈΡΡ‚ΠΈΠΊ ΡˆΠΈΡ€ΠΈΠ½Π°, см',
              u'ЛСпСсток Π΄Π»ΠΈΠ½Π°, см', u'ЛСпСсток ΡˆΠΈΡ€ΠΈΠ½Π°, см', u'Class']


def test_and_train(df, proportion):
    # Random split: roughly `proportion` of the rows go to the training set
    mask = np.random.rand(len(df)) < proportion
    return df[mask], df[~mask]


train, test = test_and_train(df, 0.67)


def euclidean_distance(instance1, instance2):
    squares = [(i - j) ** 2 for i, j in zip(instance1, instance2)]
    return sqrt(sum(squares))


def get_neighbours(instance, train, k):
    # Distances from `instance` to every training row (features only)
    distances = []
    for i in train.iloc[:, :-1].values:
        distances.append(euclidean_distance(instance, i))
    distances = tuple(zip(distances, train[u'Class'].values))
    return sorted(distances, key=operator.itemgetter(0))[:k]


def get_response(neighbours):
    # Majority vote over the class labels of the k nearest neighbours
    labels = [label for _, label in neighbours]
    return Counter(labels).most_common(1)[0][0]


def get_predictions(train, test, k):
    predictions = []
    for i in test.iloc[:, :-1].values:
        neighbours = get_neighbours(i, train, k)
        predictions.append(get_response(neighbours))
    return predictions


def mean(instance):
    return sum(instance) / len(instance)


def get_accuracy(test, predictions):
    return mean([i == j for i, j in zip(test[u'Class'].values, predictions)])


# Accuracy of the hand-written k-NN with k = 5
print(get_accuracy(test, get_predictions(train, test, 5)))

# Compare with scikit-learn's k-NN over a range of k values
variables = [u'Π§Π°ΡˆΠ΅Π»ΠΈΡΡ‚ΠΈΠΊ Π΄Π»ΠΈΠ½Π°, см', u'Π§Π°ΡˆΠ΅Π»ΠΈΡΡ‚ΠΈΠΊ ΡˆΠΈΡ€ΠΈΠ½Π°, см',
             u'ЛСпСсток Π΄Π»ΠΈΠ½Π°, см', u'ЛСпСсток ΡˆΠΈΡ€ΠΈΠ½Π°, см']
results = []
for n in range(1, 51, 2):
    clf = KNeighborsClassifier(n_neighbors=n)
    clf.fit(train[variables], train[u'Class'])
    preds = clf.predict(test[variables])
    accuracy = np.where(preds == test[u'Class'], 1, 0).sum() / float(len(test))
    print("Neighbors: %d, Accuracy: %.3f" % (n, accuracy))
    results.append([n, accuracy])

results = pd.DataFrame(results, columns=["n", "accuracy"])
pl.plot(results.n, results.accuracy)
pl.title("Accuracy with Increasing K")
pl.show()
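For the alternative mentioned in the task, the support vector method could be tried on the same split. A minimal sketch continuing from the code above, using scikit-learn's SVC; the choice of SVC with default parameters (RBF kernel) is an assumption, not part of the original code:

from sklearn.svm import SVC

# SVM on the same random train/test split; default SVC parameters
# (RBF kernel) are an assumption and would normally need tuning.
svm_clf = SVC()
svm_clf.fit(train[variables], train[u'Class'])
svm_preds = svm_clf.predict(test[variables])
svm_accuracy = np.where(svm_preds == test[u'Class'], 1, 0).sum() / float(len(test))
print("SVM accuracy: %.3f" % svm_accuracy)

Because test_and_train uses a random mask, calling np.random.seed before the split makes the k-NN and SVM accuracies comparable across runs.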
  • And what exactly is wrong with your code? What specific problems have arisen? - m9_psy
  • To train the model and to test the model. - drako08
  • A good way to ask a question is to describe a specific problem with the code and to give a minimal example with which the problem can be reproduced and debugged. Right now everything is too vague: without context it is not clear what the problem is, and it may well relate more to statistical methods than to programming. It is impossible to tell. In this form, most likely no one will take up the problem. - Pavel Gurkov
