Please help me solve this problem!

I have a DataFrame (2178×4). For each point, I need to calculate the average distance to its m nearest neighbors. That is, if m = 3, select the three nearest neighbors, add up the distances to them, and divide by three. Then sort the resulting values in ascending order and plot them. Thanks in advance!

```python
import pandas as pd

df = pd.read_csv('quake_clear.csv')
df.head()
```

P.S. The data (CSV file) can be downloaded from here.


If I understand the problem correctly...

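```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import pairwise_distances

df = pd.read_csv(r'C:\download\quake_clear.csv').dropna(how='all')
m = 3  # number of nearest neighbours

# pairwise (Euclidean) distances between all points
d = pairwise_distances(df[['Latitude', 'Longitude']])
# ignore each point's distance to itself (the diagonal only,
# so points with identical coordinates still count as neighbours)
np.fill_diagonal(d, np.inf)
# average distance to the m nearest neighbours
y = np.sort(d, axis=1)[:, :m].mean(axis=1)
# sort in ascending order, as the task requires, and plot
y = np.sort(y)
plt.plot(range(len(y)), y)
plt.show()
```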
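A self-contained sketch of the same steps on made-up toy points, so the result can be checked by hand without the CSV file. It computes the distance matrix with NumPy broadcasting instead of `pairwise_distances` (the Euclidean values are the same), and the function name `mean_knn_distance` is hypothetical. It masks only the diagonal with `np.fill_diagonal`; masking every zero, as `d[d==0] = np.inf` does, would also hide genuine neighbours that share exactly the same coordinates.

```python
import numpy as np


def mean_knn_distance(points, m=3):
    """Hypothetical helper: mean distance from each point to its m nearest neighbours."""
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances via broadcasting: d[i, j] = |p_i - p_j|.
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude only each point's distance to itself (the diagonal), so exact
    # duplicates still count as neighbours at distance 0.
    np.fill_diagonal(d, np.inf)
    # Sort each row, keep the m smallest distances, and average them.
    return np.sort(d, axis=1)[:, :m].mean(axis=1)


pts = [[0, 0], [1, 0], [3, 0], [6, 0]]
y = np.sort(mean_knn_distance(pts, m=2))  # sort ascending before plotting
print(y.tolist())  # [1.5, 2.0, 2.5, 4.0]
```

For example, the point at x = 3 has its two nearest neighbours at distances 2 and 3, so its value is 2.5.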


  • Yes, this is exactly what I needed, and I understood everything! Thank you very much! - Progging

If this is a learning exercise, then implementing it yourself as described above is probably the best way to learn. But if you are using this method to solve a real problem, I would recommend using KNeighborsClassifier from scikit-learn instead.

It is imported like this:

```python
from sklearn.neighbors import KNeighborsClassifier
```

It lets you choose the distance metric, pick the neighbor-search algorithm to speed up the search, and tune other parameters; in general, it offers everything you are likely to need in practice.
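A minimal sketch of the library route, assuming scikit-learn is installed. Because KNeighborsClassifier is a supervised classifier whose `fit(X, y)` needs class labels `y` that this task does not have, the sketch swaps in `NearestNeighbors` from the same `sklearn.neighbors` module, which takes the same `n_neighbors`, `algorithm` and `metric` parameters but is fit on `X` alone. The helper name `mean_knn_distance_sklearn` and the toy points are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def mean_knn_distance_sklearn(points, m=3, metric="euclidean"):
    """Hypothetical helper: mean distance from each point to its m nearest neighbours."""
    points = np.asarray(points, dtype=float)
    # Build a neighbour index on the points themselves; algorithm="auto" lets
    # scikit-learn choose a tree-based or brute-force search for the metric.
    nn = NearestNeighbors(n_neighbors=m, metric=metric, algorithm="auto").fit(points)
    # With no query argument, kneighbors() returns the m nearest neighbours of
    # every fitted point, excluding the point itself.
    dist, _ = nn.kneighbors()
    # dist has shape (n_points, m); average across each row.
    return dist.mean(axis=1)


pts = [[0, 0], [1, 0], [3, 0], [6, 0]]
y = np.sort(mean_knn_distance_sklearn(pts, m=2))  # ascending, ready for plotting
print(y)
```

Since `kneighbors()` without arguments already leaves out each point itself, no diagonal masking is needed. For latitude/longitude data, `metric="haversine"` on `np.radians(df[['Latitude', 'Longitude']])` gives great-circle distances in radians (multiply by the Earth's radius, about 6371 km, to get kilometres); whether that is preferable to plain Euclidean distance on degrees depends on what the exercise expects.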

  • It is not entirely clear how to train this model. That is, what should be passed as y when calling .fit(X, y)? - MaxU 1:59 pm
  • The classic example from the documentation: from sklearn.neighbors import KNeighborsClassifier; X = [[0], [1], [2], [3]]; y = [0, 0, 1, 1]; neigh = KNeighborsClassifier(n_neighbors=3); neigh.fit(X, y); print(neigh.predict([[1.1]])); print(neigh.predict_proba([[0.9]])) (Sorry, code formatting doesn't seem to work in comments here, but the example shows what to pass as X and y.) - passant
  • I understood the example; what I don't understand is where the author will get the values for the variable “y”. - MaxU
  • Thanks for the advice! The task is educational, so I will try to solve it myself, but sometimes it doesn't work out :) - Progging
  • @MaxU There is a textbook, A. Müller, "Introduction to Machine Learning with Python", in which the fit() method is used with many different algorithms, and from it you can understand where y comes from. If you like, I can send the textbook to your email :) - Progging