There is a dataframe of the following form:

    id     Name     Sex
     0     Jack    male
     1   Andrew    male
     2   Andrew  female
     3     Jack    male
     4    Yuriy    male
     5  Johanna  female

I need to get the most frequent female name and the most frequent male name. How can this be implemented?

2 answers

Series.value_counts() returns the number of occurrences of each distinct value as a Series sorted in descending order of count. The first index label is therefore already the most frequent value, and an additional .idxmax() call is an unnecessary waste of resources.

Example:

    In [50]: df.groupby('Sex')['Name'].agg(lambda g: g.value_counts().index[0]).reset_index(name='Most_popular_name')
    Out[50]:
          Sex Most_popular_name
    0  female           Johanna
    1    male              Jack

    In [51]: df.groupby('Sex')['Name'].agg(lambda g: g.value_counts().index[0]).to_dict()
    Out[51]: {'female': 'Johanna', 'male': 'Jack'}

  • sorting (O(n log n)) is not needed; the maximum can be found in linear time. - jfs
  • @jfs, it does look better without sorting. It is also possible to disable sorting during grouping - MaxU
  • your answer uses sorting. This is a trifle, but since you raised the topic of performance yourself, adequate solutions should be proposed. I would not venture to say which option is faster for a specific input - jfs
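
The linear-time idea raised in the comments above can be sketched in plain Python, without pandas. This is only an illustration under stated assumptions: the `rows` list and the `most_common_by_group` helper are hypothetical names that appear in neither answer, and a tie is resolved in favour of whichever name was counted first.

```python
from collections import Counter, defaultdict

# Hypothetical sample data mirroring the question's frame: (Name, Sex) pairs.
rows = [
    ("Jack", "male"),
    ("Andrew", "male"),
    ("Andrew", "female"),
    ("Jack", "male"),
    ("Yuriy", "male"),
    ("Johanna", "female"),
]


def most_common_by_group(pairs):
    """Return {group: most frequent value} without sorting the counts."""
    counts = defaultdict(Counter)
    for value, group in pairs:  # single counting pass over the data
        counts[group][value] += 1
    # max() scans each group's counts once; no sort is performed.
    return {group: max(c, key=c.get) for group, c in counts.items()}


top_names = most_common_by_group(rows)
print(top_names)
```

Whether this beats `value_counts()` on real data depends on the input size and on pandas' vectorised internals, so it is worth profiling before preferring one over the other.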

To print the most common female and male names:

    for sex in ['male', 'female']:
        print(df.loc[df.Sex == sex, 'Name'].value_counts(sort=False).idxmax())

Result:

    Jack
    Andrew

Or as one expression:

    >>> df.groupby('Sex').agg(lambda g: g.value_counts(sort=False).idxmax())
              Name
    Sex
    female  Andrew
    male      Jack

Or explicitly choosing names:

    >>> top_names = df.groupby('Sex')['Name'].agg(lambda g: g.value_counts(sort=False).idxmax())
    >>> top_names.to_dict()
    {'female': 'Andrew', 'male': 'Jack'}

  • @MaxU is correct, the answer I linked to uses index[0]. That is less readable, so I use an explicit idxmax() after value_counts() (this does not worsen the big-O complexity of the solution). If the constant factors ever need improving, it can be sped up, but only if a profiler shows that this is a bottleneck in the program. - jfs
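
The two answers report different female names because, in the question's data, `Andrew` and `Johanna` each occur exactly once among the female rows, so the result depends on how ties happen to be ordered. If every tied name should be reported, one option is `Series.mode()`, which returns all values sharing the highest count. A minimal sketch, assuming the frame from the question is rebuilt by hand with `id` as the index (the variable names are illustrative):

```python
import pandas as pd

# Rebuild the frame from the question (assumption: 'id' is the default index).
df = pd.DataFrame({
    "Name": ["Jack", "Andrew", "Andrew", "Jack", "Yuriy", "Johanna"],
    "Sex": ["male", "male", "female", "male", "male", "female"],
})

# Series.mode() returns every value that shares the highest count,
# so ties are reported explicitly rather than broken by ordering.
all_top_names = df.groupby("Sex")["Name"].apply(lambda g: g.mode().tolist())
result = all_top_names.to_dict()
print(result)
```

Here the female group yields two names and the male group one, which makes the tie visible to the caller.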