While working with the "titanic" dataset I ran into the following problem: I need to extract the first name from the full name, and the extraction rule depends on the passenger's title. I wrote this function:

    def name_extract(x):
        title = x['Title']
        name = x['Name']
        if title in ['Mr', 'Miss']:
            return name.split('.')[1].split(' ')[1].strip()
        elif title in ['Mrs']:
            return name.split('(')[1].split(' ')[1].strip()

The first branch of the function, for Mr and Miss, works without problems, but the second raises an error:

 IndexError: ('list index out of range', 'occurred at index 20') 
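The failure can be reproduced outside pandas. The sketch below is only an illustration: it copies the Mrs branch into a hypothetical helper, `first_name_from_parens`, and calls it on the Futrelle name from this question and on a made-up name without parentheses (`"Smith, Mrs. Jane"` is an assumption, not a row from the dataset).

```python
def first_name_from_parens(name):
    # Same expression as the 'Mrs' branch in the question.
    return name.split('(')[1].split(' ')[1].strip()


with_parens = "Futrelle, Mrs. Jacques Heath (Lily May Peel)"
without_parens = "Smith, Mrs. Jane"  # hypothetical name, not from the dataset

# With parentheses, index [1] after splitting on spaces is the second word.
print(first_name_from_parens(with_parens))  # May

# Without '(' the first split returns a one-element list, so [1] fails.
try:
    first_name_from_parens(without_parens)
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

So the Mrs branch appears to assume that every such name has a parenthesised part containing at least two words, and even then it picks the second word rather than the first.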

The dataset contains a Name field with the passenger's full name, from which the first name must be extracted. The rule depends on the Title: for Mr the first name follows a space, while for Mrs it follows an opening parenthesis. For example:

    Braund, Mr. Owen Harris
    Futrelle, Mrs. Jacques Heath (Lily May Peel)

Title is also extracted from Name by a rule and then grouped. That stage works without problems:

    def name_extract(word):
        return word.split(',')[1].split('.')[0].strip()

    def replace_titles(x):
        title = x['Title']
        if title in ['Don', 'Major', 'Capt', 'Jonkheer', 'Rev', 'Col', 'Master', 'Sir']:
            return 'Mr'
        elif title in ['Countess', 'Mme', 'Lady']:
            return 'Mrs'
        elif title in ['Mlle', 'Ms']:
            return 'Miss'
        elif title == 'Dr':
            if x['Sex'] == 'Male':
                return 'Mr'
            else:
                return 'Mrs'
        else:
            return title
  • Can you elaborate on the task? What is passed in as x? What is in the 'Mr', 'Miss' lists? A solution requires more data, and preferably more code. The error says that somewhere you requested a list item by an index that is not in the list. - New Python Programmist
  • @MaxU, I added code and an explanation to the post - Alexander Botvin
  • @NewPythonProgrammist, yes, I added the information - Alexander Botvin
  • What do you want to extract in the case of 'Futrelle, Mrs. Jacques Heath (Lily May Peel)'? - MaxU

1 answer

Use the Series.str.extract() method:

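    import pandas as pd

    train = pd.read_csv(r'C:\download\data\titanic\train.csv')

    # Last name is the word before the comma, Title is the word ending with a dot
    train[['Last', 'Title']] = (train['Name']
                                .str.extract(r'(\w+),\s*(\w+\.)', expand=True))

    # First name: first word inside the parentheses if present,
    # otherwise the first word after the title
    train['First'] = (train['Name']
                      .str.extract(r'\((\w+)\s*', expand=True)
                      .fillna(train['Name'].str.extract(r',\s*\w+\.\s*(\w+)\s*', expand=True)))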

Result:

    In [40]: train[['Title','First','Last','Name']]
    Out[40]:
         Title    First      Last       Name
    0    Mr.      Owen       Braund     Braund, Mr. Owen Harris
    1    Mrs.     Florence   Cumings    Cumings, Mrs. John Bradley (Florence Briggs Th...
    2    Miss.    Laina      Heikkinen  Heikkinen, Miss. Laina
    3    Mrs.     Lily       Futrelle   Futrelle, Mrs. Jacques Heath (Lily May Peel)
    4    Mr.      William    Allen      Allen, Mr. William Henry
    5    Mr.      James      Moran      Moran, Mr. James
    6    Mr.      Timothy    McCarthy   McCarthy, Mr. Timothy J
    7    Master.  Gosta      Palsson    Palsson, Master. Gosta Leonard
    8    Mrs.     Elisabeth  Johnson    Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
    9    Mrs.     Adele      Nasser     Nasser, Mrs. Nicholas (Adele Achem)
    ..   ...      ...        ...        ...
    881  Mr.      Johann     Markun     Markun, Mr. Johann
    882  Miss.    Gerda      Dahlberg   Dahlberg, Miss. Gerda Ulrika
    883  Mr.      Frederick  Banfield   Banfield, Mr. Frederick James
    884  Mr.      Henry      Sutehall   Sutehall, Mr. Henry Jr
    885  Mrs.     Margaret   Rice       Rice, Mrs. William (Margaret Norton)
    886  Rev.     Juozas     Montvila   Montvila, Rev. Juozas
    887  Miss.    Margaret   Graham     Graham, Miss. Margaret Edith
    888  Miss.    Catherine  Johnston   Johnston, Miss. Catherine Helen "Carrie"
    889  Mr.      Karl       Behr       Behr, Mr. Karl Howell
    890  Mr.      Patrick    Dooley     Dooley, Mr. Patrick

    [891 rows x 4 columns]
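If a row-wise function is preferred over the vectorized call, the same two-step rule can be sketched with the standard `re` module. This is only an illustration under assumptions: `first_name` is a hypothetical helper, and the name without parentheses (`"Smith, Mrs. Jane"`) is made up rather than taken from the dataset. Because it checks for a match instead of indexing into a split result, it returns None rather than raising IndexError.

```python
import re

# Same two patterns as in the str.extract() calls above.
IN_PARENS = re.compile(r'\((\w+)')
AFTER_TITLE = re.compile(r',\s*\w+\.\s*(\w+)')


def first_name(name):
    """Return the first word in parentheses, else the first word after the title."""
    match = IN_PARENS.search(name) or AFTER_TITLE.search(name)
    # No match at all: return None instead of raising.
    return match.group(1) if match else None


print(first_name("Braund, Mr. Owen Harris"))                       # Owen
print(first_name("Futrelle, Mrs. Jacques Heath (Lily May Peel)"))  # Lily
print(first_name("Smith, Mrs. Jane"))                              # Jane (hypothetical name)
```

It could then be applied with something like train['Name'].apply(first_name), though str.extract() is likely to be faster on a large frame.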
  • With expand it does not run and gives an error (though I may have been too hasty), but otherwise it works. Thanks! - Alexander Botvin
  • @AlexanderBotvin, what is your version of Pandas? - MaxU
  • How do I check it? - Alexander Botvin
  • @AlexanderBotvin, pd.show_versions() - MaxU
  • pandas version: 0.22.0 - Alexander Botvin