I have a large text. I want to split it into an array of words, using punctuation marks (full stop, comma, colon, semicolon, line break) as separators. How can I do that?

  • There are also hyphenated words such as по-русски. - Visman
  • Yes, I hadn't thought of that. Thanks, corrected. - Andrey Kuznetsov

2 answers

You can build an NSCharacterSet of delimiters and split the string on it:

 NSString *text = @"..."; // your large text
 NSCharacterSet *separatorsSet = [NSCharacterSet characterSetWithCharactersInString:@".,:;\n"];
 NSArray *words = [text componentsSeparatedByCharactersInSet:separatorsSet];
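One thing to watch for with this approach: componentsSeparatedByCharactersInSet: returns an empty string between adjacent delimiters (for example a full stop followed by a line break), and the pieces keep any surrounding spaces. A minimal sketch of a cleanup pass follows, assuming you want trimmed, non-empty pieces; the helper name SplitOnPunctuation is hypothetical and not part of the original answer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an illustration, not from the original answer):
// splits on the same delimiters, trims surrounding whitespace,
// and drops the empty pieces produced by adjacent delimiters.
static NSArray<NSString *> *SplitOnPunctuation(NSString *text) {
    NSCharacterSet *separators =
        [NSCharacterSet characterSetWithCharactersInString:@".,:;\n"];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];

    NSMutableArray<NSString *> *parts = [NSMutableArray array];
    for (NSString *piece in [text componentsSeparatedByCharactersInSet:separators]) {
        NSString *trimmed = [piece stringByTrimmingCharactersInSet:whitespace];
        if (trimmed.length > 0) {
            [parts addObject:trimmed];
        }
    }
    return parts;
}
```

Calling SplitOnPunctuation(@"one, two; three.\nfour") should give the four pieces one, two, three, four. Spaces are not delimiters here, so a piece can still contain several words.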

    I found a solution using NSRegularExpression. With this pattern, only words (ASCII letters, digits, hyphens, underscores) remain:

     // Match runs of ASCII letters, digits, underscores, and hyphens.
     NSRegularExpression *regex = [[NSRegularExpression alloc] initWithPattern:@"([a-zA-Z0-9_-]+)"
                                                                       options:NSRegularExpressionCaseInsensitive
                                                                         error:nil];
     NSString *sourceString = @"This Web site includes information about Project Gutenberg-tm, including how to make donations to the Project Gutenberg Literary Archive Foundation, how to help produce our new eBooks, and how to subscribe to our email newsletter to hear about new eBooks.";
     NSArray<NSTextCheckingResult *> *matches = [regex matchesInString:sourceString
                                                               options:NSMatchingWithoutAnchoringBounds
                                                                 range:NSMakeRange(0, sourceString.length)];
     // Collect each matched word in lowercase.
     NSMutableArray *words = [NSMutableArray array];
     for (NSTextCheckingResult *match in matches) {
         [words addObject:[[sourceString substringWithRange:match.range] lowercaseString]];
     }

    The result is an array of lowercase words.
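The character class [a-zA-Z0-9_-] only covers ASCII, so a word like по-русски from the comments would not be matched. A possible variant, sketched under the assumption that any Unicode letter or digit should count as part of a word, uses the \p{L} and \p{N} property classes that NSRegularExpression supports. The helper name UnicodeWords is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an illustration, not from the original answer):
// returns lowercase words made of Unicode letters, digits,
// underscores, and hyphens.
static NSArray<NSString *> *UnicodeWords(NSString *source) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"[\\p{L}\\p{N}_-]+"
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        // The pattern failed to compile; inspect `error` for details.
        return @[];
    }

    NSMutableArray<NSString *> *words = [NSMutableArray array];
    [regex enumerateMatchesInString:source
                            options:0
                              range:NSMakeRange(0, source.length)
                         usingBlock:^(NSTextCheckingResult *match,
                                      NSMatchingFlags flags,
                                      BOOL *stop) {
        NSString *word = [source substringWithRange:match.range];
        [words addObject:[word lowercaseString]];
    }];
    return words;
}
```

With this pattern, UnicodeWords(@"Говорю по-русски, and English_too 42.") should keep по-русски as a single word. A lone hyphen surrounded by spaces would also match, so you may want to filter those out depending on your text.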