Hello.

Suppose there is a string like this: Youremail@gmail.comAnotheremail@gmail.com. How can it be parsed so that the result is a list in which each element is a separate address, i.e. youremail@gmail.com, anotheremail@gmail.com?

There are a lot of such elements, so the solution should not be too slow.

UPD: thank you all.

  • Is it guaranteed that the email addresses are valid and belong to existing top-level domains? - Schullz
  • Yes, guaranteed. - lounah
  • I can write a quick hack in C# or C++; can you "translate" it? - Schullz
  • Yes, of course. But I am already writing a hack myself using StringBuffer: finding ".com" in the string, inserting a separator after it, then splitting the string into separate elements with split and adding them to the list - lounah
  • and then a user with a mail.ru address turns up, and the hack breaks :D - Schullz

3 answers

Since the author clarified in the comments that C++ code also suits him, and clarified the requirements, here is code that splits a line containing several valid gmail.com addresses into a list of addresses:

    #include <cstdio>
    #include <iostream>
    #include <string>
    #include <vector>
    using namespace std;

    // Splits a string of concatenated addresses after every "@gmail.com".
    vector<string> getAddrs(const string& s) {
        const string tail = "@gmail.com";
        string lst;  // sliding window over the last tail.size() characters
        string cur;  // address being accumulated
        vector<string> res;
        for (int i = 0; i < (int) s.size(); i++) {
            if (lst.size() >= tail.size())
                lst = lst.substr(1);
            lst += s[i];
            cur += s[i];
            if (lst == tail) {  // window equals the suffix: the address is complete
                res.push_back(cur);
                cur = "";
            }
        }
        return res;
    }

    int main() {
        string str = "Youremail@gmail.comAnotheremail@gmail.com";
        vector<string> res = getAddrs(str);
        for (int i = 0; i < (int) res.size(); i++)
            cout << res[i] << endl;
        getchar();  // wait for a key press before exiting
        return 0;
    }
  • The same solution as mine :) Thanks! Right now I'll measure how fast it processes 5 million elements and writes them to another txt file - lounah
  • There is a risk of running into an address like my.cm@gmail.com. It's a pity the author didn't say in the question right away that there is only gmail there; in that case it's trivial. For simplicity, I would run through the line replacing every @gmail.com (starting with @, so that it is guaranteed not to trip over a strange address) with a space, which gives a string of names. Then just append @gmail.com to each name. - Harry
  • @Harry I was afraid that dots could be inserted into gmail too, giving something like g.ma.il. I checked: they can't. But I corrected the solution. In general, yes, replacing "@gmail.com" with spaces and then splitting the line is a much nicer solution, almost without hacks - Schullz
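
The point made in the comments, that the search marker should start with @ rather than be just ".com", can be illustrated with a small Java sketch. It assumes the hack from the question inserts a space after each marker and then splits on spaces, and that the input itself contains no spaces. The class name, the method name and the address my.com@gmail.com are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class MarkerChoiceDemo {
    // Inserts a space after every occurrence of `marker`, then splits on spaces.
    // String.replace treats both arguments literally (no regex involved).
    static List<String> splitAfter(String input, String marker) {
        String spaced = input.replace(marker, marker + " ");
        return Arrays.asList(spaced.trim().split(" "));
    }

    public static void main(String[] args) {
        // Hypothetical input: the first local part itself contains ".com".
        String input = "my.com@gmail.comAnotheremail@gmail.com";

        // ".com" also matches inside the local part, so the first address is cut in two.
        System.out.println(splitAfter(input, ".com"));
        // prints [my.com, @gmail.com, Anotheremail@gmail.com]

        // "@gmail.com" can only match the real suffix, because "@" cannot occur in a local part.
        System.out.println(splitAfter(input, "@gmail.com"));
        // prints [my.com@gmail.com, Anotheremail@gmail.com]
    }
}
```

The sketch only demonstrates the choice of marker; it makes no claim about which variant is fastest on millions of addresses.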

I would recommend separating these emails with something that cannot occur in the addresses themselves, for example two asterisks ** or two dollar signs $$, and then simply splitting the line on that separator via .split("\\*\\*") and putting the result into a list. That means less headache, and no need to rack your brain over regular expressions and the like.

The split method returns a new array. The string is split on the delimiter given as its first argument; in Java that argument is interpreted as a regular expression, which is why the * and $ characters are escaped.

Example

    // Requires: java.util.ArrayList, java.util.Arrays, java.util.List
    String str = "Youremail@gmail.com$$Anotheremail@gmail.com";
    // split() takes a regex, so each '$' has to be escaped
    List<String> emailList = new ArrayList<>(Arrays.asList(str.split("\\$\\$")));
    System.out.println("Emails:");
    for (String email : emailList) {
        System.out.println(email);
    }

will output:

    Emails:
    Youremail@gmail.com
    Anotheremail@gmail.com

Here is an option for the case where it is known that all mailboxes are on one specific domain. Then we know that every address ends in @secondleveldomain.topleveldomain.

Accordingly, you can search the string for this suffix and, in a loop, extract each address, moving the cursor past the match before searching for the next one:

    String string = "Youremail@gmail.comAnotheremail@gmail.comSomeemail@gmail.com";
    int prevPos = 0, nextPos;
    List<String> emailList = new ArrayList<>();
    final String SEARCH_STR = "@gmail.com";
    final int SEARCH_STR_LENGTH = SEARCH_STR.length();

    /**** start code *****/
    while (true) {
        nextPos = string.indexOf(SEARCH_STR, prevPos);
        if (nextPos == -1) break;
        emailList.add(string.substring(prevPos, nextPos + SEARCH_STR_LENGTH));
        prevPos = nextPos + SEARCH_STR_LENGTH;
    }
    /**** end code *****/

    for (String test : emailList) {
        System.out.println(test);
    }

https://ideone.com/OkRYll
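
Since the question mentions that there are a lot of addresses, the same indexOf loop can be applied line by line instead of to one huge string. The following is a minimal sketch under stated assumptions: the input is available as a BufferedReader, every address ends with the given suffix, and the class and method names are hypothetical. A StringReader stands in for the real text file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineByLineSplitter {
    // Appends every address found in `line` to `out`, cutting after each `suffix`.
    static void splitLine(String line, String suffix, List<String> out) {
        int start = 0;
        int hit;
        while ((hit = line.indexOf(suffix, start)) != -1) {
            int end = hit + suffix.length();
            out.add(line.substring(start, end));
            start = end;
        }
    }

    // Processes the input one line at a time, so the file is never held as a single string.
    static List<String> splitAll(BufferedReader reader, String suffix) {
        List<String> out = new ArrayList<>();
        reader.lines().forEach(line -> splitLine(line, suffix, out));
        return out;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the real text file (hypothetical data).
        String data = "Youremail@gmail.comAnotheremail@gmail.com\nSomeemail@gmail.com\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            for (String email : splitAll(reader, "@gmail.com")) {
                System.out.println(email);
            }
        }
    }
}
```

Each character is scanned once by indexOf, so the work should grow roughly linearly with the input size; actual timings depend on the machine and are worth measuring.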

  • Thank you, but the string that the parser receives as input is already prepared (that is, separators could only be inserted manually, by editing the text file being parsed), so this option is out - lounah
  • @Schepalin are all the emails in the text file on a single line, or is each one on its own line? If they are on one line, how did they end up stored that way? - Alexey Shimansky
  • There are about 5 million addresses; they are run together, roughly 7-10 addresses per line - lounah
  • Anyway, thanks for the reply - lounah
  • @Schepalin I've added an option for the case described in the comments. You can try it. What if ... - Alexey Shimansky
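
For the case raised in these comments, where separators cannot be inserted into the file, split can still be used if the pattern is a zero-width lookbehind. This is a different technique from the separator-based example above, shown only as a sketch: it assumes every address ends with the same known domain, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LookbehindSplit {
    static List<String> splitAfterDomain(String emails, String domain) {
        // (?<=X) is a zero-width lookbehind: it matches the empty position right after X,
        // so split() cuts there without consuming any characters.
        // Pattern.quote makes the domain match literally ('.' is not a wildcard).
        Pattern boundary = Pattern.compile("(?<=" + Pattern.quote(domain) + ")");
        return Arrays.asList(boundary.split(emails));
    }

    public static void main(String[] args) {
        List<String> emails =
                splitAfterDomain("Youremail@gmail.comAnotheremail@gmail.com", "@gmail.com");
        for (String email : emails) {
            System.out.println(email);
        }
        // prints:
        // Youremail@gmail.com
        // Anotheremail@gmail.com
    }
}
```

A regex-based split may well be slower than a plain indexOf loop on millions of addresses, so it is worth timing both on real data before choosing.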
    // Requires: java.util.Arrays, java.util.List, java.util.regex.Pattern
    public static List<String> parseEmails(String emails, String domain) {
        // split() takes a regex; Pattern.quote makes the '.' in the domain match literally
        List<String> list = Arrays.asList(emails.split(Pattern.quote(domain)));
        // split() removed the domain from every piece, so append it back
        for (int i = 0; i < list.size(); i++) {
            list.set(i, list.get(i) + domain);
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> mailList = parseEmails("Youremail@gmail.comAnotheremail@gmail.com", "@gmail.com");
        for (String mail : mailList) {
            System.out.println(mail);
        }
    }
  • It is customary to accompany answers with an explanation, not just code. - AivanF.
  • Try to write more detailed answers. I am sure the author would be grateful for your expert commentary on the subject matter. - Nicolas Chabanovsky