I have a CSV file. Lines of the form 1234,123456 are correct, but the file also contains malformed lines such as 123,123456 or 1234,1234, and occasionally lines such as 01OD,123456 or 012№6,123456.

The task is to take the correctly formatted lines from this CSV and write them to a new CSV that contains only valid content. The first two cases are easy to handle: a condition such as if (str.IndexOf(",") == 4 && str.Length == 11) is enough. The last two cases, however, pass that check. How do I reject lines that contain anything other than digits and ","?

  • Is this a one-time task? If so, you could use Excel formulas to filter out all the malformed lines and copy what you need. - iluxa1810
  • Judging by your question, the file is 1.5 GB with 120 million lines? Regular expressions or string.Split + int.Parse will create a lot of garbage. If you need high performance, you have to parse the byte stream manually. - Alexander Petrov
  • @AlexanderPetrov Yes, that is the question. Can you tell me how to do this? - Andrey Sherman
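One way the manual byte-stream parsing suggested in the comments might look is sketched below. It is only a sketch under stated assumptions: the input is ASCII/UTF-8 without a BOM, lines end in "\n" or "\r\n", and the names ByteCsvFilter, Filter and Matches are hypothetical. It walks the stream byte by byte with a small state machine, so no string is allocated per line.

```csharp
using System;
using System.IO;

static class ByteCsvFilter
{
    const int LineLength = 11; // "dddd,dddddd"

    // Expected byte at a given position: ',' at index 4, an ASCII digit elsewhere.
    static bool Matches(byte b, int pos) =>
        pos == 4 ? b == (byte)',' : (b >= (byte)'0' && b <= (byte)'9');

    // Copies only well-formed lines from input to output and returns how many were kept.
    // Kept lines are always written with a "\n" terminator (an assumption of this sketch).
    public static long Filter(Stream input, Stream output)
    {
        byte[] buffer = new byte[1 << 16];
        byte[] line = new byte[LineLength + 1];
        line[LineLength] = (byte)'\n';

        int pos = 0;        // bytes of the current line accepted so far
        bool ok = true;     // false once the current line is known to be malformed
        bool sawCr = false; // a '\r' was seen; only '\n' may follow it
        long kept = 0;

        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                byte b = buffer[i];
                if (b == (byte)'\n')
                {
                    if (ok && pos == LineLength) { output.Write(line, 0, line.Length); kept++; }
                    pos = 0; ok = true; sawCr = false;
                }
                else if (!ok)
                {
                    // Rest of a rejected line: skip until the next '\n'.
                }
                else if (b == (byte)'\r' && !sawCr)
                {
                    sawCr = true;
                }
                else if (sawCr || pos >= LineLength || !Matches(b, pos))
                {
                    ok = false;
                }
                else
                {
                    line[pos++] = b;
                }
            }
        }

        // Last line without a trailing newline.
        if (ok && pos == LineLength) { output.Write(line, 0, line.Length); kept++; }
        return kept;
    }
}
```

It could be called with File.OpenRead for the source and File.Create for the destination. Multi-byte characters such as "№" are rejected automatically, because none of their UTF-8 bytes fall in the ASCII digit range.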

2 answers

Slow but short way

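    // Requires: using System.Text.RegularExpressions;
    // ^ and $ anchor the pattern; without them IsMatch also accepts
    // longer lines such as "12345,123456" that merely contain a match.
    Regex rx = new Regex(@"^\d{4},\d{6}$", RegexOptions.Compiled);

    bool IsStringValidRegex(string str)
    {
        return rx.IsMatch(str);
    }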

Fast but long

    bool IsStringValid(string str)
    {
        int index = 0;
        int digitsBefore = 0;
        int digitsAfter = 0;

        // Count the digits before the comma.
        while (str.Length > index && char.IsDigit(str[index]))
        {
            digitsBefore++;
            index++;
        }

        // The next character must be the comma.
        if (str.Length <= index || str[index] != ',')
            return false;
        index++;

        // Count the digits after the comma.
        while (str.Length > index && char.IsDigit(str[index]))
        {
            digitsAfter++;
            index++;
        }

        // Valid only if the counts match and nothing is left over.
        return digitsBefore == 4 && digitsAfter == 6 && index == str.Length;
    }

How to use

    Console.WriteLine(IsStringValid("1234,123456"));
    Console.WriteLine(IsStringValid("123,123456"));
    Console.WriteLine(IsStringValid("1234,1234"));
    Console.WriteLine(IsStringValid("01ОД,123456"));
    Console.WriteLine(IsStringValid("012№6,123456"));
    Console.WriteLine("---------------------------");
    Console.WriteLine(IsStringValidRegex("1234,123456"));
    Console.WriteLine(IsStringValidRegex("123,123456"));
    Console.WriteLine(IsStringValidRegex("1234,1234"));
    Console.WriteLine(IsStringValidRegex("01ОД,123456"));
    Console.WriteLine(IsStringValidRegex("012№6,123456"));

Output:

    True
    False
    False
    False
    False
    ---------------------------
    True
    False
    False
    False
    False
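To produce the new CSV that the question asks for, a validator like the ones above can be plugged into a streaming loop. The following is a minimal sketch, not a definitive implementation: the names CsvFilter, IsLineValid and FilterFile are hypothetical, and the check is restricted to ASCII digits (char.IsDigit and \d also accept other Unicode digits), which is an assumption about what the file should contain.

```csharp
using System;
using System.IO;

static class CsvFilter
{
    // Accepts exactly four ASCII digits, a comma, and six ASCII digits.
    public static bool IsLineValid(string line)
    {
        if (line.Length != 11 || line[4] != ',') return false;
        for (int i = 0; i < line.Length; i++)
        {
            if (i == 4) continue; // the comma, already checked
            if (line[i] < '0' || line[i] > '9') return false;
        }
        return true;
    }

    // Streams inputPath line by line and writes only the valid lines to outputPath.
    // Returns the number of lines kept.
    public static long FilterFile(string inputPath, string outputPath)
    {
        long kept = 0;
        using (var writer = new StreamWriter(outputPath))
        {
            // File.ReadLines is lazy, so the whole file is never held in memory.
            foreach (string line in File.ReadLines(inputPath))
            {
                if (IsLineValid(line))
                {
                    writer.WriteLine(line);
                    kept++;
                }
            }
        }
        return kept;
    }
}
```

A call such as CsvFilter.FilterFile("input.csv", "output.csv") (both paths are placeholders) would then write the cleaned file. This version still allocates one string per line, so for a very large file a byte-level parser may be faster.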

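For example, use the regular expression \d{4},\d{6} with Regex.IsMatch. Anchor it as ^\d{4},\d{6}$ so that lines which merely contain a match, such as 12345,123456, are rejected.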