There is a line "Count ComponentName RefDes PatternName Value ТУ Дата создания Дата изменения Производитель" (the last four fields are Russian for "TU", "Date of creation", "Date of change" and "Manufacturer").

I need to split it into an array of strings so that each field becomes a separate element, something like:

1. Count 2. ComponentName ... 7. Дата создания 8. Дата изменения 9. Производитель

If it were not for items 7 and 8, the question would not have arisen: one could simply call .Split(' '). But Split also breaks items 7 and 8 apart, which I do not want.

This can probably be done with regular expressions, but I have not managed it so far. Could you advise?

  • So the original line cannot be changed? After all, in principle ComponentName could also have been written with a space. - alexoander
  • ComponentName, RefDes and PatternName never contain spaces. Items 7 and 8, however, always contain a space. Nothing can be done about the line itself; it is produced by another program. In other words, I can only work with the line as it arrives. - Darron
  • One option is to check the items for uppercase letters. Such a check is fairly easy to write with LINQ: test each letter with "isUpper" and build the blocks from that. It can also be done the other way round: after Split, check each block for a leading capital letter; if there is none, delete the current block and append its contents to block n-1. - alexoander
  • And what exactly is the splitting rule? Why are "Date" and "of change" kept together, while "TU" and "Date" are separate? State the rule; without it the question makes no sense. - VladD
  • Well, if the input line is fixed, just use a ready-made split and don't overthink it. - VladD

2 answers

The description does not make the splitting rule entirely clear.

Suppose you need to split before each word that starts with a capital letter. Then a regular expression does the job in very little code.

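    using System.Text.RegularExpressions;

    string input = "Count ComponentName RefDes PatternName Value ТУ Дата создания Дата изменения Производитель";
    // Split on a space that is followed by an uppercase letter.
    // The lookahead (?=...) does not consume the letter, so it stays in the next element.
    string pattern = @" (?=\p{Lu})";
    var result = Regex.Split(input, pattern);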

Lu is the Unicode category of uppercase letters, so the pattern matches Cyrillic capitals as well as Latin ones.
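As a rough illustration, the same idea can be wrapped in a small helper. This is only a sketch: the class and method names are made up for the example, and it uses `\s+` instead of a single space on the assumption that the incoming line might contain repeated spaces or tabs.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderSplitter
{
    // Hypothetical helper: splits before every word that starts with an uppercase letter.
    public static string[] SplitOnCapitalizedWords(string input)
    {
        // \s+ consumes the separating whitespace (assumption: it may be more than one space).
        // The lookahead (?=\p{Lu}) only checks that an uppercase letter follows,
        // so that letter remains the first character of the next item.
        return Regex.Split(input, @"\s+(?=\p{Lu})");
    }
}
```

Calling `HeaderSplitter.SplitOnCapitalizedWords(input)` on the line from the question should give nine items, with "Дата создания" and "Дата изменения" each kept whole.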


If instead the split is defined strictly by the positions (indices) of the elements, simple if clumsy code will do:

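    // Needs "using System.Linq;" for ToList(); input is the same string as above.
    var result = input.Split().ToList();  // Split() with no arguments splits on whitespace
    result[6] += " " + result[7];         // "Дата" + "создания"
    result[8] += " " + result[9];         // "Дата" + "изменения"
    result.RemoveAt(9);                   // remove the higher index first
    result.RemoveAt(7);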

After splitting the string on whitespace, append the second word of each pair to the first, then delete the leftover words, starting from the higher index so that the lower one stays valid.

    // Needs: using System; using System.Collections.Generic;
    public static string[] doSome(long n) // n is unused
    {
        var string1 = "Count ComponentName RefDes PatternName Value ТУ Дата создания Дата изменения Производитель";
        // A plain split on spaces, the brute-force starting point.
        var res = string1.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var result = new List<string>();
        var tmpStr = "";

        // Check every word for a leading capital letter, walking from the end.
        for (int i = res.Length - 1; i >= 0; i--)
        {
            var substring = res[i];
            if (substring[0].ToString() != substring[0].ToString().ToUpper())
            {
                // Lowercase start: the word continues a name. Prepend it, because we
                // walk backwards, so names of 3+ words keep their word order.
                tmpStr = tmpStr == String.Empty ? substring : substring + " " + tmpStr;
            }
            else
            {
                // Capital letter: attach whatever was collected earlier to this word.
                result.Add(tmpStr == String.Empty ? substring : substring + " " + tmpStr);
                tmpStr = "";
            }
        }

        result.Reverse(); // We walked the array from the end, so restore the order.
        Console.WriteLine("-------------------------");
        foreach (var sub in result)
            Console.WriteLine(sub);
        return result.ToArray();
    }

    I sketched a quick solution here; it is not optimized in any way =). It is just to give the idea. Unfortunately I am not strong with LINQ, so that version needs more thought.
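For what it is worth, a LINQ version of the same "attach to the previous block" idea might look roughly like the sketch below. The class and method names are invented for the example, and it assumes that any word not starting with an uppercase letter (digits included) belongs to the previous item.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqHeaderSplitter
{
    // Hypothetical helper: merges every word that does not start with an
    // uppercase letter into the item before it.
    public static List<string> MergeLowercaseWords(string input)
    {
        return input
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Aggregate(new List<string>(), (items, word) =>
            {
                if (items.Count > 0 && !char.IsUpper(word[0]))
                    items[items.Count - 1] += " " + word; // continue the previous item
                else
                    items.Add(word);                      // start a new item
                return items;
            });
    }
}
```

Walking forward with `Aggregate` avoids the final reversal, and names of three or more words come out in their original order without extra handling.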