Good day, everyone! Here is the task. There is a markup syntax for defining tables. It is very simple: it uses two characters (# delimits header cells and | delimits ordinary cells) plus the line feed character, which marks the end of a table row. Here is an example:

    # header1 # header2 # header 3 #
    | cell1 row1| cell2 row1| cell3 row1|
    | cell1 row2| cell2 row2| cell3 row2|

This simple construct must be converted into the following HTML:

    <table>
     <tr>
      <th>header1</th>
      <th>header2</th>
      <th>header 3</th>
     </tr>
     <tr>
      <td>cell1 row1</td>
      <td>cell2 row1</td>
      <td>cell3 row1</td>
     </tr>
     <tr>
      <td>cell1 row2</td>
      <td>cell2 row2</td>
      <td>cell3 row2</td>
     </tr>
    </table>

I'm not good at this kind of thing, so I'd like to ask knowledgeable people how this can be done and where to even start. Unfortunately, third-party libraries are not allowed (those are the conditions of the task). What algorithms and tools could be applied here? I should also mention that texts with this markup can be quite large, so performance matters too. Thanks in advance!

  • What if a cell value contains the | character? - Grundy
  • @Grundy That doesn't matter for now. Let's assume such cells never occur, or that we have an algorithm for escaping these characters. - Pupkin

2 answers

Alternatively, you can do the conversion as a regular-expression replacement.

C# supports named groups.

The regular expression might be:

    (?<startline>^)?((?<border>\|)?(?<header>#)?(?<cellvalue>[^#|]+))?(?<endline>[#|\r\n]+$)?

As the match handler (a MatchEvaluator), you can use the following function, where table is the input string:

    m =>
    {
        var replaced = new StringBuilder();
        if (m.Index == 0) // at the very start of the input: add the <table> tag
            replaced.AppendLine("<table>");
        if (m.Groups["startline"].Success) // a new line starts: add a <tr> tag
            replaced.AppendLine("<tr>");
        if (m.Groups["border"].Success) // found a cell border: insert the value wrapped in <td> tags
            replaced.AppendLine($"<td>{m.Groups["cellvalue"].Value.Trim()}</td>");
        else if (m.Groups["header"].Success) // found a header cell border: insert the value wrapped in <th> tags
            replaced.AppendLine($"<th>{m.Groups["cellvalue"].Value.Trim()}</th>");
        if (m.Groups["endline"].Success) // reached the end of a line: close the <tr> tag
            replaced.AppendLine("</tr>");
        if (m.Index == table.Length) // reached the very end: close the <table> tag
            replaced.AppendLine("</table>");
        return replaced.ToString(); // return the result
    }

When run with the RegexOptions.Multiline flag, it produces the following result:

    <table>
    <tr>
    <th>header1</th>
    <th>header2</th>
    <th>header 3</th>
    </tr>
    <tr>
    <td>cell1 row1</td>
    <td>cell2 row1</td>
    <td>cell3 row1</td>
    </tr>
    <tr>
    <td>cell1 row2</td>
    <td>cell2 row2</td>
    <td>cell3 row2</td>
    </tr>
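To try the pattern and the handler end to end, here is a minimal self-contained sketch that wires them into Regex.Replace. The class name RegexTableConverter and the sample input are illustrative only; the pattern and the handler logic are the ones from this answer. Since the handler compares m.Index with the input length, the input has to be visible inside the lambda (here it is the parameter table). Because every part of the pattern is optional, it can also match the empty string at the very end of the input, which is what should trigger the closing </table> branch.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class RegexTableConverter
{
    // The pattern from the answer, unchanged.
    const string Pattern =
        @"(?<startline>^)?((?<border>\|)?(?<header>#)?(?<cellvalue>[^#|]+))?(?<endline>[#|\r\n]+$)?";

    public static string Convert(string table)
    {
        // Multiline makes ^ and $ match at line boundaries, not only at the ends of the input.
        return Regex.Replace(table, Pattern, m =>
        {
            var replaced = new StringBuilder();
            if (m.Index == 0)                          // very start of the input
                replaced.AppendLine("<table>");
            if (m.Groups["startline"].Success)         // a new row starts
                replaced.AppendLine("<tr>");
            if (m.Groups["border"].Success)            // ordinary cell
                replaced.AppendLine($"<td>{m.Groups["cellvalue"].Value.Trim()}</td>");
            else if (m.Groups["header"].Success)       // header cell
                replaced.AppendLine($"<th>{m.Groups["cellvalue"].Value.Trim()}</th>");
            if (m.Groups["endline"].Success)           // the row ends
                replaced.AppendLine("</tr>");
            if (m.Index == table.Length)               // empty match at the very end
                replaced.AppendLine("</table>");
            return replaced.ToString();
        }, RegexOptions.Multiline);
    }
}
```

Calling RegexTableConverter.Convert on the three-line example from the question should yield the table shown above, one tag per line.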

I suspect something like this will be the fastest:

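    using System.Text;

    string Convert(string str)
    {
        var sb = new StringBuilder();
        sb.Append("<table>\n <tr>\n");
        var header = false; // is the current cell a header (<th>) cell?
        // l = index where the current cell's text starts, or -1 at the start of a line
        for (int l = -1, i = 0; i < str.Length; i++)
        {
            switch (str[i])
            {
                case '#':
                case '|':
                    // a delimiter closes the previous cell on this line, if there is one
                    if (l >= 0)
                    {
                        sb.Append(header ? " <th>" : " <td>");
                        sb.Append(str.Substring(l, i - l).Trim());
                        sb.Append(header ? "</th>\n" : "</td>\n");
                    }
                    l = i + 1;
                    header = str[i] == '#';
                    break;
                case '\n':
                    // end of a table row: close it and open the next one
                    l = -1;
                    sb.Append(" </tr>\n <tr>\n");
                    break;
            }
        }
        sb.Append(" </tr>\n</table>"); // close the last row and the table
        return sb.ToString();
    }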

If you need more speed, you can try pre-computing the StringBuilder capacity (say, from the average cell length) and getting rid of Trim() (not sure it will help much, but you never know).
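Both suggestions can be sketched as follows, assuming the same output format as the loop above. The capacity str.Length * 2 + 32 is an arbitrary guess, not a measured value, and FastTableConverter/AppendCell are hypothetical names. Instead of Substring(...).Trim(), the cell boundaries are narrowed by index and the text is copied with StringBuilder.Append(string, int, int), which avoids allocating a temporary string per cell.

```csharp
using System;
using System.Text;

public static class FastTableConverter
{
    public static string Convert(string str)
    {
        // Pre-sized buffer; the factor is a rough guess at the markup overhead.
        var sb = new StringBuilder(str.Length * 2 + 32);
        sb.Append("<table>\n <tr>\n");
        bool header = false;
        int start = -1; // index just after the last delimiter, or -1 at the start of a line
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            if (c == '#' || c == '|')
            {
                if (start >= 0)
                    AppendCell(sb, str, start, i, header);
                start = i + 1;
                header = c == '#';
            }
            else if (c == '\n')
            {
                start = -1;
                sb.Append(" </tr>\n <tr>\n");
            }
        }
        sb.Append(" </tr>\n</table>");
        return sb.ToString();
    }

    // Appends str[from..to) without leading/trailing whitespace, wrapped in <th> or <td>.
    static void AppendCell(StringBuilder sb, string str, int from, int to, bool header)
    {
        while (from < to && char.IsWhiteSpace(str[from])) from++;
        while (to > from && char.IsWhiteSpace(str[to - 1])) to--;
        sb.Append(header ? " <th>" : " <td>");
        sb.Append(str, from, to - from); // copies the substring without allocating it
        sb.Append(header ? "</th>\n" : "</td>\n");
    }
}
```

Whether either change is measurable depends on the input; it is worth benchmarking against the simpler version before keeping it.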

By the way, if you ever need to escape "|" and "#", you can add the following case (and then replace .Trim() with some ConvertValue() that trims the excess and unescapes the value):

    case '\\': i++; break; // skip the character after the backslash
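Putting the escape case together with a ConvertValue, one possible sketch looks like this. It assumes the backslash is the escape character (as the case '\\' line implies) and that ConvertValue, which the answer leaves unspecified, only trims the cell and drops the escaping backslashes; EscapingTableConverter is a hypothetical name. If the values should also be HTML-encoded, System.Net.WebUtility.HtmlEncode could be applied inside ConvertValue as well.

```csharp
using System;
using System.Text;

public static class EscapingTableConverter
{
    public static string Convert(string str)
    {
        var sb = new StringBuilder();
        sb.Append("<table>\n <tr>\n");
        var header = false;
        for (int l = -1, i = 0; i < str.Length; i++)
        {
            switch (str[i])
            {
                case '\\':
                    i++; // skip the escaped character so it cannot end a cell
                    break;
                case '#':
                case '|':
                    if (l >= 0)
                    {
                        sb.Append(header ? " <th>" : " <td>");
                        sb.Append(ConvertValue(str.Substring(l, i - l)));
                        sb.Append(header ? "</th>\n" : "</td>\n");
                    }
                    l = i + 1;
                    header = str[i] == '#';
                    break;
                case '\n':
                    l = -1;
                    sb.Append(" </tr>\n <tr>\n");
                    break;
            }
        }
        sb.Append(" </tr>\n</table>");
        return sb.ToString();
    }

    // Hypothetical ConvertValue: trims the raw cell text and removes the backslash
    // in front of each escaped character (\| becomes |, \# becomes #, \\ becomes \).
    static string ConvertValue(string raw)
    {
        var trimmed = raw.Trim();
        var sb = new StringBuilder(trimmed.Length);
        for (int i = 0; i < trimmed.Length; i++)
        {
            if (trimmed[i] == '\\' && i + 1 < trimmed.Length)
                i++; // keep only the escaped character
            sb.Append(trimmed[i]);
        }
        return sb.ToString();
    }
}
```

Because the scanner skips the character after a backslash, \| and \# never close a cell, and ConvertValue then strips the backslash from the cell text.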