String conversion by regular expressions

Question

I have a char * string:

"message=sometext <0><0:34><0><0:31><0><1:24> sometext"

How can I convert it into this char * string:

 "message=sometext (0-0:34)(0-0:31)(0-1:24) sometext"

Please describe the input and output formats so that we don't have to guess.
Describe what the input string may look like, what data it contains, and how that data should be placed in the output string.
If the text is arbitrary, may it contain the character <?
 <0> <0:34>  <0> <0:31>  <0> <1:24> replace this part with (0-0: 34) (0-0: 31) (0-1 : 24) without affecting anything else.
Specify what should happen with "<0 xaxa> pppt <2:44>", with "<0:34> <0> <1> <2:22> <3:33> <0> <1:24>" or with "<<0><><4:77<xaxa>>". If you can, describe the grammar of the allowed expressions (preferably in BNF or as Wirth diagrams); if that does not work out, give more examples of "correct" and "incorrect" expressions.

Accepted Answer · 2013-01-15T13:44:52

@Kenpachi, I refreshed my memory a bit and experimented.

If this is what you need:

    avp@avp-xub11:~/hashcode$ g++ regex1.c
    avp@avp-xub11:~/hashcode$ ./a.out
    comp = 0 re_nsub = 1 errc = 8 [Success]
    >
    <1><0:9>
    result = [(1-0:9)]
    >
    <1> yye <2><3><4:88> <5><5:6> <7> <8:99>
    result = [<1> yye <2>(3-4:88) (5-5:6) <7> <8:99>]
    >
    qrwr
    result = [qrwr]
    >
    message=sometext <0><0:34><0><0:31><0><1:24> sometext
    result = [message=sometext (0-0:34)(0-0:31)(0-1:24) sometext]
    >
    avp@avp-xub11:~/hashcode$

Here is the code. Admittedly, the replacement itself is done sloppily:

 <digit1><digit2:severalDigits> is replaced with (digit1-digit2:severalDigits)

and only for the hard-coded regexp. I think the principle is clear (the result is a copy of the string). In general, see man 7 regex and man 3 regexec.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    #include <regex.h>

    int main (int ac, char *av[])
    {
      regex_t reg;
      char str[1000], buf[1000];
      int comp = regcomp(&reg, av[1] ? av[1] : "(<[0-9]><[0-9]:[0-9]+>)", REG_EXTENDED);
      int l = regerror(comp, &reg, buf, sizeof(buf));

      if (comp != 0) {                 // the regexp did not compile
        printf("comp = %d errc = %d [%s]\n", comp, l, buf);
        return 1;
      }
      printf("comp = %d re_nsub = %zu errc = %d [%s]\n", comp, reg.re_nsub, l, buf);
      // match[0] is the whole match, match[1..re_nsub] are the subexpressions
      regmatch_t *match = (regmatch_t *)malloc(sizeof(*match) * (reg.re_nsub + 1));

      while (puts("> "), fgets(str, sizeof(str), stdin)) {
        str[strcspn(str, "\n")] = 0;   // strip the trailing newline, if any
    #if DEBUG
        int rc = regexec(&reg, str, reg.re_nsub + 1, match, 0);
        l = regerror(rc, &reg, buf, sizeof(buf));
        printf("rc = %d errc = %d [%s]\n", rc, l, buf);
        size_t i;
        for (i = 0; rc == 0 && i <= reg.re_nsub; i++) {
          printf("match[%zu] rm_so=%d rm_eo=%d\n", i, (int)match[i].rm_so, (int)match[i].rm_eo);
          if (match[i].rm_so >= 0) {
            l = match[i].rm_eo - match[i].rm_so;
            memcpy(buf, str + match[i].rm_so, l);
            buf[l] = 0;
            printf("'%s'\n", buf);
          }
        }
    #else
        char *p = str, *q = buf;
        *q = 0;
        // the offsets below assume the default regexp "<d><d:digits>"
        while (regexec(&reg, p, reg.re_nsub + 1, match, 0) == 0) {
          int so = match[0].rm_so, eo = match[0].rm_eo;
          strncat(q, p, so);           // copy the text before the match
          q += so;
          *q++ = '(';                  // replace the first '<' with '('
          *q++ = p[so + 1];            // copy the first digit
          *q++ = '-';
          *q = 0;                      // prepare to copy "digit:digits"
          strncat(q, p + so + 4, l = eo - so - 5);
          q += l;
          *q++ = ')';
          *q = 0;
          p += eo;
    #if DEBUG2
          printf("rest = [%s]\n", p);
          printf("ebuf = [%s]\n", buf);
    #endif
        }
        strcat(q, p);                  // copy the rest of the line
        printf("result = [%s]\n", buf);
    #endif
      }
      free(match);
      regfree(&reg);
      return 0;
    }

UPDATE

Here is the promised code in C (C++) (written on Ubuntu). It makes a copy of a string in which the parts matched by a regular expression are replaced.

    // regrepl.h  avp 2013  C (C++): copy of a string with replacements found by regexec()
    #ifndef _REGREPL_H_
    #define _REGREPL_H_

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    #include <regex.h>

    /*
     * regreplace (regex_t *reg, const char *str, void *replexp[], unsigned count)
     * returns a copy of the string str in which the parts matched by the
     * regular expression reg via regexec() (see man 3 regexec, man 7 regex)
     * are replaced according to the replacement expression replexp.
     * At most count consecutive replacements are made.
     *
     * The replacement expression replexp is stored in memory as an array
     * of void *. Element 0 holds the total number of elements, including
     * element 0 itself. Each of the other elements holds either an index
     * into pmatch[] from regexec() (the number of a parenthesized
     * subexpression in reg) or the address of a string (possibly empty)
     * to insert between the pmatch[] elements.
     *
     * Examples:
     *  void *repa[2]; repa[0] = (void *)2; repa[1] = (void *)"";
     *   deletes the occurrences of reg in str
     *  void *repa[2]; repa[0] = (void *)2; repa[1] = 0;
     *   copies str unchanged (each match is replaced by itself, pmatch[0])
     *  void *repa[2]; repa[0] = (void *)2; repa[1] = (void *)"...";
     *   replaces the occurrences of reg in str with ...
     *  void *repa[6]; repa[0] = (void *)6;
     *  repa[1] = (void *)"("; repa[2] = 0; repa[3] = (void *)":";
     *  repa[4] = 0; repa[5] = (void *)")";
     *   doubles each occurrence of reg, separates the copies with a colon
     *   and encloses the result in parentheses
     *  void *repa[6]; repa[0] = (void *)6;
     *  repa[1] = (void *)"("; repa[2] = (void *)1;
     *  repa[3] = (void *)"-"; repa[4] = (void *)2;
     *  repa[5] = (void *)")";
     *   for reg "<([0-9])><([0-9]:[0-9]+)>" turns the string
     *   "abc <1><2:345> zyx" into "abc (1-2:345) zyx"
     *
     * To simplify building replacement expressions, use the macro
     * INIT_REPLACE or the function make_replexp(). They take a string with
     * the replacement expression (replexp) and build the void *[] array.
     * The strings in it are created with strdup(), so their memory can be
     * released with free_replexp().
     *
     * The INIT_REPLACE macro creates a local variable holding the replacement
     * expression used by regreplace(). The expression must agree with the
     * parenthesized structure of the regexp passed to regexec().
     * Indices into the match array of regexec() are written in parentheses
     * in the replacement expression; a literal opening parenthesis must be
     * escaped with `\`.
     *
     * Example:
     *  regexp for regexec "<([0-9])><([0-9]:[0-9]+)>"
     *  and replexp "\\((1)-(2))" turn the string (in the copy)
     *  "=<1><2:22>==<3><4:4>" into "=(1-2:22)==(3-4:4)"
     *
     * make_replexp() creates the void *[] array in dynamic memory (malloc)
     * and fills it the same way as INIT_REPLACE.
     */

    // "boundary" between string constant addresses and regexec() match[] indices
    #define PATLIM 4096

    // creates a local variable (name) holding replexp
    // and a local variable (errname) with the number of replexp parse errors
    #define INIT_REPLACE(name,replexp,errname) \
      void *name[_repasize((replexp))]; \
      int errname = _init_repa(name,(replexp))

    // array size needed for replexp
    static inline int _repasize (const char *replexp)
    {
      int i, n = 0, esc = 0;
      for (i = 0; replexp[i]; i++) {
        switch (replexp[i]) {
        case '\\':
          esc ^= 1;
          break;
        case '(':
          if (!esc)
            n++;
          /* fall through */
        default:
          esc = 0;
        }
      }
      return n*2+2; // each group may be preceded by text; + tail + size in a[0]
    }

    // fills the replexp array, returns the number of replexp parse errors
    static inline int _init_repa (void *arr[], const char *replexp)
    {
      // values of the escapes \a ... \z ('\033' is ESC for \e)
      static const char *estr = "\a\bcd\033\fghijklm\nopq\rs\tu\vwxyz";
      int i, j = 0, n = 1, esc = 0, err = 0, c;
      char tmp[strlen(replexp)+1], *ep;

      tmp[0] = 0;
      for (i = 0; replexp[i]; i++) {
        switch (replexp[i]) {
        case '\\':
          esc ^= 1;
          if (!esc)
            tmp[j++] = '\\';
          break;
        case '(':
          if (!esc) {
            if (j) {                  // flush the text collected so far
              tmp[j] = 0;
              arr[n++] = strdup(tmp);
            }
            long x = strtol(replexp+i+1, &ep, 10);
            if (ep == replexp+i+1 || *ep != ')' || x > PATLIM-1 || x < 0) {
              // error: no number, no ')' or number out of range
              err++;
              x = PATLIM;
            }
            arr[n++] = (void *)x;
            // continue after ')' or, on error, at the char that stopped strtol
            i = (int)(ep-replexp) - (*ep != ')');
            tmp[j = 0] = 0;
          } else {
            tmp[j++] = '(';
          }
          esc = 0;
          break;
        default:
          c = (unsigned char)replexp[i];
          if (esc) {
            int c1 = tolower(c);
            if ('a' <= c1 && c1 <= 'z')
              c = estr[c1-'a'];
          }
          esc = 0;
          tmp[j++] = c;
        }
      }
      tmp[j] = 0;
      arr[n++] = strdup(tmp);
      arr[0] = (void *)(long)n;
      return err;
    }

    // creates a dynamic array and fills it
    static void ** make_replexp (const char *rep, int *err)
    {
      int sz = _repasize(rep);
      void **repa = (void **)malloc(sizeof(void *)*sz);
      *err = _init_repa(repa,rep);
      return repa;
    }

    // frees the strings of replexp (the array itself is not freed)
    static void free_replexp (void *repa[])
    {
      int sz, i;
      sz = (long)repa[0];
      for (i = 1; i < sz; i++) {
        long x = (long)repa[i];
        if (x > PATLIM || x < 0)
          free((void *)x);
      }
    }

    // debug print of replexp
    static void prirepa (void *repa[])
    {
      int sz, i;
      sz = (long)repa[0];
      for (i = 1; i < sz; i++) {
        long x = (long)repa[i];
        if (x > PATLIM || x < 0)
          puts((char *)x);
        else
          printf ("%ld %sin %d\n",x,x==PATLIM? "err ":"", i);
      }
      puts("");
    }

    // copies n bytes from address src into the dynamic string dst at position lvar;
    // szvar is the variable with the current size of dst (GCC statement expression)
    #define dynstradd(dst,szvar,lvar,src,n) \
      ({ char *_src = (char *)(src); int _n = (n), _i; \
        for (_i = 0; _i < _n; _i++) { \
          if (lvar+1 >= szvar) /* keep room for the final 0 */ \
            dst = (char *)realloc(dst, szvar = (lvar+4096)); \
          dst[lvar++] = _src[_i]; \
        } \
        dst[lvar] = 0; dst+lvar; \
      })

    // copies bytes up to the terminating 0 from address src into the string dst
    // at position lvar; szvar is the variable with the current size of dst
    #define dynstraddz(dst,szvar,lvar,src) \
      ({ char *_src = (char *)(src); \
        do { \
          if (lvar >= szvar) \
            dst = (char *)realloc(dst, szvar = (lvar+4096)); \
        } while ((dst[lvar++] = *_src++)); \
        lvar--; \
        dst+lvar; \
      })

    // makes a copy of str in which up to count parts matched by reg
    // are replaced according to replexp
    static char * regreplace (regex_t *reg, const char *str,
                              void *replexp[], unsigned count)
    {
      regmatch_t match[reg->re_nsub+1];
      int rsz = 0, l = 0, nrepl = (long)replexp[0], eflags = 0;
      char *res = (char *)malloc(rsz = 1024);

      *res = 0;
      while (count-- && regexec(reg, str, reg->re_nsub+1, match, eflags) == 0) {
        int so = match[0].rm_so, i;
        dynstradd(res,rsz,l,str,so);   // copy the bytes before the match
        for (i = 1; i < nrepl; i++) {
          long x = (long)replexp[i];
          if (x < 0 || x > PATLIM)      // insert a string constant
            dynstraddz(res,rsz,l,x);
          else if ((size_t)x < reg->re_nsub+1 && match[x].rm_so >= 0)
            // copy the matched subexpression from str
            dynstradd(res,rsz,l, str+match[x].rm_so,
                      match[x].rm_eo-match[x].rm_so);
        }
        if (match[0].rm_eo == 0) {      // empty match: copy one char to make progress
          if (!*str)
            break;
          dynstradd(res,rsz,l,str,1);
          str++;
        } else
          str += match[0].rm_eo;
        eflags = REG_NOTBOL;            // str no longer starts at the beginning of the line
      }
      dynstraddz(res,rsz,l,str);        // copy the rest of str
      return (char *)realloc(res,l+1);
    }

    #endif

Include it in your program and call regreplace(). If you like (to save memory when it is used in several source files), you can move the function code into a separate file and keep only their prototypes in regrepl.h.

If anything is unclear, ask.

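I looked into all this a bit more and put together regrepl.h. Example of use:

    #include <stdio.h>
    #include <stdlib.h>
    #include "regrepl.h"

    int main ()
    {
      INIT_REPLACE(repa, "\\((1)-(2))", repa_rc);
      regex_t reg;
      char str[1000];

      if (repa_rc)
        printf("replexp errors: %d\n", repa_rc);
      regcomp(&reg, "<([0-9])><([0-9]:[0-9]+)>", REG_EXTENDED);
      while (puts(">"), fgets(str, sizeof(str), stdin)) {
        char *result = regreplace(&reg, str, repa, 100);
        printf("%s", result);          // str still ends with '\n'
        free(result);
      }
      free_replexp(repa);
      regfree(&reg);
      return 0;
    }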

ReinRaus · Answer 2 · 2013-01-15T03:18:02

In PHP:

 preg_replace("/.<(\d+)><(\d+):(\d+)>/", "($1-$2:$3)", $text);

If everything before the first angle bracket up to the space has to be deleted, rather than just one character, write [^ ]++ instead of the dot.

@ReinRaus, aren't you confusing PHP with C++? - avp
@ReinRaus: and what happens with <<0><><4:77<xaxa>>? - VladD
@avp regular expressions are universal, with some differences in functionality and syntax depending on the specific language. @VladD the expression is based on the data the OP provided; with more data there would be a different expression. As for your question: that text will remain unchanged. - ReinRaus
@ReinRaus, in C++ (C), regular expressions are handled with the functions regcomp() and regexec(), and there are no ready-made functions for replacing text. - avp
Thank you, I have this implemented in PHP, but I need it in C++, which I am not strong in. I'm working with regex.h, and it has its own special regex syntax. - Kenpachi
