How should I change my program so that it can handle a large amount of data? At the moment, with large volumes, I run out of memory and the program crashes.

The task is as follows:

There are files in a directory; each line in them contains a numeric ID followed by the text of the record. Across all files the IDs are unique, and they increase monotonically as new records are appended to a file. I need to open all the files, sort the lines by ascending ID, and write all the data into one file.

My program code:

#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MESSAGE_LENGTH 10000

struct list {
    unsigned long id;
    char data[MESSAGE_LENGTH];
};

int openFileAndAddToList(struct list items[], long numberOfLines);
long numberOfEntries(); // I will get rid of this function later
void writeToFile(struct list items[], long size);
void sorting(struct list items[], long l, long u);
long partArr(struct list items[], long l, long u);

int main(int argc, const char * argv[])
{
    long listSize;
    listSize = numberOfEntries(); // calculates size of list array
    struct list items[listSize];
    openFileAndAddToList(items, listSize);
    printf("Merging...\n");
    sorting(items, 0, listSize - 1);
    writeToFile(items, listSize);
    printf("Done!\n");
    printf("___________________________________________________________\n");
    printf("All data written in \"all.log\" file at program root folder.\n");
    printf("___________________________________________________________\n");
    return 0;
}

long numberOfEntries()
{
    DIR* FD;
    struct dirent* oneFile;
    FILE* sourceFile;
    char buffer[BUFSIZ];
    long numberOfLines = 0; // must start from zero before counting
    FD = opendir("./callLog");
    if (FD != NULL) {
        while ((oneFile = readdir(FD))) {
            if (!strcmp(oneFile->d_name, ".")) continue;
            if (!strcmp(oneFile->d_name, "..")) continue;
            if (!strcmp(oneFile->d_name, ".DS_Store")) continue;
            char fullFileName[255];
            strcpy(fullFileName, "./callLog/");
            strcat(fullFileName, oneFile->d_name);
            sourceFile = fopen(fullFileName, "r");
            if (sourceFile == NULL) {
                fprintf(stderr, "Error : Can`t open source file - %s\n", strerror(errno));
                return 1;
            }
            while (fgets(buffer, BUFSIZ, sourceFile) != NULL) {
                numberOfLines++;
            }
            fclose(sourceFile); // close each file as soon as it is counted
        }
        closedir(FD);
    }
    return numberOfLines;
}

int openFileAndAddToList(struct list items[], long numberOfLines)
{
    DIR* FD;
    struct dirent* oneFile;
    FILE *sourceFile;
    char buffer[BUFSIZ];
    long k = 0; // index of the next free element in items
    FD = opendir("./callLog");
    if (FD != NULL) {
        while ((oneFile = readdir(FD))) {
            if (!strcmp(oneFile->d_name, ".")) continue;
            if (!strcmp(oneFile->d_name, "..")) continue;
            if (!strcmp(oneFile->d_name, ".DS_Store")) continue;
            char fullFileName[255];
            strcpy(fullFileName, "./callLog/");
            strcat(fullFileName, oneFile->d_name);
            printf("Opening file: %s \n", fullFileName);
            sourceFile = fopen(fullFileName, "r");
            if (sourceFile == NULL) {
                fprintf(stderr, "Error : Can`t open source file - %s\n", strerror(errno));
                return 1;
            }
            while (fgets(buffer, BUFSIZ, sourceFile) != NULL) {
                // the ID comes first, the text follows the separators
                sscanf(buffer, "%lu", &items[k].id);
                size_t numbers_end = strspn(buffer, "1234567890. \t");
                strcpy(items[k].data, buffer + numbers_end);
                k++;
            }
            fclose(sourceFile);
        }
        closedir(FD);
    }
    return 0;
}

void writeToFile(struct list items[], long size)
{
    FILE *sourceFile;
    sourceFile = fopen("all.log", "w");
    if (sourceFile == NULL) {
        fprintf(stderr, "Error : Failed to open destination file - %s\n", strerror(errno));
        return; // nothing can be written without an open file
    }
    for (long i = 0; i < size; i++) {
        fprintf(sourceFile, "%lu %s", items[i].id, items[i].data);
    }
    fclose(sourceFile);
}

For sorting I use the quicksort algorithm. Please point me in the right direction.

    1 answer

    The easiest way is to write all the data into one file (with the ID in the first field) and then sort it with the sort command (sort -n compares the leading number numerically).

    As for the program: first, you can save memory by changing the structure to

     struct list { unsigned long id; char *data; }; 

    i.e. the array being sorted stores pointers to the lines, and the lines themselves are placed in dynamic memory. Then the text of each line occupies not MESSAGE_LENGTH bytes but only as much as was actually read (plus a small allocation overhead).

    After that, in the openFileAndAddToList() function you also need to replace

     strcpy(items[k].data, buffer+numbers_end); 

    with

     items[k].data = strdup(buffer+numbers_end); 
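
    The change above can be sketched as follows. This is only an illustration, assuming POSIX strdup() is available; parse_line() and free_items() are hypothetical helper names that do not appear in the original program. Note that every string returned by strdup() has to be released with free() once the data has been written out.

```c
#define _POSIX_C_SOURCE 200809L   /* makes strdup() visible on strict compilers */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct list {
    unsigned long id;
    char *data;   /* heap copy of the text, sized to the actual line */
};

/* Hypothetical helper: parse one "ID text" line into *item.
   Returns 0 on success, -1 if the line has no ID or strdup() fails. */
int parse_line(const char *buffer, struct list *item)
{
    assert(buffer != NULL && item != NULL);
    if (sscanf(buffer, "%lu", &item->id) != 1)
        return -1;
    /* Skip the ID and the separators, as the original code does. */
    size_t numbers_end = strspn(buffer, "1234567890. \t");
    item->data = strdup(buffer + numbers_end);
    return item->data != NULL ? 0 : -1;
}

/* Hypothetical helper: release every string obtained from strdup(). */
void free_items(struct list *items, long count)
{
    for (long i = 0; i < count; i++)
        free(items[i].data);
}
```

    With this layout each array element takes only the size of an unsigned long plus a pointer, and the text costs its real length plus the terminating zero byte.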

    Second, you place your huge (?) array in main()

     struct list items[listSize]; 

    in the local memory of the function (i.e. on the stack), whose size is usually limited to a few megabytes (typically 2 or 8 MB; in general it depends on the system settings).
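
    To see how quickly the original structure exhausts the stack, the limit can be queried at run time. The sketch below assumes a POSIX system that provides getrlimit(); entries_that_fit() and print_stack_limit() are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

#define MESSAGE_LENGTH 10000

struct list {
    unsigned long id;
    char data[MESSAGE_LENGTH];
};

/* Hypothetical helper: how many entries of entry_size bytes fit into
   stack_bytes, ignoring everything else that lives on the stack. */
size_t entries_that_fit(size_t stack_bytes, size_t entry_size)
{
    assert(entry_size > 0);   /* guard against division by zero */
    return stack_bytes / entry_size;
}

/* Hypothetical helper: print the current soft stack limit.
   Returns 0 on success, -1 if getrlimit() fails. */
int print_stack_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        printf("stack limit: unlimited\n");
    } else {
        printf("stack limit: %llu bytes, room for at most %zu entries\n",
               (unsigned long long)rl.rlim_cur,
               entries_that_fit((size_t)rl.rlim_cur, sizeof(struct list)));
    }
    return 0;
}
```

    With MESSAGE_LENGTH equal to 10000 each element takes about 10 KB, so an 8 MB stack has room for fewer than 840 entries, even before counting anything else stored there.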

    The array can instead be placed in dynamic memory, which is usually much larger (nowadays typically gigabytes).

    I have not looked at the code carefully, but it seems to me that you can write

     struct list *items = malloc(sizeof(struct list) * listSize); 
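
    A slightly more defensive version of this allocation might look like the sketch below. It assumes the pointer-based struct list suggested earlier; alloc_items() is a hypothetical helper name, not something from the original program.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct list {
    unsigned long id;
    char *data;
};

/* Hypothetical helper: allocate room for listSize entries on the heap.
   Returns NULL when listSize is 0, when the byte count would overflow,
   or when malloc() fails. The caller releases the array with free(). */
struct list *alloc_items(long listSize)
{
    assert(listSize >= 0);   /* a negative count is a caller bug */
    size_t count = (size_t)listSize;
    if (count == 0 || count > SIZE_MAX / sizeof(struct list))
        return NULL;

    struct list *items = malloc(sizeof(struct list) * count);
    if (items == NULL)
        perror("malloc");
    return items;
}
```

    In main() the array declaration would then become struct list *items = alloc_items(listSize); followed by a check for NULL, and free(items) after writeToFile(). The existing functions that take struct list items[] accept the pointer unchanged, because an array parameter is already a pointer.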

    IMHO, these two improvements will solve the problem.

    • I see my mistakes. Yes, I declared the entire array on the stack... which is probably the main error in the program. - Yura Halych