Hello. I have files with numeric data; the number of lines differs from file to file.

Approximate structure:

2015 3 1 0 7 20 796.00 27 1
2015 3 1 0 7 20 796.00 27 1
2015 3 1 0 7 20 796.00 27 1

I need to read the data files and put the values into an array for calculations. The problem is that the total number of rows is not known in advance, so I implemented a nested ArrayList:

ArrayList<ArrayList<Double>> massivData = new ArrayList<ArrayList<Double>>(); 

This approach works well only for small amounts of data. When I need to read, for example, 100 files (each averaging 50,000 lines), Java simply exhausts the RAM. With a plain static array, the speed is quite acceptable.
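Roughly, my reading loop looks like this (a simplified sketch of the approach above; the method name readInto and the whitespace-separated format of the sample rows are just how I've set it up):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;

public class NaiveReader {
    // Reads one whitespace-separated numeric file into the nested list.
    // Every value ends up boxed as a Double object inside an inner ArrayList.
    static void readInto(Path file, ArrayList<ArrayList<Double>> massivData) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) continue;
                String[] parts = line.trim().split("\\s+");
                ArrayList<Double> row = new ArrayList<>(parts.length);
                for (String p : parts) {
                    row.add(Double.parseDouble(p)); // autoboxing: double -> Double
                }
                massivData.add(row);
            }
        }
    }
}
```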

I do not know the best way to combine the data from all the files into one array. One option I am considering: while reading the files, write the data from each file into a kind of intermediate merge file, and then build an array of the same shape from that merge file.

What do you advise?

  • Post the code where you read and add to the list. Are you also keeping the read data in a separate variable? If so, that's an extra expense. - Senior Pomidor
  • ArrayList has an internal capacity; once the list outgrows it, it reallocates, creating a new internal array and copying all the data over. By specifying the right capacity when creating the ArrayList, you will already save a lot of time and memory. - etki
  • I agree with the commenter above: it may be better to determine the length of the array first. There appears to be a good way to do that at stackoverflow.com/questions/453018/… - abbath0767

5 answers

The answer depends on what you intend to do with the resulting arrays, so specify which calculations you plan to run over the data. Perhaps you should not read all the lines from the files at all, but rather process them in portions. In the simplest case, for example, you can compute the arithmetic mean by summing the values from each row and incrementing a row counter (if there are very many values, you can keep intermediate partial sums in memory to avoid overflow). If you still need all the data in the form of an array, you can first run through the files and count the number of rows, and use that information to pre-allocate the array. A quick way to count lines has been described here.
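For instance, a streaming mean over one column could be sketched like this (meanOfColumn and the column index are illustrative names; the whitespace-separated format is assumed from the sample rows in the question):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class StreamingMean {
    // Computes the arithmetic mean of one column across many files
    // without ever holding more than a single line in memory.
    static double meanOfColumn(List<Path> files, int column) throws IOException {
        double sum = 0;
        long count = 0;
        for (Path file : files) {
            try (BufferedReader reader = Files.newBufferedReader(file)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.trim().isEmpty()) continue;
                    String[] parts = line.trim().split("\\s+");
                    sum += Double.parseDouble(parts[column]);
                    count++; // row counter, as described above
                }
            }
        }
        return sum / count;
    }
}
```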

  • Why prepare an array? Lists don't have limits that his task would come anywhere near. - Senior Pomidor
  • To know the size of the array in order to create it. Either way, with both an array and a list we run into a size limit. I think it would be more sensible to push all the computational work onto the DBMS. Again, it all depends on what you need to calculate. - Grilled Nail
  1. ArrayList<Double[]> massivData = new ArrayList<Double[]>(); saves memory: a plain array per row instead of an ArrayList object. You read a line, split it into an array of strings, and since you then know the number of elements, you can create a Double[] of exactly that size and convert the strings to doubles.
  2. There are libraries with memory-efficient collections: FastUtils and Koloboke.
    1. Collections of primitive types (numbers, for example) are implemented 4-5 times more efficiently in libraries such as Trove (TDoubleArrayList; see, for example, https://habrahabr.ru/post/187234/ ) or Koloboke (a newer replacement for Trove) and the FastUtils mentioned above. The whole point is avoiding the conversion from Double objects to double values and back (autoboxing/unboxing).

    2. If you pre-allocate memory for the 50,000 * 100 items up front

       ArrayList<Double> massivData = new ArrayList<Double>(5000000); 

      everything will be much faster: we save on re-allocating memory as the list grows (already noted above: Using arraylist with large amounts of data).

    3. ArrayList<double[]> will be faster than ArrayList<Double[]>, and faster still than ArrayList<ArrayList<Double>>(). A double[] array is itself a reference type, so collections can hold it, and you most likely don't need to change the number of elements in a row or to store nulls.
    4. Reading all the files into one list is much faster than merging lists afterwards. If you need fast merging, you will have to use hand-rolled or third-party linked lists; the standard LinkedList<double[]> takes a long time to splice together, and so does ArrayList.
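Points 1-3 combined could look roughly like this (a sketch; readAll and expectedRows are illustrative names, and the 9-column whitespace format is assumed from the question):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CompactReader {
    // One primitive double[] per row (no boxed Double objects), and the outer
    // list pre-sized so it never has to grow and recopy its internal array.
    static ArrayList<double[]> readAll(List<Path> files, int expectedRows) throws IOException {
        ArrayList<double[]> data = new ArrayList<>(expectedRows);
        for (Path file : files) {
            try (BufferedReader reader = Files.newBufferedReader(file)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.trim().isEmpty()) continue;
                    String[] parts = line.trim().split("\\s+");
                    double[] row = new double[parts.length]; // exact size, known after the split
                    for (int i = 0; i < parts.length; i++) {
                        row[i] = Double.parseDouble(parts[i]); // primitives only, no autoboxing
                    }
                    data.add(row);
                }
            }
        }
        return data;
    }
}
```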

      The original question is not entirely clear. In Java, an ArrayList is limited only by the amount of physical memory.

        List<Double> data = new ArrayList<>(); data.add(1d); data.add(2d); .... data.add(99999999d); 

      Therefore, I propose: 1) do not read all the information from all files at once, since that can be expensive in resources; 2) read the data from one file, validate it, save it into the resulting structure, then repeat the procedure with the next file.
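A rough sketch of that per-file procedure (the 9-column check and the running sums are only illustrative; the "resulting structure" should be whatever your actual calculation needs):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PerFilePipeline {
    // The resulting structure: running column totals survive between files,
    // the raw rows do not, so memory use stays flat no matter how many files.
    static double[] sums = new double[9]; // 9 columns, as in the sample rows
    static long rows = 0;

    static void processOneFile(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) continue;
                String[] parts = line.trim().split("\\s+");
                if (parts.length != sums.length) continue; // validate the data
                for (int i = 0; i < parts.length; i++) {
                    sums[i] += Double.parseDouble(parts[i]); // save into the result
                }
                rows++;
            }
        } // the file's rows become garbage here; repeat with the next file
    }
}
```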

      • I meant that ArrayList is less efficient than a regular static array, as the code clearly shows. - andrei piletsky
      • For example: filling an ArrayList of 200,000 rows and 25 columns took 1919 ms; filling the same static array took 32 ms. The RAM load also differs greatly between the two cases. I'm thinking of doing intermediate calculations, say, every 10-20 files, and then forming the array from those results. - andrei piletsky
      • All the same, I would really like to see an example of your code. - user1169483

      Try using LinkedList: it has no such limit as ArrayList does (where Integer.MAX_VALUE is the maximum).

      https://stackoverflow.com/questions/3767979/how-many-data-a-list-can-hold-at-the-maximum

      • Try to write more detailed answers. Explain what your statement is based on. - Nicolas Chabanovsky
      • nobody here has gotten anywhere close to two billion elements yet - etki