Hello everyone! I need to display large text files in an application. The text should be shown without scrolling, page by page, as in e-book readers. I can split a long text into pages, but it takes too long: for example, the following code takes about 10-15 seconds to process 1.4 MB of text.

    public void split(TextPaint textPaint, String filepath, Context context) {
        int pages = 0;
        File file = new File(filepath);
        // Read the file in small chunks to avoid running out of memory
        char[] bufferChar = new char[1024];
        String uncompletedtext = "";

        // Determine the maximum number of lines per page
        int maxLinesOnpage = 0;
        StaticLayout staticLayout = new StaticLayout(
                context.getString(R.string.lorem_ipsum), textPaint, pageWidth,
                Layout.Alignment.ALIGN_NORMAL, lineSpacingMultiplier, lineSpacingExtra, false);
        int startLineTop = staticLayout.getLineTop(0);
        int endLine = staticLayout.getLineForVertical(startLineTop + pageHeight);
        int endLineBottom = staticLayout.getLineBottom(endLine);
        if (endLineBottom > startLineTop + pageHeight) {
            maxLinesOnpage = endLine - 1;
        } else {
            maxLinesOnpage = endLine;
        }

        // Start pagination
        try {
            BufferedReader buffer = new BufferedReader(new FileReader(file));
            int charsRead;
            while ((charsRead = buffer.read(bufferChar)) >= 0) {
                // Append only the characters actually read, not the whole buffer
                uncompletedtext += new String(bufferChar, 0, charsRead);
                staticLayout = new StaticLayout(
                        uncompletedtext, textPaint, pageWidth,
                        Layout.Alignment.ALIGN_NORMAL, lineSpacingMultiplier, lineSpacingExtra, false);
                // Number of complete pages in the text accumulated so far
                int curTextPages = staticLayout.getLineCount() / maxLinesOnpage;
                if (curTextPages > 0) {
                    // Keep only the text after the last line of the last complete page
                    int lastFullLine = curTextPages * maxLinesOnpage - 1;
                    uncompletedtext = uncompletedtext.substring(staticLayout.getLineEnd(lastFullLine));
                }
                pages += curTextPages;
                Log.e("PAGES", "" + pages);
            }
            buffer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        Log.e("FILE READED FULLY!!", "READ COMPLETE!!!!!!!!!!!!!!!!");
    }

That is too slow. I cannot understand how applications such as FBReader and CoolReader open large files (over 9 MB) instantly. I have looked at their sources, but they are far too large to dig through for such a small problem. I would really appreciate help and advice. Thank you.

  • Maybe you don't need to split the entire file at once? It is enough to prepare the next 2 pages (or 10, to support jumping several pages at a time), and on every page change check how many prepared pages remain and top them up to the required number. Another option: after splitting the first 10 pages, let the user work with them as if everything were ready, and keep splitting the rest in a background thread. - Pavel Krizhanovskiy
  • I have already considered a similar solution (computing the pages in the background). It would do as a last resort. But it has one drawback: until the whole text has been processed, the reader cannot even know how many pages the book has. It amazes me how FBReader does it; it looks like magic. Admittedly, if you jump from the first page to, say, the thousandth, FBReader still slows down noticeably. But how do they manage to compute the total number of pages so quickly? - psinetron
  • The number of characters in the file divided by the average number of characters per page gives the approximate number of pages. You can also estimate from the size: if a page holds 200 bytes of text, a 1 KB file will contain about 5 pages. The page size is known almost immediately after splitting begins, and the file size is known before it begins, so the count is instant. IMHO this is not a "last resort" but the right solution. A very large file will simply take too long to read, even without splitting, so you need a background thread anyway. - Pavel Krizhanovskiy
  • I opened the first book in txt format that came to hand. It has quite a few places where a line break takes up a whole line, so the estimate gives large errors in the page count. I was advised to try breaking the text into lines myself, treating the file not as text but as a sequence of bytes. They say that if the cursor runs over bytes, it will be very fast. What do you think? - psinetron
  • Not a bad suggestion, but unfortunately I have no practical experience with such an implementation. Regarding "if the cursor runs over bytes, it will be very fast": if the bottleneck is reading the file, then yes, the whole process will be faster. But if the bottleneck is the splitting loop, it will not. - Pavel Krizhanovskiy

2 answers

Parallel tasks such as

  • "user work with already read data"
  • "read file"
  • "page breaks"
  • "listening to incoming notifications about something"

must be executed in parallel threads: why should the user wait for 500 pages to load in order to read only 5? The correct solution is therefore to move the splitting into a separate thread. To get the number of pages instantly, divide the file size by the approximate number of bytes per page and round up.
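
As a rough illustration of the two ideas above, here is a minimal sketch in plain Java. The class name `BackgroundPaginator` and its methods are hypothetical, and the "splitting" is a fixed-size stand-in for real layout work; only the ceiling division and the use of a worker thread are the point.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: estimates the page count up front and splits in a worker thread.
final class BackgroundPaginator {
    // {startByte, endByte} of every page found so far; readable while the worker appends.
    private final List<long[]> pages = Collections.synchronizedList(new ArrayList<>());
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** File size divided by the approximate bytes per page, rounded up. */
    static long estimatePageCount(long fileSizeBytes, long bytesPerPage) {
        if (bytesPerPage <= 0) {
            throw new IllegalArgumentException("bytesPerPage must be positive");
        }
        return (fileSizeBytes + bytesPerPage - 1) / bytesPerPage;
    }

    /** Starts splitting on the worker thread and returns immediately. */
    Future<?> start(long fileSizeBytes, long bytesPerPage) {
        return worker.submit(() -> {
            // Stand-in for real pagination: cut the file into fixed-size byte ranges.
            for (long start = 0; start < fileSizeBytes; start += bytesPerPage) {
                long end = Math.min(start + bytesPerPage, fileSizeBytes);
                pages.add(new long[] {start, end});
            }
        });
    }

    /** Number of pages the worker has produced so far. */
    int pagesReady() {
        return pages.size();
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

The estimate is available before any splitting has happened, so the UI can show a page total right away and correct it once the worker finishes.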

    You can get "instant" page counting and show the desired page without a long wait for the whole text to be parsed, and without parallelizing anything. Here is code that runs fine in a single thread and processes 10 MB of text in about half a second:

     // Assumes a field `pages` (a List<int[]> of {startByte, endByte} per page) and a helper
     // unsignedToBytes(byte) that returns the byte as a value in 0..255; neither is shown here.
     public void split(TextPaint textPaint, String filepath, Context context) {
         File file = new File(filepath);

         // How many lines fit on a page
         int maxLinesOnpage = 0;
         int symbolsOnLine = 0;
         StaticLayout staticLayout = new StaticLayout(
                 context.getString(R.string.lorem_ipsum), // short text with 100 lines (\r\n\r\n\r\n\r\n\r\n\r\n)
                 textPaint,                               // MONOSPACE!!!
                 pageWidth, Layout.Alignment.ALIGN_NORMAL,
                 lineSpacingMultiplier, lineSpacingExtra, false);
         int startLineTop = staticLayout.getLineTop(0);
         int endLine = staticLayout.getLineForVertical(startLineTop + pageHeight);
         int endLineBottom = staticLayout.getLineBottom(endLine);
         if (endLineBottom > startLineTop + pageHeight) {
             maxLinesOnpage = endLine - 1;
         } else {
             maxLinesOnpage = endLine;
         }
         symbolsOnLine = staticLayout.getLineEnd(0);

         try {
             RandomAccessFile rac = new RandomAccessFile(file, "r");
             byte[] buffer = new byte[2048];
             int wordLen = 0;          // Length of the current word in symbols
             int wordInBytes = 0;      // Length of the current word in bytes
             int startLinePos = 0;     // Byte offset where the current page starts
             int lineWidth = 0;        // Current line length in symbols
             int totalLines = 0;       // Total lines on the current page
             Log.e("Start pagination", "" + totalLines);
             long timeout = System.currentTimeMillis();
             int buflen = 0;           // Number of bytes read into the buffer
             int totalReadedBytes = 0; // Total bytes read
             byte skipBytes = 0;       // Bytes left in the current multi-byte symbol
             while ((buflen = rac.read(buffer)) != -1) {
                 for (int i = 0; i < buflen; i++) {
                     totalReadedBytes++;
                     wordInBytes++;
                     if (skipBytes == 0) {
                         // Number of bytes in this symbol, taken from its UTF-8 lead byte
                         if (unsignedToBytes(buffer[i]) >= 192) { skipBytes = 2; }
                         if (unsignedToBytes(buffer[i]) >= 224) { skipBytes = 3; }
                         if (unsignedToBytes(buffer[i]) >= 240) { skipBytes = 4; }
                         if (unsignedToBytes(buffer[i]) >= 248) { skipBytes = 5; }
                         if (unsignedToBytes(buffer[i]) >= 252) { skipBytes = 6; }
                     }
                     // Skip ahead until the last byte of a multi-byte symbol
                     if (skipBytes > 0) {
                         skipBytes--;
                         if (skipBytes > 0) { continue; }
                     }
                     if (buffer[i] == 13) { // \r symbol: ignore
                         continue;
                     }
                     if (buffer[i] == 10) { // New line symbol
                         if (lineWidth + wordLen > symbolsOnLine) {
                             totalLines++;
                             if (totalLines > maxLinesOnpage) {
                                 int[] pgsbytes = {startLinePos, totalReadedBytes};
                                 pages.add(pgsbytes);
                                 startLinePos = totalReadedBytes;
                                 totalLines = 0;
                             }
                         }
                         wordLen = 0;
                         wordInBytes = 0;
                         totalLines++;
                         lineWidth = 0;
                         if (totalLines > maxLinesOnpage) {
                             int[] pgsbytes = {startLinePos, totalReadedBytes - 1};
                             pages.add(pgsbytes);
                             startLinePos = totalReadedBytes - 1;
                             totalLines = 0;
                         }
                     }
                     if (buffer[i] == 32) { // Space symbol
                         if (lineWidth + wordLen + 1 <= symbolsOnLine) { // Word fits in line
                             lineWidth += wordLen + 1;
                             wordLen = 0;
                             if (lineWidth == symbolsOnLine) {
                                 totalLines++;
                                 if (totalLines > maxLinesOnpage) {
                                     int[] pgsbytes = {startLinePos, totalReadedBytes};
                                     pages.add(pgsbytes);
                                     startLinePos = totalReadedBytes;
                                     totalLines = 0;
                                 }
                                 lineWidth = 0;
                                 wordLen = 0;
                                 wordInBytes = 0;
                             }
                         } else {
                             if (lineWidth + wordLen == symbolsOnLine) { // Word fills the line exactly
                                 totalLines++;
                                 if (totalLines > maxLinesOnpage) {
                                     int[] pgsbytes = {startLinePos, totalReadedBytes};
                                     pages.add(pgsbytes);
                                     startLinePos = totalReadedBytes;
                                     totalLines = 0;
                                 }
                                 lineWidth = 0;
                                 wordLen = 0;
                                 wordInBytes = 0;
                             } else { // Word wraps to the next line
                                 totalLines++;
                                 if (totalLines > maxLinesOnpage) {
                                     int[] pgsbytes = {startLinePos, totalReadedBytes - 1 - wordInBytes};
                                     pages.add(pgsbytes);
                                     startLinePos = totalReadedBytes - 1;
                                     totalLines = 0;
                                 }
                                 lineWidth = wordLen + 1;
                                 wordLen = 0;
                                 wordInBytes = 0;
                             }
                         }
                     }
                     if (buffer[i] != 32 && buffer[i] != 10 && buffer[i] != 13) {
                         wordLen++;
                     }
                     if (wordLen == symbolsOnLine) { // A single word fills the whole line
                         totalLines++;
                         if (totalLines > maxLinesOnpage) {
                             int[] pgsbytes = {startLinePos, totalReadedBytes - 1 - wordInBytes};
                             pages.add(pgsbytes);
                             startLinePos = totalReadedBytes - 1;
                             totalLines = 0;
                         }
                         lineWidth = 0;
                         wordLen = 0;
                         wordInBytes = 0;
                     }
                 }
             }
             rac.close();
             timeout = System.currentTimeMillis() - timeout;
             Log.e("TOTAL Time", " time " + timeout + "ms");
         } catch (Exception e) {
             e.printStackTrace();
         }
         Log.e("FILE READED FULLY!!", "READ COMPLETE!");
     }
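
    The listing calls `unsignedToBytes`, which is not shown in the answer. A plausible definition, together with a small helper that makes the lead-byte test explicit, might look like the sketch below. The class name `Utf8Bytes` and the methods `sequenceLength` and `countSymbols` are assumptions added for illustration. Note that modern UTF-8 (RFC 3629) uses at most 4 bytes per symbol, so this sketch treats lead bytes of 248 and above as invalid.

```java
// Hypothetical helpers for walking UTF-8 text byte by byte.
final class Utf8Bytes {
    /** Interprets a signed Java byte as an unsigned value in 0..255. */
    static int unsignedToBytes(byte b) {
        return b & 0xFF;
    }

    /**
     * Length in bytes of the UTF-8 sequence that starts with leadByte.
     * Returns 0 for a continuation byte or a byte that cannot start a sequence.
     */
    static int sequenceLength(byte leadByte) {
        int v = unsignedToBytes(leadByte);
        if (v < 0x80) return 1; // ASCII
        if (v < 0xC0) return 0; // continuation byte (10xxxxxx)
        if (v < 0xE0) return 2; // 110xxxxx
        if (v < 0xF0) return 3; // 1110xxxx
        if (v < 0xF8) return 4; // 11110xxx
        return 0;               // 0xF8..0xFF never occur in valid UTF-8
    }

    /** Counts symbols (code points) in the first length bytes of well-formed UTF-8 data. */
    static int countSymbols(byte[] data, int length) {
        int count = 0;
        for (int i = 0; i < length; i++) {
            // Every symbol has exactly one byte that is not a continuation byte.
            if ((unsignedToBytes(data[i]) & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }
}
```

    Counting symbols this way never decodes the text into a `String`, which is why a byte-level scan can stay fast on large files.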

    There are still minor issues to resolve, but it works. The text of a given page can then be read, for example, like this:

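     // Returns the text of page `pagenum`; assumes the file is UTF-8 encoded
     // (needs java.nio.charset.StandardCharsets).
     try (RandomAccessFile rac = new RandomAccessFile(file, "r")) {
         int start = pages.get(pagenum)[0];
         int end = pages.get(pagenum)[1];
         byte[] buffer = new byte[end - start];
         rac.seek(start);
         rac.readFully(buffer); // read() may return fewer bytes than requested
         return new String(buffer, StandardCharsets.UTF_8);
     }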

    I hope this solution helps anyone who needs it!

    • The display font must be monospace; otherwise the width of each letter has to be determined somehow. - psinetron
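
    For a proportional font, one possible approach is to look up each character's width in a table and wrap greedily by accumulated width rather than by character count. The sketch below is plain Java under stated assumptions: the class `ProportionalWrap`, the `charWidths` table, and `defaultWidth` are hypothetical (on Android such a table could presumably be filled once from `Paint.getTextWidths`), and words wider than the line are not split.

```java
// Hypothetical line counter for proportional fonts using a per-character width table.
final class ProportionalWrap {
    /**
     * Counts the lines needed for text in a box maxWidth wide, breaking at spaces and '\n'.
     * charWidths[c] is the advance width of character c; characters outside the table
     * use defaultWidth. Words wider than maxWidth are kept on one line (simplification).
     */
    static int countLines(String text, float[] charWidths, float defaultWidth, float maxWidth) {
        int lines = 1;
        float lineWidth = 0f; // width already placed on the current line
        float wordWidth = 0f; // width of the word being accumulated
        for (int i = 0; i <= text.length(); i++) {
            // A virtual '\n' after the last character flushes the final word.
            char c = i < text.length() ? text.charAt(i) : '\n';
            if (c == '\r') {
                continue;
            }
            if (c == ' ' || c == '\n') {
                // Wrap the finished word if it does not fit on a non-empty line.
                if (lineWidth > 0f && lineWidth + wordWidth > maxWidth) {
                    lines++;
                    lineWidth = 0f;
                }
                lineWidth += wordWidth;
                wordWidth = 0f;
                if (c == ' ') {
                    lineWidth += widthOf(' ', charWidths, defaultWidth);
                } else if (i < text.length()) {
                    lines++; // a real line break starts a new line
                    lineWidth = 0f;
                }
            } else {
                wordWidth += widthOf(c, charWidths, defaultWidth);
            }
        }
        return lines;
    }

    private static float widthOf(char c, float[] charWidths, float defaultWidth) {
        return c < charWidths.length ? charWidths[c] : defaultWidth;
    }
}
```

    With such a table the same character count can produce a different number of lines, which is exactly why the symbol-counting approach in the answer is limited to monospace fonts.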