I will offer my version of how I would optimize it. It may seem wild, but when speed is needed, you sometimes have to sacrifice beauty. (Real numbers will appear in what follows — I found an English dictionary on Google with 349,900 words, which is enough for my purposes.)
First you need to prepare the file. (I assume the file consists of single words, one per line.) The very first step is sorting the word list. Then look at the longest word: most likely it will be no longer than 16 characters. Now pad each line with spaces up to that length (in practice you pad to 15, or even 14, since the line break also takes one or two bytes — in any case, check in a hex editor that every word starts at an offset that is a multiple of 16).
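A minimal sketch of that preparation step, assuming a small inline word list in place of the real file (the filename `words16.bin` is a placeholder, and for simplicity I drop the line break entirely and pad to the full 16 bytes):

```python
RECORD = 16  # fixed record size: a power of two, as suggested in the text

# demo input: in practice this comes from your word file, one word per line
raw = ["zebra", "apple", "cat", "banana"]

words = sorted(raw)  # the list must be sorted for the later binary search
assert max(len(w) for w in words) <= RECORD

# write each word padded with spaces to exactly RECORD bytes, no newlines,
# so word number i always starts at byte offset i * RECORD
with open("words16.bin", "wb") as f:
    for w in words:
        f.write(w.ljust(RECORD).encode("ascii"))
```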
Why 16? Simply because this number is a power of two, which has a positive impact on performance. Maybe 32 or 64 will suit you better, but 64 is already a lot.
The file will of course become larger (maybe even twice as large).
Next, loading. In the code, we allocate a character array of the required size (the file size, i.e. number of words * 16) and load the whole dictionary into it in one step. This is much faster than loading one word at a time. (Inserting words one at a time into a dictionary structure will most likely have logarithmic cost per insert, while this method is almost linear.)
Now we need to solve two problems: finding a word by index, and finding a word in the dictionary. The first is solved simply. If we know the index, we multiply it by 16 and that is the offset into the character array; copy the 16 characters and trim the trailing spaces.
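A sketch of the one-shot load and the lookup-by-index, with the dictionary image built inline so the example is self-contained (in practice `data` would come from a single `open("words16.bin", "rb").read()`; the names are placeholders):

```python
RECORD = 16

# demo dictionary image: sorted words, each padded to 16 bytes
data = b"".join(w.ljust(RECORD).encode("ascii")
                for w in ["apple", "banana", "cat", "zebra"])
n_words = len(data) // RECORD

def word_at(i):
    # word by index: multiply by RECORD to get the offset, strip the padding
    off = i * RECORD
    return data[off:off + RECORD].decode("ascii").rstrip()
```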
Finding a word in the dictionary is also not difficult. At first glance it may seem you have to run through the entire list (which takes a long time), but I did not insist on sorting for nothing: with a sorted list you can apply a binary search. And since this is a dictionary, you can use an even more cunning device — an index. Such a file can be prepared in advance: run through the entire list of words and write out the line numbers at which the first two letters change. For English, that gives 676 entries of the form
aa 0 ab 82 ac 1216 ad 2904 ... zz 349895
(in fact only 635 of them occur; 41 combinations are never found). The most popular combinations are ca, re, ma, co, un (in increasing order). For Russian it is a little over 1000 (33 * 33 = 1089, though I cannot recall any words containing combinations such as ъъ — but there is no need to exclude them from the index). The search becomes much faster, since we immediately narrow it to a small region. (I found that the largest section is un — 11,780 words, and 115 combinations contain more than 1000 words.)
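The combination of the two-letter prefix index and binary search can be sketched as follows (demo words and names are my own; the index maps each prefix to the first record number where it appears, and the search only scans that prefix's range):

```python
RECORD = 16
words = sorted(["ant", "apple", "banana", "bat", "cat", "zebra"])
data = b"".join(w.ljust(RECORD).encode("ascii") for w in words)
n = len(data) // RECORD

def word_at(i):
    return data[i * RECORD:(i + 1) * RECORD].decode("ascii").rstrip()

# prefix index: first record number for each two-letter prefix
prefix_index = {}
for i in range(n):
    prefix_index.setdefault(word_at(i)[:2], i)
prefixes = sorted(prefix_index)  # to find where each prefix's range ends

def contains(word):
    # narrow to the prefix's small region, then binary-search inside it
    p = word[:2]
    if p not in prefix_index:
        return False
    lo = prefix_index[p]
    nxt = prefixes.index(p) + 1
    hi = prefix_index[prefixes[nxt]] if nxt < len(prefixes) else n
    while lo < hi:
        mid = (lo + hi) // 2
        w = word_at(mid)
        if w == word:
            return True
        if w < word:
            lo = mid + 1
        else:
            hi = mid
    return False
```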
Next, more optimizations. If you look at the statistics, quite a lot of the words fit into 8 characters — about 40 percent. So you can split the words into two groups: those that fit into 8 characters and those that need 16 (there could also be a group of 32, but I have not found such words :)). The index will then also need to record the group number.
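A rough sketch of that two-group split, with the group number passed explicitly to the lookup (in the real scheme it would be stored in the index; all names and demo words here are assumptions):

```python
SHORT, LONG = 8, 16
words = sorted(["cat", "dog", "elephant", "hippopotamus"])

# split by record size: group 0 fits in 8 bytes, group 1 needs 16
short_words = [w for w in words if len(w) <= SHORT]
long_words  = [w for w in words if len(w) > SHORT]

short_data = b"".join(w.ljust(SHORT).encode("ascii") for w in short_words)
long_data  = b"".join(w.ljust(LONG).encode("ascii") for w in long_words)

def word_at(group, i):
    # the index must now carry the group number alongside the record number
    if group == 0:
        return short_data[i * SHORT:(i + 1) * SHORT].decode("ascii").rstrip()
    return long_data[i * LONG:(i + 1) * LONG].decode("ascii").rstrip()
```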
How to implement this in code I leave as an exercise for the reader.