Searching among lines of different lengths

Question

Hello!

I have a file of lines, and I need to find a line by a value the user enters. The file contents look like this:

45
123
0000
77788
77789
999900

So the lines all have different lengths. The user enters a number that is longer than the entry in the database. For example, if they enter 450, 45000123 or 45777, the first line should be found; if they enter 77788013 or 7778888, the fourth; and so on.

I tried the find() method, but it only finds an exact value. difflib did not help either.

One idea is to compare each line with the user input character by character, use the longest run of matches to determine the length of the field in the database, then truncate the user input to that length and use find(), but I don't know how to implement it.

More examples:

 3766
 376690
 3767
 3768
 3769
 971
 9712
 971200
 9712234
 97124
 97125
 97126
 971287
 971288

The user enters "3766823013": "3766" is found.
The user enters "3766900124": "376690" is found.
The user enters "9712473023": "97124" is found.
The user enters "97122348230": "9712234" is found.

The file is large - 150 MB of such lines.

The following answer turned out to be correct:

 filestr.find(userstr)==0 or userstr.find(filestr)==0

It works exactly as it should, with nothing to add or take away.

P.S. My name is also Yura Ivanov. Quite a coincidence, right? :)

Comrade @ReinRaus, I am not misleading anyone. The find() method really does look only for an exact match (at least that is how it behaves for me), but once I replaced it with filestr.find(userstr) == 0 or userstr.find(filestr) == 0, everything worked as it should, as if by magic. Tell me, where did I confuse you?

@ReinRaus, should I post both versions of the program? I rechecked everything twice: plain find() does not work.

@ReinRaus, here are BOTH versions of the program (for the doubters). http://dpaste.com/726715/ - with plain find()

http://dpaste.com/726716/ - with filestr.find(userstr) == 0 or userstr.find(filestr) == 0

http://dpaste.com/726719/ - a piece of db.txt (the file itself is 156 MB)

Also, could anyone suggest how to optimize the search over such a database? With 156 MB, a lookup takes about 20 seconds for me.

Read each line from the file and compare it with the user's input using a regexp?
Does the beginning of the second line coincide with the first?
If the user enters 77790, then the fourth; if 777905, then also the fourth.
Yes, there can be lines such as 77789 and 777891; the search must go up to the last matching character.
To be very specific, this is a telephone database of mobile operator prefixes.
The user enters a phone number (which is longer than any prefix), and the line with the matching prefix has to be found.
The prefix and phone number lengths vary (prefixes are 2-6 characters, phone numbers 7-14 characters).
You need to find the lines that coincide with the beginning of the input and choose the longest of them.
 res = ''
 for row in rows:
     if row == arg[:len(row)] and len(row) > len(res):
         res = row
If you have a database with some SQL, the query is very simple to write.
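The SQL route mentioned in the last comment could look roughly like the sketch below. It uses Python's built-in sqlite3 module with an in-memory database; the table name `prefixes`, the column name `prefix`, and the function `longest_prefix` are assumptions made for illustration, not something from the thread.

```python
import sqlite3

# Hypothetical schema: one table with one text column holding the prefixes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prefixes (prefix TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO prefixes (prefix) VALUES (?)",
    [("3766",), ("376690",), ("3767",), ("971",), ("9712",), ("97124",), ("9712234",)],
)


def longest_prefix(conn, number):
    """Return the longest stored prefix that `number` starts with, or None."""
    row = conn.execute(
        "SELECT prefix FROM prefixes "
        "WHERE ? LIKE prefix || '%' "          # number starts with this prefix
        "ORDER BY LENGTH(prefix) DESC LIMIT 1",  # keep only the longest match
        (number,),
    ).fetchone()
    return row[0] if row else None


print(longest_prefix(conn, "3766823013"))   # 3766
print(longest_prefix(conn, "97122348230"))  # 9712234
```

This form of the query most likely still checks every row, so it illustrates the simplicity of the SQL approach rather than a guaranteed speedup.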

Yura Ivanov · Accepted Answer · 2012-04-03T11:14:27

Go through the file line by line and check whether one string occurs in the other, and vice versa.

 (filestr in userstr) or (userstr in filestr)

Or via find; that way you can match only at the beginning of the string:

 filestr.find(userstr)==0 or userstr.find(filestr)==0
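To see why the `== 0` check matters, here is a small sketch using values from the question. The helper names `is_prefix_match` and `is_substring_match` are illustrative only and do not come from the original posts.

```python
def is_prefix_match(filestr, userstr):
    """True if one string starts with the other (the find(...) == 0 variant)."""
    return filestr.find(userstr) == 0 or userstr.find(filestr) == 0


def is_substring_match(filestr, userstr):
    """True if one string occurs anywhere inside the other (the `in` variant)."""
    return (filestr in userstr) or (userstr in filestr)


rows = ["45", "123", "0000", "77788", "77789", "999900"]
user_input = "45000123"

prefix_hits = [row for row in rows if is_prefix_match(row, user_input)]
substring_hits = [row for row in rows if is_substring_match(row, user_input)]

print(prefix_hits)     # ['45']
print(substring_hits)  # ['45', '123']
```

The `in` variant also reports "123", because "45000123" contains it at the end. The same prefix test can be written with `str.startswith`, as `userstr.startswith(filestr) or filestr.startswith(userstr)`.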

@avp I am commenting under the wrong answer, but the comment no longer fits there. What 150 MB, come on. If you look, @Killer1999 wrote: mobile phone number prefixes. Does every person have their own prefix, or what? - alexlz
@alexlz, the OP added at the end of the question: > The file is large - 150 MB of such lines. If the file really is 150 MB, then yes, a more efficient algorithm is needed: either binary search, or at least splitting the file into segments, for example by the first two digits. - Ilya Pirogov
That is what I overlooked. @avp also wrote that the file could not be that large. But then the desire to fiddle with a txt file is unclear. Well, if not mysql / sqlite, then maybe some Berkeley DB ... But here you would have to ask the OP about the amount of work, the desired response time, etc. - alexlz
@alexlz, in fact, working with files that have the right data organization (for a limited range of tasks) is always more efficient than working with a DBMS. The program just has to be well written. As for the 150M, I think the author got carried away (or these are not phone prefixes). - avp
@avp "right", "limited range", "well written". Somehow these arguments remind me of the fairy tale about the high efficiency of programs written in asm. Even if a miraculous result is achieved, life changes, and programs change with it. And when you adjust something carefully polished by hand, the risk of losing efficiency (despite the high cost of the work) is indistinguishable from 100% to the naked eye. He will need to make more frequent adjustments (it is a large file), or add an attribute to the lines, or something like that, and the whole elegant structure will provoke nothing but a stream of swearing. - alexlz
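The binary search idea from the comments can be sketched as follows: load the distinct prefixes once into a sorted list, then for each query probe only the possible prefix lengths, longest first. This is a minimal sketch that assumes the distinct prefixes fit in memory; `build_index` and `longest_prefix` are hypothetical names, and the sample data comes from the question.

```python
from bisect import bisect_left


def build_index(lines):
    """Return (sorted distinct non-empty prefixes, length of the longest one)."""
    prefixes = sorted({line.strip() for line in lines if line.strip()})
    max_len = max(map(len, prefixes), default=0)
    return prefixes, max_len


def longest_prefix(prefixes, max_len, number):
    """Return the longest entry of `prefixes` that `number` starts with, or None."""
    # Try the longest candidate first, so the first hit is the answer.
    for length in range(min(len(number), max_len), 0, -1):
        candidate = number[:length]
        pos = bisect_left(prefixes, candidate)  # binary search in the sorted list
        if pos < len(prefixes) and prefixes[pos] == candidate:
            return candidate
    return None


# Sample data from the question; a real run could pass an open file instead.
prefixes, max_len = build_index(
    ["3766", "376690", "3767", "971", "9712", "97124", "9712234"]
)
print(longest_prefix(prefixes, max_len, "3766900124"))  # 376690
print(longest_prefix(prefixes, max_len, "9712473023"))  # 97124
```

After the one-time load, each query costs at most `max_len` binary searches instead of a pass over the whole file.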


Ilya Pirogov · Answer 2 · 2012-04-03T11:39:30

If I understood the question correctly, then the algorithm is approximately as follows:

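 arg = '45000123'  # the number entered by the user
 with open('file.dat') as fp:
     for row in fp:
         row = row.rstrip()  # drop the trailing newline
         # skip blank lines; a row matches when arg starts with it
         if row and row == arg[:len(row)]:
             print(row)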
