In Python2, strings of both types ( unicode and str ) were inherited from basestring . This was very useful when checking the types isinstance(_, basestring) . In Python3, the corresponding types of strings ( str and bytes ) are completely independent and have no common ancestor (what’s the logic?). Because of this, you have to write constructions like isinstance(_, (str, bytes)) , which is rather unaesthetic.

On the other hand, there are built-in types-characteristics, like collections.Iterable , which allow to check the value of a certain type of type through isinstance . For example, isinstance([], collections.Iterable) returns True , despite the fact that collections.Iterable missing from the ancestors of the type([]).mro() list.

Question: Is there a similar type of sign for checking all types of strings so that the condition is isinstance('', StringCategoty) and isinstance(b'', StringCategoty) ?

UPD: It is strange to hear that str and bytes supposedly have nothing in common with each other, while they implement an almost identical software interface: out of 44 ( str ) and 38 ( bytes ) their methods 36 are identical in their function and in the form of the call .

 >>> str_methods = {method for method in dir(str) if not method.startswith('_')} >>> bytes_methods = {method for method in dir(bytes) if not method.startswith('_')} >>> shared_methods = str_methods.intersection(bytes_methods) >>> len(str_methods) 44 >>> len(bytes_methods) 38 >>> len(shared_methods) 36 

The whole point of the differences between the two types of strings comes down to the fact that one is the encoded / decoded form of the other. Even their literals are written in an identical way and differ only in the presence of the prefix b . In addition, when working with different I / O libraries, we constantly have to deal with both types.

  • 2
    It hardly exists, because in the third python it is not necessary, because str and bytes are not interchangeable - andreymal
  • Checking isinstance ([], collections.Iterable) gives you everything correctly, since in this case it checks for the presence of methods specific to the object being iterated, such as __next__ and __iter__ . The logic is that in one case you have a string for presenting textual information, and in another case a sequence of bytes. - Avernial
  • They do not have enough common functionality to have a common base class. - Avernial
  • @andreymal: @Avernial: Is there much in common between dict and list ? Only that both of them are some kind of containers. To classify them on this basis, there are collections.Iterable . And str and bytes are essentially a primitive "string" in encoded and decoded versions and have an almost identical API (I added this to the question). In practice, it is more often necessary to identify such primitives than to look for an indefinite kind of containers. Yes, and the principles of OOP require that the common interface be imposed in the form of an abstract class. - cridnirk
  • @cridnirk are there many similarities between str and bytes ? Only that both of them are some kind of containers. One stores characters, the other stores bytes. One in the access by index and in the loop produces lines of one character length, the other - numbers (yes, not even bytes , but int ). One can only be encoded, the other can only be decoded. They cannot be stacked with each other. Similarity is only external; technically, they have (almost) nothing in common and should not have any abstract basestring interfaces there. - andreymal

1 answer 1

Briefly : instead of the basestring , in Python 3, use str or os.PathLike or ( bytes, str) as appropriate, or try EAFP approach instead of isinstance.

Argument is text

If you work with text in Python, then use Unicode.

In Python 2, the text could be intermixed as bytes as well as Unicode (not surprisingly, the interface is similar). The bytes representing the text encoded using any encoding should be decoded into Unicode before being used further in the program, otherwise it is easy to get cracks (the data is corrupted, but the program does not know about it) .

In Python 3, simply use isinstance(text, str) where in this case in Python 2 would be needed isinstance(text, basestring) .

The argument is a file name.

Some data can almost always be text, but it is formal to allow non-text values. For example, often file names can be encoded using sys.getfilesystemencoding() encoding, but this is not always possible — on * nix, the file name can be any byte porridge that does not contain the slash '/' and zero '\0 '. Read more How to work with paths with Russian characters?

In this case, path functions could accept both Unicode and bytes and therefore use basestring in Python 2, in cases where the isinstance is justified. In Python 3, you can use os.PathLike if available, which allows you to accept not only bytes, str, but also pathlib.Path , os.DirEntry ( os.scandir() ), etc., to represent file names. More likely, os.fspath(path) is called in the EAFP style, instead of isinstance(path, os.PathLike) (LBYL style). Example :

 def abspath(path): """Return an absolute path.""" path = os.fspath(path) if not isabs(path): if isinstance(path, bytes): cwd = os.getcwdb() else: cwd = os.getcwd() path = join(cwd, path) return normpath(path) 

abspath() accepts any os.PathLike object. It uses the OS API, which only str , bytes understands ( os.fspath() guaranteed only str , bytes returns).

(str, bytes)

In practice, sometimes you have to accept both bytes and text. For example, subprocess.Popen accepts command line arguments like bytes and str (which the OS API provides). In Python, there is no ABC that would represent strings received from the system (command line arguments, environment variables, paths), so instead of os.SystemString , Popen() uses (str, bytes) when isinstance checking is needed.

The modern interface would only accept strings here (str) and would os.fsencode() in cases where the OS requires bytes. sys.argv , os.environ is a Python str collection ( surrogateescape can be used). A relatively recent addition to the standard library pathlib.Path accepts only strings.


From the comment below:

... the application of isinstance for the classification of values ​​is advisable in the case where the logic of the function depends on the type of argument adopted. An example would be the isinstance itself, which (in statically typed PL) would have to take in the second argument only an enumerated type, but (in Python) for ease of use allows the transfer of a value of a separate type instead of a tuple / list from a single element.

Popen() is just such an example: it takes a command ( args ) as a string or as a list / collection of strings. Direct quote from the source code :

  if isinstance(args, (str, bytes)): args = [args] else: args = list(args) 

Another example is the fileinput.input() function, which accepts either a file or a list / collection of files. From its source code :

  if isinstance(files, str): files = (files,) else: if files is None: files = sys.argv[1:] if not files: files = ('-',) else: files = tuple(files) 

In this case, fileinput not updated to use os.PathLike , but simply uses str (where Python 2.7 would use basestring ) —clients of the code are forced to os.fspath() themselves when necessary.

  • Thank you for the detailed response. Only I did not consider the use of isinstance immediately before using the tested value (LBYL approach) - this is unnecessarily and non-idiomatic from the point of view of Python. It was about using isinstance to classify values ​​separately from their subsequent use in another code. For example, to display adequate error messages, validation of values ​​is required at the point where they are received, when a sudden release of exceptions from the depth of the library due to invalid data received at another point does not provide any information to localize the problem. - cridnirk
  • Also, the use of isinstance for the classification of values ​​is advisable in the case where the logic of the function depends on the type of argument adopted. An example would be the isinstance itself, which (in statically typed PL) would have to take in the second argument only an enumerated type, but (in Python) for ease of use allows the transfer of a value of a separate type instead of a tuple / list from a single element. - cridnirk
  • @cridnirk EAFP is just one of the options that should be mentioned. What I did in two sentences, the rest is about isinstance. This would be a noticeable omission otherwise. Answers to Stack Overflow are not only for you personally, but also for people with a similar question as it is formulated. - jfs