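Python strings come with a host of handy built-in methods. Let us begin by listing these methods with the dir() function. In the output below, the names with leading and trailing double underscores are special attributes and methods that Python uses behind the scenes (for example, __len__ backs len()); the remaining names are the regular methods that we call directly on a string.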
    [user@codingbison]$ python3
    Python 3.2.3 (default, Jun 8 2012, 05:40:06)
    [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> varStr = ""
    >>> dir(varStr)
    ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
For the sake of readability, we group the above methods into four sets: search-related methods, format-related methods, split/join methods, and string-type verification methods.
In this section we focus on the following search methods: find()/rfind(), index()/rindex(), startswith()/endswith(), and count().
Method find(str, i1, i2) returns the index of the first occurrence of the substring "str" within the part of the string that starts at index i1 and ends just before index i2. Arguments i1 and i2 are optional; if we do not specify them, then find() searches the entire string. Reverse find, rfind(), takes the same arguments but returns the index of the last occurrence instead. If the substring is not found, then both methods return -1.
    >>> varString = "Polar bears are sometimes called sea bears"
    >>> print(len(varString))
    42
    >>>
    >>> varString.find("bear")
    6
    >>> varString.find("bear", 7, 40)
    -1
    >>>
    >>> varString.rfind("bear")
    37
    >>> varString.rfind("bear", 0, 11)
    6
    >>>
    >>> varString.find("black bear")
    -1
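There is a boundary condition for these methods: the whole substring must fit between i1 and i2. If a match starts at or after i1 but extends to index i2 or beyond, these methods still return -1. In the example below, "bear" occupies indices 6 through 9, so an end index of 8 is too small, whereas an end index of 10 is enough.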
    >>> varString = "Polar bears are sometimes called sea bears"
    >>>
    >>> varString.find("bear", 5, 8)
    -1
    >>>
    >>> varString.find("bear", 5, 10)
    6
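Since find() accepts a start index, it can be called repeatedly to locate every occurrence of a substring. The following is a minimal sketch of that idea; the helper name find_all() is hypothetical (it is not a built-in string method), and it assumes that we want non-overlapping matches.

```python
def find_all(text, sub):
    """Hypothetical helper: start indices of all non-overlapping matches of sub."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    positions = []
    start = 0
    while True:
        pos = text.find(sub, start)      # search from 'start' to the end
        if pos == -1:                    # -1 signals that no match is left
            break
        positions.append(pos)
        start = pos + len(sub)           # continue just past this match
    return positions


varString = "Polar bears are sometimes called sea bears"
matches = find_all(varString, "bear")
print(matches)                           # [6, 37]
```

The guard against an empty substring is there because find() reports a match for "" at every position, which would otherwise keep the loop from advancing.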
Methods index() and rindex() behave like find() and rfind(). However, if the passed substring does not exist, then index() and rindex() raise a ValueError exception instead of returning -1. Therefore, we need a try/except clause to handle it.
    >>> varString = "Polar bears are sometimes called sea bears"
    >>>
    >>> varString.index("bear")
    6
    >>> varString.index("bear", 7, 41)
    37
    >>> varString.rindex("bear")
    37
    >>> varString.rindex("bear", 0, 11)
    6
    >>>
    >>> varString.index("black bear")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: substring not found
    >>>
    >>> try:
    ...     varString.index("black bear")
    ... except ValueError:
    ...     print("Could not find black bear")
    ...
    Could not find black bear
Method startswith() returns True if a given string starts with the passed substring. Likewise, method endswith() returns True if the given string ends with the passed substring. Like the earlier methods (find(), index(), etc.), these two methods also take two optional arguments. Thus, startswith(str, i1, i2) returns True if the substring "str" starts exactly at index i1 and ends before index i2; endswith(str, i1, i2) applies the same window and returns True if "str" ends exactly where the window ends.
    >>> varString = "Polar bears are sometimes called sea bears"
    >>>
    >>> varString.startswith("Polar bear")
    True
    >>> varString.startswith("Black bear")
    False
    >>>
    >>> varString.endswith("sea bears")
    True
    >>> varString.endswith("sea bear")
    False
    >>>
    >>> varString.startswith("bear", 6)
    True
    >>> varString.startswith("bear", 6, 12)
    True
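One way to reason about the optional indices is to think of them as a slice: the test is applied to varString[i1:i2] rather than to the whole string. The short sketch below illustrates this; the variable names are chosen only for this illustration.

```python
varString = "Polar bears are sometimes called sea bears"

# startswith(sub, i1, i2) behaves like testing the slice varString[i1:i2]
with_bounds = varString.startswith("bear", 6, 12)
with_slice = varString[6:12].startswith("bear")
print(with_bounds, with_slice)        # True True

# A window that is too short to hold the substring gives False
too_short = varString.startswith("bear", 6, 9)
print(too_short)                      # False

# endswith() uses the same window; varString[0:11] is "Polar bears"
ends = varString.endswith("bears", 0, 11)
print(ends)                           # True
```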
Lastly, method count() returns the number of non-overlapping occurrences of a substring in a given string; it returns 0 if the substring does not occur.
    >>> varString = "Polar bears are sometimes called sea bears"
    >>>
    >>> print(varString.count("bear"))
    2
    >>> print(varString.count("a"))
    6
    >>> print(varString.count("fox"))
    0
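Two details of count() are easy to miss, so here is a small sketch of both. Like find(), count() also accepts optional start and end indices, and it counts matches that do not overlap. The variable names are only illustrative.

```python
varString = "Polar bears are sometimes called sea bears"

# Restrict the count to the slice varString[0:20] ("Polar bears are some")
first_part = varString.count("bear", 0, 20)
print(first_part)                 # 1

# Matches do not overlap: "aaaa" holds "aa" at indices 0 and 2 only
overlap = "aaaa".count("aa")
print(overlap)                    # 2
```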
Python strings provide various methods for formatting: ljust(), rjust(), zfill(), upper(), lower(), capitalize(), strip(), lstrip(), and rstrip().
Methods ljust() and rjust() provide left and right justification respectively. In the following example, we use the ljust() and rjust() methods on the variable varStr. Each method returns a new string, padded with spaces, whose length equals the width passed. If the width passed is less than or equal to the length of the string, then the returned string is the same as the original string.
    >>> varStr = "Polar bear"
    >>> print(len(varStr))
    10
    >>> retVal = varStr.ljust(20)
    >>> print(retVal)
    Polar bear          
    >>> print(len(retVal))
    20
    >>> retVal = varStr.rjust(20)
    >>> print(retVal)
              Polar bear
    >>> print(len(retVal))
    20
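A common use of ljust() and rjust() is lining up columns of text. The sketch below assumes a small list of made-up rows (the numbers are placeholders for illustration, not real data) and also shows the optional second argument, which selects a fill character other than a space.

```python
# Made-up sample rows, used only to illustrate the alignment
rows = [("Polar bear", 450), ("Panda bear", 100), ("Sun bear", 50)]

lines = []
for name, value in rows:
    # Left-justify the name in 12 columns, right-justify the number in 5
    lines.append(name.ljust(12) + str(value).rjust(5))

for line in lines:
    print(line)
# Polar bear    450
# Panda bear    100
# Sun bear       50

# The optional second argument sets the fill character
dotted = "Polar bear".ljust(15, ".")
print(dotted)                     # Polar bear.....
```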
Method zfill(n) pads a string on the left with zeros until its length becomes n. However, if n is less than or equal to the length of the original string, then the new string is the same as the original string.
    >>> varStr = "Polar bear"
    >>> print(len(varStr))
    10
    >>> retVal = varStr.zfill(20)
    >>> print(retVal)
    0000000000Polar bear
    >>> print(len(retVal))
    20
    >>> retVal = varStr.zfill(15)
    >>> print(retVal)
    00000Polar bear
    >>> print(len(retVal))
    15
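In practice, zfill() is mostly applied to strings that hold numbers, for example to produce fixed-width identifiers. The following sketch shows that case, including how a leading sign is kept in front of the zeros; the values are arbitrary.

```python
padded = "42".zfill(5)
print(padded)                 # 00042

# A leading sign stays at the front; zeros are inserted after it
negative = "-42".zfill(5)
print(negative)               # -0042

# A width smaller than the string length leaves the string as it is
unchanged = "12345".zfill(3)
print(unchanged)              # 12345
```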
Methods upper() and lower() convert each character of a given string to upper case and lower case respectively. Next, method capitalize() makes the first character of the string upper case and the remaining characters lower case. These methods return a new string and the original string remains unchanged.
    >>> varStr = "poLar BeAr"
    >>>
    >>> print(varStr.lower())
    polar bear
    >>>
    >>> print(varStr)
    poLar BeAr
    >>>
    >>> print(varStr.upper())
    POLAR BEAR
    >>>
    >>> print(varStr.capitalize())
    Polar bear
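Because these methods return a new string, a call whose result is not stored has no lasting effect. The sketch below illustrates this, along with one typical use of lower(): comparing two strings while ignoring case. The variable names are only illustrative.

```python
varStr = "poLar BeAr"

varStr.upper()                    # the returned string is discarded
unchanged = varStr
print(unchanged)                  # poLar BeAr

varStr = varStr.upper()           # rebind the name to keep the result
print(varStr)                     # POLAR BEAR

# Compare while ignoring case by normalizing both sides first
same = "Polar Bear".lower() == "polar bear".lower()
print(same)                       # True
```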
Methods strip(), lstrip(), and rstrip() remove white space from a string. Method strip() removes leading and trailing white space. Method lstrip() removes leading white space and rstrip() removes trailing white space. In the example below, the varStr string has 6 white spaces both before and after the text "Polar bear". These methods also return a new string; the original string remains unchanged.
    >>> varStr = "      Polar bear      "
    >>> print(len(varStr))
    22
    >>> print(varStr)
          Polar bear      
    >>>
    >>> retStr = varStr.strip()
    >>> print(len(retStr))
    10
    >>> print(retStr)
    Polar bear
    >>> print(varStr)
          Polar bear      
    >>>
    >>> retStr = varStr.lstrip()
    >>> print(len(retStr))
    16
    >>> print(retStr)
    Polar bear      
    >>>
    >>> retStr = varStr.rstrip()
    >>> print(len(retStr))
    16
    >>> print(retStr)
          Polar bear
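The strip family is not limited to plain spaces. As a brief sketch: without an argument, these methods also remove tabs and newlines, and with an optional string argument they remove any of the characters listed in it instead. The sample strings below are made up for illustration.

```python
# Without an argument, tabs and newlines are stripped as well
varStr = "\t  Polar bear \n"
stripped = varStr.strip()
print(repr(stripped))              # 'Polar bear'

# With an argument, the listed characters are stripped instead
starred = "***Polar bear***"
both = starred.strip("*")
left = starred.lstrip("*")
right = starred.rstrip("*")
print(both)                        # Polar bear
print(left)                        # Polar bear***
print(right)                       # ***Polar bear

# The argument is a set of characters, not a prefix or suffix
mixed = "*-*Polar bear-*-".strip("*-")
print(mixed)                       # Polar bear
```

Note that the white space inside the text (between "Polar" and "bear") is never touched; only the two ends of the string are trimmed.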
Sometimes, we need to split a string into individual tokens. For this, we can use the split() method of strings. This method takes an optional string argument and splits the string wherever this delimiter occurs; by default, split() uses runs of white space as the delimiter. The method returns the tokens as a list.
    >>> varString = "Polar bear, Brown bear, Panda bear, Grizzly bear"
    >>>
    >>> retTokens = varString.split()
    >>>
    >>> print(type(retTokens))
    <class 'list'>
    >>>
    >>> print(retTokens)
    ['Polar', 'bear,', 'Brown', 'bear,', 'Panda', 'bear,', 'Grizzly', 'bear']
    >>>
    >>> print(varString.split(", "))
    ['Polar bear', 'Brown bear', 'Panda bear', 'Grizzly bear']
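The default behavior and an explicit " " delimiter are not quite the same thing, which the sketch below tries to show. It also uses the optional second argument of split(), which limits the number of splits; the sample strings are made up for illustration.

```python
spaced = "Polar  bear"                 # two spaces between the words

# Default: a run of white space counts as a single delimiter
default_tokens = spaced.split()
print(default_tokens)                  # ['Polar', 'bear']

# Explicit " ": every single space splits, leaving an empty token
space_tokens = spaced.split(" ")
print(space_tokens)                    # ['Polar', '', 'bear']

# The second argument caps the number of splits performed
varString = "Polar bear, Brown bear, Panda bear"
first, rest = varString.split(", ", 1)
print(first)                           # Polar bear
print(rest)                            # Brown bear, Panda bear
```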
The join() method does the opposite of the split() method: it takes a sequence of strings (such as a list or a tuple) as input and concatenates its elements to form a new string. The string on which join() is called serves as the delimiter that gets inserted between the individual elements. In the example provided below, we first use a comma followed by a white space as the delimiter, and then a single white space.
    >>> varStrComma = ", "
    >>> varList = ["Polar bear", "Brown bear"]
    >>>
    >>> varStrNew = varStrComma.join(varList)
    >>> print(varStrNew)
    Polar bear, Brown bear
    >>>
    >>> varStrSpace = " "
    >>> varTuple = ("Grizzly bear", "Panda bear")
    >>>
    >>> varStrNew = varStrSpace.join(varTuple)
    >>> print(varStrNew)
    Grizzly bear Panda bear
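Two practical points about join() can be sketched briefly: every element must already be a string (otherwise join() raises a TypeError), and joining with the same delimiter that was used for splitting restores the original string. The sample values below are arbitrary.

```python
counts = [2, 3, 5]

try:
    ", ".join(counts)                 # the elements are ints, not strings
    join_failed = False
except TypeError:
    join_failed = True
    print("join() accepts only strings")

# Convert each element to a string first
joined = ", ".join(str(n) for n in counts)
print(joined)                         # 2, 3, 5

# split() followed by join() with the same delimiter is a round trip
varString = "Polar bear, Brown bear, Panda bear"
tokens = varString.split(", ")
rebuilt = ", ".join(tokens)
print(rebuilt == varString)           # True
```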
Method isdigit() checks if a string consists entirely of digit characters (such as '0' to '9'). It returns True if the string is non-empty and all of its characters are digits; otherwise, it returns False.
    >>> varStrNum = "101"
    >>> print(varStrNum.isdigit())
    True
    >>>
    >>> varStrAlphaNum = "2 Polar bear cubs"
    >>> print(varStrAlphaNum.isdigit())
    False
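It is worth keeping in mind that isdigit() looks at individual characters and does not check whether the string is a valid number. As the sketch below illustrates, a sign, a decimal point, or a space each makes it return False, and so does an empty string. The sample strings are made up for illustration.

```python
samples = ["101", "-5", "3.14", "", "1 000"]

# Map each sample string to the result of isdigit()
results = {s: s.isdigit() for s in samples}

for text, is_digit in results.items():
    print(repr(text), is_digit)
# '101' True
# '-5' False
# '3.14' False
# '' False
# '1 000' False
```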
Method isalpha() tests whether all characters of a string are alphabetic. If so, it returns True; otherwise, it returns False. Numeric and white-space characters are not counted as alphabetic characters. Thus, isalpha() would return False for strings like "Polar bear" or "2 Polar bear cubs".
    >>> varStrAlpha = "Polarbear"
    >>> print(varStrAlpha.isalpha())
    True
    >>>
    >>> varStrNum = "101"
    >>> print(varStrNum.isalpha())
    False
Method isalnum() returns True if all characters of the string are either letters or digits. The presence of any other character (such as a white-space character) makes isalnum() return False. Thus, isalnum() would return False for strings like "2 Polar bear cubs" or "2Polarbearcubs#".
    >>> varStrAlphaNum = "2Polarbearcubs"
    >>> print(varStrAlphaNum.isalnum())
    True
    >>>
    >>> varStrAlphaNum = "2 Polar bear cubs"
    >>> print(varStrAlphaNum.isalnum())
    False
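The three verification methods can be combined to sort input strings into rough categories. The following is a minimal sketch of that idea; the function classify() and its category labels are hypothetical and exist only for this illustration. The order of the checks matters, since any string that passes isdigit() or isalpha() also passes isalnum().

```python
def classify(text):
    """Hypothetical helper: label a string by the kind of characters it holds."""
    if text.isdigit():
        return "digits"
    if text.isalpha():
        return "letters"
    if text.isalnum():               # reached only for a mix of both
        return "letters and digits"
    return "other"                   # spaces, punctuation, or empty string


samples = ["101", "Polarbear", "2Polarbearcubs", "2 Polar bear cubs", ""]
labels = [classify(s) for s in samples]
print(labels)
# ['digits', 'letters', 'letters and digits', 'other', 'other']
```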