CodingBison

Python lets us access files, read their content, and update their content. For these file-related tasks, Python provides a new type: file.

Prior to doing any operation (read or update its content) on a file, we must open the file using the open() function. The open() function returns a file descriptor (fd) handle -- all subsequent file operations are done using this fd.

The open() function takes two arguments: name of the file to be opened and a mode for accessing the file. The mode can be read("r"), write("w"), or append ("a"); we can also specify a combination of these modes. When we specify read mode, then we can read the content from the file. If we do not provide read mode then we cannot read content of a file. Likewise, we need to specify the write (or append) mode to write (or append) to a file. If we specify neither write nor append mode, then we can not update a file. Also, if we do not pass any modes, then the default is the read mode.

Once we are done accessing the file, we can close the file using the close() method. This method accepts fd of the file that was opened earlier. Closing a file allows us to release resources associated with the opened file.

Reading from a File

Let us begin by focusing on how to read contents of a file. For this purpose, we start with a small file name "mountain-lion.txt". Let us first take a look at the content and the word-count of the file (using "cat" and "wc" utility commands). The cat command dumps contents of a file and the wc command prints total lines, total words, and total characters of a file; for more info, we can google using "man cat" and "man wc".

 [user@codingbison]$ cat mountain-lion.txt 
 Mountain lion is also known as cougar, puma, or mountain cat. 
 Mountain lion is found in many parts of the world. 
 By nature, a mountain lion is reclusive and usually avoids people.
 An adult mountain lion is generally solid red or brown in color. 
 [user@codingbison]$ 
 [user@codingbison]$ wc mountain-lion.txt 
   4  44 248 mountain-lion.txt
 [user@codingbison]$ 

With that, let us use the open() function to open this file. Since we are interested in reading its content, we open the file in read mode.

 [user@codingbison]$ python3
 Python 3.2.3 (default, Jun  8 2012, 05:40:06) 
 [GCC 4.6.3 20120306 (Red Hat 4.6.3-2)] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> 
 >>> fd = open("mountain-lion.txt", "r")
 >>> 
 >>> print(type(fd))
 <class '_io.TextIOWrapper'>
 >>> 
 >>> print(fd)
 <_io.TextIOWrapper name='mountain-lion.txt' mode='r' encoding='UTF-8'>
 >>> 
 >>> fd.close() 
 >>> 

Once we have an fd from the open() call, we can use various methods of fd to read the file content: read(), read(n), readline(), and readlines(). The read() method reads entire content of a file, read(n) reads n characters from the file, readline() reads one line at a time, and readlines() reads all the lines present in a file at the same time. Note that Python has no readlines(n) method!

Let us try to read contents of the "mountain-lion.txt" file using the read() method. Next, we print the value returned from the read() method; the output confirms that read() returns the entire content of the file!

 >>> fd = open("mountain-lion.txt", "r")
 >>> 
 >>> x = fd.read()
 >>> print(x)
 Mountain lion is also known as cougar, puma, or mountain cat. 
 Mountain lion is found in many parts of the world. 
 By nature, a mountain lion is reclusive and usually avoids people.
 An adult mountain lion is generally solid red or brown in color. 

 >>> 

When we open a file, along with fd, the file also has a seek location, which points to the current read position (in terms of characters) of the file. We can get that location using the tell() method of the fd. In the below-provided output, the output of tell() is 248, which means that the seek location is pointing to the end of the file (the earlier "wc" command shows that the file has 248 characters). If we were to attempt to read the file again, then the read() method would return nothing since the seek pointer is already pointing to the end of the file.

 >>> print fd.tell()
 248
 >>> x = fd.read()
 >>> print(x)

 >>> 

To overcome this, we can simply reset the seek position back to the start of the file using the seek() method. The seek() method takes the position of the file in terms of characters where we wish to set the file position. Using 0-based index, the starting index is 0. Once we set the seek() to 0, then the read() would once again return the content of the file.

 >>> fd.seek(0)
 0
 >>> x = fd.read()
 >>> print(x)
 Mountain lion is also known as cougar, puma, or mountain cat. 
 Mountain lion is found in many parts of the world. 
 By nature, a mountain lion is reclusive and usually avoids people.
 An adult mountain lion is generally solid red or brown in color. 

 >>> 

Function readlines() is similar to read() -- it allows us to read the entire content of the file at a time, but it returns the content as a list of all the lines present in the file. Once we have the return value of the readlines(), we can use a for/in loop to read each line one at a time.

 >>> fd.seek(0)
 0
 >>> x = fd.readlines()
 >>> print(x)
 ['Mountain lion is also known as cougar, puma, or mountain cat. \n', 'Mountain lion 
   is found in many parts of the world. \n', 'By nature, a mountain lion is reclusive 
   and usually avoids people.\n', 'An adult mountain lion is generally solid red or 
   brown in color. \n']
 >>> 

We can also loop over x and read the content line-wise.

 >>> fd.seek(0)
 0
 >>> x = fd.readlines()
 >>> for lines in x:
 ...     print(lines)
 ... 
 Mountain lion is also known as cougar, puma, or mountain cat. 

 Mountain lion is found in many parts of the world. 

 By nature, a mountain lion is reclusive and usually avoids people.

 An adult mountain lion is generally solid red or brown in color. 

 >>> 

Reading from a Large File

Both read() and readlines() allow us read the entire content of the file in the memory. When we have large files (files can often be Gigabytes long!), reading the entire file content in the memory can easily degrade the performance. For such scenarios, Python provides read(n) and readline() methods. Method read(n) allows us to read() only n characters and readline() allows us to read one line at a time.

Let us read the file content again, this time using read(n) method.

 >>> fd.seek(0)
 0
 >>> x = fd.read(100)
 >>> print(x)
 Mountain lion is also known as cougar, puma, or mountain cat. 
 Mountain lion is found in many parts 
 >>> 
 >>> print(fd.tell())
 100
 >>> x = fd.read(148)
 >>> print(x)
 of the world. 
 By nature, a mountain lion is reclusive and usually avoids people.
 An adult mountain lion is generally solid red or brown in color. 

 >>> 
 >>> print(fd.tell())
 248
 >>> 

Next, let us use readline() to read the file content. With this method, the seek() index slides automatically with each read and points to the start of the next line. Please note that a line is identified when the readline() encounters a new line character. To read the entire "mountain-lion.txt" file, we will have to issue readline() call the same number of times as there are lines in this file.

 >>> fd.seek(0)
 0
 >>> x = fd.readline()
 >>> print(x)
 Mountain lion is also known as cougar, puma, or mountain cat. 

 >>> print(fd.tell())
 63
 >>> x = fd.readline()
 >>> print(x)
 Mountain lion is found in many parts of the world. 

 >>> print(fd.tell())
 115
 >>> x = fd.readline()
 >>> print(x)
 By nature, a mountain lion is reclusive and usually avoids people.

 >>> print(fd.tell())
 182
 >>> x = fd.readline()
 >>> print(x)
 An adult mountain lion is generally solid red or brown in color. 

 >>> print(fd.tell())
 248
 >>> 
 >>> x = fd.readline()
 >>> print(fd.tell())
 248
 >>> 

Writing to a File

When we have to write text or binary to a file, we can do so by providing the write ("w") or append ("a") modes when we open the file. Python provides two methods to write to a file: write(str) and writelines(lines).

When we open a file with "w" mode, then write() or writelines simply overwrite what is current present in the file. Sometimes, we would like to avoid that. For such cases, we can use the append ("a") mode. With append mode, whatever is written, Python appends it at the end of the file instead of overwriting the entire content.

Let us open a new file ("lion.txt") and write some text to it using the write() method.

 >>> fd = open("lion.txt", "w") 
 >>> fd.write("Unlike other cats, lions live in groups")
 39
 >>> fd.close()
 >>> 
 >>> fd = open("lion.txt","r")
 >>> x = fd.read()
 >>> print(x)
 Unlike other cats, lions live in groups
 >>> 

The writelines() method is similar except that it takes a set of lines as an input.

 >>> lines = "A lion can sleep upto 20 hours in a day\n"
 >>> print(lines)
 A lion can sleep upto 20 hours in a day

 >>> 
 >>> fd = open("lion.txt","w")
 >>> fd.writelines(lines)
 >>> fd.close()
 >>> 
 >>> fd = open("lion.txt","r")
 >>> x = fd.read()
 >>> print(x)
 A lion can sleep upto 20 hours in a day

 >>> 
 >>> fd.close()

Python provides flush() method for fd to flush all the file-related data in the memory and to write it to the file. When we close a file, using fd.close(), flush() is done automatically.





comments powered by Disqus