Chapter 10: Data Files
DATA FILES
Topics Covered:
- Data file as input
- Creating a text file
- Reading from a text file
- Data file as output
Data File as Input
In all of the programs we have written so far, the user input has come from the keyboard. There are many scenarios, however, in which the program input will instead come from a data file. We will learn to have our Python programs read data from a text file, which is sometimes called an ASCII file. This type of file simply stores all of the data as raw text, one byte per character, using ASCII codes.
Here are some uses and advantages of having a text file as the source for program input:
- Ease of testing: When thoroughly testing a program, you typically need to enter several input values from the keyboard. After making a correction to your program, you need to type those input values in again. Each time you run your program, you type those values over and over. By using a text file for input, you can type the input into a file one time and your input is complete.
- Large inputs: You may need to write a program in which the input is simply too large to practically and accurately enter it from the keyboard.
- Import data from other systems: Many times, you can use input text files from other sources, such as a web site or another application’s output. Quite often, data from spreadsheets and databases can be saved to a text file and then used as the input to your program.
Creating a Text File
Most computer systems include a text editor, which is a program that allows you to create, edit, save, and retrieve text files. Windows-based computer systems provide the Notepad program, while Macintosh computers ship with the TextEdit application. Your Python programs, by the way, are text files. You can even use your IDLE environment to create a text file. Suppose you have a text file named “artists.txt” that looks like this:
Bob Dylan
Elvis Presley
Chuck Berry
Jimi Hendrix
James Brown
Aretha Franklin
Ray Charles
Reading from a Text File
Python makes it very easy to read data from a text file. The first step when accessing a text file is to open the file. We call the open function with two arguments: the name of the external data file and the file access mode. The file should be stored in the same directory, or folder, as your Python program. If the file is stored in another location, the path to the file must accompany the file name. In our first code example, the external data file is named “artists.txt”.
To indicate the file access mode, a single-character string is used based on whether the file will be opened in read (r), write (w), or append (a) mode. In this section we are reading from text files, so we will use mode “r”. The open function returns a file handle that must be stored in a variable. In the example below, we stored the file handle as infile.
In our example program, we use a for loop to read the file one line at a time. Inside the loop, we append each line to a list named dataList. Once the program is done reading data from the file, the close() method of the file handle should be invoked to disconnect the file from your application. In our program below, we include a second for loop that simply prints out the elements of dataList. In each iteration of that loop, one item is retrieved from the list and stored in artist.
infile = open("artists.txt", "r")
dataList = []
for line in infile:
    dataList.append(line)
infile.close()
for artist in dataList:
    print(artist)
The output when this program is executed:
A couple of observations from running the program: (1) When hitting the F5 key to run the program, it immediately executed and showed the results with no keyboard input, and (2) the output appeared with blank lines between each artist. When each line was read from the file, the newline character ‘\n’ remained at the end of the string input. Since the print function automatically adds a newline when displaying output, that second newline created the extra space, which gave a “double spacing” effect.
A common technique to remove the newline from the end of a string is to apply the rstrip() string method. This removes any trailing whitespace characters at the end of the string. We will apply this method in our next example, as well as add a counter and some string formatting to create a table with simple headings.
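As a small illustration of the effect described above, the sketch below applies rstrip() to a string shaped like one line read from the file. The variable names are our own choices for this example.

```python
# A line as it would arrive from the file, newline still attached
line = "Bob Dylan\n"

# rstrip() returns a new string with the trailing whitespace removed
artist = line.rstrip()

print(len(line))    # 10 characters, counting the newline
print(len(artist))  # 9 characters
print(artist)
```

Note that rstrip() does not change the original string. It returns a new one, so the result needs to be stored or used right away.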
To break up a string into multiple pieces based on some delimiting character or string, we can use the split() method. The method has a single parameter, the delimiting string, such as a space, comma, or tab. The result of the split method is a list. The list will contain each of the broken-up pieces of the string without the delimiting strings. Let’s look at an example. After the split, the list pieces now contains four strings.
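One possible illustration of such a split is sketched here. The comma-separated string stored in record is a hypothetical sample chosen for this example.

```python
# A hypothetical comma-separated line of text
record = "3/13/2021,3.1,29,47"

# Break the string apart at each comma
pieces = record.split(",")

print(pieces)       # ['3/13/2021', '3.1', '29', '47']
print(len(pieces))  # 4
print(pieces[1])    # 3.1
```

Every element of pieces is still a string. To do arithmetic with a value such as pieces[1], it would first need to be converted with float() or int().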
Here’s the new and improved version of our program for displaying the contents of the “artists.txt” file:
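One way such a program could be written is sketched below. The function name format_table, the heading text, the column width, and the three sample lines standing in for the file are all our own assumptions made for illustration.

```python
def format_table(lines):
    """Build table rows from raw lines such as those read from a file."""
    rows = ["{:<6}{}".format("No.", "Artist"),
            "{:<6}{}".format("---", "------")]
    count = 0
    for line in lines:
        count = count + 1
        name = line.rstrip()   # drop the trailing newline
        rows.append("{:<6}{}".format(count, name))
    return rows

# Stand-in for the lines a for loop would read from "artists.txt"
sample = ["Bob Dylan\n", "Elvis Presley\n", "Chuck Berry\n"]
for row in format_table(sample):
    print(row)
```

To use the real file, open it with open("artists.txt", "r"), pass the file handle to format_table in place of sample, and close the file afterward.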
The output of the program run is shown next:
INTERACTIVE – Dealing with a fruit file
Okay, now that you have seen some examples of a data file being used as input, see if you can predict the exact output of the program below. Look closely at the tabs in the Trinket window so you can examine the “fruits.txt” data file.
Data File as Output
As with input, there are many uses and advantages to having your program output get sent to a text file:
- Too much output: Sometimes your program will produce so much output that it is difficult to view on the screen as it scrolls on by. Saving the output to a file allows you to view it at your leisure.
- Have permanent record: When you’re done running your code and you shut down Python, all of your program output disappears. If you would like to store the results of your program runs permanently, you could store that output to a file.
- Export program output for various uses: Once your program output is saved as a text file, you can then open that file with other software, such as a word processor, spreadsheet, statistical tool, etc.
When your program is going to create an output file, it must first use the open() function to create a file handle, using the “w” parameter to indicate you will be writing to the file. If you use write mode and the external data file already exists, it will be overwritten, and the former file contents will be lost. The append (a) mode is used if you want your program to just tack data onto the end of an existing file. If the file does not exist when it is opened in append mode, it will be created.
The write method is used to send a string parameter to a text file. The program below will open a file named “test.txt” in the write mode. It then writes two strings to that file and closes the file.
dataFile = open("test.txt", "w")
dataFile.write("CS101")
dataFile.write("Rocks!")
dataFile.close()
The program output:
Since this program had no print statements in it, there was no screen output from the program. When you hit the F5 key to run the program, you will see only the >>> prompt. This is a little disconcerting at first. To evaluate our program, though, we just need to use a text editor to examine the output file “test.txt”. Below, we show the contents of this file. Unlike the print function, the write() method does not add a newline to the end of the output.
The test.txt data file:
CS101Rocks!
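To place each string on its own line, the newline has to be written explicitly. The sketch below shows this along with the append mode described earlier. The temporary folder and the extra string "Go Lions!" are assumptions made so the example can run without disturbing any existing file.

```python
import os
import tempfile

# Work in a temporary folder so no existing file is disturbed
folder = tempfile.mkdtemp()
path = os.path.join(folder, "test.txt")

# Write mode creates the file (or replaces its contents)
dataFile = open(path, "w")
dataFile.write("CS101\n")      # explicit newline ends the first line
dataFile.write("Rocks!\n")
dataFile.close()

# Append mode adds to the end of the existing contents
dataFile = open(path, "a")
dataFile.write("Go Lions!\n")
dataFile.close()

# Read the file back to confirm what was stored
infile = open(path, "r")
lines = []
for line in infile:
    lines.append(line.rstrip())
infile.close()
print(lines)
```

Had the second open() call used "w" instead of "a", the first two lines would have been lost and the file would hold only the last string.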
Problem: The Roaring Lions Run Club has hired you to develop a running activity log application. When a runner completes a run, she will need to input: (1) the date, (2) the distance in miles, and the time of the run in (3) minutes and (4) seconds. The program will compute the average mile pace for the run. It will then output this average mile pace to a data file, along with the four inputs. Each running activity should get logged as a comma-separated line at the end of the file. Having comma-separated values (CSV) allows this text file to be opened by many common programs, such as a spreadsheet.
Knowing that this single file will have data added to it over a long period of time tells a programmer to use the append (a) file access mode when the file is opened. There is a little math involved when we calculate the average mile pace. Before the coding process begins, we must have a plan for this computation. Let’s take a look at a test case and go through the necessary formulas using paper and pencil:
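One way to work through such a test case is sketched below, using a run of 3.1 miles in 28 minutes 33 seconds, which matches one of the records in the “activities.csv” listing later in the chapter. Rounding the pace to the nearest whole second is our assumption. It reproduces the paces shown in that listing.

```python
import math

distance = 3.1      # miles
minutes = 28
seconds = 33

# Step 1: convert the run time entirely to seconds
totalSeconds = minutes * 60 + seconds            # 1713

# Step 2: seconds needed to cover one mile, to the nearest second
paceTotal = round(totalSeconds / distance)       # 553

# Step 3: split the pace back into minutes and seconds
paceMinutes = math.floor(paceTotal / 60)         # 9
paceSeconds = paceTotal - paceMinutes * 60       # 13

# zfill(2) pads single-digit seconds, so 5 is shown as 05
print(str(paceMinutes) + ":" + str(paceSeconds).zfill(2))  # 9:13
```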
The main construct in this program is a while loop that allows the user to log as many runs as they wish. During each iteration of the loop, the user will provide the four input values discussed above. The average mile pace is then computed as illustrated in the test case. Note that we import the math module in order to use the floor() function. An outputString is created to store the five items that will be sent to the file along with the comma separators. This string terminates with a newline so that each logged activity will appear on its own line. Because that instruction is so long, we continue it on a second line by using the backslash (\) character. When the user has no more activities to log, the loop concludes and the close() method for the dataFile is called. The full program is shown next:
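A minimal sketch of a program along these lines follows. It is one possible arrangement, not necessarily the original listing. The function names computePace, buildRecord, and main, the prompt wording, and the choice to round the pace to the nearest second are all assumptions made for illustration.

```python
import math

def computePace(distance, minutes, seconds):
    """Return the average mile pace as a string such as 9:13."""
    totalSeconds = minutes * 60 + seconds
    paceTotal = round(totalSeconds / distance)   # whole seconds per mile
    paceMinutes = math.floor(paceTotal / 60)
    paceSeconds = paceTotal - paceMinutes * 60
    return str(paceMinutes) + ":" + str(paceSeconds).zfill(2)

def buildRecord(date, distance, minutes, seconds):
    """Build one comma-separated line from the four inputs (all strings)."""
    pace = computePace(float(distance), int(minutes), int(seconds))
    # Five items separated by commas, ending with a newline
    outputString = date + "," + distance + "," + minutes + "," + \
                   seconds + "," + pace + "\n"
    return outputString

def main():
    # Append mode keeps earlier runs and adds new ones at the end
    dataFile = open("activities.csv", "a")
    again = "y"
    while again == "y":
        date = input("Date of run: ")
        distance = input("Distance in miles: ")
        minutes = input("Minutes: ")
        seconds = input("Seconds: ")
        dataFile.write(buildRecord(date, distance, minutes, seconds))
        again = input("Log another run (y/n)? ")
    dataFile.close()

# Call main() to start logging runs interactively.
```

Keeping the pace calculation and the record building in their own functions lets them be checked without typing any input. For example, buildRecord("3/20/2021", "3.1", "28", "33") returns the line 3/20/2021,3.1,28,33,9:13 followed by a newline.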
Each time the program is executed, new run records are added to the “activities.csv” data file. Suppose the user had already entered a couple of runs before they ran the program with the following interaction:
After running the program multiple times, “activities.csv” looks like this:
3/13/2021,3.1,29,47,9:36
3/15/2021,4,38,40,9:40
3/20/2021,3.1,28,33,9:13
3/22/2021,3.1,28,9,9:05
As mentioned, a CSV file is an ASCII file that can be opened or imported into many applications. We opened “activities.csv” using Excel and it was pretty easy to make the chart below:
In Chapter 12, we will look at how you could create charts like the one above using the Python Turtle graphics module. In Chapter 13, we examine how to turn that text-based interaction into a graphical user interface (GUI) like the ones you are used to in everyday apps. Finally, in Chapter 14, we take a look at how to turn this app into a web-based application.
Chapter Review Exercises:
10.1. Name three uses or advantages of using a text file for program input.
10.2. Name three uses or advantages of using a text file for program output.
Programming Projects:
10.1. A text file named “superbowl.txt” has a list that includes the winner of every Super Bowl. The first few lines of the file look like this:
Write a Python program that will read the text file and print out all of the winners, along with the Super Bowl number and the year. The 1st Super Bowl occurred during the 1966 season. Display tabs between each column and include column headings. The first few lines of the program output should look like this: