Skip to Main Content

Importing Data and Writing Files

Open Text Files

Simply use the built-in function open(): open(filename, r).

First, you have to know the filepath of the file you plan to open. For example, if you placed the file on your desktop, the filepath of the file would be C:\Users\your_user_name\Desktop\hello.txt (this assumes you are using a windows machine). There are two ways to write the path in Python, explained below.

# use \\ when writing the path of the file
example1_data = open("C:\\Users\\your_username\\Desktop\\example1.txt", "r")  

# another example for opening a text file, use / when writing the path of the file
example2_data = open("C:/Users/your_username/Desktop/example1.txt","r")  

# get a list of all lines in the text file
lines = example1_data.readlines()  

# read the whole file, line by line
for i in range(0, len(lines)): # from the first to the last line in the file
     print lines[i]     # print the line (lines are printed one at a time)
example1_data.close()  # close the file
example2_data.close()

Tutorials for learning the open() function can be found at python.org (Built-in Functions) or tutorialspoint.

Open .csv files

You can use the open() function to open a .csv format file and work on it as well. Here we introduce a Python module csv to work with .csv format files.

import csv
myfile = open("C:\\Users\\your_username\\Desktop\\example1.csv", "r")  
# "r" means that we only want to read the file and get the information from the file, 
# there are couple options

csvfile = csv.reader(myfile)  # get rows in the file
for row in csvfile:  # print all rows in the file
     print row

Note: Another way to read rows in the file is

for row in myfile:
     print row

The difference between these two methods is the type of rows: csv.reader produces a list of several elements, but the for loop prints each line as a string.

More detail about the module csv can be found at python.org. (CSV File Reading and Writing).

Open .xlsx files

There are several Python packages available to work with Excel. We introduce the package openpyxl here.

import openpyxl
# open a excel file with .xlsx format
excelfile = openpyxl.load_workbook('C:\\Users\\your_username\\Documents\\example1.xlsx')  

excelfile.get_sheet_names()  # get names of all spreadsheet in the file
sheet1 = excelfile.get_sheet_by_name("Sheet1")  # get the first spreadsheet by name
sheet1.max_row  # get the number of rows in the sheet
sheet1.max_column  # get the number of columns in the sheet
sheet1["A10"].value  # read the value in the cell A10
sheet1.cell(row=10, column=1).value  # another way to read the cell A10

# read the whole spreadsheet
for row_index in range(1, (sheet1.max_row+1)):
     print "Row:", row_index
     for col_index in range(1, (sheet1.max_column+1)):
          print "Column:", col_index, sheet1.cell(row=row_index, column=col_index).value   # Get cell object by row, col

Note:

  1. "Sheet1" is an example of the name of a spreadsheet, you should adjust it depending on the name of the spreadsheet you see.
  2. To read the whole spreadsheet, you can read it by rows or by columns; which option is best will depend on the data structure. Accordingly, it is good practice to read the first row or column before reading the entire sheet.
  3. Think about why 1 is added (for example, sheet1.max_column+1) in the loop when reading the whole spreadsheet.

More detail about the package openpyxl can be found on the official page of the package.

Read the content of a URL with Python

We use the same example with different file formats above to demonstrate how to read the content of an URL with Python.

import urllib  # the module for reading a url
import csv  # the module for opening a .csv format file
import pandas  # the module for opening a .xlsx file

# read a .txt format file with a given URL
onlinefile_text = urllib.urlopen("https://miamioh.instructure.com/courses/38817/files/3039066/download")
local_text = onlinefile_text.read()
print local_text  # read all rows in the file

# read a .csv format file with a given URL
onlinefile_csv = urllib.urlopen("https://miamioh.instructure.com/courses/38817/files/3039068/download")
local_csv = csv.reader(onlinefile_csv)

for row in local_csv:  # read all rows in the file
     print row

# read a .xlsx format file with a given URL
onlinefile_excel = urllib.urlopen("https://miamioh.instructure.com/courses/38817/files/3039067/download")
local_excel = pandas.ExcelFile(onlinefile_excel)
local_excel.sheet_names  # check out names of spreadsheets
wholetable = local_excel.parse(local_excel.sheet_names[0], header=0)  
# header = 0 means the first row is the header, use header = None if no header
print wholetable
wholetable["Product_ID"]  # only read the column of Product ID
wholetable.head()  # read the first five rows in the table
wholetable["Customer_ID"].head()  # read the first five rows of Customer ID in the table
wholetable[["Customer_ID", "Date"]].head()

Note:

  1. Tutorials for learning the module urllib can be found at python.org (urllib - Open arbitrary resources by URL).
  2. More detail about the data frame in Python can be found in the Managing Data section.

.txt format

Now the online stationery store would like to provide email accounts for its customers. The email accounts will be Customer_ID@happy.com. We want to create a file that contains only the customers' emails. In addition, we illustrate how to add more rows into an existing file.

oldtext = open("C:\\Users\\your_username\\Documents\\example1.txt", "r")
newtext = open("C:\\Users\\your_username\\Documents\\emaillist.txt","w")
lines = oldtext.readlines()  # write data to the file, the file is created if it doesn't exist
newtext.write("Email\n")  # give a title of the column and move to the next line
for line in lines[1:len(lines)]:  # here we want to skip the first line since it is the header
     line.replace("\n", '') # remove the \n in the string
     case = line.split() # split the string by tab
     newtext.write(case[1]+'@happy.com\n') # write email address into the file
oldtext.close()  # close the file oldtext
newtext.close()  # close the file newtext

# We show how to add more rows into an existing file in the following 
addtext = open("C:\\Users\\your_username\\Documents\\emaillist.txt","a")
addtext.write("michelangeloftt@happy.com\n")  # append the email into the file
addtext.write("everarduvi@happy.com\n")  # append the email into the file
addtext.close()  # close the file addtext

Note: We can just add a new column to the original file and save it. It is not necessary to create a new file.

.csv format

For the same online stationery store, we want to create a new column of customers' emails and add this column to the original data and save the data to a new file. In addition, we illustrate how to append rows to an existing file.

import csv
oldfile = open("C:\\Users\\your_username\\Documents\\example1.csv", "r")
newfile = open("C:\\Users\\your_username\\Documents\\emaillist.csv", "wb")  
# "wb" means that writing binary data to the file, the file is created if it doesn't exist

csvfile = csv.reader(oldfile)  # get rows in oldfile
newwrite = csv.writer(newfile)  # ready to write data to newfile
line_no = 0  # set up the line number from 0
for line in csvfile:
     if line_no == 0:  # if this is the first line, we want to add a title for the column
          line.append("Email")  # name a title for the new column and append it to the row
          newwrite.writerow(line)  # write the line to newfile
     else:
          line.append(line[1]+'@happy.com')  # append the email to the row
          newwrite.writerow(line)  # write the line to newfile
     line_no = +1  # add 1 to the line number 

oldfile.close()  # close oldfile
newfile.close()  # close newfile

# We show how to add more rows in an existing file in the following
existfile = open("C:\\Users\\your_username\\Documents\\example1.csv", "ab")  
# "ab" command means appending binary data to the file, the file is created if it doesn't exist
addwrite = csv.writer(existfile)  # append rows to the file
addwrite.writerow(("#543275", "AibekS", "S15037", 1.75, 21, "12/19/2015"))
addwrite.writerow(("#417893", "RonaldL", "T29643", 1.25, 18, "7/24/2015"))
existfile.close()  #  close the file

Note:

1. Use "a+" instead of "a" in the open() function for reading and appending.

2. We can just add the column to the original data and save the data to the original file without creating a new file.

.xlsx format

There are several Python packages that we can use for writing data to an excel file. Here we introduce the Python package xlsxwriter. The following example shows how we get data from an excel file and append a new column to the data, then we save the data to a new file. We can simply save the data back to the original file without creating a new file as well.

import openpyxl
import xlsxwriter
excelfile = openpyxl.load_workbook("C:\\Users\\Ying-Ju\\Google Drive\\FSB\\CAD\\importing data\\example1.xlsx")  # original data file
workbook = xlsxwriter.Workbook("C:\\Users\\Ying-Ju\\Google Drive\\FSB\\CAD\\importing data\\newdata.xlsx")  # new data file 
sheet1 = excelfile.get_sheet_by_name("Sheet1")  # get the data we want to use
row_no = sheet1.max_row  # get the number of rows in the spreadsheet
col_no = sheet1.max_column  # get the number of columns in the spreadsheet
worksheet = workbook.add_worksheet()  # create a new spreadsheet in the new file

bold = workbook.add_format({'bold': True, 'font_size': 16}) # set the font style for the header
datefont = workbook.add_format({'font_size':14, 'num_format': 'mm/dd/yy'})  # set the format for calendar data
fontsize = workbook.add_format({'font_size': 14}) # set the general font style
worksheet.set_column(0, 0, 20)  # set the width of Column A to 20
worksheet.set_column(1, 5, 15)  # set the width of Columns B-F to 25
worksheet.set_column(6, 6, 35)  # set the width of Column G to 35

for row_index in range(0, row_no):
     for col_index in range(0, (1+col_no)):
          if row_index == 0:  # put the header 
               if col_index == col_no:  # add a name of new column
                    worksheet.write(0, col_index, "Email", bold)
               else:
                    worksheet.write(row_index, col_index, sheet1.cell(row=1, column=(col_index+1)).value, bold)  
                   # put names of other columns 
          else: # put cases
               if col_index == (col_no-1): # put the column of Date with the format mm/dd/yy specified before
                    worksheet.write(row_index, col_index, sheet1.cell(row=(row_index+1), column=(col_index+1)).value, datefont)
               else: 
                    if col_index == col_no: # add a column for emails to the file
                         worksheet.write(row_index, col_index, str(sheet1.cell(row=(row_index+1), column=2).value)+"@happy.com", fontsize) 
                    else:
                         worksheet.write(row_index, col_index, sheet1.cell(row=(row_index+1), column=(col_index+1)).value, fontsize) 
                         # put other columns in the file
workbook.close() # close the file

Note: In fact, we can just use the package openpyxl to write data to files. However, the way xlsxwriter package sets up the format (width of columns, font size, font color, etc.) in excel files is much easier and simple.

More detail about the package xlsxwriter can be found on the official page of the package.

Center for Analytics and Data Science

165 McVey Data Science Building
105 Tallawanda Rd
Oxford, OH 45056