
Reading CSV files in Python 3

In this Python 3 tutorial, we will learn how to import and use CSV files in Python. First of all, you should know what a CSV file is: CSV stands for comma-separated values.

Let’s import our data file, mpg.csv (download it by clicking on the file name), which contains fuel economy data for 234 cars.

  • mpg : miles per gallon
  • class : car classification
  • cty : city mpg
  • cyl : # of cylinders
  • displ : engine displacement in liters
  • drv : f = front-wheel drive, r = rear wheel drive, 4 = 4wd
  • fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)
  • hwy : highway mpg
  • manufacturer : automobile manufacturer
  • model : model of car
  • trans : type of transmission
  • year : model year

Program to import, read, and print the first two dictionaries of the CSV file.

import csv


with open('mpg.csv') as csvfile:
    mpg = list(csv.DictReader(csvfile))
 
mpg[:2]    # The first two dictionaries in our list.


Output:

 [OrderedDict([('', '1'), 
('manufacturer', 'audi'), 
('model', 'a4'), 
('displ', '1.8'), 
('year', '1999'), 
('cyl', '4'), 
('trans', 'auto(l5)'), 
('drv', 'f'), 
('cty', '18'), 
('hwy', '29'), 
('fl', 'p'), 
('class', 'compact')]), 
OrderedDict([('', '2'), 
('manufacturer', 'audi'), 
('model', 'a4'), 
('displ', '1.8'), 
('year', '1999'), 
('cyl', '4'), 
('trans', 'manual(m5)'), 
('drv', 'f'), 
('cty', '21'), 
('hwy', '29'), 
('fl', 'p'), 
('class', 'compact')])]

 

Note:

  1. csv.DictReader reads each row of our CSV file as a dictionary (an OrderedDict on Python 3.6 and 3.7, as in the output above; a plain dict on Python 3.8 and later).
  2. len(mpg) shows that our list consists of 234 dictionaries (the total number of records in the CSV file).
  3. mpg[0].keys() gives us the column names of our CSV file.
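The three notes above can be tried out without the full data file. Here is a minimal sketch, assuming a two-row sample held in memory. The rows are copied from the output shown earlier; `SAMPLE`, `rows`, `num_records` and `column_names` are names chosen for this illustration, and the exact quoting used in the real mpg.csv may differ.

```python
import csv
import io

# Two sample rows copied from the tutorial output; they stand in for mpg.csv.
# The first column has an empty header, so its key is the empty string ''.
SAMPLE = "\n".join([
    ",manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class",
    "1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact",
    "2,audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact",
])

# io.StringIO makes the string behave like an open text file.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

num_records = len(rows)               # number of data rows (header excluded)
column_names = list(rows[0].keys())   # column names taken from the header line

print(num_records)
print(column_names)
print(rows[0]['cty'])                 # every value is read as a string
```

With the real file, replacing `io.StringIO(SAMPLE)` with the file object from `open('mpg.csv')` should give the 234 records described above.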

The code below finds the average cty fuel economy across all cars. All values are read as strings, so we need to convert them to float.

sum(float(d['cty']) for d in mpg) / len(mpg)  #output = 16.86

Similarly, the code below finds the average hwy fuel economy across all cars.

sum(float(d['hwy']) for d in mpg) / len(mpg)  #Output = 23.44
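The two expressions above differ only in the column name, so they could be wrapped in a small helper. This is a sketch, not part of the original tutorial: `column_mean` is a hypothetical name, and `sample` holds only the cty and hwy values of the first two cars from the output shown earlier.

```python
def column_mean(records, column):
    """Average a numeric column stored as strings in a list of dictionaries."""
    values = [float(r[column]) for r in records]  # convert each string to float
    return sum(values) / len(values)


# cty and hwy values of the first two cars in the output above.
sample = [
    {'cty': '18', 'hwy': '29'},
    {'cty': '21', 'hwy': '29'},
]

print(column_mean(sample, 'cty'))  # 19.5
print(column_mean(sample, 'hwy'))  # 29.0
```

Called as `column_mean(mpg, 'cty')` on the full list, it performs the same calculation as the one-line expression.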

Below is the complete code for reading a CSV file in Python and working with its data.

# -*- coding: utf-8 -*-
"""
Created on Thu Aug  3 15:17:27 2017
 
@author: Kundan.Kumar
"""
 
import csv
 
with open('mpg.csv') as csvfile:
    mpg = list(csv.DictReader(csvfile))
 
print(mpg[:2]) # The first two dictionaries in our list.
print('Total Number of record in CSV:',len(mpg)) # print the length of the list of dictionaries.
 
print('Name of columns:',mpg[0].keys()) # It gives the column name of our CSV file
 
av_cty_fuel_ec  = sum(float(d['cty']) for d in mpg) / len(mpg) # find the average cty fuel economy across all cars
av_hwy_fuel_ec = sum(float(d['hwy']) for d in mpg) / len(mpg) # find the average hwy fuel economy across all cars
 
print('Average city fuel economy for all car:', av_cty_fuel_ec)
print('Average hiway fuel economy for all car:', av_hwy_fuel_ec)
 
cylinders = set(d['cyl'] for d in mpg) # set of the unique cylinder counts (as strings)
 
print(cylinders)    
 
# Group the cars by number of cylinders and find the average cty mpg for each group.
CtyMpgByCyl = []
 
for c in cylinders: # iterate over all the cylinder levels
    summpg = 0
    cyltypecount = 0
    for d in mpg: # iterate over all dictionaries
        if d['cyl'] == c: # if the cylinder level type matches,
            summpg += float(d['cty']) # add the cty mpg
            cyltypecount += 1 # increment the count
    CtyMpgByCyl.append((c, summpg / cyltypecount)) # append the tuple ('cylinder', 'avg mpg')
 
CtyMpgByCyl.sort(key=lambda x: x[0])
print(CtyMpgByCyl)

Output:

[OrderedDict([('', '1'), ('manufacturer', 'audi'), ('model', 'a4'), ('displ', '1.8'), ('year', '1999'), ('cyl', '4'), ('trans', 'auto(l5)'), ('drv', 'f'), ('cty', '18'), ('hwy', '29'), ('fl', 'p'), ('class', 'compact')]), 
OrderedDict([('', '2'), ('manufacturer', 'audi'), ('model', 'a4'), ('displ', '1.8'), ('year', '1999'), ('cyl', '4'), ('trans', 'manual(m5)'), ('drv', 'f'), ('cty', '21'), ('hwy', '29'), ('fl', 'p'), ('class', 'compact')])]
Total Number of record in CSV: 234
Name of columns:  odict_keys(['', 'manufacturer', 'model', 'displ', 'year', 'cyl', 'trans', 'drv', 'cty', 'hwy', 'fl', 'class'])
Average city fuel economy for all car:  16.858974358974358
Average hiway fuel economy for all car: 23.44017094017094
{'4', '6', '8', '5'}
[('4', 21.012345679012345), ('5', 20.5), ('6', 16.21518987341772), ('8', 12.571428571428571)]
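The grouping loop in the complete code scans the whole list once per cylinder value, and it sorts the cylinder counts as strings (which works here because every count is a single digit, but '10' would sort before '4'). As an alternative sketch, the same averages can be collected in a single pass with collections.defaultdict, converting the group key to int. The name `mean_by_group` and the three-row `sample` are illustrative assumptions, not part of the original code or data.

```python
from collections import defaultdict


def mean_by_group(records, group_col, value_col):
    """Return (group, mean) tuples sorted by group, using one pass over the records."""
    groups = defaultdict(list)
    for r in records:
        # Convert the group key to int so that sorting is numeric, not alphabetical.
        groups[int(r[group_col])].append(float(r[value_col]))
    return sorted((key, sum(vals) / len(vals)) for key, vals in groups.items())


# Small made-up sample for illustration only.
sample = [
    {'cyl': '4', 'cty': '18'},
    {'cyl': '4', 'cty': '21'},
    {'cyl': '6', 'cty': '16'},
]

result = mean_by_group(sample, 'cyl', 'cty')
print(result)  # [(4, 19.5), (6, 16.0)]
```

Calling `mean_by_group(mpg, 'cyl', 'cty')` on the full list should give the same averages as above, with integer cylinder counts instead of strings.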

 

If you find this tutorial useful, please share it, and post any questions you have.

You can copy the above code and run it, but first download the mpg.csv file and place it in your working directory.

Reference: Coursera data science course.

 

 
