How could the content be improved?
The following section introduces how data can be processed using loops:
Automating data processing using For Loops
I believe it would also be advantageous to have a similar section in the following:
Reading CSV Data Using Pandas
Here we could also briefly introduce Python generators. For example, consider a CSV file whose entries are name, age, and location. We can parse this data into a DataFrame using a generator. Imagine location is a comma-separated string field and we want to read latitude and longitude separately.
| name | age | location |
|------|-----|----------|
| John | 50 | 123341,123321 |
| Emily | 25 | 321321,123321 |
| Wick | 35 | 123341,654789 |
| Raj | 40 | 987789,123321 |
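
    import csv
    import pandas as pd

    def transform_lines(csv_path):
        """Yield one cleaned row at a time: [name, age, latitude, longitude]."""
        # The location field contains a comma, so it must be quoted in the file.
        with open(csv_path, newline="") as csv_file:
            reader = csv.reader(csv_file)
            next(reader, None)  # skip the header row
            for name, age, location in reader:
                lat, lng = location.split(",")
                yield [name, int(age), float(lat), float(lng)]

    lines = transform_lines("./data.csv")
    # Pass the column names explicitly so they do not end up as a data row.
    df = pd.DataFrame(lines, columns=["Name", "Age", "Latitude", "Longitude"])
    print(df.head())
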
This is especially useful for large datasets, where loading all of the data in text form at once consumes a lot of memory.
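
To make the memory point concrete, here is a minimal sketch, assuming the rows are consumed in fixed-size batches rather than all at once. The helper `iter_chunks`, the `chunk_size` value, and the in-memory sample file are illustrative assumptions, not part of the lesson. The location field is quoted in the sample so that `csv.reader` keeps it as a single field.

```python
import csv
import io
from itertools import islice

import pandas as pd

COLUMNS = ["Name", "Age", "Latitude", "Longitude"]


def transform_rows(csv_file):
    """Lazily yield one cleaned row at a time from an open CSV file."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    for name, age, location in reader:
        lat, lng = location.split(",")
        yield [name, int(age), float(lat), float(lng)]


def iter_chunks(rows, chunk_size):
    """Hypothetical helper: group rows into DataFrames of at most chunk_size rows."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield pd.DataFrame(chunk, columns=COLUMNS)


# In-memory stand-in for the example file (an assumption for illustration).
sample = io.StringIO(
    "name,age,location\n"
    'John,50,"123341,123321"\n'
    'Emily,25,"321321,123321"\n'
    'Wick,35,"123341,654789"\n'
    'Raj,40,"987789,123321"\n'
)

chunk_lengths = []
total_age = 0
for chunk in iter_chunks(transform_rows(sample), chunk_size=3):
    # Only this batch of rows exists as a DataFrame at this point.
    chunk_lengths.append(len(chunk))
    total_age += int(chunk["Age"].sum())

print(chunk_lengths, total_age)
```

Only one batch of rows is held as a DataFrame at a time, so per-batch summaries (here, a running total of ages) can be computed without materializing the whole file.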