Visualization - Area Plot
IBM Data Science Specialization: Area Plot
%%capture
# xlrd 2.0+ reads only legacy .xls files; pandas needs openpyxl to read .xlsx
!pip3 install openpyxl
import matplotlib.pyplot as plt
import pandas as pd
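# Load the worksheet, skipping the 20 header rows and the 2 footer rows
df_can = pd.read_excel(
    './data/ibm/canada.xlsx',
    sheet_name='Canada by Citizenship',
    skiprows=range(20),
    skipfooter=2
)
# Year columns are read as integers; convert every column label to str
df_can.columns = list(map(str, df_can.columns))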
drops = [
    'AREA',
    'REG',
    'DEV',
    'Type',
    'Coverage'
]
df_can.drop(columns=drops, inplace=True)
columns = {
    'OdName': 'Country',
    'AreaName': 'Continent',
    'RegName': 'Region'
}
df_can.rename(columns=columns, inplace=True)
df_can.set_index('Country', inplace=True)
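# Sum only the numeric (year) columns; without numeric_only=True, recent
# pandas versions may raise a TypeError on the text columns
df_can['Total'] = df_can.sum(axis=1, numeric_only=True)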
years = list(map(str, range(1980, 2014)))
df_can.sort_values('Total', ascending=False, axis=0, inplace=True)
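# Keep the five countries with the largest totals, then transpose so that
# years form the index and countries form the columns
df_top5 = df_can.head(5)
df_top5 = df_top5[years].transpose()
df_top5.head()
# Convert the year labels from str to int so the x-axis is numeric
df_top5.index = df_top5.index.map(int)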
df_top5.plot(
    kind='area',
    stacked=False,
    figsize=(20, 10)
)
plt.title('Immigration Trend of Top 5 Countries')
plt.ylabel('Number of Immigrants')
plt.xlabel('Years')
plt.show()
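Unstacked area plots use a default transparency (alpha value) of 0.5 so that overlapping regions remain visible. We can change this value by passing the alpha parameter, which ranges from 0 (fully transparent) to 1 (fully opaque).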
df_top5.plot(
    kind='area',
    alpha=0.25,
    stacked=False,
    figsize=(20, 10),
)
plt.title('Immigration Trend of Top 5 Countries')
plt.ylabel('Number of Immigrants')
plt.xlabel('Years')
plt.show()
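As a quick check of the transparency settings described above, the sketch below plots a small made-up frame and reads the alpha value back from the filled regions. The `demo` data and all variable names are illustrative and not part of the immigration dataset. It also assumes that pandas draws each filled area as a matplotlib collection on the returned Axes, which is how current versions appear to behave.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for df_top5
demo = pd.DataFrame(
    {'A': [1, 3, 2, 4], 'B': [2, 1, 3, 2]},
    index=[2010, 2011, 2012, 2013],
)

# Unstacked area plot with the default transparency
ax_default = demo.plot(kind='area', stacked=False)
default_alphas = [c.get_alpha() for c in ax_default.collections]

# Same plot with an explicit, lighter transparency
ax_light = demo.plot(kind='area', stacked=False, alpha=0.25)
light_alphas = [c.get_alpha() for c in ax_light.collections]

print('default alpha per area:', default_alphas)
print('explicit alpha per area:', light_alphas)
```

In a notebook both figures render at the end of the cell; in a plain script, call plt.show() to display them.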
Option 1: Scripting layer (procedural method) - using matplotlib.pyplot as plt
With the scripting layer, you call functions on plt (matplotlib.pyplot) one after another, and each call modifies the current plot; for example, plt.title(...) adds a title and plt.xlabel(...) labels the x-axis.
# option 1: this is what we have been using so far
df_top5.plot(kind='area', alpha=0.35, figsize=(20, 10))
plt.title('Immigration Trend of Top 5 Countries')
plt.ylabel('Number of Immigrants')
plt.xlabel('Years')
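To make the "procedural" part concrete, here is a minimal sketch, using a made-up `demo` frame rather than the immigration data, showing that pyplot functions act on whichever Axes is currently active. The names `same_axes` and `title_on_axes` are illustrative only.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for df_top5
demo = pd.DataFrame(
    {'A': [1, 3, 2], 'B': [2, 1, 3]},
    index=[2011, 2012, 2013],
)

# pandas creates a new figure and returns its Axes
ax = demo.plot(kind='area', alpha=0.35)

# pyplot functions implicitly target the current Axes
plt.title('Demo Area Plot')
plt.xlabel('Years')

# The current Axes is the one pandas just created, so the title landed on it
same_axes = plt.gca() is ax
title_on_axes = ax.get_title()
print(same_axes, title_on_axes)
```

Because the target is implicit, this style is convenient for a single plot but harder to follow once several figures or subplots are open.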
Option 2: Artist layer (object-oriented method) - using an Axes instance from Matplotlib (preferred)
You can store the Axes instance of the current plot in a variable (e.g. ax) and add elements by calling its methods. The method names mirror the pyplot functions with a set_ prefix: use ax.set_title() instead of plt.title() to add a title, or ax.set_xlabel() instead of plt.xlabel() to label the x-axis.
This option can be more transparent and flexible for advanced plots, for example when one figure contains several subplots.
# option 2: preferred option with more flexibility
ax = df_top5.plot(kind='area', alpha=0.35, figsize=(20, 10))
ax.set_title('Immigration Trend of Top 5 Countries')
ax.set_ylabel('Number of Immigrants')
ax.set_xlabel('Years')
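The flexibility of the Artist layer shows most clearly when a figure holds more than one plot. The following sketch, again on a made-up `demo` frame with illustrative variable names, creates two Axes with plt.subplots and passes each one to pandas through the ax parameter, so every label is set on an explicitly named target.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for df_top5
demo = pd.DataFrame(
    {'A': [1, 3, 2, 4], 'B': [2, 1, 3, 2]},
    index=[2010, 2011, 2012, 2013],
)

# One figure with two Axes, each kept in its own variable
fig, (ax_stacked, ax_unstacked) = plt.subplots(1, 2, figsize=(12, 4))

# Left: stacked area plot (the pandas default for kind='area')
demo.plot(kind='area', ax=ax_stacked)
ax_stacked.set_title('Stacked')
ax_stacked.set_xlabel('Years')

# Right: unstacked area plot drawn on the second Axes
demo.plot(kind='area', stacked=False, ax=ax_unstacked)
ax_unstacked.set_title('Unstacked')
ax_unstacked.set_xlabel('Years')

fig.tight_layout()
```

With the scripting layer, the same result would require switching the current Axes between calls; here each call names its target directly.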