The definition of maternal mortality is:

> "A death of a woman while pregnant or within 42 days of termination of pregnancy, irrespective of the duration and site of pregnancy, from any cause related to or aggravated by the pregnancy or its management, but not from accidental or incidental causes."
The definition of maternal mortality ratio is:
> "The number of maternal deaths during a given time period per 100,000 live births during the same time period."
In this notebook we will create a long-term time-series of the maternal mortality ratio. To do this we will combine three existing datasets:

* Gapminder (1751-2008)
* OECD (1960-2020)
* World Health Organization (2000-2017)
We make the assumption that their methods are comparable and that the data can be combined without transformation.
In years where the time-series overlap we use the data from the WHO.
This document will go through the steps required to create the data for this chart: https://ourworldindata.org/grapher/maternal-mortality
## Downloading the Gapminder data
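Before downloading anything we load the libraries used throughout the notebook and define a small helper that maps country names onto the standard OWID entity names. A lightly commented sketch of that setup; the entity-mapping dictionary `stan_dict` is built later from a country-mapping CSV:

```{python}
import zipfile
from functools import reduce
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests


def standardise_entities(entities_to_standardise: pd.Series, stan_dict: dict) -> pd.Series:
    """Standardise entity names so they match OWID entities."""
    standardised_entities = entities_to_standardise.map(stan_dict)
    # Flag any country names that are missing from the mapping
    for ent in entities_to_standardise.drop_duplicates().to_list():
        if ent not in stan_dict:
            print(f"{ent} missing from entity mapping")
    assert standardised_entities.isna().sum() == 0
    return standardised_entities
```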
Firstly we download the data from Gapminder and save it locally. Then we read in the Gapminder data, clean up the column names and take a look at the data.
```{python}
r = requests.get("https://www.gapminder.org/documentation/documentation/gapdata010.xls")
assert r.ok
Path("data/input/").mkdir(parents=True, exist_ok=True)
output = open("data/input/gapminder.xls", "wb")
output.write(r.content)
output.close()
```
Some preliminary cleaning of the data: 1) remove the first 16 rows, 2) remove white space from the country names and 3) remove rows where the maternal mortality rate is NA.
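A sketch of these steps, assuming the sheet's columns are named `Country` and `MMR`:

```{python}
df = pd.read_excel("data/input/gapminder.xls")
# The first 16 rows are header notes rather than data
df_d = df.drop(df.index[0:16]).reset_index()
df_d["Country"] = df_d["Country"].str.strip()
df_d = df_d[~pd.isna(df_d["MMR"])]
```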
## Cleaning the Gapminder data

There are some issues with the Gapminder data; we clean them manually in the code after this list.
* The first three rows of the 'year' column appear to be wrong: each row seems to cover 110 years, but in [another source](https://docs.google.com/spreadsheets/u/0/d/14ZtQy9kd0pMRKWg_zKsTg3qKHoGtflj-Ekal9gIPZ4A/pub?gid=1#) they cover only 10 years.
* I will replace them manually here with the middle of each decade, as is used in the other data from the same source.
* The year for Finland has been entered incorrectly three times: there are two 1772s, two 1775s and two 1967s, and the second of each should be 1872, 1875 and 1957 respectively.
* There are also two errors for New Zealand, and one each for Sweden and the US.
* The maternal mortality rates for Ireland, Denmark and Sri Lanka drop to zero at some points; we will remove these rows.
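A condensed version of these fixes; the specific `Maternal deaths` values are used only to pick out which of the duplicated rows to re-date:

```{python}
# Re-date the first three rows to the middle of each decade
df_d.loc[:2, "year"] = ["1875", "1885", "1895"]

# Finland, Sweden and the US: fix the mis-entered years
df_d.loc[(df_d["year"] == 1772) & (df_d["Country"] == "Finland") & (df_d["Maternal deaths"] == 487), "year"] = 1872
df_d.loc[(df_d["year"] == 1775) & (df_d["Country"] == "Finland") & (df_d["Maternal deaths"] == 629), "year"] = 1875
df_d.loc[(df_d["year"] == 1967) & (df_d["Country"] == "Finland") & (df_d["Maternal deaths"] == 77), "year"] = 1957
df_d.loc[(df_d["year"] == 1967) & (df_d["Country"] == "Sweden") & (df_d["Maternal deaths"] == 39), "year"] = 1957
df_d.loc[(df_d["year"] == 1967) & (df_d["Country"] == "United States") & (df_d["Maternal deaths"] == 1766.28), "year"] = 1957

# Drop rows where the MMR is zero or missing ('...') - Ireland, Denmark and Sri Lanka
df_d = df_d.drop(df_d[(df_d["MMR"] == 0) | (df_d["MMR"] == "...")].index)

# Drop the duplicated New Zealand rows
df_d = df_d.drop(df_d[(df_d["Country"] == "New Zealand") & (df_d["year"] == 1950) & (df_d["MMR"] == 90)].index)
df_d = df_d.drop(df_d[(df_d["year"].isin(["1989-02", "1899-03"])) & (df_d["Country"] == "New Zealand")].index)

# Check that no country-year duplicates remain
assert df_d[df_d.duplicated(subset=["Country", "year"], keep=False)].shape[0] == 0
```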
We check that the MMR given in the data matches an MMR calculated from the live births and maternal deaths columns, where both are given. The two largely agree, but the values for Ireland do not, particularly from 1901-1920: there is a drop of almost 1,000 maternal deaths between 1920 and 1922.
The Registrar General of Ireland states that there were approximately 599 maternal deaths in Ireland in 1912, and on average 634 per year between 1903 and 1911, an estimated rate of 6 deaths per 1,000 births, or 600 per 100,000 births.
This suggests that the 'maternal deaths' figures may be wrong for 1901-1920: they may have been back-calculated from the MMR but using the year column by mistake, since the recalculated MMR works out to be exactly the year value.
Additionally there are some rows for the UK which don’t quite match, but they are very close so we will continue with the original values.
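A sketch of the check, assuming the sheet also has `Live Births` and `Maternal deaths` columns:

```{python}
df_check = df_d[["Country", "year", "Live Births", "Maternal deaths", "MMR"]].dropna().reset_index(drop=True)
df_check["mmr_calc"] = ((df_check["Maternal deaths"] / df_check["Live Births"]) * 100000).astype(float).round(2)
df_check["MMR"] = df_check["MMR"].astype(float).round(2)
# Rows where the reported MMR disagrees with the recalculated value
df_check[df_check["mmr_calc"] != df_check["MMR"]]
```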
## Finding the mid-point of each year range

For some rows a range of years is given for a particular maternal mortality rate. We want the mid-point of that range, which we calculate with the function below.
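A lightly commented version of the helper. It assumes year ranges arrive as strings such as `"1871-80"`, where the end year may be abbreviated and needs the start year's leading digits prefixed:

```{python}
def get_mid_year(year: str) -> pd.Series:
    """Return the start year and the mid-point of a year or year range."""
    if "-" in str(year):
        year_begin, year_end = year.split("-")
        # Pad abbreviated end years (e.g. "1871-80" -> 1880) with the start year's prefix
        missing_digits = 4 - len(year_end)
        if missing_digits > 0:
            year_end = year_begin[0:missing_digits] + year_end
        year_out = int(np.mean([int(year_begin), int(year_end)]))
    else:
        year_begin = int(year)
        year_out = int(year)
    return pd.Series([year_begin, year_out])
```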
Let’s check the mid-year estimates have worked as expected. There is one case where the mid-year estimate is not the mid-point of the range of years. Let’s fix this manually.
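A sketch of the check and the manual fix, followed by keeping and renaming the columns of interest and standardising the country names; the mapping CSV path is the one used for the original run:

```{python}
df_years = df_d["year"].apply(get_mid_year)
df_d["year_begin"] = df_years[0].astype(int)
df_d["mid_year"] = df_years[1].astype(int)

# Any rows where the mid-year falls before the start of the range?
df_d[df_d["year_begin"] > df_d["mid_year"]]

# "1897-01" is read as 1897-1801, so set its mid-year by hand
df_d.loc[(df_d["year"] == "1897-01") & (df_d["Country"] == "New Zealand"), "mid_year"] = 1899
assert df_d[df_d[["Country", "year"]].duplicated(keep=False)].shape[0] == 0

# Keep only the columns we need, rename them and standardise the country names
df_d["year"] = df_d["mid_year"]
df_gap = df_d[["Country", "year", "MMR"]].copy()
df_gap["source"] = "gapminder"
df_gap.rename(columns={"Country": "entity", "MMR": "maternal_mortality_rate"}, inplace=True)

stand_countries = pd.read_csv("data/input/countries_to_standardise_country_standardized.csv")
stan_dict = stand_countries.set_index("Country").squeeze().to_dict()
df_gap["entity"] = standardise_entities(df_gap["entity"], stan_dict)
df_gap = df_gap.dropna(subset=["maternal_mortality_rate"])
```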
## Read in the OECD Stat data

Add in the OECD Stat health data. This is not available via the OECD.Stat API, so we manually download it from here: https://stats.oecd.org/index.aspx?queryid=30116
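Reading the manual export; the CSV filename and the `MATIMATM` indicator code below are those from the original download and will differ for a fresh export:

```{python}
oecd_df = pd.read_csv("data/input/HEALTH_STAT_08082022144439675.csv")
# Keep only the maternal mortality indicator
oecd_df = oecd_df[oecd_df["VAR"] == "MATIMATM"]
oecd_df = oecd_df[["Country", "Year", "Value"]].rename(
    columns={"Country": "entity", "Year": "year", "Value": "maternal_mortality_rate"}
)
oecd_df["source"] = "oecd"
oecd_df["entity"] = standardise_entities(oecd_df["entity"], stan_dict)
```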
## Read in the latest WHO data

Read in the data from the World Bank's WDI; in future iterations we will read this from the ETL so it is automatically updated. The WDI sources this variable from the WHO, so we will refer to it as the WHO data from here on.
```{python}
try:
    r_who = requests.get(
        "https://api.worldbank.org/v2/en/indicator/SH.STA.MMRT?downloadformat=csv"
    )
except requests.exceptions.RequestException as e:
    print(e)
    raise SystemExit(e)

output = open("data/input/who_mmr.zip", "wb")
output.write(r_who.content)
output.close()

with zipfile.ZipFile("data/input/who_mmr.zip", "r") as zip_ref:
    zip_ref.extractall("data/input/who_mmr")

who = pd.read_csv(
    "data/input/who_mmr/API_SH.STA.MMRT_DS2_en_csv_v2_4252399.csv", skiprows=4
)
```
## Cleaning the latest WHO data
Pivot and clean the WHO data, standardise the country names and drop values that are NA.
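A sketch of the reshape, assuming the World Bank CSV's wide layout with a `Country Name` column and one column per year:

```{python}
year_cols = [str(y) for y in range(1960, 2022)]
who_df = pd.melt(who, id_vars=["Country Name"], value_vars=year_cols)
who_df.rename(
    columns={"Country Name": "entity", "variable": "year", "value": "maternal_mortality_rate"},
    inplace=True,
)
who_df["source"] = "who"
who_df["entity"] = standardise_entities(who_df["entity"], stan_dict)
who_df["year"] = who_df["year"].astype(int)
who_df = who_df.dropna(subset=["maternal_mortality_rate"])
```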
Remove entities that we aren’t interested in displaying on grapher as it is difficult to clearly define them.
```{python}
# We don't want to include all of the available regions, as it is difficult
# for us to define them clearly, so we drop a selection here.
entities_to_drop = [
    "Africa Eastern and Southern", "Africa Western and Central", "Arab World",
    "Central Europe and the Baltics", "Early-demographic dividend",
    "East Asia & Pacific (excluding high income)", "East Asia & Pacific (IDA & IBRD countries)",
    "Euro area", "Europe & Central Asia (excluding high income)",
    "Europe & Central Asia (IDA & IBRD countries)", "European Union",
    "Fragile and conflict affected situations", "Heavily indebted poor countries (HIPC)",
    "IBRD only", "IDA & IBRD total", "IDA blend", "IDA only", "IDA total",
    "Late-demographic dividend", "Latin America & Caribbean (excluding high income)",
    "Latin America & the Caribbean (IDA & IBRD countries)",
    "Least developed countries: UN classification",
    "Middle East & North Africa (excluding high income)",
    "Middle East & North Africa (IDA & IBRD countries)", "Not classified",
    "OECD members", "Other small states", "Pacific island small states",
    "Post-demographic dividend", "Pre-demographic dividend", "Small states",
    "South Asia (IDA & IBRD)", "Sub-Saharan Africa (excluding high income)",
    "Sub-Saharan Africa (IDA & IBRD countries)",
]
who_df = who_df[~who_df["entity"].isin(entities_to_drop)]
```
## Combine the three datasets: Gapminder, OECD Stat and WHO
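A sketch of the combination step: concatenate the three sources, flag entity-years covered by more than one dataset, and drop the WHO row wherever it duplicates another source for the same entity-year, as described in the plotting steps below:

```{python}
df = pd.concat([df_gap, who_df, oecd_df], ignore_index=True)
df["year"] = df["year"].astype(int)
df["maternal_mortality_rate"] = df["maternal_mortality_rate"].astype(float)

# Entity-years that appear in more than one source
dup_ents = df[["entity", "year"]][df.duplicated(subset=["entity", "year"])].drop_duplicates()
dup_ents["ent_year"] = dup_ents["entity"] + "_" + dup_ents["year"].astype(str)

# Entities covered by Gapminder and by the OECD, used for the plots below
gap_ents = df["entity"][df["source"] == "gapminder"].drop_duplicates().to_list()
oecd_ents = df["entity"][df["source"] == "oecd"].drop_duplicates().to_list()

# Where the WHO duplicates another source for the same entity-year, drop the WHO row
df["ent_year"] = df["entity"] + "_" + df["year"].astype(str)
df = df[~((df["ent_year"].isin(dup_ents["ent_year"])) & (df["source"] == "who"))]
```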
Plot the countries which have data from Gapminder so we can inspect the full timeline. Where there are duplicate values sourced from the WHO we drop these.
The data is shown from 1950 onwards: WHO in blue, Gapminder in green and OECD in red.
```{python}
df_dup = df[(df.entity.isin(gap_ents)) & (df.year >= 1950)]
plt.figure(figsize=(12, 7), facecolor="white")
# Plot numbering starts at 1, not 0
plot_number = 1
for countryname, selection in df_dup.groupby("entity"):
    # Put this country's panel in the next slot of a 4x4 grid
    ax = plt.subplot(4, 4, plot_number)
    selection_gap = selection[selection["source"] == "gapminder"]
    selection_who = selection[selection["source"] == "who"]
    selection_oecd = selection[selection["source"] == "oecd"]
    if selection_who.shape[0] > 0:
        selection_who.plot(x="year", y="maternal_mortality_rate", color="b", ax=ax, label=countryname, legend=False)
    if selection_gap.shape[0] > 0:
        selection_gap.plot(x="year", y="maternal_mortality_rate", color="g", ax=ax, label=countryname, legend=False)
    if selection_oecd.shape[0] > 0:
        selection_oecd.plot(x="year", y="maternal_mortality_rate", color="r", ax=ax, label=countryname, legend=False)
    ax.set_title(countryname)
    # Move to the next panel for the next country
    plot_number = plot_number + 1
plt.tight_layout()
```
The jumps in the Sri Lanka and Malaysia trends are odd, so we will drop the Gapminder data for these entities.
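The corresponding filter, as a one-line sketch:

```{python}
df = df[~((df["entity"].isin(["Sri Lanka", "Malaysia"])) & (df["source"] == "gapminder"))]
```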
Plot the countries which have data from the OECD so we can inspect the full timeline. Where there are duplicate values sourced from the WHO we drop these.
The data is shown from 1950 onwards: WHO in blue, Gapminder in green and OECD in red.
```{python}
df_dup = df[(df.entity.isin(oecd_ents)) & (df.year >= 1950)]
plt.figure(figsize=(12, 7), facecolor="white")
# Plot numbering starts at 1, not 0
plot_number = 1
for countryname, selection in df_dup.groupby("entity"):
    # Put this country's panel in the next slot of a 6x7 grid
    ax = plt.subplot(6, 7, plot_number)
    selection_gap = selection[selection["source"] == "gapminder"]
    selection_who = selection[selection["source"] == "who"]
    selection_oecd = selection[selection["source"] == "oecd"]
    if selection_who.shape[0] > 0:
        selection_who.plot(x="year", y="maternal_mortality_rate", color="b", ax=ax, label=countryname, legend=False)
    if selection_gap.shape[0] > 0:
        selection_gap.plot(x="year", y="maternal_mortality_rate", color="g", ax=ax, label=countryname, legend=False)
    if selection_oecd.shape[0] > 0:
        selection_oecd.plot(x="year", y="maternal_mortality_rate", color="r", ax=ax, label=countryname, legend=False)
    ax.set_title(countryname)
    # Move to the next panel for the next country
    plot_number = plot_number + 1
plt.tight_layout()
```
Turkey only has OECD data every 8-9 years, but it's not appropriate to fill the gaps with the WHO data, so we will drop the WHO data for Turkey.
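The corresponding filter, followed by the final steps: we keep the full extent of the Gapminder data where entity-years are still duplicated, as this gives the smoothest time-series, check for remaining duplicates and unexpected NAs, round the values, and write out the combined series with one column per source for comparison in the grapher. A condensed sketch of these steps; the output path is the one used in the original run:

```{python}
# Drop the WHO series for Turkey
df = df[~((df["entity"].isin(["Turkey"])) & (df["source"] == "who"))]

# Prefer Gapminder where an entity-year still appears in more than one source
no_dups_df = df[~df.duplicated(subset=["entity", "year"], keep=False)]
keep_gap = df[(df.duplicated(subset=["entity", "year"], keep=False)) & (df["source"] == "gapminder")]
df_clean = pd.concat([no_dups_df, keep_gap], ignore_index=True)

# No remaining duplicates or unexpected NAs
assert df_clean.duplicated(subset=["entity", "year"]).sum() == 0
assert df_clean[["entity", "year", "maternal_mortality_rate"]].isna().sum().sum() == 0

# Round the combined series, then add one column per source for comparison
df_clean["maternal_mortality_rate"] = df_clean["maternal_mortality_rate"].astype(float).round(2)
df_clean = df_clean[["entity", "year", "maternal_mortality_rate"]].rename(
    columns={"maternal_mortality_rate": "maternal_mortality_rate_owid"}
)
who_df = who_df.rename(columns={"maternal_mortality_rate": "maternal_mortality_rate_who_2019"}).drop(columns=["source"])
oecd_df = oecd_df.rename(columns={"maternal_mortality_rate": "maternal_mortality_rate_oecd"}).drop(columns=["source"])
df_gap = df_gap.rename(columns={"maternal_mortality_rate": "maternal_mortality_rate_gapminder"}).drop(columns=["source"])

df_merged = reduce(
    lambda left, right: pd.merge(left, right, on=["entity", "year"], how="outer"),
    [df_clean, who_df, oecd_df, df_gap],
)
df_merged = df_merged[~df_merged["entity"].isin(entities_to_drop)]
df_merged.to_csv("data/output/maternal_mortality_rate.csv", index=False)
```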
---title: "Maternal Mortality Ratio"execute: warning: false message: falseformat: html: fontsize: 0.9em code-tools: true code-fold: true code-copy: hover toc: true toc-depth: 2 self-contained: true anchor-sections: truecomments: hypothesis: truejupyter: python3---The definition of maternal mortality is: > "A death of a woman while pregnant or within 42 days of termination of pregnancy, irrespective of the duration and site of pregnancy, from any cause related or aggravated by the pregnancy or its management, but not from accidental or incidental causes."The definition of maternal mortality ratio is:> "The number of maternal deaths during a given time period per 100,000 live births during the same time period."In this notebook we will create a long-term time-series of maternal mortality ratio. To do this we will combine four existing datasets:* Gapminder (1751-2008)* OECD (1960-2020)* World Health Organization (2000-2017)We make the assumption that their methods are comparable and that the data can be combined without transformation.In years where there is an overlap in both time-series we use the data from the WHO.This document will go through the steps required to create the data for this chart:```{=html}<iframe src="https://ourworldindata.org/grapher/maternal-mortality" style="width: 100%; height: 600px; border: 0px none;"></iframe>```## Downloading the Gapminder data```{python}#| echo: false import numpy as npimport pandas as pdimport requestsfrom pathlib import Pathimport zipfileimport matplotlib.pyplot as pltfrom functools importreducedef standardise_entities(entities_to_standardise: pd.Series, stan_dict:dict) -> pd.Series:""" A function to standardise the entities so they match OWID entities. """ standardised_entities = entities_to_standardise.map(stan_dict) ents = entities_to_standardise.drop_duplicates().to_list()for ent in ents:if ent notin stan_dict.keys():print(f"{ent} missing from entity mapping")assert standardised_entities.isna().sum() ==0return standardised_entities```Firstly we download the data from Gapminder and save it locally. Then we read in the Gapminder data, clean up the column names and take a look at the data. ```{python}r = requests.get("https://www.gapminder.org/documentation/documentation/gapdata010.xls")assert r.okPath("data/input/").mkdir(parents=True, exist_ok=True)output =open("data/input/gapminder.xls", "wb")output.write(r.content)output.close()```Some preliminary cleaning of the data: 1) remove the first 16 rows, 2) remove white space from the country names and 3) remove rows where the maternal mortality rate is NA.```{python}df = pd.read_excel("data/input/gapminder.xls")df_d = df.drop(df.index[0:16]).reset_index()df_d["Country"] = df_d["Country"].str.strip()df_d = df_d[~pd.isna(df_d["MMR"])]```## Cleaning the Gapminder data:::{.callout-important}There are some issues with the Gapminder data - we will manually clean them here. 
:::* The first three rows in the 'year' column seem to be wrong as it seems as if each row covers 110 years, but in [another source](https://docs.google.com/spreadsheets/u/0/d/14ZtQy9kd0pMRKWg_zKsTg3qKHoGtflj-Ekal9gIPZ4A/pub?gid=1#) they cover only 10 years.* I will replace them manually here with the middle of each decade, as is used in the other data from the same source.* There are cases where the year for Finland has been entered incorrectly three times, there are two 1772s, 1775s & 1967s, the second of each should be 1872, 1875 and 1957 respectively.* There are also two errors for New Zealand, and one error for both Sweden and the US.* Ireland, Denmark and Sri Lanka's maternal mortality rate drops to zero at some points - we will remove these.```{python}df_d.loc[:2, "year"] = ["1875", "1885", "1895"]df_d.loc[ (df_d["year"] ==1772)& (df_d["Country"] =="Finland")& (df_d["Maternal deaths"] ==487),"year",] =1872df_d.loc[ (df_d["year"] ==1775)& (df_d["Country"] =="Finland")& (df_d["Maternal deaths"] ==629),"year",] =1875df_d.loc[ (df_d["year"] ==1967)& (df_d["Country"] =="Finland")& (df_d["Maternal deaths"] ==77),"year",] =1957df_d.loc[ (df_d["year"] ==1967)& (df_d["Country"] =="Sweden")& (df_d["Maternal deaths"] ==39),"year",] =1957df_d.loc[ (df_d["year"] ==1967)& (df_d["Country"] =="United States")& (df_d["Maternal deaths"] ==1766.28),"year",] =1957# Drop any rows where MMR is zero or '...' (Ireland, Denmark and Sri Lanka)df_d = df_d.drop(df_d[(df_d["MMR"] ==0) | (df_d["MMR"] =='...')].index)df_d = df_d.drop( df_d[ (df_d["Country"] =="New Zealand")& (df_d["year"] ==1950)& (df_d["MMR"] ==90) ].index)# Duplicates after get_mid_year()df_d = df_d.drop( df_d[ (df_d["year"].isin(["1989-02", "1899-03"])) & (df_d["Country"] =="New Zealand") ].index)```Check that there are no duplicates in the data.```{python}assert ( df_d[df_d.duplicated(subset = ["Country", "year"],keep=False)].shape[0]==0)```Checking the given MMR matches a calculated MMR, where the data for live births and maternal deaths is given. It seems largely okay but the values for Ireland, particularly from 1901-1920. There is a drop of almost 1000 maternal deaths between 1920 and 1922. In the registrar general of Ireland it states that there were approximately 599 maternal deaths in Ireland in 1912 and on average 634 per year between 1903 and 1911. The estimated rate is 6 deaths per 1,000 births, so 600 per 100,000 births.I think this suggests that the 'maternal deaths' figures may be wrong for the period 1901-1920. I think they may have back-calculated the maternal deaths from the MMR but accidentally used the year column instead as the calculated MMR works out as the year value. Additionally there are some rows for the UK which don't quite match, but they are very close so we will continue with the original values. ```{python}df_check = df_d[['Country','year','Live Births', 'Maternal deaths', 'MMR']].dropna().reset_index(drop=True)df_check['mmr_calc'] = ((df_check['Maternal deaths']/df_check['Live Births'])*100000).astype(float).round(2)df_check['MMR'] = df_check['MMR'].astype(float).round(2)df_check[ df_check['mmr_calc'].round(2)!= df_check['MMR'].round(2)]```## Finding the mid-point of each year rangeFor some rows a range of years is given for a particular maternal mortality rate. We want to find the mid-point of that range. 
We use this function to find the mid-value of years given.```{python}def get_mid_year(year: str):if"-"instr(year): year_begin = year.split("-")[0] year_end = year.split("-")[1] year_end_len =len(year.split("-")[1]) year_end_pref =4- year_end_lenif year_end_pref >0: year_pref = year_begin[0:year_end_pref] year_end = year_pref + year_end year_out =int(np.mean([int(year_begin), int(year_end)]))else: year_out =int(np.mean([int(year_begin), int(year_end)]))else: year_begin =int(year) year_out =int(year)return pd.Series([year_begin, year_out])```Let's check the mid-year estimates have worked as expected. There is one case where the mid-year estimate is not the mid-point of the range of years. Let's fix this manually.```{python}df_years = df_d["year"].apply(get_mid_year)df_d["year_begin"] = df_years[0].astype(int)df_d["mid_year"] = df_years[1].astype(int)df_d[df_d["year_begin"] > df_d["mid_year"]]df_d.loc[ (df_d["year"] =="1897-01") & (df_d["Country"] =="New Zealand"), "mid_year"] =1899assert(df_d[df_d[["Country", "year"]].duplicated(keep=False)].shape[0] ==0)```Rename the columns, keeping only the columns we are interested in and standardise the country names and drop rows with NA values.```{python}df_d = df_d.copy(deep=False)df_d["year"] = df_d["mid_year"]df_gap = df_d[["Country", "year", "MMR"]]df_gap["source"] ="gapminder"df_gap.rename( columns={"Country": "entity", "MMR": "maternal_mortality_rate"}, inplace=True)stand_countries = pd.read_csv("data/input/countries_to_standardise_country_standardized.csv")stan_dict = stand_countries.set_index("Country").squeeze().to_dict()df_gap["entity"] = standardise_entities(df_gap["entity"], stan_dict)df_gap = df_gap.dropna(subset=['maternal_mortality_rate'])assert df_gap['entity'].isna().sum() ==0```## Read in the OECD Stat dataAdd in the OECD Stat Health data - this is not available on the OECD stat API so we manually download it from here: https://stats.oecd.org/index.aspx?queryid=30116```{python}oecd_df = pd.read_csv("data/input/HEALTH_STAT_08082022144439675.csv")oecd_df = oecd_df[(oecd_df['VAR'] =='MATIMATM')]oecd_df = oecd_df[["Country", "Year", "Value"]].rename( columns={"Country": "entity", "Year": "year", "Value": "maternal_mortality_rate"})oecd_df["source"] ="oecd"oecd_df["entity"] = standardise_entities(oecd_df["entity"], stan_dict)assert oecd_df['entity'].isna().sum() ==0```## Read in the latest WHO dataRead in data from WDI - in future iterations we will read this in from ETL so it is auto updated. 
The WDI sources this variable from the WHO so we will henceforth refer to it as the WHO data.```{python}try: r_who = requests.get("https://api.worldbank.org/v2/en/indicator/SH.STA.MMRT?downloadformat=csv" )except requests.exceptions.RequestException as e:print(e)raiseSystemExit(e) output =open("data/input/who_mmr.zip", "wb")output.write(r_who.content)output.close()with zipfile.ZipFile("data/input/who_mmr.zip", "r") as zip_ref: zip_ref.extractall("data/input/who_mmr")who = pd.read_csv("data/input/who_mmr/API_SH.STA.MMRT_DS2_en_csv_v2_4252399.csv", skiprows=4)```## Cleaning the latest WHO dataPivot and clean the WHO data, standardise the country names and drop values that are NA.```{python}year_cols =list(range(1960, 2022))year_cols = [str(int) forintin year_cols]who_df = pd.melt(who, id_vars=["Country Name"], value_vars=year_cols)who_df.rename( columns={"Country Name": "entity","variable": "year","value": "maternal_mortality_rate", }, inplace=True,)who_df["source"] ="who"who_df["entity"] = standardise_entities(who_df["entity"], stan_dict)who_df['year'] = who_df['year'].astype(int)who_df = who_df.dropna(subset=['maternal_mortality_rate'])```Remove entities that we aren't interested in displaying on grapher as it is difficult to clearly define them.```{python}#%%[markdown]# ## Cleaning geographic entities# We don't want to include all of the available regions in the data as it is difficult for us to define them clearly, so we drop out a selection here.entities_to_drop = ["Africa Eastern and Southern","Africa Western and Central","Arab World","Central Europe and the Baltics","Early-demographic dividend","East Asia & Pacific (excluding high income)","East Asia & Pacific (IDA & IBRD countries)","Euro area","Europe & Central Asia (excluding high income)","Europe & Central Asia (IDA & IBRD countries)","European Union","Fragile and conflict affected situations","Heavily indebted poor countries (HIPC)","IBRD only","IDA & IBRD total","IDA blend","IDA only","IDA total","Late-demographic dividend","Latin America & Caribbean (excluding high income)","Latin America & the Caribbean (IDA & IBRD countries)","Least developed countries: UN classification","Middle East & North Africa (excluding high income)","Middle East & North Africa (IDA & IBRD countries)","Not classified","OECD members","Other small states","Pacific island small states","Post-demographic dividend","Pre-demographic dividend","Small states","South Asia (IDA & IBRD)","Sub-Saharan Africa (excluding high income)","Sub-Saharan Africa (IDA & IBRD countries)",]who_df = who_df[~who_df["entity"].isin(entities_to_drop)]```## Combine the three datasets: Gapminder, OECD Stat and WHO```{python}df = pd.concat([df_gap, who_df, oecd_df], ignore_index=True)df["year"] = df["year"].astype(int)df['maternal_mortality_rate'] = df['maternal_mortality_rate'].astype(float)```Identify entities and years which have data from multiple datasets so that we can plot these. ```{python}dup_ents = df[['entity', 'year']][df.duplicated(subset=['entity', 'year'])].drop_duplicates()dup_ents['ent_year'] = dup_ents['entity'] +'_'+ dup_ents['year'].astype(str)gap_ents = df['entity'][df['source'] =='gapminder'].drop_duplicates().to_list()oecd_ents = df['entity'][df['source'] =='oecd'].drop_duplicates().to_list()``````{python}df['ent_year'] = df['entity'] +'_'+ df['year'].astype(str)df = df[~((df['ent_year'].isin(dup_ents['ent_year'])) & (df['source'] =='who'))]```Plot the countries which have data from Gapminder so we can inspect the full timeline. 
Where there are duplicate values sourced from WHO we drop these. The data is shown from from 1950 onwards. WHO is shown in blue, Gapminder in green and OECD in red.```{python}df_dup = df[(df.entity.isin(gap_ents)) & (df.year >=1950)]plt.figure(figsize=(12,7), facecolor="white")# plot numbering starts at 1, not 0plot_number =1for countryname, selection in df_dup.groupby("entity"):# Inside of an image that's a 15x13 grid, put this# graph in the in the plot_number slot. ax = plt.subplot(4, 4, plot_number) selection_gap = selection[selection["source"] =="gapminder"] selection_who = selection[selection["source"] =="who"] selection_oecd = selection[selection["source"] =="oecd"]if selection_who.shape[0] >0: selection_who.plot( x="year", y="maternal_mortality_rate", color ='b', ax=ax, label=countryname, legend=False, )if selection_gap.shape[0] >0: selection_gap.plot( x="year", y="maternal_mortality_rate", color ='g', ax=ax, label=countryname, legend=False, )if selection_oecd.shape[0] >0: selection_oecd.plot( x="year", y="maternal_mortality_rate", color ='r', ax=ax, label=countryname, legend=False, ) ax.set_title(countryname)# Go to the next plot for the next loop plot_number = plot_number +1plt.tight_layout()```The jump in the Sri Lanka and Malaysia trends are odd - we will drop the Gapminder data for this entity. ```{python}df = df[~((df['entity'].isin(['Sri Lanka', 'Malaysia'])) & (df['source'] =='gapminder'))]```Plot the countries which have data from OECD so we can inspect the full timeline. Where there are duplicate values sourced from WHO we drop these. The data is shown from from 1950 onwards. WHO is shown in blue, Gapminder in green and OECD in red.```{python}df_dup = df[(df.entity.isin(oecd_ents)) & (df.year >=1950)]plt.figure(figsize=(12,7), facecolor="white")# plot numbering starts at 1, not 0plot_number =1for countryname, selection in df_dup.groupby("entity"):# Inside of an image that's a 15x13 grid, put this# graph in the in the plot_number slot. ax = plt.subplot(6, 7, plot_number) selection_gap = selection[selection["source"] =="gapminder"] selection_who = selection[selection["source"] =="who"] selection_oecd = selection[selection["source"] =="oecd"]if selection_who.shape[0] >0: selection_who.plot( x="year", y="maternal_mortality_rate", color ='b', ax=ax, label=countryname, legend=False, )if selection_gap.shape[0] >0: selection_gap.plot( x="year", y="maternal_mortality_rate", color ='g', ax=ax, label=countryname, legend=False, )if selection_oecd.shape[0] >0: selection_oecd.plot( x="year", y="maternal_mortality_rate", color ='r', ax=ax, label=countryname, legend=False, ) ax.set_title(countryname)# Go to the next plot for the next loop plot_number = plot_number +1plt.tight_layout()```Turkey only has data every 8/9 years with the OECD but it's not appropriate to fill in the gaps with the WHO data so we will drop the WHO data for Turkey. ```{python}df = df[~((df['entity'].isin(['Turkey'])) & (df['source'] =='who'))]```We'll keep the full extent of the Gapminder data where possible as this will ensure the smoothest time-series.```{python}no_dups_df = df[~df.duplicated(subset=['entity', 'year'],keep=False)]keep_gap = df[(df.duplicated(subset=['entity', 'year'], keep =False)) & (df.source =='gapminder')]df_clean = pd.concat([no_dups_df, keep_gap], ignore_index=True)```Check that there are no country-year duplicates in the final dataset and do some checks that the data doesn't have any unexpected NAs in it. 
```{python}assert df_clean[(df_clean.duplicated(subset=['entity', 'year']))].shape[0] ==0assert df_clean['entity'].isna().sum() ==0assert df_clean['year'].isna().sum() ==0assert df_clean['maternal_mortality_rate'].isna().sum() ==0```Some final cleaning for the main variable we are interested in. ```{python}df_clean['maternal_mortality_rate'] = df_clean['maternal_mortality_rate'].astype(float)df_clean['maternal_mortality_rate'] =round(df_clean['maternal_mortality_rate'], 2)df_clean = df_clean[["entity", "year", "maternal_mortality_rate"]]```Add a column for each source, for comparison in grapher.```{python}who_df = who_df.rename(columns={"maternal_mortality_rate": "maternal_mortality_rate_who_2019"}).drop(columns=['source'])oecd_df = oecd_df.rename(columns={"maternal_mortality_rate": "maternal_mortality_rate_oecd"}).drop(columns=['source'])df_gap = df_gap.rename(columns={"maternal_mortality_rate": "maternal_mortality_rate_gapminder"}).drop(columns=['source'])df_clean = df_clean.rename(columns={"maternal_mortality_rate": "maternal_mortality_rate_owid"})dfs = [df_clean, who_df, oecd_df, df_gap]df_merged =reduce(lambda left,right: pd. merge(left,right,on=['entity', 'year'],how='outer'), dfs)df_merged = df_merged[~df_merged['entity'].isin(entities_to_drop)]df_merged.to_csv("data/output/maternal_mortality_rate.csv", index=False)```