Minard’s chart using Matplotlib in Python

Amit Amola
May 15, 2023


By Charles Minard (1781–1870), Public Domain, https://commons.wikimedia.org/w/index.php?curid=297925
Original Minard’s Chart

Note: if you are here just for the code, here you go: Minard Chart GitHub.

Minard’s chart of Napoleon’s Russian campaign is a historic data visualization created by Charles Joseph Minard, a French civil engineer in the 19th century. It depicts the disastrous campaign of Napoleon Bonaparte in 1812, showcasing the size of his army as it advanced into Russia and suffered massive losses during the retreat.

The significance of Minard’s chart lies in its pioneering approach to data visualization. It effectively combines geography, time, and statistical elements to tell a compelling story and communicate complex information with remarkable clarity. It is regarded as one of the most influential examples of data visualization in history.

Inspired by Minard’s chart, many artists and data visualization enthusiasts have tried to recreate it or borrow from its design principles. These recreations often pay homage to the original chart while adding modern aesthetics and techniques. Michael Friendly has collected many such renditions on his website; some of them are shown below:

Minard’s chart rendered in different ways (source)

As part of the Data Visualization module in my master’s at Trinity College Dublin, we were required to replicate Minard’s chart in any manner that effectively conveys all the information contained in the original. I chose Matplotlib, a Python library I know well. The purpose of this article is to showcase the versatility and extensive customization options Matplotlib offers, and how freely it lets you modify every part of a chart.

Without further ado, let’s get right into the code…

The dataset

The first thing we need is the data to recreate this chart. It is available on various platforms, and we are going to use the version on RPubs (link to dataset). The dataset is not a plain, simple CSV file; it has a peculiar structure. Let’s check it out:

This is just part of the data; there are more rows. As we can see, it is uneven: it essentially consists of 3 separate tables:

  • Columns 1–3: longitude, latitude and names of cities
  • Columns 4–8: longitude, temperature and dates (during the march home only)
  • Columns 9–14: longitude, latitude, number of survivors, direction of travel (A=towards the attack/R=return journey) and division of army

So the first thing we do is split this data into three parts using pandas. We also use the GeoPandas library to add rivers and to work with the longitude and latitude information later.

import pandas as pd
import geopandas

import numpy as np

data = pd.read_csv('minard-data.csv')

### Cities data
cities = data[["LONC", "LATC", "CITY"]].copy()
cities = cities.dropna()
cities.columns = ['Longitude', 'Latitude', 'City']

### Temperature
temperatures = data[["LONT", "TEMP", "DAYS", "MON", "DAY"]].copy()
temperatures = temperatures[temperatures['TEMP'].notna()]

### Troops
troops = data[["LONP", "LATP", "SURV" , "DIR", "DIV"]].copy()
troops = troops.dropna()
troops_attack = troops[troops.DIR=='A']
troops_retreat = troops[troops.DIR=='R']

### Geopandas for loading cities from longitude and latitudes
gdf = geopandas.GeoDataFrame(cities[['City', 'Latitude', 'Longitude']], geometry=geopandas.points_from_xy(cities.Longitude, cities.Latitude))
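The column-splitting above relies on how pandas reads side-by-side tables of different lengths. Here is a minimal sketch on a made-up miniature CSV with the same layout (the values and the names CityA/CityB are hypothetical, not the real dataset, and only a subset of the columns is kept). Shorter tables get padded with NaN, so selecting a table's columns and dropping empty rows recovers it. It also suggests why the temperature table is filtered on TEMP alone rather than with dropna(): a reading that has no date would otherwise be thrown away.

```python
import io

import pandas as pd

# Hypothetical miniature file: three tables of different lengths side by side.
# All values are invented purely to illustrate the layout.
raw = """LONC,LATC,CITY,LONT,TEMP,MON,LONP,LATP,SURV,DIR,DIV
24.0,55.0,CityA,37.0,0,Oct,24.0,54.9,340000,A,1
25.3,54.7,CityB,29.0,-11,,24.5,55.0,340000,A,1
,,,,,,25.5,54.5,330000,A,1
,,,,,,26.0,54.7,320000,A,1
"""

data = pd.read_csv(io.StringIO(raw))

# Cities and troops: every column of the table is filled, so dropna() works.
cities = data[["LONC", "LATC", "CITY"]].dropna()
troops = data[["LONP", "LATP", "SURV", "DIR", "DIV"]].dropna()

# Temperatures: the -11 reading has no month, so filter on TEMP only.
temperatures = data[["LONT", "TEMP", "MON"]]
temperatures = temperatures[temperatures["TEMP"].notna()]
temperatures_strict = data[["LONT", "TEMP", "MON"]].dropna()  # loses that row

print(len(cities), len(temperatures), len(temperatures_strict), len(troops))  # 2 2 1 4
```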

Next, we import the Matplotlib classes and functions we need. We also set a different font than the default, which improves the final look of the chart; this is done with font_manager. The font is available in the GitHub repo, but you can use your own as well.

import matplotlib.pyplot as plt
# Jupyter-only magic; drop this line when running as a plain .py script
%matplotlib inline

from matplotlib.collections import LineCollection
from matplotlib.patches import ConnectionPatch

from matplotlib import font_manager

font_path = 'BentonSans Book.otf' # Your font path goes here
font_manager.fontManager.addfont(font_path)
prop = font_manager.FontProperties(fname=font_path)

plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.sans-serif'] = prop.get_name()
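If you don't have the BentonSans file at hand, the same font_manager steps can be tried with a font that ships with Matplotlib. This sketch assumes the bundled DejaVuSans.ttf under Matplotlib's data directory as a stand-in for 'BentonSans Book.otf'; the registration calls are the same ones used above, and it passes the family name as a one-element list, which is the type the font.sans-serif setting holds.

```python
from pathlib import Path

import matplotlib
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Stand-in font: DejaVu Sans is bundled with Matplotlib, so this runs anywhere.
# Replace font_path with your own .otf/.ttf file in practice.
font_path = str(Path(matplotlib.get_data_path()) / "fonts" / "ttf" / "DejaVuSans.ttf")

font_manager.fontManager.addfont(font_path)        # register the file with Matplotlib
prop = font_manager.FontProperties(fname=font_path)
family = prop.get_name()                           # family name read from the font file

plt.rcParams["font.family"] = "sans-serif"
plt.rcParams["font.sans-serif"] = [family]         # list of preferred family names

print(family)  # DejaVu Sans
```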

Now that we have everything we need, let’s create our chart. The first thing to notice about Minard’s chart is that it is a combination of two charts: the upper part shows the army’s movements as lines, and the lower part shows the temperature information. One element is shared by both: the vertical lines connecting each temperature reading in the lower plot to the main plot above. To draw them, we need the point where a vertical line at each temperature reading’s longitude intersects the troops’ retreat path. Here is the code that does this; we will use it later.

def line_intersection(line1, line2):
    # Intersection of two infinite lines, each given as a pair of (x, y) points
    xdiff = (line1[0][0] - line1[1][0], line2[0][0] - line2[1][0])
    ydiff = (line1[0][1] - line1[1][1], line2[0][1] - line2[1][1])

    def det(a, b):
        return a[0] * b[1] - a[1] * b[0]

    div = det(xdiff, ydiff)
    if div == 0:
        raise Exception('lines do not intersect')

    d = (det(*line1), det(*line2))
    x = det(d, xdiff) / div
    y = det(d, ydiff) / div
    return x, y

intersection_points = []
for vals in temperatures.iterrows():
    long = vals[1]['LONT']
    temp = vals[1]['TEMP']

    # Retreat point closest in longitude to this temperature reading
    min_ind = np.argmin(np.abs(troops_retreat['LONP'].values-long))

    # Pick the neighbouring retreat point on the same side as the reading
    # (assumes retreat rows run east to west, i.e. longitude falls as the index grows)
    if min_ind==0:
        min_ind2 = 1

    elif long>troops_retreat['LONP'].values[min_ind]:
        min_ind2 = min_ind-1

    else:
        min_ind2 = min_ind+1

    ## Points on troop retreat segment
    pointA = troops_retreat.iloc[min_ind,0:2].values
    pointB = troops_retreat.iloc[min_ind2,0:2].values

    #Points on temperature line (a vertical line at the reading's longitude)
    pointC = (long, temp)
    pointD = (long, temp+100)
    intersection_points.append(line_intersection((pointA, pointB), (pointC, pointD)))
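Here is a quick sanity check of line_intersection on made-up coordinates (a hypothetical retreat segment and temperature reading, not values from the dataset). Because pointC and pointD share the same longitude, the second line is vertical, so the intersection is simply linear interpolation of latitude along the retreat segment, and np.interp gives the same answer. The function is copied so the block runs on its own, raising ValueError instead of a bare Exception.

```python
import numpy as np

def line_intersection(line1, line2):
    # Same determinant-based formula as above: intersection of two infinite lines
    xdiff = (line1[0][0] - line1[1][0], line2[0][0] - line2[1][0])
    ydiff = (line1[0][1] - line1[1][1], line2[0][1] - line2[1][1])

    def det(a, b):
        return a[0] * b[1] - a[1] * b[0]

    div = det(xdiff, ydiff)
    if div == 0:
        raise ValueError("lines do not intersect")
    d = (det(*line1), det(*line2))
    return det(d, xdiff) / div, det(d, ydiff) / div

# Hypothetical retreat segment and temperature reading (invented coordinates)
pointA, pointB = (30.0, 55.0), (28.0, 54.0)
lon, temp = 29.0, -11.0
pointC, pointD = (lon, temp), (lon, temp + 100)   # vertical line x = lon

x, y = line_intersection((pointA, pointB), (pointC, pointD))

# For a vertical second line, this equals linear interpolation along the segment
y_interp = np.interp(lon, [pointB[0], pointA[0]], [pointB[1], pointA[1]])
print(x, y, y_interp)  # 29.0 54.5 54.5
```

One design consequence: a segment that is itself vertical (same longitude at both ends) is parallel to the temperature line, so the determinant is zero and the function raises instead of returning a point.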

As I said before, this chart is a combination of two, so we are going to use subplots. The first subplot is the line plot, for which we use Matplotlib’s LineCollection. The idea is to create multiple line segments as one collection that shares the same attributes (colour, style and so on) except the width. As the armies moved ahead, there were many deaths, so each division’s numbers kept decreasing; LineCollection lets us show this with segments whose width follows the troop count. We are also going to use pastel colours of different tints: the attacking divisions are a mix of greens, while the retreating ones are reds. Let’s start:

import matplotlib.patches as mpatches

fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(25,12), gridspec_kw={'height_ratios': [5, 1]})

rivers = geopandas.read_file("rivers_europe.geojson")
rivers.plot(ax=ax1, alpha=0.1)

attack_colours = ['#97BDB6','#92CDB6','#99DCAD']
retreat_colours = ['#E87563','#CD647F','#A15F8E']

We add rivers to the chart using the GeoPandas rivers_europe file, and we use custom colours for the line plot. We also defined two axes: one for the line plot at the top and one for the temperature plot below. Next, for each division, we build line collections with different widths, which are the original troop counts scaled down by a factor of 6000 (a value found by trial and error).

for div in (1,2,3):
    attack_data = troops_attack[troops_attack.DIV==div]
    retreat_data = troops_retreat[troops_retreat.DIV==div]
    attack_data = pd.concat([attack_data, retreat_data.iloc[0:1,:]])

    #For attack info
    lwidths = attack_data.SURV/6000
    points = np.array([attack_data.LONP, attack_data.LATP]).T.reshape(-1, 1, 2)
    attack_segments = np.concatenate([points[:-1], points[1:]], axis=1)
    lc = LineCollection(attack_segments, linewidth= lwidths, color=attack_colours[div-1], capstyle='round', zorder=2, label=f"Div_Attack {div}")
    ax1.add_collection(lc)

    #For retreating info
    lwidths = retreat_data.SURV/6000
    points = np.array([retreat_data.LONP, retreat_data.LATP]).T.reshape(-1, 1, 2)
    retreat_segments = np.concatenate([points[:-1], points[1:]], axis=1)
    lc = LineCollection(retreat_segments, linewidth= lwidths, color=retreat_colours[div-1], capstyle='round', zorder=1, label=f"Div_Retreat {div}")
    ax1.add_collection(lc)

for x, y, label in zip(gdf.geometry.x, gdf.geometry.y, gdf.City):
    ax1.annotate(label, xy=(x, y), xytext=(0, 0), textcoords="offset points", zorder=3, fontsize=11, weight='bold')

ax1.set_xlim(23,38.2)
ax1.set_ylim(53.3,57)
ax1.axis('off')

ax1.annotate(r"$\bf{Napoleon's }$"+" Russian Campaign", (23.7, 56.5),
             color='#475768', fontsize=60)
ax1.annotate("Figurative map of successive losses in men of the French army in the Russian campaign 1812",
             (23.75, 56.2), color='#475768', fontsize=20, weight='bold')

The second for loop above annotates the names of the cities, again using the GeoPandas gdf dataframe we created earlier, and the last two annotate calls add the title and subtitle to the chart. One interesting detail is the pd.concat line at the top of the first loop: I append the first row of each division’s retreat data to the end of its attack data, so that when the lines are drawn there is no gap between the two and they join into one continuous path. Below you can see the effect of this concatenation:

Without (left) and with (right) concatenation
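The concatenation trick can be checked on a tiny example. This sketch uses hypothetical coordinates and troop counts (invented for illustration) and a helper named to_segments introduced here; it builds segments with the same reshape/concatenate steps as the loop above. Appending the first retreat point adds exactly one extra segment, and that segment is what closes the gap between advance and retreat. Widths here use one value per segment (the start point's count divided by 6000).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

# Hypothetical (longitude, latitude, survivors) rows for one division
attack = np.array([[24.0, 54.9, 340000], [26.0, 55.0, 300000], [28.0, 54.8, 250000]])
retreat = np.array([[28.0, 54.7, 100000], [26.0, 54.6, 50000]])

def to_segments(pts):
    # (n, >=2) rows -> (n - 1, 2, 2) array of [start, end] point pairs
    xy = pts[:, :2].reshape(-1, 1, 2)
    return np.concatenate([xy[:-1], xy[1:]], axis=1)

# Append the first retreat row so the advance path ends where the retreat starts
joined = np.vstack([attack, retreat[:1]])
print(len(to_segments(attack)), len(to_segments(joined)))  # 2 3

fig, ax = plt.subplots()
lc = LineCollection(to_segments(joined), linewidths=joined[:-1, 2] / 6000,
                    capstyle="round")
ax.add_collection(lc)
ax.autoscale_view()
```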

The next step is to create the temperature subplot. Most of it is basic: a dotted temperature line plus a few horizontal reference lines. The interesting part is again the fourth line of the code, where ConnectionPatch is used. ConnectionPatch lets you draw a line joining two points that belong to two different axes. It took me a while to find this method (a classmate’s suggestion helped).

for i, vals in enumerate(temperatures.iterrows()):
    xy_temp = tuple(vals[1][['LONT', 'TEMP']].values)
    xy_retreat = intersection_points[i]
    con = ConnectionPatch(xyA=xy_temp, coordsA=ax2.transData,
                          xyB=xy_retreat, coordsB=ax1.transData,
                          arrowstyle="-", shrinkB=5, alpha=0.3)
    fig.add_artist(con)

    long, temp, mon, day = vals[1][['LONT','TEMP', 'MON', 'DAY']].values
    if pd.isna(mon):
        lab = f"{int(temp)}°"
        loc = (long-0.1,temp-3)
    else:
        lab = f"{int(temp)}° {mon} {int(day)}"
        loc = (long-0.3,temp-3)

    ax2.annotate(lab, (long, temp), xytext=loc)

for val in (0, -10, -20, -30):
    ax2.hlines(val, long, 38, alpha=0.2, color='black')

ax2.set_xlim(23,38.2)
ax2.set_ylim(-30,0)
ax2.yaxis.tick_right()
for tick in ax2.yaxis.get_majorticklabels():
    tick.set_horizontalalignment("right")

ax2.yaxis.set_label_position("right")
ax2.set_ylabel("°R", rotation=0, fontsize=13)
ax2.yaxis.set_tick_params(labelsize=11)
ax2.yaxis.set_label_coords(0.997, 1.2)
ax2.set_xticks([])
ax2.annotate("GRAPHIC TABLE of the temperature in degrees of Reaumur thermometer", (25.4, -5.3),
             bbox= dict(facecolor='#EAC260', alpha=0.3, boxstyle='round'), color='black',
             size='x-large', weight='bold')

ax2.plot(temperatures.LONT, temperatures.TEMP, c='#EAC260', lw=3, linestyle=':')
ax2.tick_params(axis='both', which='both',length=0)
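The ConnectionPatch idea from the temperature code can be tried in isolation. In this minimal sketch the two points are hypothetical (a temperature reading at longitude 29.0 and a made-up point on the retreat path); each end is given in its own axes' data coordinates via transData, and the patch is added to the figure rather than to either axes so it can cross the boundary between the two subplots.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import ConnectionPatch

fig, (top, bottom) = plt.subplots(nrows=2, gridspec_kw={"height_ratios": [5, 1]})
top.set_xlim(23, 38.2)
top.set_ylim(53.3, 57)
bottom.set_xlim(23, 38.2)
bottom.set_ylim(-30, 0)

# Hypothetical points: a reading in the bottom axes, its match in the top axes
xy_temp = (29.0, -11.0)
xy_retreat = (29.0, 54.5)

con = ConnectionPatch(xyA=xy_temp, coordsA=bottom.transData,
                      xyB=xy_retreat, coordsB=top.transData,
                      arrowstyle="-", alpha=0.3)
fig.add_artist(con)   # figure-level artist, so it can span both subplots
fig.canvas.draw()     # render once to confirm the patch draws without errors
```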

The if/else inside the first loop of the temperature code handles the one temperature entry (-11°) that comes with no date. The original chart has the same gap, which is a little peculiar, but our code handles it either way. We finish the plot with the usual legend:

legend_dict = {'Attack_Div 1':attack_colours[0], 'Attack_Div 2':attack_colours[1], 'Attack_Div 3':attack_colours[2],
               'Retreat_Div 1':retreat_colours[0], 'Retreat_Div 2':retreat_colours[1], 'Retreat_Div 3':retreat_colours[2]}

patchList = []
for key in legend_dict:
    data_key = mpatches.Patch(color=legend_dict[key], label=key)
    patchList.append(data_key)

ax1.legend(handles=patchList, ncol=2, loc='best', frameon=False, fontsize=12)
#X............................................................................X

#Fixing spaces between the two subplots
plt.subplots_adjust(wspace=0, hspace=0)
plt.box(False)
plt.show()

What we get from all this code is a high-quality Minard chart that is almost a replica of the original:

But the result looks a bit faded. Could we increase the contrast and reduce the brightness a little? We can. Various tools can do this; I simply used plain old Microsoft Word. Insert the image into a Word document, go to Picture Format -> Corrections -> Picture Corrections Options, and make a custom change: Brightness to -19% and Contrast to 50%. The result is shown below:

Now isn’t this lovely? The only thing I couldn’t fix was the alignment of the first and second divisions’ retreat paths. That wasn’t possible because of an issue in the dataset itself, and we can’t change the original data. In any case, we got a beautiful modern rendition of Minard’s chart, which is a fascinating chart to recreate. Here’s the Jupyter Notebook over on my GitHub with all the code.
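If you would rather stay in Python than use Word for the brightness/contrast step, a simple linear adjustment on the pixel array gets a similar effect. This is an assumption-laden sketch: adjust is a hypothetical helper, the formula is a generic contrast stretch around mid-grey plus a brightness shift, not Word's exact algorithm, so -0.19 and 0.5 will only roughly resemble Word's -19% and 50%. It expects a float RGB or RGBA image in [0, 1] (what plt.imread returns for PNG files) and leaves the alpha channel untouched.

```python
import numpy as np

def adjust(img, brightness=-0.19, contrast=0.5):
    """Darken and add contrast to a float RGB(A) image with values in [0, 1].

    Hypothetical helper: a generic linear adjustment, not MS Word's algorithm.
    """
    out = img.copy()
    rgb = out[..., :3]
    # Stretch values away from mid-grey, shift them, then clip back into [0, 1]
    rgb = (rgb - 0.5) * (1 + contrast) + 0.5 + brightness
    out[..., :3] = np.clip(rgb, 0.0, 1.0)
    return out

# Usage on a saved chart (placeholder file names):
#   import matplotlib.pyplot as plt
#   img = plt.imread("minard.png")
#   plt.imsave("minard_adjusted.png", adjust(img))
pixel = np.array([[[0.8, 0.5, 0.2]]])
print(adjust(pixel))
```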

The aim of this post was to show some of the things you can do with Matplotlib, capabilities you don’t get as easily with tools like Tableau or Power BI. I hope you all enjoyed the read. I’ll be posting more of what I learned during my master’s on Medium in the coming days, so do follow if you like this kind of content.


Amit Amola

An iota among complex ones, a line among conics and small multi-cellular within this vast universe — just trying to grow!