Complex pollen diagram
This example combines bar, stacked, line and area plots in a single diagram.
We display pollen count data from the site Hoya del Castillo in Spain, collected by Davis and Stevenson (2007) and obtained from the European Pollen Database (EPD). We extended this data with two additional columns: summer (JJA) and winter (DJF) temperature.
We start by importing the necessary libraries. We use pandas to read and manage the pollen data and stratplot for its visualization.
import psyplot.project as psy
import pandas as pd
from psy_strat.stratplot import stratplot
import matplotlib as mpl
Adjusting the figure size and dpi improves the readability of the plots in this notebook.
mpl.rcParams['figure.figsize'] = (16, 10)
mpl.rcParams['figure.dpi'] = 150
The data is stored as a comma-separated text file that can be loaded into a pandas.DataFrame using the pandas.read_csv function:
df = pd.read_csv('pollen-data.csv', index_col='agebp')
print(df.shape)
df.head(5)
(34, 37)
agebp | DJF Temperature | JJA Temperature | Alnus | Anthemis-type | Artemisia | Betula | Bidens-type | Carpinus | Caryophyllaceae | Cerealia-type | ... | Plantago coronopus | Plantago major/P. media | Plantago maritima | Potamogeton | Pteridium | Quercus ilex-type | Quercus suber-type | Ruppia | Sparganium | Ulmus |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4690 | 5.6125 | 21.1750 | NaN | NaN | 4.0 | NaN | NaN | NaN | NaN | NaN | ... | 1.0 | NaN | NaN | NaN | NaN | 11.0 | NaN | NaN | NaN | NaN |
4890 | 6.3375 | 21.8250 | 1.0 | NaN | 6.0 | NaN | NaN | NaN | NaN | NaN | ... | 7.0 | 3.0 | NaN | NaN | 2.0 | 13.0 | NaN | NaN | NaN | NaN |
5087 | 6.2750 | 21.8750 | NaN | NaN | 8.0 | NaN | NaN | NaN | NaN | NaN | ... | 8.0 | 1.0 | NaN | NaN | NaN | 21.0 | NaN | 1.0 | NaN | NaN |
5278 | 4.8750 | 20.8250 | NaN | NaN | 7.0 | NaN | NaN | NaN | NaN | NaN | ... | 2.0 | NaN | NaN | NaN | NaN | 24.0 | NaN | 3.0 | NaN | NaN |
5465 | 2.2750 | 20.3125 | 1.0 | NaN | 4.0 | NaN | NaN | NaN | NaN | NaN | ... | 1.0 | NaN | NaN | NaN | NaN | 18.0 | NaN | 1.0 | NaN | NaN |
5 rows × 37 columns
This data contains 34 samples and 37 columns. Note that we chose the DataFrame to be indexed by the 'agebp' column. This will be the vertical axis of our stratigraphic diagram, shared between all the variables.
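To illustrate what index_col does, here is a minimal sketch that reads a tiny, made-up CSV from memory. The values and the two taxon columns are hypothetical and are not taken from pollen-data.csv; only the 'agebp' column name follows the text above.

```python
import io

import pandas as pd

# Hypothetical miniature of a pollen file: an age column plus two taxa.
csv_text = """agebp,Pinus,Juniperus
4690,12,3
4890,15,
5087,9,1
"""

# index_col turns the 'agebp' column into the row index of the DataFrame
mini = pd.read_csv(io.StringIO(csv_text), index_col='agebp')

print(mini.shape)       # (rows, data columns); the index is not counted
print(mini.index.name)  # name of the index, here 'agebp'
print(mini.loc[4890])   # the empty Juniperus cell is read as NaN
```

Because the ages form the index, a sample can be selected by its age with mini.loc[...], and missing counts stay NaN rather than becoming zeros.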
Now we can display this DataFrame using stratplot:
sp, groupers = stratplot(df)
You now see one plot for each column in the above DataFrame. However, this figure is not very informative: the x-axis labels are hardly readable and the taxa all have different scalings. We can significantly improve this plot by grouping the variables together.
Luckily, the EPD comes with a mapping from taxon names to group names. This mapping is stored in the tab-separated file epd-groups.tsv:
groups = pd.read_csv('epd-groups.tsv', delimiter='\t', index_col=0)
groups.head(5)
varname | groupname |
---|---|
Abies | Trees and shrubs |
Abies undiff. | Trees and shrubs |
Acacia | Trees and shrubs |
Acer | Trees and shrubs |
Acer cf. A. campestre | Trees and shrubs |
Using this data, we can group the columns in our DataFrame using the following function:
def grouper(col):
    # the two temperature columns are not part of the EPD mapping
    if 'Temperature' in col:
        return 'Temperature'
    else:
        return groups.groupname.loc[col]
Let's have a look at how this function groups our data:
group2taxon = pd.DataFrame.from_dict(df.groupby(grouper, axis=1).groups, orient='index').T
group2taxon.fillna('')
| | Aquatics | Dwarf shrubs | Helophytes | Herbs | Nonpollen | Temperature | Trees and shrubs | Vascular cryptogams (Pteridophytes) |
|---|---|---|---|---|---|---|---|---|
| 0 | Potamogeton | Ericaceae | Sparganium | Anthemis-type | Concentration pollen | DJF Temperature | Alnus | Filicopsida |
| 1 | Ruppia | | | Artemisia | | JJA Temperature | Betula | Pteridium |
| 2 | | | | Bidens-type | | | Carpinus | |
| 3 | | | | Caryophyllaceae | | | Corylus | |
| 4 | | | | Cerealia-type | | | Ephedra distachya-type | |
| 5 | | | | Chenopodiaceae | | | Ephedra fragilis-type | |
| 6 | | | | Compositae subf. Cichorioideae | | | Juniperus | |
| 7 | | | | Cruciferae | | | Olea | |
| 8 | | | | Cyperaceae | | | Pinus | |
| 9 | | | | Filipendula | | | Quercus ilex-type | |
| 10 | | | | Gramineae | | | Quercus suber-type | |
| 11 | | | | Helianthemum-type | | | Ulmus | |
| 12 | | | | Mentha-type | | | | |
| 13 | | | | Plantago coronopus | | | | |
| 14 | | | | Plantago major/P. media | | | | |
| 15 | | | | Plantago maritima | | | | |
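Recent pandas releases deprecate the axis=1 argument of DataFrame.groupby. The following sketch shows one way to obtain the same column grouping by grouping the transposed frame instead. The two small frames are hypothetical stand-ins for the EPD mapping and the pollen table, and col_groups is a name introduced only for this illustration.

```python
import pandas as pd

# Hypothetical stand-in for the EPD taxon-to-group mapping
groups = pd.DataFrame(
    {'groupname': ['Trees and shrubs', 'Herbs', 'Trees and shrubs']},
    index=pd.Index(['Pinus', 'Artemisia', 'Betula'], name='varname'))

# Hypothetical stand-in for the pollen table (two samples)
df = pd.DataFrame(
    {'DJF Temperature': [5.6, 6.3], 'JJA Temperature': [21.2, 21.8],
     'Artemisia': [4.0, 6.0], 'Betula': [0.0, 1.0], 'Pinus': [30.0, 25.0]},
    index=pd.Index([4690, 4890], name='agebp'))


def grouper(col):
    # the temperature columns are not part of the mapping
    if 'Temperature' in col:
        return 'Temperature'
    return groups.groupname.loc[col]


# After transposing, the column names are row labels, so a plain
# groupby (without axis=1) calls grouper once per former column.
col_groups = {name: list(cols)
              for name, cols in df.T.groupby(grouper).groups.items()}
print(col_groups)
```

The resulting dictionary maps each group name to the list of columns assigned to it, which is the information the table above displays.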
For our pollen diagram, we are actually only interested in Temperature, Herbs, and Trees and shrubs. Therefore we can exclude the other groups from our plot using the exclude parameter of stratplot. Additionally, we transform the pollen counts into percentages to get a better scaling of the diagram. Since Trees and shrubs and Herbs should both be considered when calculating the percentages, we additionally put them into a larger Pollen group.
sp, groupers = stratplot(
    df, grouper,
    widths={'Temperature': 0.1, 'Pollen': 0.9},
    percentages=['Pollen'],
    subgroups={'Pollen': ['Trees and shrubs', 'Herbs']},
    exclude=['Aquatics', 'Dwarf shrubs', 'Helophytes', 'Nonpollen',
             'Vascular cryptogams (Pteridophytes)'])
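As a rough illustration of what transforming counts into percentages means, the sketch below divides each sample by its own pollen sum. It uses made-up counts and plain pandas; it is not necessarily how stratplot performs the calculation internally.

```python
import pandas as pd

# Hypothetical counts for three taxa of the Pollen group (two samples)
counts = pd.DataFrame(
    {'Pinus': [30.0, 20.0],
     'Juniperus': [10.0, None],   # None: taxon not found in this sample
     'Artemisia': [10.0, 20.0]},
    index=pd.Index([4690, 4890], name='agebp'))

# Pollen sum per sample; sum() skips missing values by default
pollen_sum = counts.sum(axis=1)

# Divide every row by its own sum and scale to percent
percentages = counts.div(pollen_sum, axis=0) * 100
print(percentages)
```

Since trees, shrubs and herbs enter the same sum, the percentages of all taxa in a sample add up to 100, which is why the two groups are combined into one larger Pollen group.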
This diagram already looks much better. However, there are still too many taxa in it that contain very little data. Therefore we use the thresh parameter to set a threshold of 1%. Every taxon that never exceeds 1% will not be displayed.
Additionally, we apply a new order to the columns such that Juniperus, Pinus and Quercus ilex-type, followed by the other trees and shrubs, are put to the left. Note that, if you run this in the psyplot GUI, you can change the order of the variables more easily without scripting. However, it is also not difficult in a script, using the reorder method of the grouper for the pollen variables.
sp, groupers = stratplot(
    df, grouper,
    thresh=1.0,
    widths={'Temperature': 0.1, 'Pollen': 0.9},
    percentages=['Pollen'],
    subgroups={'Pollen': ['Trees and shrubs', 'Herbs']},
    exclude=['Aquatics', 'Dwarf shrubs', 'Helophytes', 'Nonpollen',
             'Vascular cryptogams (Pteridophytes)'])
# apply a new order where we first display Juniperus, Pinus and Quercus, and then the rest
pollen_grouper = groupers[1]
first_taxa = ['Juniperus', 'Pinus', 'Quercus ilex-type']
remaining_trees = group2taxon['Trees and shrubs'][
    ~group2taxon['Trees and shrubs'].isin(first_taxa)].dropna().tolist()
neworder = first_taxa + remaining_trees
pollen_grouper.reorder(neworder)
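The effect of a threshold and of a new column order can be sketched with plain pandas on made-up percentages. The strict comparison (greater than the threshold) is an assumption made for this illustration, and the snippet does not use psy-strat itself.

```python
import pandas as pd

# Hypothetical percentages for four tree taxa (two samples)
percentages = pd.DataFrame(
    {'Betula': [0.5, 0.8], 'Juniperus': [12.0, 9.0],
     'Olea': [2.0, 0.0], 'Pinus': [40.0, 55.0]},
    index=pd.Index([4690, 4890], name='agebp'))

thresh = 1.0
# keep only taxa whose maximum over all samples exceeds the threshold
kept = percentages.loc[:, percentages.max() > thresh]

# put the selected taxa first, then everything else in its old order
first_taxa = ['Juniperus', 'Pinus']
remaining = [taxon for taxon in kept.columns if taxon not in first_taxa]
neworder = first_taxa + remaining
reordered = kept[neworder]
print(list(reordered.columns))
```

Betula never rises above 1% and is dropped, while Olea is kept because it exceeds the threshold in at least one sample.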
Finally, we can display the temperature columns in one single diagram because they share the same units. This is done using the all_in_one parameter. Additionally we can include a sum of the different pollen subgroups using the summed parameter.
sp, groupers = stratplot(
    df, grouper,
    thresh=1.0,
    all_in_one=['Temperature'],
    summed=['Trees and shrubs', 'Herbs'],
    widths={'Temperature': 0.1, 'Pollen': 0.9},
    percentages=['Pollen'],
    subgroups={'Pollen': ['Trees and shrubs', 'Herbs']},
    exclude=['Aquatics', 'Dwarf shrubs', 'Helophytes', 'Nonpollen',
             'Vascular cryptogams (Pteridophytes)'])
# apply a new order where we first display Juniperus, Pinus and Quercus, and then the rest
pollen_grouper = groupers[1]
first_taxa = ['Juniperus', 'Pinus', 'Quercus ilex-type']
remaining_trees = group2taxon['Trees and shrubs'][
    ~group2taxon['Trees and shrubs'].isin(first_taxa)].dropna().tolist()
neworder = first_taxa + remaining_trees
pollen_grouper.reorder(neworder)
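What a sum of the pollen subgroups contains can be sketched as follows: for every sample, the percentages of all taxa belonging to one group are added up. The data and the taxon_group mapping are hypothetical, and the snippet only mirrors the idea behind the summed parameter, not its implementation.

```python
import pandas as pd

# Hypothetical percentages (two samples) and taxon-to-group mapping
percentages = pd.DataFrame(
    {'Pinus': [60.0, 50.0], 'Juniperus': [20.0, 10.0],
     'Artemisia': [20.0, 40.0]},
    index=pd.Index([4690, 4890], name='agebp'))
taxon_group = {'Pinus': 'Trees and shrubs',
               'Juniperus': 'Trees and shrubs',
               'Artemisia': 'Herbs'}

# one column per group, holding the per-sample sum of its taxa
summed = pd.DataFrame({
    group: percentages[[taxon for taxon, name in taxon_group.items()
                        if name == group]].sum(axis=1)
    for group in ['Trees and shrubs', 'Herbs']})
print(summed)
```

Plotted against age, these two columns show how the balance between trees and shrubs on the one hand and herbs on the other changes through time.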
Last but not least, let’s talk a bit about the final layout. To better distinguish Herbs from Trees and shrubs, we can display the Herbs group using a bar plot by making use of the use_bars parameter of stratplot.
Furthermore, we decrease the size of the groupers and increase the height of the plot using the trunc_height parameter.
Additionally, we can use the psyplot framework to make some changes to the colors and labels:
display trees in green
exaggerate the trees with low counts by a factor of 4
highlight low pollen occurrences below 1% with a +
change the JJA temperature curve to red
change the legend labels for temperature
change the x- and y-labels for temperature
These modifications make the plot look much nicer!
sp, groupers = stratplot(
    df, grouper,
    thresh=1.0,
    trunc_height=0.1,
    use_bars=['Herbs'],
    all_in_one=['Temperature'],
    summed=['Trees and shrubs', 'Herbs'],
    widths={'Temperature': 0.1, 'Pollen': 0.9},
    percentages=['Pollen'],
    calculate_percentages=True,
    subgroups={'Pollen': ['Trees and shrubs', 'Herbs']},
    exclude=['Aquatics', 'Dwarf shrubs', 'Helophytes', 'Nonpollen',
             'Vascular cryptogams (Pteridophytes)'])
# apply a new order where we first display Juniperus, Pinus and Quercus, and then the rest
pollen_grouper = groupers[1]
first_taxa = ['Juniperus', 'Pinus', 'Quercus ilex-type']
remaining_trees = group2taxon['Trees and shrubs'][
    ~group2taxon['Trees and shrubs'].isin(first_taxa)].dropna().tolist()
neworder = first_taxa + remaining_trees
pollen_grouper.reorder(neworder)
# -- psyplot update
blue = '#1f77b4'
orange = '#ff7f0e'
green = '#2ca02c'
red = '#d62728'
# change the color of trees and shrubs to green
sp(group='Trees and shrubs').update(color=[green])
# exaggerate the trees with low counts by a factor of 4
sp(name=remaining_trees).update(exag='areax', exag_factor=4)
# mark small taxon occurrences below 1% with a + (the formatoption itself is spelled 'occurences')
sp(maingroup='Pollen').update(occurences=1.0)
# change the color of JJA temperature to red, shorten legend labels and
# change the x- and y-label
sp(group='Temperature').update(color=[blue, red], legendlabels=['DJF', 'JJA'],
                               # raw string, so that \c is not an invalid escape
                               ylabel='Age BP [years]', xlabel=r'$^\circ$C',
                               legend={'loc': 'lower left'})
# change the color of the summed trees and shrubs to green and put the legend
# on the bottom
sp(group='Summed').update(color=[green, orange], legend={'loc': 'lower left'})
psy.close('all')
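The ideas behind the exaggeration and the occurrence markers can be sketched numerically with made-up percentages for one rare taxon. How psy-strat treats zeros and whether it compares strictly against the 1% limit are assumptions made here; the snippet illustrates the concept and not the library's implementation.

```python
import pandas as pd

# Hypothetical percentages of one rare taxon in four samples
olea = pd.Series([0.0, 0.4, 2.5, 0.9],
                 index=pd.Index([4690, 4890, 5087, 5278], name='agebp'),
                 name='Olea')

# exaggeration: the same curve, scaled by a constant factor for display
exag_factor = 4
exaggerated = olea * exag_factor

# occurrences: samples where the taxon is present but below 1%
is_low = (olea > 0) & (olea < 1.0)
marker_ages = olea.index[is_low].tolist()
print(marker_ages)
```

The exaggerated curve makes small variations of rare taxa visible, and the marker ages are the depths where a + would replace a value too small to be seen in the plot.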
References
Davis, B.A. and Stevenson, A.C., 2007. The 8.2 ka event and Early–Mid Holocene forests, fires and flooding in the Central Ebro Desert, NE Spain. Quaternary Science Reviews, 26(13-14), pp.1695-1712.