Complex pollen diagram¶

This is an example of bars, stacked, lines and area plots in one single diagram.

We display pollen count data from the site Hoya del Castillo in Spain collected by Davis and Stevenson (2007) obtained from the European pollen database (EPD). We extended this data by two additional columns: summer (JJA) and winter (DJF) temperature.

We start by importing the necessary libraries. We use pandas to read and manage the pollen data and stratplot for it’s visualization.

import psyplot.project as psy
import pandas as pd
from psy_strat.stratplot import stratplot
import matplotlib as mpl

Adjusting the figure size and dpi improves the readability of the plots in this notebook.

    mpl.rcParams['figure.figsize'] = (16, 10)
mpl.rcParams['figure.dpi'] = 150

The data is stored as a comma-separated text file that can be loaded into a pandas.DataFrame using pandas pandas.read_csv function:

df = pd.read_csv('pollen-data.csv', index_col='agebp')
print(df.shape)
df.head(5)

(34, 37)

	DJF Temperature	JJA Temperature	Alnus	Anthemis-type	Artemisia	Betula	Bidens-type	Carpinus	Caryophyllaceae	Cerealia-type	...	Plantago coronopus	Plantago major/P. media	Plantago maritima	Potamogeton	Pteridium	Quercus ilex-type	Quercus suber-type	Ruppia	Sparganium	Ulmus
agebp
4690	5.6125	21.1750	NaN	NaN	4.0	NaN	NaN	NaN	NaN	NaN	...	1.0	NaN	NaN	NaN	NaN	11.0	NaN	NaN	NaN	NaN
4890	6.3375	21.8250	1.0	NaN	6.0	NaN	NaN	NaN	NaN	NaN	...	7.0	3.0	NaN	NaN	2.0	13.0	NaN	NaN	NaN	NaN
5087	6.2750	21.8750	NaN	NaN	8.0	NaN	NaN	NaN	NaN	NaN	...	8.0	1.0	NaN	NaN	NaN	21.0	NaN	1.0	NaN	NaN
5278	4.8750	20.8250	NaN	NaN	7.0	NaN	NaN	NaN	NaN	NaN	...	2.0	NaN	NaN	NaN	NaN	24.0	NaN	3.0	NaN	NaN
5465	2.2750	20.3125	1.0	NaN	4.0	NaN	NaN	NaN	NaN	NaN	...	1.0	NaN	NaN	NaN	NaN	18.0	NaN	1.0	NaN	NaN

5 rows × 37 columns

This data contains 34 samples and 37 columns. Note that we chose the DataFrame to be indexed by the 'agebp' column. This will be the vertical axis of our stratigraphic diagram that is shared between all the variables.

Now we can display this dataframe using stratplot:

sp, groupers = stratplot(df)

You see now one plot for each column in the above dataframe. However, this figure is not very informative. The x-axis labels are hardly readable and the taxa all have different scalings. We can significantly improve this plot by grouping the variables together.

Luckily, the EPD comes with a mapping from taxon name to group names. This mapping is stored in the tab separated file epd-groups.tsv:

groups = pd.read_csv('epd-groups.tsv', delimiter='\t', index_col=0)
groups.head(5)

	groupname
varname
Abies	Trees and shrubs
Abies undiff.	Trees and shrubs
Acacia	Trees and shrubs
Acer	Trees and shrubs
Acer cf. A. campestre	Trees and shrubs

Using this data, we can group the columns in our DataFrame using the following function:

def grouper(col):
    if 'Temperature' in col:
        return 'Temperature'
    else:
        return groups.groupname.loc[col]

And have a look into how this functions groups our data:

group2taxon = pd.DataFrame.from_dict(df.groupby(grouper, axis=1).groups, orient='index').T
group2taxon.fillna('')

	Aquatics	Dwarf shrubs	Helophytes	Herbs	Nonpollen	Temperature	Trees and shrubs	Vascular cryptogams (Pteridophytes)
0	Potamogeton	Ericaceae	Sparganium	Anthemis-type	Concentration pollen	DJF Temperature	Alnus	Filicopsida
1	Ruppia			Artemisia		JJA Temperature	Betula	Pteridium
2				Bidens-type			Carpinus
3				Caryophyllaceae			Corylus
4				Cerealia-type			Ephedra distachya-type
5				Chenopodiaceae			Ephedra fragilis-type
6				Compositae subf. Cichorioideae			Juniperus
7				Cruciferae			Olea
8				Cyperaceae			Pinus
9				Filipendula			Quercus ilex-type
10				Gramineae			Quercus suber-type
11				Helianthemum-type			Ulmus
12				Mentha-type
13				Plantago coronopus
14				Plantago major/P. media
15				Plantago maritima

For our pollen diagram, we are actually only interested in Temperature, Herbs, Trees and shrubs. Therefore we can exclude the other groups from our plot using the exclude parameter of stratplot. Additionally we transform the pollen counts into percentages to get a better scaling of the diagram. Since Trees and shrubs and Herbs should both be considered when calculating the percentages, we additionally put them into a larger Pollen group.

sp, groupers = stratplot(
    df, grouper,
    widths={'Temperature': 0.1, 'Pollen': 0.9},
    percentages=['Pollen'],
    subgroups={'Pollen': ['Trees and shrubs', 'Herbs']},
    exclude=['Aquatics', 'Dwarf shrubs', 'Helophytes', 'Nonpollen',
             'Vascular cryptogams (Pteridophytes)'])

This diagram already looks much better, however there are still to many taxa in there that have only very little amount of data. Therefore we use the thresh parameter to set a threshold of 1%. Every taxon now that is never above 1% will not be displayed.

Additionally we apply a new order to the columns such that we put Juniperus, Pinus and Quercus ilex-type, and then the other trees and shrubs to the left. Note that, if you run this in the psyplot GUI, you can change the order of the variables more easily without scripting. However, it is also not so difficult using the reorder method of the grouper for the pollen variables.

sp, groupers = stratplot(
    df, grouper,
    thresh=1.0,
    widths={'Temperature': 0.1, 'Pollen': 0.9},
    percentages=['Pollen'],
    subgroups={'Pollen': ['Trees and shrubs', 'Herbs']},
    exclude=['Aquatics', 'Dwarf shrubs', 'Helophytes', 'Nonpollen',
             'Vascular cryptogams (Pteridophytes)'])

# apply a new order where we first display Juniperus, Pinus and Quercus, and then the rest
pollen_grouper = groupers[1]
first_taxa = ['Juniperus', 'Pinus', 'Quercus ilex-type']
remaining_trees = group2taxon['Trees and shrubs'][
    ~group2taxon['Trees and shrubs'].isin(first_taxa)].dropna().tolist()

neworder = first_taxa + remaining_trees
pollen_grouper.reorder(neworder)

Finally, we can display the temperature columns in one single diagram because they share the same units. This is done using the all_in_one parameter. Additionally we can include a sum of the different pollen subgroups using the summed parameter.

sp, groupers = stratplot(
    df, grouper,
    thresh=1.0,
    all_in_one=['Temperature'],
    summed=['Trees and shrubs', 'Herbs'],
    widths={'Temperature': 0.1, 'Pollen': 0.9},
    percentages=['Pollen'],
    subgroups={'Pollen': ['Trees and shrubs', 'Herbs']},
    exclude=['Aquatics', 'Dwarf shrubs', 'Helophytes', 'Nonpollen',
             'Vascular cryptogams (Pteridophytes)'])

# apply a new order where we first display Juniperus, Pinus and Quercus, and then the rest
pollen_grouper = groupers[1]
first_taxa = ['Juniperus', 'Pinus', 'Quercus ilex-type']
remaining_trees = group2taxon['Trees and shrubs'][
    ~group2taxon['Trees and shrubs'].isin(first_taxa)].dropna().tolist()

neworder = first_taxa + remaining_trees
pollen_grouper.reorder(neworder)

Last but not least, let’s talk a bit about the final layout. To better distinguish herbs from Trees and shrubs, we can display this group using a bar plot by making use of the use_bars parameter of stratplot. Furthermore we decrease the size of the groupers and increase the height of the plot using the trunc_height parameter.

Additionally we can use the psyplot framework to do some changes to the colors, etc..:

display trees in green
exaggerate the trees by a factor of 4
highlight low pollen occurences below 1% with a +
change the JJA temperature curve to red
change the legendlabels for temperature
change x- and y-label for temperature

This modifications make the plot look much nicer!

sp, groupers = stratplot(
    df, grouper,
    thresh=1.0,
    trunc_height=0.1,
    use_bars=['Herbs'],
    all_in_one=['Temperature'],
    summed=['Trees and shrubs', 'Herbs'],
    widths={'Temperature': 0.1, 'Pollen': 0.9},
    percentages=['Pollen'],
    calculate_percentages=True,
    subgroups={'Pollen': ['Trees and shrubs', 'Herbs']},
    exclude=['Aquatics', 'Dwarf shrubs', 'Helophytes', 'Nonpollen',
             'Vascular cryptogams (Pteridophytes)'])

# apply a new order where we first display Juniperus, Pinus and Quercus, and then the rest
pollen_grouper = groupers[1]
first_taxa = ['Juniperus', 'Pinus', 'Quercus ilex-type']
remaining_trees = group2taxon['Trees and shrubs'][
    ~group2taxon['Trees and shrubs'].isin(first_taxa)].dropna().tolist()

neworder = first_taxa + remaining_trees
pollen_grouper.reorder(neworder)


# -- psyplot update
blue = '#1f77b4'
orange = '#ff7f0e'
green = '#2ca02c'
red = '#d62728'

# change the color of trees and shrubs to green
sp(group='Trees and shrubs').update(color=[green])

# exaggerate the trees with low counts by a factor of 4
sp(name=remaining_trees).update(exag='areax', exag_factor=4)

# mark small taxon occurences below 1% with a +
sp(maingroup='Pollen').update(occurences=1.0)

# change the color of JJA temperature to red, shorten legend labels and
# change the x- and y-label
sp(group='Temperature').update(color=[blue, red], legendlabels=['DJF', 'JJA'],
                               ylabel='Age BP [years]', xlabel='$^\circ$C',
                               legend={'loc': 'lower left'})

# change the color of the summed trees and shrubs to green and put the legend
# on the bottom
sp(group='Summed').update(color=[green, orange], legend={'loc': 'lower left'})

psy.close('all')

References¶

Davis, B.A. and Stevenson, A.C., 2007. The 8.2 ka event and Early–Mid Holocene forests, fires and flooding in the Central Ebro Desert, NE Spain. Quaternary Science Reviews, 26(13-14), pp.1695-1712.

Download python script: example_pollen.py

Download Jupyter notebook: example_pollen.ipynb

View the notebook in the Jupyter nbviewer

Download supplementary data: