Data Preparation

This preparation step becomes necessary for future studies that are intensive in both data and workflows.

import numpy as np
import pandas as pd

# Load data
bbrp_100m2 = pd.read_csv("bbrp_100m2.csv").drop(columns=['Unnamed: 0'], errors='ignore')
print(bbrp_100m2.info())
# Set proper data types (uint8 assumes every integer value is between 0 and 255)
for col in bbrp_100m2.select_dtypes(include="int64").columns:
    bbrp_100m2[col] = bbrp_100m2[col].astype("uint8")
print(bbrp_100m2.info())
# Modify dataset to a more convenient format for analysis

# Separate species names list
sp_names_list = bbrp_100m2[['sp_no', 'sp_name']].copy()

# Reshape dataset and have 'quadrats_' as rows and 'sp_no' as columns
bbrp_100m2_counts = bbrp_100m2.set_index('sp_no', drop=True).drop(columns='sp_name')
bbrp_100m2_counts = bbrp_100m2_counts.T.reset_index(drop=True)
bbrp_100m2_counts.info()
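
The blanket cast to uint8 in the cell above is only safe if every count fits in the range 0 to 255. As a hedged alternative, the sketch below lets pandas pick the smallest unsigned type that actually fits each column. The table `toy` and its values are made up for illustration and are not the BBRP data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the species-by-quadrat table (values are made up)
toy = pd.DataFrame({
    "sp_no": [1, 2],
    "sp_name": ["species A", "species B"],
    "q1": [3, 0],
    "q2": [300, 1],  # 300 would not fit in uint8
})

# Downcast each integer column to the smallest unsigned type that holds its values
for col in toy.select_dtypes(include=np.integer).columns:
    toy[col] = pd.to_numeric(toy[col], downcast="unsigned")

print(toy.dtypes)
```

With this approach a column containing a value above 255 is kept in a wider type instead of silently wrapping around.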

Density: Mean and Variance

Case 1.1 BBRP

Case: Beni Biosphere Reserve Plot 01

Target species: Scheelea princeps (sp_no = 1), Calycophyllum spruceanum (sp_no = 4), and Acacia loretensis (sp_no = 8), chosen respectively as the most abundant, a commonly occurring, and a rare species

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Define parameters to investigate
np.random.seed(123)
sp_no_list = [1, 4, 18]
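# replace=False ensures that no quadrat is drawn twice within a sample
random_quadrats_n4 = np.random.choice(range(100), size=4, replace=False)
random_quadrats_n8 = np.random.choice(range(100), size=8, replace=False)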
print(random_quadrats_n4, random_quadrats_n8)
# Calculate mean and variance for defined species

# Collect results per species for each sample size
results = pd.DataFrame({'sp_no': sp_no_list})

for n, quadrats in [(4, random_quadrats_n4), (8, random_quadrats_n8)]:
    means, variances = [], []
    for sp_no in sp_no_list:
        data = bbrp_100m2_counts[sp_no].values
        sample = [data[x] for x in quadrats]
        means.append(round(np.mean(sample), 2))
        variances.append(round(np.var(sample), 2))
    results[f'mean_n{n}'] = means
    results[f'var_n{n}'] = variances

# Keep the means together, followed by the variances
results = results[['sp_no', 'mean_n4', 'mean_n8', 'var_n4', 'var_n8']]

print(results)

# If printing data in dictionary format
#import json
#print("Results for n=4: \n", json.dumps(results_n4, indent=4))
#print("Results for n=8: \n", json.dumps(results_n8, indent=4))
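
Two NumPy defaults matter for calculations like the one above: `choice` draws with replacement unless `replace=False` is passed, and `np.var` divides by n (`ddof=0`) rather than by n - 1. The following is a minimal sketch of the difference, assuming made-up counts for a single species; the array `counts` and the seed are illustrative and are not BBRP data.

```python
import numpy as np

rng = np.random.default_rng(123)

# Made-up counts of one species in 100 quadrats (not the BBRP data)
counts = rng.poisson(lam=2.5, size=100)

# Draw 8 distinct quadrats: replace=False prevents picking the same quadrat twice
quadrats = rng.choice(100, size=8, replace=False)
sample = counts[quadrats]

var_population = np.var(sample)      # divides by n (ddof=0, the NumPy default)
var_sample = np.var(sample, ddof=1)  # divides by n - 1 (the usual sample estimator)

print(quadrats)
print(round(var_population, 2), round(var_sample, 2))
```

For n = 8 the two variances differ by a factor of 8/7, and the gap widens as n shrinks, which is relevant for the n = 4 sample.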

Tchebychev's Theorem

Tchebychev's theorem states that for any data set (population or sample), regardless of its statistical distribution, the proportion, p, of observations lying within k standard deviations of the mean (k > 1) will be at least:

$$p \geq 1 - \frac{1}{k^2}$$
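
As a quick worked check, the Tchebychev bound 1 - 1/k^2 depends only on k and not on the data. The sketch below tabulates it for a few values of k and compares it with the observed proportion in a deliberately skewed, made-up sample. The helper names `chebyshev_bound` and `observed_proportion` are hypothetical, and the sample is not taken from the BBRP data.

```python
import numpy as np

def chebyshev_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    return 1 - 1 / k**2

def observed_proportion(data, k):
    """Proportion of observations that actually fall within k std of the mean."""
    data = np.asarray(data, dtype=float)
    mean, std = data.mean(), data.std()
    return np.mean(np.abs(data - mean) <= k * std)

# Made-up, strongly skewed counts (not the BBRP data)
skewed = np.array([0, 0, 0, 0, 1, 1, 2, 3, 5, 20])

for k in (1.5, 2, 3):
    print(k, round(chebyshev_bound(k), 3), observed_proportion(skewed, k))
```

The observed proportion should never fall below the bound, but it can sit well above it, so the bound is a conservative guarantee rather than an estimate.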

# Tchebychev's lower bound on the proportion of quadrats within k std of the mean, for both n=4, n=8
k = 2
results_dict = {"sp_no": sp_no_list}

for n, quadrats in [(4, random_quadrats_n4), (8, random_quadrats_n8)]:
    prop_list = []
    interval_list = []

    for sp_no in sp_no_list:
        data = bbrp_100m2_counts[sp_no].values
        selected_data = [data[x] for x in quadrats]

        sp_std = np.std(selected_data)
        sp_mean = np.mean(selected_data)

        # The bound 1 - 1/k^2 depends only on k (also avoids 0/0 when std is 0)
        prop = 1 - 1 / k**2
        prop_list.append(round(prop, 2))

        # Interval of k standard deviations around the sample mean
        interval_list.append([round(sp_mean - k * sp_std, 2), round(sp_mean + k * sp_std, 2)])

    results_dict[f"Tchebychev's Prop, k=2std, n={n}"] = prop_list
    results_dict[f"Expected Tcheb. Conf. Interval, n={n}"] = interval_list

results = pd.DataFrame(results_dict)
print(results)

With a small number of quadrats, meaningful results can only be obtained for species that are abundant across the plot. This can be seen with Scheelea princeps, while the other two species are mostly absent with N=10.

Even with unbiased estimates or sampling, uncertainty remains, especially for small sample sizes (few quadrats) and when estimating metrics for rare species. This point is always worth reiterating (a fundamental principle in ecological sampling and analysis, if I may): assumptions about the distribution of (rare) species that go into models affect the results, and assumptions about distribution made during field planning affect fieldwork methodology and parameters.

  • i.e., observing rarity requires a sufficiently large random sample, and including rarity in models requires a (theoretical) distribution, not just (observed) instances.
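
To make the point about rarity concrete, here is a hedged sketch of how likely a random sample of quadrats is to contain a species at all. It assumes simple random sampling without replacement from 100 quadrats and a species present in a fixed number of them. The occupancy figures are made up for illustration, and the function name `detection_probability` is hypothetical.

```python
from math import comb

def detection_probability(n_total, n_occupied, n_sampled):
    """P(at least one occupied quadrat in a random sample drawn without replacement)."""
    # Probability that every sampled quadrat comes from the unoccupied ones
    p_miss = comb(n_total - n_occupied, n_sampled) / comb(n_total, n_sampled)
    return 1 - p_miss

# Made-up occupancies: species present in 90, 20, or 5 of 100 quadrats
for occupied in (90, 20, 5):
    for n in (4, 8):
        p = detection_probability(100, occupied, n)
        print(f"occupied={occupied:>2}, n={n}: P(detect) = {p:.3f}")
```

Under these assumptions a widespread species is almost certain to appear even in four quadrats, whereas a species confined to a handful of quadrats is more likely to be missed than found, so its sample mean and variance will often both be zero.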

Case 1.2

Heyer, 1974 study: Sampling of anuran larvae with three different sweep types. The three sweep types sampled different microhabitats within one pond, which in turn relates to habitat partitioning.

Sampling method: 25-m sweeps, done once a week from 09 March to 29 June, 1974 (total of 16 weeks), for each sample or sweep