In this vignette, we walk through data preparation, variogram analysis, and maximum likelihood estimation.

Data Preparation

The first step is to import your MoveBank csv file:

yourAnimals <- as.telemetry(file="yourAnimalsMoveBank.csv")

For most species, the default two-point equidistant projection will be fine; however, you can provide any PROJ.4 formatted projection with the proj argument (see help(as.telemetry)). A single fixed projection should be used if you are going to plot groups of individuals that span multiple MoveBank files.

The output of as.telemetry will be an individual telemetry object or list of telemetry objects, depending on how many individual animals are in your csv file. telemetry objects contain the components t for time in seconds, and x and y for the projected locations in meters. Times in your MoveBank file should be sorted and non-repeating, though we will handle those cases when we introduce telemetry error modeling in a future update.

Our example buffalo data is already prepared into a list of telemetry objects. Let us look at the first buffalo and then every buffalo:

library(ctmm)
data(buffalo)
billy <- buffalo[[1]]
plot.telemetry(billy)
plot.telemetry(buffalo,col.data=rainbow(length(buffalo),alpha=0.5))

Looking at the raw movement tracks is a good way to pick out any obvious migratory behaviors. In the future, we will have migration models to select, but for now all of our models are range resident and so only those portions of the data should be selected. These buffalo all look fairly range resident, and so we can move on to variograms.

Variograms

Variograms are an unbiased way to visualize autocorrelation structure when migration, range shifting, drift, or other translations of the mean location are not happening. When drift occurs in the data, then the variogram represents a mixture of both the drift and the autocorrelation structure, each of which contains distinct movement behaviors.

These buffalo do not appear to be significantly drifting, so let us take a look at the first buffalo’s variogram:

var <- variogram(billy)
plot.variogram(var,fraction=0.005)
plot.variogram(var,fraction=0.65,alpha=0.5)

The first plot is zoomed in to the short lag behavior, while the second plot is zoomed out. You can do this on the fly with variogram.zoom in RStudio. The variogram represents the average square distance traveled (vertical axis) within some time lag (horizontal axis).
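To make the quantity on the vertical axis concrete, here is a minimal base-R sketch of an empirical semi-variogram. It assumes evenly spaced fixes and the convention that the semi-variance is half the mean squared displacement, averaged over the two axes. The name naive_variogram is hypothetical and not part of ctmm, and the sketch does none of the irregular-sampling handling or confidence-interval work that variogram() does.

```r
# Naive empirical semi-variogram for an evenly sampled 2D track.
# x, y: projected coordinates (meters); max_lag: largest lag, in sampling steps.
naive_variogram <- function(x, y, max_lag) {
  n <- length(x)
  sapply(seq_len(max_lag), function(k) {
    dx <- x[(1 + k):n] - x[1:(n - k)]
    dy <- y[(1 + k):n] - y[1:(n - k)]
    # half the mean squared displacement, averaged over the two axes
    mean(dx^2 + dy^2) / 4
  })
}

# Toy track: constant-velocity motion along x (one meter per step)
x <- 0:9
y <- rep(0, 10)
naive_variogram(x, y, max_lag = 3)
```

For this straight-line toy track the semi-variance grows with the square of the lag, which is the same upward curvature that continuous velocity produces at short lags in real data.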

For the long range behavior we can see that the variogram flattens (asymptotes) at approximately 20 days. This is, roughly, how coarse you would need to make the time series so that methods assuming independence (no autocorrelation) could be valid. This includes conventional kernel density estimation (KDE), minimum convex polygon (MCP), conventional species distribution modeling (SDM), and a host of other analyses.
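As a rough illustration of what coarsening to this timescale means, the following base-R sketch thins a vector of fix times so that retained fixes are at least a chosen lag apart. The helper thin_by_lag and the hourly schedule are hypothetical; this is not a ctmm function and is only one of several reasonable ways to subsample.

```r
# Keep a subset of fix times that are at least `min_lag` seconds apart.
# `thin_by_lag` is an illustrative helper, not a ctmm function.
thin_by_lag <- function(t, min_lag) {
  keep <- logical(length(t))
  last_kept <- -Inf
  for (i in seq_along(t)) {
    if (t[i] - last_kept >= min_lag) {
      keep[i] <- TRUE
      last_kept <- t[i]
    }
  }
  which(keep)  # indices of the retained fixes
}

# Hypothetical hourly schedule spanning 100 days, thinned to 20-day spacing
t_hourly <- seq(0, 100 * 24 * 60^2, by = 60^2)
idx <- thin_by_lag(t_hourly, min_lag = 20 * 24 * 60^2)
length(idx)  # number of fixes that survive the thinning
```

Applied to real data, one would pass the t component of a telemetry object (for example billy$t). The point of the exercise is how few fixes remain once the data are coarse enough for independence to be plausible.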

The asymptote of our variogram is around \(23~\mathrm{km}^2\), and the fact that it takes some time lag, \(\tau \sim 20~\mathrm{days}\), for the variogram to asymptote is indicative of the fact that the buffalo’s location appears continuous at this timescale. This is also, roughly, the time it takes for the buffalo to cross its home range. We can guesstimate some continuous-time models for this behavior with the commands

m0 <- ctmm(sigma=23*1000^2) # 23 km^2 in m^2
m1 <- ctmm(sigma=23*1000^2,tau=6*24*60^2) # and 6 days in seconds
plot.variogram(var,model=m0,fraction=0.65,alpha=0.5)
plot.variogram(var,model=m1,fraction=0.65,alpha=0.5)

where for both the models m0 and m1, sigma is the asymptotic variance. In model m1, tau is a single timescale that governs the autocorrelation in position and dictates the animal’s home-range crossing time. The null model m0 has no autocorrelation. Notice that all units are in meters and seconds. Better still, there is a convenient RStudio function variogram.fit that gives you sliders to choose the most visually appropriate parameters and save them to the variable global.variogram.fit.
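Because every parameter must be given in meters and seconds, it can help to make the conversions explicit. The helpers below are a small sketch with hypothetical names (they are not ctmm functions); they simply reproduce the arithmetic used in the ctmm() calls above.

```r
# Illustrative unit helpers (hypothetical names, base R only)
km2_to_m2  <- function(a) a * 1000^2     # square kilometers -> square meters
days_to_s  <- function(d) d * 24 * 60^2  # days  -> seconds
hours_to_s <- function(h) h * 60^2       # hours -> seconds

sigma_guess <- km2_to_m2(23)  # same value as 23*1000^2
tau_guess   <- days_to_s(6)   # same value as 6*24*60^2
```

These values can be passed to ctmm() in place of the inline arithmetic, for example ctmm(sigma=sigma_guess,tau=tau_guess).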

The uncorrelated model m0 is obviously incorrect, and in the zoomed-in plot we can also see that the model m1 is incorrectly linear at short lags, whereas the empirical variogram actually curves up for an hour or two before it becomes linear. Let us introduce a model that incorporates this behavior.

m2 <- ctmm(sigma=23*1000^2,tau=c(1*60^2,6*24*60^2)) # and 1 hour in seconds
plot.variogram(var,model=m1,fraction=0.002)
plot.variogram(var,model=m2,fraction=0.002)

The confidence intervals at short lags are also very narrow, though both of these models look the same at coarser scales and so the discrepancy is only revealed by high resolution data.

plot.variogram(var,model=m1,fraction=0.65,alpha=0.5)
plot.variogram(var,model=m2,fraction=0.65,alpha=0.5)

The model m2 introduces an additional autocorrelation timescale for the animal’s velocity, so that it more closely matches the initial behavior of the variogram. The initial curve upwards tells us that there is continuity in the animal’s velocity at this timescale. Conventional Markovian animal movement models do not capture this, which leads to the same kind of bias and underestimation of confidence intervals as when ignoring autocorrelation entirely.
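The difference between the three guesses can be seen directly in their theoretical semi-variance functions. The sketch below assumes the standard forms for an uncorrelated process, an Ornstein-Uhlenbeck (OU) position process, and an OU process with autocorrelated velocity; the function names are illustrative and are not ctmm's internals.

```r
# Theoretical semi-variance functions; lag and timescales in seconds, sigma in m^2.
svf_iid <- function(lag, sigma) ifelse(lag > 0, sigma, 0)            # like m0
svf_ou  <- function(lag, sigma, tau) sigma * (1 - exp(-lag / tau))   # like m1
svf_ouf <- function(lag, sigma, tau_pos, tau_vel) {                  # like m2
  sigma * (1 - (tau_pos * exp(-lag / tau_pos) - tau_vel * exp(-lag / tau_vel)) /
               (tau_pos - tau_vel))
}

sigma <- 23 * 1000^2  # 23 km^2 in m^2
day   <- 24 * 60^2
hour  <- 60^2

# Short lags: the single-timescale curve starts out linear, while the
# two-timescale curve starts out flat and bends upward
lags <- c(0.25, 0.5, 1, 2) * hour
short <- cbind(lag_hours = lags / hour,
               ou_km2    = svf_ou(lags, sigma, 6 * day) / 1000^2,
               ouf_km2   = svf_ouf(lags, sigma, 6 * day, 1 * hour) / 1000^2)
short
```

All three functions share the same asymptote sigma, so they are indistinguishable at long lags and differ only in how they leave zero.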

The linear regime of the variogram (regular diffusion) is just as important as the asymptotic regime. In the linear regime it is reasonable to assume a Markovian model as with step selection functions (SSF) and Brownian bridges (BB). Therefore, the variogram has informed us as to how much we need to coarsen our data for it to be appropriate in many common analyses that neglect various aspects of movement.

Variogram Error is Autocorrelated

It is important to note that variogram errors—the difference between the empirical variogram and the true semi-variance function—are themselves autocorrelated. Therefore, the smooth wiggling around that the point estimate does within the confidence bands is not necessarily meaningful. To demonstrate this, let us simulate some data from the model m2 and look at its empirical variogram.

# simulate fake buffalo with the same sampling schedule
willy <- simulate(m2,billy$t)
plot.telemetry(willy)
# now calculate and plot its variogram
var2 <- variogram(willy)
plot.variogram(var2,model=m2,fraction=0.65,alpha=0.5)
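The same experiment can be mimicked in base R to see how different the wiggles are between replicates of one fixed model. The sketch below uses the exact update of a one-dimensional OU process (the single-timescale analogue of m1, not the two-timescale m2) on a hypothetical hourly schedule; simulate_ou is an illustrative name, not a ctmm function.

```r
# Exact simulation of a 1D Ornstein-Uhlenbeck process at arbitrary times.
# t: sample times (s); sigma: stationary variance (m^2); tau: timescale (s).
simulate_ou <- function(t, sigma, tau) {
  n <- length(t)
  x <- numeric(n)
  x[1] <- rnorm(1, sd = sqrt(sigma))        # start from the stationary distribution
  for (i in seq_len(n - 1) + 1) {
    rho  <- exp(-(t[i] - t[i - 1]) / tau)   # autocorrelation over this time step
    x[i] <- rho * x[i - 1] + rnorm(1, sd = sqrt(sigma * (1 - rho^2)))
  }
  x
}

set.seed(1)
t  <- seq(0, by = 60^2, length.out = 2000)  # hypothetical hourly fixes
x1 <- simulate_ou(t, sigma = 23 * 1000^2, tau = 6 * 24 * 60^2)
x2 <- simulate_ou(t, sigma = 23 * 1000^2, tau = 6 * 24 * 60^2)
# x1 and x2 share one model but are different realizations; their empirical
# variograms will wander around the same theoretical curve in different ways.
```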

Non-Stationarity in Stationary Variograms

Non-stationary behaviors, like a seasonal change in variance, are averaged over in the variogram. Moreover, if we fit a stationary model to non-stationary data, we are estimating an average effect.

More Models

Telemetry Error

The GPS buffalo example does not exhibit telemetry errors that are significant enough to notice in our variograms. If we were working with ARGOS data or the high-resolution and unfiltered GPS data of a small animal, then we would get a “nugget” effect that looks like an initial discontinuity at short time lags.

# ARGOS type errors
curve(1+x,0,5,xlab="Short time lag",ylab="Semi-variance",ylim=c(0,6))
points(c(0,0),c(0,1))
# detector array type errors (qualitatively only)
curve((1-exp(-x))/(1-exp(-1/2)),0,1/2,xlab="Short time lag",ylab="Semi-variance",ylim=c(0,6),xlim=c(0,5))
curve(1/2+x,1/2,5,xlab="Short time lag",ylab="Semi-variance",ylim=c(0,6),add=TRUE,xlim=c(0,5))
points(1/2,1)

The height of this initial discontinuity corresponds to the variance of uncorrelated location errors. You will soon be able to incorporate these kinds of errors into ctmm analysis.

The second plot is a depiction of the kind of initial discontinuity one has with detector array data. The end of the (slope) discontinuity is highlighted with a circle. This discontinuity is smooth because the movement and detection are correlated. The height of this initial discontinuity is also (at least roughly) the variance of the location errors.

Cycles and Periodicities

One-Off Migrations

Repeated Migrations

Model Fitting

Now let us fit each of our proposed models m0, m1, m2, store the corresponding best-fit result in M0, M1, M2, and then compare some of their outputs.

M0 <- ctmm.fit(billy,m0)
summary(M0)
##                               low      ML     high
## area (square kilometers) 432.9209 432.923 432.9251
M1 <- ctmm.fit(billy,m1)
summary(M1)
##                                 low        ML      high
## tau[1] (days)              5.669665  12.98878  29.75632
## area (square kilometers) 205.190097 395.33232 646.78113
M2 <- ctmm.fit(billy,m2)
summary(M2)
##                                 low         ML      high
## tau[1] (minutes)          43.746618  46.732026  49.92117
## tau[2] (days)              3.511083   5.940428  10.05066
## area (square kilometers) 278.977632 437.901896 632.07845
## speed (kilometers/day)    10.893432  11.075483  11.26058

Notice how tiny the (Gaussian) area uncertainty is in model M0. Let us look into some details of the models.

TAB <- rbind( c(M0$AICc,M0$DOF.mu) , c(M1$AICc,M1$DOF.mu) , c(M2$AICc,M2$DOF.mu) )
colnames(TAB) <- c("AICc","DOF(mu)")
rownames(TAB) <- c("M0","M1","M2")
TAB
##        AICc     DOF(mu)
## M0 139075.6 3528.000000
## M1 103322.6    6.648594
## M2 101864.3   13.282184

AICc is the (linearly) corrected Akaike information criterion. AIC balances likelihood against model complexity in a way that is good if we want to make optimal predictions. A lower AIC is better. Getting the AIC to go down by 5 is great, while getting the AIC to go down by 10 is awesome. Our AIC is going down by thousands.
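The size of these improvements is easiest to judge as differences from the best model. The following sketch uses only the AICc values printed in the table above; the variable names are illustrative.

```r
# AICc values copied from the table above
aicc <- c(M0 = 139075.6, M1 = 103322.6, M2 = 101864.3)

# Difference of each model from the lowest (best) AICc
delta <- aicc - min(aicc)
delta

# Name of the preferred model
best <- names(which.min(aicc))
best
```

With fitted objects in hand, the same vector could be built from M0$AICc, M1$AICc, and M2$AICc as in the table code.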

The fit parameter DOF.mu is the number of degrees of freedom worth of data we have to estimate the mean parameter mu, assuming that the model is correct. Notice that the uncorrelated model M0 perceives thousands of independent data points, while the autocorrelated models M1 and M2 only see a handful of independent data points. This is why the uncorrelated model produced tiny confidence intervals on the predicted (Gaussian) area.
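One way to read DOF.mu is as an effective sample size for the mean location. The sketch below assumes that the variance of the mean estimate along each axis scales as sigma/DOF.mu, and it uses the rough 23 square kilometer guess for sigma in all three models purely for illustration (the fitted variances differ between models).

```r
# DOF(mu) values copied from the table above
dof_mu <- c(M0 = 3528.000000, M1 = 6.648594, M2 = 13.282184)

sigma_guess <- 23  # km^2; illustrative stand-in, not a fitted value

# Approximate standard error of the mean location along one axis, in km,
# under the assumption var(mean) = sigma / DOF(mu)
se_mu <- sqrt(sigma_guess / dof_mu)

# How much wider the uncertainty is relative to the uncorrelated model
ratio <- se_mu / se_mu["M0"]
ratio
```

Under these assumptions the autocorrelated models imply an uncertainty in the mean that is more than an order of magnitude larger than what the uncorrelated model reports.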

Back to the Variogram

Now it’s time to make sure that our selected model is explaining the most significant features of the animal’s movement. Let us plot our variogram again with our fit models:

plot.variogram(var,model=list(M0,M1,M2),col.model=c("black","purple","blue"),fraction=0.65,alpha=0.5)
plot.variogram(var,model=list(M0,M1,M2),col.model=c("black","purple","blue"),fraction=0.002)

Notice that the purple model M1 is significantly biased downward and is underestimating diffusion. This is because the continuous-velocity behavior at short time lags, which M1 does not account for, is throwing off the estimate. M0 is ignoring autocorrelation completely, while M1 is ignoring autocorrelation in the buffalo’s velocity.