This vignette shows how the stopdetection package can be used to segment timestamped trajectories into sequences of stops and tracks (alternatively known as stay points and trajectories).
Living beings tend to have movement behavior that intersperses time spent in motion with time spent at rest. It is often useful to distinguish between the two states of “moving” and “stopped,” because we want to investigate only one of them, aggregate over a state, or calculate statistics for each state independently.
Humans tend to be quite good at subjectively distinguishing between states. You can probably list the different places you’ve been today without much difficulty, for example, because you are able to combine the aspects of space, time and motive. Raw location data, on the other hand, must be analyzed in some way before these states can be told apart. This package uses a very simple algorithm to cluster the locations in a trajectory.
We can think of a stop as a place where our location didn’t change very much for a certain length of time. How much and how long depend very much on the goals of the researcher. While the same concepts apply across varied disciplines, this vignette is written from the context of human mobility studies on transportation. If a person is at home for many hours, their location may differ slightly from hour to hour, but their subjective place, home, has not changed. Because a person’s location may change while the stop of interest remains the same, we need the first parameter of the algorithm, θD. This is the distance (in meters) that someone may range from a stop point before they are considered to be no longer at that stop.
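To make the role of θD concrete, here is a minimal sketch in base R. The helper `haversine_m()` and the example coordinates are illustrative and not part of the package; the sketch assumes great-circle distance on a sphere of radius 6,371 km, which may differ from the distance calculation the package uses internally.

```r
# Hypothetical helper (not part of stopdetection): great-circle distance in
# meters between WGS84 points, using the haversine formula.
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

thetaD <- 200  # meters

# An illustrative initiating location and two later observations.
home_lat <- 52.07211
home_lon <- 5.123721
d_near <- haversine_m(home_lat, home_lon, 52.07212, 5.123699)  # a few meters
d_far  <- haversine_m(home_lat, home_lon, 52.08902, 5.109717)  # about 2 km

# TRUE means the observation is still within thetaD of the initiating point.
within_thetaD <- c(near = d_near, far = d_far) <= thetaD
print(within_thetaD)
```

The first observation differs from the initiating location by only a few meters of jitter and stays inside the radius, while the second is roughly two kilometers away and does not.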
The smaller θD is, the more closely locations must be grouped together to be called a stop. Below is an example of stop clustering under a θD of 100 (red), 200 (blue) or 500 (green) meters.
Now consider a person who is walking from their home to the grocery store. They are in the “moving” state, but come to a traffic light where they must wait to cross the street. During this time, their location does not change, or changes very little, but in our context, we would not consider this person to be at a “stop.” This demonstrates the need for the second parameter in the algorithm, θT. We use it to set a lower limit on the length of time that someone must stay within θD meters of an initiating location before that location is considered a stop.
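The interaction of θD and θT can be sketched as follows. This is a simplified stay-point check written for illustration, not the package's implementation: the helper `is_stop_from()` is hypothetical, the toy track uses planar coordinates in meters rather than latitude and longitude, and times are plain seconds.

```r
# Toy track: a pedestrian pauses for about a minute at a crossing (x near 100),
# then spends ten minutes at a shop (x near 300).
track <- data.frame(
  x = c(0, 50, 100, 102, 101, 150, 300, 302, 299, 301, 303, 500),
  y = 0,
  t = c(0, 30,  60,  90, 120, 150, 240, 390, 540, 690, 840, 900)
)

# Hypothetical helper: would a stop be initiated at row i?
# Scan forward while points stay within thetaD of point i, then check
# whether the time spent inside that radius reaches thetaT.
is_stop_from <- function(track, i, thetaD, thetaT) {
  d <- sqrt((track$x - track$x[i])^2 + (track$y - track$y[i])^2)
  later <- seq_len(nrow(track)) >= i
  outside <- which(later & d > thetaD)  # rows from i onward that left the radius
  last_inside <- if (length(outside) == 0) nrow(track) else outside[1] - 1
  (track$t[last_inside] - track$t[i]) >= thetaT
}

at_crossing <- is_stop_from(track, i = 3, thetaD = 20, thetaT = 300)
at_shop     <- is_stop_from(track, i = 7, thetaD = 20, thetaT = 300)
print(c(crossing = at_crossing, shop = at_shop))
```

With θT set to 300 seconds, the 60-second wait at the crossing is not long enough to count as a stop, while the 600 seconds at the shop is.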
Timestamped trajectories of latitude and longitude coordinates can come from many sources, but the stop detection algorithm in this package has been built to function best on streams of location data coming from individual persons.
The following data set contains daily movement behavior for one person over a span of two weeks. It contains WGS84 latitude and longitude coordinates collected every few minutes.
Importantly, this package works with the library data.table. This has been done in order to improve performance when working with large data sets, or when calling the functions in this package repeatedly, as one might within a simulation study.
It’s easy to go from a data.frame object to a data.table object using setDT().
library(data.table)
setDT(loc_data_2019)
loc_data_2019
#> latitude longitude timestamp
#> <num> <num> <POSc>
#> 1: 52.07211 5.123721 2019-11-01 00:02:46
#> 2: 52.07211 5.123721 2019-11-01 00:04:59
#> 3: 52.07211 5.123721 2019-11-01 00:08:02
#> 4: 52.07211 5.123721 2019-11-01 00:10:15
#> 5: 52.07211 5.123721 2019-11-01 00:13:07
#> ---
#> 21907: 52.07212 5.123699 2019-11-14 23:48:23
#> 21908: 52.07212 5.123699 2019-11-14 23:50:28
#> 21909: 52.07212 5.123699 2019-11-14 23:53:23
#> 21910: 52.07212 5.123699 2019-11-14 23:56:23
#> 21911: 52.07212 5.123699 2019-11-14 23:59:23
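If your own data start out as a plain data.frame, the conversion might look like the sketch below. The object `toy` is an illustrative stand-in with the same three column names as the data set above; its values and the UTC time zone are assumptions.

```r
library(data.table)

# Illustrative data.frame with the columns latitude, longitude and timestamp.
toy <- data.frame(
  latitude  = c(52.07211, 52.07211, 52.07788),
  longitude = c(5.123721, 5.123721, 5.122714),
  timestamp = as.POSIXct(
    c("2019-11-01 00:02:46", "2019-11-01 00:04:59", "2019-11-01 08:06:42"),
    tz = "UTC"
  )
)

# setDT() converts the object in place: no copy is made and no assignment
# is needed.
setDT(toy)

print(is.data.table(toy))
print(names(toy))
```

Because `setDT()` works by reference, `toy` itself is now a data.table and can be passed directly to functions that expect one.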
Initial stop clusters can be identified using stopFinder(). This requires the named parameters thetaD (in meters) and thetaT (in seconds), as described above. The data.table supplied as the first argument will be modified by reference: columns are added for the state and state_id, along with the helper columns stop_initiation_idx and timedif shown in the output below.
stopFinder(loc_data_2019, thetaD = 200, thetaT = 300)[]
#> latitude longitude timestamp stop_initiation_idx timedif
#> <num> <num> <POSc> <int> <num>
#> 1: 52.07211 5.123721 2019-11-01 00:02:46 1 66.5265
#> 2: 52.07211 5.123721 2019-11-01 00:04:59 1 158.2015
#> 3: 52.07211 5.123721 2019-11-01 00:08:02 1 157.9520
#> 4: 52.07211 5.123721 2019-11-01 00:10:15 1 152.5980
#> 5: 52.07211 5.123721 2019-11-01 00:13:07 1 163.8395
#> ---
#> 21907: 52.07212 5.123699 2019-11-14 23:48:23 21707 138.6700
#> 21908: 52.07212 5.123699 2019-11-14 23:50:28 21707 150.0025
#> 21909: 52.07212 5.123699 2019-11-14 23:53:23 21707 177.5640
#> 21910: 52.07212 5.123699 2019-11-14 23:56:23 21707 180.0490
#> 21911: 52.07212 5.123699 2019-11-14 23:59:23 21707 90.0665
#> state_id state
#> <int> <char>
#> 1: 1 stopped
#> 2: 1 stopped
#> 3: 1 stopped
#> 4: 1 stopped
#> 5: 1 stopped
#> ---
#> 21907: 285 stopped
#> 21908: 285 stopped
#> 21909: 285 stopped
#> 21910: 285 stopped
#> 21911: 285 stopped
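Because the input is modified by reference, you may want to keep an untouched copy when comparing several parameter settings. The sketch below shows the idea with `data.table::copy()`. The function `add_state()` is a hypothetical stand-in that merely mimics adding a column by reference; it is not how stopFinder() works internally.

```r
library(data.table)

dt <- data.table(
  latitude  = c(52.07211, 52.07212),
  longitude = c(5.123721, 5.123699)
)

# Hypothetical stand-in for a function that adds columns by reference.
add_state <- function(x) {
  x[, state := "stopped"]  # := modifies x in place
  invisible(x)
}

original <- copy(dt)  # deep copy taken before the by-reference call
add_state(dt)

print(names(dt))        # now includes "state"
print(names(original))  # unchanged
```

The trailing `[]` used in the calls above is a common data.table idiom for forcing a result that is returned invisibly to print.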
Once the initial stops have been generated, the function returnStateEvents() can be used to extract a data.table containing one row per event. Both stop and move events are annotated with the state and state_id, the begin_time and end_time, and the number of locations belonging to the state. For move states, the raw distance traveled is included (the sum of all distances between successive points). For stop states, the mean latitude and longitude coordinates are included.
events <- returnStateEvents(loc_data_2019)
events[]
#> state_id state meanlat meanlon begin_time end_time
#> <int> <char> <num> <num> <POSc> <POSc>
#> 1: 1 stopped 52.07212 5.123761 2019-11-01 00:02:46 2019-11-01 08:05:39
#> 2: 2 moving NA NA 2019-11-01 08:05:55 2019-11-01 08:06:27
#> 3: 3 stopped 52.07788 5.122714 2019-11-01 08:06:42 2019-11-01 08:11:29
#> 4: 4 moving NA NA 2019-11-01 08:12:00 2019-11-01 08:15:24
#> 5: 5 stopped 52.08902 5.109717 2019-11-01 08:15:40 2019-11-01 08:24:10
#> ---
#> 281: 281 stopped 52.07211 5.123767 2019-11-14 16:45:02 2019-11-14 19:02:28
#> 282: 282 moving NA NA 2019-11-14 19:02:43 2019-11-14 19:11:46
#> 283: 283 stopped 52.08177 5.138043 2019-11-14 19:12:02 2019-11-14 19:57:11
#> 284: 284 moving NA NA 2019-11-14 19:57:40 2019-11-14 20:08:32
#> 285: 285 stopped 52.07213 5.123719 2019-11-14 20:08:47 2019-11-14 23:59:23
#> raw_travel_dist stop_id move_id n_locations
#> <num> <int> <int> <int>
#> 1: NA 1 NA 471
#> 2: 158.2833 NA 1 2
#> 3: NA 2 NA 21
#> 4: 1253.8918 NA 2 13
#> 5: NA 3 NA 36
#> ---
#> 281: NA 180 NA 115
#> 282: 2171.3438 NA 102 33
#> 283: NA 181 NA 65
#> 284: 1911.7921 NA 103 38
#> 285: NA 182 NA 205
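An events table like this lends itself to simple summaries. As a sketch, the block below rebuilds the first five events shown above as a toy data.table (the UTC time zone is an assumption) and computes how long each event lasted and the total time per state.

```r
library(data.table)

# First five events from the output above, re-entered by hand.
events_toy <- data.table(
  state_id   = 1:5,
  state      = c("stopped", "moving", "stopped", "moving", "stopped"),
  begin_time = as.POSIXct(
    c("2019-11-01 00:02:46", "2019-11-01 08:05:55", "2019-11-01 08:06:42",
      "2019-11-01 08:12:00", "2019-11-01 08:15:40"),
    tz = "UTC"
  ),
  end_time   = as.POSIXct(
    c("2019-11-01 08:05:39", "2019-11-01 08:06:27", "2019-11-01 08:11:29",
      "2019-11-01 08:15:24", "2019-11-01 08:24:10"),
    tz = "UTC"
  )
)

# Duration of each event in minutes.
events_toy[, duration_min := as.numeric(difftime(end_time, begin_time,
                                                 units = "mins"))]

# Total duration and number of events per state.
per_state <- events_toy[, .(total_min = sum(duration_min), n_events = .N),
                        by = state]
print(per_state)
```

The same two expressions should apply unchanged to the full `events` object, since it carries the same begin_time, end_time and state columns.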
It may be useful to merge successive stops that lie close together. Consider the situation in which multiple stops have been identified within a single building that has one semantic meaning. Merging is controlled by another distance parameter (thetaD in the mergingCycle() call below), which sets how far apart the centroids of two stops may be for them to still be merged. It doesn’t have to be the same as the distance parameter used during stop detection.
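The idea of merging by centroid distance can be sketched in base R. This is an illustration under stated assumptions, not the package's merging routine: `haversine_m()` is a hypothetical helper, the stop centroids are made up, and each stop is compared only with the stop immediately before it.

```r
# Hypothetical helper: great-circle distance in meters (haversine formula).
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Illustrative centroids of four successive stops.
stops <- data.frame(
  stop_id = 1:4,
  meanlat = c(52.07212, 52.07220, 52.08902, 52.08905),
  meanlon = c(5.123761, 5.123900, 5.109717, 5.109800)
)
thetaD <- 200  # merging distance in meters

# Distance from each centroid to the previous one (NA for the first stop).
n <- nrow(stops)
gap_m <- c(NA, haversine_m(stops$meanlat[-n], stops$meanlon[-n],
                           stops$meanlat[-1], stops$meanlon[-1]))

# Start a new merged group whenever the gap to the previous stop exceeds thetaD.
stops$merged_id <- cumsum(is.na(gap_m) | gap_m > thetaD)
print(stops)
```

Here the first two stops collapse into one group and the last two into another. A fuller treatment would recompute the centroid of each merged group before comparing it with the next stop.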
Often tracks consisting of only one point, lasting for only a few seconds, or covering very little distance are actually errors, rather than tracks, and interrupt what would otherwise be a single contiguous stop. These sets of locations can be handled either by merging them with a stop, or by excluding them. Short tracks may be excluded on the basis of time, using the max_time parameter, distance (in meters) using the max_dist parameter, or total number of locations involved, using the max_locs parameter.
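To see what these thresholds look like on concrete tracks, the sketch below evaluates each criterion separately for four move events taken from the events output above. How the package combines the criteria is not stated here, so the strict `<` comparisons and both combined columns (`short_all` and `short_any`) are assumptions made for illustration.

```r
library(data.table)

# Four move events from the earlier output; durations derived from their
# begin_time and end_time.
tracks <- data.table(
  move_id         = c(1L, 2L, 102L, 103L),
  duration_s      = c(32, 204, 543, 652),
  raw_travel_dist = c(158.2833, 1253.8918, 2171.3438, 1911.7921),
  n_locations     = c(2L, 13L, 33L, 38L)
)

max_time <- 600   # seconds
max_dist <- 2000  # meters
max_locs <- 20    # number of locations

# Evaluate each criterion on its own.
tracks[, short_by_time := duration_s < max_time]
tracks[, short_by_dist := raw_travel_dist < max_dist]
tracks[, short_by_locs := n_locations < max_locs]

# Two possible ways of combining them (assumption, see the text above).
tracks[, short_all := short_by_time & short_by_dist & short_by_locs]
tracks[, short_any := short_by_time | short_by_dist | short_by_locs]
print(tracks)
```

Looking at the criteria side by side makes it easier to choose thresholds that catch spurious tracks without discarding genuine short trips.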
Most often, these steps will be carried out together, as removing or merging short tracks will tend to create two consecutive stops with very close centroids.
mergingCycle(loc_data_2019,
             thetaD = 200,
             small_track_action = "exclude",
             max_time = 600,
             max_dist = 2000,
             max_locs = 20)
returnStateEvents(loc_data_2019)[]
#> state_id state meanlat meanlon begin_time
#> <int> <char> <num> <num> <POSc>
#> 1: 1 stopped 52.07212 5.123760 2019-11-01 00:02:46
#> 2: NA excluded NA NA 2019-11-01 08:05:55
#> 3: NA excluded NA NA 2019-11-01 08:05:55
#> 4: 2 stopped 52.07798 5.122445 2019-11-01 08:06:42
#> 5: 3 stopped 52.08889 5.109546 2019-11-01 08:15:40
#> ---
#> 196: 194 stopped 52.07211 5.123737 2019-11-13 18:20:59
#> 197: 195 moving NA NA 2019-11-14 19:02:43
#> 198: 196 stopped 52.08177 5.138043 2019-11-14 19:12:02
#> 199: 197 moving NA NA 2019-11-14 19:57:40
#> 200: 198 stopped 52.07213 5.123719 2019-11-14 20:08:47
#> end_time raw_travel_dist stop_id move_id n_locations
#> <POSc> <num> <int> <int> <int>
#> 1: 2019-11-01 08:05:39 NA 1 NA 471
#> 2: 2019-11-14 16:44:47 NA NA NA 255
#> 3: 2019-11-14 16:44:47 NA NA 27 255
#> 4: 2019-11-01 08:11:29 NA 2 NA 21
#> 5: 2019-11-01 08:24:10 NA 3 NA 36
#> ---
#> 196: 2019-11-14 19:02:28 NA 142 NA 1300
#> 197: 2019-11-14 19:11:46 2171.344 NA 53 33
#> 198: 2019-11-14 19:57:11 NA 143 NA 65
#> 199: 2019-11-14 20:08:32 1911.792 NA 54 38
#> 200: 2019-11-14 23:59:23 NA 144 NA 205