7 Vector GIS operations in R

7.1 Case study and demo datasets

Ok, for demonstrating some of the many GIS operations R can perform we will be using data from one of my favourite study areas, the Cape Peninsula.

The datasets we will use, some of their properties and where to source them are tabled below. You can also download them as one (40MB) .zip file here, because the City of Cape Town updates the files from time to time and small changes in format, naming etc break the code to come. Please do not use the version from the .zip file in any real analyses etc, because I can make no guarantees about their accuracy etc. It is best to use the latest version from the links in the table.


Name Data source Data model Geometry type File format
Localities iNaturalist Vector Point Data frame from API
Watercourses City of Cape Town Vector Line ESRI shapefile
Vegetation Types (historical) City of Cape Town Vector Polygon ESRI shapefile
Vegetation Types (remnants) City of Cape Town Vector Polygon ESRI shapefile
Elevation City of Cape Town Raster Raster MapServer


If you’d like to follow along and run the analyses that follow, please download the datasets. There’s no need to download the iNaturalist data as we’ll download it directly from R.

For installing R and the required packages see section 1.2.


7.2 Reading and writing

sf has a one-size-fits-all approach in that most functions can be applied to most different data types (point, line, polygon, etc) or, in the case of reading and writing, file formats. To read data the function you want is st_read().

You’ll note that most of the sf functions begin with “st_” - this stands for “spatial and temporal” and is the same in some other GIS like PostGIS.

Let’s try to read in some data with st_read():

NOTE: if you’re trying any of the read/write code at home, you’ll need to set the file path to where you put the data and want the outputs on your local machine. You can also use ?setwd to simplify this. If you are on Windows, make sure to change the backslashes to either double backslashes or forward slashes “/”.


library(sf)
## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
veg <- st_read("data/cape_peninsula/veg/Vegetation_Indigenous.shp")
## Reading layer `Vegetation_Indigenous' from data source 
##   `/Users/jasper/GIT/spatial-r/data/cape_peninsula/veg/Vegetation_Indigenous.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1325 features and 5 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -63972.95 ymin: -3803535 xmax: 430.8125 ymax: -3705149
## Projected CRS: WGS_1984_Transverse_Mercator

This has successfully read in the data and given us a summary of some of its properties. Note the “Projected CRS” WGS_1984_Transverse_Mercator, so it is Transverse Mercator (TM), using the WGS84 datum, but it hasn’t told us what line of longitude it’s centred on, which is an essential feature of any TM projection.

The first thing you should do when interrogating any spatial data is to check the coordinate reference system (CRS). In sf, you do this with the function st_crs, like so:

st_crs(veg)
## Coordinate Reference System:
##   User input: WGS_1984_Transverse_Mercator 
##   wkt:
## PROJCRS["WGS_1984_Transverse_Mercator",
##     BASEGEOGCRS["WGS 84",
##         DATUM["Hartebeesthoek94",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]],
##             ID["EPSG",6148]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["Degree",0.0174532925199433]]],
##     CONVERSION["unnamed",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",19,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]

This shows us the CRS as a WKT string and looks very complicated… Essentially there are three components to it:

  • BASEGEOGCRS - the geographic (or unprojected) CRS
  • CONVERSION - the projection, which includes a lot of information, but essentially tells us it’s Transverse Mercator, and the "Longitude of natural origin",19 indicates that it is centred on the 19 degree line of longitude (i.e. we’re dealing with Transverse Mercator Lo19)
  • CS - the cartesian axes, showing that we’re dealing with axes oriented to North and East and units of metres

TM Lo19 is a good projection for most calculations at this scale (and on this line of longitude). If you’re using Transverse Mercator, always make sure it is set for your closest “odd” line of longitude (i.e. Lo19, Lo21, Lo23)!

More on working with coordinate reference systems in see section 7.11.


Let’s have a closer look at the data:

class(veg)
## [1] "sf"         "data.frame"

It is an object of two different “classes”, a data.frame, which is an R object class you should be familiar with, and class sf, which is the native class for the sf library.

The nice thing about being both classes is it means you can apply the functions built for either class, such as head, a commonly used function for looking at the first few rows of a dataframe.

head(veg)
## Simple feature collection with 6 features and 5 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -51324.95 ymin: -3732075 xmax: -35653.98 ymax: -3718136
## Projected CRS: WGS_1984_Transverse_Mercator
##      AREA_HCTR   PRMT_MTR            National_                      Subtype
## 1 1807.5183616 24763.8073 Atlantis Sand Fynbos on marine-derived acid sands
## 2    2.1437754   609.5892 Atlantis Sand Fynbos on marine-derived acid sands
## 3    0.2134855   185.5566 Atlantis Sand Fynbos on marine-derived acid sands
## 4    2.8602421   652.1671 Atlantis Sand Fynbos on marine-derived acid sands
## 5    0.5468058   336.8006 Atlantis Sand Fynbos on marine-derived acid sands
## 6    0.4172046   259.7772 Atlantis Sand Fynbos on marine-derived acid sands
##          Community                       geometry
## 1 Need to Find Out POLYGON ((-48203.88 -372294...
## 2 Need to Find Out POLYGON ((-36676.72 -371974...
## 3 Need to Find Out POLYGON ((-35891.46 -371837...
## 4 Need to Find Out POLYGON ((-35750.07 -371847...
## 5 Need to Find Out POLYGON ((-35823.89 -371817...
## 6 Need to Find Out POLYGON ((-35929.18 -371824...

Note there are 5 attribute columns (the attribute table as you would see in most GIS software) and a 6th geometry column. All sf objects have a geometry column. This is where it stores the geometry - i.e. the point, line, polygon etc - associated with each row of attribute data.


To write data with sf you use st_write(), like so:

st_write(veg, "data/cape_peninsula/veg/Vegetation_Indigenous_duplicate.shp", append = FALSE)
## Writing layer `Vegetation_Indigenous_duplicate' to data source 
##   `data/cape_peninsula/veg/Vegetation_Indigenous_duplicate.shp' using driver `ESRI Shapefile'
## Writing 1325 features with 5 fields and geometry type Polygon.

Note that I added , append = FALSE because in my case it I want it to overwrite an existing file by the same name, and this command suppresses the warning it would usually give.

file.exists("data/cape_peninsula/veg/Vegetation_Indigenous_duplicate.shp")
## [1] TRUE

Confirms that the file exists, so it has written a file out successfully.

Note that the function recognised that I wanted to write out an ESRI shapefile from the .shp file extension I provided. You can set the file type using the driver = setting in st_write(). Try st_drivers() for the list of file types supported.


7.3 Basic plotting

As with other data types in R (and perhaps even more so with spatial data), you can really go to town with plotting. I’m only going to show you enough to be able to interrogate your data. Making it look pretty is a week-long course or more in its own right. Check out the “Making maps with R” chapter in Lovelace et al’s online book Geocomputation with R for a good start. You could also check out library(tmap) for plotting thematic maps or library(mapview) for interactive maps.


The easiest way to plot datasets in R is often a bad thing to do when working with spatial datasets!

plot(veg)

Fortunately, in this case the dataset isn’t too big, but often you’ll either be overwhelmed with plots or your computer will crash…

Why 5 plots and not one? This is because sf wants to plot the properties of each attribute in the attribute table. Fortunately, there were only 5, but there could have been hundreds! You can select the one you want with indexing like so:

plot(veg[3])

These are the National Vegetation Types for the City of Cape Town municipality.

You’ll note that we’re using the base R graphics functions. I mentioned before that sf integrates well with the Tidyverse, so this could also be plotted like so:

library(tidyverse) #calls ggplot2 and other Tidyverse packages together

ggplot() +
  geom_sf(data=veg, aes(fill = `National_`))

That’s better for the legend, but now we’ve squashed the map. Let’s narrow in on the Cape Peninsula for convenience.


7.4 Cropping

Here we’ll apply the function st_crop(). To use the function you need an object to crop, and an extent or bounding box to crop to. sf is clever, and you can set the extent by giving it another object who’s extent you’d like to match (check the bounding box given when we read in the data earlier).

We don’t have a second object in this case, so we have to provide a “numeric vector with named elements xmin, ymin, xmax and ymax”, like so:

#Make a vector with desired coordinates in metres according to TM Lo19
ext <- c(-66642.18, -3809853.29, -44412.18, -3750723.29) 

ext
## [1]   -66642.18 -3809853.29   -44412.18 -3750723.29
#Give the vector names
names(ext) <- c("xmin", "ymin", "xmax", "ymax") 

ext
##        xmin        ymin        xmax        ymax 
##   -66642.18 -3809853.29   -44412.18 -3750723.29

Now we can feed that into st_crop

veg <- st_crop(veg, ext) #Note that I'm overwriting the old data object "veg"
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
ggplot() + geom_sf(data=veg, aes(fill = `National_`))

Better?

Note that in this case I gave it the coordinates in TM Lo19. These are not always easy to work out, so you may want to create an extent using Lat/Long and then project it to TM Lo19 before cropping. You can do this like so:

myextent <- st_sf(a = 1:2, geom = st_sfc(st_point(c(18,-33)), st_point(c(19,-34))), crs = 4326)

myextent <- st_transform(myextent, crs = "+proj=tmerc +lat_0=0 +lon_0=19 +k=1 +x_0=0 +y_0=0 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs") 

# And then run veg <- st_crop(veg, myextent)

Ok. Moving on. But what about the silly splits like Peninsula Granite Fynbos - North/South and Cape Flats Dune Strandveld - West Coast/False Bay in the figure above.

Which ones do I mean?


7.5 Select and subset by attribute

Let’s select them from the attribute table and plot them.

#Make a vector of the veg types we want
split_veg <- c("Peninsula Granite Fynbos - North", 
               "Peninsula Granite Fynbos - South", 
               "Cape Flats Dune Strandveld - West Coast", 
               "Cape Flats Dune Strandveld - False Bay")

#Use base R indexing to select attributes
vegsub <- veg[which(veg$National_ %in% split_veg),]

#Plot
ggplot() + geom_sf(data=vegsub, aes(fill = `National_`))

Or tidyverse…

#Using tidyverse piping to filter and plot
veg %>% 
  filter(National_ %in% split_veg) %>%
  ggplot() +
  geom_sf(aes(fill = `National_`))

#The advantage being that you don't have to make the intermediate "vegsub" object

Ok. What if we decided we don’t want them split?

7.6 Combine classes and dissolve by attribute

We can just rename them in appropriate column in the attribute table…

vegsub$National_ <- str_replace_all(vegsub$National_,
                  c("Peninsula Granite Fynbos - North" = "Peninsula Granite Fynbos", 
                  "Peninsula Granite Fynbos - South" = "Peninsula Granite Fynbos", 
                  "Cape Flats Dune Strandveld - West Coast" = "Cape Flats Dune Strandveld", 
                  "Cape Flats Dune Strandveld - False Bay" = "Cape Flats Dune Strandveld"))

ggplot() + geom_sf(data=vegsub, aes(fill = `National_`))

Nice, but from the polygon boundaries we see that there are a number of adjacent polygons (i.e. they have shared boundaries) that are of the same veg type. We can “dissolve” and plot it without the unwanted boundaries using summarize():

vegsub %>% group_by(National_) %>% 
  summarize() %>% 
  ggplot() + geom_sf(aes(fill = National_))

Ok… I think we’ve flogged that horse as far as it’ll go for now. Let’s bring in another dataset. How about points?


7.7 Calling iNaturalist locality (point) data from R

A very cool feature of iNaturalist is that the team at rOpenSci have built a great R package for interfacing with it directly, called rinat!

Let’s get all the records we can for the King Protea (Protea cynaroides).

library(rinat)

#Call the data directly from iNat
pc <- get_inat_obs(taxon_name = "Protea cynaroides",
                   bounds = c(-35, 18, -33.5, 18.5),
                   maxresults = 1000)

#View the first few rows of data
head(pc)
##     scientific_name                  datetime                      description
## 1 Protea cynaroides 2026-03-04 10:49:00 +0200 The area has recently been burnt
## 2 Protea cynaroides 2026-03-15 09:16:00 +0200                                 
## 3 Protea cynaroides 2026-03-15 08:52:00 +0200                                 
## 4 Protea cynaroides 2026-02-28 09:03:38 +0200                                 
## 5 Protea cynaroides 2026-02-05 09:53:00 +0200                                 
## 6 Protea cynaroides 2026-03-14 10:39:44 +0200                                 
##                                                                    place_guess
## 1                        Silver Mine (Nature Reserve), Cape Town, South Africa
## 2 Breakfast Rock hike, Silvermine East section of Table Mountain National Park
## 3 Breakfast Rock hike, Silvermine East section of Table Mountain National Park
## 4                                   Cecilia Ridge, Cape Town, Western Cape, ZA
## 5                                          Cape Point, Cape Town, South Africa
## 6                                                                    Cape Town
##    latitude longitude tag_list common_name
## 1 -34.10011  18.42911          King Protea
## 2 -34.09531  18.43570          King Protea
## 3 -34.09472  18.43188          King Protea
## 4 -33.99512  18.41889          King Protea
## 5 -34.29154  18.42853          King Protea
## 6 -33.99249  18.37527          King Protea
##                                                  url
## 1 https://www.inaturalist.org/observations/343319197
## 2 https://www.inaturalist.org/observations/343190054
## 3 https://www.inaturalist.org/observations/343189760
## 4 https://www.inaturalist.org/observations/343182773
## 5 https://www.inaturalist.org/observations/343142065
## 6 https://www.inaturalist.org/observations/342874315
##                                                                    image_url
## 1 https://inaturalist-open-data.s3.amazonaws.com/photos/625359832/medium.jpg
## 2 https://inaturalist-open-data.s3.amazonaws.com/photos/625083830/medium.jpg
## 3 https://inaturalist-open-data.s3.amazonaws.com/photos/625083138/medium.jpg
## 4 https://inaturalist-open-data.s3.amazonaws.com/photos/625082353/medium.jpg
## 5 https://inaturalist-open-data.s3.amazonaws.com/photos/624993534/medium.jpg
## 6 https://inaturalist-open-data.s3.amazonaws.com/photos/624498893/medium.jpg
##        user_login        id species_guess iconic_taxon_name taxon_id
## 1         ydnewp2 343319197   King Protea           Plantae   132848
## 2      tonyrebelo 343190054   King Protea           Plantae   132848
## 3      tonyrebelo 343189760   King Protea           Plantae   132848
## 4       jennylouw 343182773   King Protea           Plantae   132848
## 5 francoisdurandt 343142065   King Protea           Plantae   132848
## 6    hanneliekemp 342874315   King Protea           Plantae   132848
##   num_identification_agreements num_identification_disagreements
## 1                             1                                0
## 2                             1                                0
## 3                             1                                0
## 4                             2                                0
## 5                             2                                0
## 6                             2                                0
##                         observed_on_string observed_on        time_observed_at
## 1                      2026/03/04 10:49 AM  2026-03-04 2026-03-04 08:49:00 UTC
## 2                       2026/03/15 9:16 AM  2026-03-15 2026-03-15 07:16:00 UTC
## 3                       2026/03/15 8:52 AM  2026-03-15 2026-03-15 06:52:00 UTC
## 4                      2026-02-28 09:03:38  2026-02-28 2026-02-28 07:03:38 UTC
## 5                       2026/02/05 9:53 AM  2026-02-05 2026-02-05 07:53:00 UTC
## 6 Sat Mar 14 2026 10:39:44 GMT +0200 (GMT)  2026-03-14 2026-03-14 08:39:44 UTC
##   time_zone positional_accuracy public_positional_accuracy geoprivacy
## 1  Pretoria                 242                        242           
## 2  Pretoria                  10                         10           
## 3  Pretoria                  10                         10           
## 4  Pretoria                   3                          3           
## 5  Pretoria                4024                       4024           
## 6  Pretoria                   4                          4           
##   taxon_geoprivacy coordinates_obscured positioning_method positioning_device
## 1             open                false                                      
## 2             open                false                                      
## 3             open                false                                      
## 4             open                false                                      
## 5             open                false                                      
## 6             open                false                                      
##   user_id         user_name              created_at              updated_at
## 1  772460             Wendy 2026-03-16 14:52:27 UTC 2026-03-16 21:02:47 UTC
## 2  383144       Tony Rebelo 2026-03-15 20:55:00 UTC 2026-03-16 06:47:57 UTC
## 3  383144       Tony Rebelo 2026-03-15 20:53:46 UTC 2026-03-16 21:15:07 UTC
## 4  102748        Jenny Louw 2026-03-15 20:18:06 UTC 2026-03-15 20:37:46 UTC
## 5  773840 Francois du Randt 2026-03-15 16:56:29 UTC 2026-03-15 17:23:53 UTC
## 6 9588100                   2026-03-14 08:39:54 UTC 2026-03-15 09:24:08 UTC
##   quality_grade  license sound_url oauth_application_id captive_cultivated
## 1      needs_id CC-BY-NC        NA                   NA              false
## 2      research CC-BY-NC        NA                   NA              false
## 3      research CC-BY-NC        NA                   NA              false
## 4      research CC-BY-NC        NA                  843              false
## 5      research CC-BY-NC        NA                   NA              false
## 6      research CC-BY-NC        NA                  333              false
#Filter returned observations by a range of column attribute criteria
pc <- pc %>% filter(positional_accuracy<46 & 
                latitude<0 &
                !is.na(latitude) &
                captive_cultivated == "false" &
                quality_grade == "research")

class(pc)
## [1] "data.frame"

Ok, so this is a dataframe with lat/long data, but it isn’t registered as an object with spatial attributes (i.e. geometries).


7.8 Converting a dataframe into a spatial object

To make it an object of class(sf) we use the function st_as_sf().

#Make the dataframe a spatial object of class = "sf"
pc <- st_as_sf(pc, coords = c("longitude", "latitude"), crs = 4326)

Note that I had to define the CRS!!! I defined it to be Geographic WGS84, using the EPSG code in this case. Recall the difference between defining and CRS and projecting of transforming the CRS as discussed in yesterday’s notes…

#What class is it?
class(pc)
## [1] "sf"         "data.frame"
#Note the new "geometry" column
names(pc)
##  [1] "scientific_name"                  "datetime"                        
##  [3] "description"                      "place_guess"                     
##  [5] "tag_list"                         "common_name"                     
##  [7] "url"                              "image_url"                       
##  [9] "user_login"                       "id"                              
## [11] "species_guess"                    "iconic_taxon_name"               
## [13] "taxon_id"                         "num_identification_agreements"   
## [15] "num_identification_disagreements" "observed_on_string"              
## [17] "observed_on"                      "time_observed_at"                
## [19] "time_zone"                        "positional_accuracy"             
## [21] "public_positional_accuracy"       "geoprivacy"                      
## [23] "taxon_geoprivacy"                 "coordinates_obscured"            
## [25] "positioning_method"               "positioning_device"              
## [27] "user_id"                          "user_name"                       
## [29] "created_at"                       "updated_at"                      
## [31] "quality_grade"                    "license"                         
## [33] "sound_url"                        "oauth_application_id"            
## [35] "captive_cultivated"               "geometry"
#Plot
ggplot() + geom_sf(data=pc)


Great! We got lots of points, but without a base layer its very difficult to tell where exactly these are?


7.9 Adding basemaps to plots

There are lots of ways to make the basemap from data objects etc that we can plot our points over, but an easy way is to pull in tiles from Open Street Maps and plot our points on those.


library(rosm)
library(ggspatial)

ggplot() + 
  annotation_map_tile(type = "osm", progress = "none") + 
  geom_sf(data=pc)


Note that there are quite a few base layer/tile options that can be set with type = "". Try rosm::osm.types() to see them all.

This is better than nothing, but the scale of the map is too small to really see where the plants actually are. It would be much easier if we could look at the data interactively?


7.10 Interactive maps with leaflet and mapview

We can generate interactive maps by calling the leaflet mapserver using wrapper functions in the leaflet R package written for this purpose.

NOTE: If you can’t get leaflet to work it is probably a CRS problem. Your data need to be in Geographic or Web Mercator


library(leaflet)
library(htmltools)
leaflet() %>%
  # Add default OpenStreetMap map tiles
  addTiles(group = "Default") %>%  
  # Add our points
  addCircleMarkers(data = pc,
                   group = "Protea cynaroides",
                   radius = 3, 
                   color = "green") 


Much better!

Strange, but even though we filtered our iNaturalist records for captive_cultivated == "false" we still have a number of observations that appear to be in people’s gardens.

Let this serve as a warning to be wary of all data! Always do “common-sense-checks” on your data and the outputs of your analyses!!!

One way to do common sense checks with interactive plots is to add popup labels that allow you to inspect the data. See here to do this with library(leaflet), but here’s an example with library(mapview).


library(mapview)
library(leafpop)

mapview(pc, 
        popup = 
          popupTable(pc,
            zcol = c("user_login", "captive_cultivated", "url")))


Nice, but we can’t click on the URL, we have to copy and paste it. Fortunately, with a little html formatting we can make them live links.


lpc <- pc %>%
  mutate(click_url = paste("<b><a href='", url, "'>Link to iNat observation</a></b>"))

mapview(pc, 
        popup = 
          popupTable(lpc,
            zcol = c("user_login", "captive_cultivated", "click_url")))


7.11 Reprojecting

One way to drastically reduce the number of cultivated records is to overlay the localities (points) with the remaining extent of the vegetation types (i.e. anything that is not in natural vegtation is likely to be cultivated). Let’s try that…


#Get the remnants layer
vegr <- st_read("data/cape_peninsula/veg/Vegetation_Indigenous_Remnants.shp")
## Reading layer `Vegetation_Indigenous_Remnants' from data source 
##   `/Users/jasper/GIT/spatial-r/data/cape_peninsula/veg/Vegetation_Indigenous_Remnants.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 3428 features and 7 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -63951.23 ymin: -3803532 xmax: 420.7595 ymax: -3705506
## Projected CRS: WGS_1984_Transverse_Mercator
hmm <- st_intersection(pc, vegr)
## Error in `geos_op2_geom()`:
## ! st_crs(x) == st_crs(y) is not TRUE

Oops! The Coordinate Reference Systems are different! We will need to reproject one of the two datasets…


Let’s see what CRS are currently set:

st_crs(pc)
## Coordinate Reference System:
##   User input: EPSG:4326 
##   wkt:
## GEOGCRS["WGS 84",
##     ENSEMBLE["World Geodetic System 1984 ensemble",
##         MEMBER["World Geodetic System 1984 (Transit)"],
##         MEMBER["World Geodetic System 1984 (G730)"],
##         MEMBER["World Geodetic System 1984 (G873)"],
##         MEMBER["World Geodetic System 1984 (G1150)"],
##         MEMBER["World Geodetic System 1984 (G1674)"],
##         MEMBER["World Geodetic System 1984 (G1762)"],
##         MEMBER["World Geodetic System 1984 (G2139)"],
##         MEMBER["World Geodetic System 1984 (G2296)"],
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]],
##         ENSEMBLEACCURACY[2.0]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Horizontal component of 3D system."],
##         AREA["World."],
##         BBOX[-90,-180,90,180]],
##     ID["EPSG",4326]]

So the points are Geographic, with no projected CRS CONVERSION.

st_crs(vegr)
## Coordinate Reference System:
##   User input: WGS_1984_Transverse_Mercator 
##   wkt:
## PROJCRS["WGS_1984_Transverse_Mercator",
##     BASEGEOGCRS["WGS 84",
##         DATUM["Hartebeesthoek94",
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]],
##             ID["EPSG",6148]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["Degree",0.0174532925199433]]],
##     CONVERSION["unnamed",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",19,
##             ANGLEUNIT["Degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",1,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]

The remnants of vegetation types are in Transverse Mercator Lo19, just like the dataset of the historical extent of the veg types we were working with earlier.


In this case, either CRS is fine for our purposes, but let’s stick with Transverse Mercator Lo19, because it’ll be useful later. For this we need to project the points like so:

pc <- st_transform(pc, st_crs(vegr)) 

Note that I fed it the CRS from vegr. This guarantees that they’ll be the same, even if we misidentified what the actual CRS is…


NOTE: Sometimes you will


7.12 Intersecting points and polygons

…and now we can try to intersect the points and polygons again…

First lets see how many rows and columns the point data before the intersection:

#call the dimensions of pc
dim(pc)  
## [1] 675  36


And after the intersection?

pc <- st_intersection(pc, vegr)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
dim(pc)
## [1] 639  43

Less rows, but more columns! Two things have happened:

  1. The attribute data from the polygons in vegr intersected by the points in pc have been added to the attribute table in pc!
  2. All points that do not intersect the polygons in vegr were dropped (i.e. those that were recorded outside the remaining extent of natural vegetation).


Let’s have a look

ggplot() + 
  annotation_map_tile(type = "osm", progress = "none") + 
  geom_sf(data=pc)

Yup, the localities in suburbia are gone… The map is a bit bland though. How about we use our “new information” about which vegetation types the observations occur in to colour or label the points on the map?


7.13 Colour or label points

First, let’s add colour:

library(wesanderson)

pal <- wes_palette("Darjeeling1", length(unique(pc$National_)), type = "continuous") # Note that the "length(..." code sets the number of colours to the number of unique vegetation types in the dataset

ggplot() + 
  annotation_map_tile(type = "osm", progress = "none") + 
  geom_sf(data=pc, aes(col = National_)) +
  scale_colour_manual(values = pal)

Looks like almost all of them are in Peninsula Sandstone Fynbos…

pc %>% group_by(National_) %>% summarise(n())
## Simple feature collection with 7 features and 2 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -61674.52 ymin: -3797349 xmax: -49141.81 ymax: -3755691
## Projected CRS: WGS_1984_Transverse_Mercator
## # A tibble: 7 × 3
##   National_                        `n()`                                geometry
##   <chr>                            <int>                          <GEOMETRY [m]>
## 1 Cape Flats Sand Fynbos               4 MULTIPOINT ((-53316.05 -3769921), (-53…
## 2 Peninsula Granite Fynbos - North     2 MULTIPOINT ((-58247.12 -3761829), (-56…
## 3 Peninsula Granite Fynbos - South     2 MULTIPOINT ((-53919.28 -3764621), (-53…
## 4 Peninsula Sandstone Fynbos         621 MULTIPOINT ((-61674.52 -3780003), (-61…
## 5 Peninsula Shale Fynbos               1               POINT (-53114.6 -3757773)
## 6 Peninsula Shale Renosterveld         2 MULTIPOINT ((-55863.43 -3755809), (-55…
## 7 Southern Afrotemperate Forest        7 MULTIPOINT ((-53698.74 -3762096), (-53…

Yup! Note the numbers in column n(). But I can’t see where the Hangklip Sand Fynbos record is, so let’s label that one with text using geom_sf_label().

hsf <- pc %>% filter(National_ == "Hangklip Sand Fynbos") #find the locality

ggplot() + 
  annotation_map_tile(type = "osm", progress = "none") + 
  geom_sf(data=pc, aes(col = National_)) +
  scale_colour_manual(values = pal) +
  geom_sf_label(data=hsf, aes(label = "Here"))

Aha!

Note that you can specify that the label = setting points to a column in your dataset with names if you have lots of labels to add.


7.14 Buffering

One issue here may be that all localities should be in Peninsula Sandstone Fynbos, but the vegetation type boundaries are wrong. After all, the transition or ecotone between two vegetation types is usually diffuse rather than a clear boundary, not to mention that the data may have been digitized at a very small scale, compromizing precision and accuracy. One way to check this is to buffer the points using st_buffer to see if they are within some distance (say 250m) of the boundary with Peninsula Sandstone Fynbos.

#Find the localities that are not in Peninsula Sandstone Fynbos and add a 250m buffer
npsf <- pc %>% 
  filter(National_ != "Peninsula Sandstone Fynbos") %>%
  st_buffer(dist = 250)

#NOTE that st_buffer() makes them polygons, because they now have area!
npsf$geometry[1] #The first geometry in npsf
## Geometry set for 1 feature 
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -53872.89 ymin: -3762414 xmax: -53372.89 ymax: -3761914
## Projected CRS: WGS_1984_Transverse_Mercator
## POLYGON ((-53372.89 -3762164, -53373.23 -376217...
#Get the number of unique iNaturalist record numbers
length(unique(npsf$id)) 
## [1] 18
#Intersect new polygons with veg remnants and filter for those that overlap Peninsula Sandstone Fynbos only
npsf <- st_intersection(npsf, vegr) %>% filter(National_.1 == "Peninsula Sandstone Fynbos")
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
#Get the number of unique iNaturalist record numbers that overlap PSF
length(unique(npsf$id))
## [1] 7

So a fair proportion of the records are suspiciously close to Peninsula Sandstone Fynbos…


7.15 Within distance and intersect

Perhaps a more interesting use of buffering is to see if a species’ locality is within a certain distance of a particular habitat etc. For example, we could ask if a species is associated with riparian zones by buffering either the localities (points) or rivers (lines) and then doing an intersection.

But of course there are many ways to skin a cat, and it turns out buffering and intersecting may not be the most efficient here. If we don’t want to pull the attribute data from one dataset to the other we can just use st_intersects() to see if they overlap at all. We can even take it one step further, because sf has the function st_is_within_distance(), which is similar to applying st_buffer() and st_intersects() in one go.

Here we’ll use Brabejum stellatifolium (a riparian tree in the Proteaceae) as our focal species and the watercourse layer from the City of Cape Town.

#Get the watercourse data
water <- st_read("data/cape_peninsula/Open_Watercourses.geojson")
## Reading layer `Open_Watercourses' from data source 
##   `/Users/jasper/GIT/spatial-r/data/cape_peninsula/Open_Watercourses.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 10848 features and 11 fields
## Geometry type: MULTILINESTRING
## Dimension:     XY
## Bounding box:  xmin: 18.31249 ymin: -34.28774 xmax: 18.99045 ymax: -33.47256
## Geodetic CRS:  WGS 84
#Check it's CRS
st_crs(water)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]
#Call the data directly from iNat
bs <- get_inat_obs(taxon_name = "Brabejum stellatifolium",
                   bounds = c(-35, 18, -33.5, 18.5),
                   maxresults = 1000)

#Filter returned observations by a range of attribute criteria
bs <- bs %>% filter(positional_accuracy<46 & 
                latitude<0 &
                !is.na(latitude) &
                captive_cultivated == "false" &
                quality_grade == "research")

#See how many records we got
nrow(bs)
## [1] 463
#Make the dataframe a spatial object of class = "sf"
bs <- st_as_sf(bs, coords = c("longitude", "latitude"), crs = 4326) #Note that I had to define the CRS (as Geographic WGS84)!!!

Let’s see what we’ve got…

#Crop the water courses to the extent of the locality data
water <- st_crop(water, bs)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
#Plot
ggplot() + 
  annotation_map_tile(type = "osm", progress = "none") + 
  geom_sf(data = water, colour = "blue") +
  geom_sf(data=bs)

Hard to tell, but they could all be on rivers?

Let’s try st_intersects() without any buffering first to see if they overlap at all.

st_intersects(bs, water) %>% unlist()
## integer(0)

Oops! We forgot to project our data!

bs <- st_transform(bs, st_crs(vegr))

water <-  st_transform(water, st_crs(vegr))

st_intersects(bs, water) %>% unlist()
## integer(0)

So none of them intersect, but this is not surprising, because lines and points in GIS do not have area, so they can’t really intersect unless you buffer them… Let’s try st_is_within_distance() and set it for 20 metres.

Note that I add unlist() %>% unique() so that it gives me a vector of the unique features (i.e. once each) that are within 20m, because the function returns a list and will return the same feature (line/river) multiple times - once for every point (tree) it is within 20m of.

st_is_within_distance(bs, water, 20) %>% unlist() %>% unique()
##  [1] 349 328 346 592 327 264 284 316 364 285 304 347 885 351 224 615 292 332 165
## [20] 322 290 179 101 280 617 507 330 294 333 281 282  19

So it’s given us the list of lines (rivers) within 20m of our points, but that doesn’t tell us how many (or what proportion) of our points are within 20m of a river. Let’s apply the function again, swapping the layers around:

st_is_within_distance(water, bs, 20) %>% unlist() %>% unique()
##   [1] 450 203 435 451 461 160 179 370 391 400 114  29  40 137 301  46  71 247
##  [19]  63 375 408 170 129 385 386  74  75  53  26  50  90  92  93 123 124 125
##  [37] 126 127 151 204 210 226 253 269 273 283 297 321 328 329 355 356 427 428
##  [55]   6   7   8   9  30  43  45  67  82  83  84  96 119 120 121 140 152 239
##  [73] 242 307 317 320 339 374 377 443 132 135 262 300 426 410  23  86  87 147
##  [91] 156 158 174 178 197 198 200 206 233 234 293 294 295 308 315 350 371 372
## [109] 401 316   4 373 101 169 302 368  55 282  25  31  41 251 263  88 143 367
## [127] 369 381 439

So only 129 of the trees are within 20m of the rivers. What about 100m?

st_is_within_distance(water, bs, 100) %>% unlist() %>% unique()
##   [1] 228 229 254 298 193 389 450  99 114 115 141 203 435 451 461 160 161 189
##  [19] 219 311 323 344 413 442 179 370 391 400 305 113 181 312  13  29  36  39
##  [37]  40  52  61  62  71  72  76  77 100 116 117 130 131 137 139 144 145 146
##  [55] 173 175 176 177 180 183 185 191 192 194 207 215 218 223 241 246 247 255
##  [73] 260 261 279 301 319 353 375 376 392 394 395 396 404 408 412 421 425  98
##  [91]  14  63 188 452  15  46 403 429 453 182  60  78 102 103 128 142 170 393
## [109] 422 423 424 129 252 385 386 415  70 112 352 333  74  75  49  53  54  55
## [127] 136 434 378   9  17  26  28  38  50  51  68  73  79  80  82  83  84  89
## [145]  90  91  92  93 123 124 125 126 127 134 149 150 151 152 163 164 171 190
## [163] 196 204 208 209 210 211 213 214 225 226 236 239 240 245 253 264 269 270
## [181] 271 273 274 275 276 277 278 283 289 290 297 314 320 321 324 326 327 328
## [199] 329 334 339 342 355 356 362 363 364 365 366 383 388 398 409 416 418 419
## [217] 420 427 428 443 445 456   1   5   6   7   8  30  34  43  44  45  48  59
## [235]  67  69  96 119 120 121 122 138 140 148 167 187 242 258 296 306 307 317
## [253] 331 332 341 343 357 358 359 360 361 374 377 390 272 132 135 231 262 263
## [271] 268 300 426 410 158 155 166 265 266 267  23  86  87 147 156 157 174 178
## [289] 197 198 199 200 206 233 234 238 244 248 259 287 293 294 295 308 315 316
## [307] 350 371 372 401 411 457   4 373 101 169 302 310 368 454 282  31  58 195
## [325] 250  25  41 251  16  18  19  20  24  32  56 249 281 284 285 288  57  88
## [343] 143 367 369 381 439 387

348

It’s at this point that it’s worth thinking about the scale, precision and accuracy of both the species localities and the watercourse data before drawing any strong conclusions!!!