class: title-slide, left, bottom # Database connections in R ---- ## **Session 11** ### ### --- # Two common methods There are two common methods of connection, both of which use Open Database Connectivity (ODBC) drivers: 1. The `RODBC` package (older method, not shown here) 2. The `DBI` system, using `dplyr,` `dbplyr` and `odbc`. <br> + Both of these create a connection, using a 'connection string' with server details. + This can be used to create a connection object that you use in `R`. + We can use the object to pull data into `R` or manipulate it in the database. -- <br> For these examples, we will assume: + SQL Server connection, call 'MyServer' + A database called 'MyDB' + A table called 'MyTable' --- # DBI connection Requires a couple of packages, and knowing your connection details: + `DBI` - a common Database Interface engine for use in `S` and `R` (see <a href="https://cran.r-project.org/web/packages/DBI/vignettes/DBI-1.html">here</a>) + `odbc`- provides the odbc drivers, but you could use the functions below with other drivers instead. ```r library(DBI) library(odbc) DBI_connection <- dbConnect(odbc() , Driver="SQL Server" , Server="MyServer" , database = "MyDB" , trusted_connection="true" ) ``` -- + `trusted_connection=true` passes your windows credentials to the server. + Sometimes this is: `Trusted_Connection` and `yes` or `no`, instead of `true` or `false`. + You can, instead, specify a username (`uid`) and a password (`pwd`) --- # Using SQL & returning data to `R` (1) + Can write an SQL query directly using the `dbSendQuery` function. + Executes the query on the server-side only. + If you want the results back in `R`, you need to use `dbFetch` as well. ```r SomeRecords <- dbFetch( dbSendQuery(DBI_connection , "Select TOP 100 * from MyTable" ) ) #or SomeRecords <- dbSendQuery(DBI_connection , "Select TOP 100 * from MyTable") %>% dbFetch() ``` --- # Using tables in the database Now we can define a table as if it was part of our `R` workspace, using the connection object and the names of the table in the database. + `dplyr` - to make the `tbl` and use it, we'll work with `dplyr` syntax. + `dbplyr` - this add-on package allows translation from `dplyr` to SQL. -- ```r MyTable <- tbl(DBI_connection, "MyTable") ``` ```r # Or if your table was in a schema called 'ref', rather than default like'dbo' MyTable_inschema <- tbl(DBI_connection, in_schema("ref", "MyTable")) ``` -- `glimpse` is a useful function that shows you a summary: ```r glimpse(MyTable) ``` ``` ## Rows: ?? ## Columns: 8 ## Database: sqlite 3.38.2 [:memory:] ## $ date <dbl> 15949, 15949, 15949, 15949, 15949, 15949, 15949, 15949, 15949… ## $ org_code <chr> "R1A", "R1C", "R1E", "R1F", "R1H", "R1J", "RA2", "RA3", "RA4"… ## $ org_name <chr> "Worcestershire Health And Care", "Solent", "Staffordshire An… ## $ beds_av <dbl> 129, 105, NA, 54, NA, NA, NA, NA, NA, NA, NA, NA, NA, 6, NA, … ## $ occ_av <dbl> 117, 82, NA, 42, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, N… ## $ year <dbl> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2… ## $ month <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9… ## $ day <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1… ``` --- # Constructing `dplyr` query + We can then perform select or aggregate queries without translation. + Even though it returns results, the data are still in the database ```r library(dplyr) library(dbplyr) MyTable %>% filter(year == 2015) %>% group_by(org_name) %>% summarise(AvgEvents = mean(beds_av)) %>% arrange(month) ``` ``` ## # Source: lazy query [?? x 2] ## # Database: sqlite 3.38.2 [:memory:] ## # Ordered by: month ## org_name AvgEvents ## <chr> <dbl> ## 1 2gether 207 ## 2 5 Boroughs Partnership 306. ## 3 Aintree University Hospital NA ## 4 Airedale NA ## 5 Alder Hey Children's 7 ## 6 Ashford And St Peter's Hospitals NA ## 7 Avon And Wiltshire Mental Health Partnership 520. ## 8 Barking, Havering And Redbridge University Hospitals NA ## 9 Barnet, Enfield And Haringey Mental Health 508 ## 10 Barnsley Hospital NA ## # … with more rows ``` --- # Returning data to `R` with `dbplyr` + May need to pull the data from the server into memory in `R` sometimes. + Can do this with `collect()` ```r MyTable_local<- MyTable %>% filter(year == 2015) %>% group_by(org_name) %>% summarise(AvgEvents = mean(beds_av)) %>% arrange(org_name) %>% collect() MyTable_local ``` ``` ## # A tibble: 224 × 2 ## org_name AvgEvents ## <chr> <dbl> ## 1 2gether 207 ## 2 5 Boroughs Partnership 306. ## 3 Aintree University Hospital NA ## 4 Airedale NA ## 5 Alder Hey Children's 7 ## 6 Ashford And St Peter's Hospitals NA ## 7 Avon And Wiltshire Mental Health Partnership 520. ## 8 Barking, Havering And Redbridge University Hospitals NA ## 9 Barnet, Enfield And Haringey Mental Health 508 ## 10 Barnsley Hospital NA ## # … with 214 more rows ``` --- # Writing to databases You can also write back to a database using the `dbWriteTable` function. + For example: + Writing a new table current connection, call the table: 'NewDatabaseTable' + We are writing back the `R` `data.frame` called "MyTable_local" (that we created in the last section) + First time, `dbWriteTable` will create the table. + Subsequent uses, you can specify whether to `append` or `overwrite` options using TRUE/FALSE. ```r dbWriteTable(DBI_connection ,"NewDatabaseTable" , MyTable_local , overwrite=TRUE) ``` --- class: middle, center # Exercise: ### Follow the instructions in the `11_database_excerise.R` file --- # Useful DBI commands |Command | Summary | | --------------|:------------------------------------------------------:| |dbConnect() | Create a DBI connection object | |dbListTables() | List the tables on the connection | |dbListFields() | List the fields for a given table on a given connection| |dbSendQuery() | Send a query to execute on the server/connection | |dbFetch() | Fetch the results from the server/connection | |dbWriteTable() | Write a table to the connection | | | | |tbl() | Set a table on the connection as a 'tibble' for `dplyr`| |glimpse() | See a summary of the rows, data types and top rows | |in_schema() | Use a named schema, other than 'dbo' (`dbplyr` package) | --- #### This work is licensed as </br> Creative Commons </br> Attribution </br> ShareAlike 4.0 </br> International </br> To view a copy of this license, visit </br> https://creativecommons.org/licenses/by/4.0/