The package NHSRdatasets has several datasets from a range of areas related to health and care.
Once installed, as with any package, it has to be loaded with:
library(NHSRdatasets)
To see which datasets are available (useful because the CRAN and GitHub versions may differ), type:
NHSRdatasets::
If you are using RStudio, a prompt listing all the available datasets appears automatically. You can also bring up the list by placing the cursor after the second colon and pressing the Tab key.
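The same list can also be retrieved in code, which works outside RStudio too. This is a minimal sketch that assumes NHSRdatasets is installed. Calling `data()` from the utils package with only the `package` argument lists a package's datasets without loading any of them.

```r
# List the datasets in the installed version of NHSRdatasets.
# utils::data() with only the `package` argument lists datasets; it loads nothing.
available <- utils::data(package = "NHSRdatasets")

# The listing is stored in the `results` matrix; its "Item" column holds the names.
dataset_names <- available$results[, "Item"]
print(dataset_names)
```

Because this reads the installed copy of the package, the list reflects whichever version (CRAN or GitHub) you have installed.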
Once a dataset is selected and the code is run, the first 10 rows are printed in the Console:
NHSRdatasets::ae_attendances
This is because the data is stored as a tibble. Tibbles are data frames with slightly modified behaviour that improves some features. One of these is how many rows are printed in the Console, which defaults to a more manageable 10.
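A tibble is still a data frame underneath, which is why functions written for data frames keep working on it. This is a small sketch, assuming NHSRdatasets is installed, that checks the classes and converts the data to a plain data frame. The object names `ae` and `plain` are chosen here for illustration.

```r
ae <- NHSRdatasets::ae_attendances

# A tibble carries the "tbl_df" class on top of "data.frame".
print(class(ae))
inherits(ae, "tbl_df")     # TRUE: it is a tibble
inherits(ae, "data.frame") # TRUE: it is still a data frame

# Converting to a plain data frame drops the tibble printing rules,
# so head() is used here to avoid printing every row.
plain <- as.data.frame(ae)
print(head(plain))
```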
Another feature is that if there are many columns and only some of them fit in the Console, the names of the hidden columns are printed under the first 10 rows. In this example the column `admissions` couldn’t fit in the Console, so it is listed underneath the data:
# ℹ 12,755 more rows
# ℹ 1 more variable: admissions <dbl>
# ℹ Use `print(n = ...)` to see more rows
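If you would rather see every column than a list of the hidden ones, the tibble print method accepts a `width` argument, and `width = Inf` prints all columns however narrow the Console is. This is a minimal sketch that assumes both NHSRdatasets and the tibble package are installed. tibble is loaded explicitly so that its print method is used.

```r
library(tibble)  # assumed installed; provides the tibble print method

ae <- NHSRdatasets::ae_attendances

# width = Inf shows every column; n sets how many rows are printed.
print(ae, n = 5, width = Inf)
```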
You can change how many rows are printed with the `n` argument of `print()`, for example:
print(NHSRdatasets::ae_attendances, n = 2)
Tibbles in an R Markdown or Quarto output, like this vignette, may print every row, because the 10-row limit comes from the tibble print method rather than from the report itself. To restrict the number of rows in a rendered output such as HTML, the code has to be:
NHSRdatasets::ae_attendances |>
head(n = 2)
#> period org_code type attendances breaches admissions
#> 1 2017-03-01 RF4 1 21289 2879 5060
#> 2 2017-03-01 RF4 2 813 22 0
Creating an object from the dataset you are working with makes it easier to view the whole dataset. It also lets you use features of the IDE (such as RStudio or VS Code) to sort and search the data, and means you can change it, for example by adding new columns.
The NHS-R Community slides from the Introduction to R and R Studio go into more detail on objects.
To create an object and open it in RStudio:
dat <- NHSRdatasets::ae_attendances
A new object called `dat` will appear in the Environment tab of the top right pane of RStudio (if you have the default layout). The Environment tab also shows that it has 12765 obs (observations, which are rows) and 6 variables (which are columns).
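To see something similar using code in the RStudio Console, type the following. `glimpse()` comes from the dplyr package, so it is called here with the `dplyr::` prefix:
dplyr::glimpse(dat)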
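If dplyr is not installed, base R can report the same counts that the Environment tab shows. This is a minimal sketch using only base functions, assuming NHSRdatasets is installed.

```r
dat <- NHSRdatasets::ae_attendances

# Base R equivalents of the "obs" and "variables" counts in the Environment tab
nrow(dat)   # number of rows (observations)
ncol(dat)   # number of columns (variables)
dim(dat)    # both at once: rows, then columns

# str() gives a compact column-by-column summary, similar in spirit to glimpse()
str(dat)
```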
Clicking on the blue circle with a white arrow next to the word `dat` will expand the view in the Environment tab, and clicking on the word `dat` will open the data in a new tab in the top left pane.
If you are not using RStudio, or prefer to work in the Console where code is run directly, use:
View(dat)
This is also the code you will see in the Console when you click on the object’s name, because RStudio translates that click into code run in the Console.
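Because `dat` is a copy, it can be changed without affecting the dataset stored in the package. This is one of the benefits of creating an object, as described earlier. The sketch below adds a column and assumes NHSRdatasets is installed and that `period` is stored as a date. The column name `year` is a hypothetical example.

```r
dat <- NHSRdatasets::ae_attendances

# Add a hypothetical new column holding the year of each period.
# format() with "%Y" turns a date into a four-digit year string.
dat$year <- format(dat$period, "%Y")

# dat now has one more column than the package's copy, which is unchanged.
ncol(dat)
ncol(NHSRdatasets::ae_attendances)
```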
To find out about a dataset, put a question mark before its name in code:
?ons_mortality
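The `?` shortcut on its own only finds the help page once the package has been loaded with `library()`. Otherwise, you can name the package explicitly, either with `?NHSRdatasets::ons_mortality` or with `help()` and its `package` argument. This is a minimal sketch, assuming NHSRdatasets is installed.

```r
# Open the help page for ons_mortality without attaching the package first.
# The `package` argument tells help() which package to look in.
help("ons_mortality", package = "NHSRdatasets")
```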
You will see that the Usage section says:
data("ons_mortality")
This does the same as creating an object with the code
ons_mortality <- NHSRdatasets::ons_mortality
but uses the `data()` function from the utils package, which is loaded automatically in every R session. It also doesn’t need a name to be assigned, as the code above does, because it creates the object using the dataset’s existing name.
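`data()` also accepts a `package` argument, so the dataset can be loaded without attaching NHSRdatasets first. This is a minimal sketch, assuming the package is installed.

```r
# Load ons_mortality into the global environment under its own name.
# No assignment is needed: data() creates the object itself.
data("ons_mortality", package = "NHSRdatasets")

# The object now exists and has the same columns as the package copy.
exists("ons_mortality")
identical(names(ons_mortality), names(NHSRdatasets::ons_mortality))
```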
If you assign the result of the function with the assignment operator, the new object is not the data but a character vector just saying “ons_mortality”:
ons <- data("ons_mortality")
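The character vector that `data()` returns holds the names of the datasets it was asked to load, and the data itself is still created under its own name. This is a minimal sketch, assuming NHSRdatasets is installed. It shows this behaviour and uses `get()` to fetch the object by that name if you want it stored under a different name, such as the illustrative `ons`.

```r
# data() invisibly returns the requested dataset name(s), not the data.
loaded <- data("ons_mortality", package = "NHSRdatasets")
print(loaded)
class(loaded)   # "character"

# The data frame itself was created as `ons_mortality`;
# get() retrieves an object by its name, so it can be stored under another name.
ons <- get(loaded)
head(ons)
```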