This dataset is available from the NHSRdatasets package, and comparisons similar to those above can be made with it. The examples below can be used to practise data wrangling and data visualisation.
library(NHSRdatasets)
NEWS_var <- NHSRdatasets::synthetic_news_data
For more information, see the synthpop package documentation.
NEWS is short for the National Early Warning Score. NHS England has published a detailed introduction to it.
The latest iteration of the NEWS score is NEWS2.
The premise of NEWS is that physiological measures such as heart rate (pulse), respiration rate and level of consciousness (GCS or AVPU) are all routinely measured.
GCS = Glasgow Coma Scale, a score from 3 to 15 based on eye, verbal and motor responses. AVPU = a categorical description of how conscious a patient is: A - Alert, V - Responds to voice, P - Responds to painful stimuli, U - Unresponsive.
However, a range of professional groups use these measurements, and it can be challenging to recognise the deteriorating patient from the raw measurements alone, especially if you do not often work with acutely unwell patients.
NEWS(2) provides categorical classifications for distinct ranges of physiology. Each category is scored 0-3.
The more abnormal a measure of physiology, the greater the score attributed to it. The score is meant to be calculated at the time the physiology is measured. In a hospital this is often when the nurse or healthcare assistant completes their observation rounds.
The NEWS score is then linked to distinct actions that should be followed. Organisations typically localise these actions depending on the resources available to support medical emergencies.
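As a rough illustration of how a score can be linked to actions, the sketch below maps a set of component scores to a risk band. The function name `news_risk_band()` is hypothetical, and the thresholds (a total of 5-6 as medium, 7 or more as high, and any single component scoring 3 as low-medium) follow the commonly published NEWS2 trigger levels; local escalation policies may differ.

```r
# Hypothetical helper: map per-parameter scores (each 0-3) to a risk band.
# Thresholds are assumed from the published NEWS2 trigger levels.
news_risk_band <- function(components) {
  if (anyNA(components)) {
    return(NA_character_) # an incomplete set of observations is not banded
  }
  total <- sum(components)
  if (total >= 7) {
    "high"
  } else if (total >= 5) {
    "medium"
  } else if (any(components == 3)) {
    "low-medium" # a single extreme parameter, even with a low total
  } else {
    "low"
  }
}

news_risk_band(c(0, 0, 0, 1, 0, 0, 0)) # "low"
news_risk_band(c(3, 0, 0, 0, 0, 0, 0)) # "low-medium"
news_risk_band(c(3, 2, 0, 0, 0, 0, 0)) # "medium"
news_risk_band(c(3, 3, 1, 0, 0, 0, 0)) # "high"
```

The band is only a label; what each band triggers (frequency of monitoring, who is called) is the part that organisations localise.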
Some criticisms of NEWS were addressed by NEWS2. Normal ranges of oxygen saturation (SpO2) are not universal, which often meant over-escalation of physiology that is abnormal by the standard ranges but “normal” for patients with respiratory diseases such as COPD. NEWS2 addressed this through adjusted ranges for SpO2.
There have also been concerns that NEWS has been introduced, often as a mandatory measure, in settings where it has not been validated. The score was developed by the Royal College of Physicians, whose members often represent clinical specialties working in in-patient medicine. As such, the data used to develop the score came from patients who were typically out of the acute phase of their illness, so abnormal physiology was measured after therapeutic interventions. In most cases NEWS has been shown to be robust to these criticisms.
NEWS is more work for staff (typically nursing staff) to complete. It is also not validated as an incomplete score, for example where only heart rate, blood pressure and SpO2 are recorded, which is a common set of measurements in most outpatient settings.
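One way to respect the point about incomplete scores is to refuse to total a partial set of observations. The sketch below uses a small hypothetical data frame of component scores (the values are made up for illustration) and returns `NA` for the total whenever any component is missing.

```r
# Hypothetical component scores for three sets of observations.
# NA marks a parameter that was not recorded.
obs_scores <- data.frame(
  resp  = c(0, 3, NA),
  sat   = c(0, 1, 0),
  sup   = c(0, 2, NA),
  syst  = c(0, 0, 1),
  pulse = c(1, 2, 0),
  alert = c(0, 0, NA),
  temp  = c(0, 1, NA)
)

# With na.rm = FALSE, rowSums() returns NA whenever any component is missing,
# so a partial set of observations never produces a misleadingly low total.
news_total <- unname(rowSums(obs_scores, na.rm = FALSE))
news_total # 1 9 NA
```

The third row has blood pressure, pulse and SpO2 recorded but nothing else, so its total is left missing instead of being reported as 1.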
Systolic blood pressure (syst)
library(NHSRdatasets)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
sbp_news <- NEWS_var |>
  mutate(sbp = as.numeric(syst)) |>
  mutate(news = case_when(
    is.na(sbp) ~ NA_real_, # missing readings stay missing, checked first
    sbp <= 90 | sbp >= 220 ~ 3,
    sbp %in% c(91:100) ~ 2, # %in% assumes whole-number readings
    sbp %in% c(101:110) ~ 1,
    TRUE ~ 0
  ))
Pulse (pulse)
hr_news <- NEWS_var |>
  mutate(pulse = as.numeric(pulse)) |>
  mutate(news = case_when(
    is.na(pulse) ~ NA_real_, # missing readings stay missing, checked first
    pulse <= 40 | pulse >= 131 ~ 3,
    pulse %in% c(111:130) ~ 2,
    pulse %in% c(41:50, 91:110) ~ 1,
    TRUE ~ 0
  ))
Respiratory rate (resp)
rr_news <- NEWS_var |>
  mutate(resp_rate = as.numeric(resp)) |>
  mutate(news = case_when(
    is.na(resp_rate) ~ NA_real_, # missing readings stay missing, checked first
    resp_rate <= 8 | resp_rate >= 25 ~ 3,
    resp_rate %in% c(21:24) ~ 2,
    resp_rate %in% c(9:11) ~ 1,
    TRUE ~ 0
  ))
Oxygen saturation (sat)
NEWS_var |>
  mutate(news = case_when(
    is.na(sat) ~ NA_real_, # missing readings stay missing, checked first
    sat <= 91 ~ 3,
    sat %in% c(92:93) ~ 2,
    sat %in% c(94:95) ~ 1,
    TRUE ~ 0
  ))
#> # A tibble: 1,000 × 13
#> male age NEWS syst dias temp pulse resp sat sup alert died news
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <int> <int> <int> <int> <dbl>
#> 1 0 68 3 150 98 36.8 78 26 96 0 0 0 0
#> 2 1 94 1 145 67 35 62 18 96 0 0 0 0
#> 3 0 85 0 169 69 36.2 54 18 96 0 0 0 0
#> 4 1 44 0 154 106 36.9 80 17 96 0 0 0 0
#> 5 0 77 1 122 67 36.4 62 20 95 0 0 0 1
#> 6 0 58 1 146 106 35.3 73 20 98 0 0 0 0
#> 7 0 25 4 65 42 35.6 72 12 99 0 0 0 0
#> 8 0 69 0 116 56 37.2 90 16 97 0 0 0 0
#> 9 0 91 1 162 72 35.5 60 16 99 0 0 0 0
#> 10 0 70 1 132 96 35.3 67 16 97 0 0 0 0
#> # ℹ 990 more rows
Temperature (temp)
NEWS_var |>
  mutate(news = case_when(
    is.na(temp) ~ NA_real_, # missing readings stay missing, checked first
    temp <= 35 ~ 3,
    temp >= 39.1 ~ 2,
    # temp is a decimal, so compare ranges directly: 38.1:39 only yields 38.1
    temp <= 36 | temp > 38 ~ 1,
    TRUE ~ 0
  ))
#> # A tibble: 1,000 × 13
#> male age NEWS syst dias temp pulse resp sat sup alert died news
#> <int> <int> <int> <int> <int> <dbl> <int> <int> <int> <int> <int> <int> <dbl>
#> 1 0 68 3 150 98 36.8 78 26 96 0 0 0 0
#> 2 1 94 1 145 67 35 62 18 96 0 0 0 3
#> 3 0 85 0 169 69 36.2 54 18 96 0 0 0 0
#> 4 1 44 0 154 106 36.9 80 17 96 0 0 0 0
#> 5 0 77 1 122 67 36.4 62 20 95 0 0 0 0
#> 6 0 58 1 146 106 35.3 73 20 98 0 0 0 1
#> 7 0 25 4 65 42 35.6 72 12 99 0 0 0 1
#> 8 0 69 0 116 56 37.2 90 16 97 0 0 0 0
#> 9 0 91 1 162 72 35.5 60 16 99 0 0 0 1
#> 10 0 70 1 132 96 35.3 67 16 97 0 0 0 1
#> # ℹ 990 more rows
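The dataset also carries `sup` and `alert` columns, which cover the two remaining NEWS components: supplemental oxygen and level of consciousness. The sketch below scores them on a small toy tibble, not on the dataset itself. The coding it relies on (0 meaning room air or alert, any other value meaning on oxygen or not alert) is an assumption and should be checked against the dataset's help page before use.

```r
library(dplyr)

# Toy rows shaped like the sup and alert columns.
# Assumed coding: 0 = room air / alert, non-zero = on oxygen / not alert.
toy <- tibble(sup = c(0L, 1L, 0L), alert = c(0L, 0L, 3L))

scored <- toy |>
  mutate(
    news_sup   = if_else(sup == 0, 0, 2),  # any supplemental oxygen scores 2
    news_alert = if_else(alert == 0, 0, 3) # anything other than alert scores 3
  )

scored
```

Both components are binary in NEWS: supplemental oxygen scores 0 or 2, and consciousness scores 0 or 3, with nothing in between.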
In addition, NEWS2 has altered SpO2 ranges for patients with known respiratory diseases. Implementing these needs additional logic on a per-patient basis.
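A minimal sketch of that per-patient logic is shown here. The helper `score_spo2()` and its `on_oxygen` and `scale2` arguments are hypothetical; `scale2` stands for a clinician's decision that the patient has an 88-92% target range, which is not recorded in `synthetic_news_data`. The ranges are taken from the published NEWS2 chart as best understood and should be checked against it before any real use.

```r
library(dplyr)

# Hypothetical helper: score SpO2 on the standard ranges (scale 1) or the
# adjusted NEWS2 ranges (scale 2), chosen per patient by the `scale2` flag.
score_spo2 <- function(sat, on_oxygen, scale2) {
  case_when(
    is.na(sat) | is.na(on_oxygen) | is.na(scale2) ~ NA_real_,
    # Scale 1: standard ranges
    !scale2 & sat <= 91 ~ 3,
    !scale2 & sat <= 93 ~ 2,
    !scale2 & sat <= 95 ~ 1,
    !scale2 ~ 0,
    # Scale 2: saturations below the 88-92% target
    sat <= 83 ~ 3,
    sat <= 85 ~ 2,
    sat <= 87 ~ 1,
    # Scale 2: at or above target only scores when on supplemental oxygen
    sat <= 92 | !on_oxygen ~ 0,
    sat <= 94 ~ 1,
    sat <= 96 ~ 2,
    TRUE ~ 3
  )
}

score_spo2(
  sat       = c(90, 90, 95, 95),
  on_oxygen = c(FALSE, FALSE, FALSE, TRUE),
  scale2    = c(FALSE, TRUE, TRUE, TRUE)
)
# 3 0 0 2
```

The same reading of 90% scores 3 on the standard ranges but 0 for a patient on the adjusted ranges, which is the over-escalation that NEWS2 set out to reduce.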
In many ways, synthetic data reflects George Box’s observation that “all models are wrong, but some are useful” while providing a “useful approximation [of] those found in the real world”.
A link between the clinical outcomes of a patient visit and its costs rarely exists in practice, so being able to assess these trade-offs in synthetic data allows the value of care (outcomes relative to cost) to be measured and improved.
Synthetic data is likely not a 100% accurate depiction of real-world outcomes, like cost and clinical quality, but rather a useful approximation of these variables. Moreover, synthetic data is constantly improving, and methods like validation and calibration will continue to make these data sources more realistic.
Besides protecting the privacy and confidentiality of a set of data, synthetic data can be used to test fraud detection systems by creating realistic behaviour profiles for users and attackers. In machine learning, it can also be used to train and test models. Synthetic data can also help create a baseline for future testing or for studies such as clinical trials.