factory_predict_unlabelled_text_r.Rd
Predict unlabelled text using a fitted Scikit-learn (Python) pipeline.
factory_predict_unlabelled_text_r(
  dataset,
  predictor,
  pipe_path_or_object,
  preds_column = NULL,
  column_names = "all_cols",
  theme = NULL
)
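Argument | Description |
---|---|
dataset | Data frame. The text data to predict classes for. |
predictor | String. The column name of the text variable. |
pipe_path_or_object | String or fitted pipeline object, as the argument name suggests. The examples below pass the object returned by pxtextmineR::factory_pipeline_r. |
preds_column | A string with the user-specified name of the column that will hold the predictions. Defaults to NULL; in the examples below, leaving it unset yields a column named feedback_preds for the predictor feedback. |
column_names | A vector of strings with the names of the columns of the supplied data frame to return alongside the predictions column. As the examples below show, "all_cols" (the default) returns all original columns and "preds_only" returns the predictions column only. |
theme | String. For internal use by Nottinghamshire Healthcare NHS Foundation Trust or other trusts that use theme labels ("Access", "Environment/ facilities" etc.). The column name of the theme variable. Defaults to NULL. |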
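The effect of column_names on the shape of the result can be sketched in base R. This is only an illustration of the three cases shown in the examples; select_output_columns is a hypothetical helper, not part of pxtextmineR, and the toy data frame and predictions are assumptions made up for the sketch.

```r
# Hypothetical helper (NOT part of pxtextmineR): mimics how column_names
# appears to shape the returned data frame in the documented examples.
select_output_columns <- function(dataset, preds, column_names = "all_cols",
                                  preds_name = "preds") {
  # Start with a one-column data frame holding the predictions
  out <- data.frame(preds, stringsAsFactors = FALSE)
  names(out) <- preds_name

  # "preds_only": return the predictions column alone
  if (identical(column_names, "preds_only")) {
    return(out)
  }

  # "all_cols": keep every original column; otherwise keep the named ones
  keep <- if (identical(column_names, "all_cols")) names(dataset) else column_names
  cbind(out, dataset[, keep, drop = FALSE])
}

# Toy input (illustrative only)
toy <- data.frame(
  feedback = c("Nothing.", "Temperature in theatre a little low."),
  label = c("Couldn't be improved", "Environment/ facilities"),
  stringsAsFactors = FALSE
)
toy_preds <- c("Couldn't be improved", "Care received")

names(select_output_columns(toy, toy_preds, "all_cols", "feedback_preds"))
names(select_output_columns(toy, toy_preds, "preds_only", "feedback_preds"))
names(select_output_columns(toy, toy_preds, c("label"), "feedback_preds"))
```

The real function may order columns differently or validate the names it is given; the sketch only shows which columns are kept in each case.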
Data frame. The predictions column with or without any other columns passed by the user (see column_names).
# Prepare training and test sets
data_splits <- pxtextmineR::factory_data_load_and_split_r(
  filename = pxtextmineR::text_data,
  target = "label",
  predictor = "feedback",
  test_size = 0.90)

# Fit the pipeline
pipe <- pxtextmineR::factory_pipeline_r(
  x = data_splits$x_train,
  y = data_splits$y_train,
  tknz = "spacy",
  ordinal = FALSE,
  metric = "accuracy_score",
  cv = 2, n_iter = 1, n_jobs = 1, verbose = 3,
  learners = "SGDClassifier"
)

# Make predictions #
# Return data frame with predictions column and all original columns from
# the supplied data frame
preds_all_cols <- pxtextmineR::factory_predict_unlabelled_text_r(
  dataset = pxtextmineR::text_data,
  predictor = "feedback",
  pipe_path_or_object = pipe,
  column_names = "all_cols")

str(preds_all_cols)
#> 'data.frame': 10334 obs. of 4 variables:
#>  $ feedback_preds: chr "Couldn't be improved" "Care received" "Staff" "Care received" ...
#>  $ label         : chr "Couldn't be improved" "Environment/ facilities" "Access" "Communication" ...
#>  $ criticality   : chr "3" "-1" "-2" "-1" ...
#>  $ feedback      : chr "Nothing." "Temperature in theatre a little low." "Same service available at Bingham Health Centre." "Appointment details given over phone - no physical evidence/reminder which could cause problems. Other than tha"| __truncated__ ...
#>  - attr(*, "pandas.index")=RangeIndex(start=0, stop=10334, step=1)

# Return data frame with predictions column only
preds_preds_only <- pxtextmineR::factory_predict_unlabelled_text_r(
  dataset = pxtextmineR::text_data,
  predictor = "feedback",
  pipe_path_or_object = pipe,
  column_names = "preds_only")

head(preds_preds_only)
#>         feedback_preds
#> 1 Couldn't be improved
#> 2        Care received
#> 3                Staff
#> 4        Care received
#> 5        Miscellaneous
#> 6        Care received

# Return data frame with predictions column and columns label and feedback from
# the supplied data frame
preds_label_text <- pxtextmineR::factory_predict_unlabelled_text_r(
  dataset = pxtextmineR::text_data,
  predictor = "feedback",
  pipe_path_or_object = pipe,
  column_names = c("label", "feedback"))

str(preds_label_text)
#> 'data.frame': 10334 obs. of 3 variables:
#>  $ feedback_preds: chr "Couldn't be improved" "Care received" "Staff" "Care received" ...
#>  $ label         : chr "Couldn't be improved" "Environment/ facilities" "Access" "Communication" ...
#>  $ feedback      : chr "Nothing." "Temperature in theatre a little low." "Same service available at Bingham Health Centre." "Appointment details given over phone - no physical evidence/reminder which could cause problems. Other than tha"| __truncated__ ...
#>  - attr(*, "pandas.index")=RangeIndex(start=0, stop=10334, step=1)

# Return data frame with the predictions column name supplied by the user
preds_custom_preds_name <- pxtextmineR::factory_predict_unlabelled_text_r(
  dataset = pxtextmineR::text_data,
  predictor = "feedback",
  pipe_path_or_object = pipe,
  column_names = "preds_only",
  preds_column = "predictions")

head(preds_custom_preds_name)
#>          predictions
#> 1 Couldn't be improved
#> 2        Care received
#> 3                Staff
#> 4        Care received
#> 5        Miscellaneous
#> 6        Care received
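When the supplied data frame happens to carry labels, as pxtextmineR::text_data does, the returned predictions can be compared against them with base R. The sketch below is a minimal illustration, not part of the package: it rebuilds only the first four rows visible in the str() output above, so the figures it yields say nothing about the pipeline's performance on the full data. The names preds_head, agreement_table and agreement_rate are assumptions introduced for the sketch.

```r
# First four rows, copied from the str() output shown in the examples above
preds_head <- data.frame(
  feedback_preds = c("Couldn't be improved", "Care received",
                     "Staff", "Care received"),
  label = c("Couldn't be improved", "Environment/ facilities",
            "Access", "Communication"),
  stringsAsFactors = FALSE
)

# Cross-tabulate predicted classes against the existing labels
agreement_table <- table(
  predicted = preds_head$feedback_preds,
  actual = preds_head$label
)
print(agreement_table)

# Share of rows where the prediction matches the label
agreement_rate <- mean(preds_head$feedback_preds == preds_head$label)
print(agreement_rate)
```

With the full result, the same two calls can be applied to preds_all_cols or preds_label_text directly, since both keep the label column next to feedback_preds.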