hanaml.CRF {hana.ml.r} | R Documentation |
hanaml.CRF is an R wrapper for the PAL conditional random field (CRF) algorithm.
hanaml.CRF(conn.context, data = NULL, used.cols = NULL, label = NULL,
           enet.lambda = NULL, tol = NULL, max.iter = NULL, lbfgs.m = NULL,
           thread.ratio = NULL, use.class.feature = NULL, use.word = NULL,
           use.ngrams = NULL, no.mid.ngrams = NULL, max.ngram.length = NULL,
           use.prev.word = NULL, use.next.word = NULL, use.disjunctive = NULL,
           disjunction.width = NULL, use.sequences = NULL,
           use.prev.sequences = NULL, use.type.seqs = NULL,
           use.type.seqs2 = NULL, use.type.ysequences = NULL,
           use.word.shape = NULL)
conn.context |
|
data |
|
used.cols |
In case (2), "xxx", "yyy" and "zzz" must be the columns holding the document ID, word position
and word, respectively. |
label |
|
enet.lambda |
|
tol |
|
max.iter |
|
lbfgs.m |
|
use.class.feature |
|
use.word |
|
use.ngrams |
|
no.mid.ngrams |
|
max.ngram.length |
|
use.prev.word |
|
use.next.word |
|
use.disjunctive |
|
disjunction.width |
|
use.sequences |
|
use.prev.sequences |
|
use.type.seqs |
|
use.type.seqs2 |
|
use.type.ysequences |
|
use.word.shape |
|
thread.ratio |
|
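The used.cols note and the example data on this page indicate that each training row carries a document ID, a word position, a word and a label. As a minimal sketch in base R, the following shows one way such a table could be assembled locally from tokenized sentences. The `docs` list and the name `train_local` are hypothetical, and the result is an ordinary local data.frame; transferring it to HANA so it can be passed as `data` is a separate step not shown here.

```r
# Hypothetical tokenized sentences, one label per token.
docs <- list(
  list(id = 1L,
       words = c("O2", "saturation", "96%"),
       labels = c("OxygenSaturation", "OxygenSaturation", "OxygenSaturation")),
  list(id = 3L,
       words = c("Heart", "rate", "88"),
       labels = c("O", "O", "O"))
)

# One row per token; the word position restarts at 1 for each document.
rows <- lapply(docs, function(d) {
  data.frame(
    DOC_ID = d$id,
    WORD_POSITION = seq_along(d$words),
    WORD = d$words,
    LABEL = d$labels,
    stringsAsFactors = FALSE
  )
})

train_local <- do.call(rbind, rows)
train_local
```

The column names follow the example data on this page; other names would presumably need to be mapped through used.cols and label.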
R6Class object.
Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data, such as sequences. Training a CRF can be cast as a maximum likelihood problem. In PAL, the L-BFGS algorithm is adopted for maximizing the (penalized) likelihood function.
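To make the idea of maximizing a penalized likelihood with L-BFGS concrete, here is a toy sketch in base R. It is not PAL's implementation and does not use hana.ml.r: it fits a logistic regression (which can be viewed as a CRF over single-position sequences) with an L2 penalty, using `optim` with the bounded variant `"L-BFGS-B"`. The penalty weight `lambda`, the simulated data and all function names are assumptions chosen for illustration. The `lmm` and `maxit` controls loosely play the roles that lbfgs.m and max.iter play above.

```r
# Toy illustration only: base R, not PAL and not hana.ml.r.
set.seed(1)
n <- 200
x <- cbind(1, rnorm(n), rnorm(n))            # intercept plus two features
w_true <- c(-0.5, 2, -1)                     # hypothetical true weights
y <- rbinom(n, 1, plogis(as.vector(x %*% w_true)))

lambda <- 0.1                                # hypothetical L2 penalty weight

# Negative penalized log-likelihood (optim minimizes, so the sign is flipped).
neg_pen_loglik <- function(w) {
  eta <- as.vector(x %*% w)
  -sum(y * eta - log1p(exp(eta))) + 0.5 * lambda * sum(w^2)
}

# Gradient of the function above.
neg_pen_grad <- function(w) {
  p <- plogis(as.vector(x %*% w))
  as.vector(-crossprod(x, y - p)) + lambda * w
}

fit <- optim(
  par = rep(0, ncol(x)),
  fn = neg_pen_loglik,
  gr = neg_pen_grad,
  method = "L-BFGS-B",
  control = list(lmm = 25, maxit = 1000)     # history size and iteration cap
)

fit$convergence   # 0 indicates successful convergence
fit$par           # estimated weights
```

A real CRF objective sums over label sequences rather than single labels, so its likelihood and gradient are computed differently; only the optimization pattern carries over.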
A "CRF" object with the following attributes:
model: DataFrame
CRF model.
statistics: DataFrame
Summary of the CRF model training process.
optim.param: DataFrame
Optimal parameter of the CRF model.
Reserved for future use and currently empty.
## Not run:
Input data for training a conditional random field model:

> df
   DOC_ID WORD_POSITION        WORD            LABEL
1       1             1      RECORD                O
2       1             2     #497321                O
3       1             3    78554939                O
4       1             4           |                O
5       1             5         LRH                O
6       1             6           |                O
7       1             7    62413233                O
8       1             8           |                O
9       1             9           |                O
10      1            10     7368393                O
11      1            11           |                O
12      1            12           6                O
13      1            13           g                O
14      1            14          24                O
15      1            15           g                O
16      1            16        2007                O
17      1            17    12:00:00                O
18      1            18          AM                O
19      1            19           |                O
20      1            20 INCACERATED                O
21      1            21      HERNIA                O
22      1            22      SEPSIS                O
23      1            23    Unsigned                O
24      1            24   Admission                O
25      1            25    systolic                O
26      1            26        less                O
27      1            27        than                O
28      1            28          90                O
29      1            29       heart                O
30      1            30    PHYSICAL                O
31      1            31 EXAMINATION                O
32      1            32           :                O
33      1            33       VITAL                O
34      1            34       SIGNS                O
35      1            35           :                O
36      1            36       Blood                O
37      1            37    pressure                O
38      1            38      114g58                O
39      1            39           _                O
40      1            40       pulse                O
41      1            41          68                O
42      1            42           _                O
43      1            43 respiratory                O
44      1            44        rate                O
45      1            45          20                O
46      1            46           _                O
47      1            47         she                O
48      1            48         was                O
49      1            49    afebrile                O
50      1            50          at                O
51      1            51        98.4                O
52      1            52           _                O
53      1            53         and                O
54      1            54          O2 OxygenSaturation
55      1            55  saturation OxygenSaturation
56      1            56         96% OxygenSaturation
57      1            57          on OxygenSaturation
58      1            58        room OxygenSaturation
59      1            59         air OxygenSaturation
60      3             1    PHYSICAL                O
61      3             2 EXAMINATION                O
62      3             3           :                O
63      3             4       VITAL                O
64      3             5       SIGNS                O
65      3             6           :                O
66      3             7       Heart                O
67      3             8        rate                O
68      3             9          88                O
69      3            10         and                O
70      3            11 irregularly                O
71      3            12   irregular                O
72      3            13           _                O
73      3            14 temperature                O
74      3            15       100.6                O
75      3            16           _                O
76      3            17       pulse                O
77      3            18         106                O
78      3            19           _                O
79      3            20 respiratory                O
80      3            21        rate                O
81      3            22          22                O
82      3            23           _                O
83      3            24       blood                O
84      3            25    pressure                O
85      3            26      108g64                O
86      3            27         98% OxygenSaturation
87      3            28  saturation OxygenSaturation
88      3            29          on OxygenSaturation
89      3            30           2 OxygenSaturation
90      3            31      liters OxygenSaturation
91      3            32          of OxygenSaturation
92      3            33      oxygen OxygenSaturation

Create a CRF class instance, and train the model using the above data as input:

> crf <- hanaml.CRF(conn.context = conn,
                    data = df,
                    thread.ratio = 1.0,
                    enet.lambda = 0.1,
                    max.iter = 1000,
                    tol = 1e-4,
                    use.word.shape = FALSE,
                    lbfgs.m = 25)

One can check the details of the training process from the statistics attribute of the above CRF instance:

> crf$statistics
         STAT_NAME          STAT_VALUE
1              obj 0.44251900977373015
2             iter                  22
3  solution status           Converged
4      numSentence                   2
5          numWord                  92
6      numFeatures                 963
7           iter 1         obj=26.6557
8           iter 2         obj=14.8484
9           iter 3         obj=5.36967
10          iter 4          obj=2.4382
11          iter 5         obj=1.80108
12          iter 6         obj=1.24094
13          iter 7        obj=0.836052
14          iter 8        obj=0.584655
15          iter 9        obj=0.495138
16         iter 10        obj=0.463937
17         iter 11        obj=0.453706
18         iter 12        obj=0.447952
19         iter 13        obj=0.443964
20         iter 14        obj=0.442732
21         iter 15         obj=0.44254
22         iter 16        obj=0.442523
23         iter 17        obj=0.442519
24         iter 18        obj=0.442519
25         iter 19        obj=0.442519
26         iter 20        obj=0.442519
27         iter 21        obj=0.442519
28         iter 22        obj=0.442519

## End(Not run)
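The statistics output above mixes summary rows with a per-iteration trace of the objective. As a rough sketch, the trace could be separated out in base R as shown below, assuming the statistics have already been brought into a local data.frame with character columns STAT_NAME and STAT_VALUE. The `stats` object here is a hand-built stand-in using a few values copied from that output; the names `stats` and `trace` are hypothetical.

```r
# Local stand-in for a few rows of the statistics table shown above.
stats <- data.frame(
  STAT_NAME = c("obj", "iter", "solution status",
                "iter 1", "iter 2", "iter 3"),
  STAT_VALUE = c("0.44251900977373015", "22", "Converged",
                 "obj=26.6557", "obj=14.8484", "obj=5.36967"),
  stringsAsFactors = FALSE
)

# Per-iteration rows have names of the form "iter <n>".
is_iter <- grepl("^iter [0-9]+$", stats$STAT_NAME)

# Strip the "iter " and "obj=" prefixes and convert to numbers.
trace <- data.frame(
  iter = as.integer(sub("^iter ", "", stats$STAT_NAME[is_iter])),
  obj = as.numeric(sub("^obj=", "", stats$STAT_VALUE[is_iter]))
)
trace

# Check whether the objective decreases from one iteration to the next.
all(diff(trace$obj) < 0)
```

A trace in this shape can be plotted or inspected to judge whether tol and max.iter were reasonable choices for the run.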