| hanaml.CRF {hana.ml.r} | R Documentation |
hanaml.CRF is an R wrapper for the PAL conditional random field (CRF) algorithm.
hanaml.CRF(conn.context, data = NULL,
used.cols = NULL, label = NULL, enet.lambda = NULL,
tol = NULL, max.iter = NULL,
lbfgs.m = NULL, thread.ratio = NULL,
use.class.feature = NULL, use.word = NULL,
use.ngrams = NULL, no.mid.ngrams = NULL,
max.ngram.length = NULL, use.prev.word = NULL,
use.next.word = NULL, use.disjunctive = NULL,
disjunction.width = NULL, use.sequences = NULL,
use.prev.sequences = NULL, use.type.seqs = NULL,
use.type.seqs2 = NULL, use.type.ysequences = NULL,
use.word.shape = NULL)
conn.context |
|
data |
|
used.cols |
In case (2), "xxx", "yyy" and "zzz" must be the columns holding the document ID, the word position
and the word, respectively. |
label |
|
enet.lambda |
|
tol |
|
max.iter |
|
lbfgs.m |
|
use.class.feature |
|
use.word |
|
use.ngrams |
|
no.mid.ngrams |
|
max.ngram.length |
|
use.prev.word |
|
use.next.word |
|
use.disjunctive |
|
disjunction.width |
|
use.sequences |
|
use.prev.sequences |
|
use.type.seqs |
|
use.type.seqs2 |
|
use.type.ysequences |
|
use.word.shape |
|
thread.ratio |
|
R6Class object.
Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data, such as sequences. Training a CRF fits into the general framework of maximum likelihood estimation. In PAL, the L-BFGS algorithm is used to maximize the (penalized) likelihood function.
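For orientation, the textbook linear-chain CRF objective can be written as below. This is a sketch of the standard formulation, not a statement of what PAL implements: the feature functions f_k, the penalty R(w), and the reading of the weight \lambda as the enet.lambda argument are all assumptions that this page does not confirm.

```latex
% Conditional probability of a label sequence y given a token sequence x
p(y \mid x; w) = \frac{1}{Z(x; w)}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} w_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x; w) = \sum_{y'} \exp\Big( \sum_{t=1}^{T} \sum_{k} w_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)

% Penalized maximum likelihood over N training sequences
\hat{w} = \arg\max_{w} \; \sum_{i=1}^{N} \log p\big(y^{(i)} \mid x^{(i)}; w\big) \; - \; \lambda \, R(w)
```

An L-BFGS optimizer only needs this objective and its gradient with respect to w, which is why it is a common choice for this kind of problem.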
A "CRF" object with the following attributes:
model: DataFrame. The CRF model.
statistics: DataFrame. Summary of the CRF model training process.
optim.param: DataFrame. Optimal parameters of the CRF model. Reserved for future use and currently empty.
## Not run:
Input data for training a conditional random field model:
> df
DOC_ID WORD_POSITION WORD LABEL
1 1 1 RECORD O
2 1 2 #497321 O
3 1 3 78554939 O
4 1 4 | O
5 1 5 LRH O
6 1 6 | O
7 1 7 62413233 O
8 1 8 | O
9 1 9 | O
10 1 10 7368393 O
11 1 11 | O
12 1 12 6 O
13 1 13 g O
14 1 14 24 O
15 1 15 g O
16 1 16 2007 O
17 1 17 12:00:00 O
18 1 18 AM O
19 1 19 | O
20 1 20 INCACERATED O
21 1 21 HERNIA O
22 1 22 SEPSIS O
23 1 23 Unsigned O
24 1 24 Admission O
25 1 25 systolic O
26 1 26 less O
27 1 27 than O
28 1 28 90 O
29 1 29 heart O
30 1 30 PHYSICAL O
31 1 31 EXAMINATION O
32 1 32 : O
33 1 33 VITAL O
34 1 34 SIGNS O
35 1 35 : O
36 1 36 Blood O
37 1 37 pressure O
38 1 38 114g58 O
39 1 39 _ O
40 1 40 pulse O
41 1 41 68 O
42 1 42 _ O
43 1 43 respiratory O
44 1 44 rate O
45 1 45 20 O
46 1 46 _ O
47 1 47 she O
48 1 48 was O
49 1 49 afebrile O
50 1 50 at O
51 1 51 98.4 O
52 1 52 _ O
53 1 53 and O
54 1 54 O2 OxygenSaturation
55 1 55 saturation OxygenSaturation
56 1 56 96% OxygenSaturation
57 1 57 on OxygenSaturation
58 1 58 room OxygenSaturation
59 1 59 air OxygenSaturation
60 3 1 PHYSICAL O
61 3 2 EXAMINATION O
62 3 3 : O
63 3 4 VITAL O
64 3 5 SIGNS O
65 3 6 : O
66 3 7 Heart O
67 3 8 rate O
68 3 9 88 O
69 3 10 and O
70 3 11 irregularly O
71 3 12 irregular O
72 3 13 _ O
73 3 14 temperature O
74 3 15 100.6 O
75 3 16 _ O
76 3 17 pulse O
77 3 18 106 O
78 3 19 _ O
79 3 20 respiratory O
80 3 21 rate O
81 3 22 22 O
82 3 23 _ O
83 3 24 blood O
84 3 25 pressure O
85 3 26 108g64 O
86 3 27 98% OxygenSaturation
87 3 28 saturation OxygenSaturation
88 3 29 on OxygenSaturation
89 3 30 2 OxygenSaturation
90 3 31 liters OxygenSaturation
91 3 32 of OxygenSaturation
92 3 33 oxygen OxygenSaturation
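The layout above has one row per token, keyed by document ID and word position. A minimal sketch of a local sanity check in base R follows. The names tokens and positions_ok are hypothetical, the data frame is a hand-typed excerpt rather than the HANA DataFrame passed to hanaml.CRF, and the rule that positions run 1, 2, ..., n within each document is inferred from the example data, not a requirement stated on this page.

```r
# Hypothetical local excerpt of the training data (column names mirror the table above).
tokens <- data.frame(
  DOC_ID        = c(1L, 1L, 1L, 3L, 3L),
  WORD_POSITION = c(1L, 2L, 3L, 1L, 2L),
  WORD          = c("RECORD", "#497321", "78554939", "PHYSICAL", "EXAMINATION"),
  LABEL         = c("O", "O", "O", "O", "O"),
  stringsAsFactors = FALSE
)

# TRUE when, within every document, positions are exactly 1, 2, ..., n in row order.
# (Assumed convention, taken from the example data.)
positions_ok <- function(d) {
  per_doc <- split(d$WORD_POSITION, d$DOC_ID)
  all(vapply(per_doc,
             function(p) identical(as.integer(p), seq_along(p)),
             logical(1)))
}

positions_ok(tokens)

# Label frequencies, useful for spotting a very unbalanced tag set.
label_counts <- table(tokens$LABEL)
```

A check like this is cheap to run before uploading data, and it catches gaps or duplicated positions that would otherwise only surface as a poor model.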
Create a CRF class instance, and train the model using the above data as input:
> crf <- hanaml.CRF(conn.context = conn, data = df, thread.ratio = 1.0,
enet.lambda = 0.1, max.iter = 1000, tol = 1e-4,
use.word.shape = FALSE, lbfgs.m = 25)
The details of the training process are available in the statistics attribute of
the above CRF instance:
> crf$statistics
STAT_NAME STAT_VALUE
1 obj 0.44251900977373015
2 iter 22
3 solution status Converged
4 numSentence 2
5 numWord 92
6 numFeatures 963
7 iter 1 obj=26.6557
8 iter 2 obj=14.8484
9 iter 3 obj=5.36967
10 iter 4 obj=2.4382
11 iter 5 obj=1.80108
12 iter 6 obj=1.24094
13 iter 7 obj=0.836052
14 iter 8 obj=0.584655
15 iter 9 obj=0.495138
16 iter 10 obj=0.463937
17 iter 11 obj=0.453706
18 iter 12 obj=0.447952
19 iter 13 obj=0.443964
20 iter 14 obj=0.442732
21 iter 15 obj=0.44254
22 iter 16 obj=0.442523
23 iter 17 obj=0.442519
24 iter 18 obj=0.442519
25 iter 19 obj=0.442519
26 iter 20 obj=0.442519
27 iter 21 obj=0.442519
28 iter 22 obj=0.442519
## End(Not run)
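The statistics table above mixes summary values with a per-iteration trace. The sketch below shows one way to separate the two in base R. It assumes the statistics have already been brought into a local data.frame (here the hypothetical stats, typed in by hand from a few rows of the output) and that the trace rows carry "iter <k>" in STAT_NAME and "obj=<value>" in STAT_VALUE, which is inferred from the printed output rather than documented.

```r
# Hypothetical local copy of a few rows of the statistics table shown above.
stats <- data.frame(
  STAT_NAME  = c("obj", "iter", "solution status", "iter 1", "iter 2", "iter 3"),
  STAT_VALUE = c("0.44251900977373015", "22", "Converged",
                 "obj=26.6557", "obj=14.8484", "obj=5.36967"),
  stringsAsFactors = FALSE
)

# Trace rows are assumed to be named "iter <k>" with values of the form "obj=<value>".
is_trace <- grepl("^iter [0-9]+$", stats$STAT_NAME)
trace <- data.frame(
  iter = as.integer(sub("^iter ", "", stats$STAT_NAME[is_trace])),
  obj  = as.numeric(sub("^obj=", "", stats$STAT_VALUE[is_trace]))
)
trace <- trace[order(trace$iter), ]

# In the trace above the objective never increases; check the same property here.
monotone <- all(diff(trace$obj) <= 0)

# Read the solver status from the summary rows.
converged <- stats$STAT_VALUE[stats$STAT_NAME == "solution status"] == "Converged"
```

With the trace in numeric form, plot(trace$iter, trace$obj, type = "b") gives a quick view of how fast the objective settles, which helps when choosing max.iter and tol.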