hanaml.CRF {hana.ml.r}    R Documentation

Conditional Random Field

Description

hanaml.CRF is an R wrapper for the PAL conditional random field algorithm.

Usage

hanaml.CRF(conn.context, data = NULL,
           used.cols = NULL, label = NULL, enet.lambda = NULL,
           tol = NULL, max.iter = NULL,
           lbfgs.m = NULL, thread.ratio = NULL,
           use.class.feature = NULL, use.word = NULL,
           use.ngrams = NULL, no.mid.ngrams = NULL,
           max.ngram.length = NULL, use.prev.word = NULL,
           use.next.word = NULL, use.disjunctive = NULL,
           disjunction.width = NULL, use.sequences = NULL,
           use.prev.sequences = NULL, use.type.seqs = NULL,
           use.type.seqs2 = NULL, use.type.ysequences = NULL,
           use.word.shape = NULL)

Arguments

conn.context

ConnectionContext
Connection to the SAP HANA system.

data

DataFrame
DataFrame that contains the training data for conditional random field.

used.cols

list of character, optional
This parameter specifies the three columns used for training a conditional random field model: one column for the document ID, one for the word position, and one for the word. If not NULL, it can be specified in one of two ways:

  • (1) used.cols = list(document.id = "xxx", word.pos = "yyy", word = "zzz")

  • (2) used.cols = list("xxx", "yyy", "zzz")

In case (2), "xxx", "yyy" and "zzz" must be the names of the columns holding the document ID, word position and word, in that order.
Defaults to the first three non-label columns of data if not provided.

label

character, optional
Specifies the column in data that corresponds to class labels. Defaults to the last column of data if not provided.

enet.lambda

numeric, optional
Elastic-net penalization weight. The value should be greater than 0. Defaults to 1.0.

tol

numeric, optional
Convergence tolerance for the optimization (i.e. the L-BFGS algorithm). Defaults to 1e-4.

max.iter

integer, optional
Maximum number of iterations for the optimization (i.e. the L-BFGS algorithm). Defaults to 1000.

lbfgs.m

integer, optional
Number of previous updates kept in memory by the L-BFGS algorithm. Defaults to 25.

use.class.feature

logical, optional
Whether or not to include a feature for the class, which is equivalent to having a bias vector in the model. Defaults to TRUE.

use.word

logical, optional
Whether to use the feature for current word or not. Defaults to TRUE.

use.ngrams

logical, optional
Whether or not to make features from letter n-grams (i.e. substrings of the word). Defaults to TRUE.

no.mid.ngrams

logical, optional
TRUE means that character n-grams containing neither the beginning nor the end of the word are not included as features. Defaults to TRUE.

max.ngram.length

integer, optional
Upper limit on the length of n-grams to be used in the model. Must be positive. Defaults to 6.

use.prev.word

logical, optional
Whether to make a feature from both the current word and the previous word. Defaults to TRUE.

use.next.word

logical, optional
Whether to make a feature from both the current word and its next word. Defaults to TRUE.

use.disjunctive

logical, optional
Whether to include features giving disjunctions of words found anywhere within disjunction.width words to the left or right of the current word. Defaults to TRUE.

disjunction.width

integer, optional
Number of words to the left and right that are considered for disjunction features; see use.disjunctive. Defaults to 4.

use.sequences

logical, optional
Whether or not to use class combination features. Defaults to TRUE.

use.prev.sequences

logical, optional
Whether or not to use any class combination features using the previous class. Defaults to TRUE.

use.type.seqs

logical, optional
Whether to use basic 0th order word shape features or not. Defaults to TRUE.

use.type.seqs2

logical, optional
Whether to use additional 1st and 2nd order word shape features. Defaults to TRUE.

use.type.ysequences

logical, optional
Whether or not to use some first order word shape patterns. Defaults to TRUE.

use.word.shape

logical, optional
Whether or not to use word shape (e.g. capitalized or numeric). Only supports chris2UseLC currently. Defaults to FALSE.

thread.ratio

numeric, optional
Specifies the ratio of the total number of threads that can be used by this function. The range of this parameter is from 0 to 1, where 0 means using only 1 thread, and 1 means using at most all the currently available threads. Values outside this range are ignored, and the function heuristically determines the number of threads to use.
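
As an illustration of the data, used.cols and label arguments described above, the following base R sketch builds a small local data frame in the expected one-row-per-word layout and writes used.cols in both accepted forms. The rows are toy values loosely modeled on the example data, and the names local.df, cols.named and cols.positional are introduced here only for illustration. Transferring the data to SAP HANA is not shown, since it depends on the connection setup.

```r
# Toy token-level data: one row per word, with document ID,
# word position, word, and class label (illustrative values only).
local.df <- data.frame(
  DOC_ID        = c(1L, 1L, 1L, 3L, 3L),
  WORD_POSITION = c(1L, 2L, 3L, 1L, 2L),
  WORD          = c("and", "O2", "saturation", "rate", "88"),
  LABEL         = c("O", "OxygenSaturation", "OxygenSaturation", "O", "O"),
  stringsAsFactors = FALSE
)

# Form (1): each role is named explicitly.
cols.named <- list(document.id = "DOC_ID",
                   word.pos    = "WORD_POSITION",
                   word        = "WORD")

# Form (2): unnamed list; the order is document ID, word position, word.
cols.positional <- list("DOC_ID", "WORD_POSITION", "WORD")

# Column holding the class labels (here also the last column).
label.col <- "LABEL"

# Sanity checks: every referenced column exists, and both forms
# point at the same three columns.
stopifnot(all(unlist(cols.named) %in% names(local.df)))
stopifnot(label.col %in% names(local.df))
stopifnot(identical(unname(unlist(cols.named)), unlist(cols.positional)))
```

With a layout like this, leaving used.cols and label as NULL would select the same columns, because the defaults are the first three non-label columns and the last column of data.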

Format

R6Class object.

Details

Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data, such as sequences. Training a CRF can be put into the general framework of maximum likelihood estimation. In PAL, the L-BFGS algorithm is adopted for maximizing the (penalized) likelihood function.
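
For orientation, a common textbook way to write a linear-chain CRF and its penalized maximum likelihood objective is sketched below. The feature functions f_k, the weights \theta_k and the penalty term R(\theta) are notation introduced here and are not taken from the PAL documentation. In particular, the exact form of the penalty weighted by enet.lambda (written \lambda below) is not stated on this page, so R(\theta) is left generic.

```latex
% Conditional probability of a label sequence y given a word sequence x
p_\theta(y \mid x) = \frac{1}{Z_\theta(x)}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \theta_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z_\theta(x) = \sum_{y'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \theta_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)

% Penalized maximum likelihood over N training sequences
\hat{\theta} = \arg\max_{\theta} \;
  \sum_{i=1}^{N} \log p_\theta\big(y^{(i)} \mid x^{(i)}\big) \; - \; \lambda \, R(\theta)
```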

Value

A "CRF" object with the following attributes:

Examples

## Not run: 
 Input data for training a conditional random field model:

    > df
       DOC_ID WORD_POSITION        WORD            LABEL
    1       1             1      RECORD                O
    2       1             2     #497321                O
    3       1             3    78554939                O
    4       1             4           |                O
    5       1             5         LRH                O
    6       1             6           |                O
    7       1             7    62413233                O
    8       1             8           |                O
    9       1             9           |                O
    10      1            10     7368393                O
    11      1            11           |                O
    12      1            12           6                O
    13      1            13           g                O
    14      1            14          24                O
    15      1            15           g                O
    16      1            16        2007                O
    17      1            17    12:00:00                O
    18      1            18          AM                O
    19      1            19           |                O
    20      1            20 INCACERATED                O
    21      1            21      HERNIA                O
    22      1            22      SEPSIS                O
    23      1            23    Unsigned                O
    24      1            24   Admission                O
    25      1            25    systolic                O
    26      1            26        less                O
    27      1            27        than                O
    28      1            28          90                O
    29      1            29       heart                O
    30      1            30    PHYSICAL                O
    31      1            31 EXAMINATION                O
    32      1            32           :                O
    33      1            33       VITAL                O
    34      1            34       SIGNS                O
    35      1            35           :                O
    36      1            36       Blood                O
    37      1            37    pressure                O
    38      1            38      114g58                O
    39      1            39           _                O
    40      1            40       pulse                O
    41      1            41          68                O
    42      1            42           _                O
    43      1            43 respiratory                O
    44      1            44        rate                O
    45      1            45          20                O
    46      1            46           _                O
    47      1            47         she                O
    48      1            48         was                O
    49      1            49    afebrile                O
    50      1            50          at                O
    51      1            51        98.4                O
    52      1            52           _                O
    53      1            53         and                O
    54      1            54          O2 OxygenSaturation
    55      1            55  saturation OxygenSaturation
    56      1            56         96% OxygenSaturation
    57      1            57          on OxygenSaturation
    58      1            58        room OxygenSaturation
    59      1            59         air OxygenSaturation
    60      3             1    PHYSICAL                O
    61      3             2 EXAMINATION                O
    62      3             3           :                O
    63      3             4       VITAL                O
    64      3             5       SIGNS                O
    65      3             6           :                O
    66      3             7       Heart                O
    67      3             8        rate                O
    68      3             9          88                O
    69      3            10         and                O
    70      3            11 irregularly                O
    71      3            12   irregular                O
    72      3            13           _                O
    73      3            14 temperature                O
    74      3            15       100.6                O
    75      3            16           _                O
    76      3            17       pulse                O
    77      3            18         106                O
    78      3            19           _                O
    79      3            20 respiratory                O
    80      3            21        rate                O
    81      3            22          22                O
    82      3            23           _                O
    83      3            24       blood                O
    84      3            25    pressure                O
    85      3            26      108g64                O
    86      3            27         98% OxygenSaturation
    87      3            28  saturation OxygenSaturation
    88      3            29          on OxygenSaturation
    89      3            30           2 OxygenSaturation
    90      3            31      liters OxygenSaturation
    91      3            32          of OxygenSaturation
    92      3            33      oxygen OxygenSaturation

    Create a CRF class instance, and train the model using the above data as input:

    > crf <- hanaml.CRF(conn.context = conn, data = df, thread.ratio = 1.0,
                        enet.lambda = 0.1, max.iter = 1000, tol = 1e-4,
                        use.word.shape = FALSE, lbfgs.m = 25)

   The details of the training process can be checked from the statistics attribute of
   the above CRF instance:
    > crf$statistics
             STAT_NAME          STAT_VALUE
    1              obj 0.44251900977373015
    2             iter                  22
    3  solution status           Converged
    4      numSentence                   2
    5          numWord                  92
    6      numFeatures                 963
    7           iter 1         obj=26.6557
    8           iter 2         obj=14.8484
    9           iter 3         obj=5.36967
    10          iter 4          obj=2.4382
    11          iter 5         obj=1.80108
    12          iter 6         obj=1.24094
    13          iter 7        obj=0.836052
    14          iter 8        obj=0.584655
    15          iter 9        obj=0.495138
    16         iter 10        obj=0.463937
    17         iter 11        obj=0.453706
    18         iter 12        obj=0.447952
    19         iter 13        obj=0.443964
    20         iter 14        obj=0.442732
    21         iter 15         obj=0.44254
    22         iter 16        obj=0.442523
    23         iter 17        obj=0.442519
    24         iter 18        obj=0.442519
    25         iter 19        obj=0.442519
    26         iter 20        obj=0.442519
    27         iter 21        obj=0.442519
    28         iter 22        obj=0.442519

## End(Not run)
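
The per-iteration rows of the statistics output shown in the example can be turned into a numeric trace of the objective. The sketch below works on a local stand-in data frame that copies a few rows of that output. It assumes that the statistics have been collected into a local data.frame whose STAT_NAME and STAT_VALUE columns are character vectors; the names stats and trace are introduced here for illustration.

```r
# Local stand-in for a few rows of crf$statistics, copied from the
# example output (assumption: both columns are character vectors).
stats <- data.frame(
  STAT_NAME  = c("obj", "iter", "solution status",
                 "iter 1", "iter 2", "iter 3"),
  STAT_VALUE = c("0.44251900977373015", "22", "Converged",
                 "obj=26.6557", "obj=14.8484", "obj=5.36967"),
  stringsAsFactors = FALSE
)

# Rows named "iter <k>" hold the objective of iteration k as "obj=<value>".
is.trace <- grepl("^iter [0-9]+$", stats$STAT_NAME)

trace <- data.frame(
  iter = as.integer(sub("^iter ", "", stats$STAT_NAME[is.trace])),
  obj  = as.numeric(sub("^obj=", "", stats$STAT_VALUE[is.trace]))
)

# In this output the objective decreases from one iteration to the next.
stopifnot(all(diff(trace$obj) <= 0))

print(trace)
```

A trace in this form can be passed to plot(trace$iter, trace$obj, type = "b") to inspect convergence visually.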


[Package hana.ml.r version 1.0.8 Index]