R: hanaml DataFrame

hanaml.DataFrame {hana.ml.r}

R Documentation

Description

This module represents a database query as a DataFrame. Most operations are designed to never bring data back from the database unless explicitly asked for.
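As a rough illustration of this deferred style, the sketch below chains only methods documented on this page. The helper name preview_rows is hypothetical, and df is assumed to be an existing hanaml DataFrame. Nothing should be transferred to R until Collect() is called at the end of the chain.

```r
# Hypothetical helper: filter and truncate on the database side,
# then copy only the small result into R.
# Assumes `df` is a hanaml DataFrame as described on this page.
preview_rows <- function(df, condition, n = 5) {
  filtered <- df$Filter(condition)  # still a database-backed DataFrame
  first_n <- filtered$Head(n = n)   # still a database-backed DataFrame
  first_n$Collect()                 # data is copied to R only here
}
```

A call such as preview_rows(df, "col1 = 'A'") would then return at most five matching rows.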

Usage

hanaml.DataFrame(conn.context = NULL,
                 select.statement = NULL,
                 name = NULL)

Arguments

`conn.context`	`ConnectionContext, optional` Contains a handle to a database connection.
`select.statement`	`character, optional` The SQL query for the DataFrame.
`name`	`character, optional` Name of the DataFrame.

Format

An object of class R6ClassGenerator of length 24.

Value

An R6 object with methods for a DataFrame that is backed by a database SQL statement.

Methods

AddId(id)

Adds an ID column based on ROW_NUMBER() as the first column.

Usage: dataframe$AddId(id = "NEW_ID")
Arguments:

id: character, name of the added ID column.

Returns: DataFrame with an ID column based on ROW_NUMBER() built-in.

Alias(aliasName)

Returns a new DataFrame with an alias set.

Usage: NewDf <- dataframe$Alias('TABLE1')
Arguments:

aliasName: character, alias name of the DataFrame.

Returns: DataFrame with an alias set.

cast(cols, new.type)

Converts columns from one datatype to another specified datatype.

Usage: dataframe$cast("ID", new.type = "DOUBLE")
Arguments:

cols: list of characters, the columns to be converted.
new.type: character, the datatype to convert the columns to.

Returns: DataFrame with new datatype.

Collect()

Copies this DataFrame to an R DataFrame.

Usage: dataframe$Collect()
Returns: R DataFrame containing this DataFrame's data.

Count()

Computes the number of rows in a DataFrame.

Usage: dataframe$Count()
Returns: integer, number of rows in the DataFrame.

Describe(cols=NULL)

Generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset's distribution.

Usage: dataframe$Describe()
Arguments:

cols: list of characters, optional, the columns to be summarized. Defaults to summarizing all columns.

Returns: DataFrame with descriptive statistics.

distinct(cols=NULL)

Returns distinct values.

Usage: dataframe$distinct()
Arguments:

cols: list of characters, optional, names of the columns for which to return distinct values.

Returns: DataFrame with distinct values.

Drop(cols)

Returns a new DataFrame after removing specified columns.

Usage: dataframe$Drop('colList')
Arguments:

cols: list of characters, list of column names to drop.

Returns: DataFrame, new DataFrame retaining only columns not in cols.

DropDuplicates(subset.dataframe=NULL)

Returns DataFrame with duplicate rows removed.

Usage: dataframe$DropDuplicates('subsetList')
Arguments:

subset.dataframe: list of characters, optional, list of columns to consider when deciding whether rows are duplicates of each other. Defaults to all columns.

Returns: DataFrame with only one copy of duplicate rows.

DropNa(how = NULL, thresh = NULL, subset = NULL)

Returns a new DataFrame with NULLs removed.

Usage: dataframe$DropNa(how = 'any', subset = 'subsetone')
Arguments:

how: ('any', 'all'), optional, if provided, 'any' eliminates rows with any NULLs, and 'all' eliminates rows that are entirely NULL. If neither how nor thresh is provided, how defaults to 'any'.
thresh: integer, optional, if provided, keeps rows with at least thresh non-NULL values and drops rows with fewer. how and thresh cannot both be provided.
subset: list of characters, optional, columns to consider when looking for NULLs. Values in other columns are ignored, NULL or not. Defaults to all columns in the DataFrame.

Returns: DataFrame with a select statement that removes NULLs.
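The difference between how and thresh can be seen on a small local example. This is only an analogy in base R on an ordinary data.frame, with NA standing in for SQL NULL. It does not call DropNa() and is not the package's implementation.

```r
# Local stand-in data; NA plays the role of a database NULL.
local_df <- data.frame(
  a = c(1, NA, NA, 4),
  b = c("x", "y", NA, NA),
  c = c(TRUE, NA, NA, TRUE),
  stringsAsFactors = FALSE
)

# Number of non-NULL values in each row: 3, 1, 0, 2.
non_null <- rowSums(!is.na(local_df))

# how = 'any': drop rows containing any NULL, so only row 1 remains.
keep_any <- local_df[non_null == ncol(local_df), , drop = FALSE]

# how = 'all': drop rows that are entirely NULL, so rows 1, 2 and 4 remain.
keep_all <- local_df[non_null > 0, , drop = FALSE]

# thresh = 2: keep rows with at least 2 non-NULL values, so rows 1 and 4 remain.
keep_thresh <- local_df[non_null >= 2, , drop = FALSE]
```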

dtypes(subset.col = NULL)

Returns column names and their data types as a list.

Usage: dataframe$dtypes()
Arguments:

subset.col: list of characters, optional, the columns whose data types are shown.

Returns: list of column names and their data types.

FillNa(value, subset.dataframe = NULL)

Returns a DataFrame with NULLs replaced with the fill value. Only supports filling numeric columns.

Usage: dataframe$FillNa(0, 'col1')
Arguments:

value: integer or double, value to replace NULLs with. The value should have a type appropriate for the selected columns.
subset.dataframe: list of characters, optional, list of columns in which to replace NULLs. Defaults to all columns.

Returns: DataFrame, new DataFrame with NULLs filled.
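The numeric-only behaviour can be pictured with a local analogy. The function fill_na_local below is a hypothetical base R helper working on an ordinary data.frame, with NA standing in for NULL. It is not the package's FillNa() and makes no claim about how the package builds its SQL.

```r
# Hypothetical local analogue: replace NA only in numeric columns.
fill_na_local <- function(data, value, subset = names(data)) {
  for (col in subset) {
    if (is.numeric(data[[col]])) {
      data[[col]][is.na(data[[col]])] <- value
    }
  }
  data
}

local_df <- data.frame(x = c(1, NA), y = c("a", NA), stringsAsFactors = FALSE)
filled <- fill_na_local(local_df, 0)
# The numeric column x is filled; the character column y keeps its NA.
```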

Filter(condition)

Selects rows matching the given condition. The condition string is not sanity-checked in any way. Do not take condition strings from untrusted input, as this can easily be used for SQL injection.

Usage: dataframe$Filter("col1 = 'A'")
Arguments:

condition: character, condition to filter on. This should be in the format of a SQL WHERE clause test (not including the word "WHERE").

Returns: DataFrame with only rows matching the given condition.
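Because the condition is passed through unchecked, any value that originates outside the program needs care before it is pasted into the string. The sketch below uses a hypothetical helper, quote_sql_literal, that doubles embedded single quotes, which is the standard SQL way to escape a string literal. It is a minimal illustration, not a complete defence against injection, and it is not part of hana.ml.r.

```r
# Hypothetical helper: wrap a single string in quotes as a SQL literal,
# doubling any embedded single quotes.
quote_sql_literal <- function(value) {
  stopifnot(is.character(value), length(value) == 1)
  paste0("'", gsub("'", "''", value, fixed = TRUE), "'")
}

# Build a WHERE-clause test (without the word "WHERE") for Filter().
condition <- paste("col1 =", quote_sql_literal("O'Brien"))
```

The resulting string could then be passed as dataframe$Filter(condition). Column names and operators should still come from a fixed list in the calling code rather than from user input.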

GenerateColname(prefix = 'GEN_COL')

Generates a new column name for the DataFrame.

Usage: dataframe$GenerateColname('COL1')
Arguments:

prefix: character, optional, prefix of the column name. If no prefix is provided, the default 'GEN_COL' is used.

Returns: character, newly generated column name.

GetDf(select.statement, name = NULL)

Creates a new DataFrame.

Usage: dataframe$GetDf('SELECT * FROM TEMP;', name = 'DF1')
Arguments:

select.statement: character, SQL query of the DataFrame.
name: character, optional, name of the DataFrame.

Returns: DataFrame

GetDfCounter()

Returns the number of DataFrames.

Usage: dataframe$GetDfCounter()
Returns: integer.

GetNRows()

Sets the value of DataFrame's nrows.df.

Usage: dataframe$GetNRows()
Returns: No return value.

Has(col)

Returns TRUE if a column is in the DataFrame.

Usage: dataframe$Has('col1')
Arguments:

col: character, name of the column to search for in the projection list of this DataFrame.

Returns: logical, TRUE if the column exists in the DataFrame's projection list.

Head(n = 1)

Returns a new DataFrame containing the first n rows of the DataFrame.

Usage: dataframe$Head(n = 5)
Arguments:

n: integer, optional, number of rows to return. Defaults to 1.

Returns: DataFrame, new DataFrame of the first n rows of this DataFrame.

Join(other, on.expression, how = 'inner')

Returns a new DataFrame that is a join of this DataFrame with another DataFrame.

Usage: dataframe$Join(other = DF1, on.expression = 'col', how = 'outer')
Arguments:

other: DataFrame, the DataFrame to join with.
on.expression: character, join expression.
how: ('inner', 'left', 'right', 'outer'), optional, type of join. Defaults to 'inner'.

Returns: DataFrame, new DataFrame object that joins the current DataFrame with another DataFrame.

rename.columns(new.col.names)

Updates the column names.

Usage: dataframe$rename.columns(list("A", "C"))
Arguments:

new.col.names: list of characters, list of the new column names.

Returns: DataFrame with renamed columns.

RunQuery(Query)

Runs the given query.

Usage: b <- dataframe$RunQuery('select "target" from IRIS')
Arguments:

Query: character, SQL statement.

Returns: DataFrame, new DataFrame generated by the SQL query.

save(table, table.type = NULL, force = TRUE, schema = NULL)

Creates a table holding this DataFrame's data.

Usage: dataframe$save('TAB1', 'ROW')
Arguments:

table: character, table name. save() will fail if a conflicting table already exists.
table.type: character, optional, what kind of table to create. Case-insensitive. Can be one of "ROW", "COLUMN", "HISTORY COLUMN", "GLOBAL TEMPORARY", "GLOBAL TEMPORARY COLUMN", "LOCAL TEMPORARY", or "LOCAL TEMPORARY COLUMN". Defaults to "LOCAL TEMPORARY COLUMN" if table starts with "#" and "COLUMN" otherwise.
force: logical, optional, if TRUE, an existing table is replaced. Defaults to TRUE.
schema: character, optional, schema name. save() will fail if a conflicting table already exists.

Returns: DataFrame representing the new table.
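The default for table.type described above can be written out as a small rule. The function default_table_type is a hypothetical helper that restates the documented behaviour, assuming the "#" check applies to the table name. It is not taken from the package source.

```r
# Hypothetical restatement of the documented default for table.type:
# names starting with "#" are treated as local temporary tables.
default_table_type <- function(table) {
  if (startsWith(table, "#")) {
    "LOCAL TEMPORARY COLUMN"
  } else {
    "COLUMN"
  }
}

temp_type <- default_table_type("#SCRATCH")
perm_type <- default_table_type("TAB1")
```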

Select(cols)

Returns a new DataFrame with columns derived from the current DataFrame.

Usage: dataframe$Select('col1') OR
col.list <- list('*','select')
cols <- sets::as.tuple(x = col.list)
dataframe$Select(cols)
Arguments:

cols: character or (character, character) tuple, columns of the new DataFrame. A string is treated as the name of a column to select; a (character, character) tuple is treated as (SQL expression, alias). As a special case, '*' is expanded to all columns of the original DataFrame.

Returns: DataFrame, new DataFrame object with the specified columns projected.
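To make the two kinds of entries concrete, the sketch below shows one way a mixed column list might be turned into a SQL projection list. The function render_projection is hypothetical, a character vector of length two stands in for the (SQL expression, alias) tuple, and the quoting style is an assumption. The SQL actually generated by Select() may differ.

```r
# Hypothetical rendering of a Select()-style column list into SQL text.
render_projection <- function(cols) {
  parts <- vapply(cols, function(col) {
    if (identical(col, "*")) {
      "*"                                         # all original columns
    } else if (length(col) == 2) {
      sprintf('%s AS "%s"', col[[1]], col[[2]])   # (SQL expression, alias)
    } else {
      sprintf('"%s"', col[[1]])                   # plain column name
    }
  }, character(1))
  paste(parts, collapse = ", ")
}

projection <- render_projection(list("col1", c("col2 * 2", "DOUBLED")))
```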

Sort(cols, desc = FALSE)

Returns a new DataFrame sorted by the specified columns.

Usage: dataframe$Sort(list('COL1'))
Arguments:

cols: list of characters, list of columns to sort by. Must be a list, even for sorting by one column.
desc: logical, optional, TRUE to sort in descending order, FALSE for ascending order. Defaults to FALSE.

Returns: DataFrame, new DataFrame object with rows in sorted order.

WithColumnRenamed(original, newName)

Returns a DataFrame with a new name for one column.

Usage: dataframe$WithColumnRenamed('col1','colnew')
Arguments:

original: character, original column name.
newName: character, new column name.

Returns: DataFrame, the same data as this DataFrame, with one changed column name.

[Package hana.ml.r version 1.0.8 Index]