hanaml DataFrame — hanaml.DataFrame • hana.ml.r

This module represents a database query as a DataFrame. Most operations are designed to never bring data back from the database unless explicitly asked for.

hanaml.DataFrame(
  connection.context = NULL,
  select.statement = NULL,
  name = NULL
)

Arguments

connection.context	`ConnectionContext` SAP HANA Database connection object.
select.statement	`character, optional` The SQL query for the DataFrame.
name	`character, optional` Name of the DataFrame.

Value

An R6 class object with methods for a DataFrame that is backed by a database SQL statement.
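As a minimal sketch of the deferred behavior described above, the helper below chains a few of the methods documented on this page and retrieves data only in the final Collect() call. The name previewExpensive and the column "PRICE" are hypothetical, and df is assumed to be an existing hanaml.DataFrame; this is an illustration, not part of the package.

```r
# Hypothetical helper: `df` is assumed to be an existing hanaml.DataFrame
# and "PRICE" is an assumed numeric column in it.
previewExpensive <- function(df, min.price) {
  stopifnot(is.numeric(min.price), length(min.price) == 1)
  # Build only the predicate; Filter() expects no leading "WHERE".
  condition <- sprintf('"PRICE" > %s', format(min.price, scientific = FALSE))
  # Filter() and Sort() are documented to return new DataFrames, so these
  # steps are expected to compose SQL rather than fetch rows.
  filtered <- df$Filter(condition)
  sorted <- filtered$Sort(list("PRICE"), desc = TRUE)
  # Collect() is the explicit request that copies the result into R.
  sorted$Collect()
}
```

Calling previewExpensive(df, 100) on a real DataFrame would return an R data frame; the intermediate objects stay on the database side.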

Methods

AddId(id)

Adds an ID column based on ROW_NUMBER() as the first column.
Usage: dataframe$AddId(id = "NEW_ID")
Arguments:

id: character, name of the added ID column.

Returns: DataFrame with an ID column based on ROW_NUMBER() built-in.

Alias(aliasName)

Returns a new DataFrame with an alias set.
Usage: NewDf <- dataframe$Alias("TABLE1")
Arguments:

aliasName: character, alias name of the DataFrame.

Returns: DataFrame with an alias set.

cast(cols, new.type)

Converts columns from one datatype to another specified datatype.
Usage: dataframe$cast("ID", new.type = "DOUBLE")
Arguments:

cols: list of characters, the columns to be converted.
new.type: character, the datatype to convert the columns to.

Returns: DataFrame with new datatype.

Collect()

Copies this DataFrame to an R DataFrame.
Usage: dataframe$Collect()
Returns: R DataFrame containing this DataFrame's data.

Count()

Computes the number of rows in a DataFrame.
Usage: dataframe$Count()
Returns: integer, number of rows in the DataFrame.

Describe(cols=NULL)

Generates descriptive statistics that summarize the central tendency,
dispersion and shape of a dataset's distribution.
Usage: dataframe$Describe()
Arguments:

cols: list of characters, optional, the columns to be summarized. Defaults to summarizing all columns.

Returns: DataFrame with descriptive statistics.

distinct(cols=NULL)

Return distinct values.
Usage: dataframe$distinct()
Arguments:

cols: list of characters, optional, names of the columns for which distinct values are returned.

Returns: DataFrame with distinct values.

Drop(cols)

Returns a new DataFrame after removing specified columns.
Usage: dataframe$Drop("colList")
Arguments:

cols: list of characters, list of column names to drop.

Returns: DataFrame, new DataFrame retaining only columns not in cols.

DropDuplicates(subset.dataframe=NULL)

Returns DataFrame with duplicate rows removed.
Usage: dataframe$DropDuplicates("subsetList")
Arguments:

subset.dataframe: list of characters, optional, columns to consider when deciding whether rows are duplicates of each other. Defaults to all columns.

Returns: DataFrame with only one copy of duplicate rows.

DropNa(how = NULL, thresh = NULL, subset = NULL)

Returns a new DataFrame with NULLs removed.
Usage: dataframe$DropNa(how = "any", subset = "subsetone")
Arguments:

how: ("any", "all"), optional, if provided, "any" eliminates rows with any NULLs, and "all" eliminates rows that are entirely NULLs. If neither how nor thresh is provided, how defaults to "any".
thresh: integer, optional, if provided, keep rows with at least thresh non-NULL values and drop rows with fewer. how and thresh cannot both be provided.
subset: list of characters, optional, columns to consider when looking for NULLs. Values in other columns will be ignored, NULL or not. Defaults to all columns in the DataFrame.

Returns: DataFrame with a select statement that removes NULLs.
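The interaction of how, thresh and subset can be sketched on an ordinary local R data frame, with NA standing in for NULL. The function dropNaLocal is a hypothetical name used only for illustration; it mirrors the rules stated above and is not how the package generates its SQL.

```r
# Local analogue of the documented how / thresh / subset rules.
# dropNaLocal is hypothetical and not part of hana.ml.r.
dropNaLocal <- function(data, how = NULL, thresh = NULL, subset = NULL) {
  if (!is.null(how) && !is.null(thresh)) {
    stop("how and thresh cannot both be provided")
  }
  if (is.null(subset)) subset <- names(data)
  # Count non-missing values per row, looking only at the chosen columns.
  non.missing <- rowSums(!is.na(data[, subset, drop = FALSE]))
  if (!is.null(thresh)) {
    keep <- non.missing >= thresh            # at least thresh non-missing values
  } else if (identical(how, "all")) {
    keep <- non.missing > 0                  # drop rows that are entirely missing
  } else {
    keep <- non.missing == length(subset)    # "any", the default
  }
  data[keep, , drop = FALSE]
}

local.df <- data.frame(A = c(1, NA, NA), B = c("x", "y", NA))
dropNaLocal(local.df)               # keeps row 1 only
dropNaLocal(local.df, how = "all")  # keeps rows 1 and 2
dropNaLocal(local.df, thresh = 1)   # keeps rows 1 and 2
```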

dtypes(subset.col = NULL)

Return column names and their data types as a list.
Usage: dataframe$dtypes()
Arguments:

subset.col: list of characters, optional, the columns whose data types are returned.

Returns: list of column names and their data types.

FillNa(value, subset.dataframe = NULL)

Returns a DataFrame with NULLs replaced with the fill value. Only supports filling numeric columns.
Usage: dataframe$FillNa(0, "col1")
Arguments:

value: integer or double, value to replace NULLs with. value should have type appropriate for the selected columns.
subset.dataframe: character, optional, list of columns in which to replace NULLs. Defaults to all columns.

Returns: DataFrame, new DataFrame with NULLs filled.

Filter(condition)

Selects rows matching the given condition. The condition string is not sanity-checked in any way. Do not take condition strings from untrusted input, as this can easily be used for SQL injection.
Usage: dataframe$Filter("col1 = 'A'")
Arguments:

condition: character, condition to filter on. This should be in the format of a SQL WHERE clause test (not including the word "WHERE").

Returns: DataFrame with only rows matching the given condition.
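Because the condition is passed through as SQL, any value that originates outside the program needs care. The sketch below shows one common mitigation, doubling embedded single quotes in a string literal before composing the predicate. quoteLiteral is a hypothetical helper, not part of hana.ml.r, and escaping of this kind reduces the risk rather than removing it; restricting input to a known set of values is safer where possible.

```r
# Hypothetical helper: wrap a value in single quotes and double any
# embedded single quote so it stays inside the SQL string literal.
quoteLiteral <- function(value) {
  stopifnot(is.character(value), length(value) == 1)
  paste0("'", gsub("'", "''", value, fixed = TRUE), "'")
}

user.value <- "O'Brien"
# Only the predicate is built; the word WHERE is not included.
condition <- paste0("col1 = ", quoteLiteral(user.value))
print(condition)

# With an existing hanaml.DataFrame named `dataframe` (assumed):
# filtered <- dataframe$Filter(condition)
```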

GenerateColname(prefix = "GEN_COL")

Generates a new column name for the DataFrame.
Usage: dataframe$GenerateColname("COL1")
Arguments:

prefix: character, optional, prefix for the generated column name. If none is provided, the default "GEN_COL" is used.

Returns: character, newly generated column name.

GetDf(select.statement, name = NULL)

Creates a new DataFrame.
Usage: dataframe$GetDf("SELECT * FROM TEMP;", name = "DF1")
Arguments:

select.statement: character, SQL query for the DataFrame.
name: character, optional, name of the DataFrame.

Returns: DataFrame

GetDfCounter()

Returns the number of DataFrames.
Usage: dataframe$GetDfCounter()
Returns: integer.

GetNRows()

Sets the value of DataFrame's nrows.df.
Usage: dataframe$GetNRows()
Returns: No return value.

Has(col)

Returns TRUE if a column is in the DataFrame.
Usage: dataframe$Has("col1")
Arguments:

col: character, name of the column to search for in the projection list of this DataFrame.

Returns: logical, TRUE if the column exists in the DataFrame's projection list.

Join(other, on.expression, how = "inner")

Returns a new DataFrame that is a join of this DataFrame with another DataFrame.
Usage: dataframe$Join(other = DF1, on.expression = "col", how = "outer")
Arguments:

other: DataFrame, the DataFrame to join with.
on.expression: character, join expression.
how: ("inner", "left", "right", "outer"), optional, type of join. Defaults to "inner".

Returns: DataFrame, new DataFrame object that joins the current DataFrame with another DataFrame.

rename.columns(new.col.names)

Updates the column names.
Usage: dataframe$rename.columns(list("A", "C"))
Arguments:

new.col.names: list of characters, the new column names.

Returns: DataFrame with renamed columns.

RunQuery(Query)

Performs the query.
Usage: b <- dataframe$RunQuery('select "target" from IRIS')
Arguments:

Query: character, SQL statement.

Returns: DataFrame, new DataFrame generated by the SQL query.

save(table, table.type = NULL, force = TRUE, schema = NULL)

Creates a table holding this DataFrame's data.
Usage: dataframe$save("TAB1", "ROW")
Arguments:

table: character, table name. save() will fail if a conflicting table already exists.
table.type: character, optional, the kind of table to create. Case-insensitive. Can be one of "ROW", "COLUMN", "HISTORY COLUMN", "GLOBAL TEMPORARY", "GLOBAL TEMPORARY COLUMN", "LOCAL TEMPORARY", or "LOCAL TEMPORARY COLUMN". Defaults to "LOCAL TEMPORARY COLUMN" if `table` starts with "#" and "COLUMN" otherwise.
force: logical, optional, if TRUE, an existing table is replaced. Defaults to TRUE.
schema: character, optional, schema name. save() will fail if a conflicting table already exists.

Returns: DataFrame representing the new table.
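The default for table.type can be restated as a small function. This is only a sketch of the documented rule, assuming the rule is keyed on the table name; defaultTableType is a hypothetical name and is not exported by hana.ml.r.

```r
# Hypothetical helper mirroring the documented default for table.type.
defaultTableType <- function(table, table.type = NULL) {
  if (!is.null(table.type)) {
    # table.type is documented as case-insensitive, so normalize it.
    return(toupper(table.type))
  }
  # Names starting with "#" denote local temporary tables.
  if (startsWith(table, "#")) "LOCAL TEMPORARY COLUMN" else "COLUMN"
}

defaultTableType("#TMP_RESULT")    # "LOCAL TEMPORARY COLUMN"
defaultTableType("RESULT")         # "COLUMN"
defaultTableType("RESULT", "row")  # "ROW"
```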

Select(cols)

Returns a new DataFrame with columns derived from the current DataFrame.
Usage: dataframe$Select("col1") OR
col.list <- list("*", "select")
cols <- sets::as.tuple(x = col.list)
dataframe$Select(cols)
Arguments:

cols: character or (character, character) tuple, columns of the new DataFrame. A string is treated as the name of a column to select; a (character, character) tuple is treated as (SQL expression, alias). As a special case, "*" is expanded to all columns of the original DataFrame.

Returns: DataFrame, new DataFrame object with the specified columns projected.

Sort(cols, desc = FALSE)

Returns a new DataFrame sorted by the specified columns.
Usage: dataframe$Sort(list("COL1"))
Arguments:

cols: list of characters, list of columns to sort by. Must be a list, even for sorting by one column.
desc: logical, optional, TRUE to sort in descending order, FALSE for ascending order. Defaults to FALSE.

Returns: DataFrame, new DataFrame object with rows in sorted order.

WithColumnRenamed(original, newName)

Returns a DataFrame with a new name for one column.
Usage: dataframe$WithColumnRenamed("col1", "colnew")
Arguments:

original: character, original column name.
newName: character, new column name.

Returns: DataFrame, the same data as this DataFrame, with one changed column name.