hanaml DataFrame — hanaml.DataFrame • hana.ml.r

This module represents a database query as a DataFrame. Most operations are designed to never bring data back from the database unless explicitly asked for.

hanaml.DataFrame(
  connection.context = NULL,
  select.statement = NULL,
  name = NULL
)

Arguments

connection.context: ConnectionContext
SAP HANA Database connection object.
select.statement: character, optional
The sql query for the DataFrame.
name: character, optional
Name of the DataFrame.

Value

Object of R6Class with methods for DataFrame that is backed by a database sql statement.

Methods

AddId(id, ref.col=NULL, order="ASC")

Adds an ID column based on ROW_NUMBER() as the first column, with possibly specified reference column and order.

Usage: dataframe$AddId(id="NEW_ID")
Arguments:

id: character
Name of the added ID column.
ref.col, character, optional
Name name of the reference column.
No default value.
order, c("ASC", "DESC")
The specified order for sorting w.r.t. the reference column.
Valid only when ref.col is specified.
Defaults to "ASC".

Returns: DataFrame with an ID column.

Alias(aliasName)

Returns a new DataFrame with an alias set.

Usage: NewDf <- dataframe$Alias("TABLE1")
Arguments:

aliasName: character
Aalias name of the DataFrame.

Returns: DataFrame with an alias set.

cast(cols, new.type)

Converts columns from one datatype to another specified datatype.

Usage: dataframe$cast("ID", new.type = "DOUBLE")
Arguments:

cols: list of characters
The columns to be converted.
new.type: character
The datatype to convert expression to.

Returns: DataFrame with new datatype.

Collect()

Copies this DataFrame to an R DataFrame.

Usage: dataframe$Collect()
Returns: R DataFrame containing this DataFrame's data.

Count()

Computes the number of rows in a DataFrame.

Usage: dataframe$Count()
Returns: integer, number of rows in the DataFrame.

Describe(cols=NULL)

Generate descriptive statistics that summarize the central tendency,
dispersion and shape of a dataset’s distribution.

Usage: dataframe$Describe()
Arguments:

cols: list of characters, optional
The columns to be summarized.
Defaults to summmarize all columns.

Returns: DataFrame with descriptive statistics.

distinct(cols=NULL)

Return distinct values.

Usage: dataframe$distinct()
Arguments:

cols: list of characters, optional
Name of columns which return distinct values.

Returns: DataFrame with distinct values.

Drop(cols)

Returns a new DataFrame after removing specified columns.

Usage: dataframe$Drop("colList")
Arguments:

cols: list of characters
List of column names to drop.

Returns: DataFrame
New DataFrame retaining only columns not in cols.

DropDuplicates(subset.dataframe=NULL)

Returns DataFrame with duplicate rows removed.

Usage: dataframe$DropDuplicates("subsetList")
Arguments:

subset.dataframe: list of characters, optional
List of columns to consider when deciding whether rows are duplicates of each other. Defaults to all columns.

Returns: DataFrame with only one copy of duplicate rows.

DropNa(how = NULL, thresh = NULL, subset = NULL)

Returns a new DataFrame with NULLs removed.

Usage: dataframe$DropNa(how = "any", thresh = 1,subset = "subsetone")
Arguments:

how : c("any", "all"), optional
If provided, "any" eliminates rows with any NULLs, and "all" eliminates rows that are entirely NULLs.
If neither how nor thresh are provided, how defaults to "any".
thresh: integer, optional
If provided, keep rows with at least thresh non-NULL values and drop rows with less.
how and thresh cannot both be provided.
subset: list of characters, optional
Columns to consider when looking for NULLs. Values in other columns will be ignored, NULL or not.
Defaults to all columns in the DataFrame.

Returns: DataFrame with a select statement that removes NULLs.

dtypes(subset.col = NULL)

Return column names and their data types as a list.

Usage: dataframe$dtypes()
Arguments:

subset.col: list of characters
Selected columns to show datatype.

Returns: List of column names and their data types .

FillNa(value, subset.dataframe = NULL)

Returns a DataFrame with NULLs replaced with the fill value. Only supports filling numeric columns.

Usage: dataframe$FillNa(0, "col1")
Arguments:

value: integer or double
Value to replace NULLs with. value should have type appropriate for the selected columns.
subset: character, optional
List of columns in which to replace NULLs. Defaults to all columns.

Returns: DataFrame, new DataFrame with NULLs filled.

Filter(condition)

Selects rows matching the given condition. The condition string is not sanity-checked in any way. Do not take condition strings from untrusted input, as this can easily be used for SQL injection.

Usage: dataframe$Filter("select * from test where col1 = 'A'")
Arguments:

condition: character
Condition to filter on.
This should be in the format of a SQL WHERE clause test (not including the word "WHERE").

Returns: DataFrame with only rows matching the given condition.

GenerateColname(prefix = "GEN_COL")

Generates a new column name for the DataFrame.
Usage: dataframe$GenerateColname("COL1")
Arguments:

prefix: character, optional
Name of the column. If no name if provided, it creates a default column named "GEN_COL".

Returns: character, newly generated column name.

GetDf(select.statement, name = NULL)

Creates a new DataFrame.

Usage: dataframe$GetDf("SELECT * FROM TEMP;", NAME = "DF1")
Arguments:

select.statement: character
Dataframe sql query.
name: character, optional
Dataframe name

Returns: DataFrame

GetDfCounter()

Returns the number of DataFrame.
Usage: dataframe$GetDfCounter()

Returns: integer.

GetTableStructure()

Returns the struecture of the table.
Usage: dataframe$GetTableStructure()

Returns: list.

GetNRows()

Sets the value of DataFrame's nrows.df.

Usage: dataframe$GetNRows()
Returns: No return value.

Has(col)

Returns TRUE if a column is in the DataFrame.

Usage: dataframe$Has("col1")
Arguments:

col:character
Name of column to search in the projection list of this DataFrame.

Returns: logical TRUE if the column exists in the DataFrame's projection list.

Head(n=1)

Returns a DataFrame of the first n rows in the current DataFrame.
Usage: dataframe$Head(5)
Arguments:

n: integer, optional
The number of rows returned. Defaults to 1.

Returns: A new DataFrame of the first n rows of the current DataFrame.

Join(other, on.expression, how = "inner")

Returns a new DataFrame that is a join of this DataFrame with another DataFrame.

Usage: dataframe$Join(other = DF1, on.expression = "col", how = "outer")
Arguments:

other: DataFrame
The DataFrame to join with.
on.expression: character
Join expression
how: c("inner", "left", "right", "outer"), Optional
Type of join. Defaults to "inner".

Returns: DataFrame
A new DataFrame object that joins the current DataFrame with another DataFrame.

PivotTable(values, index, columns, aggfunc = "avg")

Returns a DataFrame that gives the pivoted table. aggfunc is identical to SAP HANA aggregate functions.

Usage: dataframe$PivotTable(values="C2", index="C1", columns="C3", aggfunc="max")
Arguments:

values: character or list of characters
The targeted values for pivoting.
index: character or list of characters
The index of the DataFrame.
columns: character or list of characters
The pivoting columns.
aggfunc: c("avg", "max", "min", ...), optional
aggfunc is identical to SAP HANA aggregate functions. Defaults to "avg".

Returns: DataFrame
A pivoted DataFrame.

rename.columns(new.col.names)

Updates the column name.

Usage: dataframe$rename.columns(list("A", "C"))
Arguments:

new.col.names: list of characters
List of new columns' name.

Returns: DataFrame with rename columns.

RunQuery(Query)

Performs the query.
Usage: b <- dataframe$RunQuery('select "target" from IRIS')
Arguments:

Query: character
SQL statement.

Returns: DataFrame
New DataFrame generated by sql Query.

save(table, table.type = NULL, force = TRUE, schema = NULL, append = FALSE, data.lake = FALSE, data.lake.container = "SYSRDL#CG")

Creates a table holding this DataFrame's data.

Usage: Save("TAB1", "ROW")
Arguments:

table: character Table name. save() will fail if a conflicting table already exists.
table.type: character, optional
What kind of table to create, case-insensitive.
Can be one of the following:
"ROW", "COLUMN", "HISTORY COLUMN", "GLOBAL TEMPORARY", "GLOBAL TEMPORARY COLUMN", "LOCAL TEMPORARY", or "LOCAL TEMPORARY COLUMN".
Defaults to "LOCAL TEMPORARY COLUMN" if table starts with "#" and "COLUMN" otherwise.
force: logical, optional
If TRUE, the existed table will be replaced.
Defaults to TRUE.
schema: character, optional
Schema name. save() will fail if a conflicting table already exists.
append: logical, optional
If set as TRUE, it will use the existing table and append data to it.
Defaults to FALSE.
data.lake: logical, optional
If TRUE, it will save the table to SAP HANA data lake.
Defaults to FALSE.
data.lake.container, character, optional
Name of the SAP HANA data lake container for saving the table.
Valid only when data.lake is set as TRUE.
Defaults to "SYSRDL#CG".

Returns: DataFrame representing the new table.

Select(cols)

Returns a new DataFrame with columns derived from the current DataFrame.

Usage: dataframe$Select("col1") OR
col.list <- list("*", "select")
cols <- sets::as.tuple(x = col.list)
dataframe$Select(cols)
Arguments:

cols: character or (character, character) tuple
Columns of the new DataFrame.
A string is treated as the name of a column to select; a (character, character) tuple is treated as (SQL expression, alias).
As a special case, "*" is expanded to all columns of the original DataFrame.

Returns: DataFrame
New DataFrame object with the specified columns projected.

Sort(cols, desc = FALSE)

Returns a new DataFrame sorted by the specified columns.

Usage: dataframe$Sort("COL1")
Arguments:

cols: list of characters
List of columns to sort by.
Must be a list, even for sorting by one column.
desc: logical, Optional
TRUE to sort in descending order, FALSE for ascending order.
Defaults to FALSE.

Returns: DataFrame
New DataFrame object with rows in sorted order.

WithColumnRenamed(original, newName)

Returns a DataFrame with a new name for one column.

Usage: dataframe$WithColumnRenamed("col1", "colnew")
Arguments:

original: character
Original column name.
newName: character
New column name.

Returns: DataFrame
The same data as this DataFrame, with one changed column name.