This module represents a database query as a DataFrame. Most operations are designed to never bring data back from the database unless explicitly asked for.

hanaml.DataFrame(
  connection.context = NULL,
  select.statement = NULL,
  name = NULL
)

Arguments

connection.context

ConnectionContext
SAP HANA Database connection object.

select.statement

character, optional
The sql query for the DataFrame.

name

character, optional
Name of the DataFrame.

Value

Object of R6Class with methods for DataFrame that is backed by a database sql statement.

Methods

AddId(id, ref.col=NULL, order="ASC")

Adds an ID column based on ROW_NUMBER() as the first column, with possibly specified reference column and order.

Usage: dataframe$AddId(id="NEW_ID")
Arguments:

  • id: character
    Name of the added ID column.

  • ref.col, character, optional
    Name name of the reference column.
    No default value.

  • order, c("ASC", "DESC")
    The specified order for sorting w.r.t. the reference column.
    Valid only when ref.col is specified.
    Defaults to "ASC".

Returns: DataFrame with an ID column.

Alias(aliasName)

Returns a new DataFrame with an alias set.

Usage: NewDf <- dataframe$Alias("TABLE1")
Arguments:

  • aliasName: character
    Aalias name of the DataFrame.

Returns: DataFrame with an alias set.

cast(cols, new.type)

Converts columns from one datatype to another specified datatype.

Usage: dataframe$cast("ID", new.type = "DOUBLE")
Arguments:

  • cols: list of characters
    The columns to be converted.

  • new.type: character
    The datatype to convert expression to.

Returns: DataFrame with new datatype.

Collect()

Copies this DataFrame to an R DataFrame.

Usage: dataframe$Collect()
Returns: R DataFrame containing this DataFrame's data.

Count()

Computes the number of rows in a DataFrame.

Usage: dataframe$Count()
Returns: integer, number of rows in the DataFrame.

Describe(cols=NULL)

Generate descriptive statistics that summarize the central tendency,
dispersion and shape of a dataset’s distribution.

Usage: dataframe$Describe()
Arguments:

  • cols: list of characters, optional
    The columns to be summarized.
    Defaults to summmarize all columns.

Returns: DataFrame with descriptive statistics.

distinct(cols=NULL)

Return distinct values.

Usage: dataframe$distinct()
Arguments:

  • cols: list of characters, optional
    Name of columns which return distinct values.

Returns: DataFrame with distinct values.

Drop(cols)

Returns a new DataFrame after removing specified columns.

Usage: dataframe$Drop("colList")
Arguments:

  • cols: list of characters
    List of column names to drop.

Returns: DataFrame
New DataFrame retaining only columns not in cols.

DropDuplicates(subset.dataframe=NULL)

Returns DataFrame with duplicate rows removed.

Usage: dataframe$DropDuplicates("subsetList")
Arguments:

  • subset.dataframe: list of characters, optional
    List of columns to consider when deciding whether rows are duplicates of each other. Defaults to all columns.

Returns: DataFrame with only one copy of duplicate rows.

DropNa(how = NULL, thresh = NULL, subset = NULL)

Returns a new DataFrame with NULLs removed.

Usage: dataframe$DropNa(how = "any", thresh = 1,subset = "subsetone")
Arguments:

  • how : c("any", "all"), optional
    If provided, "any" eliminates rows with any NULLs, and "all" eliminates rows that are entirely NULLs.
    If neither how nor thresh are provided, how defaults to "any".

  • thresh: integer, optional
    If provided, keep rows with at least thresh non-NULL values and drop rows with less.
    how and thresh cannot both be provided.

  • subset: list of characters, optional
    Columns to consider when looking for NULLs. Values in other columns will be ignored, NULL or not.
    Defaults to all columns in the DataFrame.

Returns: DataFrame with a select statement that removes NULLs.

dtypes(subset.col = NULL)

Return column names and their data types as a list.

Usage: dataframe$dtypes()
Arguments:

  • subset.col: list of characters
    Selected columns to show datatype.

Returns: List of column names and their data types .

FillNa(value, subset.dataframe = NULL)

Returns a DataFrame with NULLs replaced with the fill value. Only supports filling numeric columns.

Usage: dataframe$FillNa(0, "col1")
Arguments:

  • value: integer or double
    Value to replace NULLs with. value should have type appropriate for the selected columns.

  • subset: character, optional
    List of columns in which to replace NULLs. Defaults to all columns.

Returns: DataFrame, new DataFrame with NULLs filled.

Filter(condition)

Selects rows matching the given condition. The condition string is not sanity-checked in any way. Do not take condition strings from untrusted input, as this can easily be used for SQL injection.

Usage: dataframe$Filter("select * from test where col1 = 'A'")
Arguments:

  • condition: character
    Condition to filter on.
    This should be in the format of a SQL WHERE clause test (not including the word "WHERE").

Returns: DataFrame with only rows matching the given condition.

GenerateColname(prefix = "GEN_COL")

Generates a new column name for the DataFrame.
Usage: dataframe$GenerateColname("COL1")
Arguments:

  • prefix: character, optional
    Name of the column. If no name if provided, it creates a default column named "GEN_COL".

Returns: character, newly generated column name.

GetDf(select.statement, name = NULL)

Creates a new DataFrame.

Usage: dataframe$GetDf("SELECT * FROM TEMP;", NAME = "DF1")
Arguments:

  • select.statement: character
    Dataframe sql query.

  • name: character, optional
    Dataframe name

Returns: DataFrame

GetDfCounter()

Returns the number of DataFrame.
Usage: dataframe$GetDfCounter()

Returns: integer.

GetTableStructure()

Returns the struecture of the table.
Usage: dataframe$GetTableStructure()

Returns: list.

GetNRows()

Sets the value of DataFrame's nrows.df.

Usage: dataframe$GetNRows()
Returns: No return value.

Has(col)

Returns TRUE if a column is in the DataFrame.

Usage: dataframe$Has("col1")
Arguments:

  • col:character
    Name of column to search in the projection list of this DataFrame.

Returns: logical TRUE if the column exists in the DataFrame's projection list.

Head(n=1)

Returns a DataFrame of the first n rows in the current DataFrame.
Usage: dataframe$Head(5)
Arguments:

  • n: integer, optional
    The number of rows returned. Defaults to 1.

Returns: A new DataFrame of the first n rows of the current DataFrame.

Join(other, on.expression, how = "inner")

Returns a new DataFrame that is a join of this DataFrame with another DataFrame.

Usage: dataframe$Join(other = DF1, on.expression = "col", how = "outer")
Arguments:

  • other: DataFrame
    The DataFrame to join with.

  • on.expression: character
    Join expression

  • how: c("inner", "left", "right", "outer"), Optional
    Type of join. Defaults to "inner".

Returns: DataFrame
A new DataFrame object that joins the current DataFrame with another DataFrame.

PivotTable(values, index, columns, aggfunc = "avg")

Returns a DataFrame that gives the pivoted table. aggfunc is identical to SAP HANA aggregate functions.

Usage: dataframe$PivotTable(values="C2", index="C1", columns="C3", aggfunc="max")
Arguments:

  • values: character or list of characters
    The targeted values for pivoting.

  • index: character or list of characters
    The index of the DataFrame.

  • columns: character or list of characters
    The pivoting columns.

  • aggfunc: c("avg", "max", "min", ...), optional
    aggfunc is identical to SAP HANA aggregate functions. Defaults to "avg".

Returns: DataFrame
A pivoted DataFrame.

rename.columns(new.col.names)

Updates the column name.

Usage: dataframe$rename.columns(list("A", "C"))
Arguments:

  • new.col.names: list of characters
    List of new columns' name.

Returns: DataFrame with rename columns.

RunQuery(Query)

Performs the query.
Usage: b <- dataframe$RunQuery('select "target" from IRIS')
Arguments:

  • Query: character
    SQL statement.

Returns: DataFrame
New DataFrame generated by sql Query.

save(table, table.type = NULL, force = TRUE, schema = NULL, append = FALSE, data.lake = FALSE, data.lake.container = "SYSRDL#CG")

Creates a table holding this DataFrame's data.

Usage: Save("TAB1", "ROW")
Arguments:

  • table: character Table name. save() will fail if a conflicting table already exists.

  • table.type: character, optional
    What kind of table to create, case-insensitive.
    Can be one of the following:
    "ROW", "COLUMN", "HISTORY COLUMN", "GLOBAL TEMPORARY", "GLOBAL TEMPORARY COLUMN", "LOCAL TEMPORARY", or "LOCAL TEMPORARY COLUMN".
    Defaults to "LOCAL TEMPORARY COLUMN" if table starts with "#" and "COLUMN" otherwise.

  • force: logical, optional
    If TRUE, the existed table will be replaced.
    Defaults to TRUE.

  • schema: character, optional
    Schema name. save() will fail if a conflicting table already exists.

  • append: logical, optional
    If set as TRUE, it will use the existing table and append data to it.
    Defaults to FALSE.

  • data.lake: logical, optional
    If TRUE, it will save the table to SAP HANA data lake.
    Defaults to FALSE.

  • data.lake.container, character, optional
    Name of the SAP HANA data lake container for saving the table.
    Valid only when data.lake is set as TRUE.
    Defaults to "SYSRDL#CG".

Returns: DataFrame representing the new table.

Select(cols)

Returns a new DataFrame with columns derived from the current DataFrame.

Usage: dataframe$Select("col1") OR
col.list <- list("*", "select")
cols <- sets::as.tuple(x = col.list)
dataframe$Select(cols)
Arguments:

  • cols: character or (character, character) tuple
    Columns of the new DataFrame.
    A string is treated as the name of a column to select; a (character, character) tuple is treated as (SQL expression, alias).
    As a special case, "*" is expanded to all columns of the original DataFrame.

Returns: DataFrame
New DataFrame object with the specified columns projected.

Sort(cols, desc = FALSE)

Returns a new DataFrame sorted by the specified columns.

Usage: dataframe$Sort("COL1")
Arguments:

  • cols: list of characters
    List of columns to sort by.
    Must be a list, even for sorting by one column.

  • desc: logical, Optional
    TRUE to sort in descending order, FALSE for ascending order.
    Defaults to FALSE.

Returns: DataFrame
New DataFrame object with rows in sorted order.

WithColumnRenamed(original, newName)

Returns a DataFrame with a new name for one column.

Usage: dataframe$WithColumnRenamed("col1", "colnew")
Arguments:

  • original: character
    Original column name.

  • newName: character
    New column name.

Returns: DataFrame
The same data as this DataFrame, with one changed column name.