In this end-to-end scenario, we wish to do an analysis of the cash flow of an investment required to create a new product. Projected estimates are given for the product revenue, product costs, overheads, and capital investment for each year of the analysis, from which the cash flow can be calculated. For capital investment appraisal the cash flows are summed for each year and discounted for future values, in other words the net present value of the cash flow is derived as a single value measuring the benefit of the investment.

The projected estimates are single point estimates of each data point and the analysis provides a single point value of project net present value (NPV). This is referred to as deterministic modeling, which is in contrast to probabilistic modeling whereby we examine the probability of outcomes, for example, what is the probability of a NPV greater than zero. Probabilistic modeling is also called Monte Carlo Simulation.

Monte Carlo Simulation is used in our example to estimate the net present value (NPV) of the investment. The equations used in the simulation are:

For each year i=0, 1, …, k

Product margin(i) = product revenue(i) – product cost(i)

Total profit(i) = product margin(i) – overhead(i)

Cash flow(i) = total profit(i) – capital investment(i)

Suppose the simulation covers k years’ time periods and the discount rate is r, the net present value of the investment is defined as:

1. Technology Background

Monte Carlo Simulation is a computational algorithm that repeatedly generates random samples to compute numerical results based on a formula or model in order to obtain the unknown probability distribution of an event or outcome.

In hana.ml.r, the Random Distribution Sampling(hanaml.normal, hanaml.uniform), Distribution Fitting(hanaml.DistributionFit), and Cumulative Distribution algorithms(hanaml.CDF) may be used for Monte Carlo Simulation.

2. Implementation Steps

Step 1: Setup the Connection to SAP HANA

Create a jdbc or odbc connection called conn to access a SAP HANA. You could modified the parameters for the connection string in the following line, or read the information from a txt file (in our case, we store the connection information in config.R)

library(hana.ml.r)
# # Create a jdbc connection
# > conn <- hanaml.ConnectionContext(dsn = <host>:<port>,
#                                    username = 'xxxx',
#                                    password = 'xxxx',
#                                    odbc = FALSE,
#                                    jdbcDriverPath = <path to jdbcDriver>, 
#                                    ...)
#                                    
# # OR, create an odbc connection
# > conn <- hanaml.ConnectionContext(dsn = <ODBC data source name>,
#                                    username = 'xxxx',
#                                    password = 'xxxx',
#                                    odbc = TRUE,
#                                    ...)
source("config.R")
## WARN [2023-03-10 17:01:29] The HANA version is 4.50.000.00.1663300048 and note that some classes, functions and their parameters are different between HANA versions and please refer to the documentation of the specific function and class!

Connection status:

conn$connection
## <JDBCConnection>

Step 2: Dataset Generation

Input the given estimates (single point deterministic values) for product revenue, product costs, overheads, and capital investment. In this example, the time periods are 5 (from year 1 to year 5).

The probability distribution for each variable is assumed as follows:

Product Revenue: Normal distribution and the mean and standard deviation are listed in the following table.

Product Costs: Normal distribution and the mean and standard deviation are listed in the following table.

Overheads: Uniform distribution and min and max values are listed in the following table.

Capital Investment (for year 1 and year 2) Gamma distribution and shape and scale values are listed in the following table.

Run the Random Distribution Sampling algorithm for each variable and generate 5,000 sample sets. The number of sample sets is a choice for the analysis. The larger the value then the more smooth the output distribution and the closer it will be to a normal distribution.

The first year

# Product Revenue:
revenue.1 <- hanaml.normal (conn, m = 5000, mean = 0, sd = 0.01, seed = 0)
revenue.1 <- revenue.1$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE REVENUE_1_TBL")
revenue.1$save('REVENUE_1_TBL')

# Product Costs:
cost.1 <- hanaml.normal (conn, m = 5000, mean = 1000, sd = 75, seed = 0)
cost.1 <- cost.1$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE COST_1_TBL")
cost.1$save('COST_1_TBL')

# Overheads:
overheads.1 <- hanaml.uniform(conn, m = 5000, min = 1400, max = 1500, seed = 0)
overheads.1 <- overheads.1$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE OVERHEAD_1_TBL")
overheads.1$save('OVERHEAD_1_TBL')

# Capital Investment (for year 1 and year 2)
investment.1 <- hanaml.normal (conn, m = 5000, mean = 10000, sd = 500, seed = 0)
investment.1 <- investment.1$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE INVESTMENT_1_TBL")
investment.1$save('INVESTMENT_1_TBL')

sqlQueryMix(conn$connection, "DROP TABLE PAL_CASHFLOW_YEAR1;")
sqlQueryMix(conn$connection, "CREATE COLUMN TABLE PAL_CASHFLOW_YEAR1(ID INTEGER, CASH DOUBLE);")
sqlQueryMix(conn$connection, "INSERT INTO PAL_CASHFLOW_YEAR1
    SELECT REVENUE_1_TBL.ID,
           REVENUE_1_TBL.RANDOM - COST_1_TBL.RANDOM - OVERHEAD_1_TBL.RANDOM - INVESTMENT_1_TBL.RANDOM
    FROM REVENUE_1_TBL
    LEFT JOIN COST_1_TBL ON REVENUE_1_TBL.ID = COST_1_TBL.ID
    LEFT JOIN OVERHEAD_1_TBL ON REVENUE_1_TBL.ID = OVERHEAD_1_TBL.ID
    LEFT JOIN INVESTMENT_1_TBL ON REVENUE_1_TBL.ID = INVESTMENT_1_TBL.ID;")

Look at the first 5 rows of tables:

print(conn$table('REVENUE_1_TBL')$Head(5))
##   ID       RANDOM
## 1  0  0.006954552
## 2  1 -0.019170460
## 3  2 -0.009855327
## 4  3 -0.002734050
## 5  4  0.011777093
print(conn$table('COST_1_TBL')$Head(5))
##   ID    RANDOM
## 1  0  942.9842
## 2  1  951.6509
## 3  2  896.9596
## 4  3  964.5393
## 5  4 1031.8147
print(conn$table('OVERHEAD_1_TBL')$Head(5))
##   ID   RANDOM
## 1  0 1422.175
## 2  1 1483.666
## 3  2 1404.360
## 4  3 1407.938
## 5  4 1430.083
print(conn$table('INVESTMENT_1_TBL')$Head(5))
##   ID    RANDOM
## 1  0 10073.168
## 2  1  9988.186
## 3  2 10156.738
## 4  3 10519.089
## 5  4  9509.806
print(conn$table('PAL_CASHFLOW_YEAR1')$Head(5))
##   ID      CASH
## 1  0 -12438.32
## 2  1 -12423.52
## 3  2 -12458.07
## 4  3 -12891.57
## 5  4 -11971.69

The second year

# Product Revenue:
revenue.2 <- hanaml.normal (conn, m = 5000, mean = 3000, sd = 300, seed = 0)
revenue.2 <- revenue.2$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE REVENUE_2_TBL")
revenue.2$save('REVENUE_2_TBL')

# Product Costs:
cost.2 <- hanaml.normal (conn, m = 5000, mean = 1000, sd = 75, seed = 0)
cost.2 <- cost.2$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE COST_2_TBL")
cost.2$save('COST_2_TBL')

# Overheads:
overheads.2 <- hanaml.uniform(conn, m = 5000, min = 1800, max = 2200, seed = 0)
overheads.2 <- overheads.2$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE OVERHEAD_2_TBL")
overheads.2$save('OVERHEAD_2_TBL')

# Capital Investment (for year 1 and year 2)
investment.2 <- hanaml.normal (conn, m = 5000, mean = 2000, sd = 100, seed = 0)
investment.2 <- investment.2$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE INVESTMENT_2_TBL")
investment.2$save('INVESTMENT_2_TBL')

sqlQueryMix(conn$connection, "DROP TABLE PAL_CASHFLOW_YEAR2;")
sqlQueryMix(conn$connection, "CREATE COLUMN TABLE PAL_CASHFLOW_YEAR2(ID INTEGER, CASH DOUBLE);")
sqlQueryMix(conn$connection, "INSERT INTO PAL_CASHFLOW_YEAR2
    SELECT REVENUE_2_TBL.ID,
           REVENUE_2_TBL.RANDOM - COST_2_TBL.RANDOM - OVERHEAD_2_TBL.RANDOM -  INVESTMENT_2_TBL.RANDOM
    FROM REVENUE_2_TBL
    LEFT JOIN COST_2_TBL ON REVENUE_2_TBL.ID = COST_2_TBL.ID 
    LEFT JOIN OVERHEAD_2_TBL ON REVENUE_2_TBL.ID = OVERHEAD_2_TBL.ID
    LEFT JOIN INVESTMENT_2_TBL ON REVENUE_2_TBL.ID = INVESTMENT_2_TBL.ID;")

Look at the first 5 rows of tables:

print(conn$table('REVENUE_2_TBL')$Head(5))
##   ID   RANDOM
## 1  0 3060.429
## 2  1 3229.179
## 3  2 2922.798
## 4  3 3265.830
## 5  4 2750.084
print(conn$table('COST_2_TBL')$Head(5))
##   ID    RANDOM
## 1  0 1037.3202
## 2  1  907.6158
## 3  2 1024.5959
## 4  3  909.4458
## 5  4  921.7164
print(conn$table('OVERHEAD_2_TBL')$Head(5))
##   ID   RANDOM
## 1  0 2007.740
## 2  1 2142.766
## 3  2 1827.785
## 4  3 2172.880
## 5  4 2095.272
print(conn$table('INVESTMENT_2_TBL')$Head(5))
##   ID   RANDOM
## 1  0 1801.343
## 2  1 2002.804
## 3  2 2169.303
## 4  3 2083.665
## 5  4 1998.571
print(conn$table('PAL_CASHFLOW_YEAR2')$Head(5))
##   ID      CASH
## 1  0 -1785.974
## 2  1 -1824.007
## 3  2 -2098.886
## 4  3 -1900.160
## 5  4 -2265.475

The third year

# Product Revenue:
revenue.3 <- hanaml.normal (conn, m = 5000, mean = 8000, sd = 800, seed = 0)
revenue.3 <- revenue.3$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE REVENUE_3_TBL")
revenue.3$save('REVENUE_3_TBL')

# Product Costs:
cost.3 <- hanaml.normal (conn, m = 5000, mean = 2500, sd = 187.5, seed = 0)
cost.3 <- cost.3$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE COST_3_TBL")
cost.3$save('COST_3_TBL')

# Overheads:
overheads.3 <- hanaml.uniform(conn, m = 5000, min = 2200, max = 2800, seed = 0)
overheads.3 <- overheads.3$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE OVERHEAD_3_TBL")
overheads.3$save('OVERHEAD_3_TBL')

sqlQueryMix(conn$connection, "DROP TABLE PAL_CASHFLOW_YEAR3;")
sqlQueryMix(conn$connection, "CREATE COLUMN TABLE PAL_CASHFLOW_YEAR3(ID INTEGER, CASH DOUBLE);")
sqlQueryMix(conn$connection, "INSERT INTO PAL_CASHFLOW_YEAR3
    SELECT REVENUE_3_TBL.ID,
           REVENUE_3_TBL.RANDOM - COST_3_TBL.RANDOM - OVERHEAD_3_TBL.RANDOM
    FROM REVENUE_3_TBL
    LEFT JOIN COST_3_TBL ON REVENUE_3_TBL.ID = COST_3_TBL.ID
    LEFT JOIN OVERHEAD_3_TBL ON REVENUE_3_TBL.ID = OVERHEAD_3_TBL.ID;")

Look at the first 5 rows of tables:

print(conn$table('REVENUE_3_TBL')$Head(5))
##   ID   RANDOM
## 1  0 6672.423
## 2  1 7714.473
## 3  2 7455.507
## 4  3 7298.212
## 5  4 9533.099
print(conn$table('COST_3_TBL')$Head(5))
##   ID   RANDOM
## 1  0 2519.216
## 2  1 2409.549
## 3  2 2463.624
## 4  3 2617.808
## 5  4 2348.721
print(conn$table('OVERHEAD_3_TBL')$Head(5))
##   ID   RANDOM
## 1  0 2235.145
## 2  1 2513.903
## 3  2 2618.118
## 4  3 2583.625
## 5  4 2733.483
print(conn$table('PAL_CASHFLOW_YEAR3')$Head(5))
##   ID     CASH
## 1  0 1918.061
## 2  1 2791.020
## 3  2 2373.765
## 4  3 2096.779
## 5  4 4450.895

The fourth year

# Product Revenue:
revenue.4 <- hanaml.normal (conn, m = 5000, mean = 18000, sd = 1800, seed = 0)
revenue.4 <- revenue.4$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE REVENUE_4_TBL")
revenue.4$save('REVENUE_4_TBL')

# Product Costs:
cost.4 <- hanaml.normal (conn, m = 5000, mean = 7000, sd = 525, seed = 0)
cost.4 <- cost.4$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE COST_4_TBL")
cost.4$save('COST_4_TBL')

# Overheads:
overheads.4 <- hanaml.uniform(conn, m = 5000, min = 2600, max = 3400, seed = 0)
overheads.4 <- overheads.4$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE OVERHEAD_4_TBL")
overheads.4$save('OVERHEAD_4_TBL')

sqlQueryMix(conn$connection, "DROP TABLE PAL_CASHFLOW_YEAR4;")
sqlQueryMix(conn$connection, "CREATE COLUMN TABLE PAL_CASHFLOW_YEAR4(ID INTEGER, CASH DOUBLE);")
sqlQueryMix(conn$connection, "INSERT INTO PAL_CASHFLOW_YEAR4
    SELECT REVENUE_4_TBL.ID,
           REVENUE_4_TBL.RANDOM - COST_4_TBL.RANDOM - OVERHEAD_4_TBL.RANDOM
    FROM REVENUE_4_TBL
    LEFT JOIN COST_4_TBL ON REVENUE_4_TBL.ID = COST_4_TBL.ID
    LEFT JOIN OVERHEAD_4_TBL ON REVENUE_4_TBL.ID = OVERHEAD_4_TBL.ID;")

Look at the first 5 rows of tables:

print(conn$table('REVENUE_4_TBL')$Head(5))
##   ID   RANDOM
## 1  0 18620.16
## 2  1 19468.35
## 3  2 16604.65
## 4  3 17220.54
## 5  4 19373.28
print(conn$table('COST_4_TBL')$Head(5))
##   ID   RANDOM
## 1  0 6402.568
## 2  1 6941.156
## 3  2 7520.091
## 4  3 7550.489
## 5  4 6478.548
print(conn$table('OVERHEAD_4_TBL')$Head(5))
##   ID   RANDOM
## 1  0 2873.664
## 2  1 3078.128
## 3  2 3345.697
## 4  3 3010.185
## 5  4 3213.639
print(conn$table('PAL_CASHFLOW_YEAR4')$Head(5))
##   ID     CASH
## 1  0 9343.927
## 2  1 9449.070
## 3  2 5738.867
## 4  3 6659.863
## 5  4 9681.097

The fifth year

# Product Revenue:
revenue.5 <- hanaml.normal (conn, m = 5000, mean = 30000, sd = 3000, seed = 0)
revenue.5 <- revenue.5$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE REVENUE_5_TBL")
revenue.5$save('REVENUE_5_TBL')

# Product Costs:
cost.5 <- hanaml.normal (conn, m = 5000, mean = 10000, sd = 750, seed = 0)
cost.5 <- cost.5$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE COST_5_TBL")
cost.5$save('COST_5_TBL')

# Overheads:
overheads.5 <- hanaml.uniform(conn, m = 5000, min = 3000, max = 4000, seed = 0)
overheads.5 <- overheads.5$rename.columns(list('ID', 'RANDOM'))
sqlQueryMix(conn$connection, "DROP TABLE OVERHEAD_5_TBL")
overheads.5$save('OVERHEAD_5_TBL')

sqlQueryMix(conn$connection, "DROP TABLE PAL_CASHFLOW_YEAR5;")
sqlQueryMix(conn$connection, "CREATE COLUMN TABLE PAL_CASHFLOW_YEAR5(ID INTEGER, CASH DOUBLE);")
sqlQueryMix(conn$connection, "INSERT INTO PAL_CASHFLOW_YEAR5
    SELECT REVENUE_5_TBL.ID,
           REVENUE_5_TBL.RANDOM - COST_5_TBL.RANDOM - OVERHEAD_5_TBL.RANDOM
    FROM REVENUE_5_TBL
    LEFT JOIN COST_5_TBL ON REVENUE_5_TBL.ID = COST_5_TBL.ID
    LEFT JOIN OVERHEAD_5_TBL ON REVENUE_5_TBL.ID = OVERHEAD_5_TBL.ID;")

Look at the first 5 rows of tables:

print(conn$table('REVENUE_5_TBL')$Head(5))
##   ID   RANDOM
## 1  0 30354.89
## 2  1 30288.28
## 3  2 32577.13
## 4  3 31485.83
## 5  4 29749.88
print(conn$table('COST_5_TBL')$Head(5))
##   ID    RANDOM
## 1  0  9517.617
## 2  1  9773.098
## 3  2 10145.334
## 4  3  9403.476
## 5  4  9246.825
print(conn$table('OVERHEAD_5_TBL')$Head(5))
##   ID   RANDOM
## 1  0 3433.873
## 2  1 3817.703
## 3  2 3723.137
## 4  3 3625.617
## 5  4 3501.280
print(conn$table('PAL_CASHFLOW_YEAR5')$Head(5))
##   ID     CASH
## 1  0 17403.40
## 2  1 16697.48
## 3  2 18708.66
## 4  3 18456.74
## 5  4 17001.78

Step 3: Net Present Value Calculation

Calculate the net present value of the investment by the following equation for each sampling.

sqlQueryMix(conn$connection, "DROP TABLE NPV;")
sqlQueryMix(conn$connection, "CREATE COLUMN TABLE NPV ( NPVALUE DOUBLE);")
sqlQueryMix(conn$connection, "INSERT INTO NPV SELECT PAL_CASHFLOW_YEAR1.CASH + 
                       PAL_CASHFLOW_YEAR2.CASH/1.05 +
                       PAL_CASHFLOW_YEAR3.CASH/POWER(1.05,2) + 
                       PAL_CASHFLOW_YEAR4.CASH/POWER(1.05,3) +
                       PAL_CASHFLOW_YEAR5.CASH/POWER(1.05,4)
                       FROM PAL_CASHFLOW_YEAR1 
                       LEFT JOIN PAL_CASHFLOW_YEAR2 ON PAL_CASHFLOW_YEAR1.ID = PAL_CASHFLOW_YEAR2.ID 
                       LEFT JOIN PAL_CASHFLOW_YEAR3 ON PAL_CASHFLOW_YEAR1.ID = PAL_CASHFLOW_YEAR3.ID 
                       LEFT JOIN PAL_CASHFLOW_YEAR4 ON PAL_CASHFLOW_YEAR1.ID = PAL_CASHFLOW_YEAR4.ID
                       LEFT JOIN PAL_CASHFLOW_YEAR5 ON PAL_CASHFLOW_YEAR1.ID = PAL_CASHFLOW_YEAR5.ID;")

Look at the first 5 rows of table “NPV”:

print(conn$table("NPV")$Head(5))
##     NPVALUE
## 1  9989.944
## 2 10270.387
## 3  8045.181
## 4  8138.038
## 5 12258.105

Step 4: Model Fitting

Plot the distribution of the net present value of the investment and run Distribution Fitting to fit a normal distribution to the NPV of the investment as. (The Central Limit theorem states that the output distribution will be a normal distribution.)

npv <- conn$table("NPV")
result.npv <- hanaml.DistributionFit(data = npv, distr.type = "normal",
                                     optimal.method = "maximum.likelihood", censored=FALSE)
print(result.npv)
## [[1]]
##               NAME   VALUE
## 1 DISTRIBUTIONNAME  NORMAL
## 2             MEAN 8835.17
## 3               SD  3185.7
## 
## [[2]]
##       STAT_NAME STAT_VALUE
## 1 LOGLIKELIHOOD  -47426.83

Step 5: CDF

According to the fitted model, run the Cumulative Distribution function to obtain the probability of having an NPV of investment smaller than or equal to a given NPV of the investment.

Prepare the data:

sqlQueryMix(conn$connection, "DROP TABLE PAL_DISTRPROB_DATA_TBL")
sql <- 'CREATE COLUMN TABLE PAL_DISTRPROB_DATA_TBL (\"DATACOL\" DOUBLE);'
sqlQueryMix(conn$connection, sql)

data.list <- c(7000, 8000, 9000, 10000, 11000)

for (table.data in data.list) {
  sqlQueryMix(conn$connection, sprintf(
    "INSERT INTO %s VALUES (%s)", "PAL_DISTRPROB_DATA_TBL",
    paste(table.data, collapse = ", ")))
}

distr.prob.df <- conn$table("PAL_DISTRPROB_DATA_TBL")

Invoke hanaml.CDF and fetch the mean and variance from result.npv:

mean <- as.double(result.npv[[1]]$Collect()$VALUE[[2]])
sd <- as.double(result.npv[[1]]$Collect()$VALUE[[3]])
variance <- sd*sd
result <- hanaml.CDF(distr.prob.df, distr.info = list('Normal', mean, variance), complementary = FALSE)
print(result$Head(5))
##   DATACOL PROBABILITY
## 1    7000   0.2822856
## 2    8000   0.3965983
## 3    9000   0.5206324
## 4   10000   0.6426843
## 5   11000   0.7516041

3. Clear Data and Close the Connection

tbls <- list("REVENUE_1_TBL","REVENUE_2_TBL", "REVENUE_3_TBL", 
             "REVENUE_4_TBL", "REVENUE_5_TBL",
             "COST_1_TBL","COST_2_TBL", "COST_3_TBL", 
             "COST_4_TBL", "COST_5_TBL",
             "OVERHEADS_1_TBL","OVERHEADS_2_TBL", "OVERHEADS_3_TBL", 
             "OVERHEADS_4_TBL", "OVERHEADS_5_TBL",
             "INVESTMENT_1_TBL","INVESTMENT_2_TBL",
             "PAL_CASHFLOW_YEAR1","PAL_CASHFLOW_YEAR2", "PAL_CASHFLOW_YEAR3", 
             "PAL_CASHFLOW_YEAR4", "PAL_CASHFLOW_YEAR5",
             "NPV", "PAL_DISTRPROB_DATA_TBL")
for (tbl in tbls) {
  query <- sprintf("DROP TABLE %s", tbl)
  sqlQueryMix(conn$connection, query)
}
conn$close()