Scrape HTML Page

 

This SAP Manufacturing Integration and Intelligence (SAP MII) action is used to do the following:

  • Retrieve an HTML page

  • Look for text patterns in the source

  • Return data elements in the pattern

To match a pattern that spans multiple lines, use the {WS} symbol, which ignores white space, line breaks, and so on. To return an element value, replace it in the pattern with a variable name enclosed in curly brackets, {}.

Integration

To scrape data from an HTML page, first load the page with the HTML Loader action. You can then link its StringContent property to the Source property of this action.

Features

The properties for this action are listed in the following table:

Property   Data Type   Access       Description
Source     String      In and out   The HTML page source.
Pattern    String      In and out   The pattern used to find data in the HTML source.
Output     String      In and out   An XML document in SAP MII XML format.
Success    Boolean     Out          Indicates whether the action succeeded or failed. If it failed, errors are displayed in the server trace log.

Example

The following source HTML exists:

<TABLE ALIGN="CENTER" BORDER="5">
<TR>
<TD width="90" align="center"><B>Interface</B></TD>
<TD width="100" align="center"><B>Actual flow</B></TD>
<TD width="100" align="center"><B>Warning Level</B></TD>
<TD width="100" align="center"><B>Transfer Limit</B></TD>
</TR>
<TR>
<TD>EAST</TD>
<TD align="right">4824</TD>
<TD align="right">5473</TD>
<TD align="right">5761</TD>
</TR>
<TR>
<TD>CENTRAL</TD>
<TD align="right">3698</TD>
<TD align="right">4169</TD>
<TD align="right">4388</TD>
</TR>
<TR>
<TD>WEST</TD>
<TD align="right">5383</TD>
<TD align="right">5919</TD>
<TD align="right">6230</TD>
</TR>
<TR>
<TD>APSOUTH</TD>
<TD align="right">2902</TD>
<TD align="right">3034</TD>
<TD align="right">3194</TD>
</TR>
<TR>
<TD>BED-BLA</TD>
<TD align="right">1809</TD>
<TD align="right">1788</TD>
<TD align="right">1882</TD>
</TR>
</TABLE>

It appears in the following way in your browser:

Interface   Actual flow   Warning Level   Transfer Limit
EAST               4824            5473             5761
CENTRAL            3698            4169             4388
WEST               5383            5919             6230
APSOUTH            2902            3034             3194
BED-BLA            1809            1788             1882

To return each row of data, use the following match pattern:

<TR>{WS}<TD>{INTERFACE}</TD>{WS}<TD align="right">{ACTUAL}</TD>{WS}<TD align="right">{WARNING}</TD>{WS}<TD align="right">{LIMIT}</TD>{WS}</TR>

In the match pattern, each piece of data that you want to retrieve is replaced by a variable name in curly brackets. For example, the Interface column appears in the source as follows:

<TR>

<TD>EAST</TD>

The match pattern to return that piece of data is:

<TR>{WS}<TD>{INTERFACE}</TD>

Here, <TR> is followed by the white space symbol, {WS}, so that the line break is ignored, and the EAST value that you want returned is replaced by a variable named INTERFACE. Placing the name in curly brackets declares the variable to the action, which stores the matched value, EAST, in INTERFACE. Because the same pattern matches each data row, the action returns all matches in the table.
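The matching behavior described above can be approximated outside SAP MII for experimentation. The following Python sketch is not the action's actual implementation; it assumes that {WS} behaves like optional white space, that every other {NAME} token captures the shortest possible text on a single line, and that all remaining characters are matched literally. The function names pattern_to_regex and scrape are hypothetical, and the sample source is a shortened copy of the table above.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a {WS}/{NAME} match pattern into a regular expression.

    Assumption: this mirrors the documented behavior only approximately.
    """
    parts = []
    pos = 0
    for token in re.finditer(r"\{([A-Za-z_]\w*)\}", pattern):
        # Text between tokens must match literally.
        parts.append(re.escape(pattern[pos:token.start()]))
        name = token.group(1)
        if name == "WS":
            parts.append(r"\s*")               # ignore spaces, tabs, line breaks
        else:
            parts.append(f"(?P<{name}>.*?)")   # capture the value under this name
        pos = token.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))

def scrape(source: str, pattern: str) -> list[dict[str, str]]:
    """Return one dict of variable name to value per match in the source."""
    return [m.groupdict() for m in pattern_to_regex(pattern).finditer(source)]

SOURCE = """<TABLE ALIGN="CENTER" BORDER="5">
<TR>
<TD width="90" align="center"><B>Interface</B></TD>
<TD width="100" align="center"><B>Actual flow</B></TD>
<TD width="100" align="center"><B>Warning Level</B></TD>
<TD width="100" align="center"><B>Transfer Limit</B></TD>
</TR>
<TR>
<TD>EAST</TD>
<TD align="right">4824</TD>
<TD align="right">5473</TD>
<TD align="right">5761</TD>
</TR>
<TR>
<TD>CENTRAL</TD>
<TD align="right">3698</TD>
<TD align="right">4169</TD>
<TD align="right">4388</TD>
</TR>
</TABLE>"""

PATTERN = (
    '<TR>{WS}<TD>{INTERFACE}</TD>{WS}<TD align="right">{ACTUAL}</TD>{WS}'
    '<TD align="right">{WARNING}</TD>{WS}<TD align="right">{LIMIT}</TD>{WS}</TR>'
)

rows = scrape(SOURCE, PATTERN)
for row in rows:
    print(row)
# {'INTERFACE': 'EAST', 'ACTUAL': '4824', 'WARNING': '5473', 'LIMIT': '5761'}
# {'INTERFACE': 'CENTRAL', 'ACTUAL': '3698', 'WARNING': '4169', 'LIMIT': '4388'}
```

In this sketch the header row is skipped because its <TD> tags carry width attributes and therefore do not match the literal <TD> in the pattern.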

The resulting XML document is the output of the action and is in standard SAP MII XML format. It can be sent to an applet through a transaction variable, linked to another document, or written to a database.
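To give a rough idea of how the matched rows map onto a rowset-style XML document, the following Python sketch builds one from two of the rows above. It is an approximation only: the element names Rowsets, Rowset, Columns, Column, and Row follow the layout commonly shown for SAP MII XML, but the attributes that the action actually writes (for example, data types and timestamps) are not reproduced here, and the function name rows_to_rowset_xml is hypothetical.

```python
import xml.etree.ElementTree as ET

def rows_to_rowset_xml(rows: list[dict[str, str]]) -> str:
    """Build a simplified rowset-style XML string (assumed layout, not the exact MII output)."""
    rowsets = ET.Element("Rowsets")
    rowset = ET.SubElement(rowsets, "Rowset")

    # One Column entry per variable name used in the match pattern.
    columns = ET.SubElement(rowset, "Columns")
    names = list(rows[0]) if rows else []
    for name in names:
        ET.SubElement(columns, "Column", {"Name": name})

    # One Row element per match, with one child element per variable.
    for row in rows:
        row_el = ET.SubElement(rowset, "Row")
        for name in names:
            ET.SubElement(row_el, name).text = row[name]

    return ET.tostring(rowsets, encoding="unicode")

matched_rows = [
    {"INTERFACE": "EAST", "ACTUAL": "4824", "WARNING": "5473", "LIMIT": "5761"},
    {"INTERFACE": "CENTRAL", "ACTUAL": "3698", "WARNING": "4169", "LIMIT": "4388"},
]

xml_text = rows_to_rowset_xml(matched_rows)
print(xml_text)
```

Each variable name from the match pattern becomes a column, and each match becomes one row, which is the shape that downstream consumers of the output would typically expect.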