
Module - Advanced PDF Activities

Collection of advanced functions to work with PDF documents.

Author:
  • SAP Intelligent RPA R&D team

Activities

Get Text Items (PDF)

Return a list of the text items in the PDF. A text item describes the content, position, and size of a text box. Use this activity to identify the exact position of specific text.


Technical Name    Type           Minimal Agent Version
getTextItems      synchronous    WIN-2.0.0 (WIN for Windows)

Output Parameters:

Name         Type                         Description
textItems    Array.<irpa_pdf.textItem>    Text items.
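As a rough illustration of what a list of text items could look like and how the position information might be used, here is a minimal sketch in plain JavaScript. The field names (text, page, top, left, width, height), the sample values, and the describeItem helper are assumptions made for this example; they are not the documented definition of irpa_pdf.textItem.

```javascript
// Hypothetical text items: the field names are assumptions for illustration,
// not the documented irpa_pdf.textItem structure.
const textItems = [
  { text: "Invoice", page: 1, top: 40, left: 50, width: 60, height: 12 },
  { text: "Total", page: 1, top: 300, left: 50, width: 35, height: 12 },
  { text: "120.00", page: 1, top: 300, left: 400, width: 45, height: 12 },
];

// Derive the right and bottom edges of a text box from its position and size.
function describeItem(item) {
  const right = item.left + item.width;
  const bottom = item.top + item.height;
  return `"${item.text}" p${item.page} [${item.left},${item.top}]-[${right},${bottom}]`;
}

const descriptions = textItems.map(describeItem);
console.log(descriptions.join("\n"));
```

Printing the boxes this way is one possible way to find the coordinates to pass to the area-based activities.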


Get Text Items in Area (PDF)

Return the text items in a specified area.


Technical Name        Type           Minimal Agent Version
getTextItemsInArea    synchronous    WIN-2.0.0 (WIN for Windows)

Input Parameters:

Name       Type      Attributes    Default    Description
pageNum    number    mandatory                Page number on which the text items are located. By default: 1.
top        number    mandatory                Top offset of the area.
left       number    mandatory                Left offset of the area.
width      number    mandatory                Width of the area.
height     number    mandatory                Height of the area.

Output Parameters:

Name        Type                         Description
textItem    Array.<irpa_pdf.textItem>    Text items in the specified area.
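The area filter can be sketched in plain JavaScript over sample data. The text-item field names, the helper names, and the rule that an item must lie fully inside the area are all assumptions made for illustration; the activity itself may decide membership differently (for example, by overlap).

```javascript
// Assumed rule: an item is "in the area" when its whole box lies inside it.
function isInsideArea(item, area) {
  return (
    item.page === area.pageNum &&
    item.left >= area.left &&
    item.top >= area.top &&
    item.left + item.width <= area.left + area.width &&
    item.top + item.height <= area.top + area.height
  );
}

// Keep only the items that fall inside the given area.
function filterItemsInArea(items, area) {
  return items.filter((item) => isInsideArea(item, area));
}

// Hypothetical sample data (field names are assumptions).
const areaSampleItems = [
  { text: "Invoice", page: 1, top: 40, left: 50, width: 60, height: 12 },
  { text: "Total", page: 1, top: 300, left: 50, width: 35, height: 12 },
  { text: "120.00", page: 1, top: 300, left: 400, width: 45, height: 12 },
];

// A 600 x 30 band starting 290 units from the top of page 1.
const itemsFound = filterItemsInArea(areaSampleItems, {
  pageNum: 1, top: 290, left: 0, width: 600, height: 30,
});
console.log(itemsFound.map((item) => item.text));
```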


Search Text Items (PDF)

Search for text items and return the text items that exactly match the search string.


Technical Name     Type           Minimal Agent Version
searchTextItems    synchronous    WIN-2.0.0 (WIN for Windows)

Input Parameters:

Name            Type      Attributes    Default    Description
searchString    string    mandatory                String to search for in the PDF document.

Output Parameters:

Name         Type                         Description
textItems    Array.<irpa_pdf.textItem>    Returns the text items that exactly match the search string.
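To make "exactly match" concrete, the following sketch filters sample items with strict string equality. The field names and the findExactMatches helper are hypothetical, and treating the match as case-sensitive and whitespace-sensitive is an assumption; the activity's precise matching rules are not stated here.

```javascript
// Assumed interpretation of an exact match: strict, case-sensitive equality
// between the item's text and the search string.
function findExactMatches(items, searchString) {
  return items.filter((item) => item.text === searchString);
}

// Hypothetical sample data (field names are assumptions).
const searchSampleItems = [
  { text: "Total", page: 1, top: 300, left: 50, width: 35, height: 12 },
  { text: "Total amount", page: 2, top: 120, left: 50, width: 80, height: 12 },
  { text: "total", page: 2, top: 500, left: 50, width: 35, height: 12 },
];

const exactMatches = findExactMatches(searchSampleItems, "Total");
// Only the first item qualifies: partial and differently cased text is skipped.
console.log(exactMatches);
```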


Get Text in Area (PDF)

Return the text from a specified area of a PDF page. This is a convenient alternative to using the “getTextItems” method with a filter parameter. All input parameters (page number, top, left, width, and height) must be provided. The retrieval area, or bounding box, is defined by a top offset, a left offset, a width, and a height.


Technical Name    Type           Minimal Agent Version
getTextInArea     synchronous    WIN-2.0.0 (WIN for Windows)

Input Parameters:

Name       Type      Attributes    Default    Description
pageNum    number    mandatory                Page number on which the text is located. By default: 1.
top        number    mandatory                Top offset of the retrieval area.
left       number    mandatory                Left offset of the retrieval area.
width      number    mandatory                Width of the retrieval area.
height     number    mandatory                Height of the retrieval area.

Output Parameters:

Name           Type      Description
outputValue    string    Returns the text from a specified area of the PDF page.
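One way to picture the relationship between text items and the returned string is to filter the items by bounding box and join their text. This is only a sketch: the field names, the textInArea helper, the full-containment rule, and the reading order (top to bottom, then left to right, joined by single spaces) are all assumptions, not the activity's documented behavior.

```javascript
// Collect the text of all items inside the bounding box, in an assumed
// reading order: top to bottom, then left to right.
function textInArea(items, area) {
  return items
    .filter(
      (item) =>
        item.page === area.pageNum &&
        item.left >= area.left &&
        item.top >= area.top &&
        item.left + item.width <= area.left + area.width &&
        item.top + item.height <= area.top + area.height
    )
    .sort((a, b) => a.top - b.top || a.left - b.left)
    .map((item) => item.text)
    .join(" ");
}

// Hypothetical sample data, deliberately out of reading order.
const areaTextItems = [
  { text: "120.00", page: 1, top: 300, left: 400, width: 45, height: 12 },
  { text: "Invoice", page: 1, top: 40, left: 50, width: 60, height: 12 },
  { text: "Total", page: 1, top: 300, left: 50, width: 35, height: 12 },
];

const areaText = textInArea(areaTextItems, {
  pageNum: 1, top: 290, left: 0, width: 600, height: 30,
});
console.log(areaText);
```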


Get Table Column Entries (PDF)

Return the entries of a table column by specifying the column header and searching the area directly below it. This works well for tables whose rows each occupy a single line. Specify some text that appears below the table so the activity knows where the column ends; without it, the column is assumed to extend to the end of the page.


Technical Name      Type           Minimal Agent Version
getColumnEntries    synchronous    WIN-2.0.0 (WIN for Windows)

Input Parameters:

Name              Type      Attributes    Default    Description
columnHeader      string    mandatory                Column header below which to search for the column entries.
textBelowTable    string    optional                 Text below the table that marks where the column ends. Without it, the column is assumed to extend to the end of the page.
leftOffset        number    optional                 Extends the column filter to the left.
rightOffset       number    optional                 Extends the column filter to the right.

Output Parameters:

Name             Type      Description
columnEntries    Array.    Returns the list of column entries.
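The column search described above can be sketched as a filter over text items: take the band under the header, optionally widen it with the offsets, and stop at the text below the table. The field names, the columnEntriesBelow helper, the options object, and the exact geometric rules are assumptions for illustration only.

```javascript
// Collect the text of items located under a column header.
// Assumptions: the column band spans the header's width, widened by the
// offsets; entries must fit fully inside the band; the column ends at the
// first item whose text equals textBelowTable (or never, if not given).
function columnEntriesBelow(items, options) {
  const { columnHeader, textBelowTable, leftOffset = 0, rightOffset = 0 } = options;
  const header = items.find((item) => item.text === columnHeader);
  if (!header) return [];

  const stop = textBelowTable
    ? items.find(
        (item) =>
          item.page === header.page &&
          item.text === textBelowTable &&
          item.top > header.top
      )
    : undefined;

  const bandLeft = header.left - leftOffset;
  const bandRight = header.left + header.width + rightOffset;
  const startTop = header.top + header.height;
  const stopTop = stop ? stop.top : Infinity; // no stop text: run to page end

  return items
    .filter(
      (item) =>
        item.page === header.page &&
        item.top >= startTop &&
        item.top < stopTop &&
        item.left >= bandLeft &&
        item.left + item.width <= bandRight
    )
    .sort((a, b) => a.top - b.top)
    .map((item) => item.text);
}

// Hypothetical single-line table with a totals row below it.
const tableItems = [
  { text: "Item", page: 1, top: 100, left: 50, width: 30, height: 12 },
  { text: "Amount", page: 1, top: 100, left: 400, width: 50, height: 12 },
  { text: "Pen", page: 1, top: 120, left: 50, width: 20, height: 12 },
  { text: "10.00", page: 1, top: 120, left: 405, width: 35, height: 12 },
  { text: "Book", page: 1, top: 140, left: 50, width: 30, height: 12 },
  { text: "25.50", page: 1, top: 140, left: 405, width: 35, height: 12 },
  { text: "Laptop", page: 1, top: 160, left: 50, width: 40, height: 12 },
  { text: "1250.00", page: 1, top: 160, left: 390, width: 50, height: 12 },
  { text: "Total", page: 1, top: 180, left: 50, width: 35, height: 12 },
  { text: "1285.50", page: 1, top: 180, left: 390, width: 50, height: 12 },
];

const narrowColumn = columnEntriesBelow(tableItems, {
  columnHeader: "Amount", textBelowTable: "Total",
});
const widenedColumn = columnEntriesBelow(tableItems, {
  columnHeader: "Amount", textBelowTable: "Total", leftOffset: 15,
});
const unboundedColumn = columnEntriesBelow(tableItems, {
  columnHeader: "Amount", leftOffset: 15,
});
console.log(narrowColumn, widenedColumn, unboundedColumn);
```

In this sample, the wide entry starts to the left of the header, so it is only picked up once the band is widened, and the totals row is only excluded when the text below the table is given.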


Get Text After Multiple Search (PDF)

Return the text located after one of several search strings. The activity returns the text that follows the first matching search string. Use the “numWords” parameter to set how many words are extracted after the match.


Technical Name                           Type           Minimal Agent Version
getTextAfterWithMultipleSearchStrings    synchronous    WIN-2.0.0 (WIN for Windows)

Input Parameters:

Name                Type      Attributes    Default    Description
searchStringList    Array.    mandatory                List of search strings after which to extract the text.
numWords            number    mandatory                Number of words to return after the matching string.

Output Parameters:

Name           Type      Description
outputValue    string    Extracted text.
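The idea of trying several search strings and returning the words after the first hit can be sketched on a plain string. The textAfterFirstMatch helper is hypothetical, and two details are assumptions: that "first matching" means the first entry of the list found in the text, and that words are separated by whitespace.

```javascript
// Try each search string in list order; for the first one found in the
// text, return up to numWords whitespace-separated words that follow it.
// Returns an empty string when no search string matches.
function textAfterFirstMatch(text, searchStringList, numWords) {
  for (const searchString of searchStringList) {
    const index = text.indexOf(searchString);
    if (index === -1) continue;
    const rest = text.slice(index + searchString.length).trim();
    return rest.split(/\s+/).slice(0, numWords).join(" ");
  }
  return "";
}

// Hypothetical page text.
const afterSampleText = "Invoice No: 4711 Date: 2024-01-15 Total: 120.00 EUR";

// Documents may label the same field differently, hence the list.
const invoiceNumber = textAfterFirstMatch(
  afterSampleText, ["Invoice Number:", "Invoice No:"], 1
);
const totalWithCurrency = textAfterFirstMatch(afterSampleText, ["Total:"], 2);
console.log(invoiceNumber, "|", totalWithCurrency);
```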


Get Text Before Multiple Search (PDF)

Return the text located before one of several search strings. The activity returns the text that precedes the first matching search string. Use the “numWords” parameter to set how many words are extracted before the match.


Technical Name                            Type           Minimal Agent Version
getTextBeforeWithMultipleSearchStrings    synchronous    WIN-2.0.0 (WIN for Windows)

Input Parameters:

Name                Type      Attributes    Default    Description
searchStringList    Array.    mandatory                List of search strings before which to extract the text.
numWords            number    mandatory                Number of words to return before the matching string.

Output Parameters:

Name           Type      Description
outputValue    string    Extracted text.
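The mirror case, returning the words that precede the first hit, can be sketched the same way on a plain string. The textBeforeFirstMatch helper is hypothetical; treating "first matching" as the first list entry found in the text, and splitting words on whitespace, are assumptions.

```javascript
// Try each search string in list order; for the first one found in the
// text, return up to numWords whitespace-separated words that precede it.
// Returns an empty string when no search string matches.
function textBeforeFirstMatch(text, searchStringList, numWords) {
  for (const searchString of searchStringList) {
    const index = text.indexOf(searchString);
    if (index === -1) continue;
    const words = text
      .slice(0, index)
      .trim()
      .split(/\s+/)
      .filter((word) => word !== "");
    // slice(-0) would return every word, so handle zero explicitly.
    return numWords > 0 ? words.slice(-numWords).join(" ") : "";
  }
  return "";
}

// Hypothetical page text.
const beforeSampleText = "Amount due 120.00 EUR payable by 2024-02-01";

const amount = textBeforeFirstMatch(beforeSampleText, ["EURO", "EUR"], 1);
const amountWithLabel = textBeforeFirstMatch(beforeSampleText, ["EUR"], 2);
console.log(amount, "|", amountWithLabel);
```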