Module - Advanced PDF Activities
Collection of advanced functions to work with PDF documents.
- Author: SAP Intelligent RPA R&D team
Activities
Return a list of text items. A text item describes the content, position, and size of a text box in the PDF. This helps to identify the exact position of certain text items.
Technical Name | Type | Minimal Agent Version |
getTextItems | synchronous | WIN-2.0.0 (WIN for Windows) |
Output Parameters:
Get Text Items in Area (PDF) |
Return the text items in a specified area.
Technical Name | Type | Minimal Agent Version |
getTextItemsInArea | synchronous | WIN-2.0.0 (WIN for Windows) |
Input Parameters:
Name | Type | Attributes | Default | Description |
pageNum | number | mandatory | | Page number on which the text items are located. By default: 1. |
top | number | mandatory | | Top offset of the area to search. |
left | number | mandatory | | Left offset of the area to search. |
width | number | mandatory | | Width of the area to search. |
height | number | mandatory | | Height of the area to search. |
Output Parameters:
Name | Type | Description |
textItem | Array.<irpa_pdf.textItem> | Text items in the specified area. |
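The area filter described above can be sketched in a few lines. This is a standalone illustration, not the SDK implementation: the `TextItem` class and its field names are hypothetical stand-ins for `irpa_pdf.textItem`, and the rule that an item must lie fully inside the area (rather than merely overlap it) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    """Hypothetical stand-in for a text item: content, position, and size."""
    text: str
    page: int
    top: float
    left: float
    width: float
    height: float


def items_in_area(items, page_num, top, left, width, height):
    """Keep the items on page_num whose box lies fully inside the area (assumed rule)."""
    bottom, right = top + height, left + width
    return [
        it for it in items
        if it.page == page_num
        and it.top >= top and it.left >= left
        and it.top + it.height <= bottom
        and it.left + it.width <= right
    ]


items = [
    TextItem("Invoice", 1, 40, 50, 60, 12),
    TextItem("Total", 1, 700, 400, 40, 12),
    TextItem("Invoice", 2, 40, 50, 60, 12),
]
# Only the header strip of page 1 is searched.
header_items = items_in_area(items, page_num=1, top=0, left=0, width=600, height=100)
```

Here `header_items` contains only the "Invoice" item from page 1, because the other two items are either outside the strip or on a different page.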
Search for text items and return the text items that exactly match the search string.
Technical Name | Type | Minimal Agent Version |
searchTextItems | synchronous | WIN-2.0.0 (WIN for Windows) |
Input Parameters:
Name | Type | Attributes | Default | Description |
searchString | string | mandatory | | String to search for in the PDF document. |
Output Parameters:
Name | Type | Description |
textItems | Array.<irpa_pdf.textItem> | Returns the text items that exactly match the search string. |
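To make "exactly match" concrete, the following sketch contrasts an exact comparison with partial or differently cased text. It is an illustration only: the dictionary keys are hypothetical, and treating the comparison as case-sensitive is an assumption, not something the description above confirms.

```python
def search_text_items(items, search_string):
    """Return the items whose text equals search_string exactly (assumed case-sensitive)."""
    return [it for it in items if it["text"] == search_string]


# Hypothetical text items; only the first one is an exact match for "Total".
items = [
    {"text": "Total", "top": 700, "left": 400},
    {"text": "Total amount", "top": 720, "left": 400},
    {"text": "total", "top": 740, "left": 400},
]
matches = search_text_items(items, "Total")
```

Under these assumptions, a text item that merely contains the search string ("Total amount") is not returned.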
Return the text from a specified area of a PDF page. This is a convenient alternative to calling “getTextItems” with a filter parameter. All input parameters are mandatory (pageNum, top, left, width, and height). The retrieval area, or bounding box, is defined by a top offset, a left offset, a width, and a height.
Technical Name | Type | Minimal Agent Version |
getTextInArea | synchronous | WIN-2.0.0 (WIN for Windows) |
Input Parameters:
Name | Type | Attributes | Default | Description |
pageNum | number | mandatory | | Page number on which the text is located. By default: 1. |
top | number | mandatory | | Top offset of the retrieval area. |
left | number | mandatory | | Left offset of the retrieval area. |
width | number | mandatory | | Width of the retrieval area. |
height | number | mandatory | | Height of the retrieval area. |
Output Parameters:
Name | Type | Description |
outputValue | string | Returns the text from a specified area of the PDF page. |
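One way to picture the relationship between this activity and a filtered list of text items is the sketch below. It is not the SDK implementation: the dictionary keys are hypothetical, and both the reading order (top to bottom, then left to right) and the single-space separator are assumptions made for the illustration.

```python
def text_in_area(items, page_num, top, left, width, height):
    """Join the text of all items that lie fully inside the bounding box."""
    bottom, right = top + height, left + width
    inside = [
        it for it in items
        if it["page"] == page_num
        and top <= it["top"] and it["top"] + it["height"] <= bottom
        and left <= it["left"] and it["left"] + it["width"] <= right
    ]
    # Assumed reading order: top to bottom, then left to right.
    inside.sort(key=lambda it: (it["top"], it["left"]))
    return " ".join(it["text"] for it in inside)


items = [
    {"text": "2024-001", "page": 1, "top": 100, "left": 200, "width": 60, "height": 12},
    {"text": "Invoice No.", "page": 1, "top": 100, "left": 100, "width": 80, "height": 12},
    {"text": "Footer", "page": 1, "top": 800, "left": 100, "width": 40, "height": 12},
]
text = text_in_area(items, page_num=1, top=90, left=90, width=300, height=30)
```

With this sample data, `text` is "Invoice No. 2024-001"; the footer lies outside the bounding box and is ignored.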
Get Table Column Entries (PDF) |
Return the entries of a table column by specifying the column header; the activity searches the area directly below the header. This works well for rows that consist of a single line. Specify some text that appears below the table so that the activity knows where the column ends. If no such text is specified, the column is assumed to extend to the end of the page.
Technical Name | Type | Minimal Agent Version |
getColumnEntries | synchronous | WIN-2.0.0 (WIN for Windows) |
Input Parameters:
Name | Type | Attributes | Default | Description |
columnHeader | string | mandatory | | Column header below which the column entries are searched. |
textBelowTable | string | optional | | Text below the table that marks where the column ends. If omitted, the column is assumed to extend to the end of the page. |
leftOffset | number | optional | | Extends the column filter to the left. |
rightOffset | number | optional | | Extends the column filter to the right. |
Output Parameters:
Name | Type | Description |
columnEntries | Array. | Returns the list of column entries. |
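The column search described above can be approximated as follows. This is a minimal sketch under stated assumptions, not the SDK's algorithm: the dictionary keys are hypothetical, the column is taken to be exactly as wide as its header (widened by the offsets), all items are assumed to be on one page, and an entry must fit fully inside the column to be returned.

```python
def column_entries(items, column_header, text_below_table=None,
                   left_offset=0, right_offset=0):
    """Collect the text found below a header, inside the header's horizontal span."""
    header = next(it for it in items if it["text"] == column_header)
    col_left = header["left"] - left_offset
    col_right = header["left"] + header["width"] + right_offset

    # Without a marker text, the column runs to the end of the page.
    end_top = float("inf")
    if text_below_table is not None:
        tops = [it["top"] for it in items
                if it["text"] == text_below_table and it["top"] > header["top"]]
        if tops:
            end_top = min(tops)

    below = [
        it for it in items
        if header["top"] < it["top"] < end_top
        and col_left <= it["left"]
        and it["left"] + it["width"] <= col_right
    ]
    below.sort(key=lambda it: it["top"])
    return [it["text"] for it in below]


items = [
    {"text": "Item", "top": 100, "left": 50, "width": 40},
    {"text": "Price", "top": 100, "left": 200, "width": 40},
    {"text": "Pen", "top": 120, "left": 50, "width": 30},
    {"text": "1.50", "top": 120, "left": 200, "width": 30},
    {"text": "Notebook", "top": 140, "left": 50, "width": 60},
    {"text": "3.20", "top": 140, "left": 200, "width": 30},
    {"text": "Total", "top": 180, "left": 50, "width": 40},
    {"text": "4.70", "top": 180, "left": 200, "width": 30},
]
prices = column_entries(items, "Price", text_below_table="Total")
names = column_entries(items, "Item", text_below_table="Total", right_offset=30)
```

In this sample, omitting `text_below_table` would also pick up "4.70" from the totals row, and omitting `right_offset` would drop "Notebook" because it is wider than the "Item" header. Both effects illustrate why the optional parameters exist.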
Get Text After Multiple Search (PDF) |
Return the text located after one of several search strings. The activity returns the text that follows the first matching search string. Use the “numWords” parameter to set how many words are extracted after the match.
Technical Name | Type | Minimal Agent Version |
getTextAfterWithMultipleSearchStrings | synchronous | WIN-2.0.0 (WIN for Windows) |
Input Parameters:
Name | Type | Attributes | Default | Description |
searchStringList | Array. | mandatory | | List of search strings after which the text is extracted. |
numWords | number | mandatory | | Number of words to return after the matching string. |
Output Parameters:
Name | Type | Description |
outputValue | string | Extracted text. |
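The behavior described above can be modeled on a plain string. The sketch below is an illustration, not the SDK implementation. It assumes that "first matching" means the first entry in the list that occurs in the text, that words are separated by whitespace, and that an empty string is returned when nothing matches; the function name is hypothetical.

```python
def text_after(page_text, search_strings, num_words):
    """Return up to num_words words that follow the first search string found."""
    for candidate in search_strings:
        index = page_text.find(candidate)
        if index != -1:
            words = page_text[index + len(candidate):].split()
            return " ".join(words[:num_words])
    return ""  # assumed result when no search string matches


page_text = "Invoice No: 2024-001 Date: 12 March 2024"
# The label may be written in different ways, so several candidates are tried.
invoice_number = text_after(page_text, ["Invoice Number:", "Invoice No:"], 1)
invoice_date = text_after(page_text, ["Date:"], 3)
```

Passing several search strings is useful when the same field is labeled differently across documents: here "Invoice Number:" is absent, so the second candidate supplies the match.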
Get Text Before Multiple Search (PDF) |
Return the text located before one of several search strings. The activity returns the text that precedes the first matching search string. Use the “numWords” parameter to set how many words are extracted before the match.
Technical Name | Type | Minimal Agent Version |
getTextBeforeWithMultipleSearchStrings | synchronous | WIN-2.0.0 (WIN for Windows) |
Input Parameters:
Name | Type | Attributes | Default | Description |
searchStringList | Array. | mandatory | | List of search strings before which the text is extracted. |
numWords | number | mandatory | | Number of words to return before the matching string. |
Output Parameters:
Name | Type | Description |
outputValue | string | Extracted text. |
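Extraction before a match can be modeled in the same spirit. The sketch below is only an illustration under assumptions: the first list entry that occurs in the text wins, its first occurrence is used, words are split on whitespace, and an empty string is returned when nothing matches. The function name is hypothetical.

```python
def text_before(page_text, search_strings, num_words):
    """Return up to num_words words that precede the first search string found."""
    for candidate in search_strings:
        index = page_text.find(candidate)
        if index != -1:
            words = page_text[:index].split()
            # A slice of words[-0:] would return every word, so zero is handled explicitly.
            return " ".join(words[-num_words:]) if num_words > 0 else ""
    return ""  # assumed result when no search string matches


page_text = "Net 100.00 EUR Total 119.00 EUR"
net_amount = text_before(page_text, ["EUR"], 1)
before_total = text_before(page_text, ["Amount due", "Total"], 2)
```

With this sample text, `net_amount` is "100.00" because the first occurrence of "EUR" is used, and `before_total` is "100.00 EUR" because "Amount due" does not occur and "Total" is tried next.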