Module - PDF Core
Collection of functions to work on PDF documents.
- Author: SAP Intelligent RPA R&D team
Activities
Mandatory first activity when working with a PDF document: it opens an instance of the PDF document. Once an instance is open, the other activities of this module can be used.
Technical Name | Type | Minimal Agent Version |
openPdf | synchronous | WIN-2.0.0 (WIN for Windows) |
Input Parameters:
Name | Type | Attributes | Default | Description |
pdfPath | string | mandatory | | Full path of the existing PDF document. |
password | string | optional | | Password to open password-protected PDF documents. |
reOrderByPosition | boolean | optional | | Reorder text boxes according to their position in the document. Use this when the text items in the document appear to be in random order. Default: False. |
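To illustrate what reordering by position could mean, the following sketch sorts a list of text boxes into reading order. The `TextBox` type, its coordinates, the `line_tolerance` value, and the top-to-bottom, left-to-right rule are all assumptions made for illustration; the source does not describe how the activity orders text internally.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical text box: top-left corner (x, y) and its text.

    In this sketch y grows downward, so a smaller y is higher on the page.
    """
    x: float
    y: float
    text: str


def reorder_by_position(boxes, line_tolerance=2.0):
    """Sort boxes top-to-bottom, then left-to-right within each line.

    A box joins the current line when its y differs from the first box
    of that line by at most line_tolerance (an assumption of this sketch).
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b.y):
        if lines and abs(box.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])
    return [b for line in lines for b in sorted(line, key=lambda b: b.x)]


# Boxes arrive in an arbitrary order, as they might from a PDF content stream.
boxes = [
    TextBox(120, 50.5, "number"),
    TextBox(10, 100, "Total"),
    TextBox(10, 50, "Invoice"),
]
ordered_text = " ".join(b.text for b in reorder_by_position(boxes))
print(ordered_text)
```

With these sample boxes the result reads "Invoice number Total", whereas the input order would have produced "number Total Invoice".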
Errors:
Error Class | Package | Description |
SequenceError | irpa_core | Another PDF file is already opened |
Close a PDF document and release the resources. Before reading a second PDF document, you must release the first one using this activity.
Technical Name | Type | Minimal Agent Version |
releasePDF | synchronous | WIN-2.0.0 (WIN for Windows) |
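The open and release rule can be pictured as a small state machine. The sketch below is a hypothetical model, not the SDK itself: `PdfSession`, `open_pdf`, and `release_pdf` are illustrative names, and the local `SequenceError` class only stands in for the error listed for openPdf.

```python
class SequenceError(Exception):
    """Local stand-in for the SequenceError documented for openPdf."""


class PdfSession:
    """Hypothetical model of the rule: at most one PDF open at a time."""

    def __init__(self):
        self.current_path = None

    def open_pdf(self, pdf_path):
        # Opening while another document is still open is rejected.
        if self.current_path is not None:
            raise SequenceError("Another PDF file is already opened")
        self.current_path = pdf_path

    def release_pdf(self):
        # Releasing frees the slot so that another document can be opened.
        self.current_path = None


session = PdfSession()
session.open_pdf("first.pdf")

try:
    session.open_pdf("second.pdf")  # first.pdf has not been released yet
    second_open_failed = False
except SequenceError:
    second_open_failed = True

session.release_pdf()
session.open_pdf("second.pdf")  # succeeds after the release
```

In a workflow this corresponds to always pairing each open with a release before the next document is processed.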
Return the number of pages in a PDF document.
Technical Name | Type | Minimal Agent Version |
getPageNum | synchronous | WIN-2.0.0 (WIN for Windows) |
Output Parameters:
Name | Type | Description |
pageNum | number | Total number of pages. |
Get Page Dimensions (PDF)
Return the dimensions of a specified page.
Technical Name | Type | Minimal Agent Version |
getPageDimensions | synchronous | WIN-2.0.0 (WIN for Windows) |
Input Parameters:
Name | Type | Attributes | Default | Description |
pageNum | number | optional | | Page number (default: 1). |
Output Parameters:
Errors:
Error Class | Package | Description |
InvalidArgument | irpa_core | Invalid page number |
Retrieve the complete text in a PDF document or in the subset of a PDF document defined by the Filter parameter, if supplied.
Technical Name | Type | Minimal Agent Version |
getText | synchronous | WIN-2.0.0 (WIN for Windows) |
Output Parameters:
Name | Type | Description |
textContent | string | Text in the PDF document. |
Extract the first string that matches a regular expression from a page or from a specific page area defined by filters. The regular expression may contain capturing groups; if it does, the first capturing group is returned as the result.
Technical Name | Type | Minimal Agent Version |
extractTextWithRegEx | synchronous | WIN-2.0.0 (WIN for Windows) |
Input Parameters:
Name | Type | Attributes | Default | Description |
sRegex | string | mandatory | | Regular expression. |
Output Parameters:
Name | Type | Description |
extractedText | string | Extracted text. |
Errors:
Error Class | Package | Description |
InvalidArgument | irpa_core | sRegex is mandatory to perform this activity |
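The first-match and capturing-group behavior described for extractTextWithRegEx can be sketched with Python's `re` module. This is an illustration only: the function name `extract_text_with_regex` is hypothetical, the activity's actual regular expression dialect is not stated in the source, and returning `None` when nothing matches is an assumption of this sketch.

```python
import re


def extract_text_with_regex(text, s_regex):
    """Return the first capturing group of the first match.

    If the pattern has no capturing group, the whole match is returned.
    Returning None when nothing matches is an assumption of this sketch.
    """
    if not s_regex:
        # Mirrors the documented InvalidArgument case with a built-in error.
        raise ValueError("sRegex is mandatory to perform this activity")
    match = re.search(s_regex, text)
    if match is None:
        return None
    if match.re.groups > 0:
        return match.group(1)
    return match.group(0)


page_text = "Invoice No: INV-2041 Date: 2023-05-17 Ref: INV-9999"

# With a capturing group: only the captured part is returned.
with_group = extract_text_with_regex(page_text, r"Invoice No:\s*(\S+)")

# Without a capturing group: the whole first match is returned.
without_group = extract_text_with_regex(page_text, r"\d{4}-\d{2}-\d{2}")

print(with_group)     # INV-2041
print(without_group)  # 2023-05-17
```

A capturing group is useful when a label such as "Invoice No:" is needed to locate the value but should not be part of the result.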
Retrieve the text located after a searched string. You can extract multiple words after the string using the “numWords” parameter.
Technical Name | Type | Minimal Agent Version |
getTextAfter | synchronous | WIN-2.0.0 (WIN for Windows) |
Input Parameters:
Name | Type | Attributes | Default | Description |
searchString | string | mandatory | | Reference word to extract the text after. |
numWords | number | mandatory | | Number of words to return after the matching string. |
Output Parameters:
Name | Type | Description |
outputValue | string | Extracted text. |
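As a rough picture of how searchString and numWords interact in getTextAfter, the sketch below works on plain text. The function name `get_text_after` is hypothetical, and three behaviors are assumptions not confirmed by the source: words are separated by whitespace, only the first occurrence of the search string is used, and an empty string is returned when the search string is absent.

```python
def get_text_after(text, search_string, num_words):
    """Return up to num_words words that follow search_string.

    Assumptions of this sketch: whitespace-separated words, first
    occurrence only, empty string when search_string is not found.
    """
    index = text.find(search_string)
    if index == -1:
        return ""
    tail = text[index + len(search_string):]
    return " ".join(tail.split()[:num_words])


page_text = "Customer: ACME Corp Ltd Total amount: 120.50 EUR"

one_word = get_text_after(page_text, "Total amount:", 1)
two_words = get_text_after(page_text, "Customer:", 2)

print(one_word)   # 120.50
print(two_words)  # ACME Corp
```

Asking for more words than remain in the text simply returns the words that are available.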
Retrieve the text located before a searched string. You can extract multiple words before the string using the “numWords” parameter.
Technical Name | Type | Minimal Agent Version |
getTextBefore | synchronous | WIN-2.0.0 (WIN for Windows) |
Input Parameters:
Name | Type | Attributes | Default | Description |
searchString | string | mandatory | | Reference word to extract the text before. |
numWords | number | mandatory | | Number of words to return before the matching string. |
Output Parameters:
Name | Type | Description |
outputValue | string | Extracted text. |
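The backward-looking case of getTextBefore can be sketched by taking the last words of the text that precedes the search string. The function name `get_text_before` is hypothetical, and the sketch assumes whitespace-separated words, the first occurrence of the search string, and an empty result when the string is absent; none of these details are confirmed by the source.

```python
def get_text_before(text, search_string, num_words):
    """Return up to num_words words that precede search_string.

    Assumptions of this sketch: whitespace-separated words, first
    occurrence only, empty string when search_string is not found.
    """
    index = text.find(search_string)
    if index == -1 or num_words <= 0:
        # The num_words guard matters: words[-0:] would return every word.
        return ""
    words = text[:index].split()
    return " ".join(words[-num_words:])


page_text = "Net 100.00 EUR VAT 20.50 EUR Total"

two_before = get_text_before(page_text, "Total", 2)
one_before = get_text_before(page_text, "VAT", 1)

print(two_before)  # 20.50 EUR
print(one_before)  # EUR
```

The returned words keep their original left-to-right order, so the word closest to the search string comes last.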