Module - PDF Core
Collection of functions to work on PDF documents.
Activities
MANDATORY: this must be the first activity when working with a PDF document. It opens an instance of a PDF document; once the instance is open, the other activities of this module can be used on it.
| Technical Name | Type | Minimal Agent Version |
|---|---|---|
| openPdf | synchronous | WIN-3.24, MAC-3.24, CLOUD-3.34 |
Input Parameters:
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| pdfPath | string | mandatory | | Full path of the existing PDF document. |
| password | string | optional | | Password used to open a password-protected PDF document. |
| reOrderByPosition | boolean | optional | False | Reorder text boxes according to their position in the document. Use this option when the text items of the document appear to be returned in random order. |
Errors:
| Error Class | Package | Description |
|---|---|---|
| SequenceError | irpa_core | Another PDF file is already opened |
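The reOrderByPosition option can be pictured as a sort of text boxes into reading order. The sketch below is a minimal, hypothetical model. It assumes each text box has a top-left corner (x, y) with y growing downward. It also assumes that boxes whose y values differ by no more than a small tolerance sit on the same line. `TextBox`, `reorder_by_position` and `line_tolerance` are illustrative names, not part of the agent, and the agent's actual ordering rule is not documented here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    # Hypothetical text item: top-left corner (x, y), y growing downward.
    x: float
    y: float
    text: str


def reorder_by_position(boxes: List[TextBox], line_tolerance: float = 2.0) -> List[TextBox]:
    """Sort boxes top-to-bottom, then left-to-right (assumed reading order).

    Boxes whose y differs from the first box of the current line by at most
    line_tolerance are grouped onto that line.
    """
    ordered = sorted(boxes, key=lambda b: (b.y, b.x))
    lines: List[List[TextBox]] = []
    for box in ordered:
        if lines and abs(box.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])
    # Within each line, read from left to right.
    return [box for line in lines for box in sorted(line, key=lambda b: b.x)]


boxes = [
    TextBox(300, 100.5, "Total:"),
    TextBox(50, 50, "Invoice"),
    TextBox(400, 100, "42.00"),
    TextBox(50, 100, "Date"),
]
print([b.text for b in reorder_by_position(boxes)])
# ['Invoice', 'Date', 'Total:', '42.00']
```

The tolerance matters because boxes on one visual line rarely share exactly the same y coordinate.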
Close a PDF document and release its resources. Before reading a second PDF document, you must release the first one with this activity.
| Technical Name | Type | Minimal Agent Version |
|---|---|---|
| releasePDF | synchronous | WIN-3.24, MAC-3.24, CLOUD-3.34 |
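The openPdf/releasePDF pair enforces a one-document-at-a-time rule. The sketch below is a hypothetical model of that rule, not the agent's implementation. `PdfSession`, `open_pdf`, `release_pdf` and the file names are illustrative. The local `SequenceError` class only stands in for the irpa_core error.

```python
from typing import Optional


class SequenceError(Exception):
    """Local stand-in for the irpa_core SequenceError."""


class PdfSession:
    """Hypothetical model: at most one PDF document may be open at a time."""

    def __init__(self) -> None:
        self.current_path: Optional[str] = None

    def open_pdf(self, pdf_path: str) -> None:
        # Mirrors openPdf: refuse to open while another document is open.
        if self.current_path is not None:
            raise SequenceError("Another PDF file is already opened")
        self.current_path = pdf_path

    def release_pdf(self) -> None:
        # Mirrors releasePDF: close the current document, if any.
        self.current_path = None


session = PdfSession()
session.open_pdf("first.pdf")
try:
    session.open_pdf("second.pdf")  # violates the one-open-PDF rule
except SequenceError as err:
    print(err)  # Another PDF file is already opened
session.release_pdf()
session.open_pdf("second.pdf")  # succeeds after the release
```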
Return the number of pages in a PDF document.
| Technical Name | Type | Minimal Agent Version |
|---|---|---|
| getPageNum | synchronous | WIN-3.24, MAC-3.24, CLOUD-3.34 |
Output Parameters:
| Name | Type | Description |
|---|---|---|
| pageNum | number | Total number of pages. |
Get Page Dimensions (PDF)
Return the dimensions of a specified page.
| Technical Name | Type | Minimal Agent Version |
|---|---|---|
| getPageDimensions | synchronous | WIN-3.24, MAC-3.24, CLOUD-3.34 |
Input Parameters:
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| pageNum | number | optional | 1 | Page number. |
Output Parameters:
Errors:
| Error Class | Package | Description |
|---|---|---|
| InvalidArgument | irpa_core | Invalid page number |
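The pageNum handling (default 1, InvalidArgument on a bad value) can be modeled as a small validation step. This sketch assumes pages are numbered from 1 up to the count returned by getPageNum. `resolve_page_num` is a hypothetical helper, and `InvalidArgument` is a local stand-in for the irpa_core error.

```python
from typing import Optional


class InvalidArgument(Exception):
    """Local stand-in for the irpa_core InvalidArgument error."""


def resolve_page_num(page_num: Optional[int], page_count: int) -> int:
    """Return the 1-based page to measure, defaulting to page 1 (assumed)."""
    if page_num is None:
        return 1
    if not 1 <= page_num <= page_count:
        raise InvalidArgument("Invalid page number")
    return page_num


print(resolve_page_num(None, 5))  # 1
print(resolve_page_num(3, 5))     # 3
```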
Retrieve the complete text of a PDF document, or only the part of it defined by the Filter parameter, if one is provided. Text content is detected in chunks, which are then joined using a separator.
| Technical Name | Type | Minimal Agent Version |
|---|---|---|
| getText | synchronous | WIN-3.24, MAC-3.24, CLOUD-3.34 |
Input Parameters:
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| separator | string | optional | ' ' (a single space) | Separator used to join the detected chunks of text. |
Output Parameters:
| Name | Type | Description |
|---|---|---|
| textContent | string | The text content of the PDF document. |
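The effect of the separator parameter can be shown by joining a list of detected chunks. The chunk list is hypothetical, and how the agent splits a page into chunks is not described here. Only the join step is modeled.

```python
from typing import List


def join_chunks(chunks: List[str], separator: str = " ") -> str:
    """Join detected text chunks, as getText does with its separator."""
    return separator.join(chunks)


chunks = ["Invoice", "INV-001", "Total:", "42.00"]
print(join_chunks(chunks))        # Invoice INV-001 Total: 42.00
print(join_chunks(chunks, "\n"))  # one chunk per line
```

A newline separator keeps chunk boundaries visible, which can make a later regular-expression search on the result easier to anchor.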
Extract the first string that matches a regular expression, from a page or from a specific page area defined by filters. Capturing groups are supported: if the regular expression contains a capturing group, the first capturing group is returned as the result.
| Technical Name | Type | Minimal Agent Version |
|---|---|---|
| extractTextWithRegEx | synchronous | WIN-3.24, MAC-3.24, CLOUD-3.34 |
Input Parameters:
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| sRegex | string | mandatory | | Regular expression. |
Output Parameters:
| Name | Type | Description |
|---|---|---|
| extractedText | string | Extracted text. |
Errors:
| Error Class | Package | Description |
|---|---|---|
| InvalidArgument | irpa_core | sRegex is mandatory to perform this activity |
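The matching rule (first match only, first capturing group if one exists) can be sketched with Python's `re` module. `extract_text_with_regex` is a hypothetical helper. Returning `None` when nothing matches is an assumption, because the no-match behavior is not documented here. The agent's regular-expression dialect may also differ from Python's.

```python
import re
from typing import Optional


class InvalidArgument(Exception):
    """Local stand-in for the irpa_core InvalidArgument error."""


def extract_text_with_regex(text: str, s_regex: str) -> Optional[str]:
    if not s_regex:
        raise InvalidArgument("sRegex is mandatory to perform this activity")
    match = re.search(s_regex, text)  # first match only
    if match is None:
        return None  # assumption: no-match behavior is undocumented
    # With at least one capturing group, return group 1; else the whole match.
    return match.group(1) if match.re.groups >= 1 else match.group(0)


text = "Invoice No: INV-2024-001\nTotal: 42.00 EUR"
print(extract_text_with_regex(text, r"Invoice No:\s*(\S+)"))  # INV-2024-001
print(extract_text_with_regex(text, r"\d+\.\d{2}"))          # 42.00
```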
Retrieve the text located after a searched string. You can extract multiple words after the string using the 'numWords' parameter.
| Technical Name | Type | Minimal Agent Version |
|---|---|---|
| getTextAfter | synchronous | WIN-3.24, MAC-3.24, CLOUD-3.34 |
Input Parameters:
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| searchString | string | mandatory | | Reference string after which the text is extracted. |
| numWords | number | mandatory | | Number of words to return after the matching string. |
Output Parameters:
| Name | Type | Description |
|---|---|---|
| outputValue | string | Extracted text. |
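A minimal model of getTextAfter on a plain string follows. It assumes the first occurrence of searchString is used, that words are whitespace-separated, and that an empty string is returned when the search string is absent. None of these rules is confirmed by this page. `get_text_after` is a hypothetical helper.

```python
def get_text_after(text: str, search_string: str, num_words: int) -> str:
    start = text.find(search_string)  # assumption: first occurrence wins
    if start == -1 or num_words <= 0:
        return ""  # assumption: empty result when nothing can be returned
    after = text[start + len(search_string):]
    return " ".join(after.split()[:num_words])


text = "Invoice number INV-001 dated 2024-01-15 total 42.00"
print(get_text_after(text, "dated", 1))           # 2024-01-15
print(get_text_after(text, "Invoice number", 2))  # INV-001 dated
```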
Retrieve the text located before a searched string. You can extract multiple words before the string using the 'numWords' parameter.
| Technical Name | Type | Minimal Agent Version |
|---|---|---|
| getTextBefore | synchronous | WIN-3.24, MAC-3.24, CLOUD-3.34 |
Input Parameters:
| Name | Type | Attributes | Default | Description |
|---|---|---|---|---|
| searchString | string | mandatory | | Reference string before which the text is extracted. |
| numWords | number | mandatory | | Number of words to return before the matching string. |
Output Parameters:
| Name | Type | Description |
|---|---|---|
| outputValue | string | Extracted text. |
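getTextBefore can be modeled the same way, reading backwards from the match. The same assumptions apply: first occurrence, whitespace-separated words, and an empty string when nothing is found. `get_text_before` is a hypothetical helper. The explicit `num_words <= 0` guard is needed because a slice like `words[-0:]` would return every word.

```python
def get_text_before(text: str, search_string: str, num_words: int) -> str:
    end = text.find(search_string)  # assumption: first occurrence wins
    if end == -1 or num_words <= 0:
        return ""  # assumption: empty result when nothing can be returned
    words = text[:end].split()
    return " ".join(words[-num_words:])


text = "Total amount 42.00 EUR due"
print(get_text_before(text, "EUR", 1))  # 42.00
print(get_text_before(text, "EUR", 2))  # amount 42.00
```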