Module - PDF Core

Collection of functions to work on PDF documents.

Author:
  • SAP Intelligent RPA R&D team

Activities

Open PDF

Mandatory first activity when working with a PDF document. It opens an instance of a PDF document. The other activities in this module can be used only after a PDF instance has been opened.


Technical Name | Type | Minimal Agent Version
openPdf | synchronous | WIN-2.0.0 (WIN for Windows)

Input Parameters:

Name | Type | Attributes | Default | Description
pdfPath | string | mandatory | | Full path of the existing PDF document.
password | string | optional | | Password to open password-protected PDF documents.
reOrderByPosition | boolean | optional | False | Reorder text boxes according to their position in the document. Use this when the text items in the document appear to be returned in random order.

Errors:

Error Class | Package | Description
SequenceError | irpa_core | Another PDF file is already opened
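
The effect of reOrderByPosition can be pictured as sorting text boxes into reading order. The sketch below is a minimal illustration in Python, not the SDK's implementation: the TextBox type, its fields, the top-left coordinate origin, and the line tolerance are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    """Hypothetical text box: top-left corner (x, y) and its text."""
    x: float
    y: float
    text: str


def reorder_by_position(boxes: List[TextBox], line_tolerance: float = 2.0) -> List[TextBox]:
    """Return boxes in reading order: top to bottom, then left to right.

    Assumes y grows downward from the top of the page. A box joins the
    current line when its y differs from the first box of that line by
    at most line_tolerance.
    """
    lines: List[List[TextBox]] = []
    for box in sorted(boxes, key=lambda b: b.y):
        if lines and abs(box.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])
    ordered: List[TextBox] = []
    for line in lines:
        # Within one line, order boxes from left to right.
        ordered.extend(sorted(line, key=lambda b: b.x))
    return ordered


def join_text(boxes: List[TextBox]) -> str:
    """Concatenate the text of the boxes with single spaces."""
    return " ".join(box.text for box in boxes)
```

With boxes stored out of order, join_text(reorder_by_position(boxes)) yields the text as a reader would see it on the page.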


Close and Release PDF

Close a PDF document and release the resources. Before reading a second PDF document, you must release the first one using this activity.


Technical Name | Type | Minimal Agent Version
releasePDF | synchronous | WIN-2.0.0 (WIN for Windows)
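
The open/release rule can be modeled as a single-slot resource. The following Python sketch only mirrors the sequencing described for the Open PDF and Close and Release PDF activities. PdfModule, its methods, process_all, and the local SequenceError class are hypothetical stand-ins, not the SDK's API.

```python
from typing import List, Optional


class SequenceError(Exception):
    """Local stand-in for the SequenceError class listed for Open PDF."""


class PdfModule:
    """Hypothetical model: at most one PDF document may be open at a time."""

    def __init__(self) -> None:
        self._open_path: Optional[str] = None

    def open_pdf(self, pdf_path: str) -> None:
        # Opening while another document is open is a sequencing error.
        if self._open_path is not None:
            raise SequenceError("Another PDF file is already opened")
        self._open_path = pdf_path

    def release_pdf(self) -> None:
        # Releasing frees the slot so that another document can be opened.
        self._open_path = None

    @property
    def current(self) -> Optional[str]:
        return self._open_path


def process_all(module: PdfModule, paths: List[str]) -> List[str]:
    """Open and release each document in turn; return the paths processed."""
    processed: List[str] = []
    for path in paths:
        module.open_pdf(path)
        try:
            processed.append(path)
        finally:
            # Always release, even on failure, so the next open succeeds.
            module.release_pdf()
    return processed
```

Releasing in a finally block is a design choice of the sketch: it keeps one failed document from blocking every document that follows.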


Get Total Pages (PDF)

Return the number of pages in a PDF document.


Technical Name | Type | Minimal Agent Version
getPageNum | synchronous | WIN-2.0.0 (WIN for Windows)

Output Parameters:

Name | Type | Description
pageNum | number | Total number of pages.


Get Page Dimensions (PDF)

Return the dimensions of a specified page.


Technical Name | Type | Minimal Agent Version
getPageDimensions | synchronous | WIN-2.0.0 (WIN for Windows)

Input Parameters:

Name | Type | Attributes | Default | Description
pageNum | number | optional | 1 | Page number.

Output Parameters:

Name | Type | Description
pageDimensions | irpa_pdf.pageDimensions | Dimensions of a page.

Errors:

Error Class | Package | Description
InvalidArgument | irpa_core | Invalid page number


Get Text (PDF)

Retrieve the complete text of a PDF document or, if the Filter parameter is supplied, the text of the subset of the document that the filter defines.


Technical Name | Type | Minimal Agent Version
getText | synchronous | WIN-2.0.0 (WIN for Windows)

Output Parameters:

Name | Type | Description
textContent | string | Text in the PDF document.


Extract Text w/ Reg. Expr. (PDF)

Extract a string from a page, or from a specific page area defined by filters, using a regular expression. The first match found is returned. Capturing groups are supported: if the regular expression contains a capturing group, the content of the first capturing group is returned as the result.


Technical Name | Type | Minimal Agent Version
extractTextWithRegEx | synchronous | WIN-2.0.0 (WIN for Windows)

Input Parameters:

Name | Type | Attributes | Default | Description
sRegex | string | mandatory | | Regular expression.

Output Parameters:

Name | Type | Description
extractedText | string | Extracted text.

Errors:

Error Class | Package | Description
InvalidArgument | irpa_core | sRegex is mandatory to perform this activity
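
The return rule (first match, or the first capturing group when one is present) can be illustrated with Python's re module. This is a sketch of the described behavior, not the activity's implementation: the function name is hypothetical, ValueError stands in for InvalidArgument, returning None when nothing matches is an assumption, and regular expression syntax may differ between Python and the engine used by the agent.

```python
import re
from typing import Optional


def extract_text_with_regex(text: str, pattern: str) -> Optional[str]:
    """Return the first match of pattern in text.

    If the pattern defines at least one capturing group, the content of
    the first group is returned instead of the whole match.
    """
    if not pattern:
        # Stand-in for the documented InvalidArgument error.
        raise ValueError("sRegex is mandatory to perform this activity")
    match = re.search(pattern, text)
    if match is None:
        return None  # Assumption: the source does not describe this case.
    if match.re.groups >= 1:
        return match.group(1)
    return match.group(0)
```

For example, on the text "Invoice No: INV-1001 Date: 2020-01-31", the pattern `INV-\d+` and the pattern `Invoice No: (INV-\d+)` both yield "INV-1001": the first returns the whole match, the second returns only the captured part.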


Get Text After (PDF)

Retrieve the text located after a searched string. You can extract multiple words after the string using the “numWords” parameter.


Technical Name | Type | Minimal Agent Version
getTextAfter | synchronous | WIN-2.0.0 (WIN for Windows)

Input Parameters:

Name | Type | Attributes | Default | Description
searchString | string | mandatory | | Reference word to extract the text after.
numWords | number | mandatory | | Number of words to return after the matching string.

Output Parameters:

Name | Type | Description
outputValue | string | Extracted text.
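
The interplay of searchString and numWords can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that words are separated by whitespace, that the first occurrence of the search string is used, and that None is returned when the string is not found. None of these details is confirmed by the source.

```python
from typing import Optional


def get_text_after(text: str, search_string: str, num_words: int) -> Optional[str]:
    """Return up to num_words whitespace-separated words after search_string."""
    index = text.find(search_string)
    if index < 0:
        return None  # Assumption: the source does not describe this case.
    # Split everything that follows the first occurrence into words.
    words = text[index + len(search_string):].split()
    return " ".join(words[:num_words])
```

On the text "Customer Name: John Smith Address: 12 Main Street", searching for "Name:" with numWords set to 2 gives "John Smith" under these assumptions.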


Get Text Before (PDF)

Retrieve the text located before a searched string. You can extract multiple words before the string using the “numWords” parameter.


Technical Name | Type | Minimal Agent Version
getTextBefore | synchronous | WIN-2.0.0 (WIN for Windows)

Input Parameters:

Name | Type | Attributes | Default | Description
searchString | string | mandatory | | Reference word to extract the text before.
numWords | number | mandatory | | Number of words to return before the matching string.

Output Parameters:

Name | Type | Description
outputValue | string | Extracted text.
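
A Python sketch of the backward lookup may help to show which words are returned. The function name is hypothetical, and the sketch assumes whitespace-separated words, use of the first occurrence of the search string, words returned in document order, and None when the string is not found. The source does not confirm these details.

```python
from typing import Optional


def get_text_before(text: str, search_string: str, num_words: int) -> Optional[str]:
    """Return up to num_words whitespace-separated words before search_string."""
    index = text.find(search_string)
    if index < 0:
        return None  # Assumption: the source does not describe this case.
    if num_words <= 0:
        # Guard needed because words[-0:] would return the whole list.
        return ""
    # Keep only the words that precede the first occurrence.
    words = text[:index].split()
    return " ".join(words[-num_words:])
```

On the text "Total 150.00 EUR due on 2020-02-15", searching for "EUR" with numWords set to 1 gives "150.00", and with numWords set to 2 gives "Total 150.00".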