Profile Data

Profile your data to learn more about it and to find out where information might be missing.

Context

You can access the profiled data in several ways.
  • From the Cockpit Overview page, click the charts in the Discovery tile under Service KPIs.
  • From the Cockpit Overview page, click Recent Profile Results in the Discovery tile under Quick Links.
  • From the Discovery page, you can see the profiled files.
  • From the Discovery page, click Profiled Data.
Note
Profiling Vora tables requires one of the following:
  • a cloud profiling storage location
  • a Vora Catalog connection and a local HDFS connection associated with the SAP Vora system where the Data Hub adapter is installed. The HDFS connection must have a user called 'vora'.
Note
Profiling the S3 connection requires one of the following:
  • a cloud profiling storage location
  • a local HDFS connection to the system that has the S3 connection. Any user name is allowed.
Note
To profile SAP Vora systems, there must be a Vora Data Pipeline connection on the same system as the connection being profiled. The name of the Vora Data Pipeline connection must end with the '_DEFAULT' suffix.
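The prerequisites in the notes above can be read as a small checklist. The following is a minimal sketch of that checklist, not part of SAP Data Hub: the `Connection` class, the `kind` labels ("VORA_CATALOG", "HDFS", "VORA_PIPELINE"), and the function name are hypothetical and exist only for illustration. It also assumes that the storage requirement and the Vora Data Pipeline requirement apply together.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Connection:
    """Hypothetical stand-in for a connection definition (illustration only)."""
    name: str
    kind: str                    # assumed labels: "VORA_CATALOG", "HDFS", "VORA_PIPELINE"
    user: Optional[str] = None   # user configured on the connection, if any


def missing_vora_prerequisites(connections: List[Connection],
                               has_cloud_profiling_storage: bool) -> List[str]:
    """Return the unmet prerequisites for profiling Vora tables (empty list = ready)."""
    missing = []

    # Either a cloud profiling storage location, or a Vora Catalog connection
    # plus a local HDFS connection whose user is called 'vora'.
    has_catalog = any(c.kind == "VORA_CATALOG" for c in connections)
    has_vora_hdfs = any(c.kind == "HDFS" and c.user == "vora" for c in connections)
    if not (has_cloud_profiling_storage or (has_catalog and has_vora_hdfs)):
        missing.append("cloud profiling storage, or Vora Catalog plus HDFS with user 'vora'")

    # A Vora Data Pipeline connection whose name ends with '_DEFAULT'.
    if not any(c.kind == "VORA_PIPELINE" and c.name.endswith("_DEFAULT")
               for c in connections):
        missing.append("Vora Data Pipeline connection ending with '_DEFAULT'")

    return missing
```

An empty result means that, under these assumptions, the listed prerequisites are met. The check does not verify that the connections are on the same system, which the note also requires.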
Note
Depending on the policy configuration, you may have rights to see the connection, but not to see certain paths within the connection. Therefore, you may not see all of the objects. Likewise, you may be able to browse an object, but not be able to profile it, view its metadata, or view its fact sheet. Contact your system administrator to set the appropriate rights and permissions. They may be able to modify your resource access in Policy Management. For more information about Policy Management, see Policy Management in the SAP Data Hub Administration Guide.
Currently, profiling is supported only for these systems and connections. To view the supported cloud connections, see Setup Cloud Profiling Storage.
System      Connection
SAP HANA    SAP HANA SQL
SAP Vora    HDFS
            Vora Catalog
            Cloud connections (S3, GCS, WASB, and so on)

Procedure

  1. From the Cockpit Overview page under Quick Links, click Explore Connections under the Discovery tile.
  2. Navigate to the connection and folder that contains the object that you want to profile. Select an object within a connection, and then click Profiling > Start Profiling. Click the refresh icon in the upper-right corner to update the status of the profiling.
    Note
    To profile a folder, the folder must contain a partitioned file (a file whose data is split among multiple files) of type CSV, Parquet, or ORC. If the folder contains a nonpartitioned file (for example, all of the data is in a single CSV file) or additional unsupported files, then the contents of the folder cannot be profiled. In this case, profile the supported files individually.
    Note
    If you want to stop a profiling task before it is finished, select the object, and then click Profiling > Cancel Profiling. You can stop one profiling task at a time. The object has a status of Error after you cancel profiling.
    Note
    You can sort the data in the table by clicking the column headers. Click the icons in the Profiling Status column to receive more information about the objects being profiled. For details about the additional information shown in the Profiling Status dialog, see View Profiled Results.
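The folder rule in the note above can be sketched as a simple check on the file names in a folder. This is an illustration only, not how the product decides. It assumes that "partitioned" means two or more part files of the same supported type, and that any other file in the folder makes the folder unprofilable. The function name `folder_can_be_profiled` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable

# File types named in the note as supported for folder profiling.
SUPPORTED_SUFFIXES = {".csv", ".parquet", ".orc"}


def folder_can_be_profiled(file_names: Iterable[str]) -> bool:
    """Return True if the folder looks like one partitioned file of a supported type.

    Assumption: a partitioned file is two or more part files that share
    the same supported extension, with no other files in the folder.
    """
    suffixes = [PurePosixPath(name).suffix.lower() for name in file_names]
    if len(suffixes) < 2:
        # Empty folder, or a single nonpartitioned file: profile the file itself.
        return False
    first = suffixes[0]
    return first in SUPPORTED_SUFFIXES and all(s == first for s in suffixes)
```

For example, a folder holding `part-0.csv` and `part-1.csv` passes the check, while a folder holding only `sales.csv`, or part files plus a `readme.txt`, does not. In those cases the supported files would be profiled individually.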
  3. (Optional) Click the Refresh icon to see the status of your profiling tasks. Click a status to view additional information.

    Depending on the status, the Profiling Status dialog shows some of the following information.

    Profiling Status                    Description
    Profiling                           The object is being profiled for the first time.
    Profiled                            The object profiling is complete.
    Profiled (not loaded)               The object profiling is complete, and the fact sheet has not been viewed.
    Profiled (out-of-date)              The object profiling is complete, and the object has been modified since the last profile.
    Profiled (not loaded, out-of-date)  The object profiling is complete, the fact sheet has not been viewed, and the object has been modified since the last profile.
    Not Profiled                        The object can be profiled, but has not been profiled yet.
    Not Supported                       The object cannot be profiled.
    Regenerating                        The object is being profiled again after having been profiled previously.
    Error                               The object profiling was canceled by the user, or could not be completed due to one or more errors.
    Option            Description
    Profiling Status  Indicates whether the object has been profiled.
    Name              Name of the object.
    Type              Type of object.
    Started           During processing, lists the date and time when profiling of the object started.
    Completed         If the object was profiled, lists the date and time when profiling completed.
    Runtime           The amount of time it took to process the object.
    Profile History   Shows up to five previous runs. A green indicator means that the object was profiled and loaded into the fact sheet. An orange indicator means that the object was profiled, but may be out of date, or the fact sheet was not loaded. A red indicator means that an error occurred during processing, or the user canceled profiling. Error messages are shown at the bottom of the dialog. Click a message for more information.
    Supported         If the object is not a supported profiling type, 'No' is shown. If you believe that the object is supported, click Check Support.
    Fact Sheet        Indicates whether the fact sheet has been loaded.
    Start Profiling   Begin profiling the object.
    Note
    During processing, an icon in the upper-right corner indicates that the object is being processed. The dialog is refreshed every 6 seconds. If the profiling process is not complete within 20 minutes, the automatic refresh stops. To check the profiling status after that time, close the dialog and click the refresh icon on the Browse page or in the Profiled Data window.
    Cancel Profiling  When an object is processing, click to stop the profiling.
    Clear History     The profiling status of up to five previous processing attempts is shown in the Profile History option. Selecting this option clears the entire history associated with the profiling attempts on this object. If you clear the history, you cannot view the trend chart in the fact sheet.
    Check Support     If the Supported option indicates that the object cannot be profiled, click to check again. For example, if you have an object named customer_data.scv and the file extension is supposed to be .csv, then clicking this option may change the Supported option to 'Yes'.
    Note
    You may need to click this option each time you profile the object.
    View Fact Sheet   Loads and displays the fact sheet, which includes metadata about the profiled object.
    Note
    After the fact sheet is loaded, all users can view the fact sheet.
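The automatic refresh described for the Profiling Status dialog (a refresh every 6 seconds that stops after 20 minutes) follows a common bounded-polling pattern. The sketch below illustrates that pattern only. It does not call any SAP Data Hub API: `get_status` is a hypothetical callable that returns one of the status names from the table above, and treating Profiling and Regenerating as the in-progress statuses is an assumption based on their descriptions.

```python
import time
from typing import Callable, Tuple

REFRESH_SECONDS = 6                    # dialog refresh interval from the note
AUTO_REFRESH_LIMIT_SECONDS = 20 * 60   # automatic refresh stops after 20 minutes
IN_PROGRESS = {"Profiling", "Regenerating"}  # assumed in-progress statuses


def watch_profiling(get_status: Callable[[], str],
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> Tuple[str, bool]:
    """Poll until profiling leaves an in-progress status or the limit is reached.

    Returns (last_status, finished). finished is False when the time limit
    was reached first, in which case the status must be refreshed manually.
    """
    started = clock()
    status = get_status()
    while status in IN_PROGRESS:
        if clock() - started >= AUTO_REFRESH_LIMIT_SECONDS:
            return status, False
        sleep(REFRESH_SECONDS)
        status = get_status()
    return status, True
```

Passing `sleep` and `clock` as parameters is a design choice that lets the loop be exercised with a fake clock instead of waiting 20 minutes.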
  4. Click Profiled Data.
  5. In the Date Range Filter option, choose one of the options to filter the amount of profiled data that is displayed. The applied filter is shown at the top of the page.
    • Last Week: Last seven days of profiling (default).
    • Last Month: Last 30 days of profiling.
    • Custom Date Range: Type a starting and ending date, or click the calendar icon to choose dates and times (down to the minute) from a calendar.
    • All: All profiled data from the most recent profile to the first object that was profiled.
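The Date Range Filter options above each correspond to a time window. As a rough sketch of how those windows could be computed (the function name `profiled_data_window` and its parameters are hypothetical, and the exact boundaries the product uses are not documented here):

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def profiled_data_window(option: str,
                         now: datetime,
                         custom_start: Optional[datetime] = None,
                         custom_end: Optional[datetime] = None
                         ) -> Tuple[Optional[datetime], datetime]:
    """Return (start, end) for a Date Range Filter option.

    A start of None means "no lower bound", which is how this sketch
    represents the All option.
    """
    if option == "Last Week":
        return now - timedelta(days=7), now      # last seven days (default)
    if option == "Last Month":
        return now - timedelta(days=30), now     # last 30 days
    if option == "Custom Date Range":
        if custom_start is None or custom_end is None:
            raise ValueError("Custom Date Range needs a starting and ending date")
        if custom_start > custom_end:
            raise ValueError("The starting date must not be after the ending date")
        return custom_start, custom_end
    if option == "All":
        return None, now
    raise ValueError(f"Unknown option: {option!r}")
```

Note that Last Month is a fixed 30-day window in this sketch, as stated in the list, rather than a calendar month.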
  6. Filter the results by status (click the Completed, Active, or Error tabs) or by typing the object name in Filter Names.
  7. To view detailed information about each profiled object, select one object and then click Fact Sheet. Likewise, you can view the Metadata, Preview the data, Cancel an active profiling task, and launch another Profile task by clicking those buttons.
    Note
    When you select a new fact sheet, a message asks you to confirm that it is OK for all users to view the results of this profiling task. This message is shown each time you reprofile the object and open the fact sheet.