
Optical Character Recognition (OCR)

SAP BTP SDK for Android provides a number of ways to integrate OCR into your app. You can either:

  • Integrate MlKitTextDetectionView, which manages the camera lifecycle, handles image processing (such as rotating and cropping the image), and runs OCR detection using the MLKit library.
  • Customize the process (in part or completely). For example, you can customize the image processing stage or choose a different OCR library.

TextBlockTopology

TextBlockTopology is the central data structure for working with OCR results. Any OCR library, such as MLKit, returns the detected text together with its position in the image, but gives no guarantees about the order or structure of the detected words. For example, words on the same line could be reported as two different lines. The TextBlockTopology class uses the positions of the detected words to arrange them in row-major order, in column-major order, and as matrices. This makes it possible to offer easy-to-use APIs for querying and consuming the detections in more meaningful ways. Consider the example of building an app for scanning Gold membership cards.
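To make the idea of arranging detections by position more concrete, here is a minimal sketch of grouping words into rows. It is not the SDK's implementation: DetectedWord, RowGroupingSketch, and groupIntoRows are hypothetical names introduced only for this illustration, and the grouping rule (a word joins a row when its vertical center falls inside the first word of that row) is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for one detected word and its bounding box (not an SDK type). */
final class DetectedWord {
    final String text;
    final int left, top, right, bottom;

    DetectedWord(String text, int left, int top, int right, int bottom) {
        this.text = text;
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }

    int centerY() {
        return (top + bottom) / 2;
    }
}

/** Illustrative row grouping; the real TextBlockTopology logic may differ. */
final class RowGroupingSketch {

    static List<List<DetectedWord>> groupIntoRows(List<DetectedWord> words) {
        // Process words from top to bottom so that rows are built in reading order.
        List<DetectedWord> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.comparingInt(DetectedWord::centerY));

        List<List<DetectedWord>> rows = new ArrayList<>();
        for (DetectedWord word : sorted) {
            List<DetectedWord> current = rows.isEmpty() ? null : rows.get(rows.size() - 1);
            // Assumed rule: same row if the vertical center lies within the row's first word.
            if (current != null
                    && word.centerY() >= current.get(0).top
                    && word.centerY() <= current.get(0).bottom) {
                current.add(word);
            } else {
                List<DetectedWord> row = new ArrayList<>();
                row.add(word);
                rows.add(row);
            }
        }

        // Within each row, order the words from left to right.
        for (List<DetectedWord> row : rows) {
            row.sort(Comparator.comparingInt(w -> w.left));
        }
        return rows;
    }
}
```

With this rule, "Valid" and "Thru" end up in the same row even if the OCR engine reports them out of order or as separate lines, provided their boxes sit at roughly the same height.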

Gold card sample

As the sample card shows, OCR returns the detected words and the positions of their bounding rectangles. However, it is difficult to find the member number and expiration date, because detected words cannot be categorized as entities without additional processing. In such scenarios, TextBlockTopology provides an easy way to access the relevant information. "Valid" or "Valid Thru" is a keyword that we assume appears on every card, so we can use "Valid" as an anchor. We can then access the member number, which is printed above the anchor, by calling the getPrevElementInColumn API on TextBlockTopology with the anchor element.

You can also access all the rows and columns detected by calling the getRows and getCols APIs.
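The anchor-based lookup can be pictured with a small, self-contained sketch. This is not how the SDK implements it: OcrBox, ColumnLookupSketch, findAnchor, and previousInColumn are hypothetical names, and treating "same column" as "horizontally overlapping boxes" is an assumption made for the illustration.

```java
import java.util.List;

/** Hypothetical stand-in for a detected element and its bounding box (not an SDK type). */
final class OcrBox {
    final String text;
    final int left, top, right, bottom;

    OcrBox(String text, int left, int top, int right, int bottom) {
        this.text = text;
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }
}

/** Illustrative anchor lookup; the real TextBlockTopology logic may differ. */
final class ColumnLookupSketch {

    /** Returns the first box whose text equals the keyword, ignoring case, or null. */
    static OcrBox findAnchor(List<OcrBox> boxes, String keyword) {
        for (OcrBox box : boxes) {
            if (box.text.equalsIgnoreCase(keyword)) {
                return box;
            }
        }
        return null;
    }

    /** Returns the nearest box above the anchor that overlaps it horizontally, or null. */
    static OcrBox previousInColumn(List<OcrBox> boxes, OcrBox anchor) {
        OcrBox best = null;
        for (OcrBox box : boxes) {
            boolean overlaps = box.left < anchor.right && box.right > anchor.left;
            boolean above = box.bottom <= anchor.top;
            // Keep the candidate whose bottom edge is closest to the anchor's top edge.
            if (box != anchor && overlaps && above
                    && (best == null || box.bottom > best.bottom)) {
                best = box;
            }
        }
        return best;
    }
}
```

For a card laid out with the name on top, the member number below it, and "Valid" below that, calling previousInColumn once from the anchor yields the member number, and calling it again from the member number yields the name.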

MlKitTextDetectionView

MlKitTextDetectionView is a View that performs OCR using the MLKit library. You can use it like any other View in your activity layout.

<?xml version="1.0" encoding="utf-8"?>
<FrameLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:id="@+id/camera_root"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".ocr.GoldCardDetectionActivity">

    <com.sap.cloud.mobile.fiori.ocr.MlKitTextDetectionView
        android:id="@+id/text_detection_view"
        android:layout_width="match_parent"
        android:layout_height="match_parent" />
</FrameLayout>

This View opens the camera only if your app has been granted the camera permission (android.permission.CAMERA). The parent activity is responsible for requesting that permission if it has not been granted yet.

Once the camera is open, MlKitTextDetectionView starts processing images. To receive the detected topology (TextBlockTopology), register a TopologyDetectionObserver on the MlKitTextDetectionView using the setTopologyDetectionObserver API.

TopologyDetectionObserver

The TopologyDetectionObserver class is designed to consume the detected TextBlockTopology.

For example, to process the generated topology for the Gold card, we can use:

MlKitTextDetectionView mTextDetectionView = findViewById(R.id.text_detection_view);
mTextDetectionView.setTopologyDetectionObserver(new TopologyDetectionObserver() {
    @Override
    public void onTopologyDetected(TextBlockTopology topology) {
        if (topology != null) {
            // Look for the anchor keyword "Valid" among the detected elements.
            FioriOcrObservation.Element keywordElement = null;
            for (FioriOcrObservation.Element e : topology.getElements()) {
                if (e.getText().equalsIgnoreCase("valid")) {
                    keywordElement = e;
                    break;
                }
            }

            if (keywordElement != null) {
                // The expiration date is printed below the anchor, the member number above it.
                FioriOcrObservation.Element validThroughDetailElement = topology.getNextElementInColumn(keywordElement);
                FioriOcrObservation.Element membershipNumberElement = topology.getPrevElementInColumn(keywordElement);
                if (validThroughDetailElement != null && membershipNumberElement != null) {
                    // The name is printed above the member number; it may be null if not detected.
                    FioriOcrObservation.Element firstName = topology.getPrevElementInColumn(membershipNumberElement);
                    if (!membershipNumberElement.getText().isEmpty() && !validThroughDetailElement.getText().isEmpty()) {
                        // TODO: process the values, for example by opening another activity.
                        // Stop the camera and detection now that the values have been found.
                        mTextDetectionView.stop();
                    }
                }
            }
        }
    }
});
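The TODO above leaves the processing of the extracted values open. As one possible direction, the following sketch cleans up the raw OCR strings before they are passed on. It is only an illustration: CardValueSketch, parseExpiry, and normalizeMemberNumber are hypothetical names, and both the MM/YY expiration format and the digits-only member number are assumptions about the card, not something the SDK prescribes.

```java
import java.time.YearMonth;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative post-processing of OCR text; formats are assumed, not defined by the SDK. */
final class CardValueSketch {

    // Assumed expiration format: two-digit month, slash, two-digit year (for example 12/27).
    private static final Pattern EXPIRY = Pattern.compile("(\\d{2})\\s*/\\s*(\\d{2})");

    /** Extracts the first MM/YY occurrence, or returns empty if none is valid. */
    static Optional<YearMonth> parseExpiry(String ocrText) {
        Matcher matcher = EXPIRY.matcher(ocrText);
        if (!matcher.find()) {
            return Optional.empty();
        }
        int month = Integer.parseInt(matcher.group(1));
        int year = 2000 + Integer.parseInt(matcher.group(2)); // assumes years 2000 to 2099
        if (month < 1 || month > 12) {
            return Optional.empty();
        }
        return Optional.of(YearMonth.of(year, month));
    }

    /** Removes everything except digits, such as spaces or dashes added by OCR. */
    static String normalizeMemberNumber(String ocrText) {
        return ocrText.replaceAll("[^0-9]", "");
    }
}
```

Inside the observer, the text of validThroughDetailElement and membershipNumberElement could be passed through helpers like these, and detection stopped only when both return usable values.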

Last update: February 29, 2024