Optical Character Recognition (OCR)
SAP BTP SDK for Android provides several ways to integrate OCR into your app. You can either:
- Integrate `MlKitTextDetectionView`, which manages the camera lifecycle, processes the images (for example, rotating and cropping them), and runs OCR detection using the ML Kit library.
- Customize the process, in part or completely. For example, you can customize the image processing stage or use a different OCR library.
TextBlockTopology
`TextBlockTopology` is the central data structure for processing OCR results. An OCR library such as ML Kit provides the detected text and the position of each detection in the image. However, it makes no guarantees about the order or grouping of the detected words. For example, words on the same line could be detected as two different lines. The `TextBlockTopology` class uses the positions of the detected words to structure them in row-major order, in column-major order, and as matrices. This enables the SDK to provide easy-to-use APIs for querying and consuming the detections in more meaningful ways.

Consider an app that scans Gold membership cards. As shown in the sample card, the OCR library returns the detected words and the positions of their bounding rectangles. However, finding the member number and the expiration date is difficult, because the detected words can't be categorized as entities without additional processing. In such scenarios, `TextBlockTopology` provides an easy way to access the relevant information. "Valid" (or "Valid Thru") is a keyword common to all cards, which allows us to use "Valid" as an anchor.
We can then access the member number, which is printed above the anchor "Valid", by calling the `getPrevElementInCol` API on `TextBlockTopology`.
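To make the anchor idea concrete, here is a minimal, self-contained sketch in plain Java of what a "previous element in column" lookup could look like. This is an assumption-based illustration, not the SDK's implementation. The `Word` record, its bounding-box fields, and the `findAnchor` and `prevInColumn` helpers are hypothetical. The sketch also approximates "same column" as horizontal overlap of two bounding boxes.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch, NOT the SDK's TextBlockTopology implementation:
 * finds the word directly above an anchor word using only bounding boxes.
 */
public class AnchorLookup {

    /** A detected word and its bounding box in image coordinates (y grows downward). */
    record Word(String text, int left, int top, int right, int bottom) {}

    /** True if the two boxes overlap horizontally, which we treat as "same column". */
    static boolean sameColumn(Word a, Word b) {
        return a.left() < b.right() && b.left() < a.right();
    }

    /** Finds the first word whose text matches the keyword, ignoring case. */
    static Optional<Word> findAnchor(List<Word> words, String keyword) {
        return words.stream().filter(w -> w.text().equalsIgnoreCase(keyword)).findFirst();
    }

    /** Returns the nearest word above {@code anchor} in the same column, if any. */
    static Optional<Word> prevInColumn(List<Word> words, Word anchor) {
        Word best = null;
        for (Word w : words) {
            // Skip the anchor itself, words in other columns, and words not above the anchor
            if (w == anchor || !sameColumn(w, anchor) || w.bottom() > anchor.top()) {
                continue;
            }
            // Keep the candidate whose bottom edge is closest to the anchor's top edge
            if (best == null || w.bottom() > best.bottom()) {
                best = w;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Made-up detections for a card: name, member number, "Valid", expiration date
        List<Word> words = List.of(
                new Word("JANE", 10, 10, 60, 30),
                new Word("123456789", 10, 40, 120, 60),
                new Word("Valid", 10, 70, 50, 90),
                new Word("12/27", 10, 100, 60, 120));
        Word anchor = findAnchor(words, "valid").orElseThrow();
        // Prints 123456789, the word closest above "Valid" in its column
        System.out.println(prevInColumn(words, anchor).map(Word::text).orElse("not found"));
    }
}
```

A "next element in column" lookup for the expiration date would mirror this. It would keep words whose top edge is at or below the anchor's bottom edge and pick the one with the smallest top.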
You can also access all detected rows and columns by calling the `getRows` and `getCols` APIs.
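The following minimal sketch shows one way detected words could be organized into rows and columns from their bounding boxes alone. It is not how `TextBlockTopology` is implemented. The `Word` record and the `group`, `rows`, and `columns` helpers are hypothetical, and the overlap rule is a deliberately simple heuristic. A word joins the current group if its span overlaps the group's span along the grouping axis. Otherwise it starts a new group.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntFunction;

/**
 * Hypothetical sketch, NOT the SDK's TextBlockTopology implementation:
 * groups detected words into rows or columns by bounding-box overlap.
 */
public class TopologySketch {

    /** A detected word and its bounding box; y grows downward. */
    record Word(String text, int left, int top, int right, int bottom) {}

    /**
     * Sorts words by {@code start}, merges consecutive words whose [start, end)
     * spans overlap into one group, then orders each group by {@code order}.
     */
    static List<List<Word>> group(List<Word> words, ToIntFunction<Word> start,
                                  ToIntFunction<Word> end, ToIntFunction<Word> order) {
        List<Word> sorted = new ArrayList<>(words);
        sorted.sort(Comparator.comparingInt(start));
        List<List<Word>> groups = new ArrayList<>();
        int groupEnd = Integer.MIN_VALUE;
        for (Word w : sorted) {
            if (groups.isEmpty() || start.applyAsInt(w) >= groupEnd) {
                groups.add(new ArrayList<>()); // no overlap with the current group: start a new one
                groupEnd = end.applyAsInt(w);
            } else {
                groupEnd = Math.max(groupEnd, end.applyAsInt(w)); // extend the current group's span
            }
            groups.get(groups.size() - 1).add(w);
        }
        for (List<Word> g : groups) {
            g.sort(Comparator.comparingInt(order));
        }
        return groups;
    }

    /** Rows top to bottom, words left to right (row-major order). */
    static List<List<Word>> rows(List<Word> words) {
        return group(words, Word::top, Word::bottom, Word::left);
    }

    /** Columns left to right, words top to bottom (column-major order). */
    static List<List<Word>> columns(List<Word> words) {
        return group(words, Word::left, Word::right, Word::top);
    }

    public static void main(String[] args) {
        // Made-up detections, deliberately out of order
        List<Word> words = List.of(
                new Word("MEMBER", 70, 12, 150, 30),
                new Word("Valid", 10, 70, 50, 90),
                new Word("GOLD", 10, 10, 60, 30),
                new Word("12/27", 70, 72, 120, 90));
        // Prints [GOLD, MEMBER] and then [Valid, 12/27]
        for (List<Word> row : rows(words)) {
            System.out.println(row.stream().map(Word::text).toList());
        }
    }
}
```

Because this heuristic only compares axis-aligned boxes, it assumes the image has already been rotated upright. Skewed text lines would need an overlap tolerance or a line-fitting step.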
MlKitTextDetectionView
`MlKitTextDetectionView` is a `View` that performs OCR using the ML Kit library. You can use it like any ordinary `View` in your activity layout:
```xml
<?xml version="1.0" encoding="utf-8"?>
<FrameLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:id="@+id/camera_root"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".ocr.GoldCardDetectionActivity">

    <com.sap.cloud.mobile.fiori.ocr.MlKitTextDetectionView
        android:id="@+id/text_detection_view"
        android:layout_width="match_parent"
        android:layout_height="match_parent" />

</FrameLayout>
```
This View opens the camera if your app has been granted the camera permission (`android.permission.CAMERA`). If the permission has not been granted yet, the parent activity should request it.
Once the camera is open, `MlKitTextDetectionView` starts processing images. To receive the detected topology (`TextBlockTopology`), register a `TopologyDetectionObserver` on the `MlKitTextDetectionView` using the `setTopologyDetectionObserver` API.
TopologyDetectionObserver
`TopologyDetectionObserver` is designed to consume the detected `TextBlockTopology`.
For example, to process the generated topology for the Gold card, we can use:
```java
// Inside your activity, for example in onCreate() after setContentView()
MlKitTextDetectionView mTextDetectionView = findViewById(R.id.text_detection_view);
mTextDetectionView.setTopologyDetectionObserver(new TopologyDetectionObserver() {
    @Override
    public void onTopologyDetected(TextBlockTopology topology) {
        if (topology == null) {
            return;
        }
        // Find the anchor element: the keyword "Valid" that is printed on every card
        FioriOcrObservation.Element keywordElement = null;
        for (FioriOcrObservation.Element e : topology.getElements()) {
            if (e.getText().equalsIgnoreCase("valid")) {
                keywordElement = e;
                break;
            }
        }
        if (keywordElement == null) {
            return;
        }
        // The expiration date is below the anchor; the member number is above it
        FioriOcrObservation.Element validThroughDetailElement = topology.getNextElementInColumn(keywordElement);
        FioriOcrObservation.Element membershipNumberElement = topology.getPrevElementInColumn(keywordElement);
        if (validThroughDetailElement != null && membershipNumberElement != null
                && !membershipNumberElement.getText().isEmpty()
                && !validThroughDetailElement.getText().isEmpty()) {
            // The element above the member number (the first name) can be read the same way
            FioriOcrObservation.Element firstName = topology.getPrevElementInColumn(membershipNumberElement);
            // TODO: process the values, for example by opening another activity.
            // Stop the camera and detection now that the values have been found.
            mTextDetectionView.stop();
        }
    }
});
```