
PUBLIC

© 2017 SAP SE or an SAP affiliate company. All rights reserved.

Search

The following steps are executed on unstructured text:

File format filtering

Converts any binary document format to text/HTML
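Before a filter can convert a binary document, it has to work out which format it is dealing with. The sketch below illustrates only that first sub-step, using the common "magic bytes" technique. The signature table and the `sniff_format` name are hypothetical and do not describe SAP's actual filter implementation.

```python
# Minimal sketch (assumption): detect a document's format from its leading
# "magic" bytes so the right converter can be chosen. Not SAP's filter list.

MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip-container",  # DOCX, XLSX, PPTX and ODT are ZIP archives
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2",  # legacy .doc/.xls/.ppt
}


def sniff_format(data: bytes) -> str:
    """Return a coarse format label for raw document bytes."""
    for signature, label in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return label
    # No known signature: treat bytes that decode as UTF-8 as plain text.
    try:
        data.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "unknown"


print(sniff_format(b"%PDF-1.7 ..."))
print(sniff_format("plain text".encode("utf-8")))
```

A real filter would then hand each recognized format to a dedicated converter that extracts the text or HTML.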

Language detection

Identifies the document language so that language-appropriate tokenization and stemming can be applied
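To make language detection concrete, here is a toy detector that picks the language whose common function words ("stopwords") occur most often in the text. The tiny stopword lists and the `detect_language` name are hypothetical. Production detectors typically rely on character n-gram statistics, so treat this only as an illustration of the idea.

```python
import re

# Hypothetical, deliberately tiny stopword samples per language code.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "for"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "für"},
    "fr": {"le", "la", "et", "les", "des", "est", "pour"},
}


def detect_language(text: str, default: str = "en") -> str:
    """Return the language code whose stopwords occur most often in text."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # With no stopword hits at all, fall back to the default language.
    return best if scores[best] > 0 else default


print(detect_language("Der Hund ist nicht in der Küche"))  # de
```

The detected code would then select which tokenizer and stemmer the later steps use.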

Tokenization

Decomposes word sequences into individual tokens

E.g. “card-based payment systems” → “card”, “based”, “payment”, “systems”
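A minimal tokenizer that reproduces the example above can treat every run of word characters as a token, so hyphens, spaces and punctuation all act as boundaries. Lowercasing the tokens is an added assumption here; the slide does not say whether the real tokenizer folds case.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase tokens; any non-word character is a boundary."""
    return re.findall(r"\w+", text.lower())


print(tokenize("card-based payment systems"))
```

Real tokenizers are language-specific (which is why language detection runs first); for example, languages written without spaces need dictionary-based segmentation rather than a simple regular expression.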

Stemming

Normalizes tokens to their linguistic base form

E.g. houses → house; ran → run
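The two examples above need different mechanisms: "houses → house" follows a suffix rule, while "ran → run" is irregular and can only be resolved by a dictionary lookup (strictly speaking, lemmatization). The toy normalizer below combines both; the exception table, the single suffix rule and the `normalize` name are hypothetical and far simpler than a real stemmer.

```python
# Hypothetical exception table for irregular forms that no suffix rule covers.
IRREGULAR_FORMS = {"ran": "run", "mice": "mouse", "went": "go"}


def normalize(token: str) -> str:
    """Map a lowercase token to a base form using toy rules."""
    if token in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[token]
    # Strip a trailing plural/third-person "s", but leave "ss" endings
    # (e.g. "class") and short words (e.g. "is", "bus") untouched.
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


print([normalize(t) for t in ["houses", "ran", "systems", "class"]])
```

Because the same normalization is applied to both indexed text and queries, a search for "house" can match a document that only contains "houses".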

Full-text index

‘Attaches’ to the table column
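One way to picture an index that "attaches" to a column is an inverted index built from that column's values: each normalized term points to the set of rows containing it. The `ColumnFullTextIndex` class and `analyze` helper below are a hypothetical, in-memory sketch that reuses simplified versions of the tokenization and stemming steps; they do not represent SAP's internal index structures.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Tokenize and crudely normalize text (simplified versions of the steps above)."""
    tokens = re.findall(r"\w+", text.lower())
    return [
        t[:-1] if len(t) > 3 and t.endswith("s") and not t.endswith("ss") else t
        for t in tokens
    ]


class ColumnFullTextIndex:
    """Hypothetical inverted index over one text column: term -> row ids."""

    def __init__(self, rows: list[str]) -> None:
        self.postings = defaultdict(set)  # term -> set of row ids
        for row_id, text in enumerate(rows):
            self.add(row_id, text)

    def add(self, row_id: int, text: str) -> None:
        """Index one column value under its row id."""
        for term in analyze(text):
            self.postings[term].add(row_id)

    def search(self, query: str) -> set[int]:
        """Return ids of rows containing every query term (AND semantics)."""
        terms = analyze(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


column = ["card-based payment systems", "mobile payment app", "houses for sale"]
index = ColumnFullTextIndex(column)
print(index.search("payment system"))
```

Queries run through the same `analyze` pipeline as the column values, which is what lets "payment system" match the row containing "payment systems".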