150
PUBLIC
© 2017 SAP SE or an SAP affiliate company. All rights reserved. ǀ
Search
The following steps are executed on unstructured text :
File format filtering
Converts any binary document format to text/HTML
Language detection
Identifies language to apply appropriate tokenization
and stemming
Tokenization
Decomposes word sequences
E.g.
“card-based payment systems”
“card” “based” “payment”
“systems”
Stemming
Normalizes tokens to linguistic base form
E.g.
houses
house
;
ran
run
Full-text index
‘Attaches’ to the table column




