File Formats Supported by TREX

Use

TREX can index and search the content and attributes of documents in many different file formats. The TREX preprocessor converts the text and attributes of each document into UTF-8 encoded HTML. To make all prevalent file formats, such as MS Word, MS PowerPoint, PDF, and HTML, available for indexing and searching, the preprocessor uses the file filters of special filter software.

Features

The table below lists all file formats that are currently supported by TREX.

Supported File Formats (May 2006/Version 8.1 of Filter Software)

File formats for text processing - generic

Versions

ASCII Text (7 & 8 bit versions available)

All versions

ANSI Text (7 & 8 bit)

All versions

EBCDIC (Extended Binary Coded Decimal Interchange Code)

All versions

HTML

Versions up to and including 3.0

IBM Revisable Form Text

All versions

IBM FFT

All versions

Microsoft Rich Text Format (RTF)

All versions

MHTML (MIME Encapsulation of Aggregate HTML Documents)

No specific version

Text Mail (MIME)

No specific version

Unicode Text

All versions

UUEncode

WML

Compatible with WML specification 5.2

XML

No specific version

Special Features of HTML Files and XML Files

TREX processes HTML files and XML files without filtering because no conversion to HTML is necessary. In principle, the lexicon software integrated in TREX ignores the text of the mark-up elements in the HTML and XML code, that is, the text between the tag brackets (<...>). Texts such as the font size and color within the tag <font size="7" color="#FF0000"> are therefore not passed on for indexing: this information occurs in many HTML files and is not characteristic of the content of an individual document.

Using the mark-up elements, you can configure which texts within HTML and XML documents should not be indexed. For example, this makes sense in the case of JavaScript program code, which is marked in HTML by the tags <script type="text/javascript"...> ... </script>. The JavaScript program code itself does not contain any characteristic content for the document in question and can thus be ignored.
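The behavior described above can be illustrated with a minimal sketch. It is not the TREX lexicon software or its configuration. It only uses the HTMLParser class from the Python standard library to show the idea: text inside tag brackets is dropped, and the content of configurable elements (here script) is skipped. The names IndexableTextExtractor, indexable_text, and skipped_elements are hypothetical.

```python
from html.parser import HTMLParser


class IndexableTextExtractor(HTMLParser):
    """Collects text outside tag brackets and skips configured elements."""

    def __init__(self, skipped_elements=("script",)):
        super().__init__(convert_charrefs=True)
        self.skipped_elements = set(skipped_elements)  # hypothetical setting
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Tag names and attributes (font size, color, ...) are never collected.
        if tag in self.skipped_elements:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skipped_elements and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside every skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def indexable_text(html, skipped_elements=("script",)):
    """Return the text that would be passed on for indexing in this sketch."""
    parser = IndexableTextExtractor(skipped_elements)
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    '<html><body><font size="7" color="#FF0000">Quarterly report</font>'
    '<script type="text/javascript">var total = 42;</script>'
    "<p>Revenue increased.</p></body></html>"
)
print(indexable_text(sample))  # Quarterly report Revenue increased.
```

Passing an empty tuple for skipped_elements keeps the JavaScript code in the result, which shows why excluding such elements keeps uncharacteristic terms out of the index.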

File formats for text processing - DOS

Versions

DEC WPS Plus (DX)

Versions up to and including 4.0

DEC WPS Plus (WPL)

Versions up to and including 4.1

DisplayWrite 2 & 3 (TXT)

All versions

DisplayWrite 4 & 5

Versions up to and including Release 2.0

Enable

Versions 3.0, 4.0, and 4.5

First Choice

Versions up to and including 3.0

Framework

Version 3.0

IBM Writing Assistant

Version 1.01

Lotus Manuscript

Versions up to and including 2.0

MASS11

Versions up to and including 8.0

Microsoft Word

Versions up to and including 6.0

Microsoft Works

Versions up to and including 2.0

MultiMate

Versions up to and including 4.0

Navy DIF

All versions

Nota Bene

Version 3.0

Novell WordPerfect

Versions up to and including 6.1

Office Writer

Versions 4.0 to 6.0

PC-File Letter

Versions up to and including 5.0

PC-File+ Letter

Versions up to and including 3.0

PFS:Write

Versions A, B, and C

Professional Write

Versions up to and including 2.1

Q&A

Version 2.0

Samna Word

Versions up to and including Samna Word IV+

SmartWare II

Version 1.02

Sprint

Version 1.0

Total Word

Version 1.2

Volkswriter 3 & 4

Versions up to and including 1.0

Wang PC (IWP)

Versions up to and including 2.6

WordMARC

Versions up to and including Composer Plus

WordStar

Versions up to and including 7.0

WordStar 2000 (DOS)

Versions up to and including 3.0

XyWrite

Versions up to and including III Plus

File formats for text processing - Windows

Versions

Adobe FrameMaker (MIF)

Up to and including version 6.0

Corel/Novell WordPerfect for Windows

Versions up to and including 10

Corel WordPerfect Suite for Windows

Version 12.0

Hangul

Version 97, 2002 (text only)

JustSystems Ichitaro

Versions 5.0, 6.0, 8.0, 9.0, 10.0, 13.0, and 2004

JustWrite

Versions up to and including 3.0

Legacy

Versions up to and including 1.1

Lotus AMI/AMI Professional

Versions up to and including 3.1

Lotus Word Pro (non-Windows)

Version 96 -- Millennium Edition 9.6, text only

Lotus Word Pro (non-Windows)

Microsoft Works for Windows

Versions up to and including 4.0

Microsoft Windows Write

Versions up to and including 3.0

Microsoft Word for Windows

Versions up to and including 2003

Microsoft WordPad

All versions

Novell Perfect Works

Version 2.0

Professional Write Plus

Version 1.0

Q&A Write for Windows

Version 3.0

StarOffice Writer for Windows and UNIX

Version 5.2, 6.X, 7.X; text only

OpenOffice

Version 1.1

WordStar for Windows

Version 1.0

File formats for text processing - Macintosh

Versions

MacWrite II

Version 1.1

Microsoft Word for Mac

Versions 3.0 - 4.0, 98, 2001, 2004, and v.X

Microsoft Works for Mac

Versions up to and including 2.0

Novell WordPerfect

Version 1.02 up to and including 3.0

Spreadsheet Formats

Versions

Enable

Versions 3.0, 4.0, and 4.5

First Choice

Versions up to and including 3.0

Framework

Version 3.0

Lotus 1-2-3 (DOS & Windows)

Versions up to and including 5.0

Lotus 1-2-3 (OS/2)

Versions up to and including 2.0

Lotus 1-2-3 Charts (DOS & Windows)

Versions up to and including 5.0

Lotus 1-2-3 for SmartSuite

SmartSuite 97, Millennium and Millennium 9.6

Lotus Symphony

Versions 1.0, 1.1, and 2.0

Microsoft Excel Charts

Versions 2.x - 7.0

Microsoft Excel Macintosh

Versions 3.0 - 98, 2004, and v.X

Microsoft Excel Windows

Version 2.2 up to and including 2003

Microsoft Multiplan

Version 4.0

Microsoft Works (DOS)

Versions up to and including 2.0

Microsoft Works (Mac)

Versions up to and including 2.0

Microsoft Works for Windows

Versions up to and including 4.0

Mosaic Twin

Version 2.5

Novell Perfect Works

Version 2.0

PFS:Professional Plan

Version 1.0

QuattroPro for DOS

Versions up to and including 5.0

QuattroPro for Windows

Versions up to and including version 12

SmartWare II

Version 1.02

StarOffice Calc for Windows and UNIX

Version 5.2, 6.X, 7.X; text only

OpenOffice

Version 1.1

SuperCalc 5

Version 4.0

VP Planner 3D

Version 1.0

Database Formats

Versions

Access

Versions up to and including 2.0

dBASE

Versions up to and including 5.0

DataEase

Version 4.x

dBXL

Version 1.3

Enable

Versions 3.0, 4.0, and 4.5

First Choice

Versions up to and including 3.0

FoxBase

Version 2.1

Framework

Version 3.0

Microsoft Works (DOS)

Versions up to and including 2.0

Microsoft Works (Mac)

Versions up to and including 2.0

Microsoft Works for Windows

Versions up to and including 4.0

Paradox (DOS)

Versions up to and including 4.0

Paradox (Windows)

Versions up to and including 1.0

Personal R:BASE

Version 1.0

R:BASE 5000

Versions up to and including 3.1

R:BASE System V

Version 1.0

Reflex

Version 2.0

Q & A

Versions up to and including 2.0

SmartWare II

Version 1.02

Presentation Formats

Versions

Corel/Novell Presentations

Versions up to and including 12

Harvard Graphics for DOS

Versions 2.x & 3.x

Harvard Graphics for Windows

Windows versions

Freelance for Windows

Versions up to and including Millennium Edition 9.6

Freelance for OS/2

Versions up to and including 2.0

Microsoft PowerPoint for Macintosh

Versions 4.0 up to and including 2004 and v.X

Microsoft PowerPoint for Windows

Versions 3.0 up to and including 2003

StarOffice Impress for Windows and UNIX

Versions 5.2 (text only), 6.X - 7.X (full support)

OpenOffice

Version 1.1 (text only)

Graphic Formats

Versions

For most graphic formats, only the graphic type and the name of the file are displayed. For some graphic formats, only maintained properties are indexed. Text inside graphics cannot be indexed.

Adobe FrameMaker Graphics (FMV)

Version 5.0

Adobe Illustrator

Versions up to and including 9.0

Adobe Photoshop (PSD)

Version 4.0

Adobe Portable Document Format (PDF)

Note

Text inside PDF documents can normally be indexed. Text inside graphics cannot be indexed. Some postscript fonts for text inside PDFs cannot be indexed.

For more information, refer to SAP Note 622419 (Embedded Fonts in PDF and Postscript Documents).

Versions up to and including 6.0 (incl. PDF 1.5)

AmiDraw (SDW)

Ami Draw

AutoCAD Interchange and Native Drawing Formats (DXF and DWG)

V. 2.5 - 2.6, 9.0 - 14.0, 2000i - 2002

AutoShade Rendering (RND)

Version 2.0

Binary Group 3 Fax

All versions

Bitmap (BMP, RLE, ICO, CUR, OS/2, DIB & WARP)

Windows

CALS Raster (GP4)

Type I and Type II

Corel Clipart format (CMX)

Versions 5 - 6

Corel Draw (CDR)

Versions 3.0 - 8.0

Corel Draw (CDR with TIFF header)

Versions 2.0 - 9.0

Computer Graphics Metafile (CGM)

ANSI, CALS NIST version 3.0

Encapsulated PostScript (EPS)

TIFF header only

GEM Paint (IMG)

All versions

Graphics Environment Mgr. (GEM)

Bitmap & Vector

Graphics Interchange Format (GIF)

All versions

Hewlett Packard Graphics Language (HPGL)

Version 2

IBM Graphics Data Format (GDF)

Version 1.0

IBM Picture Interchange Format (PIF)

Version 1.0

Initial Graphics Exchange Spec (IGES)

Version 5.1

JBIG2 (Joint Bi-level Image Experts Group)

JBIG2 graphic embeddings in PDF

JFIF (JPEG not in TIFF format)

All versions

JPEG (incl. EXIF)

All versions

Kodak Flash Pix (FPX)

All versions

Kodak Photo CD (PCD)

Version 1.0

Lotus PIC

All versions

Lotus Snapshot

All versions

Macintosh PICT1 & PICT2

Bitmap only

MacPaint (PNTG)

No specific version

MacroMedia Flash

Macromedia Flash 6.x and 7.x, and Macromedia Flash Lite

Micrografx Draw (DRW)

Versions up to and including 4.0

Micrografx Designer (DW)

Versions up to and including 3.1

Micrografx Designer (DSF)

Windows 95, Version 6.0

Novell PerfectWorks (Draw)

Version 2.0

OS/2 Bitmap

All versions

OS/2 PM Metafile (MET)

Version 3.0

Paint Shop Pro (PSP)

Versions 5.0 and 5.01

Paint Shop Pro 6 (PSP)

Win32 only

PC Paintbrush (PCX and DCX)

No specific version

Portable Bitmap (PBM)

All versions

Portable Graymap (PGM)

No specific version

Portable Network Graphics (PNG)

Version 1.0

Portable Pixmap (PPM)

No specific version

Postscript (PS)

Levels 1 - 2

Progressive JPEG

No specific version

StarOffice Draw for Windows and UNIX

Versions 2, 6.x, 7.x

Sun Raster (SRS)

No specific version

TIFF

Versions up to and including 6

TIFF CCITT Group 3 & 4

Versions up to and including 6

Truevision TGA (TARGA)

Version 2

Visio (Preview)

Version 4

Visio

Versions 5, 2000, 2002, and 2003

WBMP

No specific version

Windows Enhanced Metafile (EMF)

No specific version

Windows Metafile (WMF)

No specific version

WordPerfect Graphics (WPG & WPG2)

Versions up to and including 2.0

X-Windows Bitmap (XBM)

x10 compatible

X-Windows Dump (XDM)

x10 compatible

X-Windows Pixmap (XPM)

x10 compatible

Compressed File Formats

Versions

GZIP

No specific version

LZA Self Extracting Compress

No specific version

LZH Compress

No specific version

Microsoft Binder

Versions 7.0-97

MIME-encoded mail messages

No specific version

UNIX Compress

No specific version

UNIX TAR

No specific version

ZIP

PKWARE versions up to and including 2.04g

Special Features of Compressed File Formats (Archives)

The document content of files that are contained in an archive can only be indexed if TREX knows the file format of the files in question. The system uses the filter software to identify the type of files in the archive and filters the file content according to the file type identified. All files in an archive are handled as one large document.

If an archive contains files whose type the filter software does not recognize, the software may assign them to the wrong file type and filter them accordingly. For example, if the content of binary files (*.bin) is filtered and indexed by mistake, the index fills up with a large number of meaningless terms.

You can respond to this issue in two ways:

  1. You can exclude compressed file formats (archives) from processing by the preprocessor by removing the corresponding MIME type (for example, application/zip) from the TREXValidMimeTypes.ini configuration file.

    Note

    For more information about this procedure, see Excluding File Formats from Processing.

  2. You can modify the filter software configuration file, default.tpt, so that only the names of the files in the archive are indexed, not their content.

    Note

    For more information about this procedure, see SAP Note 900742.
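The two responses above can be sketched conceptually as follows. This is not the format of TREXValidMimeTypes.ini or default.tpt, which is not shown here. The mapping MIME_BY_SUFFIX, the set VALID_MIME_TYPES, and the function names are hypothetical and only illustrate the effect of each option, using the zipfile module from the Python standard library.

```python
import io
import zipfile
from pathlib import PurePath

# Hypothetical extension-to-MIME mapping and allowlist (not TREX's file format).
MIME_BY_SUFFIX = {
    ".txt": "text/plain",
    ".pdf": "application/pdf",
    ".zip": "application/zip",
}
# application/zip has been removed from the allowlist, as in response 1.
VALID_MIME_TYPES = {"text/plain", "application/pdf"}


def is_processed(filename):
    """Response 1: only files whose MIME type is on the allowlist are processed."""
    mime_type = MIME_BY_SUFFIX.get(PurePath(filename).suffix.lower())
    return mime_type in VALID_MIME_TYPES


def member_names_as_document(archive_bytes):
    """Response 2: index member names only; the archive stays one document."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
    return "\n".join(names)


def build_sample_archive():
    """Create a small in-memory archive with a text file and a binary file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("report.txt", "Quarterly revenue figures")
        archive.writestr("firmware.bin", bytes(range(256)))
    return buffer.getvalue()


sample = build_sample_archive()
print(is_processed("backup.zip"))        # False: archives are skipped entirely
print(member_names_as_document(sample))  # file names only, no binary content
```

With the first option the archive never reaches the filter software. With the second option the archive is still indexed as one document, but the binary content of firmware.bin cannot add meaningless terms to the index.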

Other File Formats

Versions

Executables (EXE, DLL)

No specific version

Executables for Windows NT

No specific version

Microsoft Office 2003 for Windows

Version 2003

Microsoft Outlook Message (MSG)

Text and HTML; codepage CP1252 (ISO 8859-1) and Unicode

Microsoft Project

Versions 2003/2002/2000/1998, text only

MP3 ID3 (Identify an MP3) Information

Signature

Version 1.0

vCalendar

No specific version

vCard Electronic Business Card

Version 2.1

Yahoo! Instant Messenger

Versions 6.x and 7.x