Documents whose content and attributes can be indexed and searched by TREX can exist in many different file formats. The TREX preprocessor converts the document text and attributes from these file formats into UTF-8 encoded HTML. It uses the file filters of special filter software so that all prevalent file formats, such as MS Word, MS PowerPoint, PDF, and HTML, can subsequently be indexed and searched.
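As a rough illustration of what "converting to UTF-8 encoded HTML" means for a plain-text format, the following Python sketch decodes text from a source encoding and wraps it in a minimal HTML page. It is not the TREX preprocessor or its filter software. The function name text_to_utf8_html, the page layout, and the choice of IBM code page 037 as the EBCDIC variant are assumptions made for this example.

```python
import html


def text_to_utf8_html(raw, source_encoding, title=""):
    """Decode raw text bytes and wrap them in a minimal UTF-8 encoded HTML page.

    Hypothetical helper for illustration only; the real preprocessor also
    converts document attributes and handles many binary formats.
    """
    text = raw.decode(source_encoding)
    page = (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        "<title>" + html.escape(title) + "</title>"
        "</head><body>"
        # Escape characters that would otherwise be read as mark-up.
        "<pre>" + html.escape(text) + "</pre>"
        "</body></html>"
    )
    return page.encode("utf-8")


# Example input: EBCDIC text, assumed here to use IBM code page 037.
ebcdic_bytes = "Price < 10 & rising".encode("cp037")
print(text_to_utf8_html(ebcdic_bytes, "cp037", "sample").decode("utf-8"))
```

The escaping step matters because characters such as < and & in the source text would otherwise be interpreted as mark-up in the resulting HTML.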
The table below lists all file formats that are currently supported by TREX.
Supported File Formats (May 2006/Version 8.1 of Filter Software)
File formats for text processing - generic |
Versions |
ASCII Text (7 & 8 bit versions available) |
All versions |
ANSI Text (7 & 8 bit) |
All versions |
EBCDIC (Extended Binary Coded Decimal Interchange Code) |
All versions |
HTML |
Versions up to and including 3.0 |
IBM Revisable Form Text |
All versions |
IBM FFT |
All versions |
Microsoft Rich Text Format (RTF) |
All versions |
MHTML (MIME Encapsulation of Aggregate HTML Documents) |
No specific version |
Text Mail (MIME) |
No specific version |
Unicode Text |
All versions |
UUEncode |
|
WML |
Compatible with WML specification 5.2 |
XML |
No specific version |
Special Features of HTML Files and XML Files
TREX processes HTML files and XML files without filtering, because conversion to HTML is not necessary. The lexicon software integrated in TREX generally ignores the text of the mark-up elements in the HTML and XML code, that is, the text located between the tag brackets (<...>). For example, the font size and color values within the tag <font size="7" color="#FF0000"> are not passed on for indexing, because this information occurs in many HTML files and is therefore not characteristic of the document content.
You can use the mark-up elements to configure which text within HTML and XML documents is not indexed. This makes sense, for example, for JavaScript program code, which is marked in HTML by the tags <script type="text/javascript"...> ... </script>. JavaScript program code does not contain any content that is characteristic of the document in question and can therefore be ignored.
For more information, see Excluding Parts of XML and HTML Files From Indexing.
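The behavior described above can be sketched in Python with the standard html.parser module. This is only an illustration of the principle, not the TREX lexicon software or its configuration. The class IndexableTextExtractor, the function extract_indexable_text, and the excluded_tags parameter are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class IndexableTextExtractor(HTMLParser):
    """Collects text content while ignoring mark-up and excluded elements."""

    def __init__(self, excluded_tags=("script",)):
        super().__init__(convert_charrefs=True)
        self.excluded_tags = set(excluded_tags)
        self._skip_depth = 0  # greater than 0 while inside an excluded element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Tag names and attribute values (size, color, ...) are never collected.
        if tag in self.excluded_tags:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.excluded_tags and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that lies outside every excluded element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_indexable_text(markup, excluded_tags=("script",)):
    parser = IndexableTextExtractor(excluded_tags)
    parser.feed(markup)
    parser.close()
    return parser.text()


sample = (
    '<html><body><font size="7" color="#FF0000">Quarterly report</font>'
    '<script type="text/javascript">var x = 1;</script>'
    "<p>Revenue grew.</p></body></html>"
)
print(extract_indexable_text(sample))  # prints: Quarterly report Revenue grew.
```

In this sketch, the font attributes and the JavaScript code never reach the output, so only text that is characteristic of the document would be passed on for indexing.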
File formats for text processing - DOS |
Versions |
DEC WPS Plus (DX) |
Versions up to and including 4.0 |
DEC WPS Plus (WPL) |
Versions up to and including 4.1 |
DisplayWrite 2 & 3 (TXT) |
All versions |
DisplayWrite 4 & 5 |
Versions up to and including Release 2.0 |
Enable |
Versions 3.0, 4.0, and 4.5 |
First Choice |
Versions up to and including 3.0 |
Framework |
Version 3.0 |
IBM Writing Assistant |
Version 1.01 |
Lotus Manuscript |
Versions up to and including 2.0 |
MASS11 |
Versions up to and including 8.0 |
Microsoft Word |
Versions up to and including 6.0 |
Microsoft Works |
Versions up to and including 2.0 |
MultiMate |
Versions up to and including 4.0 |
Navy DIF |
All versions |
Nota Bene |
Version 3.0 |
Novell WordPerfect |
Versions up to and including 6.1 |
Office Writer |
Versions 4.0 to 6.0 |
PC-File Letter |
Versions up to and including 5.0 |
PC-File+ Letter |
Versions up to and including 3.0 |
PFS:Write |
Versions A, B, and C |
Professional Write |
Versions up to and including 2.1 |
Q&A |
Version 2.0 |
Samna Word |
Versions up to and including Samna Word IV+ |
SmartWare II |
Version 1.02 |
Sprint |
Version 1.0 |
Total Word |
Version 1.2 |
Volkswriter 3 & 4 |
Versions up to and including 1.0 |
Wang PC (IWP) |
Versions up to and including 2.6 |
WordMARC |
Versions up to and including Composer Plus |
WordStar |
Versions up to and including 7.0 |
WordStar 2000 (DOS) |
Versions up to and including 3.0 |
XyWrite |
Versions up to and including III Plus |
File formats for text processing - Windows |
Versions |
Adobe FrameMaker (MIF) |
Up to and including version 6.0 |
Corel/Novell WordPerfect for Windows |
Versions up to and including 10 |
Corel WordPerfect Suite for Windows |
Version 12.0 |
Hangul |
Version 97, 2002 (text only) |
JustSystems Ichitaro |
Versions 5.0, 6.0, 8.0, 9.0, 10.0, 13.0, and 2004 |
JustWrite |
Versions up to and including 3.0 |
Legacy |
Versions up to and including 1.1 |
Lotus AMI/AMI Professional |
Versions up to and including 3.1 |
Lotus Word Pro (non-Windows) |
Version 96 -- Millennium Edition 9.6, text only |
Lotus Word Pro (non-Windows) |
|
Microsoft Works for Windows |
Versions up to and including 4.0 |
Microsoft Windows Write |
Versions up to and including 3.0 |
Microsoft Word for Windows |
Versions up to and including 2003 |
Microsoft WordPad |
All versions |
Novell Perfect Works |
Version 2.0 |
Professional Write Plus |
Version 1.0 |
Q&A Write for Windows |
Version 3.0 |
StarOffice Writer for Windows and UNIX |
Version 5.2, 6.X, 7.X; text only |
OpenOffice |
Version 1.1 |
WordStar for Windows |
Version 1.0 |
File formats for text processing - Macintosh |
Versions |
MacWrite II |
Version 1.1 |
Microsoft Word for Mac |
Versions 3.0 - 4.0, 98, 2001, 2004, and v.X |
Microsoft Works for Mac |
Versions up to and including 2.0 |
Novell WordPerfect |
Version 1.02 up to and including 3.0 |
Spreadsheet Formats |
Versions |
Enable |
Versions 3.0, 4.0, and 4.5 |
First Choice |
Versions up to and including 3.0 |
Framework |
Version 3.0 |
Lotus 1-2-3 (DOS & Windows) |
Versions up to and including 5.0 |
Lotus 1-2-3 (OS/2) |
Versions up to and including 2.0 |
Lotus 1-2-3 Charts (DOS & Windows) |
Versions up to and including 5.0 |
Lotus 1-2-3 for SmartSuite |
SmartSuite 97, Millennium and Millennium 9.6 |
Lotus Symphony |
Versions 1.0, 1.1, and 2.0 |
Microsoft Excel Charts |
Versions 2.x - 7.0 |
Microsoft Excel Macintosh |
Versions 3.0 - 98, 2004, and v.X |
Microsoft Excel Windows |
Version 2.2 up to and including 2003 |
Microsoft Multiplan |
Version 4.0 |
Microsoft Works (DOS) |
Versions up to and including 2.0 |
Microsoft Works (Mac) |
Versions up to and including 2.0 |
Microsoft Works for Windows |
Versions up to and including 4.0 |
Mosaic Twin |
Version 2.5 |
Novell Perfect Works |
Version 2.0 |
PFS:Professional Plan |
Version 1.0 |
QuattroPro for DOS |
Versions up to and including 5.0 |
QuattroPro for Windows |
Versions up to and including version 12 |
SmartWare II |
Version 1.02 |
StarOffice Calc for Windows and UNIX |
Version 5.2, 6.X, 7.X; text only |
OpenOffice |
Version 1.1 |
SuperCalc 5 |
Version 4.0 |
VP Planner 3D |
Version 1.0 |
Database Formats |
Versions |
Access |
Versions up to and including 2.0 |
dBASE |
Versions up to and including 5.0 |
DataEase |
Version 4.x |
dBXL |
Version 1.3 |
Enable |
Versions 3.0, 4.0, and 4.5 |
First Choice |
Versions up to and including 3.0 |
FoxBase |
Version 2.1 |
Framework |
Version 3.0 |
Microsoft Works (DOS) |
Versions up to and including 2.0 |
Microsoft Works (Mac) |
Versions up to and including 2.0 |
Microsoft Works for Windows |
Versions up to and including 4.0 |
Paradox (DOS) |
Versions up to and including 4.0 |
Paradox (Windows) |
Versions up to and including 1.0 |
Personal R:BASE |
Version 1.0 |
R:BASE 5000 |
Versions up to and including 3.1 |
R:BASE System V |
Version 1.0 |
Reflex |
Version 2.0 |
Q & A |
Versions up to and including 2.0 |
SmartWare II |
Version 1.02 |
Presentation Formats |
Versions |
Corel/Novell Presentations |
Versions up to and including 12 |
Harvard Graphics for DOS |
Versions 2.x & 3.x |
Harvard Graphics for Windows |
Windows versions |
Freelance for Windows |
Versions up to and including Millennium Edition 9.6 |
Freelance for OS/2 |
Versions up to and including 2.0 |
Microsoft PowerPoint for Macintosh |
Versions 4.0 up to and including 2004 and v.X |
Microsoft PowerPoint for Windows |
Versions 3.0 up to and including 2003 |
StarOffice Impress for Windows and UNIX |
Versions 5.2 (text only), 6.X - 7.X (full support) |
OpenOffice |
Version 1.1 (text only) |
Graphic Formats |
Versions |
In most cases, only the graphic type and the name of the file are displayed for graphic formats. For some graphic formats, only maintained properties are indexed. Text inside graphics cannot be indexed. |
|
Adobe FrameMaker Graphics (FMV) |
Version 5.0 |
Adobe Illustrator |
Versions up to and including 9.0 |
Adobe Photoshop (PSD) |
Version 4.0 |
Adobe Portable Document Format (PDF). Note: Text inside PDF documents can normally be indexed. Text inside graphics cannot be indexed. Some PostScript fonts for text inside PDFs cannot be indexed. For more information, refer to SAP Note 622419 (Embedded Fonts in PDF and Postscript Documents). |
Versions up to and including 6.0 (incl. PDF 1.5) |
AmiDraw (SDW) |
Ami Draw |
AutoCAD Interchange and Native Drawing Formats (DXF and DWG) |
V. 2.5 - 2.6, 9.0 - 14.0, 2000i - 2002 |
AutoShade Rendering (RND) |
Version 2.0 |
Binary Group 3 Fax |
All versions |
Bitmap (BMP, RLE, ICO, CUR, OS/2, DIB & WARP) |
Windows |
CALS Raster (GP4) |
Type I and Type II |
Corel Clipart format (CMX) |
Versions 5 - 6 |
Corel Draw (CDR) |
Versions 3.0 - 8.0 |
Corel Draw (CDR with TIFF header) |
Versions 2.0 - 9.0 |
Computer Graphics Metafile (CGM) |
ANSI, CALS NIST version 3.0 |
Encapsulated PostScript (EPS) |
TIFF header only |
GEM Paint (IMG) |
All versions |
Graphics Environment Mgr. (GEM) |
Bitmap & Vector |
Graphics Interchange Format (GIF) |
All versions |
Hewlett Packard Graphics Language (HPGL) |
Version 2 |
IBM Graphics Data Format (GDF) |
Version 1.0 |
IBM Picture Interchange Format (PIF) |
Version 1.0 |
Initial Graphics Exchange Spec (IGES) |
Version 5.1 |
JBIG2 (Joint Bi-level Image Experts Group) |
JBIG2 graphic embeddings in PDF |
JFIF (JPEG not in TIFF format) |
All versions |
JPEG (incl. EXIF) |
All versions |
Kodak Flash Pix (FPX) |
All versions |
Kodak Photo CD (PCD) |
Version 1.0 |
Lotus PIC |
All versions |
Lotus Snapshot |
All versions |
Macintosh PICT1 & PICT2 |
Bitmap only |
MacPaint (PNTG) |
No specific version |
MacroMedia Flash |
Macromedia Flash 6.x and 7.x, and Macromedia Flash Lite |
Micrografx Draw (DRW) |
Versions up to and including 4.0 |
Micrografx Designer (DW) |
Versions up to and including 3.1 |
Micrografx Designer (DSF) |
Windows 95, Version 6.0 |
Novell PerfectWorks (Draw) |
Version 2.0 |
OS/2 Bitmap |
All versions |
OS/2 PM Metafile (MET) |
Version 3.0 |
Paint Shop Pro (PSP) |
Versions 5.0 and 5.01 |
Paint Shop Pro 6 (PSP) |
Win32 only |
PC Paintbrush (PCX and DCX) |
No specific version |
Portable Bitmap (PBM) |
All versions |
Portable Graymap (PGM) |
No specific version |
Portable Network Graphics (PNG) |
Version 1.0 |
Portable Pixmap (PPM) |
No specific version |
Postscript (PS) |
Levels 1 - 2 |
Progressive JPEG |
No specific version |
StarOffice Draw for Windows and UNIX |
Versions 5.2, 6.x, 7.x |
Sun Raster (SRS) |
No specific version |
TIFF |
Versions up to and including 6 |
TIFF CCITT Group 3 & 4 |
Versions up to and including 6 |
Truevision TGA (TARGA) |
Version 2 |
Visio (Preview) |
Version 4 |
Visio |
Versions 5, 2000, 2002, and 2003 |
WBMP |
No specific version |
Windows Enhanced Metafile (EMF) |
No specific version |
Windows Metafile (WMF) |
No specific version |
WordPerfect Graphics (WPG & WPG2) |
Versions up to and including 2.0 |
X-Windows Bitmap (XBM) |
x10 compatible |
X-Windows Dump (XWD) |
x10 compatible |
X-Windows Pixmap (XPM) |
x10 compatible |
Compressed File Formats |
Versions |
GZIP |
No specific version |
LZA Self Extracting Compress |
No specific version |
LZH Compress |
No specific version |
Microsoft Binder |
Versions 7.0-97 |
MIME-encoded mail messages |
No specific version |
UNIX Compress |
No specific version |
UNIX TAR |
No specific version |
ZIP |
PKWARE versions up to and including 2.04g |
Special Features of Compressed File Formats (Archives)
The document content of files that are contained in an archive can only be indexed if TREX knows the file format of the files in question. The system uses the filter software to identify the type of files in the archive and filters the file content according to the file type identified. All files in an archive are handled as one large document.
The filter software may sometimes assign a file in an archive whose type it does not recognize to the wrong file type and filter it accordingly. For example, if the content of binary files (*.bin) is filtered by mistake and then indexed, the index fills up with a large number of meaningless terms.
You can respond to this issue in two ways:
You can exclude compressed file formats (archives) from processing by the preprocessor by removing the corresponding MIME type (for example, application/zip) from the TREXValidMimeTypes.ini configuration file.
For more information about this procedure, see Excluding File Formats from Processing.
You can modify the filter software configuration file, default.tpt, so that only the names of the files in the archive, and not their content, are indexed.
For more information about this procedure, see SAP Note 900742.
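To illustrate the effect of the second option (indexing file names instead of file content), the following Python sketch lists the member names of a ZIP archive without reading any member content. It does not show the default.tpt syntax or the filter software itself. The function archive_member_names and the sample file names are assumptions made for this example.

```python
import io
import zipfile


def archive_member_names(archive_bytes):
    """Return the names of the files in a ZIP archive without reading their content."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]


# Build a small in-memory archive with one text file and one binary file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report.txt", "Quarterly figures")
    archive.writestr("firmware.bin", bytes(range(256)))

names = archive_member_names(buffer.getvalue())
print(names)  # prints: ['report.txt', 'firmware.bin']
```

Because only the names are collected, the binary content of firmware.bin cannot add meaningless terms to an index built from this output.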
Other File Formats |
Versions |
Executables (EXE, DLL) |
No specific version |
Executables for Windows NT |
No specific version |
Microsoft Office 2003 for Windows |
Version 2003 |
Microsoft Outlook Message (MSG) |
Text and HTML; codepage CP1252 (ISO 8859-1) and Unicode |
Microsoft Project |
Versions 2003/2002/2000/1998, text only |
MP3 ID3 (Identify an MP3) Information |
|
Signature |
Version 1.0 |
vCalendar |
No specific version |
vCard Electronic Business Card |
Version 2.1 |
Yahoo! Instant Messenger |
Versions 6.x and 7.x |