You can use a list of MIME types in the configuration file TREXValidMimeTypes.ini to control which file formats TREX processes. MIME types for graphic formats such as image/jpeg, image/gif, and image/bmp are not listed in the configuration file, although the filter software integrated into TREX supports these formats (see Supported File Formats). This exclusion spares TREX the unnecessary load of processing these formats, because indexing images and graphics is not normally useful. There may be other scenarios in which it makes sense to exclude certain file formats.
A company archives its financial statements in the form of PDF files. These files contain mostly figures, with hardly any relevant text information. Processing these large files would slow TREX down unnecessarily without making the content any easier to index. It therefore makes sense to exclude these files from processing.
Procedure
To exclude the document content of a particular file format from processing by TREX, remove the corresponding MIME types from the configuration file TREXValidMimeTypes.ini. Proceed as follows to do this.
The configuration file TREXValidMimeTypes.ini is located in the TREX installation directory. The path to the directory is:
You do not want TREX to process PDF files because such files contain no relevant text information for your scenario. You remove the entry application/pdf from the list of MIME types in the configuration file TREXValidMimeTypes.ini.
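As a rough illustration of this example, the following Python sketch drops one MIME type from the text of a configuration file. The layout of TREXValidMimeTypes.ini is not described here, so the sketch assumes one MIME type per line, which may not match the real file. The function name remove_mime_type and the sample content are hypothetical.

```python
def remove_mime_type(config_text: str, mime_type: str) -> str:
    """Return config_text without the lines that name mime_type.

    Assumption: each entry sits on its own line. The real layout of
    TREXValidMimeTypes.ini may differ, so adapt the matching as needed.
    """
    target = mime_type.strip().lower()
    kept = [
        line
        for line in config_text.splitlines()
        if line.strip().lower() != target
    ]
    return "\n".join(kept) + "\n"


# Hypothetical file content, for illustration only.
sample = "application/msword\napplication/pdf\ntext/html\n"
updated = remove_mime_type(sample, "application/pdf")
print(updated)
```

The function works on text rather than on the file itself, so the result can be checked before anything is written back to the installation directory.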
List of MIME Types in the Configuration File TREXValidMimeTypes.ini
MIME Type | File Extension | Application |
---|---|---|
application/andrew-inset | ec | |
application/dca-rft | rft | IBM Revisable Form Text |
application/excel | xls | MS EXCEL |
application/macwriteii | MWII | MacWrite II |
application/msword | doc, dot | MS Word |
application/oda | oda | CALS Raster (GP4) |
application/pdf | | Adobe PDF |
application/powerpoint | ppt | MS PowerPoint |
application/rtf | rtf | Rich Text Format |
application/smil | smil, smi | |
application/vnd.lotus-1-2-3 | 123, w4, w3, w1 | Lotus 1-2-3 |
application/vnd.lotus-freelance | prz, pre | Lotus Freelance |
application/vnd.lotus-wordpro | lwp, sam | Lotus WordPro |
application/vnd.ms-excel | xls, xlb | MS EXCEL |
application/vnd.ms-powerpoint | ppt, pps, pot | MS PowerPoint |
application/vnd.ms-wpl | wpl | DEC WPS Plus (WPL) |
application/wordperfect5.1 | wp5 | Word Perfect 5.1 |
application/x-123 | w1, wk3, wk4, wks | Lotus 1-2-3 (DOS & Windows) |
application/x-cdlink | vcd | |
application/x-chess-pgn | pgn | |
application/x-compress | | UNIX compress |
application/x-csh | csh | UNIX CShell Script |
application/x-dvi | dvi | |
application/x-freelance | pre | Freelance for Windows |
application/x-gtar | gtar | GNU UNIX tar archive |
application/x-gzip | gz, tgz | GNU Zip compressed data |
application/x-httpd-php | | |
application/x-javascript | js | JavaScript |
application/x-latex | latex | LaTeX |
application/x-maker | frm, maker, frame, rm, fb, book, fbdoc | Adobe FrameMaker |
application/x-mif | mif | Adobe FrameMaker (MIF) |
application/x-msdos-program | dll | Dynamic Link Library |
application/x-msexcel | xls, xlb | MS EXCEL |
application/x-msmetafile | wmf | MS Metafile |
application/x-netcdf | nc, cdf | |
application/x-ns-proxy-autoconfig | pac | Netscape Proxy Auto Config |
application/x-perl | pl, pm | Perl Program |
application/x-sh | sh | UNIX Bourne Shell Script |
application/x-tar | tar | UNIX tar Archive |
application/x-tcl | tcl | TCL Script |
application/x-tex | tex | |
application/x-texinfo | texinfo, texi | |
application/x-troff | t, tr, troff | UNIX troff document |
application/x-troff-man | man | UNIX man page |
application/x-troff-me | me | UNIX troff document |
application/x-troff-ms | ms | UNIX troff document |
application/x-ustar | ustar | |
application/x-wais-source | src | |
application/xlc | xlc | |
application/zip | zip | |

Note
File formats of the MIME types text/*, including HTML, XML, and plain text formats such as *.txt and *.rtf, are processed by TREX without being filtered.

MIME Type | File Extension | Application |
---|---|---|
text/asp | asp | Active Server Pages |
text/css | css | Cascading Style Sheets |
text/html | html, htm, shtml | Hypertext Markup Language |
text/plain | txt, c, ec, cpp, h, hpp, eml, sap | |
text/richtext | rtx | |
text/rtf | rtf | |
text/src-c | c | |
text/src-c++ | cpp | |
text/src-java | java | |
text/src-perl | perl | |
text/src-tcl | tcl | |
text/tab-separated-values | tsv | |
text/thtml | | |
text/vnd.wap.wml | wml | |
text/wiki | | |
text/wml | wml | |
text/x-asm | | |
text/x-setext | | |
text/x-sgml | | |
text/x-ssi-html | | |
text/x-uil | | |
text/x-uuencode | | |
text/x-vCalendar | | |
text/x-vCard | | |
text/xml | xml | Extensible Markup Language |
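
The effect of the list can be summarized in a small Python sketch. It reflects one reading of the description above and not the actual TREX implementation: a MIME type that is not listed is skipped, a listed text/* type is processed without being filtered, and any other listed type goes through the filter first. The function name processing_mode and its return values are hypothetical.

```python
def processing_mode(mime_type: str, valid_types: set[str]) -> str:
    """Classify how a document of the given MIME type would be handled.

    "skip"   -> type is not listed, so the content is not processed
    "direct" -> listed text/* type, processed without being filtered
    "filter" -> any other listed type, passed through the filter first
    """
    normalized = mime_type.strip().lower()
    # Compare case-insensitively, since entries such as text/x-vCard
    # contain uppercase letters.
    listed = {entry.strip().lower() for entry in valid_types}
    if normalized not in listed:
        return "skip"
    if normalized.startswith("text/"):
        return "direct"
    return "filter"


# A small excerpt of the list, with application/pdf removed as in the example.
valid = {"application/msword", "text/html", "text/x-vCard"}

for candidate in ("application/pdf", "text/html", "application/msword", "image/jpeg"):
    print(candidate, "->", processing_mode(candidate, valid))
```

Under these assumptions, application/pdf and image/jpeg are both skipped because neither appears in the list, even though the integrated filter supports them.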