Editorial Team
Asked: May 11, 2026

So I have about 4,000 Word docs that I’m attempting to extract the text from and insert into a db table. This works swimmingly until the processor encounters a document that has the *.doc file extension but is actually an RTF. Now I know POI doesn’t support RTFs, which is fine, but I do need a way to determine if a *.doc file is actually an RTF so that I can choose to ignore the file and continue processing.

I’ve tried several techniques to overcome this, including using ColdFusion’s MimeTypeUtils. However, it seems to base its guess at the MIME type on the file extension and still classifies the RTF as application/msword. Is there any other way to determine if a *.doc is an RTF? Any help would be hugely appreciated.

1 Answer

  1. Added an answer on May 11, 2026 at 3:39 pm

    With CF8 and compatible:

    <cffunction name='IsRtfFile' returntype='Boolean' output='false'>
        <cfargument name='FileName' type='String' />
        <cfreturn Left(FileRead(Arguments.FileName),5) EQ '{\rtf' />
    </cffunction>

    For earlier versions:

    <cffunction name='IsRtfFile' returntype='Boolean' output='false'>
        <cfargument name='FileName' type='String' />
        <cfset var FileData = 0 />
        <cffile variable='FileData' action='read' file='#Arguments.FileName#' />
        <cfreturn Left(FileData,5) EQ '{\rtf' />
    </cffunction>

    Update: A better CF8/compatible answer. To avoid loading the whole file into memory, you can do the following to load just the first few characters:

    <cffunction name='IsRtfFile' returntype='Boolean' output='false'>
        <cfargument name='FileName' type='String' />
        <cfset var FileData = 0 />

        <cfloop index='FileData' file='#Arguments.FileName#' characters='5'>
            <cfbreak/>
        </cfloop>

        <cfreturn FileData EQ '{\rtf' />
    </cffunction>

    Based on the comments:
    Here’s a very quick sketch of how you might write a generic ‘what format is this’ type of function. Not perfect, but it gives you the idea…

    <cffunction name='determineFileFormat' returntype='String' output='false'
        hint="Determines format of file based on header of the file's data."
        >
        <cfargument name='FileName' type='String'/>
        <cfset var FileData = 0 />
        <cfset var CurFormat = 0 />
        <cfset var MaxBytes = 8 />
        <cfset var Formats =
            { WordNew  : 'D0,CF,11,E0,A1,B1,1A,E1'
            , WordBeta : '0E,11,FC,0D,D0,CF,11,E0'
            , Rtf      : '7B,5C,72,74,66' <!--- {\rtf --->
            , Jpeg     : 'FF,D8'
            }/>

        <!--- Read only the first MaxBytes characters, then stop. --->
        <cfloop index='FileData' file='#Arguments.FileName#' characters='#MaxBytes#'>
            <cfbreak/>
        </cfloop>

        <!--- Compare the start of the file against each known signature. --->
        <cfloop item='CurFormat' collection='#Formats#'>
            <cfif Left( FileData , ListLen(Formats[CurFormat]) ) EQ convertToText(Formats[CurFormat]) >
                <cfreturn CurFormat />
            </cfif>
        </cfloop>

        <cfreturn 'Unknown'/>
    </cffunction>

    <cffunction name='convertToText' returntype='String' output='false'>
        <cfargument name='HexList' type='String' />
        <cfset var Result = '' />
        <cfset var CurItem = 0 />

        <!--- Turn a list of hex byte values into the matching characters. --->
        <cfloop index='CurItem' list='#Arguments.HexList#'>
            <cfset Result &= Chr(InputBaseN(CurItem,16)) />
        </cfloop>

        <cfreturn Result />
    </cffunction>

    Of course, it’s worth pointing out that none of this will work on ‘headerless’ formats, including many common text-based ones (CFM, CSS, JS, etc.).
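One caveat with the character-based reads above: signatures containing bytes above 7F (such as D0 CF 11 E0) may not survive being decoded as text, depending on the charset in effect. A minimal sketch of a byte-level alternative follows, assuming CF8-style `FileOpen` in `readBinary` mode and `BinaryEncode`. The names `readHeaderHex` and `isRtfByHeader` and the `DocFolder` path are hypothetical, made up for illustration, and behaviour on empty or very short files is not handled here.

```cfml
<!--- Hypothetical helper: returns up to MaxBytes leading bytes as uppercase hex. --->
<cffunction name="readHeaderHex" returntype="string" output="false">
    <cfargument name="FileName" type="string" required="true" />
    <cfargument name="MaxBytes" type="numeric" default="8" />
    <cfset var FileObj = FileOpen(Arguments.FileName, "readBinary") />
    <cfset var HeaderHex = "" />

    <cftry>
        <!--- Read raw bytes, so no charset decoding is involved. --->
        <cfset HeaderHex = UCase(BinaryEncode(FileRead(FileObj, Arguments.MaxBytes), "hex")) />
        <cfcatch type="any">
            <!--- Close the handle before passing the error on. --->
            <cfset FileClose(FileObj) />
            <cfrethrow />
        </cfcatch>
    </cftry>
    <cfset FileClose(FileObj) />

    <cfreturn HeaderHex />
</cffunction>

<!--- Hypothetical helper: true when the file starts with {\rtf (7B 5C 72 74 66). --->
<cffunction name="isRtfByHeader" returntype="boolean" output="false">
    <cfargument name="FileName" type="string" required="true" />
    <cfreturn Left(readHeaderHex(Arguments.FileName, 5), 10) EQ "7B5C727466" />
</cffunction>

<!--- Usage sketch: skip disguised RTF files during the batch run.
      DocFolder is a placeholder path, not taken from the question. --->
<cfset DocFolder = "C:\path\to\docs" />
<cfif DirectoryExists(DocFolder)>
    <cfdirectory action="list" directory="#DocFolder#" filter="*.doc" name="DocFiles" />
    <cfloop query="DocFiles">
        <cfset CurFile = DocFiles.Directory & "/" & DocFiles.Name />
        <cfif NOT isRtfByHeader(CurFile)>
            <!--- Extract the text with POI and insert it into the table here. --->
        </cfif>
    </cfloop>
</cfif>
```

Comparing hex strings keeps the signature table readable and sidesteps `Chr()` conversions. The same `readHeaderHex` output could be matched against the other signatures in `determineFileFormat` with the commas removed.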
