
Converting .docx document to RTF format in Unix


3 replies to this topic

#1 reholmes

reholmes

  • Members
  • 1 posts
  • OFFLINE
  •  
  • Local time:03:09 PM

Posted 08 June 2018 - 09:15 AM

Hello,

 

I have .docx files on my Unix file system that I need to convert to RTF format.  Does anyone know of a program/utility that will do that conversion ACCURATELY?  I work for a health care company and we have an interface engine that will receive the files, convert them, insert the converted text into a message, then send that into our electronic medical record system.

 

Thanks for your help!





#2 britechguy

britechguy

    Been there, done that, got the T-shirt


  • Moderator
  • 8,689 posts
  • OFFLINE
  •  
  • Gender:Male
  • Location:Staunton, VA
  • Local time:03:09 PM

Posted 08 June 2018 - 09:31 AM

What are you using as your word processor on your machine?   Virtually all of them can save a .docx file to .rtf format with ease, and vice versa.

 

LibreOffice has a command-line interface that lets you feed it files from a shell script and convert them from one format to another.  See:

 

https://ask.libreoffice.org/en/question/438/how-to-batch-convert-microsoft-office-files-from-commandline/ 

 

If you do a web search on whatever your office suite happens to be along with "command line" and "convert" you'll likely pull up something. 
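
As a rough sketch of the LibreOffice route, the conversion could be driven from a script along these lines. It assumes LibreOffice is installed and that its launcher is on the PATH as `soffice` (on some systems it is `libreoffice` instead); the function names `build_convert_command` and `convert_to_rtf` are made up for illustration. How accurate the resulting RTF is depends on the document, so test it against real samples first.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, soffice="soffice"):
    """Return the argument list for a headless .docx -> .rtf conversion."""
    return [
        soffice,                 # assumed launcher name; may differ per system
        "--headless",            # run without opening a GUI window
        "--convert-to", "rtf",   # target format, chosen by file extension
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_to_rtf(docx_path, out_dir, soffice="soffice"):
    """Run the conversion and return the path of the .rtf file it produced."""
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice!r} not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(docx_path, out_dir, soffice), check=True)
    # Check for the output file explicitly rather than trusting the exit status.
    rtf_path = out_dir / (Path(docx_path).stem + ".rtf")
    if not rtf_path.exists():
        raise RuntimeError(f"LibreOffice did not produce {rtf_path}")
    return rtf_path
```

Calling `convert_to_rtf("report.docx", "converted")` would be expected to leave `converted/report.rtf` behind; both names here are placeholders.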


Edited by britechguy, 08 June 2018 - 09:32 AM.
The URL somehow got "eaten" the first time I posted

Brian  AKA  Bri the Tech Guy (website in my user profile) - Windows 10 Home, 64-Bit, Version 1803, Build 17134 

     . . . the presumption of innocence, while essential in the legal realm, does not mean the elimination of common sense outside it.  The willing suspension of disbelief has its limits, or should.

    ~ Ruth Marcus,  November 10, 2017, in Washington Post article, Bannon is right: It’s no coincidence The Post broke the Moore story


 

 

 

              

 


#3 JohnC_21

JohnC_21

  • Members
  • 24,311 posts
  • OFFLINE
  •  
  • Gender:Male
  • Local time:03:09 PM

Posted 08 June 2018 - 10:00 AM

I don't think you will get anything using a command line to get RTF. The best you would get is plain text. Because .docx is a Microsoft format, even LibreOffice cannot convert some .docx files perfectly. A .docx file is a ZIP archive of XML files.

 

https://stackoverflow.com/questions/5671988/how-to-extract-just-plain-text-from-doc-docx-files-unix
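
To illustrate the "zipped file" point, here is a minimal sketch that pulls plain text out of a .docx using only the Python standard library. It assumes the main body lives in `word/document.xml` inside the archive, which is the usual layout; `docx_to_text` is a hypothetical helper name. It ignores headers, footers, footnotes and all formatting, and text inside nested structures such as text boxes may come out duplicated.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source):
    """Return the body text of a .docx, one line per paragraph.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

This only demonstrates the plain-text fallback; it does not produce RTF.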



#4 britechguy


Posted 08 June 2018 - 10:13 AM

There are, as the link to stackoverflow given by JohnC_21 shows, many possible document converters out there.

 

From what I can tell, though, LibreOffice has gotten better at these sorts of conversions, and I am not pushing it as "the best" option.  Although this example goes in "the opposite direction," it shows command-line processing of an RTF file, converting it to PDF:  https://listarchives.libreoffice.org/global/users/msg14695.html

 

I agree that it would be far better were RTF not needed, period, and unless a system that accepts only RTF files already exists, I definitely would not go that route.  Plain text is more reliably extracted and far more easily processed with Linux/Unix commands like sed and awk to get the bits you want from it.  RTF is mostly used when "pretty to the human eye" formatting is required, and if the RTFs are never destined to be used by humans then I'd dump 'em.



 




