Extracting Images, Embedded Files, and Text from Office 2007 or Newer

Before we get started, note that this tutorial is for Office 2007 or newer. If you have a copy of Microsoft Office from before 2007, you will need to refer to our other tutorial on image extraction.

Getting Started

Take this example: A friend or colleague sends you a Word document filled with images. You really like these images and want to save them to your computer’s hard drive. That's why I decided to give you a guide about extracting images and more from Microsoft Word, PowerPoint, and Excel.

Even better, you can extract these embedded files, text, and images without having to save them one by one. You also don’t need to download any weird programs or apps.

Office XML-based file formats are just compressed archives that work very similarly to .zip files. These include the .docx, .xlsx, and .pptx formats. Open them as you would any .zip file and then you can extract the images, text, and embedded files that you want. Windows has .zip support built right in, so no extra software is required, although you can use 7-Zip or a similar app if you prefer.
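If you are comfortable with a little scripting, you can confirm the "it's really a .zip" claim yourself. The following is a minimal sketch using Python's standard zipfile module; the function name list_office_parts is just an illustrative choice, not part of any Office tool.

```python
import zipfile


def list_office_parts(path):
    """Return the names of every item stored inside an Office 2007+ file.

    The file is opened directly as a zip archive, so there is no need
    to rename it first.
    """
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Calling list_office_parts on a Word document should return entries such as "word/document.xml", plus anything stored under "word/media/" if the document contains images.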

Extracting Items from Newer Files in Office (.docx, .xlsx, .pptx)

First of all, open File Explorer (in Windows 7, this is called Windows Explorer) and select the file you want to extract from. Press F2 to rename the file and add .zip to the end of its extension, but don’t edit the main part of the filename. To finish, hit “Enter” and click “Yes” when the Rename dialog box pops up.

Windows will now identify the file as a .zip archive. To extract the content, right-click on the filename and choose “Extract All.” This will open a new dialog box where you can “Select a Destination and Extract Files.” There is an edit box underneath the heading “Files will be extracted to this folder,” and it should already be filled in with a folder based on the file you are extracting from.

If you want a different folder, click on “Browse” and choose a different one.

To see the extracted files in File Explorer as soon as the extraction finishes, make sure the “Show extracted files when complete” checkbox is checked. Then, hit “Extract.”
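The same extraction can be scripted if you have many files to process. Here is a rough sketch in Python, assuming you are happy with a destination folder named after the document; the function name extract_office_file and the "_extracted" suffix are my own choices for illustration.

```python
import zipfile
from pathlib import Path


def extract_office_file(path, destination=None):
    """Unpack an Office 2007+ file into a folder and return that folder.

    If no destination is given, a folder named "<filename>_extracted"
    is used next to the original file (an assumption for this sketch).
    """
    source = Path(path)
    if destination is None:
        destination = source.with_name(source.stem + "_extracted")
    target = Path(destination)
    with zipfile.ZipFile(source) as archive:
        archive.extractall(target)
    return target
```

Because the script reads the file directly, the original document keeps its normal extension and never has to be renamed.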

Accessing Your Extracted Images

The extracted contents will include a “word” folder (or an “xl” or “ppt” folder, depending on which program created the file). Double-click on the folder to open it, then double-click on the folder titled “media.” This will show you all of the extracted images, which are raw originals from the document.
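If the images are all you want, a short script can copy just the “media” folder and skip everything else. This is a sketch only: it assumes the images sit under word/media, xl/media, or ppt/media inside the archive, and extract_images is a hypothetical helper name.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Folder prefixes inside the archive where images are expected (assumption).
MEDIA_FOLDERS = ("word/media/", "xl/media/", "ppt/media/")


def extract_images(path, destination):
    """Copy every file from the archive's media folder into destination."""
    out_dir = Path(destination)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Skip directory entries and anything outside a media folder.
            if name.startswith(MEDIA_FOLDERS) and not name.endswith("/"):
                target = out_dir / PurePosixPath(name).name
                target.write_bytes(archive.read(name))
                saved.append(target)
    return saved
```

The function returns the list of saved files, so you can quickly check how many images the document contained.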

Accessing Your Extracted Text

What if you want to extract content from a Word document, but you don’t have Microsoft Office on your computer? Well, you can access the extracted text via the “word” folder, which contains a “document.xml” file. This type of file can be opened in Microsoft Word, Notepad, WordPad, or another text editor, although the most compatible one is XML Notepad, which is free. The text appears as chunks of plain text between XML tags, no matter the formatting style used in the original Word document. While you’re at it, you should also download the free LibreOffice, which allows you to easily read Microsoft Office documents.
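Picking the text out of document.xml by eye can get tedious, so here is a minimal Python sketch that does it for you. It assumes the usual WordprocessingML layout, where each paragraph is a w:p element and the visible text lives in w:t elements; extract_docx_text is a name I made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the w: prefix in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(path):
    """Return the document's text, one line per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join the text runs that make up this paragraph.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)
```

This only reads the main document body. Headers, footers, and footnotes are stored in separate XML files, and text inside text boxes may show up more than once, so treat the result as a quick dump rather than a faithful copy.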

Extracting Embedded Attached Files and OLE Objects

If you don’t have Microsoft Word, but you want to open embedded files in a Word document, then you need to open the document in WordPad, the basic word processor built into Windows. Although WordPad might not display all of the embedded file icons, they are still there. Since WordPad doesn’t support all of the features of Microsoft Word, some of the content might not display normally, or some embedded files may appear to have incomplete filenames. Even so, you can still access this content.

Right-click on an embedded PDF file and you will see the option to “Open PDF Object.” Clicking this will open the file in your computer’s default PDF reader, and that is where you can save the content to your computer’s hard drive.

If WordPad isn’t giving you that option, check the embedded file’s type.

Is it a .docx file? Maybe it’s a .mp3 file? Navigate to the folder titled “Files from [Document]” and double-click on the “word” folder, followed by the “embeddings” folder.

You will notice that the filenames do not show the file types. Rather, they all end in the “.bin” file extension. If you have an idea of the content you want to extract, you may be able to guess which file it is based on how large the file is. For instance, an MP3 file is usually larger than a PDF file.
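File size is only a rough clue, so you may also want to peek inside each .bin file for a telltale signature. The sketch below is a heuristic, not a proper parser: it assumes the embedded payload sits somewhere inside the .bin wrapper and simply searches for a few well-known file signatures. The names guess_embedded_type and describe_embeddings are hypothetical, and a match can occasionally be a false positive.

```python
from pathlib import Path

# Signature found at the start of OLE compound files, the usual .bin wrapper.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# A few well-known file signatures to search for (not an exhaustive list).
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip-based (docx, xlsx, pptx, or zip)",
}


def guess_embedded_type(data):
    """Return a label for the earliest known signature found in data."""
    hits = [(data.find(sig), label) for sig, label in SIGNATURES.items()]
    hits = [(pos, label) for pos, label in hits if pos != -1]
    if not hits:
        return "unknown"
    return min(hits)[1]


def describe_embeddings(folder):
    """List (name, size in bytes, is OLE wrapper, guessed type) per .bin file."""
    report = []
    for path in sorted(Path(folder).glob("*.bin")):
        data = path.read_bytes()
        report.append((path.name, len(data),
                       data.startswith(OLE_MAGIC),
                       guess_embedded_type(data)))
    return report
```

Point describe_embeddings at the “embeddings” folder and compare the guesses with the file sizes to narrow down which .bin file holds the content you are after.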

After extracting the items from the zipped file, you can change the file extension back to what it was before (.docx, .xlsx, or .pptx). The file will remain properly formatted and will be accessible in its corresponding Office program.