So, I figured I would use a Word.Document and try to read the data there. I have found NUMEROUS examples of how to create new Word documents and manipulate their data using .NET, but I have not seen an example that simply reads a document line by line, which is what my program needs to do.
I have also explored the members of Word.Document and Word.ApplicationClass and can't find an answer.
Do you know of a good way to READ a Word document without the user first having to convert it into another format? I don't need to write or manipulate the data. Thank you.
One way of doing this is using the clipboard. You will need a reference to System.Windows.Forms, and then you can use something like the following to get the text version of the document:
    // Requires: using System; using System.Windows.Forms;
    // and a project reference to the Word interop assembly.
    // The Clipboard class needs an STA thread, so mark Main with [STAThread].
    Word.Application app = new Word.ApplicationClass();

    // Refs to pass to the Open method for parameters we're not interested in.
    object nullobj = System.Reflection.Missing.Value;

    // Full file path.
    object file = @"C:\MyDoc.doc";

    Word.Document doc = app.Documents.Open(
        ref file, ref nullobj, ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj, ref nullobj, ref nullobj);

    // Select and copy text to clipboard.
    doc.ActiveWindow.Selection.WholeStory();
    doc.ActiveWindow.Selection.Copy();
    IDataObject data = Clipboard.GetDataObject();

    // Do whatever with the text.
    string text = data.GetData(DataFormats.Text).ToString();
    Console.WriteLine(text);

    // Close doc and shut down the Word application.
    doc.Close(ref nullobj, ref nullobj, ref nullobj);
    app.Quit(ref nullobj, ref nullobj, ref nullobj);
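Since the question asks about reading line by line, here is a minimal sketch of one way to split the copied text into lines once you have it as a string. The class and method names (DocumentLines, SplitLines) are hypothetical and not part of the Word object model; the sketch assumes the text has already been retrieved as shown above. Note that each entry corresponds to a line break in the copied text (in practice, a paragraph), not to a line as it wraps on screen.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only.
public static class DocumentLines
{
    // Splits plain text into lines. StringReader.ReadLine treats
    // "\r\n", "\r" and "\n" as line terminators, so the result does
    // not depend on which one appears in the copied text.
    public static List<string> SplitLines(string text)
    {
        List<string> lines = new List<string>();
        using (StringReader reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

You could then loop over DocumentLines.SplitLines(text) with foreach and process each entry in turn.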
Beware, though, that using these automation objects actually starts an instance of Word, so I wouldn't recommend this approach for a high-traffic app. Remember, too, that each instance created uses between 10 and 20 MB of memory (Word XP in my case).
This was first published in October 2003