Ask the Expert

How can I get my app to read a Word doc line-by-line?

I have an ASP.NET app (using VB.NET) that lets a user upload a file; the program should then parse certain information out of the file and insert it into a database. The file is a Word document, so when I simply read it with a FileStream, I get a lot of binary junk along with the text.

So I figured I would open it as a Word.Document and read the data from there. I have found NUMEROUS examples of how to manipulate data and create new Word documents using .NET, but I have not seen one that simply reads a document line by line, which is what my program needs.

I have also explored the members of Word.Document and Word.ApplicationClass and can't find an answer.

Do you know of a good way to READ a Word document without the user having to convert it to another format first? I don't need to write or manipulate the data. Thank you.

One way of doing this is to use the clipboard. You will need a reference to System.Windows.Forms (as well as to the Word interop assembly), and then you can use something like the following C# code to get the text version of the document:

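using System;
using System.Windows.Forms;                  // Clipboard, IDataObject, DataFormats
using Word = Microsoft.Office.Interop.Word;  // Word interop assembly (Office PIA namespace)

class WordTextExtractor
{
    // Clipboard access requires a single-threaded apartment (STA) thread.
    [STAThread]
    static void Main()
    {
        Word.Application app = new Word.ApplicationClass();
        //Refs to pass to the Open/Close/Quit methods for parameters we're not interested in.
        object nullobj = System.Reflection.Missing.Value;

        //Full file path.
        object file = @"C:\MyDoc.doc";

        Word.Document doc = null;
        try
        {
            //This argument list matches the Word XP (2002) Open signature;
            //other Word versions take a different number of optional arguments.
            doc = app.Documents.Open(
                ref file, ref nullobj, ref nullobj,
                ref nullobj, ref nullobj, ref nullobj,
                ref nullobj, ref nullobj, ref nullobj,
                ref nullobj, ref nullobj, ref nullobj,
                ref nullobj, ref nullobj, ref nullobj);

            //Select and copy text to clipboard.
            doc.ActiveWindow.Selection.WholeStory();
            doc.ActiveWindow.Selection.Copy();

            IDataObject data = Clipboard.GetDataObject();
            //Do whatever with the text.
            string text = data.GetData(DataFormats.Text).ToString();
            Console.WriteLine(text);
        }
        finally
        {
            //Close doc and shut down Word even if something above failed,
            //so no orphaned Word process is left running.
            if (doc != null)
                doc.Close(ref nullobj, ref nullobj, ref nullobj);
            app.Quit(ref nullobj, ref nullobj, ref nullobj);
        }
    }
}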
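
Note that Word stores paragraphs, not lines: how a paragraph wraps into visual lines is computed at layout time and is not part of the text. So in practice, reading "line by line" means splitting the extracted `text` string at its paragraph breaks. The sketch below assumes that string as input; the `WordTextLines` class and its `SplitLines` method are hypothetical names, not part of Word or .NET. It accepts CRLF, CR, or LF breaks, since Word itself marks paragraph ends with a carriage return while clipboard text usually uses CRLF.

```csharp
using System;

// Hypothetical helper: splits plain text copied out of Word into its paragraphs ("lines").
public static class WordTextLines
{
    // "\r\n" is listed first so a CRLF pair is treated as one break, not two.
    private static readonly string[] LineBreaks = { "\r\n", "\r", "\n" };

    public static string[] SplitLines(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // RemoveEmptyEntries drops blank paragraphs, including the empty
        // string that would otherwise follow the final paragraph mark.
        return text.Split(LineBreaks, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For example, `WordTextLines.SplitLines("Name: Bob\r\nDept: IT\r\n")` returns the two strings `"Name: Bob"` and `"Dept: IT"`. In the code above, you could replace `Console.WriteLine(text)` with a `foreach` over the result and pick out whatever fields you need before inserting them into the database.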

Beware, though, that these automation objects actually start an instance of Word, so I wouldn't recommend this approach for a high-traffic app. Also remember that each instance uses somewhere between 10 and 20 MB of memory (with Word XP, in my case).
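
One more caveat for ASP.NET specifically: the Clipboard class only works on a single-threaded apartment (STA) thread, and ASP.NET normally services requests on MTA threads (unless the page sets AspCompat="true"). A minimal sketch, assuming .NET 2.0 or later on Windows: the hypothetical `StaRunner.Run` helper starts a dedicated STA thread, runs the extraction there, waits for it to finish, and rethrows any failure on the calling thread.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs a piece of work on a dedicated STA thread and returns its result.
public static class StaRunner
{
    public static T Run<T>(Func<T> work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));

        T result = default(T);
        Exception error = null;

        var thread = new Thread(() =>
        {
            try
            {
                result = work();
            }
            catch (Exception ex)
            {
                error = ex; // capture the failure so the caller can see it
            }
        });

        // The apartment state must be set before the thread starts (Windows only).
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join(); // block the caller until the work has finished

        if (error != null)
            throw new InvalidOperationException("Work on the STA thread failed.", error);
        return result;
    }
}
```

From a page you might call `string text = StaRunner.Run(() => ExtractText(path));`, where `ExtractText` is a hypothetical method wrapping the Word/clipboard code above. Because each call still blocks the request until Word finishes, this does not lift the high-traffic caveat; it only makes the clipboard step usable from a page.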

This was first published in October 2003
