How can I get my app to read a Word doc line-by-line?

I have an ASP.NET app (using VB.NET) that allows a user to upload a file; the program should then parse certain info out of the file and insert it into a database. The file is a Word document, so when I read it with a plain FileStream, I get a lot of binary junk mixed in with the text.

So, I figured I would use a Word.Document and try to read the data there. I have found NUMEROUS examples of how to manipulate data and create new Word documents using .NET, but I have not seen an example where I can just read the document line by line. This is important to my program.

I have also explored the Word.Document and Word.ApplicationClass functions and can't find an answer.

Do you know of a good way to READ a Word document without making the user convert it to another format first? I don't need to write or manipulate the data. Thank you.

One way to do this is through the clipboard. You will need a reference to System.Windows.Forms (for the Clipboard class) in addition to your Word reference, and then you can use something like the following, shown here in C#, to get a plain-text version of the document:

//Requires "using System.Windows.Forms;" for Clipboard, IDataObject and DataFormats.
Word.Application app = new Word.ApplicationClass();

//Placeholder to pass to the Open method for parameters we're not interested in.
object nullobj = System.Reflection.Missing.Value;

//Full file path.
object file = @"C:\MyDoc.doc";

Word.Document doc = app.Documents.Open(
 ref file, ref nullobj, ref nullobj, 
 ref nullobj, ref nullobj, ref nullobj, 
 ref nullobj, ref nullobj, ref nullobj, 
 ref nullobj, ref nullobj, ref nullobj, 
 ref nullobj, ref nullobj, ref nullobj);

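//Select the whole document and copy its text to the clipboard.
doc.ActiveWindow.Selection.WholeStory();
doc.ActiveWindow.Selection.Copy();

//Read the copied text back from the clipboard.
IDataObject data = Clipboard.GetDataObject();
//Do whatever with the text.
string text = data.GetData(DataFormats.Text).ToString();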

//Close doc and shutdown Word application.
doc.Close(ref nullobj, ref nullobj, ref nullobj);
app.Quit(ref nullobj, ref nullobj, ref nullobj);
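
Since the question asks for line-by-line access, here is a minimal sketch of one way to walk through the retrieved string a line at a time. It assumes the plain text copied from Word ends each paragraph with a line break, so a "line" here means a Word paragraph rather than a line as wrapped on screen. The class name LineSplitter, the method ToLines and the sample text are made up for illustration, and the generic List needs .NET 2.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineSplitter
{
    // Splits text into lines. StringReader.ReadLine treats "\r", "\n"
    // and "\r\n" as line endings, so mixed endings are handled.
    public static List<string> ToLines(string text)
    {
        List<string> lines = new List<string>();
        using (StringReader reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    static void Main()
    {
        // Hypothetical stand-in for the string read from the clipboard.
        string text = "Name: Jane Doe\r\nDept: Sales\r\n";
        foreach (string line in ToLines(text))
        {
            Console.WriteLine(line);
        }
    }
}
```

In the snippet above, you would pass the text variable to ToLines and parse each returned line before inserting it into the database.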

Beware, though, that using these automation objects actually starts an instance of Word, so I wouldn't recommend this approach for a high-traffic app. Remember, too, that each instance uses roughly 10 to 20 MB of memory (Word XP in my case).

This was last published in October 2003
