I have an ASP.NET app (using VB.NET) that allows a user to upload a file; the program should then parse certain information out of the file and insert it into a database. The file is a Word document, so when I try to read it with a plain FileStream, I get a lot of junk along with the text.
So I figured I would open it as a Word.Document and try to read the data from there. I have found NUMEROUS examples of how to manipulate data and create new Word documents using .NET, but I have not seen one where I can simply read the document line by line, which is what my program needs.
I have also explored the Word.Document and Word.ApplicationClass functions and can't find an answer.
Do you know of a good way to READ a Word document without the user first having to convert it to another format? I don't need to write or manipulate the data. Thank you.
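One way of doing this is to use the clipboard. You will need references to System.Windows.Forms and to the Word object library, and then you can use something like the following to get the text version of the document. Note that the Clipboard class only works on a single-threaded apartment (STA) thread; in an ASP.NET page that usually means setting AspCompat="true" in the @ Page directive.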
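```csharp
using System;
using System.Windows.Forms;                  // Clipboard, IDataObject, DataFormats
using Word = Microsoft.Office.Interop.Word;  // Word primary interop assembly

// ... inside a method running on an STA thread:

// new Word.Application() also compiles when interop types are embedded,
// where new Word.ApplicationClass() does not.
Word.Application app = new Word.Application();
// Refs to pass to the Open method for parameters we're not interested in.
object nullobj = System.Reflection.Missing.Value;
// Full file path.
object file = @"C:\MyDoc.doc";
try
{
    Word.Document doc = app.Documents.Open(
        ref file, ref nullobj, ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj, ref nullobj, ref nullobj);
    // Select and copy the text to the clipboard.
    doc.ActiveWindow.Selection.WholeStory();
    doc.ActiveWindow.Selection.Copy();
    IDataObject data = Clipboard.GetDataObject();
    // Do whatever you need with the text.
    string text = data.GetData(DataFormats.Text).ToString();
    Console.WriteLine(text);
    // Close the document.
    doc.Close(ref nullobj, ref nullobj, ref nullobj);
}
finally
{
    // Shut down Word even if something above throws, so the process doesn't linger.
    app.Quit(ref nullobj, ref nullobj, ref nullobj);
}
```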
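Since reading the document line by line is the point of the question, here is a minimal sketch of splitting the extracted text into lines. It assumes the text separates paragraphs with \r\n, \r or \n, and it also treats \v as a break on the assumption that Word's manual line breaks may come through as that character. WordTextLines and SplitLines are hypothetical names, not part of Word or of the code above, and tables or other special Word characters are not handled.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Word or the original answer: splits the plain
// text extracted from a Word document into trimmed, non-blank lines.
public static class WordTextLines
{
    // At each position Split tries these separators in order, so "\r\n" is listed
    // first and a Windows line break counts as one separator, not two.
    // "\v" is included on the assumption that manual line breaks may appear as char 11.
    private static readonly string[] LineBreaks = { "\r\n", "\r", "\n", "\v" };

    public static string[] SplitLines(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return new string[0];
        }

        return text
            .Split(LineBreaks, StringSplitOptions.None)
            .Select(line => line.Trim())      // remove leading/trailing spaces and tabs
            .Where(line => line.Length > 0)   // drop blank lines between paragraphs
            .ToArray();
    }
}
```

In the code above you could then replace the Console.WriteLine(text) call with a loop such as foreach (string line in WordTextLines.SplitLines(text)) { ... } and parse each line in turn. Blank lines are dropped here because they usually carry nothing to parse; keep them if line positions matter to you.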
Beware, though, that these automation objects actually start an instance of Word, so I wouldn't recommend this approach for a high-traffic app. Also remember that each instance uses somewhere between 10 and 20 MB of memory (with Word XP, in my case).