Sunday, 22 February 2009

Day Two

In the morning I did a small experiment to see how other domains represent text documents in XML. Mike had mentioned that, on Windows, there is a way of looking at how OpenOffice “.odt” files are represented in XML. The process was:

  1. Create an .odt file

  2. Rename its extension to “.zip”

  3. Extract the file

  4. Look at “Content.xml”
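
The same steps can be done without renaming anything, because an .odt file is already a zip archive. Below is a minimal sketch in Python using the standard zipfile module. The function name read_odt_content and the file name example.odt are my own placeholders, and I am assuming the member is stored in lower case as “content.xml”, which is how OpenOffice packages name it.

```python
import zipfile


def read_odt_content(odt_path, member="content.xml"):
    """Return the XML text of one member of an .odt package.

    An .odt file is a zip archive, so it can be opened directly
    without renaming its extension to .zip first.
    """
    with zipfile.ZipFile(odt_path) as package:
        # read() returns bytes; the XML inside the package is UTF-8
        return package.read(member).decode("utf-8")


# Hypothetical usage (example.odt is a placeholder file name):
# print(read_odt_content("example.odt")[:500])
```

Printing only the first few hundred characters is enough to see the overall shape of the XML before opening it in a proper viewer.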

The resulting file looked like this:

http://www.megaupload.com/?d=1VZLOA04

Although the “Content.xml” file may be a lot easier to read, I came to the conclusion that it had enough similarities with the XML file created from the PDF API that either process could be used to produce an example data set for this problem. When I revisited the PDF API, it became apparent that there were some extra command-line parameters that could produce additional useful elements in the XML; for example, some would include hyperlinks in the XML.

As the XML was quite hard to read in a simple document viewer, I looked for a specialist viewer and found one called “XML Marker”. This helped me understand the thinking behind how the PDF API represents a PDF file in XML. Before I had this tool I did not realise that there were “line” tags for blank lines, or that “word” tags between “text” tags meant that the words were on the same line.
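
To make that reading of the tags concrete, here is a rough sketch in Python of how lines of text could be rebuilt from such a file. The sample XML is invented for illustration and is not real output from the PDF API; the “page” wrapper, the exact nesting, and the function name rebuild_lines are all assumptions. Only the “text”, “word” and “line” tag names come from what I saw in the viewer.

```python
import xml.etree.ElementTree as ET

# Invented sample, NOT real PDF API output: only the tag names
# "text", "word" and "line" are taken from the observed XML.
SAMPLE = """<page>
  <text><word>Hello</word><word>world</word></text>
  <line/>
  <text><word>Next</word></text>
</page>"""


def rebuild_lines(xml_text):
    """Rebuild plain lines of text from the XML.

    Words inside one "text" element are treated as being on the same
    line, and a "line" element is treated as a blank line.
    """
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():  # walks the tree in document order
        if element.tag == "text":
            words = [w.text or "" for w in element.findall("word")]
            lines.append(" ".join(words))
        elif element.tag == "line":
            lines.append("")
    return lines
```

With the invented sample above, rebuild_lines(SAMPLE) gives three lines: “Hello world”, a blank line, and “Next”.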

This helped me reach the conclusion that the PDF API would be sufficient for creating the data set used for this project, at least until the IBM issues have been resolved.

Next I tried to find documentation that was large enough and was presented in the way I'd imagine the IBM documentation might be presented. I found this link:

www.reportlab.com/docs/PyRXP_Documentation.pdf

I processed it to see how the PDF API would handle it, and it was fine. Please see the resulting XML at:

http://www.megaupload.com/?d=FKARW4FV

It's important to remember to make the program very flexible about changes of data set, as the IBM one may need to be substituted for the XML one created here. I know this is a general principle of programming, but it is a reminder to carry forward.

The second half of the day involved a trip to the library to get some books on requirements, as I have not done any kind of requirements gathering since year two. The books I found were:

  • Effective Requirements Practices – Ralph R. Young

  • Software Requirements and Specifications – Michael Jackson

(Add to Bibliography)

The remainder of the day was spent evaluating approaches to requirements.

