Project Blog: Day Four

The first half of the day involved investigating XML manipulation APIs; the general view was that JDOM was the richest. I had issues installing it, but yet again realised that it belonged in Java's \ext folder!

The API took a while to get familiar with, as I have more experience using C#. However, I found this example from IBM, which was a huge help:

http://www.ibm.com/developerworks/library/x-injava2/

Then I was getting invalid XML errors, which took a while to work out. It turned out that the API was putting certain closing tags in the wrong places. Upon further investigation, the "-font" parameter on the command line turned out to be the cause. Conveniently, the tags were redundant anyway, so I now had valid XML.

This was the error:

Error on line 9 of document file:///C:/pdfoutput.xml: The element type "timesnewromanps-boldmt" must be terminated by the matching end-tag "</timesnewromanps-boldmt>".
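For anyone wanting to reproduce this kind of check in isolation, here is a minimal sketch. It uses the XML parser built into the JDK rather than JDOM, and the class and method names (WellFormedCheck, firstError) are made up for illustration, so treat it as an approximation of what my prototype does rather than the actual code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports the first well-formedness error in an XML string.
class WellFormedCheck {

    /** Returns null if the XML parses cleanly, otherwise "line N: message". */
    static String firstError(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // A mismatched or misplaced closing tag ends up here.
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
    }
}
```

Calling WellFormedCheck.firstError("<a><b>text</a>") should return a message pointing at line 1, because the "b" element is never closed; a well-formed string returns null.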

Then I was faced with another problem: the program was reporting this exception:

Could not check

because Invalid byte 1 of 1-byte UTF-8 sequence.

It turned out the XML file was no longer in UTF-8. To solve this, I remembered a text editor I used on placement called UltraEdit. I opened the file in UltraEdit, re-saved it with UTF-8 encoding, and it worked fine. So I now have a small XML parsing prototype.
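The same re-save could probably be done in code instead of in an editor. Below is a rough sketch using only the standard Java library. I never found out what encoding the file was actually in, so the source charset is a parameter you have to guess at, and the names (ReencodeToUtf8, toUtf8, reencodeFile) are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: rewrites text from a known (or guessed) charset as UTF-8.
class ReencodeToUtf8 {

    /** Decodes the bytes using sourceCharset and encodes the result as UTF-8. */
    static byte[] toUtf8(byte[] raw, Charset sourceCharset) {
        return new String(raw, sourceCharset).getBytes(StandardCharsets.UTF_8);
    }

    /** Reads source in sourceCharset and writes the same text to target as UTF-8. */
    static void reencodeFile(Path source, Path target, Charset sourceCharset)
            throws IOException {
        Files.write(target, toUtf8(Files.readAllBytes(source), sourceCharset));
    }
}
```

For example, reencodeFile(in, out, StandardCharsets.ISO_8859_1) would convert a Latin-1 file. If the XML declaration names an encoding, it would need to say UTF-8 as well, otherwise the parser may still complain.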

The second half of the day was spent going through, and making notes on, the academic documents highlighted in the project proposal:

http://www.iiit.net/~pkreddy/wdm03/wdm/vldb01.pdf

http://books.google.co.uk/books?hl=en&lr=&id=30UsZ8hy2ZsC&oi=fnd&pg=PR7&dq=cognitive+psychology+and+computing&ots=1LyL05Rj1i&sig=bX5q1YNTfqlTOANUSctLSS6f49g

http://books.google.co.uk/books?id=Rdwv-r5RlOcC&pg=PA88&dq=Attentional+models&lr=

http://books.google.co.uk/books?id=RZ-6cTL8YZsC&printsec=frontcover&dq=web+service&lr=

http://books.google.co.uk/books?id=hXTfWDkqnlkC&printsec=frontcover&dq=xml+indexing&lr=

So the lit review has begun!