The first half of the day involved investigating XML manipulation APIs. The general view was that JDOM was the richest. I had issues installing it, but yet again realised that it belonged in Java's \ext folder!
The API took a while to get familiar with, as I have more experience using C#. However, I found this example from IBM, which was a huge help:
http://www.ibm.com/developerworks/library/x-injava2/
Then I was getting invalid XML errors, which took a while to work out. It turned out that the API was putting certain closing tags in the wrong places. Upon further investigation, the "-font" parameter on the command line was causing it. Conveniently, the tags were redundant anyway, so without it I now had valid XML.
This was the error:
Error on line 9 of document file:///C:/pdfoutput.xml: The element type "timesnewromanps-boldmt" must be terminated by the matching end-tag "</timesnewromanps-boldmt>".
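For anyone hitting the same kind of error, here is a minimal sketch of a well-formedness check. It uses the JDK's built-in JAXP parser as a stand-in rather than JDOM, so it runs without extra JARs. The class name, the `check` helper and the sample strings are made up for illustration and are not taken from the actual prototype.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class WellFormedCheck {

    /** Returns null if the XML parses cleanly, otherwise a description of the parser error. */
    static String check(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // A mismatched or missing end-tag surfaces here as a SAXParseException.
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Hypothetical samples: a font-name element that is opened but never closed,
        // and the same document with the end-tag in place.
        String broken = "<page><timesnewromanps-boldmt>text</page>";
        String fixed = "<page><timesnewromanps-boldmt>text</timesnewromanps-boldmt></page>";
        System.out.println("broken: " + check(broken));
        System.out.println("fixed:  " + check(fixed));
    }
}
```

The parser may also echo the fatal error to stderr, which is its default behaviour when no error handler is set.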
Then I was faced with another problem. The program was reporting this exception:
Could not check
because Invalid byte 1 of 1-byte UTF-8 sequence.
It turned out the XML file was no longer in UTF-8. To solve this, I remembered a text editor I had used on placement called UltraEdit. I opened the file in UltraEdit, re-saved it with UTF-8 encoding, and it worked fine. So I now have a small XML parsing prototype.
The second half of the day was spent going through and making notes on the academic documents highlighted in the project proposal:
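The same re-save could probably be done in code rather than by hand. Below is a rough sketch that decodes a file's bytes with a given source charset and writes them back out as UTF-8. The original encoding of the file is not known, so the source charset is passed in as an argument (windows-1252 would be a plausible guess on Windows, but that is an assumption), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReencodeToUtf8 {

    /** Decodes the bytes using the given source charset and re-encodes the text as UTF-8. */
    static byte[] toUtf8(byte[] input, Charset source) {
        return new String(input, source).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: ReencodeToUtf8 <input> <output> <source-charset>");
            return;
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // The source charset is an assumption supplied by the caller, e.g. "windows-1252".
        Charset source = Charset.forName(args[2]);
        Files.write(out, toUtf8(Files.readAllBytes(in), source));
    }
}
```

If the file carries an XML declaration naming a different encoding, that declaration would also need updating to match.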
http://www.iiit.net/~pkreddy/wdm03/wdm/vldb01.pdf
http://books.google.co.uk/books?id=Rdwv-r5RlOcC&pg=PA88&dq=Attentional+models&lr=
http://books.google.co.uk/books?id=RZ-6cTL8YZsC&printsec=frontcover&dq=web+service&lr=
http://books.google.co.uk/books?id=hXTfWDkqnlkC&printsec=frontcover&dq=xml+indexing&lr=
So the lit review has begun!