Tuesday, 31 March 2009
XML Parser
Monday, 30 March 2009
Touching up
Sunday, 29 March 2009
The Crawler
Thursday, 26 March 2009
Carry On Coding
Tuesday, 24 March 2009
Continuation of coding
Monday, 23 March 2009
Coding Commences
Thursday, 19 March 2009
Researching XML Differencing Tools
A number of XML differencing tools were looked at so that the most appropriate tool for the job could be chosen.
The first tool I looked at was Stylus Studio:
http://www.stylusstudio.com/
The free version (only available for 30 days) doesn't seem to include the API, and it doesn't look like it will do what I want anyway. However, this does look like a very powerful XML manipulation tool.
Then from Google I came across this useful link:
It suggests a number of open source Java XML tools; the XMLUnit Java API looked the most useful, so I began to research this.
http://xmlunit.sourceforge.net/
To run this software you will need JUnit (http://www.junit.org/), a JAXP compliant XML SAX and DOM parser (e.g. Apache Xerces), and a JAXP/TrAX compliant XSLT engine (e.g. Apache Xalan) in your classpath.
In the java doc it mentions Diff and DetailedDiff classes:
"Diff and DetailedDiff provide simplified access to DifferenceEngine by implementing the ComparisonController and DifferenceListener interfaces themselves. They cover the two most common use cases for comparing two pieces of XML: checking whether the pieces are different (this is what Diff does) and finding all differences between them (this is what DetailedDiff does)."
This looks like exactly what I am after, though I will have to test it first. At the moment I have just managed to get it installed, so the next stage will be to write a little prototype to see how well it compares two XML files.
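The prototype I have in mind would be something like the sketch below, assuming the XMLUnit 1.x jar (plus a JAXP parser) is on the classpath. The Diff and DetailedDiff classes and their methods are from the XMLUnit javadoc quoted above; the two sample documents are made up:

```java
import java.util.List;
import org.custommonkey.xmlunit.DetailedDiff;
import org.custommonkey.xmlunit.Diff;
import org.custommonkey.xmlunit.Difference;

public class DiffPrototype {
    public static void main(String[] args) throws Exception {
        // Two made-up documents that differ in one element value.
        String control = "<doc><title>Crawler</title><rev>1</rev></doc>";
        String test    = "<doc><title>Crawler</title><rev>2</rev></doc>";

        // Diff answers the yes/no question: are the two pieces of XML different?
        Diff diff = new Diff(control, test);
        System.out.println("identical: " + diff.identical());
        System.out.println("similar:   " + diff.similar());

        // DetailedDiff lists every individual difference that was found.
        DetailedDiff detail = new DetailedDiff(new Diff(control, test));
        List differences = detail.getAllDifferences();
        for (Object d : differences) {
            System.out.println(((Difference) d).toString());
        }
    }
}
```

If this works as advertised, the crawler could use a plain Diff first (cheap yes/no check) and only fall back to DetailedDiff when it needs to report what changed.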
Another XML manipulation tool which Mike mentioned was XMLSpy; I will look into this also.
I had some thoughts on the actual file comparison process too. I was wondering whether I really need two copies of the whole document set in order to work out if anything has changed; surely this is highly inefficient? Is there another way? If this were the case, then instead of trawling through and comparing every document, I could keep a table of hash values generated from each document and compare those instead. Would this be more efficient, at least for identifying which documents have changed? Obviously this does carry a tiny risk of two different documents producing exactly the same hash value (a collision). Just a couple of points to think about.
Wednesday, 18 March 2009
Meeting With Mike
Day Fifteen
Friday, 13 March 2009
Day Thirteen/Fourteen
Tuesday, 10 March 2009
Day Twelve
Source Title
Source Topic
Author info (Discipline / Credentials)
Research Question(s)
Methodology
Result(s)
Relation to topic
Strength(s) / Weakness
Potential bias
Contextual Grounding (Location, Date etc.)
References
As well as this, the review of further academic documents continued. I feel I'm a bit behind on this, but I have been making progress with the implementation of the system, so it is fairly justified. However, 100% focus will be needed on the review until the end of the week, as I have covered nowhere near enough academic documents.
Monday, 9 March 2009
Day Eleven
Friday, 6 March 2009
Day Ten
Wednesday, 4 March 2009
Day Nine
The lit review research continued today. I focused on the two psychology papers, as the main focus of this project will be the attentional model. I actually found these papers interesting, having never studied psychology before.
The first paper is:
http://www.infosci-online.com/downloadPDF/encyclopedias/IGR1854_G8WXb459Z3.pdf
The first paper was a good introduction to attentional theories. It explained that attention is “The processes by which we select information”.
So in my project the idea will be to select relevant words as a human would; this will involve elements of AI. Unfortunately the main focus of the paper, in relation to computational attention, was GUIs, but I still took a lot from it.
The second paper went a lot more in depth into how the brain makes decisions:
http://www.infosci-online.com/downloadPDF/pdf/ITJ4450_74ZWULUIHI.pdf
It went into mathematical models of the brain. I'm not sure that I fully understood it, but again it was useful; I had to familiarise myself with things like unions again!
It explained how different people have gone about modelling the brain. A couple of the approaches were:
One view is that the brain is layered and each layer has different characteristics - a LRMB. They explained that the problem with this is that it is hard to determine how many layers the model needs.
Another approach, which they seemed to favour, is that the brain is a network - an OAR. There were some useful directed graphs which may be very beneficial when it comes to designing the attentional model.
Tomorrow's objectives will be to get a prototype of Java web services working and to do more work on the lit review.
Tuesday, 3 March 2009
Day Eight
Today I carried on with the lit review. This involved researching what a lit review should actually look like, as well as the evaluation of academic documents. Also, as I spent the day in the library, I discovered a few other documents which may be of use.
I started by reading an article on indexing:
http://www.iiit.net/~pkreddy/wdm03/wdm/vldb01.pdf
It suggested a method of indexing which could be up to ten times faster than existing approaches, introducing a process called XISS. This relates not just to XML indexing but also to the storage of the data. The paper also goes on to explain indexing algorithms. I have skimmed this paper previously, but as I have now started the lit review properly I chose to take notes on it. I found some very interesting ideas, but also found it very hard going, so I've only been through half of the paper; it should probably be revisited.
Although the indexing method in this project isn't actually that important (a fairly novel approach should be used), this paper does give some good insight. The most important aspect of the project will be the attentional models, so as I found the indexing paper so hard going I moved on to the psychological aspect of the project. I have come across two papers which look very useful, so tomorrow I shall read through the rest of these, making notes, and hopefully they will give me an idea of how to approach the attentional model, because at the moment I have no idea what one actually is!
The two papers are:
http://www.infosci-online.com/downloadPDF/encyclopedias/IGR1854_G8WXb459Z3.pdf
http://www.infosci-online.com/downloadPDF/pdf/ITJ4450_74ZWULUIHI.pdf
Monday, 2 March 2009
Day Seven
Completed the requirements specification, which I'm fairly pleased with as it looks fairly professional and it's been a while since I've done requirements. There may be some refinements needed at a later date as I research further into the project. The use case diagram may need revisiting, and some functional requirements may be better off migrating to an "Environmental" requirements section.
Next week's objectives will be to do as much of the lit review as possible, to start looking at Java web services and how RSS feeds work, and perhaps to start a project plan.
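To get a head start on the RSS side, a minimal sketch of what a feed actually looks like, using only the JDK's JAXP DOM parser (no extra libraries). The feed content here is entirely made up; real feeds would come from a URL rather than a string:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RssPeek {

    // Pull the <title> of every <item> out of an RSS 2.0 document.
    static List<String> itemTitles(String rss) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(rss.getBytes(StandardCharsets.UTF_8)));
        NodeList items = doc.getElementsByTagName("item");
        List<String> titles = new ArrayList<String>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            titles.add(item.getElementsByTagName("title").item(0).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // A tiny hand-written RSS 2.0 fragment, just to see the structure:
        // a <channel> containing a feed <title> and one <item> per post.
        String rss = "<rss version=\"2.0\"><channel>"
                   + "<title>Example feed</title>"
                   + "<item><title>First post</title><link>http://example.com/1</link></item>"
                   + "<item><title>Second post</title><link>http://example.com/2</link></item>"
                   + "</channel></rss>";
        System.out.println(itemTitles(rss));  // [First post, Second post]
    }
}
```

Since a feed is just XML, the same differencing ideas from earlier in this log (hashing, then a full XML diff) should apply to detecting new or changed feed items too.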