Many applications in analytical domains need to connect the dots, i.e., query about the structure of data. In bioinformatics, for example, it is typical to want to query about interactions between proteins. The aim of such queries is to extract relationships between entities, i.e., paths from a data graph. Often, such queries will specify certain constraints that qualifying results must satisfy, e.g., paths involving a set of mandatory nodes. Unfortunately, most present-day Semantic Web query languages, including the current draft of the anticipated recommendation SPARQL, lack the ability to express queries about arbitrary path structures in data.
Implemented using Java, Berkeley DB and the memory store Brahms. It also mentions PSPARQL (part of Exmo), which reached a complete version in February.
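To make the idea of a constrained path query concrete, here is a small sketch in Python. It is not SPARQ2L's syntax or algorithm. It is just an exhaustive depth-first search that enumerates the simple paths between two nodes and keeps only those that pass through every mandatory node. The data and the `build_graph`/`paths_through` helpers are hypothetical.

```python
from collections import defaultdict

def build_graph(triples):
    """Index (subject, predicate, object) triples as an adjacency list."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph

def paths_through(graph, start, end, mandatory):
    """Yield each simple path from start to end that visits every mandatory node.

    A path is a list of (node, predicate, node) edges. Cycles are avoided by
    never revisiting a node already on the current path.
    """
    required = set(mandatory)

    def walk(node, visited, edges):
        if node == end:
            # visited holds exactly the nodes on the current path
            if required <= visited:
                yield list(edges)
            return
        for pred, nxt in graph.get(node, []):
            if nxt not in visited:
                edges.append((node, pred, nxt))
                yield from walk(nxt, visited | {nxt}, edges)
                edges.pop()

    yield from walk(start, {start}, [])

# Hypothetical interaction data.
triples = [
    ("protA", "interactsWith", "protB"),
    ("protB", "interactsWith", "protC"),
    ("protA", "interactsWith", "protC"),
    ("protC", "regulates", "protD"),
]
graph = build_graph(triples)
all_paths = list(paths_through(graph, "protA", "protD", []))
via_b = list(paths_through(graph, "protA", "protD", ["protB"]))
```

Two paths connect protA to protD, but only one of them goes through the mandatory node protB. Exhaustive enumeration is exponential in the worst case, so this approach only suits small graphs.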
The GRDDL Working Group has published GRDDL Use Cases: Scenarios of extracting RDF data from XML documents as a Working Group Note on the 6th of April. This complements the publication of the GRDDL Test Cases and the GRDDL Specification documents, both published recently.
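In GRDDL, an XML document names the transformation that turns it into RDF on its root element. That transformation is typically XSLT. The Python snippet below is a minimal sketch that assumes plain XML rather than XHTML. It uses only the standard library to discover the declared transformation URIs. Fetching and running the XSLT is left out, and the example document and URIs are made up.

```python
import xml.etree.ElementTree as ET

# Namespace of the GRDDL data-view vocabulary.
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"

def grddl_transformations(xml_text):
    """Return the transformation URIs declared on the root element, if any.

    The grddl:transformation attribute holds a space-separated list of URIs.
    This only discovers them; it does not fetch or apply the transforms.
    """
    root = ET.fromstring(xml_text)
    value = root.get(f"{{{GRDDL_NS}}}transformation", "")
    return value.split()

doc = """<catalog xmlns:grddl="http://www.w3.org/2003/g/data-view#"
    grddl:transformation="http://example.org/catalog2rdf.xsl http://example.org/extra.xsl">
  <item>widget</item>
</catalog>"""
```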
Google's Bigtable, a distributed storage system for structured data, is a very effective mechanism for storing very large amounts of data in a distributed environment.
Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase will provide Bigtable-like capabilities on top of Hadoop.
Data is organized into tables, rows and columns, but a query language like SQL is not supported. Instead, an Iterator-like interface is available for scanning through a row range (and of course there is an ability to retrieve a column value for a specific key).
Any particular column may have multiple values for the same row key. A secondary key can be provided to select a particular value or an Iterator can be set up to scan through the key-value pairs for that column given a specific row key.
HBase uses a data model very similar to that of Bigtable. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have crazily-varying columns, if the user likes.
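The data model described above can be mimicked in a few lines. This is a toy, in-memory Python sketch, not the HBase or Bigtable API. Rows are kept sorted by key, and each row stores only the columns it actually uses. The "secondary key" that picks one of several values in a cell is modelled as a timestamp, as in Bigtable. Class, method and row names are hypothetical.

```python
import bisect
from collections import defaultdict

class SparseTable:
    """Toy in-memory model of a Bigtable/HBase-style table.

    Rows are kept sorted by key, each row stores only the columns it uses,
    and each cell keeps timestamped versions (newest first). There is no
    query language: you read a single cell or iterate over a row range.
    """

    def __init__(self):
        self._keys = []    # row keys in sorted order
        self._rows = {}    # row key -> {column: [(timestamp, value), ...]}

    def put(self, row, column, value, ts):
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = defaultdict(list)
        versions = self._rows[row][column]
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)   # newest first

    def get(self, row, column, ts=None):
        """Newest value, or the newest value written at or before ts."""
        for version_ts, value in self._rows.get(row, {}).get(column, []):
            if ts is None or version_ts <= ts:
                return value
        return None

    def scan(self, start, stop):
        """Yield (row, {column: versions}) for start <= row < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for row in self._keys[lo:hi]:
            yield row, dict(self._rows[row])

t = SparseTable()
t.put("org.apache", "contents:", "<html>apache", ts=1)
t.put("com.cnn.www", "contents:", "<html>v1", ts=1)
t.put("com.cnn.www", "contents:", "<html>v2", ts=2)
t.put("com.example", "anchor:cnnsi.com", "CNN", ts=1)
```

Here `t.scan("com", "org")` walks only the two "com" rows in key order. The "com.example" row simply has no "contents:" column.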
"OntoSphere3D is a Protégé plug-in for ontologies navigation and inspection using a 3-dimensional hyper-space where information is presented on a 3D view-port enriched by several visual cues (as the colour or the size of visualized entities)."
In my experience most developers—and even many security people—don’t really know what the same-origin policy is. Worse yet, the rise of AJAX and mash-ups seems to have turned same-origin into something developers are trying to break. Complicating the issue further are the weaknesses in most browsers’ implementations of same-origin, leaving open questions about the effectiveness of the policy itself.
When talking to Web 2.0 gurus recently, I was surprised that not everyone understood this. The post points to a very clear definition of exactly what counts as the same origin. See also: Subverting Ajax, which covers things like using XSS to extend the XMLHttpRequest object to capture calls and record the data being transmitted.
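For reference, the classic rule fits in a few lines. This Python sketch compares scheme, host and port after filling in default ports. It deliberately ignores browser-specific exceptions, such as relaxing the check through document.domain. Those exceptions are part of why real implementations are weaker than the rule itself. The function names are our own.

```python
from urllib.parse import urlsplit

# Default ports, so that http://host/ and http://host:80/ compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    # urlsplit already lowercases .hostname
    return scheme, parts.hostname, port

def same_origin(a, b):
    """Two URLs share an origin only if scheme, host and port all match."""
    return origin(a) == origin(b)
```

For example, `http://example.com/a` and `http://example.com:80/b` share an origin. Changing the scheme to https, the host to www.example.com, or the port to 8080 breaks the match.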
Here's the thing, we need a new kind of data store, a new kind of SQL, something that does for storing and querying large amounts of data what SQL did for normalized data.
Sure you can store a lot of data in a relational database, but when I say large, I mean really large; a billion or more records. I know we need this because I keep seeing people build it.
All this talk about making SPARQL behave like SQL may be for nothing if people realize that's not what they need after all.
What a relief that there's a demo that can turn someone on to the “semantic” Web; this has obviously been a long time coming. As long as you believe the hype, the way you get the Semantic Web is one A-list blogger at a time.
Well, everybody who's building the semantic web pretty much that I know are building systems take data from lots of places, but take data with an awareness of where those places are. So for example, suppose you're getting Geotags and the OS runs a service, lots of people in this country might trust the OS to say this point has a church with a spire - other people might say it's a great church to go to, other people might say it's a heathen church to go to... those are the other sources of data...
There was no let up from the press:
"But that was the basis for Google, and Google got poisoned... "
Shadbolt and Hendler stepped in to shield Sir Tim, but he was seething at the impertinence:
I remember a conference, we were discussing the Semantic Web, and someone asked what do you think is the worst thing that can happen and all the pencils come out. I know you two have been asking about "Woargh - I know the one about... what about the bad guys? Won't we be phished" There's a temptation to give readers about all the terrible things out there OK, and all the ways the web can become less usable.
At this point, your reporter wanted to remind Sir Tim that of all the problems the web has, a hostile press is not one of them. In fact, you can't pick up a newspaper or magazine without reading about how it's ushering in a New Age of Enlightenment. Time magazine gave "Person Of The Year" to every web user in America - or at least every one who looked at the mirror Time placed on its front cover.
He continued, cryptically:
Yes you'll find a bank that's less usable - ... I've never been phished.
So the Greatest Living Briton has never been phished, which is a relief. His answer to the Semantic Web didn't inspire much confidence for the rest of us: it would be used within the firewall, amongst trusted groups, "areas where one is much less worrying about the bad guys".
The RIF WG released the first public working draft of the RIF Core specification at http://www.w3.org/TR/rif-core/. The RIF Core is a very basic interchange format for rules, based on Horn logic. While this WD lays the foundation for the RIF Core, several issues remain unspecified and are expected to be addressed in future drafts of the Core spec, for example: OWL and RDF compatibility; how to interchange application data models used by rules; precisely how URIs will be used in RIF rules; and whether, and how much, metadata should be included in RIF translations, e.g. for preserving rules in "round trip" translations.
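Because RIF Core is "based on Horn logic", each rule is an implication. The body is a conjunction of atoms and the head is a single atom. The Python sketch below is a generic illustration of what such rules compute, using naive forward chaining to a fixpoint. It is not RIF syntax and not any particular RIF engine. Encoding atoms as tuples with "?"-prefixed variables is our own assumption.

```python
def forward_chain(facts, rules):
    """Naive forward chaining over function-free Horn rules.

    facts: set of ground atoms, e.g. ("parent", "ann", "bob").
    rules: list of (head, body) pairs; variables are strings starting with "?".
    Repeats until no new fact can be derived (a fixpoint).
    """
    facts = set(facts)
    while True:
        new = set()
        for head, body in rules:
            for binding in _match_body(body, facts, {}):
                derived = tuple(binding.get(t, t) for t in head)
                if derived not in facts:
                    new.add(derived)
        if not new:
            return facts
        facts |= new


def _match_body(body, facts, binding):
    """Yield every variable binding that satisfies all atoms in body."""
    if not body:
        yield binding
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        b = _unify(first, fact, binding)
        if b is not None:
            yield from _match_body(rest, facts, b)


def _unify(atom, fact, binding):
    """Extend binding so that atom matches the ground fact, or return None."""
    if len(atom) != len(fact):
        return None
    b = dict(binding)
    for term, value in zip(atom, fact):
        if term.startswith("?"):
            if b.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return b


rules = [
    # ancestor(X, Y) :- parent(X, Y).
    (("ancestor", "?x", "?y"), [("parent", "?x", "?y")]),
    # ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    (("ancestor", "?x", "?z"), [("parent", "?x", "?y"), ("ancestor", "?y", "?z")]),
]
facts = {("parent", "ann", "bob"), ("parent", "bob", "cy")}
derived = forward_chain(facts, rules)
```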
So it begins with the punch that the last few chapters have been building up to:
Alexander’s story does not end with the publication of A Pattern Language in 1977. It went on and still goes on. And Alexander did not sit still after he wrote the patterns. Like any scientist he tried them out.
And they did not work.
The reason, as Gabriel gives it, is that processes other than architecture play a more fundamental role: a building is not just the result of slapping together a bunch of very well-thought-out patterns based on empirical evidence. It's the result of many other processes, like finance, zoning, construction, etc.
...one problem with the building process is lump-sum development. In such development few resources are brought to bear on the problems of repair and piecemeal growth. Instead, a large sum of money is dedicated to building a large artifact, and that artifact is allowed to deteriorate somewhat, and anything that is found lacking in the design or construction is ignored or minimally addressed until it is feasible to abandon the building and construct a replacement.
Gabriel gives an example from a previous chapter of how these processes can destroy the otherwise good qualities of development. For example, the process of getting a mortgage and paying it off directly influences the types of buildings built and used. Generally, you invest a large amount of money in a property, so large that you usually can't afford to make piecemeal modifications because you can barely afford to pay it off. This works as long as people keep moving from house to house, which avoids fixing whatever problems there may be with these houses until things get really bad and they are knocked down and rebuilt, perhaps losing what was wrong with them in the first place.
So if you retain the old processes, you still get the old results. It's clear that this is also the case in software: a nicely architected and controlled software project does not necessarily lead to high-quality software. How often have you worked on a software project that you knew was going to be rewritten in a few years anyway?
Gabriel suggests that the answer lies in good code and coders, not in the typical separation of analysis, design and implementation:
And isn’t the old-style software methodology to put design in the hands of analysts and designers and to put coding in the hands of lowly coders, sometimes offshore coders who can be paid the lowest wages to do the least important work?
Methodologists who insist on separating analysis and design from coding are missing the essential feature of design: The design is in the code, not in a document or in a diagram. Half a programmer’s time is spent exploring the code, not in typing it in or changing it. When you look at the code you see its design, and that’s most of what you’re looking at, and it’s mostly while coding that you’re designing.
And finally, he argues that typical software patterns do not follow Christopher Alexander's original concept of a pattern language:
When I look at software patterns and pattern languages, I don’t see the quality without a name in them, either. Recall that Alexander said that both the patterns themselves and the pattern language have the quality. In many cases today the pattern languages are written quickly, sort of like students doing homework problems. I heard one neophyte pattern writer say that when writing patterns he just writes what he knows and can write four or five patterns at a sitting.
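The answer is to build software piecemeal, incrementally, partially designed and reflectively, much more like a Turkish rug, and that's the subject of the next chapter, "The Bead Game, Rugs, and Beauty".
I promised myself I wasn't going to do this again, but I have one more reason why even having the option of DISTINCT/LOOSE/CHOOSE in SPARQL is a bad idea. Part of this is prompted by the fact that, once more, I'm sitting next to people trying to make the Semantic Web work, and from my perspective SPARQL is letting them down.
It's not a new reason; it's one I wrote about in 2004, and it's a pretty good explanation of why having this as an optional feature doesn't make sense for RDF:
"The other issue with SPARQL is the lack of an implicit distinct. In my understanding of SQL, DISTINCT is optional because if your queries work on normalized data and joins are based on distinct keys, then the returned results cannot be duplicated. If your query works on rows with repeated values in the same column, then you apply DISTINCT.
In RDF's data model there isn't really this problem of duplicated data and normalization. SPARQL has the idea of matching statements in the graph. From my understanding, RDF's data model doesn't support the idea of multiple statements with the same subject, predicate and object values.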
"The other issue with the SPARQL is the lack of an implicit distinct. In my understanding of SQL, DISTINCT is optional because if your queries work on normalized data and joins are based on distinct keys then the returned results cannot be duplicated. If your query works on rows with repeated values on the same column then you apply DISTINCT.
In RDF's data model there isn't really this problem of duplicated data and normalization. SPARQL has the idea of matching statements in the graph. From my understanding, RDF's data model doesn't support the idea of multiple subject, predicates and/or objects with the same values.
In other words, it only seems valid that if a query matches one result in the graph it should return that one unique result not repeated multiple results."
This is on top of the other reasons I came up with in "Bagging SPARQL". This could actually be seen as further discussion of the initial response I got. Among other things, it was said that duplicates could arise by querying multiple graphs. I'd argue that forced distinct values provide the context to effectively count (or compute other aggregate values) across these multiple graphs.
It's three years on and they couldn't even allow users to declaratively count the number of statements in this mystical, future web of data.
A few papers related to extending SPARQL; I may update this as I find more.
Apart from SPARQ2L and PSPARQL, there are also two from the ESWC 2007 conference: iSPARQL (which uses SimPack) and SPARQLeR (which supports some quite sophisticated path queries).
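Here is the counting argument in concrete form. An RDF graph is a set of triples, so one graph cannot hold the same statement twice. A query evaluated over several graphs with bag semantics, however, can return the same row more than once. This Python sketch shows how a count changes depending on whether results are made distinct. The graphs and prefixes are made-up examples, and `match` is a stand-in for query evaluation, not SPARQL's algebra.

```python
from collections import Counter

# Two hypothetical graphs. Each RDF graph is a set: a triple is either in it or not.
graph_a = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
}
graph_b = {
    ("ex:alice", "foaf:knows", "ex:bob"),   # the same statement, in another graph
    ("ex:carol", "foaf:knows", "ex:bob"),
}

def match(graphs, predicate):
    """Bag semantics: one (subject, object) row per matching triple per graph."""
    return [(s, o) for g in graphs for (s, p, o) in g if p == predicate]

bag = match([graph_a, graph_b], "foaf:knows")
distinct = set(bag)
duplicates = {row: n for row, n in Counter(bag).items() if n > 1}

print(len(bag))       # 3: the shared statement is counted twice
print(len(distinct))  # 2: each distinct (subject, object) pair counted once
```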
SPARQL-DL: SPARQL Query for OWL-DL An enhancement of SPARQL to support DL semantics. And they note: "SPARQL-DL is a step between RDF QLs that are too unstructured w.r.t. OWL-DL and DL QLs which are not as expressive. We believe SPARQL-DL would help interoperability on the Semantic Web as it bridges this gap. As part of future work, we intend to investigate other possible extensions to SPARQL-DL including (but not limited to) aggregation operators, epistemic operators (and negation as failure), and regular expressions on OWL properties."
SPARQL/Update Adding INSERT, MODIFY, DELETE and UPDATE to SPARQL. I'd seen this before but hadn't linked to it.
So I was idly looking at the escalating real estate market and considering ways my dog could finally make his living. I thought back over jobs I'd seen that could be done equally well by him:
A station master. While this sounds like a job requiring thumbs, really not so much. The station being mastered no longer received any trains but still took truck deliveries. Not very many though. Now, the station master is unable to load or unload these deliveries - this is done by the drivers. So the job mainly consisted of watching people drop things off and pick things up; a job for my dog.
Train driver. In recent memory an accident occurred that had the distinct possibility of being caused by both drivers doing things other than driving the train. The introduction of an automated system was suggested to prevent it happening again. Driving an automated train is something that my dog could do quite well. Actually, he could do it twice as well and he wouldn't join the union. I'm not sure he'd stick to it, though, as it would bore the heck out of him too.
Doorman. Now normally this would require the ability to open and close doors. But more recently, I noticed there was a guy at a building watching an automatic door open and close. Maybe he gave directions or something but the door watching seemed to be his prime activity. I've seen my dog look attentively at the front door - he's very qualified.
Network Administrator. Maybe the frontal lobes need to be developed further for this one but then again maybe not. The key activities undertaken for this position, that I've seen, revolved around attending meetings and playing Solitaire. I think my dog could do this job and he wouldn't even need a computer. Apparently, this is not an uncommon job amongst dogs.
Phone Handler (?). These are the people I've found at the front of buildings saying, "Please use that phone to contact people before going up". Usually replaced by a sign, I think an indicative paw would do as well.
W3C is pleased to announce the advancement of GRDDL to Candidate Recommendation and the publication of GRDDL Test Cases as a Last Call Working Draft. Implementation feedback and comments are welcome through 31 May. Comments on this document should be sent to public-grddl-comments@w3.org, a mailing list with a public archive.
Some ideas for static triple indexing "Most mature triplestores also index a 4th query element ‘graph’ or ‘context’. I intend to support this query type without expanding the index by using a trick: In my triples format the fact that the subjects are auto-generated and local to the graph means I can choose them to be sequential and effectively re-use them as graph indexes..."
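The quote only outlines the trick, so the Python sketch below is one reading of it rather than the author's code. Suppose each graph's auto-generated subjects get one contiguous block of sequential IDs. Then the graph for any triple can be recovered from its subject ID with a binary search over the block starts. That avoids storing a fourth "graph" column in the index. `SubjectRangeGraphIndex` and the graph names are hypothetical.

```python
import bisect

class SubjectRangeGraphIndex:
    """Map sequential subject IDs back to the graph that minted them.

    Assumes each graph's subjects are allocated as one contiguous block of
    IDs, so only the first ID of each block needs to be stored.
    """

    def __init__(self):
        self._starts = []   # first subject ID of each graph's block, ascending
        self._graphs = []   # graph name for each block
        self._next_id = 0

    def add_graph(self, name, subject_count):
        """Allocate subject_count sequential IDs to a graph; return the ID range."""
        start = self._next_id
        self._starts.append(start)
        self._graphs.append(name)
        self._next_id += subject_count
        return range(start, self._next_id)

    def graph_of(self, subject_id):
        """Find the owning graph by binary search over the block starts."""
        if not 0 <= subject_id < self._next_id:
            raise KeyError(subject_id)
        i = bisect.bisect_right(self._starts, subject_id) - 1
        return self._graphs[i]

idx = SubjectRangeGraphIndex()
people = idx.add_graph("people.rdf", 3)   # subject IDs 0, 1, 2
places = idx.add_graph("places.rdf", 2)   # subject IDs 3, 4
```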
PAGE, a distributed triple store using a DHT and YARS (the original).
Haskell and the Faith of Programming Languages Philip Wadler gives a rather brilliant talk on programming languages. It covers Haskell, Java generics, combining languages with different type systems (weak, strong, very strong), as well as monads and Links.
Ted Neward talks about C# and Java. He also mentions the possible .NET backlash, Scala and LINQ. There's also a very impressive demo of LINQ (the code demo starts about a quarter of the way through), showing the struggle from imperative to declarative.
Recent progress has been made on some technical issues raised during the design of the Core Rule Interchange Format, including RDF Compatibility, Disjoint Names, and Definition of URI. The RIF WG has decided to use IRIs: all externally visible symbols in a RIF ruleset will have IRIs as identifiers. The RIF WG has decided that predicates, functions, and individuals in RIF rules can be identified by the same symbol (unlike OWL DL, which requires that a URI identify precisely one of a class, a property, or an individual). The RIF WG has agreed to consider treating RDF data with a special quasi-object-oriented syntax (referred to in the group as a slotted syntax), in which RDF nodes are treated as objects and the properties of a node as slots. Although this is not a formal decision, the group agreed to use this treatment of RDF in RIF Core rules for the next working draft and to solicit feedback.
Temporal extensions to Defeasible Logic. Non-monotonic reasoning is about adding more information over time to reach different conclusions. Rather than adding information, the temporal extensions actually restrict when this information applies.
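The Python sketch below illustrates the slotted treatment. It groups RDF triples by subject into frames, so each node becomes an object whose properties are slots holding one or more values. It then prints them in a loose F-logic-style `object[slot -> value]` notation. The notation and helper names are assumptions for illustration, not the RIF presentation syntax from the working draft.

```python
from collections import defaultdict

def to_frames(triples):
    """Group RDF triples into frames: each subject node becomes an object,
    and each of its properties becomes a slot holding one or more values."""
    frames = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        frames[s][p].append(o)
    return {s: dict(slots) for s, slots in frames.items()}

def render(frames):
    """Render each frame as subject[slot -> value, ...], one line per subject."""
    lines = []
    for subject, slots in frames.items():
        pairs = ", ".join(f"{p} -> {v}" for p, vs in slots.items() for v in vs)
        lines.append(f"{subject}[{pairs}]")
    return lines

triples = [
    ("ex:bob", "ex:age", "42"),
    ("ex:bob", "ex:knows", "ex:ann"),
    ("ex:bob", "ex:knows", "ex:cy"),
    ("ex:ann", "ex:age", "37"),
]
frames = to_frames(triples)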
To save disk space for the on-disk indices, we compress the individual blocks using Huffman coding. Depending on the data values and the sorting order of the index, we achieve a compression rate of ≈ 90%. Although compression has a marginal impact on performance, we deem that the benefits of saved disk space for large index files outweigh the slight performance dip.
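The paper doesn't show its encoder, so this is an illustration of the technique only: a textbook Huffman coder over a block of bytes, in Python. It builds a bit string rather than packed bytes to keep the sketch readable. A real on-disk format would pack the bits and store the code table (or canonical code lengths) with each block. The example block is made up and won't reproduce the paper's numbers.

```python
import heapq
from collections import Counter

def huffman_code(data: bytes) -> dict:
    """Build a prefix-free Huffman code mapping each byte value to a bit string."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:                      # one distinct symbol still needs 1 bit
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, unique tiebreak, {symbol: partial code}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing 0 and 1 to their codes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def encode(data: bytes, code: dict) -> str:
    return "".join(code[b] for b in data)

def decode(bits: str, code: dict) -> bytes:
    """Read bits until they form a complete codeword (the code is prefix-free)."""
    reverse = {c: s for s, c in code.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return bytes(out)

# Hypothetical index block: sorted keys share a lot of structure.
block = b"<http://ex.org/s1> <http://ex.org/p> <http://ex.org/o1>\n" * 50
code = huffman_code(block)
bits = encode(block, code)
ratio = len(bits) / (8 * len(block))   # compressed size as a fraction of the original
```

Decoding has to walk the bits one at a time. That is one plausible source of the small lookup-time penalty the quote mentions for compressed blocks.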
Figure 4 shows the correspondence between block size and lookup time, and also shows the impact of Huffman coding on the lookup performance; block sizes are measured pre-compression. The average lookup time for a data file with 100k entries (random lookups for all subjects in the index) using a 64k block size is approximately 1.1 ms for the uncompressed and 1.4 ms for the compressed data file. For 90k random lookups over a 7 GB data file with 420 million synthetically generated triples (more on that dataset in Section 7), we achieve an average seek time of 23.5 ms.
It's good to see that text searching on literals now seems like a standard feature too (I can't think of any of the last several announcements where this wasn't the case). They used a sparse index to create all 6 indices. They also hint at how reasoning is going to be performed by linking to "Unifying Reasoning and Search to Web Scale", which suggests a tradeoff over time and trust.
Guice Talk by Google, listing all the good ways DI makes your code better: testing is the main one (of course), more OO design is another, it's better than a service locator (J2EE), etc. Not much new here for people who have been doing this for a while, except for the absence of XML.
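The block-based lookups and sparse index described in the quoted paper can be illustrated with an in-memory Python sketch. The block size and tuple layout are assumptions for illustration. So is the choice to build one sorted index per ordering of subject, predicate and object. The paper's actual file format and its compression are not reproduced here.

```python
import bisect
from itertools import permutations

BLOCK_SIZE = 4   # entries per block; tiny so the example spans several blocks
FIELD = {"s": 0, "p": 1, "o": 2}
ORDERS = ["".join(p) for p in permutations("spo")]   # spo, sop, pso, pos, osp, ops

def build_index(triples, order):
    """Sort the triples in one key order and cut them into fixed-size blocks.

    The sparse index keeps only the first key of each block. In an on-disk
    store only this list would need to stay in memory; here everything is
    in memory for simplicity.
    """
    keys = sorted(tuple(t[FIELD[c]] for c in order) for t in triples)
    blocks = [keys[i:i + BLOCK_SIZE] for i in range(0, len(keys), BLOCK_SIZE)]
    sparse = [block[0] for block in blocks]
    return sparse, blocks

def lookup(index, prefix):
    """Return every key starting with prefix, touching only candidate blocks."""
    sparse, blocks = index
    # Matches may begin in the last block whose first key sorts before prefix.
    start = max(bisect.bisect_left(sparse, prefix) - 1, 0)
    results = []
    for block in blocks[start:]:
        if block[0][:len(prefix)] > prefix:   # this block starts past the prefix
            break
        results.extend(k for k in block if k[:len(prefix)] == prefix)
    return results

triples = [("ex:a", "ex:p", "1"), ("ex:a", "ex:q", "2"), ("ex:b", "ex:p", "3"),
           ("ex:c", "ex:p", "4"), ("ex:c", "ex:q", "5"), ("ex:d", "ex:p", "6")]
indexes = {order: build_index(triples, order) for order in ORDERS}
```

With six orderings, any triple pattern with bound positions can be answered as a prefix lookup on one of the indices. For example, all triples with predicate ex:q come from the "pos" index.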
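Guice itself is a Java library, and the sketch below does not use its API. It shows the constructor-injection style the talk argues for, in Python, and why testing is the headline benefit. `Clock`, `SystemClock`, `FixedClock` and `SessionChecker` are hypothetical names.

```python
import time
from typing import Protocol

class Clock(Protocol):
    def now(self) -> float: ...

class SystemClock:
    """Production implementation."""
    def now(self) -> float:
        return time.time()

class FixedClock:
    """Test double that always reports the same time."""
    def __init__(self, t: float):
        self.t = t

    def now(self) -> float:
        return self.t

class SessionChecker:
    """Receives its Clock through the constructor rather than fetching it
    from a global registry (service-locator style), so a test can pass a FixedClock."""
    def __init__(self, clock: Clock, ttl_seconds: float = 3600):
        self.clock = clock
        self.ttl = ttl_seconds

    def expired(self, started_at: float) -> bool:
        return self.clock.now() - started_at > self.ttl

# Wiring happens in one place; in Java, an injector such as Guice does this step.
production = SessionChecker(SystemClock())
under_test = SessionChecker(FixedClock(10_000.0))
```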
Three reasons that REST is not RPC "Being able to do state transition processing at disparate locations is hugely powerful...A single process can span machines offering differing levels of scalability, reliability and security."
> It was introduced to provide a limited form of negation, and one that interacts poorly with the open-world assumption. We also now have minus, which is well defined, corresponds closely to our intuitive understanding of the operation, and is (I am told) what was actually required.
>
> If my memory is correct we should probably at least deprecate, if not remove exclude entirely from mulgara.
>
> Do we agree that exclude should be removed?
Well *I* agree anyway. As you say, it doesn't do what anyone thinks it does.
> If it should be removed, when should this occur?
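The mail doesn't spell out Mulgara's exact semantics, so this Python sketch is an illustration only. It implements minus the way SPARQL 1.1 later defined MINUS. A left-side solution is dropped only when some right-side solution agrees with it on every shared variable and shares at least one variable with it. Because it just subtracts rows that were actually matched, it doesn't depend on open-world reasoning about what is absent. Solutions are plain dicts here, and the names are our own.

```python
def compatible(a: dict, b: dict) -> bool:
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def minus(left: list, right: list) -> list:
    """Keep each left solution unless some right solution is compatible with it
    and actually shares at least one variable with it."""
    return [
        mu for mu in left
        if not any(compatible(mu, nu) and mu.keys() & nu.keys() for nu in right)
    ]

people = [{"?p": "ex:alice"}, {"?p": "ex:bob"}, {"?p": "ex:carol"}]
banned = [{"?p": "ex:bob"}]
unrelated = [{"?x": "ex:bob"}]   # shares no variable, so removes nothing
```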
Another possible reason is Tucana/Kowari/Mulgara’s Jena support. It was originally put in to provide a migration path for companies looking to move on from research projects to scalable infrastructure, but because Jena is the de facto semweb tool of choice, people used it to evaluate Kowari’s scalability. Jena’s lack of scaling hurt us several times; I can remember lots of frantic calls as some company wrote us off because of our Jena API.
I'm currently still working in this area (maybe somewhat surprisingly) and Jena still dominates (all of the tools I'm currently looking at are Jena-based). And I still haven't seen a Jena implementation that scales (see page 10). Maybe the decision to open source Kowari cost another round of funding too. Maybe this is why Garlik's and Radar Networks' triple stores are still behind closed doors.
The SWEO Interest Group is pleased to announce the first release of a frequently asked questions (FAQ) document about the Semantic Web. It provides comprehensive answers to questions covering Semantic Web standards and their usage. This is an evolving document that will continue to be updated over time. There is also a Wiki site where the community can contribute to the further evolution of this document, as well as an RSS 1.0 feed to help track changes to the FAQ.
Research Project: Pig Given Yahoo's usage of Hadoop, it's good to see them building a query layer on top of it. And it's not SQL (hence the name); it's relationally based (it uses bags and still has DISTINCT) because that scales. Hah! Some of the documentation is gold: "In a conventional database management system, SQL queries are translated into relational algebra expressions, which are in turn translated into physical evaluation plans. Pig Latin queries are already an algebra, so we're bypassing the first layer." It even has nested relations and a flatten operation (which removes the nesting). There's a good read at "Yahoo Pig and Google Sawzall", which notes that Google's Sawzall looks like Scala.
REST Compile/Describe & WADL, together with "I finally get REST. Wow.", both link to WADL. In what seems like an age ago, I remember several enterprise architects asking what the REST equivalent to WSDL was. The latter has one of the best one-liners to describe REST: "state machine as node graph traversed via URI". An example of how to use it is given in "REST Describe first working Beta released".
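This Python sketch is for readers who haven't met nested relations. It mimics the semantics of three of the operators mentioned (GROUP, FLATTEN and DISTINCT) over bags represented as lists of tuples. It is not Pig Latin and not Pig's implementation. The function names only echo the operators, and the click-log data is made up.

```python
from collections import defaultdict

def group(bag, key):
    """GROUP: produce (key, inner bag) pairs, i.e. a nested relation."""
    groups = defaultdict(list)
    for t in bag:
        groups[t[key]].append(t)
    return list(groups.items())

def flatten(nested):
    """FLATTEN: undo the nesting, producing one output tuple per inner tuple."""
    return [(k,) + t for k, inner in nested for t in inner]

def distinct(bag):
    """DISTINCT: drop duplicate tuples (bags allow them), keeping first-seen order."""
    return list(dict.fromkeys(bag))

# Hypothetical (user, url) visits; note the repeated tuple.
visits = [("amy", "cnn.com"), ("fred", "yahoo.com"),
          ("amy", "cnn.com"), ("amy", "bbc.co.uk")]
by_user = group(visits, 0)
counts = [(user, len(inner)) for user, inner in by_user]
```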
It's sort of this odd and I've always had this problem with the rationality of it. That the President says, "We are in the fight for a way of life. This is the greatest battle of our generation, and of the generations to come. "And, so what I'm going to do is you know, Iraq has to be won, or our way of life ends, and our children and our children's children all suffer. So, what I'm gonna do is send 10,000 more troops to Baghdad."
So, there's a disconnect there between — you're telling me this is fight of our generation, and you're going to increase troops by 10 percent. And that's gonna do it. I'm sure what he would like to do is send 400,000 more troops there, but he can't, because he doesn't have them. And the way to get that would be to institute a draft. And the minute you do that, suddenly the country's not so damn busy anymore. And then they really fight back, and then the whole thing falls apart. So, they have a really delicate balance to walk between keeping us relatively fearful, but not so fearful that we stop what we're doing and really examine how it is that they've been waging this.
And there was you know, this enormous amount of space and coverage to Virginia Tech, as there should have been. And I happened to catch, sort of a headline lower down, which was 200 people killed in four bomb attacks in Iraq. And I think my focus on what was happening here versus sort of this peripheral vision thing that caught my eye about, "Oh, right, there are lives--"
I was reading "Sun Tells Java Plans" and got as far as the second paragraph before noticing an Apple ad (in Flash, with sound). I listened to the commercial and closed the tab; I barely care what the article was about. So who does marketing better? (Actually, yesterday there was one with the PC guy banging his head against the banner, which was better.) Of course, I could just be responding in a Pavlovian (hmm, dessert) way to the background jingle.