FU Berlin

<xmlcity:berlin>

Bundesministerium für Bildung und Forschung

Wachstumskerne

   
   
 


Semantic Web News

Automatic compilation of XML news from various sources, embedded via RSS. All rights remain with the respective providers.

Friday, 11 May 2007

RDF Path Queries
Semantic Web (2007-05-11 14:09)
SPARQ2L: Towards Support For Subgraph Extraction Queries in RDF Databases

Many applications in analytical domains often need to "connect the dots", i.e., query about the structure of data. In bioinformatics, for example, it is typical to want to query about interactions between proteins. The aim of such queries is to extract relationships between entities, i.e. paths from a data graph. Often, such queries will specify certain constraints that qualifying results must satisfy, e.g. paths involving a set of mandatory nodes. Unfortunately, most present-day Semantic Web query languages, including the current draft of the anticipated recommendation SPARQL, lack the ability to express queries about arbitrary path structures in data.


Implemented using Java, Berkeley DB, and the in-memory store BRAHMS. Also mentions PSPARQL (part of Exmo), which in February reached feature-complete status.
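The kind of path-extraction query SPARQ2L targets can be sketched over a toy triple set. This is a plain breadth-first search, not the paper's algorithm, and the protein graph below is made up for illustration:

```python
from collections import deque

def find_paths(triples, start, end, required=frozenset(), max_len=6):
    """Breadth-first search for paths between two nodes in a triple set,
    keeping only paths that pass through every 'required' node."""
    adj = {}  # adjacency: subject -> [(predicate, object)]
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == end:
            if required.issubset(path):  # mandatory-node constraint
                paths.append(path)
            continue
        if len(path) > max_len:
            continue
        for p, o in adj.get(node, []):
            if o not in path:  # avoid cycles
                queue.append(path + [o])
    return paths

# Toy protein-interaction graph (invented data)
triples = [
    ("p53", "interactsWith", "MDM2"),
    ("MDM2", "interactsWith", "p73"),
    ("p53", "interactsWith", "p73"),
]
print(find_paths(triples, "p53", "p73", required=frozenset({"MDM2"})))
# [['p53', 'MDM2', 'p73']]  -- the direct p53->p73 edge fails the constraint
```

The `required` set is exactly the "mandatory nodes" constraint the abstract mentions, which plain SPARQL basic graph patterns cannot express over paths of unknown length.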
GRDDL Use Cases document published
Semantic Web (2007-05-11 14:09)
The GRDDL Working Group has published GRDDL Use Cases: Scenarios of extracting RDF data from XML documents as a Working Group Note on the 6th of April. This complements the publication of the GRDDL Test Cases and the GRDDL Specification documents, both published recently.
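GRDDL's core idea, deterministically extracting RDF triples from an XML document, can be illustrated with a toy transform. Real GRDDL points at an XSLT via a grddl:transformation attribute; the element names and triple mapping below are invented:

```python
import xml.etree.ElementTree as ET

def extract_triples(xml_text):
    """Toy GRDDL-style transform: map each <person> element to RDF-like
    (subject, predicate, object) triples. Real GRDDL would apply the XSLT
    named by the document's grddl:transformation attribute instead."""
    root = ET.fromstring(xml_text)
    triples = []
    for person in root.iter("person"):
        subj = person.get("id")
        for child in person:
            triples.append((subj, child.tag, child.text))
    return triples

doc = """<addressbook>
  <person id="#tim"><name>Tim</name><mbox>tim@example.org</mbox></person>
</addressbook>"""
print(extract_triples(doc))
# [('#tim', 'name', 'Tim'), ('#tim', 'mbox', 'tim@example.org')]
```

The point of the Use Cases document is that many such XML dialects can each carry their own transform, so generic RDF consumers never need to know the dialect.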
Lucene for the Semantic Web
Semantic Web (2007-05-11 14:09)
Hbase
Google's Bigtable, a distributed storage system for structured data, is a very effective mechanism for storing very large amounts of data in a distributed environment.

Just as Bigtable leverages the distributed data storage provided by the Google File System, Hbase will provide Bigtable-like capabilities on top of Hadoop.

Data is organized into tables, rows and columns, but a query language like SQL is not supported. Instead, an Iterator-like interface is available for scanning through a row range (and of course there is an ability to retrieve a column value for a specific key).

Any particular column may have multiple values for the same row key. A secondary key can be provided to select a particular value or an Iterator can be set up to scan through the key-value pairs for that column given a specific row key.


From the Hbase/HbaseArchitecture page:
HBase uses a data model very similar to that of Bigtable. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have crazily-varying columns, if the user likes.

A column name has the form "<family>:<label>".


The example tables given are very similar to untyped relations. This has only just become part of the nightly build.

Via, Data Parallel.
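The row/column/secondary-key model described above can be mocked up in a few lines. This is a toy in-memory analogue for illustration, not HBase's actual API:

```python
class ToyTable:
    """In-memory sketch of the Bigtable/HBase data model: sorted row keys,
    sparse columns, and multiple keyed values per (row, column) cell."""
    def __init__(self):
        self.rows = {}  # row_key -> {column -> {secondary_key -> value}}

    def put(self, row, column, value, secondary=0):
        self.rows.setdefault(row, {}).setdefault(column, {})[secondary] = value

    def get(self, row, column, secondary=0):
        """Retrieve one column value for a specific key."""
        return self.rows[row][column][secondary]

    def scan(self, start, stop):
        """Iterator-like scan over a row-key range -- no SQL here."""
        for key in sorted(self.rows):
            if start <= key < stop:
                yield key, self.rows[key]

t = ToyTable()
t.put("com.example/a", "anchor:home", "Home page")
t.put("com.example/b", "mime:type", "text/html")  # different columns per row
print([k for k, _ in t.scan("com.example/a", "com.example/z")])
# ['com.example/a', 'com.example/b']
```

Note how the two rows carry completely different columns, which is the "crazily-varying columns" sparseness the architecture page describes.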
Ontologies in 3D
Semantic Web (2007-05-11 14:09)
"OntoSphere3D is a Protégé plug-in for ontologies navigation and inspection using a 3-dimensional hyper-space where information is presented on a 3D view-port enriched by several visual cues (as the colour or the size of visualized entities)."

While I'm not usually a fan of 3D interfaces, this is a fairly intuitive approach. It certainly seems better than the neighbourhood view and hyperbolic view that they talk about in their paper. They offer a "global view" (projected onto a sphere) and a "tree focus".

Java 3D can be a pain to install; at least it was for me, as it wouldn't detect Protégé's JRE to install into.
State of Origin
Semantic Web (2007-05-11 14:09)
Same-Origin Policy Part 1: Why we’re stuck with things like XSS and XSRF/CSRF
In my experience most developers—and even many security people—don’t really know what the same-origin policy is. Worse yet, the rise of AJAX and mash-ups seems to have turned same-origin into something developers are trying to break. Complicating the issue further are the weaknesses in most browsers’ implementations of same-origin, leaving open questions about the effectiveness of the policy itself.


Talking to Web 2.0 gurus recently, I was surprised that not everyone understood this. The post gives a very clear definition of what exactly counts as the same origin. See also: Subverting Ajax, which includes things like using XSS to extend the XMLHttpRequest object to capture calls and record the data being transmitted.

Update: Just noticed that the latest Crypto-Gram highlights a similar attack overriding object creation.
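The policy itself reduces to a simple tuple comparison: two URLs share an origin only if scheme, host, and port all match. A minimal sketch:

```python
from urllib.parse import urlsplit

def same_origin(url_a, url_b):
    """Two URLs share an origin iff (scheme, host, port) are identical,
    treating the scheme's default port as explicit."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    default = {"http": 80, "https": 443}
    port = lambda u: u.port or default.get(u.scheme)
    return (a.scheme, a.hostname, port(a)) == (b.scheme, b.hostname, port(b))

print(same_origin("http://example.com/a", "http://example.com:80/b"))  # True
print(same_origin("http://example.com/a", "https://example.com/a"))    # False
```

The mash-up problem is visible right here: any cross-domain data source fails this check, which is why developers keep reaching for workarounds that weaken the policy.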
A New SQL
Semantic Web (2007-05-11 14:09)
ETech '07 Summary - Part 2 - MegaData.

Here's the thing, we need a new kind of data store, a new kind of SQL, something that does for storing and querying large amounts of data what SQL did for normalized data.

Sure you can store a lot of data in a relational database, but when I say large, I mean really large; a billion or more records. I know we need this because I keep seeing people build it.


All this talk about making SPARQL behave like SQL may be for nothing if people realize that's not what they need after all.

The back-of-the-envelope scalability target for an RDF store would potentially be hundreds of billions of statements.

The key requirements highlighted are: distributed, joinless (no referential integrity at the store level), denormalized and transactionless.

I was aware of this because a comment linked to one of my previous posts about Kowari scalability (which I must have snuck in at some stage). Kowari got up to 10,000 triples/second later in its life.
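The "joinless, denormalized" requirement amounts to materialising the join at write time. A toy illustration of the trade (the MegaData post names the requirements, not this design):

```python
# Normalized source data: a join at query time would be needed to answer
# "which user bought which item", and joins don't shard cheaply.
normalized_users = {1: {"name": "Ada"}}
normalized_orders = [{"user_id": 1, "item": "book"}]

# Write-time "join": duplicate the user data into every order record.
denormalized_orders = [
    {"user_name": normalized_users[o["user_id"]]["name"], "item": o["item"]}
    for o in normalized_orders
]

# A read now touches one record on one node -- no referential integrity,
# no cross-node join, at the cost of duplicated data and stale copies.
print(denormalized_orders[0])
# {'user_name': 'Ada', 'item': 'book'}
```

The cost is the usual one: updating Ada's name now means rewriting every order she appears in, which is exactly why these stores also drop transactions.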
The Killer Demo
Semantic Web (2007-05-11 14:09)
What a relief that there's a demo that can turn someone on to the "semantic" Web; this has obviously been a long time coming. If you believe the hype, the way you get the Semantic Web is one A-list blogger at a time.

Other comments: "Scoble Gets the Semantic Web", "Scoble Gets the Semantic Web", "Nova Spivack sees to it that Robert Scoble finally gets the Semantic Web", "You say Tomato...", "Describing the Semantic Data Web (Take 2)" and "QOTD : Scobleized".
Defenders of the Web
Semantic Web (2007-05-11 14:09)
Tim Berners-Lee goes postal on spam: "So how are you going to stop the Semantic Web being poisoned?"

TBL, the GLB, replied:

Well, everybody who's building the semantic web pretty much that I know are building systems take data from lots of places, but take data with an awareness of where those places are. So for example, suppose you're getting Geotags and the OS runs a service, lots of people in this country might trust the OS to say this point has a church with a spire - other people might say it's a great church to go to, other people might say it's a heathen church to go to... those are the other sources of data...

There was no let up from the press:

"But that was the basis for Google, and Google got poisoned... "

Shadbolt and Hendler stepped in to shield Sir Tim, but he was seething at the impertinence:

I remember a conference, we were discussing the Semantic Web, and someone asked what do you think is the worst thing that can happen and all the pencils come out. I know you two have been asking about "Woargh - I know the one about... what about the bad guys? Won't we be phished" There's a temptation to give readers about all the terrible things out there OK, and all the ways the web can become less usable.

At this point, your reporter wanted to remind Sir Tim that of all the problems the web has, a hostile press is not one of them. In fact, you can't pick up a newspaper or magazine without reading about how it's ushering in a New Age of Enlightenment. Time magazine gave "Person Of The Year" to every web user in America - or at least every one who looked at the mirror Time placed on its front cover.

He continued, cryptically:

Yes you'll find a bank that's less usable - ... I've never been phished.

So the Greatest Living Briton has never been phished, which is a relief. His answer to the Semantic Web didn't inspire much confidence for the rest of us: it would be used within the firewall, amongst trusted groups, "areas where one is much less worrying about the bad guys".
RIF publishes first technical working draft
Semantic Web (2007-05-11 14:09)
The RIF WG released the first public Working Draft of the RIF Core specification at http://www.w3.org/TR/rif-core/. RIF Core is a very basic interchange format for rules, based on Horn logic. While this WD lays the foundation for RIF Core, several issues remain unspecified and are expected to be addressed in future drafts of the Core spec, for example:

  • OWL and RDF compatibility
  • How to interchange the application data models used by rules
  • Precisely how URIs will be used in RIF rules
  • Whether and how much metadata should be included in RIF translations, e.g. for preserving rules in "round trip" translations
Patterns In Software - Part 4
Semantic Web (2007-05-11 14:09)
Patterns of Software: Tales from the Software Community (freely available in PDF). The chapter I'll be covering this time is: "The Failure of Pattern Languages".

So it begins with the punch that the last few chapters have been building up to:

Alexander’s story does not end with the publication of A Pattern Language in 1977. It went on and still goes on. And Alexander did not sit still after he wrote the patterns. Like any scientist he tried them out.

And they did not work.


The reason, as Gabriel explains, is that processes other than architecture play a more fundamental role: a building is not just the result of slapping together a bunch of very well thought out patterns based on empirical evidence. It is the result of many other processes like finance, zoning, construction, etc.

...one problem with the building process is lump-sum development. In such development few resources are brought to bear on the problems of repair and piecemeal growth. Instead, a large sum of money is dedicated to building a large artifact, and that artifact is allowed to deteriorate somewhat, and anything that is found lacking in the design or construction is ignored or minimally addressed until it is feasible to abandon the building and construct a replacement.


Gabriel gives an example from a previous chapter of how these processes can destroy the otherwise good qualities of development. For example, the process of getting a mortgage and paying it off directly influences the types of buildings built and used. Generally, you invest such a large amount of money in a property that you usually can't afford to make piecemeal modifications, because you can barely afford to pay it off. This is okay as long as people are jumping from house to house. It defers fixing whatever problems there may be with these houses until things get really bad and they are knocked down and rebuilt, perhaps losing sight of what was wrong with them in the first place.

So if you retain these old processes you still get the old results. Now it's clear in software that this is also the case - a nicely architected and controlled software project does not necessarily lead to high quality software. How often have you worked on a software project that you knew was going to be re-written in a few years anyway?

Gabriel suggests that the answer lies in good code and coders, not in the typical separation of analysis, design and implementation:

And isn’t the old-style software methodology to put design in the hands of analysts and designers and to put coding in the hands of lowly coders, sometimes offshore coders who can be paid the lowest wages to do the least important work?

Methodologists who insist on separating analysis and design from coding are missing the essential feature of design: The design is in the code, not in a document or in a diagram. Half a programmer’s time is spent exploring the code, not in typing it in or changing it. When you look at the code you see its design, and that’s most of what you’re looking at, and it’s mostly while coding that you’re designing.


And finally, that the typical software patterns are not following Christopher Alexander's original concept of a pattern language:

When I look at software patterns and pattern languages, I don’t see the quality without a name in them, either. Recall that Alexander said that both the patterns themselves and the pattern language have the quality. In many cases today the pattern languages are written quickly, sort of like students doing homework problems. I heard one neophyte pattern writer say that when writing patterns he just writes what he knows and can write four or five patterns at a sitting.


The answer is to build software piecemeal, incrementally, partially designed and with reflection, much more like Turkish rugs. That's the subject of the next chapter, "The Bead Game, Rugs, and Beauty".
What no MOOSE?
Semantic Web (2007-05-11 14:09)
I promised myself I wasn't going to do this again, but I have one more reason why even the option of having DISTINCT/LOOSE/CHOOSE in SPARQL is a bad idea. Part of this is prompted by once more sitting next to people trying to make the Semantic Web work, and from my perspective SPARQL is letting them down.

It's not a new reason; it's one I wrote about in 2004, which offers a pretty good argument for why having this as an optional feature doesn't make sense for RDF:

"The other issue with the SPARQL is the lack of an implicit distinct. In my understanding of SQL, DISTINCT is optional because if your queries work on normalized data and joins are based on distinct keys then the returned results cannot be duplicated. If your query works on rows with repeated values on the same column then you apply DISTINCT.

In RDF's data model there isn't really this problem of duplicated data and normalization. SPARQL has the idea of matching statements in the graph. From my understanding, RDF's data model doesn't support the idea of multiple subject, predicates and/or objects with the same values.

In other words, it only seems valid that if a query matches one result in the graph it should return that one unique result not repeated multiple results."

This is on top of the other reasons I came up with in "Bagging SPARQL". This could actually be seen as further discussion of the initial response I got. Among other things, it was said that duplicates could arise from querying multiple graphs. I'd argue that forced distinct values provide the context needed to effectively count (or compute other aggregate values) across these multiple graphs.

It's three years on and they still don't even allow users to declaratively count the number of statements in this mystical, future web of data.
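The duplicate-results issue is easy to reproduce: even though an RDF graph is a set with no duplicate statements, a pattern match that projects away a variable can still yield repeated rows. A toy matcher, not a SPARQL engine:

```python
def match(triples, pattern):
    """Match an (s, p, o) pattern where None is a variable; project out
    only the subject, as a SELECT ?s query would."""
    return [s for s, p, o in triples
            if all(pat is None or pat == val
                   for pat, val in zip(pattern, (s, p, o)))]

graph = {  # a set: RDF's data model cannot hold duplicate statements
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
}
rows = match(graph, (None, "knows", None))
print(sorted(rows))       # ['alice', 'alice']  -- duplicates from projection
print(sorted(set(rows)))  # ['alice']           -- an implicit DISTINCT
```

Making the second behaviour the default is exactly the implicit-distinct position argued above.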
Extending SPARQL
Semantic Web (2007-05-11 14:09)
A few papers related to extending SPARQL, I may update this as I find more.

  • Apart from SPARQ2L and PSPARQL, there are also, at the ESWC 2007 conference: iSPARQL (which uses SimPack) and SPARQLeR (which supports some quite sophisticated path queries)

  • SPARQL-DL: SPARQL Query for OWL-DL An enhancement of SPARQL to support DL semantics. And they note: "SPARQL-DL is a step between RDF QLs that are too unstructured w.r.t. OWL-DL and DL QLs which are not as expressive. We believe SPARQL-DL would help interoperability on the Semantic Web as it bridges this gap. As part of future work, we intend to investigate other possible extensions to SPARQL-DL including (but not limited to) aggregation operators, epistemic operators (and negation as failure), and regular expressions on OWL properties."

  • SPARQL/Update: adds INSERT, MODIFY, DELETE and UPDATE to SPARQL. I'd seen this before but hadn't linked to it.

Jobs for my Dog
Semantic Web (2007-05-11 14:09)
So I was idly looking at the escalating real estate market and considering ways my dog could finally make his living. I thought back across jobs that I'd seen that could be done equally well by him:

  1. A station master. While this sounds like a job requiring thumbs, really not so much. The station being mastered no longer received any trains but still took truck deliveries. Not very many though. Now, the station master is unable to load or unload these deliveries - this is done by the drivers. So the job mainly consisted of watching people drop things off and pick things up; a job for my dog.

  2. Train driver. In recent memory an accident occurred that had the distinct possibility of being caused by both drivers doing things other than driving the train. The introduction of an automated system was suggested to prevent it happening again. Driving an automated train is something that my dog could do quite well. Actually, he could do it twice as well and he wouldn't join the union. I'm not sure he'd stick at it, as it would bore the heck out of him too.

  3. Doorman. Now normally this would require the ability to open and close doors. But more recently, I noticed there was a guy at a building watching an automatic door open and close. Maybe he gave directions or something but the door watching seemed to be his prime activity. I've seen my dog look attentively at the front door - he's very qualified.

  4. Network Administrator. Maybe the frontal lobes need to be developed further for this one but then again maybe not. The key activities undertaken for this position, that I've seen, revolved around attending meetings and playing Solitaire. I think my dog could do this job and he wouldn't even need a computer. Apparently, this is not an uncommon job amongst dogs.

  5. Phone Handler (?). These are the people I've found at the front of buildings saying, "Please use that phone to contact people before going up". Usually replaced by a sign, I think an indicative paw would do as well.

GRDDL is a Candidate Recommendation
Semantic Web (2007-05-11 14:09)
W3C is pleased to announce the advancement of GRDDL to Candidate Recommendation and the publication of GRDDL Test Cases as a Last Call Working Draft. Implementation feedback and comments are welcome through 31 May. Comments on this document should be sent to public-grddl-comments@w3.org, a mailing list with a public archive.
An Efficient Link Store
Semantic Web (2007-05-11 14:09)
Romulus and Remus - C# and Java
Semantic Web (2007-05-11 14:09)
Ted Neward talks about C# and Java. He also mentions the possible .NET backlash, Scala and LINQ. There's also a very impressive demo of LINQ (the code demo starts about a quarter of the way through): the struggle from imperative to declarative.
RIF WG nears decision on RDF, Disjointness, and IRIs
Semantic Web (2007-05-11 14:09)
Recent progress has been made on some technical issues raised during the design of the Core Rule Interchange Format, including RDF Compatibility, Disjoint Names, and Definition of URI. The RIF WG has decided to use IRIs: all externally visible symbols in a RIF ruleset will have IRIs as identifiers. The WG has also decided that predicates, functions, and individuals in RIF rules can be identified by the same symbol (unlike OWL DL, which requires that a URI identify precisely one of a class, a property, or an individual). Finally, the WG has agreed to consider treating RDF data with a special quasi-object-oriented syntax (referred to in the group as a slotted syntax), in which RDF nodes are treated as objects and the properties of a node as slots. Although this is not a formal decision, the group agreed to use this treatment of RDF in RIF Core rules for the next working draft and solicit feedback.
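The "slotted" treatment of RDF, nodes as objects and properties as slots, can be sketched as a simple grouping of triples. This is my own illustration of the idea, not RIF syntax:

```python
def to_slotted(triples):
    """Group RDF triples by subject: node -> {property (slot) -> [values]}.
    Lists are needed because RDF allows repeated properties per node."""
    objects = {}
    for s, p, o in triples:
        objects.setdefault(s, {}).setdefault(p, []).append(o)
    return objects

triples = [
    ("ex:book1", "ex:title", "RIF Primer"),
    ("ex:book1", "ex:author", "ex:alice"),
]
print(to_slotted(triples))
# {'ex:book1': {'ex:title': ['RIF Primer'], 'ex:author': ['ex:alice']}}
```

Rules can then test slots on an object in one condition rather than matching several independent triples.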
Defeasible Logic Plus Time
Semantic Web (2007-05-11 14:09)
Temporal extensions to Defeasible Logic. Non-monotonic reasoning is about reaching different conclusions as more information is added over time. Rather than adding information, these temporal extensions actually restrict when existing information applies.
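A crude way to picture "restricting when information applies": attach validity intervals to facts, so a conclusion is only derivable while its supporting fact holds. This is a hypothetical encoding for intuition, not the paper's logic:

```python
def holds(facts, statement, t):
    """A fact holds at time t only inside its validity interval [start, end)."""
    return any(f == statement and start <= t < end
               for f, (start, end) in facts.items())

# A fact with a temporal extent: the offer is only valid during days 0-30.
facts = {"offer_valid": (0, 30)}
print(holds(facts, "offer_valid", 10))  # True
print(holds(facts, "offer_valid", 45))  # False -- defeated by time alone
```

The contrast with plain non-monotonic reasoning is that nothing new was asserted to defeat the conclusion; the interval on the existing fact did it.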
YARS2
Semantic Web (2007-05-11 14:09)
With little fanfare the folks at DERI have announced YARS2. I know of at least 4 next generation RDF stores (you know who you are) with a few others on the drawing board. Storing data is cool again.

To save disk space for the on-disk indices, we compress the individual blocks using Huffman coding. Depending on the data values and the sorting order of the index, we achieve a compression rate of ≈ 90%. Although compression has a marginal impact on performance, we deem that the benefits of saved disk space for large index files outweighs the slight performance dip.

Figure 4 shows the correspondence between block size and lookup time, and also shows the impact of Huffman coding on the lookup performance; block sizes are measured pre-compression. The average lookup time for a data file with 100k entries (random lookups for all subjects in the index) using a 64k block size is approximately 1.1 ms for the uncompressed and 1.4 ms for the compressed data file. For 90k random lookups over a 7 GB data file with 420 million synthetically generated triples (more on that dataset in Section 7), we achieve an average seek time of 23.5 ms.


It's good to see that text searching on literals now seems like a standard feature too (I can't think of a recent announcement where this wasn't the case). They used a sparse index to create all six indices. They also hint at how reasoning is going to be performed by linking to "Unifying Reasoning and Search to Web Scale", which suggests a tradeoff over time and trust.
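The block-compression trade-off quoted above is easy to demonstrate: compressing each index block shrinks the file, at a small decode cost per lookup. A generic sketch, with zlib standing in for YARS2's Huffman coder:

```python
import zlib

def compress_blocks(entries, block_size=4):
    """Split sorted index entries into fixed-size blocks and compress each,
    keeping the first key of every block as a sparse in-memory index."""
    blocks = []
    for i in range(0, len(entries), block_size):
        raw = "\n".join(entries[i:i + block_size]).encode()
        blocks.append((entries[i], zlib.compress(raw)))  # (first key, block)
    return blocks

def lookup(blocks, key):
    """Find the block whose key range covers the key, decompress it, scan."""
    candidate = max((b for b in blocks if b[0] <= key),
                    key=lambda b: b[0], default=None)
    if candidate is None:
        return False
    return key in zlib.decompress(candidate[1]).decode().split("\n")

index = sorted(f"triple{i:04d}" for i in range(100))
blocks = compress_blocks(index)
print(lookup(blocks, "triple0042"))  # True
```

Only one block is decompressed per lookup, which is why the per-lookup penalty stays small even at a ~90% compression rate.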
You Are Here, Now
Semantic Web (2007-05-11 14:09)
Mistakes I've Made a Few
Semantic Web (2007-05-11 14:09)
So the ghosts of implementations past have come to haunt me in recent weeks mostly around Kowari/Tucana (now Mulgara).

Firstly, there was exclude and NAF in iTQL:

> It was introduced to provide
> a limited form of negation, and one that interacts poorly with the
> open-world assumption. We also now have minus, which is well
> defined, corresponds closely to our intuitive understanding of the
> operation, and is (I am told) what was actually required.
>
> If my memory is correct we should probably at least deprecate, if
> not remove exclude entirely from mulgara.
>
> Do we agree that exclude should be removed?

Well *I* agree anyway. As you say, it doesn't do what anyone thinks
it does.

> If it should be removed, when should this occur?

Yesterday.


And now adding the Jena API to Kowari cost customers:
Another possible reason is Tucana/Kowari/Mulgara’s Jena support - originally put in to provide a migration path for companies looking to move on from research projects to scalable infrastructure - which as Jena is the defacto semweb tool of choice, people used to evaluate Kowari’s scalability. Jena’s lack of scaling hurt us several times, I can remember lots of frantic calls as some company wrote us off because of our Jena API.


I'm currently still working in this area (maybe somewhat surprisingly) and Jena still dominates (all of the tools I'm currently looking at are Jena-based). And I still haven't seen a Jena implementation that scales (see page 10). Maybe the decision to open source Kowari cost another round of funding too. Maybe this is why Garlik's or Radar Networks' triple stores are still behind closed doors.
Frequently Asked Questions (FAQ) on Semantic Web
Semantic Web (2007-05-11 14:09)
The SWEO Interest Group is pleased to announce the first release of a frequently asked questions (FAQ) document about the Semantic Web. It provides comprehensive answers to questions covering Semantic Web standards and their usage. This is an evolving document that will continue to be updated over time. There is also a wiki site where the community can contribute to the further evolution of this document, as well as an RSS 1.0 feed to help track changes to the FAQ.
Planet RDF Roundup
Semantic Web (2007-05-11 14:09)

  • Research Project: Pig. Given Yahoo's usage of Hadoop it's good to see them building a query layer on top of it. And it's not SQL (hence the name); it's relationally based (it uses bags and still has DISTINCT) because that scales. Hah! Some of the documentation is gold: "In a conventional database management system, SQL queries are translated into relational algebra expressions, which are in turn translated into physical evaluation plans. Pig Latin queries are already an algebra, so we're bypassing the first layer." It even has nested relations and a flatten operation (which removes the nesting). A good read is "Yahoo Pig and Google Sawzall", which notes that Google's Sawzall looks like Scala.

  • REST Compile/Describe & WADL, together with "I finally get REST. Wow.", all link to WADL. In what seems like an age ago, I remember several enterprise architects asking what the REST equivalent of WSDL was. The latter post has one of the best one-liners to describe REST: "state machine as node graph traversed via URI". An example of how to use it is given in "REST Describe first working Beta released"

  • Stavanger, oil, and the Semantic Web talks about the "Norwegian Semantic Web Days", which covered things such as the OBO Foundry.

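Pig Latin's nested relations and FLATTEN, mentioned in the first item, can be mimicked with plain Python structures. A rough analogy, not Pig syntax:

```python
# A nested relation: each tuple's second field is itself a bag of tuples,
# e.g. a user grouped together with all the URLs they visited.
nested = [
    ("alice", [("url1",), ("url2",)]),
    ("bob",   [("url3",)]),
]

def flatten(relation):
    """Pig-style FLATTEN: un-nest the inner bag, repeating the outer fields
    once per inner tuple -- the inverse of a GROUP."""
    return [(name, url) for name, bag in relation for (url,) in bag]

print(flatten(nested))
# [('alice', 'url1'), ('alice', 'url2'), ('bob', 'url3')]
```

Keeping bags (duplicates allowed) rather than sets is the design choice that lets Pig skip the DISTINCT work unless you ask for it, which is part of why it scales.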
Just Fearful Enough
Semantic Web (2007-05-11 14:09)
Profile of Jon Stewart
It's sort of this odd and I've always had this problem with the rationality of it. That the President says, "We are in the fight for a way of life. This is the greatest battle of our generation, and of the generations to come. "And, so what I'm going to do is you know, Iraq has to be won, or our way of life ends, and our children and our children's children all suffer. So, what I'm gonna do is send 10,000 more troops to Baghdad."

So, there's a disconnect there between — you're telling me this is fight of our generation, and you're going to increase troops by 10 percent. And that's gonna do it. I'm sure what he would like to do is send 400,000 more troops there, but he can't, because he doesn't have them. And the way to get that would be to institute a draft. And the minute you do that, suddenly the country's not so damn busy anymore. And then they really fight back, and then the whole thing falls apart. So, they have a really delicate balance to walk between keeping us relatively fearful, but not so fearful that we stop what we're doing and really examine how it is that they've been waging this.

And there was you know, this enormous amount of space and coverage to Virginia Tech, as there should have been. And I happened to catch, sort of a headline lower down, which was 200 people killed in four bomb attacks in Iraq. And I think my focus on what was happening here versus sort of this peripheral vision thing that caught my eye about, "Oh, right, there are lives--"
Even the Banner Ads are More Interesting
Semantic Web (2007-05-11 14:09)
I was reading Sun Tells Java Plans, and got to the second paragraph before noticing an Apple ad (in Flash, with sound). I listened to the commercial, closed the tab, and now only barely care what the article was about. So who does marketing better? (Actually, yesterday there was one with the PC guy banging his head against the banner, which was better.) Of course, I could just be responding in a Pavlovian (hmm, dessert) way to the background jingle.

 
         
       
 
Page last modified: 04-Jul-2007 2:02:02
Webadmin: wwwadmin@xml-Clearinghouse.de
© XML Clearinghouse   Imprint
Disclaimer