kafsemo.org

“Now!... Gimme that sweet sweet grant money!”

Hullo!

Sorting RDF for readable output

Thursday, 19 June 2014

As with rows in SQL, the tuples in RDF have no inherent ordering. However, when transcribing RDF in different notations, the ordering may influence the output. Selecting an ordering to take advantage of that can dramatically improve the readability of the final document.

Consider this example from Wikipedia’s RDF article. Here’s a description of the article on Tony Benn in Turtle:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc:   <http://purl.org/dc/elements/1.1/> .

<http://en.wikipedia.org/wiki/Tony_Benn>
    dc:publisher "Wikipedia" ;
    dc:title "Tony Benn" ;
    foaf:primaryTopic [
        a foaf:Person ;
        foaf:name "Tony Benn"
    ] .

This defines five statements, shown here as N-Triples:

_:genid1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
_:genid1 <http://xmlns.com/foaf/0.1/name> "Tony Benn" .
<http://en.wikipedia.org/wiki/Tony_Benn> <http://purl.org/dc/elements/1.1/publisher> "Wikipedia" .
<http://en.wikipedia.org/wiki/Tony_Benn> <http://purl.org/dc/elements/1.1/title> "Tony Benn" .
<http://en.wikipedia.org/wiki/Tony_Benn> <http://xmlns.com/foaf/0.1/primaryTopic> _:genid1 .

Semantically, the Turtle and N-Triples forms are identical. The ordering has no effect on the meaning. We can also write those same statements as RDF/XML:

<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:nodeID="genid1">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="genid1">
    <foaf:name>Tony Benn</foaf:name>
  </rdf:Description>
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
    <dc:publisher>Wikipedia</dc:publisher>
  </rdf:Description>
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
    <dc:title>Tony Benn</dc:title>
  </rdf:Description>
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
    <foaf:primaryTopic rdf:nodeID="genid1"/>
  </rdf:Description>
</rdf:RDF>

Despite containing the same information as that first form, this is clearly less human-readable. The structure is no longer apparent and there’s more repetition. We’re not taking advantage of any syntactic sugar that RDF/XML provides.

Consider the <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>. With the correct namespaces defined, we can use a typed node element to write this as <foaf:Person/>. Multiple statements with the same subject can be grouped. Using Sesame, if we write that document using BufferedGroupingRDFHandler, it will sort the statements to take advantage of that syntax:

<rdf:RDF
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:foaf="http://xmlns.com/foaf/0.1/"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
        <dc:publisher>Wikipedia</dc:publisher>
        <dc:title>Tony Benn</dc:title>
</rdf:Description>
<foaf:Person rdf:nodeID="node18qnobpi2x1">
        <foaf:name>Tony Benn</foaf:name>
</foaf:Person>
<rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
        <foaf:primaryTopic rdf:nodeID="node18qnobpi2x1"/>
</rdf:Description>

</rdf:RDF>

That’s better. We’re now taking advantage of a shorthand for the type, and we’re grouping statements about the same resource. We’re down from fifteen to ten lines of statements in the XML.
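For a feel of what grouping by subject involves, here is a minimal sketch in plain Java. It is not Sesame's implementation (the output above shows that BufferedGroupingRDFHandler orders things a little differently); the `SubjectGrouping` and `Triple` names are made up for illustration, and every term is just a string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SubjectGrouping {
    /** A bare-bones triple; every term is a plain string, for illustration only. */
    static final class Triple {
        final String subject, predicate, object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Groups triples by subject, keeping subjects in the order they first appear. */
    static Map<String, List<Triple>> groupBySubject(List<Triple> triples) {
        Map<String, List<Triple>> groups = new LinkedHashMap<>();
        for (Triple t : triples) {
            groups.computeIfAbsent(t.subject, k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    /** The five Tony Benn statements, deliberately interleaved by subject. */
    static List<Triple> example() {
        String article = "<http://en.wikipedia.org/wiki/Tony_Benn>";
        return Arrays.asList(
            new Triple(article, "dc:publisher", "\"Wikipedia\""),
            new Triple("_:genid1", "rdf:type", "foaf:Person"),
            new Triple(article, "dc:title", "\"Tony Benn\""),
            new Triple("_:genid1", "foaf:name", "\"Tony Benn\""),
            new Triple(article, "foaf:primaryTopic", "_:genid1"));
    }

    public static void main(String[] args) {
        // Print each subject once, followed by its predicate/object pairs.
        groupBySubject(example()).forEach((subject, group) -> {
            System.out.println(subject);
            for (Triple t : group) {
                System.out.println("    " + t.predicate + " " + t.object);
            }
        });
    }
}
```

Each group maps naturally onto one rdf:Description element, or onto a typed node element if the group contains an rdf:type statement.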

However, we’re still splitting the topic into two parts: the statement that this is the topic of the article, and the definition of the topic. We’re not taking advantage of striping, which lets us chain the first use of a resource as the object of a statement with the statements that use it as a subject.

Let’s sort topologically. That is, place all the statements about something at the point it’s first used. Essentially, where possible, we want a depth-first traversal of a tree.

<rdf:RDF
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:foaf="http://xmlns.com/foaf/0.1/"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
        <dc:publisher>Wikipedia</dc:publisher>
        <dc:title>Tony Benn</dc:title>
        <foaf:primaryTopic>
                <foaf:Person rdf:nodeID="node18qnos7a4x1">
                        <foaf:name>Tony Benn</foaf:name>
                </foaf:Person>
        </foaf:primaryTopic>
</rdf:Description>

</rdf:RDF>

Neat. In fact, it’s starting to look like the original Turtle document: this is the order you’d choose if you were authoring a document like this by hand.
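The ordering itself can be sketched in a few lines of plain Java. This is a simplified illustration rather than the sorter linked from this post: the `TripleOrdering` and `Triple` names are hypothetical, terms are plain strings (literals keep their quotes so they can't collide with resource names), and the recursion assumes the graph isn't deep enough to trouble the stack.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TripleOrdering {
    /** A bare-bones triple; every term is a plain string, for illustration only. */
    static final class Triple {
        final String subject, predicate, object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object + " .";
        }
    }

    /** Reorders triples so a resource's statements follow the point where it is first used. */
    static List<Triple> sortDepthFirst(List<Triple> triples) {
        // Index by subject, remembering the order subjects first appear in.
        Map<String, List<Triple>> bySubject = new LinkedHashMap<>();
        for (Triple t : triples) {
            bySubject.computeIfAbsent(t.subject, k -> new ArrayList<>()).add(t);
        }

        // A subject that another subject points at is not a root.
        Set<String> referenced = new HashSet<>();
        for (Triple t : triples) {
            if (bySubject.containsKey(t.object) && !t.object.equals(t.subject)) {
                referenced.add(t.object);
            }
        }

        List<Triple> out = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        for (String s : bySubject.keySet()) {
            if (!referenced.contains(s)) {
                visit(s, bySubject, visited, out);
            }
        }
        // Anything left is only reachable through a cycle; emit it in first-seen order.
        for (String s : bySubject.keySet()) {
            visit(s, bySubject, visited, out);
        }
        return out;
    }

    private static void visit(String subject, Map<String, List<Triple>> bySubject,
                              Set<String> visited, List<Triple> out) {
        if (!visited.add(subject)) {
            return; // already written out at its first use
        }
        for (Triple t : bySubject.get(subject)) {
            out.add(t);
            if (bySubject.containsKey(t.object)) {
                visit(t.object, bySubject, visited, out);
            }
        }
    }

    /** The five Tony Benn statements, in the N-Triples order shown earlier. */
    static List<Triple> example() {
        String article = "<http://en.wikipedia.org/wiki/Tony_Benn>";
        return Arrays.asList(
            new Triple("_:genid1", "rdf:type", "foaf:Person"),
            new Triple("_:genid1", "foaf:name", "\"Tony Benn\""),
            new Triple(article, "dc:publisher", "\"Wikipedia\""),
            new Triple(article, "dc:title", "\"Tony Benn\""),
            new Triple(article, "foaf:primaryTopic", "_:genid1"));
    }

    public static void main(String[] args) {
        sortDepthFirst(example()).forEach(System.out::println);
    }
}
```

For the Tony Benn statements, that puts the article's publisher, title and primaryTopic first, followed immediately by the two statements about the blank node, which is the order a writer needs in order to nest the person inside foaf:primaryTopic.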

Why?

I’m converting between two notations, and as part of that conversion, I create statements in an arbitrary order. Sorting them before rendering means I can separate generation and presentation and still get nice output.

Here’s RDFTripleTopologicalSorter.java and here’s an example of using it.

(Music: Future of the Left, “Donny of the Decks”)

Upgrading to Maven 3

Thursday, 26 July 2012

Jason van Zyl asks:

Are there major issues anyone is having with upgrading to Maven 3.x from Maven 2.x? If there are still any blockers I'd like to fix them.

As someone who’s put some effort into upgrading to Maven 3 at work, I thought I’d mention one blocker.

Maven 1 to Maven 2 was a complete rewrite for builds — your project.xml became a pom.xml, for a clear separation. Maven 3 is largely an extension of the 2.x series and intended to be compatible. As a user, the compatibility is good, but the internals break a few things for plugin writers. And, like any users of Maven at scale, we have custom plugins.

mojo-executor was the problem. It’s a library to invoke Maven goals from within a mojo, so your build plugin can call out to other plugins for its implementation. Under Maven 2, the code to invoke Maven recursively used a PluginManager. However, in Maven 3, although you can get access to an instance of PluginManager, it’s not going to help you:

    public void executeMojo( MavenProject project, MojoExecution execution, MavenSession session )
        throws MojoExecutionException, ArtifactResolutionException, MojoFailureException, ArtifactNotFoundException,
        InvalidDependencyVersionException, PluginManagerException, PluginConfigurationException
    {
        throw new UnsupportedOperationException();
    }

Under Maven 3 we need to get a BuildPluginManager instead. So, we need two paths in the code: one that tries to get a BuildPluginManager and a fallback that uses the PluginManager if that’s all we can get. So what happens when we load a mojo that uses the new class BuildPluginManager into Maven 2? It doesn’t know about it: we get a NoClassDefFoundError. Anyone who’s written Java code to deal with potentially missing APIs knows the next step: avoid static use of that class. Rather than defining a field and asking Maven to inject an instance, try to get hold of an instance at runtime with a lookup:

    /**
     * The Maven PluginManager component.
     *
     * @component
     * @required
     */
    private PluginManager pluginManager;

    public void execute() throws MojoExecutionException {
        ExecutionEnvironment env;

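        try {
            // Maven 3: look the component up by role name at runtime,
            // rather than asking Maven to inject it into a field.
            Object o = mavenSession.lookup("org.apache.maven.plugin.BuildPluginManager");

            env = executionEnvironment(mavenProject, mavenSession, (BuildPluginManager) o);
        } catch (ComponentLookupException e) {
            // Maven 2: no such component, so fall back to the injected PluginManager.
            env = executionEnvironment(mavenProject, mavenSession, pluginManager);
        }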
...

So, if we can look up the Maven 3 version, use it. Otherwise, fall back on the Maven 2 version that was injected. This gives us a plugin that works with old and new Mavens, which is essential during a gradual migration.
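Outside Maven, the same trick looks roughly like this. It's a minimal sketch and not mojo-executor's code: `OptionalApiLookup` and the class name `org.example.hypothetical.NewManager` are made up, and plain reflection stands in for the container lookup.

```java
class OptionalApiLookup {
    /**
     * Returns a new instance of the named class if it can be loaded and
     * constructed, or the supplied fallback if it can't. The class is only
     * ever referred to by name, so this code loads fine without it.
     */
    static Object newInstanceOrFallback(String className, Object fallback) {
        try {
            Class<?> c = Class.forName(className);
            return c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | LinkageError e) {
            // Missing class, missing constructor, or a class that fails to link.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Present on every JVM, so the "new API" path is taken.
        System.out.println(
            newInstanceOrFallback("java.util.ArrayList", "fallback").getClass().getName());
        // Hypothetical class that doesn't exist, so we get the fallback.
        System.out.println(
            newInstanceOrFallback("org.example.hypothetical.NewManager", "fallback"));
    }
}
```

The catch clause is deliberately broad: during a migration the new class might be absent altogether, or present but unable to link against the older runtime, and either way the fallback is what you want.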

That, along with any number of other tweaks, has worked well. My day-to-day Maven is mvn3 for most projects and I’m enjoying parallel builds, much better pom validation and a few years of bug fixes. It’s hugely frustrating to work around bugs that are already fixed by sticking with obsolete software. I don’t recommend it.

(Music: The Fatima Mansions, “Popemobile to Paraguay”)

<img src> SVG bugs

Monday, 20 February 2012

With constant development and releases, it’s easy to forget what the bad old days of working around unfixed browser bugs were like. A great way to remind yourself is to use new features: in this case, SVG.

For compatibility with older browsers, even those with SVG support, it’s sometimes worth embedding images using <object> (as in this roundup of how to get consistent browser behaviour). But, with the current round of modern browsers, an <img> tag should work fine.

SVG’s process for determining the size an image should be rendered at is fairly involved. I should be able to leave the size in the SVG file unspecified and then specify width and height in my img tag:

Firefox is fine with that, but Safari on iOS (Mobile Safari?) gets confused — the result is distorted inconsistently:

So, let’s include a size in the underlying SVG. This won’t affect the display size, because we’re setting that in the HTML, but it works around that bug:

Problem solved!

Using <object> is a great way to get backwards compatibility. Make the SVG an object and embed an img tag pointing at a fallback bitmap:

[Two images were shown here: the PNG on its own, and the SVG with a PNG fallback.]

WebKit shows the image using the size of the underlying SVG rather than the <object> size, so you need a specific file for each size you’re using the image at. Chromium will also sometimes give you scrollbars even when the SVG and object sizes match, but hopefully that’ll be fixed somewhere in Chrome’s ruthless upgrade schedule.

(Music: Urge Overkill, “Touch to a Cut”)
Joseph Walton <joe@kafsemo.org>