kafsemo.org

Larry Sanders and Dr. Katz (SPARQL and Mivvi)

2006-01-23

Dr. Katz: Professional Therapist (coming to DVD this year?) and The Larry Sanders Show were both great '90s US comedies, albeit totally different in style. One similarity was the eclectic mix of comedians and actors as guests, a time capsule of significant performers; so who appeared on both shows?

The current data for Mivvi (introduction) includes, for some series, exactly this information, scraped from different sources but using common IMDb URIs for people.

SPARQL is an RDF query language, currently being prepared by the W3C’s RDF Data Access Working Group. (Disambiguation: Sparql is also the name of Danny Ayers’ cat.) The language is still under development, but there are many implementations of the various drafts. I chose Rasqal, with its Roqet front-end (available as rasqal-utils in Debian).

SPARQL Query

PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX mvi: <http://mivvi.net/rdf#>
SELECT ?c ?title ?episode1 ?larryTitle ?episode2 ?katzTitle
FROM <a.rdf>
WHERE {
	<http://en.wikipedia.org/wiki/The_Larry_Sanders_Show#> mvi:seasons ?s1.
	?s1 ?w ?season1.
	?season1 mvi:episodes ?es1.
	?es1 ?x ?episode1.
	?episode1 dc:contributor ?c.

	?episode1 dc:title ?larryTitle.

	<http://www.sassman.com/katz/#> mvi:seasons ?s2.
	?s2 ?y ?season2.
	?season2 mvi:episodes ?es2.
	?es2 ?z ?episode2.
	?episode2 dc:contributor ?c.

	?episode2 dc:title ?katzTitle.

	?c dc:title ?title.
}

The intent should be apparent, if not the syntax: find any chain, from season to episode, for both series. By requiring the same contributor, ?c, for both series, we will only get results where the same person appeared in both series. The output will be the variables that satisfied the match.
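
For readers who find the pattern syntax opaque, the same join can be sketched in a few lines of Python over a toy in-memory graph. This is only an illustration of the matching logic, not how Rasqal evaluates the query: every example.org URI, the `toy_series` helper, and the placeholder people are invented here, and only the shape (series, sequence of seasons, sequence of episodes, dc:contributor) follows the query above.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MVI = "http://mivvi.net/rdf#"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"  # invented namespace for the toy data


def toy_series(name, people):
    """Build one made-up series with a single season and a single episode."""
    s = EX + name
    return {
        (s, MVI + "seasons", s + "/seasons"),
        (s + "/seasons", RDF + "_1", s + "/s1"),
        (s + "/s1", MVI + "episodes", s + "/s1/eps"),
        (s + "/s1/eps", RDF + "_1", s + "/s1/e1"),
    } | {(s + "/s1/e1", DC + "contributor", EX + p) for p in people}


triples = (toy_series("showA", ["person1", "person2"])
           | toy_series("showB", ["person2", "person3"]))


def objects(subject, predicate=None):
    """Objects for a subject; predicate=None behaves like an unbound variable."""
    return {o for s, p, o in triples
            if s == subject and (predicate is None or p == predicate)}


def contributors(series):
    """Follow series -> seasons -> season -> episodes -> episode -> contributor."""
    found = set()
    for seasons in objects(series, MVI + "seasons"):
        for season in objects(seasons):              # like ?s1 ?w ?season1
            for episodes in objects(season, MVI + "episodes"):
                for episode in objects(episodes):    # like ?es1 ?x ?episode1
                    found |= objects(episode, DC + "contributor")
    return found


# Reusing ?c in both halves of the query amounts to a set intersection.
both = contributors(EX + "showA") & contributors(EX + "showB")
print(sorted(both))
```

Only the person attached to an episode of both toy series survives the intersection, which is the effect of sharing ?c between the two halves of the WHERE clause.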

(You could also use inference to bring down an mvi:series predicate for each episode. This would make the query far simpler, at the expense of adding an extra processing step or requiring an RDF store with inference.)
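
That extra processing step might look something like the following sketch, which materialises one mvi:series triple per episode before any query runs. It assumes the same series/seasons/episodes shape as the query above; the function name `add_series_links` and the example.org URIs are hypothetical, and a real RDF store with inference would derive these triples from rules rather than hand-written loops.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MVI = "http://mivvi.net/rdf#"
EX = "http://example.org/"  # invented namespace for the toy data


def add_series_links(triples):
    """Return the triples plus a derived (episode, mvi:series, series) per episode."""
    def objects(subject, predicate=None):
        return {o for s, p, o in triples
                if s == subject and (predicate is None or p == predicate)}

    derived = set()
    for series, predicate, seasons in triples:
        if predicate != MVI + "seasons":
            continue
        for season in objects(seasons):
            for episodes in objects(season, MVI + "episodes"):
                for episode in objects(episodes):
                    derived.add((episode, MVI + "series", series))
    return set(triples) | derived


# Toy graph: one series, one season, one episode stored at position 2.
graph = {
    (EX + "show", MVI + "seasons", EX + "seasons"),
    (EX + "seasons", RDF + "_1", EX + "s1"),
    (EX + "s1", MVI + "episodes", EX + "eps"),
    (EX + "eps", RDF + "_2", EX + "e2"),
}
enriched = add_series_links(graph)
```

With the derived triples in place, each half of the WHERE clause would shrink to a single mvi:series pattern alongside the dc:contributor and dc:title patterns, and the dummy variables would disappear.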

The version of Rasqal I was using had no support for multiple FROM graphs, so I merged the RDF ahead of time (cwm dapcentral/dr-katz.rdf epguides/the-larry-sanders-show.rdf extras/the-larry-sanders-show_guests.rdf >a.rdf). SPARQL doesn’t appear to support rdf:Seq, so the single-letter dummy variables (w, x, y, z) are used as an approximation of rdf:_[0-9]+ to mean any indexed member of a sequence.
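
Because the dummy variables match any predicate at all, not just sequence membership, a stricter check could be applied to the bound values afterwards. A minimal sketch, assuming the predicate URIs are available as strings (the function name is made up for illustration):

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Container membership properties are rdf:_1, rdf:_2, ...: a positive
# decimal integer with no leading zeros, so slightly stricter than _[0-9]+.
MEMBERSHIP = re.compile(re.escape(RDF_NS) + r"_[1-9][0-9]*")


def is_membership_property(uri):
    """True if uri names an indexed member of an RDF container."""
    return MEMBERSHIP.fullmatch(uri) is not None
```

For the data here the loose match is harmless as long as the sequence nodes carry nothing but their members and perhaps an rdf:type, which never leads on to an episode.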

Presentation

SPARQL queries can result in tabular data or RDF graphs. For this query, and to present with XSLT, neither is perfect. Fresnel looks like it might be worth investigation but, for now, let’s go with tabular XML output (roqet -r xml-v1 multiple-appearances.sparql >contributors2.xml) and a whole load of XSLT munging.
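
The munging is needed because the query returns one row per pairing of a Larry Sanders episode with a Dr. Katz episode, so a performer with several appearances is repeated. I did the grouping in XSLT; purely as an illustration, the same step in Python might look like this. The sketch assumes the namespace and element names of the W3C SPARQL Query Results XML Format, which may not match what roqet's xml-v1 emitted at the time, and the sample rows are placeholders rather than real output.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}  # assumed result format

# Placeholder result document; names and titles are invented.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="title"/>
    <variable name="larryTitle"/>
    <variable name="katzTitle"/>
  </head>
  <results>
    <result>
      <binding name="title"><literal>Performer A</literal></binding>
      <binding name="larryTitle"><literal>Larry episode 1</literal></binding>
      <binding name="katzTitle"><literal>Katz episode 1</literal></binding>
    </result>
    <result>
      <binding name="title"><literal>Performer A</literal></binding>
      <binding name="larryTitle"><literal>Larry episode 1</literal></binding>
      <binding name="katzTitle"><literal>Katz episode 2</literal></binding>
    </result>
  </results>
</sparql>"""


def group_rows(xml_text):
    """Collapse result rows into performer -> (Larry titles, Katz titles)."""
    root = ET.fromstring(xml_text)
    table = defaultdict(lambda: (set(), set()))
    for result in root.iterfind("sr:results/sr:result", NS):
        row = {b.get("name"): "".join(b.itertext()).strip()
               for b in result.iterfind("sr:binding", NS)}
        larry, katz = table[row["title"]]
        larry.add(row["larryTitle"])
        katz.add(row["katzTitle"])
    return {name: (sorted(larry), sorted(katz))
            for name, (larry, katz) in sorted(table.items())}


grouped = group_rows(SAMPLE)
```

Using sets removes the duplicates that the cross product introduces, leaving one entry per performer with two lists of titles.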

Results

Performer | The Larry Sanders Show | Dr. Katz
Al Franken | The Roast | Sharon Meyers
Andy Kindler | Conflict of Interest | Family Car; New Phone System; Mourning Person
Ben Stiller | Make a Wish | Ticket
Bob Goldthwait | Life Behind Larry; Like No Business I Know | Studio Guy
Catherine O'Hara | Talk Show | Bakery Ben
Dave Chappelle | Pilots and Pens Lost | Electric Bike
David Duchovny | The Bump; Everybody Loves Larry; Flip | Metaphors
Jake Johannsen | Where Is the Love? | Day Planner; Expert Witness
Jeff Goldblum | Nothing Personal; Just the Perfect Blendship | Sissy Boy
Jon Stewart | Everybody Loves Larry; The Roast; Another List; Flip; The Beginning of the End; Adolf Hankler | Guess Who; Walk for Hunger
Kevin Nealon | Life Behind Larry; Larry's Sitcom; The New Writer | Earring
Larry Miller | I Buried Sid | Everybody's Got a Tushy
Richard Lewis | Life Behind Larry | Undercover
Sandra Bernhard | Larry's on Vacation; Arthur After Hours | A Journey for the Betterment of People
Steven Wright | Life Behind Larry; Artie's Gone; Beverly's Secret | Bystander Ben; Mask
Teri Garr | The Breakdown (2) | Pullman Square
Wendy Liebman | Next Stop Bottom | Pretzelkins; Chain Letter
Winona Ryder | Another List | Monte Carlo

(Due to methodology, none of the principals were included: both Jonathan Katz and Garry Shandling guested on each other’s shows, and Janeane Garofalo and Sarah Silverman were Sanders cast members who appeared on Katz.)

Conclusions

Jon Stewart and Steven Wright were the most significant cultural figures of 1990s television comedy. (Both also appeared in The Aristocrats; it’s no They Rule, but you might want to cross-reference that cast list.)

SPARQL is here and it works. Most RDF repositories had their own proprietary query languages before, but standardisation should make it easier to move between implementations.

The boundary between RDF and HTML still feels like an impedance mismatch at the structural level. I’m not sure which side needs to move, or if there’s simply a better approach that I’ve missed. It’s always possible to write code to get the presentation you need, but rarely desirable.

(Music: Paul Simon, “Graceland”)