yesparql

0.2.0-beta3


YeSPARQL, a Yesql-inspired SPARQL library

dependencies

org.clojure/clojure 1.7.0
instaparse 1.4.1
org.apache.jena/jena-arq 3.0.0
org.apache.jena/jena-text 3.0.0
org.apache.jena/jena-core 3.0.0
org.apache.jena/jena-querybuilder 3.0.0



(ns yesparql.core
  (:require [yesparql.util :refer [slurp-from-classpath]]
            [yesparql.generate :refer [generate-var]]
            [yesparql.queryfile-parser :refer [parse-tagged-queries]]))

Most of the non-SPARQL code is taken directly from Yesql by Kris Jenkins.

Defines one query function for each query in the given SPARQL file, which is loaded from the classpath. Each query in the file must begin with a -- name: <function-name> marker, followed by optional comment lines (which form the docstring), followed by the query itself.
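The following sketch makes the file format concrete. It shows a hypothetical two-query file and a simplified, regex-based way to split it into name/docstring/statement maps. The names example-file and parse-tagged-queries-sketch are illustrative only. The library itself parses with the instaparse grammar in yesparql/queryfile.bnf and returns Query records.

```clojure
(require '[clojure.string :as string])

;; Hypothetical sample file in the "-- name:" format.
(def example-file
  (str "-- name: select-all\n"
       "-- Returns every triple in the store.\n"
       "SELECT * WHERE { ?s ?p ?o }\n"
       "\n"
       "-- name: clear-all!\n"
       "CLEAR ALL\n"))

;; Simplified sketch: split on each "-- name:" marker, then treat the
;; leading "--" lines as the docstring and the rest as the statement.
;; (Not the library's parser, which uses an instaparse grammar.)
(defn parse-tagged-queries-sketch
  [text]
  (for [chunk (rest (string/split text #"(?m)^-- name:[ \t]*"))
        :let [[name-line & body] (string/split-lines chunk)
              comment?  #(.startsWith ^String % "--")
              comments  (take-while comment? body)
              statement (drop-while comment? body)]]
    {:name      (string/trim name-line)
     ;; strip the leading "--" from each comment line
     :docstring (string/trim (string/join "\n" (map #(subs % 2) comments)))
     :statement (string/trim (string/join "\n" statement))}))

(parse-tagged-queries-sketch example-file)
;; => ({:name "select-all",
;;      :docstring "Returns every triple in the store.",
;;      :statement "SELECT * WHERE { ?s ?p ?o }"}
;;     {:name "clear-all!", :docstring "", :statement "CLEAR ALL"})
```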

(defn defqueries
  ([filename]
   (defqueries filename {}))
  ([filename options]
   (doall (->> filename
             slurp-from-classpath
             parse-tagged-queries
             (map #(generate-var % options))))))
(defn defquery*
  [name filename options]
  ;;; TODO Now that we have a better parser, this is a somewhat suspicious way of writing this code.
  (doall (->> filename
            slurp-from-classpath
            (format "-- name: %s\n%s" name)
            parse-tagged-queries
            (map #(generate-var % options)))))

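Defines a single query function from the given SPARQL file. The -- name: marker is added automatically from the name argument, so the file holds just the query, optionally preceded by comments. Any comments in that file form the docstring.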

defquery is a macro solely because of the unquoted symbol it accepts as its first argument. It is tempting to deprecate defquery. Then again, it makes getting started with yesparql so easy that it might be worth keeping for that reason alone.

(defmacro defquery
  ([name filename]
   `(defquery ~name ~filename {}))
  ([name filename options]
   `(defquery* ~(str name) ~filename ~options)))
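The macro's only job is to turn the bare symbol into a string at expansion time. The minimal sketch below shows the same pattern. register-query and register-query* are hypothetical stand-ins for defquery and defquery*, and they return a map instead of defining a var.

```clojure
;; Hypothetical stand-in for defquery*: receives the name as a string.
(defn register-query*
  [name-str filename]
  {:name name-str :file filename})

;; Hypothetical stand-in for defquery: stringifies the unquoted symbol
;; at macro-expansion time, so callers need not write 'find-people.
(defmacro register-query
  [name filename]
  `(register-query* ~(str name) ~filename))

(register-query find-people "queries/find-people.sparql")
;; => {:name "find-people", :file "queries/find-people.sparql"}
```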
 
(ns yesparql.generate
  (:require
   [clojure.set :as set]
   [clojure.string :refer [join lower-case]]
   [yesparql.util :refer [create-root-var]]
   [yesparql.sparql :as sparql]
   [yesparql.types :refer [map->Query]])
  (:import [yesparql.types Query]
           [org.apache.jena.query ParameterizedSparqlString]))
(defn statement-handler
  [^String name ^ParameterizedSparqlString query]
  (let [sparql-fn
        (cond
          (= (last name) \!) sparql/update!
          :else sparql/query)]
    (fn [connection query call-options]
      (sparql-fn connection query call-options))))
(defn- connection-error
  [name]
  (format
   (join
    "\n"
    ["No database connection supplied to function '%s',"
     "Check the docs, and supply {:connection ...} as an option to the function call, or globally to the declaration."])
   name))

Generates a function to run a query. If the name ends with !, a SPARQL UPDATE is executed; otherwise a SPARQL query is executed.

[FOR TESTING] You can override this behavior by passing a :query-fn either at call time or when the queries are declared. query-fn is a function with the signature [connection pq call-options], where pq is a copy of the query's ParameterizedSparqlString, and it will be used instead.
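The following plain-Clojure sketch shows how the handler is selected. run-query, run-update and make-query-fn are hypothetical stand-ins. The real function passes a copy of the ParameterizedSparqlString rather than the raw statement string, and it dispatches to sparql/query or sparql/update!.

```clojure
;; Hypothetical stand-ins for sparql/query and sparql/update!;
;; they only report which path was taken.
(defn run-query  [connection pq _opts] [:query connection pq])
(defn run-update [connection pq _opts] [:update connection pq])

(defn make-query-fn
  "Simplified sketch of generate-query-fn's handler selection."
  [query-name statement query-options]
  (let [default-handler (or (:query-fn query-options)
                            ;; a trailing ! selects the UPDATE path
                            (if (= (last query-name) \!) run-update run-query))]
    (fn query-wrapper
      ([] (query-wrapper {}))
      ([call-options]
       ;; call-time options win over declaration-time options
       (let [handler    (or (:query-fn call-options) default-handler)
             connection (or (:connection call-options) (:connection query-options))]
         (assert connection (str "No connection supplied to " query-name))
         (handler connection statement call-options))))))

(def select-all (make-query-fn "select-all" "SELECT * WHERE { ?s ?p ?o }" {:connection :my-dataset}))
(def clear-all! (make-query-fn "clear-all!" "CLEAR ALL" {:connection :my-dataset}))

(select-all)  ;; => [:query :my-dataset "SELECT * WHERE { ?s ?p ?o }"]
(clear-all!)  ;; => [:update :my-dataset "CLEAR ALL"]
;; A call-time :query-fn takes precedence, which makes stubbing easy in tests:
(select-all {:query-fn (fn [_connection _pq _opts] :stubbed)})  ;; => :stubbed
```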

(defn generate-query-fn
  [{:keys [name docstring statement]
    :as query}
   query-options]
  (assert name      "Query name is mandatory.")
  (assert statement "Query statement is mandatory.")
  (let [global-connection (:connection query-options)
        query (sparql/parameterized-query statement)
        default-handler (or (:query-fn query-options) (statement-handler name query))
        real-fn
        (fn [call-options]
          (let [handler-fn (or (:query-fn call-options) default-handler)
                connection (or (:connection call-options) global-connection)]
            (assert connection (connection-error name))
            (handler-fn connection (.copy query false) call-options)))
        [display-args generated-fn]
        (let [global-args {:keys ['connection 'bindings]}]
          [(list [] [global-args])
           (fn query-wrapper-fn
             ([] (query-wrapper-fn {}))
             ([call-options] (real-fn call-options)))])]
    (with-meta generated-fn
      (merge {:name name
              :arglists display-args
              :tag 'java.lang.AutoCloseable
              ::source (str statement)}
             (when docstring
               {:doc docstring})))))
(defn generate-var [this options]
  (create-root-var (:name this)
                   (generate-query-fn this options)))
 
(ns yesparql.instaparse-util
  (:require [instaparse.core :as instaparse])
  (:import [java.io StringWriter]))
(defn process-instaparse-result
  [parse-results context]
  (if-let [failure (instaparse/get-failure parse-results)]
    (binding [*out* (StringWriter.)]
      (instaparse.failure/pprint-failure failure)
      (throw (ex-info (.toString *out*)
                      failure)))
    (if (second parse-results)
      (throw (ex-info "Ambiguous parse - please report this as a bug at https://github.com/joelkuiper/yesparql/issues"
                      {:variations (count parse-results)}))
      (first parse-results))))
 
(ns yesparql.queryfile-parser
  (:require [clojure.java.io :as io]
            [clojure.string :refer [join trim]]
            [instaparse.core :as instaparse]
            [yesparql.types :refer [map->Query]]
            [yesparql.util :refer [str-non-nil]]
            [yesparql.instaparse-util :refer [process-instaparse-result]]))
(def parser
  (let [url (io/resource "yesparql/queryfile.bnf")]
    (assert url)
    (instaparse/parser url)))
(def parser-transforms
  {:whitespace str-non-nil
   :non-whitespace str-non-nil
   :newline str-non-nil
   :any str-non-nil
   :line str-non-nil
   :comment (fn [& args]
              [:comment (apply str-non-nil args)])
   :docstring (fn [& comments]
                [:docstring (trim (join (map second comments)))])
   :statement (fn [& lines]
                [:statement (trim (join lines))])
   :query (fn [& args]
            (map->Query (into {} args)))
   :queries list})

Parses a string with Yesparql's defqueries syntax into a sequence of Query records (maps with :name, :docstring and :statement keys).

(defn parse-tagged-queries
  [text]
  (process-instaparse-result
   (instaparse/transform
    parser-transforms
    (instaparse/parses parser
                       (str text "\n") ;;; TODO This is a workaround for files with no end-of-line marker.
                       :start :queries))
   {}))
 
(ns yesparql.sparql
  (:refer-clojure :exclude [update])
  (:import
   [java.net URL URI]
   [org.apache.jena.graph Node Node_Literal Node_Blank Node_NULL Node_URI]
   [org.apache.jena.update
    Update UpdateAction
    UpdateFactory UpdateProcessor
    UpdateRequest UpdateExecutionFactory]
   [org.apache.jena.rdf.model Model
    StmtIterator Statement Resource Property
    RDFNode Literal]
   [org.apache.jena.sparql.core DatasetGraph]
   [org.apache.jena.sparql.resultset RDFOutput]
   [org.apache.jena.query
    Dataset Query QuerySolution QueryExecution
    QueryExecutionFactory QueryFactory QuerySolutionMap
    ParameterizedSparqlString
    ResultSetFactory ResultSet ResultSetFormatter]))
(defn ^java.io.OutputStream output-stream []
  (java.io.ByteArrayOutputStream.))

Resets a ResultSetRewindable to its start; does nothing for any other kind of ResultSet.

See: ResultSetRewindable.

(defn reset-if-rewindable!
  [^ResultSet result]
  (when (instance? org.apache.jena.query.ResultSetRewindable result)
    (.reset result)))

JavaScript-ism to return nil on an empty string.

(defn falsey-string
  [str]
  (if (empty? str) nil str))

Returns a copy of a ResultSet allowing it to be re-used.

Make sure to apply this function if you intend to re-use the ResultSet after initial traversal.

See also: reset-if-rewindable!

(defn copy-result-set
  [^ResultSet result]
  (ResultSetFactory/copyResults result))

Serializes a Model to a string in the given format.

See: Jena Model Write formats.

(defn serialize-model
  [^Model model ^String format]
  (with-open [w (java.io.StringWriter.)]
    (.write model w format)
    (str w)))
(defn model->rdf+xml [^Model model] (serialize-model model "RDF/XML"))
(defn model->ttl [^Model model] (serialize-model model "TTL"))
(defn model->json-ld [^Model model] (serialize-model model "JSONLD"))

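Serializes a ResultSet to a string using the given ResultSetFormatter output method, rewinding it first if it is a ResultSetRewindable.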

(defmacro serialize-result
  [method result]
  `(let [output# (output-stream)]
     (try
       (do
         (reset-if-rewindable! ~result)
         (~method ^java.io.OutputStream output# ^ResultSet ~result)
         (.toString output# "UTF-8"))
       (finally (.close output#)))))
(defn result->json [^ResultSet result] (serialize-result ResultSetFormatter/outputAsJSON result))
(defn result->text [^ResultSet result] (ResultSetFormatter/asText result))
(defn result->xml [^ResultSet result] (serialize-result ResultSetFormatter/outputAsXML result))
(defn result->csv [^ResultSet result] (serialize-result ResultSetFormatter/outputAsCSV result))
(defn result->tsv [^ResultSet result] (serialize-result ResultSetFormatter/outputAsTSV result))
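serialize-result is presumably a macro because, in the Clojure 1.7 targeted here, a static Java method such as ResultSetFormatter/outputAsJSON cannot be passed around as a function value. The macro splices it into call position instead. Below is a JDK-only sketch of the same pattern. serialize-with and write-utf8! are hypothetical stand-ins for serialize-result and a ResultSetFormatter output method.

```clojure
;; Hypothetical stand-in for serialize-result: writes into a
;; ByteArrayOutputStream via whatever is in `method` position,
;; then decodes the bytes as UTF-8.
(defmacro serialize-with
  [method value]
  `(let [out# (java.io.ByteArrayOutputStream.)]
     (try
       (~method out# ~value)
       (.toString out# "UTF-8")
       (finally (.close out#)))))

;; Hypothetical stand-in for an (OutputStream, ResultSet) output method.
(defn write-utf8!
  [^java.io.OutputStream out ^String s]
  (.write out (.getBytes s "UTF-8")))

(serialize-with write-utf8! "hello")  ;; => "hello"
```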

Converts ResultSet to a Model.

NOTE: CONSTRUCT and DESCRIBE queries are better suited for conversion to Model.

(defn result->model
  [^ResultSet result]
  (let [^RDFOutput rdf (RDFOutput.)]
    (reset-if-rewindable! result)
    (.asModel rdf ^ResultSet result)))
(def ^Model default-model (org.apache.jena.rdf.model.ModelFactory/createDefaultModel))
(defn keyword-str [kw] (if (keyword? kw) (name kw) kw))
(defn ^Literal clj->literal
  [{:keys [value type lang]}]
  (cond
    type (.createTypedLiteral default-model value (org.apache.jena.datatypes.BaseDatatype. (str type)))
    lang (.createLiteral default-model (str value) (keyword-str lang))
    :else (.createTypedLiteral default-model value)))
(defn ^ParameterizedSparqlString parameterized-query
  [^String statement]
  (ParameterizedSparqlString. statement))

The query can be provided with a map of bindings. Keys are Strings (variable names) or integers (positional parameters). URL and URI values are bound as IRIs, Node and RDFNode values are bound as parameters, and any other value type (e.g. String, Float) is set as a Literal.

Alternatively, a value can be a map of {:type (optional, URI or string), :lang (optional, string or keyword), :value}, which is coerced to the appropriate Literal automatically.

Does not warn when setting a binding that does not exist.
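The key and value rules can be checked without Jena. binding-plan is a hypothetical helper that reports which ParameterizedSparqlString setter query-with-bindings would call for one [key value] pair. The Node and RDFNode cases are left out so the sketch runs on the JDK alone.

```clojure
(import '[java.net URL URI])

;; Hypothetical helper mirroring query-with-bindings' dispatch.
(defn binding-plan
  [[k v]]
  ;; Keys: strings are variable names, integers are positional parameters
  (let [param (cond
                (string? k) k
                (integer? k) (int k)
                :else (throw (IllegalArgumentException.
                              "ParameterizedSparqlString binding keys must be strings or integers")))]
    [param (cond
             (map? v)          :setLiteral-from-map   ; {:value :type :lang}
             (instance? URL v) :setIri
             (instance? URI v) :setIri
             :else             :setLiteral)]))        ; String, Float, ...

(map binding-plan {"person" (URI. "http://example.org/alice")
                   "age"    42
                   0        {:value "Alice" :lang :en}})
;; => (["person" :setIri] ["age" :setLiteral] [0 :setLiteral-from-map])
```

With the library itself, such a map is passed as the :bindings call option. For example: (find-person {:connection ds :bindings {"person" (URI. "http://example.org/alice")}}). Here find-person is a hypothetical query defined with defqueries.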

(defn ^ParameterizedSparqlString query-with-bindings
  [^ParameterizedSparqlString pq bindings]
  ;; Set each binding on pq (mutating it in place), then return pq.
  (doseq [[var resource] bindings]
    (let [subs (cond
                 (string? var) var
                 (integer? var) (int var)
                 :else
                 (throw (java.lang.IllegalArgumentException.
                         "ParameterizedSparqlString binding keys must be strings or integers")))]
      (if (map? resource)
        (.setLiteral pq subs (clj->literal resource))
        (condp instance? resource
          URL (.setIri pq subs ^URL resource)
          URI (.setIri pq subs ^String (str resource))
          Node (.setParam pq subs ^Node resource)
          RDFNode (.setParam pq subs ^RDFNode resource)
          (.setLiteral pq subs resource)))))
  pq)
(defn- with-type
  [f ^Node_Literal literal]
  (if-let [lang (falsey-string (.getLiteralLanguage literal))]
    {:type (.getLiteralDatatypeURI literal)
     :value (f literal)
     :lang (keyword lang)}
    {:type (.getLiteralDatatypeURI literal)
     :value (f literal)}))
(defmulti node->clj (fn [^Node_Literal literal] (.getLiteralDatatypeURI literal)))
(defmethod node->clj nil [^Node_Literal literal]
  {:value (.getLiteralValue literal)})
(defmethod node->clj :default [^Node_Literal literal]
  (try
    (with-type #(.getLiteralValue %) literal)
    (catch org.apache.jena.shared.JenaException e
      {:value (.getLiteralLexicalForm literal)
       :type (.getLiteralDatatypeURI literal)})))
(defprotocol INodeConvertible
  (convert [^Node this]))
(extend-protocol INodeConvertible
  org.apache.jena.graph.Node_Blank
  (convert [this] (.getBlankNodeId this))
  org.apache.jena.graph.Node_Literal
  (convert [this] (node->clj this))
  org.apache.jena.graph.Node_NULL
  (convert [this] nil)
  org.apache.jena.graph.Node_URI
  (convert [this] (.getURI this)))
(defn- result-binding->clj
  [^org.apache.jena.sparql.core.ResultBinding result-binding]
  (let [binding (.getBinding result-binding)]
    (into {} (map (fn [v] [(.getVarName v) (convert (.get binding v))])
                  (iterator-seq (.vars binding))))))
(deftype CloseableResultSet [^QueryExecution qe ^ResultSet rs]
  clojure.lang.Seqable
  (seq [_]
    (when-let [iseq (seq (iterator-seq rs))]
      (map result-binding->clj iseq)))
  java.lang.AutoCloseable
  (close [_] (.close qe)))

Returns the underlying QueryExecution from the query results.

(defn ->query-execution
  [r] (.qe r))

Returns the underlying ResultSet from the query results.

See also: copy-result-set for a re-usable ResultSet

(defn ->result
  [^CloseableResultSet r]
  (.rs r))
(defrecord Triple [s p o])
(defn triple->clj
  [^org.apache.jena.graph.Triple t]
  (apply ->Triple (map convert [(.getSubject t) (.getPredicate t) (.getObject t)])))
(defn statement->clj
  [^Statement s]
  (triple->clj s))
(deftype CloseableModel [^QueryExecution qe ^java.util.Iterator t]
  clojure.lang.Seqable
  (seq [this]
    (when-let [iseq (seq (iterator-seq t))]
      (map statement->clj iseq)))
  java.lang.AutoCloseable
  (close [this] (.close qe)))

Generates a Model from the stream of Triples. The stream is consumed in the process and cannot be traversed again.

NOTE: closes the underlying QueryExecution.

(defn ->model
  [^CloseableModel closeable-model]
  (with-open [model closeable-model]
    (let [^Model m
          (org.apache.jena.rdf.model.ModelFactory/createDefaultModel)
          ^java.util.List statements
          (java.util.ArrayList.
           (doall (map #(.asStatement m %) (iterator-seq (.t model)))))]
      (.add m statements)
      m)))

Returns the unconverted Jena Iterator of org.apache.jena.graph.Triple.

(defn ->triples
  [^CloseableModel m]
  (.t m))
(defmulti query-exec (fn [connection _] (class connection)))
(defmethod query-exec String [connection query]
  (QueryExecutionFactory/sparqlService
   ^String connection
   ^Query query))
(defn- query-exec*
  [connection query]
  (QueryExecutionFactory/create
   ^Query query
   connection))
(defmethod query-exec Dataset [connection query] (query-exec* connection query))
(defmethod query-exec DatasetGraph [connection query] (query-exec* connection query))
(defmethod query-exec Model [connection query] (query-exec* connection query))
(defn- query-type
  [^Query q]
  (cond
    (.isSelectType q) "execSelect"
    (.isConstructType q) "execConstructTriples"
    (.isAskType q) "execAsk"
    (.isDescribeType q) "execDescribeTriples"))
(defmulti query* (fn [^QueryExecution q-exec] (query-type (.getQuery q-exec))))
(defmethod query* "execSelect" [^QueryExecution q-exec] (.execSelect q-exec))
(defmethod query* "execAsk" [^QueryExecution q-exec] (.execAsk q-exec))
(defmethod query* "execConstructTriples" [^QueryExecution q-exec] (.execConstructTriples q-exec))
(defmethod query* "execDescribeTriples" [^QueryExecution q-exec] (.execDescribeTriples q-exec))
(defn- ->execution
  [connection ^ParameterizedSparqlString pq {:keys [bindings timeout]}]
  (let [^Query q (.asQuery pq)
        ^QueryExecution query-execution (query-exec connection q)]
    (when timeout (.setTimeout query-execution timeout))
    query-execution))
(defn- set-additional-fields
  [^Query query call-options]
  (do
    (when-let [offset (:offset call-options)]
      (.setOffset query (long offset)))
    (when-let [limit (:limit call-options)]
      (.setLimit query (long limit)))
    query))

Executes a SPARQL SELECT, ASK, DESCRIBE or CONSTRUCT query against the connection, depending on the query type. connection can be a String (a SPARQL endpoint URL), a Dataset, a Model, or a DatasetGraph.

Returns a result that can be traversed lazily as a seq. SELECT yields a seq of maps from variable name to value, converted to native Clojure data. DESCRIBE and CONSTRUCT yield a seq of Triple records (:s, :p, :o), also in native Clojure format. ASK returns a boolean.

See also: ->result (SELECT), ->model (DESCRIBE, CONSTRUCT) and ->query-execution. Alternatively, use the result->csv... and model->json-ld convenience functions to serialize results to strings.

WARNING: The underlying QueryExecution must be closed to prevent resources from leaking. Call the query inside with-open, or close it manually with (.close (->query-execution (query))). ASK queries are closed automatically.
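The closing discipline can be illustrated without a triple store. fake-result below is a hypothetical JDK-only stand-in that, like CloseableResultSet, is both Seqable and AutoCloseable. with-open closes it on exit. Anything you need must be realized before then, because the real seq is read lazily from the live QueryExecution.

```clojure
;; Hypothetical stand-in for a yesparql query result.
(defn fake-result
  [rows closed?]
  (reify
    clojure.lang.Seqable
    (seq [_] (seq rows))
    java.lang.AutoCloseable
    (close [_] (reset! closed? true))))

(def closed? (atom false))

(def rows
  (with-open [result (fake-result [{"s" "http://example.org/a"}
                                   {"s" "http://example.org/b"}]
                                  closed?)]
    ;; Realize the rows (here with vec) before with-open closes the result.
    (vec (seq result))))

rows      ;; => [{"s" "http://example.org/a"} {"s" "http://example.org/b"}]
@closed?  ;; => true
```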

(defn query
  [connection ^ParameterizedSparqlString pq {:keys [bindings timeout] :as call-options}]
  (let [query-execution (->execution connection (query-with-bindings pq bindings) call-options)
        query (set-additional-fields (.getQuery ^QueryExecution query-execution) call-options)
        query-type (query-type query)]
    (cond
      (= query-type "execSelect")
      (->CloseableResultSet query-execution (query* query-execution))
      (or (= query-type "execDescribeTriples") (= query-type "execConstructTriples"))
      (->CloseableModel query-execution (query* query-execution))
      :else
      (try (query* query-execution)
           (finally (.close query-execution))))))
(defmulti update-exec (fn [connection _] (class connection)))
(defmethod update-exec String [connection update]
  (UpdateExecutionFactory/createRemote update ^String connection))
(defmethod update-exec Dataset [connection update]
  (UpdateExecutionFactory/create update ^Dataset connection))
(defmethod update-exec DatasetGraph [connection update]
  (UpdateExecutionFactory/create update ^DatasetGraph connection))

Executes a SPARQL UPDATE against the connection. bindings are substituted where possible and may be empty. connection can be a String (a SPARQL endpoint URL), a Dataset, or a DatasetGraph.

Returns nil on success, or throws an Exception.

(defn update!
  [connection ^ParameterizedSparqlString pq {:keys [bindings]}]
  (let [q (.toString (.asUpdate (query-with-bindings pq bindings)))
        ^UpdateRequest update-request (UpdateFactory/create q)
        ^UpdateProcessor processor (update-exec connection update-request)]
    (.execute processor)))
 
(ns yesparql.tdb
  (:import
   [org.apache.jena.query Dataset DatasetFactory]
   [org.apache.jena.query.text TextDatasetFactory]
   [org.apache.jena.tdb TDBFactory TDBLoader StoreConnection TDB]))

Creates a new TDB-backed Dataset in the given directory (absolute path).

(defn ^Dataset create-file-based
  [^String directory]
  (TDBFactory/createDataset directory))

Creates a new TDB-backed Dataset from the provided assembler file (TTL, absolute path). Pass :text-index true to build a text-indexed Dataset with TextDatasetFactory instead.

(defn ^Dataset create-assembler-based
  [assembler & {:keys [text-index]}]
  (if text-index
    (TextDatasetFactory/create assembler)
    (TDBFactory/assembleDataset assembler)))

Creates an in-memory, modifiable TDB Dataset.

(defn ^Dataset create-in-memory
  []
  (TDBFactory/createDataset))

Creates a bare Dataset (without TDB). Recommended for testing purposes only.

(defn ^Dataset create-bare
  []
  (DatasetFactory/createMem))
 
(ns yesparql.types)
(defrecord Query
    [name docstring statement])
 
(ns yesparql.util
  (:require [clojure.java.io :as io]
            [clojure.string :as string]
            [clojure.pprint :refer [pprint]])
  (:import [java.io FileNotFoundException]))
(defn underscores-to-dashes
  [string]
  (when string
    (string/replace string "_" "-")))

Like clojure.core/str, returning an empty string when called with no args. (clojure.core/str also returns an empty string in that case, so the two behave identically.)
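A quick REPL check of how the two functions behave. clojure.core/str itself already returns an empty string for zero arguments and for nil.

```clojure
(defn str-non-nil
  [& args]
  (apply str "" args))

(str)                      ;; => ""
(str nil)                  ;; => ""
(str-non-nil)              ;; => ""
(str-non-nil "a" nil "b")  ;; => "ab"
```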

(defn str-non-nil
  [& args]
  (apply str "" args))

Slurps a file from the classpath.

(defn slurp-from-classpath
  [path]
  (or (some-> path
              io/resource
              slurp)
      (throw (FileNotFoundException. path))))

Given a name and a value, intern a var in the current namespace, taking metadata from the value.

TODO There may well be a built-in for this. If there is, I have not found it.

(defn create-root-var
  ([name value]
   (create-root-var *ns* name value))
  ([ns name value]
   (intern ns
           (with-meta (symbol name)
             (meta value))
           value)))
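Here is a small sketch of the same interning pattern. It also shows that clojure.core/intern copies metadata from the name symbol onto the var. intern-with-meta and greet are hypothetical names used only for illustration.

```clojure
(defn intern-with-meta
  "Interns name in ns, giving the var the metadata attached to value."
  [ns name value]
  (intern ns (with-meta (symbol name) (meta value)) value))

;; A function carrying its own metadata, like the query fns yesparql generates.
(def greet-impl
  (with-meta (fn [] "hi") {:doc "Says hi." :arglists '([])}))

(intern-with-meta *ns* "greet" greet-impl)

(:doc (meta (resolve 'greet)))  ;; => "Says hi."
((resolve 'greet))              ;; => "hi"
```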