Storage plugins / Drivers#
Content#
Introduction#
A storage is in DLite an abstract concept that represent a generic data source or sink. It can be a file on disk, a local database or a database accessed via a web interface. Loading data from a storage into an instance and saving it back again is a key mechanism for interoperability at a syntactic level.
DLite provides a plugin system that makes it easy to connect to new data sources via a common interface (using a strategy design pattern).
Opening a storage takes in the general case four arguments, a protocol
name identifying the protocol plugin to use for data transfer, a driver
name identifying the storage plugin to use for parsing/serialisation, the location
of the storage and a set of storage-specific options
.
The protocol
argument was introduced in v0.5.22 to provide a clear separation between transfer of raw data from/to the storage and parsing/serialisation of the data.
Storage plugins can be categorised as either generic or specific.
A generic storage plugin can store and retrieve any type of instance and metadata while a specific storage plugin typically deals with specific instances of one type of entity.
DLite comes with a set of generic storage plugins, like json, yaml, rdf, hdf5, postgresql and mongodb.
It also comes with a specific Blob
and Image
storage plugin, that can load and save instances of http://onto-ns.com/meta/0.1/Blob
and http://onto-ns.com/meta/0.1/Image
, respectively.
Storage plugins can be written in either C or Python.
How to make storage plugins available#
As described below it is possible (and most often advisable) to create specific drivers (storage plugins) for your data.
Additional storage plugins drivers can be made available by setting the environment variables
DLITE_STORAGE_PLUGIN_DIRS
or DLITE_PYTHON_STORAGE_PLUGIN_DIRS
e.g.:
export DLITE_STORAGE_PLUGIN_DIRS=/path/to/new/folder:$DLITE_STORAGE_PLUGIN_DIRS
Within python, the path to the directory containing plugins can be added as follows:
import dlite
dlite.python_storage_plugin_path.append("/path/to/plugins/dir")
Often drivers are connected to very specific datamodel (entities).
DLite will find these datamodels if the path to their directory is set with the
environment variable DLITE_STORAGES
or added within python with dlite.storage_path.append
similarly to described above for drivers.
Attention
Often, during development dlite will fail unexpectedly. This is typically either because of an error in the
datamodel or the driver.
The variable DLITE_PYDEBUG can be set as export DLITE_PYDEBUG=
to get python debugging information.
This will give information about the driver.
It is advisable to first check that the datamodel is valid with the command dlite-validate datamodelfilename
.
Using storages implicitly from Python#
For convenience DLite also has an interface for creating storages implicitly.
Loading an instance#
If you only want to load a single instance from a storage, you can use one of the following class methods:
dlite.Instance.load()
: load from a location using a specific protocoldlite.Instance.from_location()
: load from a locationdlite.Instance.from_url()
: load from URLdlite.Instance.from_bytes()
: load from a bufferdlite.Instance.from_dict()
: load from Python dictdlite.Instance.from_json()
: load from a JSON stringdlite.Instance.from_bson()
: load from a BSON stringdlite.Instance.from_storage()
: load from a storagedlite.Instance.from_metaid()
: create a new empty instance (not loading anything)
For example
>>> import dlite
>>> newinst = dlite.Instance.from_location("json", "newfile.json", id="ex:blob1")
>>> newinst.uri
'ex:blob1'
To load a YAML file from a web location, you can combine the http
protocol plugin with the yaml
storage plugin using dlite.Instance.load()
:
>>> url = "https://raw.githubusercontent.com/SINTEF/dlite/refs/heads/master/storages/python/tests-python/input/test_meta.yaml"
>>> dlite.Instance.load(protocol="http", driver="yaml", location=url)
Saving an instance#
Similarly, if you want to save an instance, you can use the following methods:
dlite.Instance.save()
: save to location, optionally using a specific protocol. Can also take an opendlite.Storage
object or an URL as argument.dlite.Instance.to_bytes()
: returns the instance as a bytes objectdlite.Instance.asdict()
: returns the instance as a python dictdlite.Instance.asjson()
: returns the instance as a JSON stringdlite.Instance.asbson()
: returns the instance as a BSON string
For example, saving newinst
to BSON, can be done with:
>>> newinst.save("bson", "newinst.bson", options="mode=w")
Working with storages in Python#
For instance, to create a new file-based JSON storage can in Python be done with:
>>> s = dlite.Storage("json", location="newfile.json", options="mode=w")
where the first argument is the name of the driver to use. It is also possible to combine the three arguments into a URL as in
>>> s = dlite.Storage("json://newfile.json?mode=w")
Note that the URL form is less flexible, since the scheme part identifies the driver and can therefore not be used in the location of an external storage.
The options are plugin specific, but most file-based plugins take a mode
option, with the following three values:
r
: Open existing file for read-only.w
: Truncate existing file or create new file.a
: Append to existing file or create new file.
The values of driver
, location
and options
are available as attributes:
>>> s.driver
'json'
>>> s.location
'newfile.json'
>>> s.options
'mode=w'
Currently, the Python interface does not implement a close()
method to avoid risking unexpected crashes if one tries to access a closed storage object.
If you want to ensure a storage is closed (and possible buffered data is committed), you can delete it:
>>> del s
Alternatively, you can open a storage in a with
-statement.
This is the preferred way, since it ensures that the storage is closed after use.
>>> with dlite.Storage("json", "newfile.json", options="mode=w") as s:
... ... # doctest: +SKIP
Instances can be stored using the save()
method:
# Create some Blob data instances
>>> Blob = dlite.get_instance('http://onto-ns.com/meta/0.1/Blob')
>>> blob1 = Blob(dims={"n": 2}, id="ex:blob1")
>>> blob2 = Blob(dims={"n": 5}, id="ex:blob2")
# Add the instances to the storage
# The option "mode=w" create a new JSON file (or overwrite it if it already exists)
# The option "single=no" make sure that we store in a format that can accomodate multiple instances.
>>> with dlite.Storage("json", location="newfile.json",
... options="mode=w;single=no") as s:
... s.save(blob1)
... s.save(blob2)
... s.save(Blob) # entities can also be saved
Note that we in the above example also store the Blob entity to the storage.
That can be a good idea if you don’t have a proper metadata storage, since DLite needs the metadata in order to instantiate an instance.
When loading an instance and the metadata is not already in memory, DLite will first look for the metadata in the current and all other open storages before checking dlite.storage_path
.
The instances can be loaded back from the storage with the load()
method:
>>> with dlite.Storage("json", "newfile.json") as s:
... inst1 = s.load(id="ex:blob1")
Note that we in the above example have omitted the options
argument to dlite.Storage
, since default is mode=r
.
We also have to provide an id
argument to load()
, to specify which of the instances in the storage we want to load.
We can easily see that the loaded instance has the same UUID as the blob we stored:
>>> inst1.uuid == blob1.uuid
True
In fact, even though blob1
and inst1
are different python objects, they share the same underlying memory:
>>> id(inst1) == id(blob1)
False
>>> blob1.content
array([0, 0], dtype=uint8)
>>> inst1.content = b'\x01\x08'
>>> blob1.content
array([1, 8], dtype=uint8)
Writing Python storage plugins#
Storage plugins can be written in either C or Python.
In Python the storage plugin should be a Python module defining a subclass of dlite.DLiteStorageBase
defining one or more of the following methods:
open(): Open and initiate the storage. Required if any of load(), save(), delete(), query(), flush(), close() are defined.
load(): Load an instance from the storage and return it.
save(): Save an instance to the storage.
delete(): Delete and an instance from the storage.
query(): Query the storage for instance UUIDs.
flush(): Flushed cached data to the storage.
close(): Close the storage.
from_bytes(): Class method that loads an instance from a buffer.
to_bytes(): Class method that saves an instance to a buffer.
A plugin can be defined with only the from_bytes
and to_bytes
methods. This has the advantage that only serialisation/deserialisation of the data and datamodels is considered in the actual plugin. In such cases Instances must
be instantiated with the dlite.Instance.load method, and the protocol plugin must be given.
All methods are optional. You only have to implement the methods providing the functionality you need. See the Python storage plugin template for how to write a Python storage plugin. A complete example can be found in the Python storage plugin example.
In order for DLite to find the storage plugin, it should be in the search path defined by the DLITE_PYTHON_STORAGE_PLUGIN_DIRS
environment variable or from Python, in dlite.python_storage_plugin_path
.
Note
Prior to DLite v0.5.23 all storage plugins were executed in the same scope. This could lead to confusing and hard-to-find bugs due to interference between your plugins.
However, since DLite v0.5.23, plugins are evaluated in separate scopes (which are available in `dlite._plugindict).
Distributing storage plugins#
If you created a python package with new storage plugins, you can make them available for users of your package by creating entry points with the paths to the storage plugins.
Add the following section to your pyproject.toml
:
[project.entry-points.dlite]
python_storage_plugins = "path/to/python_storage_plugins/directory"
See Search paths for more information about entry points.
Working with storages from C and Fortran#
The C API for storages is documented in the C reference manual. Conceptually it is similar to the Python interface, except that everything is implemented as functions. For an example, see ex1.
The Fortran interface relies heavily on code generation. An example is available in ex4.