icat.dumpfile — Backend for icatdump and icatingest

This module provides the base classes icat.dumpfile.DumpFileReader and icat.dumpfile.DumpFileWriter, which define the API and the logic for reading and writing ICAT data files. The actual work is done in file format specific backend modules, which should provide subclasses that implement the abstract methods.

class icat.dumpfile.DumpFileReader(client, infile)

Bases: object

Base class for backends that read a data file.

Parameters:
  • client (icat.client.Client) – a client object configured to connect to the ICAT server that the objects in the data file belong to. This client is used, among other things, to instantiate the objects read from the file and to search for related objects.

  • infile – the data source to read the objects from. Which kinds of data source are accepted depends on the backend. Most backends accept at least a file object opened for reading, a Path, or a str with a file name.

Changed in version 1.0.0: the infile parameter also accepts a Path object.

mode = 'r'

File mode suitable for the backend.

Subclasses should override this with either “rt” or “rb”, according to the mode required for the backend.

getdata()

Iterate over the chunks in the data file.

Yield some data object in each iteration. This data object is specific to the implementing backend and should be passed as the data argument to getobjs_from_data().

This abstract method must be implemented in the file format specific backend.

getobjs_from_data(data, objindex)

Iterate over the objects in a data chunk.

Yield a new entity object in each iteration. The object is initialized from the data, but not yet created at the client.

This abstract method must be implemented in the file format specific backend.

getobjs(objindex=None)

Iterate over the objects in the data file.

Yield a new entity object in each iteration. The object is initialized from the data, but not yet created at the client.

Parameters:

objindex (dict) – a mapping from keys to entity objects, see icat.client.Client.searchUniqueKey() for details. This serves as a cache of previously retrieved objects, used to resolve object relations. If this is None, an internal cache will be used that is purged at the start of every new data chunk.
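
The relationship between the three reader methods can be sketched as follows. This is a minimal, self-contained illustration and not python-icat code: SketchReader and JsonLinesReader are hypothetical stand-ins, the objects are plain dicts rather than ICAT entities, and the way getobjs() chains getdata() and getobjs_from_data() with a per-chunk cache is an assumption inferred from the descriptions above.

```python
import io
import json


class SketchReader:
    """Hypothetical stand-in for a reader base class (not python-icat code)."""

    mode = "r"

    def __init__(self, client, infile):
        self.client = client
        self.infile = infile

    def getdata(self):
        raise NotImplementedError

    def getobjs_from_data(self, data, objindex):
        raise NotImplementedError

    def getobjs(self, objindex=None):
        # Assumption: without a caller-supplied index, a fresh cache is
        # used for every chunk, as described for objindex=None.
        resetindex = objindex is None
        for data in self.getdata():
            if resetindex:
                objindex = {}
            yield from self.getobjs_from_data(data, objindex)


class JsonLinesReader(SketchReader):
    """Toy backend: every non-empty line is one JSON chunk {key: attributes}."""

    mode = "rt"

    def getdata(self):
        for line in self.infile:
            line = line.strip()
            if line:
                yield json.loads(line)

    def getobjs_from_data(self, data, objindex):
        for key, attrs in data.items():
            obj = dict(attrs)      # plain dict standing in for an entity object
            objindex[key] = obj    # remember it so later objects could refer to it
            yield obj


source = io.StringIO('{"inv1": {"name": "x"}}\n{"inv2": {"name": "y"}}\n')
objects = list(JsonLinesReader(None, source).getobjs())
```

Passing an explicit dict as objindex keeps the cache alive across chunks, so the caller can inspect or reuse it after the iteration has finished.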

class icat.dumpfile.DumpFileWriter(client, outfile)

Bases: object

Base class for backends that write a data file.

Parameters:
  • client (icat.client.Client) – a client object configured to connect to the ICAT server that the data objects are retrieved from.

  • outfile – the data file to write the objects to. What is accepted here depends on the backend. Most backends accept at least a file object opened for writing, a Path, or a str with a file name.

Changed in version 1.0.0: the outfile parameter also accepts a Path object.

mode = 'w'

File mode suitable for the backend.

Subclasses should override this with either “wt” or “wb”, according to the mode required for the backend.

head()

Write a header with some meta information to the data file.

This abstract method must be implemented in the file format specific backend.

startdata()

Start a new data chunk.

If the current chunk contains any data, write it to the data file.

This abstract method must be implemented in the file format specific backend.

writeobj(key, obj, keyindex)

Add an entity object to the current data chunk.

This abstract method must be implemented in the file format specific backend.

finalize()

Finalize the data file.

This abstract method must be implemented in the file format specific backend.
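
How the four abstract writer methods might fit together is sketched below. ToyWriter is a hypothetical, self-contained stand-in that does not subclass the real base class; the line-based output format and the decision to flush the pending chunk from finalize() are assumptions made for illustration only.

```python
import io


class ToyWriter:
    """Hypothetical writer backend that buffers one chunk at a time."""

    mode = "wt"

    def __init__(self, client, outfile):
        self.client = client
        self.outfile = outfile
        self.chunk = []

    def head(self):
        # Write a header with some meta information.
        self.outfile.write("# toy dump file\n")

    def startdata(self):
        # If the current chunk contains any data, write it out,
        # then start a new, empty chunk.
        if self.chunk:
            self.outfile.write("---\n")
            for key, obj in self.chunk:
                self.outfile.write(f"{key}: {obj}\n")
        self.chunk = []

    def writeobj(self, key, obj, keyindex):
        # Add the object to the current chunk; keyindex is unused in this toy.
        self.chunk.append((key, obj))

    def finalize(self):
        # Assumption: finalizing flushes whatever is still pending.
        self.startdata()


out = io.StringIO()
writer = ToyWriter(None, out)
writer.head()
writer.startdata()
writer.writeobj("k1", "a", {})
writer.startdata()
writer.writeobj("k2", "b", {})
writer.finalize()
```

Buffering the chunk in memory and writing it only when the next chunk starts matches the description of startdata() above, where a chunk is written if it contains any data.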

writeobjs(objs, keyindex, chunksize=100)

Write some entity objects to the current data chunk.

The objects are retrieved from the ICAT server. The key index is used to serialize object relations in the data file. For object types that do not have an appropriate uniqueness constraint in the ICAT schema, a generic key is generated. These objects may only be referenced from the same chunk in the data file.

Parameters:
  • objs (icat.query.Query or str or list) –

    query to search for the objects, either a Query object or a string. It must contain an appropriate include clause to include all related objects from many-to-one relations. These related objects must also include all information needed to generate their unique key, unless they are already registered in the key index.

    Furthermore, related objects from one-to-many relations may be included. These objects will then be embedded with the relating object in the data file. The same requirements for including their respective related objects apply.

    As an alternative to a query, objs may also be a list of entity objects. The same conditions on the inclusion of related objects apply.

  • keyindex (dict) – cache of generated keys. It maps object ids to unique keys. See icat.entity.Entity.getUniqueKey() for details.

  • chunksize (int) – tuning parameter, see icat.client.Client.searchChunked() for details.

writedata(objs, keyindex=None, chunksize=100)

Write a data chunk.

Parameters:

icat.dumpfile.Backends = {}

A register of all known backends.

icat.dumpfile.register_backend(formatname, reader, writer)

Register a backend.

This function should be called by file format specific backends at initialization.

Parameters:

icat.dumpfile.open_dumpfile(client, f, formatname, mode)

Open a data file, either for reading or for writing.

Note that depending on the backend, the file must either be opened in binary or in text mode. If f is a file object, it must have been opened in the appropriate mode according to the backend selected by formatname. The backend classes define a corresponding class attribute mode. If f is a file name, the file will be opened in the appropriate mode.

The subclasses of icat.dumpfile.DumpFileReader and icat.dumpfile.DumpFileWriter may be used as context managers. This function is suitable to be used in the with statement.

>>> with open_dumpfile(client, f, "XML", 'r') as dumpfile:
...     for obj in dumpfile.getobjs():
...         obj.create()

Parameters:
  • client (icat.client.Client) – the ICAT client.

  • f – the object to read the data from or write the data to, according to mode. Which object types are supported depends on the backend. All backends support at least a file object or the name of a file. The special value “-” may be used as an alias for sys.stdin or sys.stdout.

  • formatname (str) – name of the file format that has been registered by the backend.

  • mode (str) – either “r” or “w” to indicate that the file should be opened for reading or writing respectively.

Returns:

an instance of the appropriate class. This is either the reader or the writer class, according to the mode, that has been registered by the backend.

Raises:

ValueError – if the format is not known or if the mode is not “r” or “w”.
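
The interplay between the backend register and the lookup by format name and mode can be illustrated with a small, self-contained sketch. Nothing here is python-icat code: _backends, sketch_register_backend, sketch_open_dumpfile, ToyReader and ToyWriter are hypothetical names, and storing a (reader, writer) pair per format name is an assumption about how such a register could be organized.

```python
_backends = {}  # hypothetical register: format name -> (reader class, writer class)


def sketch_register_backend(formatname, reader, writer):
    """Record the reader and writer classes of a backend under its format name."""
    _backends[formatname] = (reader, writer)


def sketch_open_dumpfile(client, f, formatname, mode):
    """Return a reader or writer instance for the given format and mode."""
    if formatname not in _backends:
        raise ValueError(f"Unknown data file format {formatname!r}")
    reader, writer = _backends[formatname]
    if mode == "r":
        return reader(client, f)
    if mode == "w":
        return writer(client, f)
    raise ValueError(f"Invalid file mode {mode!r}")


class ToyReader:
    def __init__(self, client, infile):
        self.client, self.infile = client, infile


class ToyWriter:
    def __init__(self, client, outfile):
        self.client, self.outfile = client, outfile


# A backend module would do this once, at initialization.
sketch_register_backend("TOY", ToyReader, ToyWriter)

reader = sketch_open_dumpfile(None, "in.toy", "TOY", "r")
writer = sketch_open_dumpfile(None, "out.toy", "TOY", "w")
```

In this sketch an unknown format name and an unsupported mode both raise ValueError, mirroring the documented behaviour; opening the file and context manager support are left out.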