API Docs

Record Indexer

API for indexing of records.

class invenio_indexer.api.BulkRecordIndexer(search_client=None, exchange=None, queue=None, routing_key=None, version_type=None, record_to_index=None)

Provide an interface for indexing records in Elasticsearch.

Uses bulk indexing by default.

Initialize indexer.

Parameters:
  • search_client – Elasticsearch client. (Default: current_search_client)
  • exchange – A kombu.Exchange instance for message queue.
  • queue – A kombu.Queue instance for message queue.
  • routing_key – Routing key for message queue.
  • version_type – Elasticsearch version type. (Default: external_gte)
  • record_to_index – Function to extract the index and doc_type from the record.

delete(record)

Delete a record.

Parameters: record – Record instance.

delete_by_id(record_uuid)

Delete record from index by record identifier.

index(record)

Index a record.

The caller is responsible for ensuring that the record has already been committed to the database. If a newer version of a record has already been indexed then the provided record will not be indexed. This behavior can be controlled by providing a different version_type when initializing RecordIndexer.

Parameters: record – Record instance.
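
The version check described above can be modelled in a few lines. This is a sketch of the external_gte rule only, assuming Elasticsearch's documented semantics (a write is accepted when the supplied version is greater than or equal to the stored one). FakeIndex, VersionConflict, and their methods are hypothetical names, and no Elasticsearch client is involved.

```python
class VersionConflict(Exception):
    """Raised when the supplied version is older than the stored one."""


class FakeIndex:
    """In-memory stand-in for a search index (hypothetical, for illustration)."""

    def __init__(self):
        self._docs = {}  # doc id -> (version, body)

    def index(self, doc_id, body, version):
        """Store body unless a newer version is already present (external_gte)."""
        stored = self._docs.get(doc_id)
        if stored is not None and version < stored[0]:
            raise VersionConflict(
                f"{doc_id}: stored version {stored[0]} is newer than {version}"
            )
        self._docs[doc_id] = (version, body)

    def get(self, doc_id):
        """Return the stored (version, body) pair."""
        return self._docs[doc_id]


idx = FakeIndex()
idx.index("rec-1", {"title": "v2"}, version=2)
idx.index("rec-1", {"title": "v2 again"}, version=2)  # equal version: accepted
try:
    idx.index("rec-1", {"title": "stale"}, version=1)  # older version: rejected
    stale_rejected = False
except VersionConflict:
    stale_rejected = True
```

With a stricter version type, the second call (equal version) would be rejected as well, which is the kind of behavior change the version_type parameter controls.
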
index_by_id(record_uuid)

Index a record by record identifier.

Parameters: record_uuid – Record identifier.

class invenio_indexer.api.Producer(channel, exchange=None, routing_key=None, serializer=None, auto_declare=None, compression=None, on_return=None)

Producer validating published messages.

For more information visit kombu.Producer.

publish(data, **kwargs)

Validate operation type.
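
A minimal sketch of the validate-then-publish pattern, written against a plain list instead of kombu so that it runs standalone. The accepted operation names in ALLOWED_OPS, the message shape, and the ValidatingProducer class are assumptions; the documentation only states that the operation type is validated.

```python
ALLOWED_OPS = {"index", "delete"}  # assumption: the real set may differ


class ValidatingProducer:
    """Hypothetical stand-in: checks the 'op' field before queuing a message."""

    def __init__(self):
        self.sent = []  # stands in for the message queue

    def publish(self, data, **kwargs):
        # kwargs are accepted for signature parity and ignored in this sketch.
        op = data.get("op")
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operation type: {op!r}")
        self.sent.append(data)


producer = ValidatingProducer()
producer.publish({"op": "index", "id": "a1"})
try:
    producer.publish({"op": "reticulate", "id": "a1"})
    rejected = False
except ValueError:
    rejected = True
```
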

class invenio_indexer.api.RecordIndexer(search_client=None, exchange=None, queue=None, routing_key=None, version_type=None, record_to_index=None)

Provide an interface for indexing records in Elasticsearch.

Bulk indexing works by queuing requests for indexing records and processing these requests in bulk.

Initialize indexer.

Parameters:
  • search_client – Elasticsearch client. (Default: current_search_client)
  • exchange – A kombu.Exchange instance for message queue.
  • queue – A kombu.Queue instance for message queue.
  • routing_key – Routing key for message queue.
  • version_type – Elasticsearch version type. (Default: external_gte)
  • record_to_index – Function to extract the index and doc_type from the record.

bulk_delete(record_id_iterator)

Bulk delete records from index.

Parameters: record_id_iterator – Iterator yielding record UUIDs.

bulk_index(record_id_iterator)

Bulk index records.

Parameters: record_id_iterator – Iterator yielding record UUIDs.

create_producer(*args, **kwds)

Context manager that yields an instance of Producer.

delete(record, **kwargs)

Delete a record.

Parameters: record – Record instance.

delete_by_id(record_uuid, **kwargs)

Delete record from index by record identifier.

Parameters: record_uuid – Record identifier.

index(record, arguments=None, **kwargs)

Index a record.

The caller is responsible for ensuring that the record has already been committed to the database. If a newer version of a record has already been indexed then the provided record will not be indexed. This behavior can be controlled by providing a different version_type when initializing RecordIndexer.

Parameters: record – Record instance.

index_by_id(record_uuid, **kwargs)

Index a record by record identifier.

Parameters: record_uuid – Record identifier.

mq_exchange

Message Queue exchange.

Returns: The Message Queue exchange.

mq_queue

Message Queue queue.

Returns: The Message Queue queue.

mq_routing_key

Message Queue routing key.

Returns: The Message Queue routing key.

process_bulk_queue(es_bulk_kwargs=None)

Process bulk indexing queue.

Parameters: es_bulk_kwargs (dict) – Passed to elasticsearch.helpers.bulk().
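
The queue-then-process cycle behind bulk_index(), bulk_delete() and process_bulk_queue() can be sketched with the standard library. This models the flow only: queue.Queue stands in for the kombu queue, the message fields (op, id) and the shape of the action dictionaries are assumptions, and nothing is sent to Elasticsearch.

```python
import queue
import uuid

mq = queue.Queue()  # stands in for the message queue


def bulk_index(record_ids):
    """Queue one 'index' request per record id; no indexing happens yet."""
    for record_id in record_ids:
        mq.put({"op": "index", "id": str(record_id)})


def bulk_delete(record_ids):
    """Queue one 'delete' request per record id."""
    for record_id in record_ids:
        mq.put({"op": "delete", "id": str(record_id)})


def process_bulk_queue():
    """Drain the queue and return the actions one bulk call would carry."""
    actions = []
    while True:
        try:
            message = mq.get_nowait()
        except queue.Empty:
            break
        actions.append({"_op_type": message["op"], "_id": message["id"]})
    return actions


ids = [uuid.uuid4() for _ in range(3)]
bulk_index(ids[:2])
bulk_delete(ids[2:])
actions = process_bulk_queue()
```

The point of the split is that callers return as soon as their requests are queued, while a separate worker (for example the Celery task of the same name) turns many small requests into one bulk operation.
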
record_cls

alias of Record

record_to_index(record)

Get index/doc_type given a record.

Parameters: record – The record from which to read the information.
Returns: A tuple (index, doc_type).
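
A custom function passed as record_to_index only has to accept a record and return an (index, doc_type) tuple. The sketch below treats the record as a plain dict; the $schema key, the example URL, the name-derivation rule, and the fallback values are all assumptions for illustration, not the library's default behavior.

```python
from urllib.parse import urlparse

DEFAULT_INDEX = "records"   # hypothetical fallback index name
DEFAULT_DOC_TYPE = "_doc"   # hypothetical fallback doc_type


def record_to_index(record):
    """Return (index, doc_type) derived from the record's '$schema' URL."""
    schema = record.get("$schema")
    if not schema:
        return DEFAULT_INDEX, DEFAULT_DOC_TYPE
    # ".../schemas/authors/author-v1.0.0.json" -> ["authors", "author-v1.0.0"]
    path = urlparse(schema).path
    relative = path.split("/schemas/", 1)[-1].rsplit(".json", 1)[0]
    parts = relative.strip("/").split("/")
    return "-".join(parts), parts[-1]


result = record_to_index(
    {"$schema": "https://example.org/schemas/authors/author-v1.0.0.json"}
)
fallback = record_to_index({})
```
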

Flask Extension

Flask extension for Invenio-Indexer.

class invenio_indexer.ext.InvenioIndexer(app=None)

Invenio-Indexer extension.

Extension initialization.

Parameters: app – The Flask application. (Default: None)

init_app(app)

Flask application initialization.

Parameters: app – The Flask application.

init_config(app)

Initialize configuration.

Parameters: app – The Flask application.

record_to_index

Import the configurable ‘record_to_index’ function.

Celery tasks

Celery tasks to index records.

invenio_indexer.tasks.delete_record(record_uuid)

Delete a single record.

Parameters: record_uuid – The record UUID.

invenio_indexer.tasks.index_record(record_uuid)

Index a single record.

Parameters: record_uuid – The record UUID.

invenio_indexer.tasks.process_bulk_queue(version_type=None, es_bulk_kwargs=None)

Process bulk indexing queue.

Parameters:
  • version_type – Elasticsearch version type.
  • es_bulk_kwargs (dict) – Passed to elasticsearch.helpers.bulk().

Note: You can start multiple versions of this task.

Signals

Signals for indexer.

invenio_indexer.signals.before_record_index = <blinker.base.NamedSignal 'before-record-index'>

Signal sent before a record is indexed.

The sender is the current Flask application, and the following keyword arguments are provided:

  • json: The dumped record dictionary which can be modified.
  • record: The record being indexed.
  • index: The index in which the record will be indexed.
  • doc_type: The doc_type for the record.
  • arguments: The arguments to pass to Elasticsearch for indexing.
  • **kwargs: Extra arguments.
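
A receiver for this signal is an ordinary function that takes the sender plus the keyword arguments listed above. The sketch below defines one and calls it directly so that it runs without Flask or blinker; the indexed_by field and the refresh argument are made-up examples of what a receiver might set. In an application it would typically be registered with before_record_index.connect(add_marker).

```python
def add_marker(sender, json=None, record=None, index=None, doc_type=None,
               arguments=None, **kwargs):
    """Mutate the dumped record and the indexing arguments in place."""
    json["indexed_by"] = "example-receiver"  # example extra field
    if arguments is not None:
        arguments["refresh"] = True          # example indexing argument


dumped = {"title": "A record"}
es_arguments = {}
add_marker(
    sender=None,  # the Flask application in a real setup
    json=dumped,
    record={"title": "A record"},
    index="records-v1",
    doc_type="_doc",
    arguments=es_arguments,
)
```

Because json and arguments are passed by reference, changes made by the receiver are what end up being indexed; the receiver does not need to return anything.
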

This signal also has a .dynamic_connect() method which allows some more flexible ways to connect receivers to it. The most common use case is that you want to apply a receiver only to a specific index. In that case you can call:

For more complex conditions you can provide a function via the condition_func parameter like so:
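
The filtering idea behind both cases can be modelled in plain Python. This is not the library's implementation or call signature: connect_if and send are hypothetical helpers, where connect_if wraps a receiver so that it only runs when a condition holds, with an index match as the simple case and an arbitrary predicate as the general one.

```python
receivers = []  # stands in for the signal's receiver list


def connect_if(receiver, index=None, condition_func=None):
    """Register receiver so it runs only when the index or condition matches."""
    def wrapper(sender, **kwargs):
        if index is not None and kwargs.get("index") != index:
            return
        if condition_func is not None and not condition_func(sender, **kwargs):
            return
        receiver(sender, **kwargs)
    receivers.append(wrapper)


def send(sender, **kwargs):
    """Call every registered wrapper, as a signal would on send."""
    for wrapper in receivers:
        wrapper(sender, **kwargs)


def tag_author(sender, json=None, **kwargs):
    json["kind"] = "author"


def flag_large(sender, json=None, **kwargs):
    json["large"] = True


# Simple case: run only for one index.
connect_if(tag_author, index="authors-v1")
# General case: run only when a predicate over the signal arguments holds.
connect_if(flag_large, condition_func=lambda sender, json=None, **kw: len(json) > 2)

author = {"name": "Ada"}
send(None, json=author, index="authors-v1")

paper = {"title": "T", "year": 2020, "pages": 12}
send(None, json=paper, index="papers-v1")
```
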