41/Job Information Service

The Flux Job Information Service provides convenient, read-only access to KVS job data for job owners.

Name

github.com/flux-framework/rfc/spec_41.rst

Editor

Jim Garlick <garlick@llnl.gov>

State

raw

Language

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

Background

Job info is stored in a job-specific KVS directory as described in RFC 16. Among the keys stored for each job are:

J

signed jobspec submitted by the user (RFC 15)

jobspec

unwrapped jobspec from J (RFC 25), potentially modified at ingest

eventlog

primary job eventlog (RFC 21)

R

resource set allocated to job (RFC 20)

guest.exec.eventlog

exec system eventlog

guest.input

job input (RFC 24)

guest.output

job output (RFC 24)

While the job is running, the guest key in the job directory is a symbolic link to a private KVS namespace that may be read or written by the job owner. All KVS accesses from the job are redirected to the private namespace. This supports the job shell running as the guest user, and also allows job applications and user-invoked tools to use the KVS.

Once the job becomes inactive, the private namespace is deleted and the guest link becomes a directory in the primary KVS namespace that contains a snapshot of the private namespace’s content. At this point only the instance owner may directly access this information.

The main purpose of the job info service is to give job owners convenient, read-only access to the data in their KVS job directory.

Goals

  • Provide read-only access to all keys in a given KVS job directory.

  • Restrict access to the job owner and the Flux instance owner.

  • Provide scalability and performance comparable to direct KVS access.

  • Hide the complexity of watching eventlogs across the guest transition described above.

Implementation

The job info service SHOULD be distributed across all broker ranks to avoid creating a bottleneck at the leader broker.

Job information requests below are for a single job ID. If a request was not sent by the instance owner or the job owner, it SHALL fail with error 1, “Operation not permitted” (EPERM).

Internally, the job information service MAY determine the job owner by fetching the primary job eventlog and reading the userid from the submit event context.
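The access check above can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the function names and the instance_owner parameter are hypothetical, and the eventlog is assumed to be available as a raw string of newline-delimited JSON entries whose submit event carries a userid in its context.

```python
import errno
import json


def job_owner_from_eventlog(eventlog):
    """Return the userid from the submit event of a raw eventlog string."""
    for line in eventlog.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("name") == "submit":
            return entry["context"]["userid"]
    raise ValueError("eventlog has no submit event")


def check_access(request_userid, instance_owner, eventlog):
    """Raise PermissionError (EPERM) unless the requester may read the job."""
    if request_userid == instance_owner:
        return  # the instance owner may read any job
    if request_userid != job_owner_from_eventlog(eventlog):
        raise PermissionError(errno.EPERM, "Operation not permitted")
```

A request from any other user raises PermissionError with errno set to EPERM, mirroring the error the specification requires.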

Lookup

The job-info.lookup RPC fetches one or more job info keys from the KVS.

If a failure occurs while looking up any of the keys, the entire request SHALL fail.

The RPC payloads are defined as follows:

job-info.lookup request

The request SHALL consist of a JSON object with the following keys:

id

(integer, REQUIRED) The job id.

keys

(array of string, REQUIRED) List of keys.

Keys SHALL be specified relative to the job directory, and SHALL use period . as the path delimiter.

flags

(integer, REQUIRED) A bitfield consisting of zero or more flags:

json_decode (1)

For lookups of R or jobspec, return the field as a decoded JSON object instead of a string. This flag has no effect on other keys.

current (2)

For lookups of R or jobspec, return the current version. The current version SHALL be computed by applying any resource-update or jobspec-update events that have been posted to the job eventlog, as described in RFC 21.
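As an illustration of how a current version might be computed, the sketch below replays jobspec-update events over a decoded jobspec. It assumes that the context of a jobspec-update event is a dictionary mapping period-delimited jobspec paths to replacement values; RFC 21 remains the authoritative source for the update rules. The helper names are hypothetical, and resource-update events for R are not handled here.

```python
import copy
import json


def set_by_path(obj, path, value):
    """Assign value at a period-delimited path, creating dicts as needed."""
    parts = path.split(".")
    for part in parts[:-1]:
        obj = obj.setdefault(part, {})
    obj[parts[-1]] = value


def current_jobspec(jobspec, eventlog):
    """Return a copy of jobspec with all jobspec-update events applied."""
    result = copy.deepcopy(jobspec)  # leave the caller's jobspec untouched
    for line in eventlog.splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("name") == "jobspec-update":
            # Assumption: context maps period-delimited paths to new values.
            for path, value in entry.get("context", {}).items():
                set_by_path(result, path, value)
    return result
```

Events are applied in eventlog order, so a later update to the same path overrides an earlier one.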

job-info.lookup response

The response SHALL consist of a JSON object with the following keys:

id

(integer, REQUIRED) The job id from the request.

keys...

Additional keys correspond to the keys in the request.

Values are the KVS values associated with the keys. Values are encoded as strings, except in special cases indicated by flags in the request.
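The payloads above can be illustrated with plain JSON-compatible objects. The sketch below only builds and inspects payloads; it does not send an RPC. The constant and function names are hypothetical, while the flag values 1 and 2 are taken from the request definition above.

```python
import json

# Flag values from the job-info.lookup request definition.
FLAG_JSON_DECODE = 1
FLAG_CURRENT = 2


def make_lookup_request(jobid, keys, flags=0):
    """Build a job-info.lookup request payload."""
    return {"id": jobid, "keys": list(keys), "flags": flags}


def extract_values(request, response):
    """Return the requested keys and their values from a lookup response."""
    if response["id"] != request["id"]:
        raise ValueError("response id does not match request id")
    return {key: response[key] for key in request["keys"]}


# Ask for the eventlog as a string and R as a decoded JSON object.
request = make_lookup_request(1234, ["eventlog", "R"], FLAG_JSON_DECODE)
payload = json.dumps(request)  # serialized form of the request payload
```

With json_decode set, the value for R in the response would be a JSON object, while the eventlog value would remain a string.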

Eventlog Watch

The job-info.eventlog-watch streaming RPC tracks events posted to an RFC 18 eventlog.

The RPC stream SHALL be terminated with error 61, “No data available” (ENODATA) when one of the following conditions is met:

  • The RPC is canceled with a job-info.eventlog-watch-cancel request.

  • The job becomes inactive.

  • A context-specific terminating event is posted to the eventlog:

eventlog

clean

guest.exec.eventlog

done

guest.output

data with eof=true on all streams and all ranks

guest.input

data with eof=true
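The context-specific terminating events listed above could be detected along the following lines. This is a partial sketch with hypothetical names: it assumes each event is a JSON-encoded eventlog entry with a name and an optional context, and it does not implement the guest.output case, which requires tracking eof across all streams and ranks.

```python
import json

# Eventlogs that terminate on a single named event.
TERMINATING_NAME = {
    "eventlog": "clean",
    "guest.exec.eventlog": "done",
}


def is_terminating_event(path, event):
    """Return True if event ends the watch stream for the given eventlog path."""
    entry = json.loads(event)
    name = entry.get("name")
    if path in TERMINATING_NAME:
        return name == TERMINATING_NAME[path]
    if path == "guest.input":
        # A data event with eof=true ends the input eventlog.
        return name == "data" and entry.get("context", {}).get("eof") is True
    # guest.output needs per-stream, per-rank state and is not handled here.
    return False
```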

The RPC payloads are defined as follows:

job-info.eventlog-watch request

The request SHALL consist of a JSON object with the following keys:

id

(integer, REQUIRED) The job id.

path

(string, REQUIRED) The eventlog key.

The key SHALL be specified relative to the job directory, and SHALL use period . as the path delimiter.

flags

(integer, REQUIRED) A bitfield consisting of zero or more flags:

waitcreate (1)

If the key does not exist yet, wait for its creation before responding.

job-info.eventlog-watch response

Each non-error response SHALL consist of a JSON object with the following keys:

event

(string, REQUIRED) Exactly one RFC 18 eventlog entry, including trailing newline.
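A consumer of the response stream might decode each event and treat ENODATA as the normal end of stream, as sketched below. The stream is simulated with a generator rather than a real RPC, and modeling the terminating error as an OSError is an assumption made for illustration; the function names are hypothetical.

```python
import errno
import json


def collect_events(responses):
    """Decode streamed responses until the stream ends with ENODATA."""
    events = []
    try:
        for response in responses:
            # Each response carries exactly one eventlog entry as a string.
            events.append(json.loads(response["event"]))
    except OSError as exc:
        if exc.errno != errno.ENODATA:
            raise  # any other error is a real failure
    return events


def fake_stream():
    """Simulated response stream: two events, then ENODATA."""
    yield {"event": '{"timestamp":1.0,"name":"submit","context":{"userid":1}}\n'}
    yield {"event": '{"timestamp":2.0,"name":"clean"}\n'}
    raise OSError(errno.ENODATA, "No data available")


events = collect_events(fake_stream())
```

Errors other than ENODATA propagate to the caller, so a permission failure is not mistaken for a clean end of stream.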

job-info.eventlog-watch-cancel request

Cancel a job-info.eventlog-watch request, as described in RFC 6.

matchtag

(integer, REQUIRED) The matchtag of the request to be canceled.