
Conversation

ewinston
Contributor

@ewinston ewinston commented Apr 16, 2018

This PR is to rework the submission and management of jobs.

Description

With the backend rework of PR #376, we would like a job instance to be returned from a call to the backend's run method:
job = backend.run(qobj)

This call is meant to be non-blocking. The job object can be queried for status and results:

job.result()
job.cancel()
job.status
job.running
job.done
job.cancelled

Since the implementation of some of these methods may be backend dependent, this PR requires each backend to define its own job class, derived from BaseJob. The reference implementations for the local and ibmq backends are simple and currently use futures.
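
For illustration, a minimal sketch of what a futures-backed job class along these lines could look like (the class and method names here are assumptions for the sketch, not the PR's actual code):

from abc import ABC, abstractmethod
from concurrent import futures


class BaseJob(ABC):
    """Abstract interface that every backend-specific job derives from."""

    @abstractmethod
    def result(self):
        pass

    @abstractmethod
    def cancel(self):
        pass


class LocalJob(BaseJob):
    """Hypothetical local job wrapping a future returned by an executor."""

    _executor = futures.ThreadPoolExecutor()

    def __init__(self, fn, qobj):
        # submit() returns immediately; the work runs in the background
        self._future = self._executor.submit(fn, qobj)

    def result(self, timeout=None):
        # blocks until the underlying future completes (or the timeout expires)
        return self._future.result(timeout=timeout)

    def cancel(self):
        return self._future.cancel()

    @property
    def running(self):
        return self._future.running()

    @property
    def done(self):
        return self._future.done()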

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@ewinston ewinston changed the title from "rework jobs" to "[WIP] rework jobs" Apr 16, 2018
@diego-plan9
Member

Can you please rebase against current master, or take a similar measure (it might be easier to just cherry-pick the new commits onto a new clean branch)? The PR is currently showing a lot of changed files.

needed to comment out async methods and associated tests.
def __init__(self):
    """Initializes and initiates the asynchronous job."""
    pass

Member

Can we also have methods to get back the job_id and the backend that runs the job?

Member

In case we really need to expose the ID, maybe a better place could be our own implementation (IBMQJob) and not pollute the BaseJob class. If we find out that third-party implementers also need to expose the ID, then yeah, let's move it to the BaseJob class.

Member

I also think it's important that we try to keep the BaseJob as generic as possible - during our meeting, my mention of including id as a reachable property was aimed at the IBMQ* level, and provided that we had a clear current use case where users needed access to it (which I thought we did - but actually, it would be great to have a solid example of it).

Can we keep BaseJob as id-agnostic as possible, following the idea that it should be as unrestrictive and as far removed from implementation details as possible?

Contributor Author

This sounds fine to me. Currently the spec says the ID is reported in the status dictionary. I suppose IBMQJob could add an additional id or get_id method, but I am somewhat concerned about adding methods which, because of their common usage, are assumed to exist for all backends.

IBMQJob (BaseJob)
"""
return IBMQJob(self.run_job, q_job, self._api)

Member

Can we also add a method get_jobs() to return all jobs for this backend (for the authenticated user)?

# self._run_internal(qobj_list,
# wait=wait,
# timeout=timeout,
# callback=callback)
Member

Just remove the code; thanks to git history there's no need to comment out code anymore :)

pass

@abstractmethod
def running(self):
Member

Isn't the status() method enough to know whether the job is running or not?

Contributor Author

Futures implements running() so it is simple to propagate. Currently I populate status with that method.

def running(self):
    return self._future.running()

def add_done_callback(self, fn):
Member

Hmmm, that might be useful for local jobs... but moving to a future-based async model may render this method useless. Do we have a use case for this method?

def running(self):
    return self._future.running()

def add_done_callback(self, fn):
Member

Same as above: Do we really need this method?

Contributor Author

Perhaps it's best to keep it simple for now and not use it. There was once a case where I wanted to time individual jobs on a simulator but there are other ways.

@atilag
Member

atilag commented Apr 17, 2018

This is great! I love moving from a callback-based async approach to a future-based async approach!!

@jaygambetta jaygambetta mentioned this pull request Apr 18, 2018
@ewinston
Contributor Author

ewinston commented Apr 19, 2018

If we have a backend method for the job, should it return an instance or a string? Alternatively, it could be returned in the status method dictionary.

@ajavadia
Member

I think an instance, since these are qiskit components talking to each other.

@dcmckay

dcmckay commented Apr 19, 2018

@atilag The use case for a job id is if the user wants to retrieve a job in the future; this would be a backend-unique identifier for that job. Also, this will go into the results data structure and so be a way for users to link results back to a specific execution.

@dcmckayibm
Member

dcmckayibm commented Apr 19, 2018

@atilag The use case for a job id is if the user wants to retrieve a job in the future; this would be a backend-unique identifier for that job. Also, this will go into the results data structure and so be a way for users to link results back to a specific execution. (reply to this one)

Is it too much of a constraint for a general backend to provide a unique id? It doesn't seem like it would be.

@ewinston
Contributor Author

@ajavadia If we have a jobs method for BaseBackend, should it return job instances instead of strings?

@ajavadia
Member

ajavadia commented Apr 19, 2018

@ewinston I think so. This is similar to provider.available_backends(), which returns Backend instances.

(btw, I think that method should be provider.backends()... available_backends might imply the backend is not offline, which it might be)

@dcmckayibm
Member

@ajavadia @ewinston If .backends returns backend/job instances, does that imply all backends/jobs are instantiated when qiskit is started? In particular, for .jobs on a remote object that could be a lot of overhead.

I think there should be some string-type id to uniquely label all backends and jobs, separate from the apparatus of objects (?), but in particular for jobs.

@ajavadia
Member

@dcmckayibm I think returning objects avoids the overhead of translating between string names and instances by the user. Objects are easier to work with. You get it and call .status() or whatever on it.

1- For backends, it is safe to assume there aren't hundreds of them. So if you look at the current implementation, the list of backend instances is created at the time of provider authentication, and maintained thereafter (the API call to _discover_remote_backends is made during provider __init__).

2- You might be right that for jobs, we may not want to carry this list of job instances through. In that case:
a) I think the API call to get jobs should be made inside .jobs() itself, and not during the backend __init__, contrary to 1.
b) We can return a generator, which is evaluated lazily (see the sketch below). But I would do this only if returning jobs turns out to be prohibitive (avoid premature optimization!). See #371 for an example of when this is really useful.
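
For what it's worth, a small sketch of the lazy option in (b): a generator that only pulls job records from the API as the caller iterates. The paging interface (a fetch_page callable taking a skip and a limit) is an assumption for illustration, not the actual client API.

from typing import Callable, Iterator, List


def lazy_jobs(fetch_page: Callable[[int, int], List[dict]],
              page_size: int = 50) -> Iterator[dict]:
    """Yield raw job records one page at a time, only when the caller asks for them."""
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        if not page:
            return
        # nothing beyond the current page has been requested from the API yet
        yield from page
        skip += page_size

# usage sketch (api.get_jobs is a stand-in for whatever the client exposes):
# for raw_job in lazy_jobs(lambda skip, limit: api.get_jobs(skip=skip, limit=limit)):
#     print(raw_job)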

@dcmckayibm
Member

dcmckayibm commented Apr 20, 2018

@ajavadia I guess I'm more thinking from a data structures perspective. Consider that I have a qobj or result structure (and have lost the objects, e.g., these are from disk) and in those I only have string identifiers for the backend and/or job; how do I "grab" those objects again?

Maybe the thing I'm advocating isn't necessarily that .jobs / .backends return strings or objects, but that at least there is still a .get_backend(id) / .get_job(id).

@atilag
Member

atilag commented Apr 20, 2018

@dcmckay

The use case for a job id is if the user wants to retrieve a job in the future

Sure, but for me the job_id is an implementation detail that we could totally abstract from the user, so they don't need to know anything about IDs. We abstract it through the Job class interface. A user would only use one interface, the one from the Job class, to play with all the possible backends without knowing anything about their internals, and we can provide them with methods to serialize/deserialize the entire Job instance.
For example:

job = ibmqx4.run(qobj)
if not job.done(): # <-- this will internally use the ID to check the status of the Job
    # It's not ready yet! So I could, for example, serialize and persist it on disk
    user_function_save_to_disk(qiskit.utils.serialize(job), '/path/to/result-serialized.json')
else:
    # It's ready! Let's print the results
    print(job.result.get_counts('Bell'))

So, let's say the job was not ready and we have serialized it to disk; we can load it from disk again and rebuild our Job instance by deserializing:

job = qiskit.utils.deserialize(user_function_load_from_disk('/path/to/result-serialized.json'))
if job.done():
    print(job.result.get_counts('Bell'))

The benefits of this approach are:

  • Users don't have to worry about backend implementation details: no matter how the backend manages jobs internally (it may use IDs, URLs, or whatever other method to query jobs), the Job implementation will take care of it.
  • Homogeneous interface for local and remote jobs, as local jobs don't have an ID.

Is it too much of a constraint for a general backend to provide a unique id? It doesn't seem like it would be.

I'd say you are right, and it sounds legit to me, but I wouldn't like to make assumptions without having any experience of backend implementations aside from our own backends.

As time is a constraint, and we don't have a serializer/deserializer in place, my suggestion would be to move the id() method to our Job implementation, instead of having it on the BaseJob one.

def status(self):
    """
    Returns:
        str: 'ERROR', 'QUEUED', 'RUNNING', 'CANCELLED', or 'DONE'
Member

While we are at it, can we introduce something a bit more robust than plain strings, such as an Enum?

Contributor Author

Actually, the spec describes status as a dictionary, which is what I'm changing this to. The status dictionary looks like:

    @property
    @abstractmethod
    def status(self):
        """
        Returns:
            dict: {'job_id': <job_id>,
                   'status': 'ERROR'|'QUEUED'|'RUNNING'|'CANCELLED'|'DONE',
                   'status_msg': <str>}
        """
        pass

I suppose the status key could be an Enum; however, that could be a hassle if we need to serialize it at some point.

Member

Enums have some advantages over strings in this kind of context:

  • They're safer and less prone to user typing errors.
  • IDEs usually have proper support for them.
  • If there's a need to change the value of the Enum, you just need to change it in one place.
  • Less memory usage (not a big deal for this class, though).

I wouldn't worry too much about the serialization; depending on the type of serialization, the Enum will turn into a string or a binary value.
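
To make the serialization point concrete, a small sketch (the status values here are taken from the string spec above, not from the schema itself) showing a JobStatus Enum collapsing to a plain string when dumped to JSON:

import json
from enum import Enum, unique


@unique
class JobStatus(Enum):
    ERROR = 'ERROR'
    QUEUED = 'QUEUED'
    RUNNING = 'RUNNING'
    CANCELLED = 'CANCELLED'
    DONE = 'DONE'


status = {'job_id': '42', 'status': JobStatus.RUNNING, 'status_msg': None}

# the Enum member simply becomes its string value when serialized
print(json.dumps({**status, 'status': status['status'].value}))
# -> {"job_id": "42", "status": "RUNNING", "status_msg": null}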

Member

status is an enum in the schema (as of now)

Member
@dcmckayibm dcmckayibm Apr 25, 2018

@atilag the thing about the job_id is that the physical thing that runs and the result it produces are not necessarily part of qiskit. Technically, someone could use a different program to run a job, and then how would we be able to grab the results (or vice versa)?

Member

The majority of users will never leave the qiskit ecosystem, so they should be abstracted away from job_ids as much as possible, but I just want it exposed for these edge cases.

Contributor Author

@dcmckayibm Doesn't exposing it in just ibmq jobs satisfy the edge case, while also not forcing people who don't need it to implement it?

Member

Any remote job will need a job_id; maybe not local simulators (but I could still think of cases, e.g., local simulators with threading, that could make use of a job_id).

Member

If you want to leave it out of the qiskit architecture that could be ok, but it will need to be in the result data structure that is returned by the job

@ajavadia
Member

ajavadia commented May 8, 2018

@ewinston Thanks Erick, I think this is very close to being done. I only have a couple more comments:

  1. When I run a job on the local simulator, job.status always seems to be 'QUEUED', even when the result is ready. Can you check if this is really the case, and whether it can be fixed?
  2. As part of BaseJob, can we have self._backend and a method to get the backend it is running on as well? It should be straightforward to populate this field at the time of backend.run(), and I think it is useful to know the backend that the job is running on (see the sketch after this list).
  3. I have a comment above about some remaining timeout fields in the QuantumJob that should be cleaned up.
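
A minimal sketch of point 2 above (class and attribute names are assumptions, not the PR's code): the backend hands itself to the job inside run(), and the job exposes it through a property.

class LocalJob:
    def __init__(self, backend, qobj):
        self._backend = backend   # populated at submission time, as suggested
        self._qobj = qobj

    @property
    def backend(self):
        """The backend this job was submitted to."""
        return self._backend


class LocalBackend:
    def run(self, qobj):
        # passing self means the job always knows which backend it runs on
        return LocalJob(backend=self, qobj=qobj)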

@ajavadia
Member

ajavadia commented May 8, 2018

Awesome. It looks ready to merge to me!

@ajavadia
Member

ajavadia commented May 8, 2018

@ewinston: just a minor thing: when I submit a job to ibmqx4 and monitor job.status, it goes from QUEUED to None before returning the result. Is this an API issue? It might be in the QISKit code, because I think this morning I was seeing RUNNING before completion.

Also, my queue position seems to be constant. I don't know if you see a similar thing.

@ewinston
Contributor Author

ewinston commented May 8, 2018

@ajavadia The temporary status of None you saw could have been related to a case I didn't think was possible and didn't catch.

When I monitored the queue position of a job, it seemed not to be changing, but it was actually changing really slowly.

Also, I'm working on another branch off of this one which allows submitting ibmq jobs faster. It's on my fork, on the "ibmq_future" branch. One caveat is that an ibmq job can briefly not have a job id.
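
For context, a rough sketch (assumed names, not that branch's actual code) of what asynchronous submission implies: the submit call runs in an executor, so the job object exists before the server has assigned it an id.

from concurrent import futures


class AsyncSubmitJob:
    _executor = futures.ThreadPoolExecutor()

    def __init__(self, submit_fn, qobj):
        self._job_id = None
        # run() can return this object immediately; submission happens in the background
        self._submit_future = self._executor.submit(self._submit, submit_fn, qobj)

    def _submit(self, submit_fn, qobj):
        response = submit_fn(qobj)            # e.g. an HTTP call to the API
        self._job_id = response.get('id')     # only known once the server answers
        return response

    @property
    def job_id(self):
        # may be None for a short while after run() returns
        return self._job_id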

@ajavadia
Member

ajavadia commented May 9, 2018

@ewinston I think maybe None is supposed to be RUNNING actually? Because I just see QUEUED, None and COMPLETED. Here is one output I just got... I query the status every 3 seconds:

Status @ 0 seconds
{'job_id': 'c58ca262b54d480b128435d0c315a34f', 'queue_position': 3, 'status': <JobStatus.QUEUED: 'job is queued'>, 'status_msg': None}
Status @ 3 seconds
{'job_id': 'c58ca262b54d480b128435d0c315a34f', 'queue_position': 3, 'status': <JobStatus.QUEUED: 'job is queued'>, 'status_msg': None}
Status @ 6 seconds
{'job_id': 'c58ca262b54d480b128435d0c315a34f', 'queue_position': 3, 'status': <JobStatus.QUEUED: 'job is queued'>, 'status_msg': None}
Status @ 9 seconds
{'job_id': 'c58ca262b54d480b128435d0c315a34f', 'queue_position': 3, 'status': <JobStatus.QUEUED: 'job is queued'>, 'status_msg': None}
Status @ 12 seconds
{'job_id': 'c58ca262b54d480b128435d0c315a34f', 'queue_position': 3, 'status': <JobStatus.QUEUED: 'job is queued'>, 'status_msg': None}
Status @ 15 seconds
{'job_id': 'c58ca262b54d480b128435d0c315a34f', 'status': None, 'status_msg': None}
Status @ 18 seconds
{'job_id': 'c58ca262b54d480b128435d0c315a34f', 'status': None, 'status_msg': None}
Status @ 21 seconds
{'job_id': 'c58ca262b54d480b128435d0c315a34f', 'status': None, 'status_msg': None}
Status @ 24 seconds
{'job_id': 'c58ca262b54d480b128435d0c315a34f', 'status': None, 'status_msg': None}
{'job_id': 'c58ca262b54d480b128435d0c315a34f', 'status': 'COMPLETED', 'result': [{'data': {'time': 19.651577949523926, 'counts': {'00': 479, '01': 53, '10': 59, '11': 433}, 'date': '2018-05-09T00:37:44.323Z'}, 'status': 'DONE'}], 'name': 'nwHnKuOEo7pen52Xfo1u7lUp4xtx9C', 'backend': 'ibmqx4'}

(pprint.pformat(job_result)))
elif job_result['status'] == 'RUNNING':
# we may have some other information here
if 'infoQueue' in job_result:
Member
@ajavadia ajavadia May 9, 2018

I think the fix to my comment above is just to add
_status = JobStatus.RUNNING in the if here too. Right now it is only in the else.
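
A minimal sketch of what that change amounts to (the surrounding function and part of the JobStatus values are assumptions, not the file's exact code): make sure the RUNNING branch sets _status even when 'infoQueue' is present, since previously it was only assigned in the else.

from enum import Enum


class JobStatus(Enum):
    QUEUED = 'job is queued'
    RUNNING = 'job is actively running'
    DONE = 'job has successfully run'


def _update_status(job_result):
    """Sketch of the status-update logic with the suggested fix applied."""
    _status = None
    queue_position = None
    if job_result['status'] == 'QUEUED':
        _status = JobStatus.QUEUED
    elif job_result['status'] == 'RUNNING':
        _status = JobStatus.RUNNING   # suggested addition: set RUNNING here too
        if 'infoQueue' in job_result:
            # we may have some other information here
            queue_position = job_result['infoQueue'].get('position')
    return _status, queue_position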

Member

Yes - that seems to be a needed fix! 👍

cls._qc = qc
cls._provider = IBMQProvider(QE_TOKEN, QE_URL)

def test_run(self):
Member

Can you add another test_run_device (or similar name), with @slow_test, that runs this same test against a real device? It seems to be failing (tested with ibmqx4):


self = <test.python.test_ibmqjob.TestIBMQJob testMethod=test_run>

    def test_run(self):
        backend = self._provider.get_backend('ibmqx4')
        qobj = qiskit._compiler.compile(self._qc, backend)
        quantum_job = QuantumJob(qobj, backend, shots=1024, preformatted=True)
        job = backend.run(quantum_job)
>       result = job.result()

test/python/test_ibmqjob.py:59: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
qiskit/backends/ibmq/ibmqjob.py:63: in result
    this_result = self._wait_for_job(timeout=timeout, wait=wait)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <qiskit.backends.ibmq.ibmqjob.IBMQJob object at 0x7f4661bc45f8>, timeout = None, wait = 5

    def _wait_for_job(self, timeout=60, wait=5):
        ....
    
        # Get the results
        api_result = self._api.get_job(job_id)
        job_result_return = []
        for index in range(len(api_result['qasms'])):
>           job_result_return.append({'data': api_result['qasms'][index]['data'],
                                      'status': api_result['qasms'][index]['status']})
E           KeyError: 'data'

qiskit/backends/ibmq/ibmqjob.py:245: KeyError

num_queued, num_jobs)
self.log.info('number of currently running jobs: %d/%d',
num_running, num_jobs)
self.assertTrue(all([(job.running or job.queued) for job in job_array]))
Member

Can you replace this line with a slightly expanded version? Something along the lines of:

        self.assertTrue(all([(job.running or job.queued or job.done) for job in job_array]))

        # Ensure job ids are unique.
        job_ids = [job.job_id for job in job_array]
        self.assertEqual(sorted(job_ids), sorted(list(set(job_ids))))

        # Wait for all the results.
        result_array = [job.result() for job in job_array]

        # Ensure all jobs have finished.
        self.assertTrue(all([job.done for job in job_array]))
        self.assertTrue(all([result.get_status() == 'COMPLETED' for result in result_array]))

The or job.done, along with the fix mentioned by @ajavadia at https://github.com/QISKit/qiskit-sdk-py/pull/403/files#r186906393, should be enough for passing the test in its current state - there is the minor issue that the line would make three queries to the API (one for each job.something) and increase the running time, but it seems we can afford that for the moment.

The rest of the lines are for making the tests more complete: not only launching the jobs, but ensuring that they get completed and that they handle different job_ids. For the simulator, it does indeed seem that all the jobs will be in the job.done state by the time they reach line 91 (but that seems fine too).

num_queued, num_jobs)
self.log.info('number of currently running jobs: %d/%d',
num_running, num_jobs)
self.assertTrue(all([(job.running or job.queued) for job in job_array]))
Member

Can you also expand this line in the same way as the note in test_run_async_simulator()?

@diego-plan9
Member

Thanks for the much-needed tests, @ewinston! For the rest of the team: please note that the tests are not run on travis and need to be launched manually - and they are currently failing, unfortunately. The good news is that at least one of the remaining issues causing this seems easy to address with the fix proposed by @ajavadia - and the other failing test also seems minor.

quantum_job = QuantumJob(qobj, backend, preformatted=True)
job = backend.run(quantum_job)
while not job.done:
    print(job.status)
Member

This should probably be self.log.info instead of print. It prints out a lot of extra lines during testing.

ewinston added 3 commits May 10, 2018 10:23
When submitting to a backend with a disabled queue
the job would wait in the INITIALIZING state indefinitely
@diego-plan9
Member

Thanks @ewinston, and basically the whole team that was involved in this PR - merging! 🎉

@diego-plan9 diego-plan9 merged commit d40a0e6 into Qiskit:master May 10, 2018
lia-approves pushed a commit to edasgupta/qiskit-terra that referenced this pull request Jul 30, 2019
* Created Job class for backend rework.

* updated dummybackend to use BaseJob

* test_quantumprogram passing

needed to comment out async methods and associated tests.

* passing all tests

* added simple job run tests for local and IBM Q

* hide _run_job method for backends

* passing tests and linter

* qasm cpp simulator passing tests.

* linting

* remove enum.auto for python 3.5

* more linting

* resolve brokenprocesspool on mac

* remove unneeded tests

* bug fixes from merge conflict

* race condition in job_cancel test

* fix inadvertent change to a Makefile

* minor commit

* pre- moving ibmbackend job methods to ibmqjob.

* remove timeout from QuantumJob

- also move some job related methods from ibmqbackend to ibmqjob
- ibmqjob implements concurrent.futures-like API without using it internally.

* merge reorder_bits with async jobs

* minor bug fixes

* remove some unneeded timeouts. fix creation of job result struct.

* add async tests. make ibmq status atomic.

* added async tests and minor refinements

- added async test
- added backend_name method to IBMQJob and LocalJob
- cleaned unused timeouts for QuantumJob
- added queued method to IBMQJob and LocalJob

* correct KeyError in _wait_for_job

* make ibmqjob submission async

* update ibmq job device test

* change wait_for_results logic. add ibmjob device test.

* fix register size exception test

* minor linting issue

* extended ibmq job tests as recommended by @diego-plan9

* adjust ibmqx4 device test for noise

* change print to log.info in test_ibqjob

* fix minor bug

* fix infinite job initializing

When submitting to a backend with a disabled queue
the job would wait in the INITIALIZING state indefinitely
@ewinston ewinston deleted the job_class branch April 6, 2021 16:27