Conversation

@mckinsel

`SwaggerClient` discovers methods by reading a Swagger document, creating
an instance of `_ClientMethodFactory` for each path in the document, and
assigning those instances as class attributes of `SwaggerClient`.
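
A minimal sketch of why that matters, with illustrative class bodies
rather than the real implementation: attributes set on a class are
shared by every instance, and therefore by every thread.

```python
# Illustrative only: the real _ClientMethodFactory and SwaggerClient do more.
class _ClientMethodFactory:
    def __init__(self):
        self._context_manager_response = None  # per-request state


class SwaggerClient:
    pass


# What discovery effectively does: one factory per Swagger path,
# attached to the class itself.
setattr(SwaggerClient, "get_file", _ClientMethodFactory())

client_a, client_b = SwaggerClient(), SwaggerClient()
assert client_a.get_file is client_b.get_file  # one shared object
```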

This means that a single instance of each `_ClientMethodFactory` is shared
across threads. That is not so good, because a `_ClientMethodFactory`
stores state associated with the request it is handling. In particular,
consider the following scenario:

    Thread-1: client = hca.dss.DSSClient()
    Thread-2: client = hca.dss.DSSClient()
    Thread-1: with client.get_file.stream(...) as handle:
    Thread-2: with client.get_file.stream(...) as handle:
    Thread-1: handle.raw.read()

Depending on how the two calls to `__enter__` interleave, Thread-1 may have
just read from the file meant for Thread-2! Going on,

    Thread-1: __exit__ the with block
    Thread-2: handle.raw.read()
    Thread-2: __exit__ the with block
    Exception: NoneType has no attribute close()

Or you know, something like that.

This change makes the methods instance attributes.
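
Roughly, that means moving the `setattr` from the class to `__init__`;
a sketch, continuing the illustrative classes above:

```python
class SwaggerClient:
    def __init__(self):
        # A fresh factory per instance, so two clients held by two
        # threads no longer share request state.
        setattr(self, "get_file", _ClientMethodFactory())


client_a, client_b = SwaggerClient(), SwaggerClient()
assert client_a.get_file is not client_b.get_file
```

Two threads sharing a single client instance would still share one
factory; the review discussion below touches on that remaining hazard.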

mckinsel added a commit to HumanCellAtlas/matrix-service that referenced this pull request Oct 24, 2018
calvinnhieu pushed a commit to HumanCellAtlas/matrix-service that referenced this pull request Oct 24, 2018
* Parallelize reads from DSS in workers

* Use thread-safer DCP interface

This incorporates the change in HumanCellAtlas/dcp-cli#202

* Include thread id in logs

This is helpful in our multithreaded code paths (see the logging sketch after this list).

* Don't print logs twice

AWS Lambda has a default root logger that captures everything that goes
to stdout. If you don't turn `propagate` off, you get every message
twice in CloudWatch.

* Poll for locks much less frequently

A failure mode we encounter is using up capacity on the lock table. If
we're anticipating smaller numbers of worker lambdas that handle larger
blocks of work, we can wait longer between checks for lock
availability. This reduced our DynamoDB consumed capacity (sketched
below, after this list).

* Distribute 100 bundles to each lambda

And aggressively parallelize DSS I/O.

* Assume that each bundle has a single cell in the mapper

This is a bad assumption, but it's helpful for the demo.

* Address comments
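
For the two logging items above, a sketch of the sort of configuration
described; the logger name is hypothetical:

```python
import logging

logger = logging.getLogger("matrix.worker")  # hypothetical name
handler = logging.StreamHandler()
# %(thread)d stamps every record with its thread id, which is what makes
# interleaved output from the multithreaded code paths readable.
handler.setFormatter(logging.Formatter(
    "%(asctime)s [%(thread)d] %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)
# AWS Lambda already installs a root handler, so without this every
# record shows up twice in CloudWatch.
logger.propagate = False
```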
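
And for the lock-polling item, the shape of the change is just a longer
sleep between attempts; a sketch with a hypothetical `try_acquire`
callable:

```python
import time

LOCK_POLL_INTERVAL_SECONDS = 30  # hypothetical value


def wait_for_lock(try_acquire):
    """Block until try_acquire() returns True.

    try_acquire is assumed to make one attempt against the DynamoDB lock
    table; sleeping longer between attempts is what cuts the table's
    consumed capacity.
    """
    while not try_acquire():
        time.sleep(LOCK_POLL_INTERVAL_SECONDS)
```
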
@mckinsel mckinsel requested a review from ttung October 26, 2018 16:10
Member

@ttung ttung left a comment

Unit tests are broken, btw.

My suggestion for making this a bit safer: in `__init__(..)`, create a lock. In `stream(..)`, hold the lock and verify that `_context_manager_response` is `None` before issuing the request. If it's not `None`, then the previous request is plausibly still running.
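
One way to read that suggestion as code; a sketch, with the
request-issuing call left hypothetical:

```python
import threading


class _ClientMethodFactory:
    def __init__(self):
        self._lock = threading.Lock()
        self._context_manager_response = None

    def stream(self, *args, **kwargs):
        # Blocks until any previous streamed request has left its
        # `with` block, so concurrent callers are serialized.
        self._lock.acquire()
        assert self._context_manager_response is None
        self._context_manager_response = self._issue_request(*args, **kwargs)  # hypothetical
        return self

    def __enter__(self):
        return self._context_manager_response

    def __exit__(self, exc_type, exc_value, traceback):
        try:
            self._context_manager_response.close()
        finally:
            self._context_manager_response = None
            self._lock.release()
```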

@mckinsel
Author

That's definitely safer, but I think the safety is achieved by fully serializing calls to `stream`. Once one thread calls `stream`, no other thread can until that first thread calls `__exit__`.

@mckinsel
Author

Also, re unit tests, I'm pretty sure master is broken? Is that possible?

@ttung
Member

ttung commented Oct 29, 2018

> That's definitely safer, but I think the safety is achieved by fully serializing calls to `stream`. Once one thread calls `stream`, no other thread can until that first thread calls `__exit__`.

I think my suggestion achieves that same effect...?

@ttung
Member

ttung commented Oct 29, 2018

> Also, re unit tests, I'm pretty sure master is broken? Is that possible?

Yes, that's definitely possible.

@kozbo
Collaborator

kozbo commented Nov 13, 2018

Passing now; would you like to try your unit tests again?

@kozbo kozbo added this to the M-11-25-18 milestone Nov 14, 2018
calvinnhieu pushed a commit to HumanCellAtlas/matrix-service that referenced this pull request Nov 15, 2018
Really feel like this one should pass...
@mckinsel mckinsel force-pushed the mckinsel-thread-safety branch from ae247b3 to 3c57a7f on November 16, 2018
@mckinsel
Author

Thanks @kozbo, still failing, but interesting failures now.

@kozbo kozbo modified the milestones: M-11-25-18, M-12-21-2018 Dec 21, 2018
@kozbo kozbo removed this from the M-12-21-2018 milestone Feb 26, 2019
@kozbo
Collaborator

kozbo commented Jul 1, 2019

@mckinsel any progress on this? Is this still an issue?
