simp-core tests fail with random closed stream IOErrors

Description

The simp-core acceptance tests fail on different servers and at different stages of the test (yum installs, puppet agent runs, etc.). In all of these cases, the log lines look something like:

Some of the tests have 'workarounds' for this at specific checkpoints (begin/rescue blocks with retry). However, since the failure can occur on *any* host action (host.install, on(), retry_on(), etc.), those workarounds are insufficient.
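
For reference, the begin/rescue-with-retry workaround has roughly the following shape. This is a minimal sketch, not the code from the simp-core tests: the helper name with_io_retry and its max_attempts and delay parameters are hypothetical, and the block simulates a flaky host action rather than calling beaker.

```ruby
# Hypothetical helper (not part of beaker or simp-core): re-run a block
# when it raises IOError, such as a "closed stream" error.
def with_io_retry(max_attempts: 3, delay: 0)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue IOError
    # Give up and re-raise once the attempt budget is used up.
    raise if attempts >= max_attempts
    sleep(delay) if delay > 0
    retry
  end
end

# Simulated host action that fails twice before succeeding.
calls = 0
result = with_io_retry(max_attempts: 3) do
  calls += 1
  raise IOError, 'closed stream' if calls < 3
  :ok
end
```

A wrapper like this only protects the calls that are explicitly placed inside it, which is why it does not scale when any host action can fail.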

Other notes:

  • Some failures occur within 10 minutes of the test starting, so the session TMOUT parameter of 900 seconds (15 minutes) doesn't seem to be the source of the problem.

  • I have tried increasing the ClientAliveInterval (to 2400 seconds) in the sshd configuration, but that did not solve the problem.

  • I have set the ssh keepalive in the nodeset, but that did not solve the problem (since the default is true, I wasn't really expecting this one to make a difference).

Acceptance Criteria

None

Activity

Liz Nemsick
July 23, 2019, 8:29 PM

Investigated beaker changes and believes the problem is in their new timeout logic.
He entered https://tickets.puppetlabs.com/browse/BKR-1605

Liz Nemsick
July 25, 2019, 4:58 PM

has a PR into beaker https://github.com/puppetlabs/beaker/pull/1594 that may help us solve the problem with custom ssh configuration for the nodeset.

Labels

Epic Link

Story Points

None

Components

Assignee

Trevor Vaughan

Priority

Medium