Discussion:
[Linaro-validation] device tags and target device in V2 jobs
Sjoerd Simons
2016-07-23 22:21:34 UTC
Hey all,

We've started experimenting with LAVA V2 jobs in our LAVA lab. Thus far
things are going well, but it seems there is no way to specify device
tags or a specific target device in V2 yet. Is support for those
something that's still in the pipeline :) ?
--
Sjoerd Simons
Collabora Ltd.
Neil Williams
2016-07-24 13:29:58 UTC
On Sun, 24 Jul 2016 00:21:34 +0200
Post by Sjoerd Simons
Hey all,
We've started experimenting with LAVA V2 jobs in our LAVA lab. Thus far
things are going well, but it seems there is no way to specify device
tags or a specific target device in V2 yet. Is support for those
something that's still in the pipeline :) ?
Device tag support is a missing element from the V2 job submission
schema. (It has support in the unit tests.) I'll get this into review
next week and then into 2016.8, with some docs too.

There is no support for submitting to specific target devices as this
impedes both scheduling and lab management when needing to retire
broken hardware.
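
To illustrate how device tags can narrow scheduling without naming a single board, here is a minimal sketch in Python. It assumes a job may only run on a device that carries every tag the job asks for. The `Device` class, the `eligible_devices` helper, the hostnames and the tag names are all hypothetical and are not LAVA's actual scheduler code.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for a lab device; not a LAVA class."""
    hostname: str
    device_type: str
    tags: set = field(default_factory=set)
    online: bool = True


def eligible_devices(devices, device_type, job_tags):
    """Return online devices of the right type that carry every requested tag."""
    wanted = set(job_tags)
    return [
        d for d in devices
        if d.online and d.device_type == device_type and wanted <= d.tags
    ]


# Made-up lab inventory for illustration only.
lab = [
    Device("panda01", "panda", {"usb-stick"}),
    Device("panda02", "panda", {"usb-stick", "sata"}),
    Device("panda03", "panda", {"sata"}, online=False),
]

matches = eligible_devices(lab, "panda", ["sata"])
print([d.hostname for d in matches])
```

Because the job names a capability rather than a hostname, retiring a broken board only shrinks the pool of matches; it does not strand jobs that were written against that one device.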
--
Neil Williams
=============
http://www.linux.codehelp.co.uk/
Sjoerd Simons
2016-07-25 07:29:28 UTC
Post by Neil Williams
On Sun, 24 Jul 2016 00:21:34 +0200
Post by Sjoerd Simons
Hey all,
We've started experimenting with LAVA V2 jobs in our LAVA lab. Thus far
things are going well, but it seems there is no way to specify device
tags or a specific target device in V2 yet. Is support for those
something that's still in the pipeline :) ?
Device tag support is a missing element from the V2 job submission
schema. (It has support in the unit tests.) I'll get this into review
next week and then into 2016.8, with some docs too.
Great thanks!
Post by Neil Williams
There is no support for submitting to specific target devices as this
impedes both scheduling and lab management when needing to retire
broken hardware.
Hmm, that's true. FWIW, what I tend to (ab)use submitting to specific
target devices for is mostly hacking sessions and the like, when I need
to do some maintenance or other work that really needs one specific
target device, rather than for any regular jobs. It would be nice to
cover that use case somehow.
--
Sjoerd Simons
Collabora Ltd.
Neil Williams
2016-07-25 09:53:42 UTC
On Mon, 25 Jul 2016 09:29:28 +0200
Post by Sjoerd Simons
Post by Neil Williams
On Sun, 24 Jul 2016 00:21:34 +0200
Post by Sjoerd Simons
Hey all,
We've started experimenting with LAVA V2 jobs in our LAVA lab. Thus far
things are going well, but it seems there is no way to specify device
tags or a specific target device in V2 yet. Is support for those
something that's still in the pipeline :) ?
Device tag support is a missing element from the V2 job submission
schema. (It has support in the unit tests.) I'll get this into
review next week and then into 2016.8, with some docs too.
Great thanks!
Post by Neil Williams
There is no support for submitting to specific target devices as
this impedes both scheduling and lab management when needing to
retire broken hardware.
Hmm, that's true. FWIW, what I tend to (ab)use submitting to specific
target devices for is mostly hacking sessions and the like, when I need
to do some maintenance or other work that really needs one specific
target device, rather than for any regular jobs. It would be nice to
cover that use case somehow.
Hacking sessions are for users though. As an admin, you already have
direct access to the device. This was one of the reasons why V1 had all
the device configuration on the dispatcher, so that local scripts could
parse out the connection_command and power_on_cmd to get a way to get
onto the device whilst it was Offline. (This is why we have maintenance
mode on a per-device level.)
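
As a rough illustration of the kind of local script meant here, the following Python sketch pulls `connection_command` and `power_on_cmd` out of a device configuration. It assumes the configuration is plain `key = value` text; the sample content, the port number and the PDU command are made up, and a real device file may be laid out differently.

```python
def parse_device_conf(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Made-up example content; real device files will differ.
SAMPLE = """
# hypothetical device configuration
device_type = panda
connection_command = telnet localhost 7001
power_on_cmd = /usr/local/bin/pdu --port 3 --on
"""

conf = parse_device_conf(SAMPLE)
print(conf["connection_command"])
print(conf["power_on_cmd"])
```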

With V2, that information is available directly from the UI, so all the
admin needs to do is take the device offline, ssh onto the dispatcher
and have a web browser looking at the device detail page. No need to
wait for the hacking session to be scheduled (another job could always
get in first: even at high priority, a health check takes precedence,
or there could be another high-priority job already in the queue).

Just because hacking sessions log in a user as root, does *not* mean
that this is a workable solution for administration - that confuses the
issues. TestJobs, like hacking sessions, need to be ephemeral in terms
of storage - that way admins can trust that users can't actually undo
the admin setup just by using a hacking session themselves.
--
Neil Williams
=============
http://www.linux.codehelp.co.uk/
Sjoerd Simons
2016-07-25 10:56:13 UTC
Post by Neil Williams
On Mon, 25 Jul 2016 09:29:28 +0200
Post by Sjoerd Simons
Post by Neil Williams
On Sun, 24 Jul 2016 00:21:34 +0200
 
There is no support for submitting to specific target devices as
this impedes both scheduling and lab management when needing to
retire broken hardware.  
Hmm, that's true. FWIW, what I tend to (ab)use submitting to specific
target devices for is mostly hacking sessions and the like, when I need
to do some maintenance or other work that really needs one specific
target device, rather than for any regular jobs. It would be nice to
cover that use case somehow.
Hacking sessions are for users though. As an admin, you already have
direct access to the device. This was one of the reasons why V1 had all
the device configuration on the dispatcher, so that local scripts could
parse out the connection_command and power_on_cmd to get a way to get
onto the device whilst it was Offline. (This is why we have maintenance
mode on a per-device level.)
With V2, that information is available directly from the UI, so all the
admin needs to do is take the device offline, ssh onto the dispatcher
and have a web browser looking at the device detail page.
But that's basically doing by hand things that LAVA can already do for
you.

Maybe I'm just too lazy, but I like telling LAVA to just go and boot a
board for me with a rootfs of my choice, so that I can log in and do
whatever needs to be done without having to set things up by hand.
Post by Neil Williams
No need to wait
for the hacking session to be scheduled (another job could always get
in first, even at high priority a health check takes precedence or
there could be another high priority job already in the queue).
In my experience, health checks don't happen often enough to be
problematic for this. For the other aspects, simply restricting
submission to the device works well (which, depending on what gets
done, is a good choice anyway).

A maintenance priority or type of job that runs even if the device is
currently offline and trumps all other priorities would be really nice
for these kinds of things, though. I bet you disagree on this aspect
:)
Post by Neil Williams
Just because hacking sessions log in a user as root, does *not* mean
that this is a workable solution for administration - that confuses the
issues. TestJobs, like hacking sessions, need to be ephemeral in terms
of storage - that way admins can trust that users can't actually undo
the admin setup just by using a hacking session themselves.
Given that a hacking session gives you root by definition, folks can do
whatever they like on a board. Nothing stops someone in a hacking
session from, e.g., reflashing the bootloader :)
--
Sjoerd Simons
Collabora Ltd.
Neil Williams
2016-07-25 18:11:50 UTC
On Mon, 25 Jul 2016 12:56:13 +0200
Post by Sjoerd Simons
Post by Neil Williams
On Mon, 25 Jul 2016 09:29:28 +0200
Post by Sjoerd Simons
Post by Neil Williams
On Sun, 24 Jul 2016 00:21:34 +0200
 
There is no support for submitting to specific target devices as
this impedes both scheduling and lab management when needing to
retire broken hardware.  
Hmm, that's true. FWIW, what I tend to (ab)use submitting to specific
target devices for is mostly hacking sessions and the like, when I need
to do some maintenance or other work that really needs one specific
target device, rather than for any regular jobs. It would be nice to
cover that use case somehow.
Hacking sessions are for users though. As an admin, you already have
direct access to the device. This was one of the reasons why V1 had all
the device configuration on the dispatcher, so that local scripts could
parse out the connection_command and power_on_cmd to get a way to
get onto the device whilst it was Offline. (This is why we have
maintenance mode on a per-device level.)
With V2, that information is available directly from the UI, so all the
admin needs to do is take the device offline, ssh onto the dispatcher
and have a web browser looking at the device detail page.
But that's basically doing by hand things that LAVA can already do for
you.
Maybe I'm just too lazy, but I like telling LAVA to just go and boot a
board for me with a rootfs of my choice, so that I can log in and do
whatever needs to be done without having to set things up by hand.
Why do you need a rootfs in the first place?

With LAVA V2, the only software needed on the board is the bootloader -
with the exception of devices supporting primary connections. There is
nothing that needs to be done in a rootfs for a V2 device.
Post by Sjoerd Simons
Post by Neil Williams
No need to wait
for the hacking session to be scheduled (another job could always
get in first, even at high priority a health check takes precedence
or there could be another high priority job already in the queue).
In my experience health checks don't happen often enough to be
problematic for this.
That's configurable. In a lab running 1,000 jobs a day it is routine.
Post by Sjoerd Simons
For the other aspects, simply restricting
submission to the device works well (which, depending on what gets
done, is a good choice anyway).
A maintenance priority or type of job that runs even if the device is
currently offline and trumps all other priorities would be really nice
for these kinds of things, though. I bet you disagree on this
aspect :)
Only a forced health check must ever run on a device which is offline.
Health checks always take precedence over any priority settings.

Offline is a maintenance mode, especially for admins. That is the only
purpose of having an offline status. Offline means that the device is
currently unusable - it could be disconnected, bricked etc. It is up to
the admin to be confident that it is safe to run a health check. There
is also looping mode for repeating such tests.
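
The ordering described above can be sketched in a few lines of Python. This is only an illustration of the stated rules (an offline device runs nothing except a forced health check, and a health check beats any job priority); the `pick_next` function, the job dictionaries and the priority names are hypothetical and are not the LAVA scheduler.

```python
# Lower number means scheduled earlier; names are assumptions for this sketch.
PRIORITY = {"high": 0, "medium": 1, "low": 2}


def pick_next(device_online, queue, forced_health_check=False):
    """Pick the name of the next job for one device, or None if nothing may run.

    queue is a list of dicts with 'name', 'priority' and 'health_check'
    keys, in submission order.
    """
    if forced_health_check:
        # An admin-forced health check is the only thing allowed while offline.
        return "health-check (forced)"
    if not device_online or not queue:
        return None
    # Health checks first, then priority, then submission order.
    _, job = min(
        enumerate(queue),
        key=lambda item: (
            not item[1]["health_check"],
            PRIORITY[item[1]["priority"]],
            item[0],
        ),
    )
    return job["name"]


queue = [
    {"name": "hacking-session", "priority": "high", "health_check": False},
    {"name": "health-check", "priority": "medium", "health_check": True},
    {"name": "nightly", "priority": "low", "health_check": False},
]

print(pick_next(True, queue))
print(pick_next(False, queue))
print(pick_next(False, queue, forced_health_check=True))
```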

We're updating the docs on health checks - stressing that a health
check needs to test every type of action supported by the device type
(except a hacking session as it still needs to be fully automated). The
health check still needs to be quick but it also needs to be thorough.
Post by Sjoerd Simons
Post by Neil Williams
Just because hacking sessions log in a user as root, does *not* mean
that this is a workable solution for administration - that confuses the
issues. TestJobs, like hacking sessions, need to be ephemeral in terms
of storage - that way admins can trust that users can't actually
undo the admin setup just by using a hacking session themselves.
Given that a hacking session gives you root by definition, folks can do
whatever they like on a board. Nothing stops someone in a hacking
session from, e.g., reflashing the bootloader :)
Exactly - it is up to the admins to sanction such users as that causes
work for the admin. It depends on the device - with a device with
sufficient support, the bootloader can be safely replaced by the
testjob so it would not be a problem.

I don't see what operations are needed in V2 that can be done inside a
hacking session, except possibly updating the UBoot uEnv.txt but that's
possible to do from the bootloader shell as well.
--
Neil Williams
=============
http://www.linux.codehelp.co.uk/
Sjoerd Simons
2016-07-26 07:28:56 UTC
Post by Neil Williams
On Mon, 25 Jul 2016 12:56:13 +0200
Post by Sjoerd Simons
Post by Neil Williams
On Mon, 25 Jul 2016 09:29:28 +0200
On Sun, 2016-07-24 at 14:29 +0100, Neil Williams wrote:  
Post by Neil Williams
On Sun, 24 Jul 2016 00:21:34 +0200
 
Maybe I'm just too lazy, but I like telling LAVA to just go and boot a
board for me with a rootfs of my choice, so that I can log in and do
whatever needs to be done without having to set things up by hand.
Why do you need a rootfs in the first place?
With LAVA V2, the only software needed on the board is the bootloader -
with the exception of devices supporting primary connections. There is
nothing that needs to be done in a rootfs for a V2 device.
Yes, V2 makes a lot of those tasks redundant, which is lovely.

What I do use it for, which I think still applies to V2, is:
* Upgrading the bootloader
* For some boards, flashing peripheral firmware
* Diagnosing issues pointed out by health checks
* Diagnosing issues with particular boards exposed by tests

In a perfect world health checks would be able to check everything; in
practice we don't have a crystal ball. In those cases it's been very
helpful to go and diagnose by hand what's going on, to potentially
improve the health checks or general infrastructure.
Post by Neil Williams
Post by Sjoerd Simons
Post by Neil Williams
No need to wait
for the hacking session to be scheduled (another job could always
get in first, even at high priority a health check takes precedence
or there could be another high priority job already in the queue).
In my experience health checks don't happen often enough to be
problematic for this.
That's configurable. In a lab running 1,000 jobs a day it is routine.
Of course it's routine in the lab as a whole. I'm just speaking from
experience: health checks don't happen often enough on any particular
board for me to be blocked by them when I need things done on that
board (and if they do get in the way, it's a good excuse to get a hot
beverage).
Post by Neil Williams
Post by Sjoerd Simons
For the other aspects, simply restricting
submission to the device works well (which, depending on what gets
done, is a good choice anyway).
A maintenance priority or type of job that runs even if the device is
currently offline and trumps all other priorities would be really nice
for these kinds of things, though. I bet you disagree on this
aspect :)
Only a forced health check must ever run on a device which is offline.
Health checks always take precedence over any priority settings.
Offline is a maintenance mode, especially for admins. That is the only
purpose of having an offline status. Offline means that the device is
currently unusable - it could be disconnected, bricked etc. It is up to
the admin to be confident that it is safe to run a health check. There
is also looping mode for repeating such tests.
Yes, what I'm saying is that, as an admin who has determined all those
things, I'd like to run a job which is not a health check to do
maintenance work on the board. Potentially to fix the reason why the
health check failed in the first place :)
Post by Neil Williams
We're updating the docs on health checks - stressing that a health
check needs to test every type of action supported by the device type
(except a hacking session as it still needs to be fully automated). The
health check still needs to be quick but it also needs to be thorough.
Yes, that's what our health checks do, but as I said there is always a
trade-off. (We've seen funky issues where, for some reason, an SD card
broke in a way that made accessing particular areas very, very slow
(probably the firmware retrying reads) while other areas were entirely
fine. That's not really something a health check can verify while still
being quick.)
Post by Neil Williams
Post by Sjoerd Simons
Post by Neil Williams
Just because hacking sessions log in a user as root, does *not* mean
that this is a workable solution for administration - that confuses the
issues. TestJobs, like hacking sessions, need to be ephemeral in terms
of storage - that way admins can trust that users can't actually
undo the admin setup just by using a hacking session themselves.
Given that a hacking session gives you root by definition, folks can do
whatever they like on a board. Nothing stops someone in a hacking
session from, e.g., reflashing the bootloader :)
Exactly - it is up to the admins to sanction such users as that causes
work for the admin. It depends on the device - with a device with
sufficient support, the bootloader can be safely replaced by the
testjob so it would not be a problem.
Yes, that's what I've done in some cases. But even then I tend to be
careful and switch boards one by one to prevent unexpected side
effects or issues.

But even if you assume it's safe enough to just run it as a normal test
job with no other precautions, how would I ensure that each board has
its bootloader replaced, apart from scheduling one job per specific
board?
Post by Neil Williams
I don't see what operations are needed in V2 that can be done inside a
hacking session, except possibly updating the UBoot uEnv.txt but that's
possible to do from the bootloader shell as well.
Unfortunately not everything is U-Boot, but yes, for changing the U-Boot
environment of course you don't need to go all the way into a rootfs.

-- 
Sjoerd Simons
Collabora Ltd.
