Post by Neil Williams
On Mon, 25 Jul 2016 12:56:13 +0200
Post by Sjoerd Simons
Post by Neil Williams
On Mon, 25 Jul 2016 09:29:28 +0200
On Sun, 2016-07-24 at 14:29 +0100, Neil Williams wrote:
Post by Neil Williams
On Sun, 24 Jul 2016 00:21:34 +0200
Maybe I'm just too lazy, but I like telling LAVA to just go and boot a
board for me with a rootfs of choice, so that I can log in and do
whatever needs to be done without having to resort to setting things
up by hand.
Why do you need a rootfs in the first place?
With LAVA V2, the only software needed on the board is the bootloader -
with the exception of devices supporting primary connections. There is
nothing that needs to be done in a rootfs for a V2 device.
Yes, V2 makes a lot of those tasks redundant, which is lovely.
What I do use it for, and which I think still applies to V2, is:
* Upgrading the bootloader
* For some boards, flashing peripheral firmware
* Diagnosing issues pointed out by health checks
* Diagnosing issues with particular boards exposed by tests.
In a perfect world health checks would be able to check everything; in
practice we don't have a crystal ball. In those cases it's been very
helpful to go and diagnose by hand what's going on, to potentially
improve the health checks or general infrastructure.
Post by Neil Williams
Post by Sjoerd Simons
Post by Neil Williams
No need to wait for the hacking session to be scheduled (another job
could always get in first, even at high priority a health check takes
precedence or there could be another high priority job already in the
queue).
In my experience health checks don't happen often enough to be
problematic for this.
That's configurable. In a lab running 1,000 jobs a day it is routine.
Of course it's routine on the lab as a whole. I'm just talking from
experience: health checks don't happen often enough on any particular
board for me to be blocked by them when I need things done on that
board (and if they do get in the way, it's a good excuse to get a hot
beverage).
Post by Neil Williams
Post by Sjoerd Simons
For the other aspects, simply restricting submission to the device
works well (which, depending on what gets done, is a good choice
anyway).
Post by Sjoerd Simons
Though a maintenance priority/type of job that runs even if the device
is currently offline and trumps all other priorities would be really
nice for this kind of thing. Though I bet you disagree on this
aspect :)
Only a forced health check must ever run on a device which is
offline.
Health checks always take precedence over any priority settings.
Offline is a maintenance mode, especially for admins. That is the only
purpose of having an offline status. Offline means that the device is
currently unusable - it could be disconnected, bricked etc. It is up to
the admin to be confident that it is safe to run a health check. There
is also looping mode for repeating such tests.
Yes, what I'm saying is that as an admin who has determined all those
things, I'd like to run a job which is not a health check to do
maintenance work on the board. Potentially to fix the reason why the
health check failed in the first place :)
Post by Neil Williams
We're updating the docs on health checks - stressing that a health
check needs to test every type of action supported by the device type
(except a hacking session as it still needs to be fully automated). The
health check still needs to be quick but it also needs to be
thorough.
Yes, that's what our health checks do, but as said there is always a
tradeoff. (We've seen funky issues where for some reason an SD card
broke in a way that made accessing particular areas very, very slow
(probably firmware retrying reads) while other areas were entirely
fine. That's not really something a health check can verify and still
be quick.)
Post by Neil Williams
Post by Sjoerd Simons
Post by Neil Williams
Just because hacking sessions log in a user as root, does *not* mean
that this is a workable solution for administration - that confuses
the issues. TestJobs, like hacking sessions, need to be ephemeral in
terms of storage - that way admins can trust that users can't actually
undo the admin setup just by using a hacking session themselves.
Given that a hacking session gives you root by definition, folks can
do whatever they like on a board. Nothing is stopping someone in a
hacking session from e.g. reflashing the bootloader :)
Exactly - it is up to the admins to sanction such users as that causes
work for the admin. It depends on the device - with a device with
sufficient support, the bootloader can be safely replaced by the
testjob so it would not be a problem.
Yes, that's what I've done in some cases. But even then I tend to be
careful and switch boards one by one to prevent unexpected
side-effects/issues.
But even if you assume it's safe enough to just run it as a normal test
job with no other precautions, how would I ensure that each board has
its bootloader replaced, apart from scheduling one job per specific
board?
Post by Neil Williams
I don't see what operations are needed in V2 that can be done inside a
hacking session, except possibly updating the U-Boot uEnv.txt but that's
possible to do from the bootloader shell as well.
Unfortunately not everything is U-Boot, but yes, for changing the U-Boot
environment you of course don't need to go fully into a rootfs.
--
Sjoerd Simons
Collabora Ltd.