Time |
Nick |
Message |
06:01 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |
07:04 |
|
rjackson_isl joined #evergreen |
08:39 |
|
mmorgan joined #evergreen |
09:01 |
|
mantis1 joined #evergreen |
09:10 |
|
Dyrcona joined #evergreen |
09:20 |
|
rfrasur joined #evergreen |
10:01 |
|
sandbergja joined #evergreen |
10:05 |
Dyrcona |
So, I have a high confidence that error messages like the following are related to the action trigger crashes and events staying in a collected state: |
10:05 |
Dyrcona |
Caught error from 'run' method: Exception: OpenSRF::EX::JabberDisconnected 2019-12-26T06:26:48 OpenSRF::Application /usr/local/share/perl/5.22.1/OpenSRF/Application.pm:240 JabberDisconnected Exception: This JabberClient instance is no longer connected to the server |
10:08 |
Dyrcona |
I also think the failure of the action_trigger_runner.pl to exit and for additional events to end up collected has to do with open-ils.trigger's use of OpenSRF::MultiSession without a failure handler or possibly it not seeing the error at all. |
10:08 |
Dyrcona |
What I'm unsure of is how to propagate that error up to action_trigger_runner.pl so that it can exit immediately and limit the damage. |
10:13 |
|
stephengwills joined #evergreen |
10:15 |
berick |
Dyrcona: is this one of those A/T reactors that creates new A/T events inside the reactor? |
10:16 |
Dyrcona |
berick: No. It's the processing of event created by an A/T that creates events. |
10:16 |
Dyrcona |
So, maybe..... |
10:16 |
berick |
which reactor? |
10:17 |
Dyrcona |
Autorenewal Notice, id 125. |
10:17 |
Dyrcona |
Created by Autorenewal, id 124. |
10:17 |
berick |
ok, yeah, as feared |
10:18 |
berick |
this is consistent with what caused us problems with MarkItemLost |
10:18 |
berick |
essentially recursive A/T |
10:25 |
|
Dyrcona joined #evergreen |
10:25 |
* Dyrcona |
checks the logs. Fine time for my wifi to drop. |
10:27 |
Dyrcona |
Yeah, so some things I tried to say didn't go through, and I missed berick's reply.... |
10:27 |
berick |
Dyrcona: .. |
10:27 |
berick |
10:17 <+berick> ok, yeah, as feared |
10:27 |
berick |
10:18 <+berick> this is consistent with what caused us problems with MarkItemLost |
10:27 |
berick |
10:18 <+berick> essentially recursive A/T |
10:27 |
Dyrcona |
Yeah. I saw that in the logs. :) |
10:27 |
berick |
I think we have enough to open an LP anyway |
10:27 |
berick |
which I can do in a bit |
10:28 |
Dyrcona |
I'd really like to get to the root of the jabber disconnections. Is it timeout, stanza size, something else? |
10:28 |
Dyrcona |
That's not present in the logs, AFAICT. |
10:31 |
berick |
one thing that comes to mind.. both MarkItemLost and AutoRenew create secondary A/T events in a fire-and-forget manner. this could lead to a pile up of requests |
10:31 |
berick |
might be worth teaching them to wait for the response from their open-ils.trigger.event.autocreate calls |
10:34 |
pastebot |
"berick" at 168.25.130.30 pasted "AutoRenew.pm waits for event creation" (23 lines) at http://paste.evergreen-ils.org/10117 |
10:34 |
berick |
something like that ^- |
10:35 |
Dyrcona |
I can try that, but I've almost never had this happen on a test environment. |
10:37 |
mmorgan |
fwiw, I'm not seeing the same issues as Dyrcona, but the highest number of autorenewal events we've had in a day is just under 7000 |
10:38 |
Dyrcona |
mmorgan: Get that many on a slow day. :) |
10:42 |
Dyrcona |
Meh.... lineendings-- |
10:42 |
mmorgan |
Dyrcona: Do you see the same problem berick mentioned with MarkItemLost? |
10:44 |
Dyrcona |
No, not usually. I do see this problem with courtesy notices, which are not recursive, IIRC. |
10:44 |
Dyrcona |
I can get some numbers in a bit if anyone cares. |
10:45 |
Dyrcona |
It happens with PO JEDI occasionally, too, when there are some large number of line items. |
10:45 |
Dyrcona |
I have a script to fix PO JEDI issues. |
10:46 |
Dyrcona |
I think A/T is the wrong mechanism for some of these things. If it is something that should be generated regularly a plain old cron job should do it. |
10:47 |
Dyrcona |
I can see A/T for things triggered by user actions, like PO JEDI, for instance. |
10:48 |
Dyrcona |
I'm going to run a daily action trigger runner on a test vm with --verbose and --debug-stdout |
10:50 |
berick |
A/T jedi's been replaced, of course. /me still needs to make that migration |
10:50 |
Dyrcona |
Um, yeah, I guess it has. :) |
10:50 |
Dyrcona |
We were testing the replacement, but lost the configuration when I refreshed the training database. |
10:56 |
Dyrcona |
I imagine that this will take a few hours. |
10:56 |
Dyrcona |
I should test it again on Thursday. |
10:59 |
Dyrcona |
I'm running it with berick's patch, I suppose I should run it without and then with.... |
11:00 |
Dyrcona |
So far, though, nothing has blown up. :) |
11:01 |
Dyrcona |
Or, maybe it has. syslog hasn't changed in about 6 minutes. |
11:02 |
Dyrcona |
OIC: It's waiting on the query to find all of the items due to finish in the database. |
11:06 |
Dyrcona |
berick: I'll listen the Keiser Report while I wait. :P |
11:07 |
berick |
heh |
11:09 |
jeff |
as much as i dislike "changing the rules" on open circulations (editing columns like recurring_fine and max_fine), I feel that "removing all fines on materials for teens and children" is one of the better reasons to go through the hassle and resulting chaos. |
11:12 |
Dyrcona |
I probably wouldn't got that far. We'd change the rules and leave it at that. |
11:12 |
Dyrcona |
heh... got... go! |
11:12 |
jeff |
we're going that far, including wiping out outstanding fines previously billed. |
11:12 |
jeff |
baby steps. |
11:14 |
Dyrcona |
OK. Looks like a/t is doing more stuff... text scrolling by in my tmux window with tail -f. |
11:15 |
Dyrcona |
event.id_list, pause, a bunch of event.create.... |
11:15 |
Dyrcona |
probably other messages going by too fast to read. |
11:38 |
|
Christineb joined #evergreen |
11:47 |
Dyrcona |
action_trigger_runner.pl appears to be working on autorenewals, now. |
12:03 |
|
sandbergja joined #evergreen |
12:04 |
|
mmorgan left #evergreen |
12:27 |
Dyrcona |
I always forget that granularity-only doesn't apply to process_hooks and that granularity only gets set if granularity is set. |
12:42 |
Dyrcona |
OOh. I should have run Daily-PD-2 first. I've got 89,428 courtesy notices. |
12:43 |
Dyrcona |
It normally starts at 2:30 am and the daily granularity that does the autorenewals starts at 3:05 am. |
12:46 |
Dyrcona |
I misspoke about process hooks and granularity earlier. |
13:05 |
|
mantis1 left #evergreen |
13:11 |
Dyrcona |
I should check the events on production. |
13:12 |
|
khuckins joined #evergreen |
13:12 |
* Dyrcona |
is a bit slow in the brainbox today. |
13:18 |
Dyrcona |
So, it looks like 26,445 courtesy notices got stuck in 'collected' this morning. |
13:18 |
Dyrcona |
I'll update them to pending in production and see what happens in about 10 minutes. |
13:21 |
Dyrcona |
Looks production has about half as many as my test system. I suppose the difference are the ones that were checked in between midnight Sunday and this morning. |
13:21 |
Dyrcona |
Well, midnight Saturday to Sunday. |
13:22 |
Dyrcona |
Looks like the 10,694 autorenewal and notice events processed OK. |
13:41 |
Dyrcona |
Hmm. The generic run pending did not pick them up. |
14:28 |
Dyrcona |
So, running an action_trigger_runner with --run-pending and the granularity picked up the courtesy notices. |
14:28 |
Dyrcona |
I thought --run-pending with no granularity picked up everything. |
14:28 |
* Dyrcona |
checks the code, again. |
14:30 |
Dyrcona |
line 204 of action_trigger_runner.pl is interesting.... using => as a comma, which is all it really is when you look at Perl's C code. |
14:32 |
Dyrcona |
Hmm.... This seems what I remember from looking at the code before, specifiying no granularity with --run-pending looks for granularity = null. |
14:33 |
Dyrcona |
Ah, right... The autorenewal notice events have null granularity. |
14:34 |
Dyrcona |
So, there's an interplay going on here. The daily a/t runner creates the autorenewal notice events but our twice hourly run-pending runner runs them. Could it be a timing issue? |
14:35 |
Dyrcona |
Seems like I've had parts of this monologue before... |
14:44 |
|
khuckins joined #evergreen |
14:46 |
Dyrcona |
So, test for Thursday morning: start Daily-PD-2, then start daily, then run a generic run-pending every half hour or so. |
14:46 |
Dyrcona |
With --verbose and --debug-stdout on all three. |
16:01 |
|
sandbergja joined #evergreen |
16:38 |
|
book` joined #evergreen |
18:01 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |