Time |
Nick |
Message |
07:30 |
|
kworstell-isl joined #evergreen |
07:41 |
|
BDorsey joined #evergreen |
08:05 |
|
sharpsie joined #evergreen |
08:05 |
|
jweston joined #evergreen |
08:06 |
|
Rogan joined #evergreen |
08:13 |
|
kworstell_isl joined #evergreen |
08:15 |
|
kworstell_isl_ joined #evergreen |
08:16 |
|
kworstell-isl joined #evergreen |
08:36 |
|
mmorgan joined #evergreen |
08:46 |
|
kworstell-isl joined #evergreen |
09:07 |
|
rfrasur joined #evergreen |
09:13 |
|
Dyrcona joined #evergreen |
09:15 |
Dyrcona |
I thought that the errored autorenewals from the other day happened because of the 'no children available' message for open-ils.trigger, but we had 6 that errored this morning. Five of them could be renewed, so I reset them to pending status and ran the a/t trigger runner again.
09:17 |
Dyrcona |
The 5 events went to 'complete' after. There's no error output for any of the 6 events. |
09:18 |
Dyrcona |
We *only* had 16,388 auto-renewals last night. Some nights it is more. |
09:30 |
Dyrcona |
I guess I should have mentioned that we don't have the 'no children available' message in the logs for last night. |
09:33 |
Dyrcona |
Auto-renewals appear to be the only event that this happens to. |
09:40 |
Dyrcona |
For the last 6 months, only our autorenewal events have had a state of error. No other event has that state.
09:42 |
|
BDorsey joined #evergreen |
09:43 |
pinesol |
News from commits: LP#2030523 - OAI config - repository_name extra space <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=0df7343247f867543f07c62eba13e96767c5cf2e> |
09:51 |
Dyrcona |
The one event that I didn't update and rerun corresponds to an error message in the log. I'm going to check again when this happens tomorrow and see if any errored events look the same before I reprocess them. |
09:59 |
Dyrcona |
Ah ha! It looks like that one renewed, but blew up when trying to make the event for the autorenewal notification. The other five events didn't renew, so what happened with those remains a mystery for now. |
10:06 |
Dyrcona |
It looks like we've got drones exiting unceremoniously, i.e. without the listener knowing and while another process is trying to use it still. |
10:12 |
Dyrcona |
I think we need to stop using Perl on the back end. |
10:13 |
pinesol |
News from commits: LP 1917083: Add SSO support to BPAC <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=4e6bd9d7e9d8d608842ccef131a09d30b128b146> |
10:20 |
|
dguarrac joined #evergreen |
10:25 |
* Dyrcona |
wonders if there could be a bug that only affects trigger drones, or maybe we only see it with trigger drones because they're processing 15k to 20k events "at a time?" |
10:34 |
Bmagic |
Dyrcona: yep, there is something "wrong" with Evergreen action triggers, but there are many variables, and it seems to matter only at large volumes. Troubleshooting is tough when it's only affecting a small number of Evergreen consortia.
10:35 |
Bmagic |
Each trigger definition is a special snowflake. The grouping is situational, etc. |
10:42 |
Dyrcona |
Bmagic: I've been dealing with it for years, but it seems to just be auto-renewals for now. If I focus on those events and the logs around those times, maybe we can crack this nut?
10:43 |
Bmagic |
Agreed on autorenewals. I took all of the action triggers that reacted with autorenewals and split each into its own granularity: autorenew-0, autorenew-1, etc. Then I set up a bash script to loop over $i for each number and execute action_trigger_runner for each granularity. And I set up a VM that only does that.
10:44 |
Bmagic |
That pretty much solved it |
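A minimal sketch of the per-granularity loop described above, written in Python rather than bash. It assumes the autorenewal event definitions have already been given hypothetical granularities named autorenew-0, autorenew-1, and so on, and that action_trigger_runner.pl accepts the --process-hooks, --run-pending and --granularity options used here; check the runner's own help output on your installation before relying on them. By default it only prints the commands.

```python
"""Hypothetical sketch: run action_trigger_runner.pl once per autorenewal granularity.

Assumptions (not confirmed in this log): granularities are named <prefix>-0 .. <prefix>-N,
and the runner accepts --process-hooks, --run-pending and --granularity=<name>.
"""
import subprocess
from typing import List

RUNNER = "action_trigger_runner.pl"  # assumed to be on PATH


def runner_commands(prefix: str, count: int) -> List[List[str]]:
    """Build one runner invocation per granularity, e.g. autorenew-0, autorenew-1, ..."""
    return [
        [RUNNER, "--process-hooks", "--run-pending", f"--granularity={prefix}-{i}"]
        for i in range(count)
    ]


def run_sequentially(commands: List[List[str]], dry_run: bool = True) -> None:
    """Run each chunk to completion before starting the next one."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # show what would be executed
        else:
            subprocess.run(cmd, check=True)  # stop if a chunk fails


if __name__ == "__main__":
    run_sequentially(runner_commands("autorenew", 3))
```

Running the chunks one after another limits how many events compete for open-ils.trigger drones at any moment, at the cost of a longer total run.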
10:44 |
|
mantis1 joined #evergreen |
10:50 |
Dyrcona |
I'm not sure that I follow that. There's really only the one event, unless you've customized the event output for different branches. |
10:51 |
|
BrianK joined #evergreen |
10:52 |
Dyrcona |
Anyway, looking at the log from early this morning, I can see that the event that renewed blew up trying to make the autorenew notify event, then the other 5 events were just set to 'error' state by the same trigger process. There don't appear to be any other intervening calls to other services.
10:52 |
Bmagic |
if there's only one trigger definition, and that one definition launches into tens of thousands events, then that's different. I'm talking about branches/systems in a large consortium, each with their own autorenewal definition |
10:52 |
Dyrcona |
OK. We just have the one, and that's what I thought you meant. |
10:53 |
Bmagic |
My solution was generally: split it up into smaller chunks. Plus split those triggers away from the "general" utility server. So that the autorenewal triggers get the full opensrf drone stack all to itself |
10:54 |
Dyrcona |
We used to run 2 utility servers with different things running on each, but we've pretty much always had trouble with autorenewal regardless of what we do. |
10:55 |
Bmagic |
There is something* to the idea of a single trigger event throwing an error, and causing "the rest" of the triggers to fail. At least the rest of the triggers that were queued for that PID. Or something along those lines. If that one event hadn't been in the queue, the theory is that the other triggers would haven't had an issue finishing |
10:56 |
Bmagic |
would haven't had/wouldn't have had |
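A hypothetical sketch of the failure pattern in this theory; it is not Evergreen or OpenSRF code. It assumes a worker receives a batch of events and records a state for each. In the fragile version, one unhandled exception aborts the loop and every event still in that batch ends up in 'error', which resembles the one-failure-then-five-errors pattern seen in the log. In the isolated version, each event gets its own try/except, so only the event that actually failed is marked 'error'.

```python
"""Toy model (hypothetical, not Evergreen code) of batch failure vs per-event isolation."""
from typing import Callable, Dict, List


def run_batch_fragile(events: List[int], react: Callable[[int], None]) -> Dict[int, str]:
    """One exception aborts the batch; every unfinished event is marked 'error'."""
    states = {e: "collected" for e in events}
    try:
        for e in events:
            react(e)
            states[e] = "complete"
    except Exception:
        for e, state in states.items():
            if state != "complete":
                states[e] = "error"
    return states


def run_batch_isolated(events: List[int], react: Callable[[int], None]) -> Dict[int, str]:
    """Each event is handled on its own; a failure only affects that event."""
    states: Dict[int, str] = {}
    for e in events:
        try:
            react(e)
            states[e] = "complete"
        except Exception:
            states[e] = "error"
    return states
```

For example, with six events where only the first one raises, the fragile runner reports all six as 'error', while the isolated runner reports one 'error' and five 'complete'.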
10:58 |
Dyrcona |
Yeah. The code for that is probably in OpenSRF. |
10:59 |
berick |
we made a number of autorenew changes locally, but one easy one that might help w/ general A/T stability is something like this: https://gist.github.com/berick/1fff8458c9ca66ba6ac54609625b4da2 |
11:00 |
berick |
in short, change the "fire and forget" event call (toward the end of the reactor) into a "fire and wait" call |
11:00 |
berick |
the fire-and-forget approach can potentially lead to stuff piling up in A/T in a possibly unhelpful way |
11:02 |
berick |
I did the same with MarkItemLost.pm and IIRC it helped -- it's been a while |
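A toy model, not OpenSRF code, of why "fire and forget" can pile work up while "fire and wait" does not. It assumes each event sends one follow-up request (such as creating the notification event) and that requests queue at the service until handled. It models the worst case, where the reactor outpaces the service completely; a real service drains its queue concurrently, so real backlogs would be smaller. Either way, the total number of requests served is the same, which matches the point that the overall work levels out.

```python
"""Toy model: peak backlog under fire-and-forget vs fire-and-wait (hypothetical)."""
from collections import deque
from typing import Deque, Iterable, Tuple


class ToyService:
    """Stands in for a request-handling service; requests wait in a queue until served."""

    def __init__(self) -> None:
        self.pending: Deque[int] = deque()
        self.peak_outstanding = 0  # most requests waiting at any one moment
        self.served = 0            # total requests handled

    def submit(self, event_id: int) -> None:
        self.pending.append(event_id)
        self.peak_outstanding = max(self.peak_outstanding, len(self.pending))

    def serve_one(self) -> None:
        self.pending.popleft()
        self.served += 1


def react(event_ids: Iterable[int], wait: bool) -> Tuple[int, int]:
    """Process events that each send one follow-up request.

    Returns (peak outstanding requests, total requests served).
    """
    service = ToyService()
    for event_id in event_ids:
        service.submit(event_id)  # the follow-up request
        if wait:
            service.serve_one()   # fire and wait: block until it is handled
    while service.pending:        # drain anything not waited for (fire-and-forget backlog)
        service.serve_one()
    return service.peak_outstanding, service.served


if __name__ == "__main__":
    print("fire and forget:", react(range(1000), wait=False))
    print("fire and wait:", react(range(1000), wait=True))
```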
11:02 |
Bmagic |
berick++ |
11:02 |
|
Christineb joined #evergreen |
11:03 |
Dyrcona |
berick: Thanks! We might give that a try. It will make the events run a bit longer, though, won't it? |
11:04 |
berick |
on the individual event level, maybe milliseconds, but overall it's the same amount of work being done on the machine, so it pretty much levels out
11:04 |
Dyrcona |
berick++ |
11:05 |
berick |
if i'm wrong about taking longer, let me know. it didn't have any impact here. |
11:05 |
berick |
but I didn't time it |
11:10 |
Dyrcona |
If it is on the order of milliseconds, it won't make much difference. We can always try it to see what happens. |
11:45 |
|
jihpringle joined #evergreen |
12:25 |
Dyrcona |
Hmm.. That patch needs to have AppUtil added. |
12:26 |
Dyrcona |
Well, not exactly that, but $U isn't set, at least not in 3.7. |
12:31 |
|
jvwoolf joined #evergreen |
12:42 |
Dyrcona |
Interesting... It doesn't seem to overwhelm the trigger drones this way, either. |
12:42 |
Dyrcona |
open-ils.trigger [38390] uptime=09:22 cputime=00:00:00 #drones=2/20 10% |
12:45 |
Dyrcona |
It may not have hit autorenewal yet. Looks like it is working on a different daily granularity event right now. |
12:46 |
Dyrcona |
OK. It has hit the autorenewals; there's one in state 'collecting', and it looks like it has jumped to 5 trigger drones running.
12:48 |
Dyrcona |
Think I'll check it again in about half an hour. |
13:01 |
|
jihpringle joined #evergreen |
13:43 |
Dyrcona |
There are 13 open-ils.trigger processes running now, and it's churning through the autorenewal events: 2,1481 complete, 5 reacting, and 18,178 collected. There are 2,485 notifications pending. |
13:44 |
Dyrcona |
It also handled 291 of the 7-day expiration notices we run as part of the daily granularity. |
13:47 |
|
BDorsey joined #evergreen |
13:48 |
Dyrcona |
I'm thinking that we don't need the $ses variable connected to open-ils.trigger with berick's patch. |
13:48 |
Dyrcona |
It's not used any longer. |
13:51 |
Dyrcona |
I'll play with this some more and open a bug later. I think this ought to be the standard behavior and not just a local fix. |
14:15 |
|
stompro joined #evergreen |
14:41 |
stompro |
jeff, Did you have a library that developed bug #1745623? You mentioned the possibility in 2018, just curious if that happened? |
14:41 |
pinesol |
Launchpad bug 1745623 in Evergreen "wishlist circ history - notify staff at checkout if item is in users circ history" [Wishlist,In progress] https://launchpad.net/bugs/1745623 - Assigned to Jeff Godin (jgodin) |
15:30 |
|
mantis1 left #evergreen |
15:32 |
jeff |
yes, that's something we have a rough implementation of. |
15:32 |
jeff |
"rough" as in it works, but could use more polish, especially around staff ability to turn on/off with patron notification, etc. |
15:33 |
jeff |
we also use an external report for the "allow staff to view my list of items out" functionality. |
15:34 |
jeff |
the "in time for 3.2" was clearly optimistic. |
15:35 |
jeff |
for a small library that has a number of patrons who rely on the ability to hand circ staff a pile of books and say "just the ones I haven't already read", it does the trick. |
15:35 |
jeff |
it also has some benefit for outreach "don't bring this homebound patron the same thing over and over", though it is by no means a full set of features for outreach. |
15:36 |
jeff |
I thought I saw some recent comments on that bug. |
15:37 |
jeff |
ah, yes. one. |
15:55 |
Dyrcona |
Hmm. According to a commit message from 2019 the open-ils.circ.renew.auto API was supposed to be removed in Evergreen 3.5. I bet it wasn't. |
15:58 |
jeffdavis |
It was! You wrote the commit that removed it! bug 1856868 |
15:58 |
pinesol |
Launchpad bug 1856868 in Evergreen "Remove deprecated open-ils.circ.renew.auto API" [Low,Fix released] https://launchpad.net/bugs/1856868 |
15:59 |
jeffdavis |
not until 3.8 though, looks like |
15:59 |
Dyrcona |
jeffdavis++ # I just found it in the history. |
16:00 |
Dyrcona |
I was looking at a different file's log earlier. |
16:02 |
jeffdavis |
it's always nice to see deprecated code actually get removed |
16:04 |
Dyrcona |
In this case it was something that was never really useful since it took a different approach from the normal way that the type of renewal is specified. |
16:06 |
|
jihpringle joined #evergreen |
16:28 |
stompro |
jeff, I would love to take a look at the code if you ever want to just submit what you have. |
16:29 |
stompro |
I could try and work on the missing bits. |
16:33 |
Dyrcona |
berick | Bmagic: Lp 2030915 if either of you would like to confirm it, add additional comments, or correct mistakes in the description. |
16:33 |
pinesol |
Launchpad bug 2030915 in Evergreen 3.11 "Autorenewal Can Overwhelm open-ils.trigger Service Drones" [Undecided,New] https://launchpad.net/bugs/2030915 |
16:33 |
berick |
Dyrcona++ |
16:33 |
Bmagic |
Dyrcona++ |
16:33 |
Dyrcona |
I plan to add a branch based on a cleaned version of berick's patch in the morning. |
16:34 |
Dyrcona |
Bmagic++ berick++ |
17:01 |
Bmagic |
Dyrcona++ # will do |
17:07 |
|
mmorgan left #evergreen |
20:54 |
|
stompro joined #evergreen |