Time | Nick | Message
06:00 | pinesol | News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
07:20 | | rjackson_isl_hom joined #evergreen
07:27 | | Dyrcona joined #evergreen
07:48 | | rfrasur joined #evergreen
08:31 | | mmorgan joined #evergreen
08:39 | | mantis1 joined #evergreen
08:47 | | jvwoolf joined #evergreen
08:54 | | jvwoolf1 joined #evergreen
09:45 | | jvwoolf joined #evergreen
09:55 | | mantis1 left #evergreen
09:57 | | dbwells_ joined #evergreen
09:59 | | mantis1 joined #evergreen
10:31 | | dbwells joined #evergreen
10:35 | | rfrasur joined #evergreen
10:36 | | sandbergja joined #evergreen
10:54 | pinesol | [evergreen|Bill Erickson] LP1837656 Org proximity admin disable org filter - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=07971c7>
11:13 | | berick joined #evergreen
12:04 | | jihpringle joined #evergreen
12:09 | mmorgan | I'm trying to change a best hold selection sort order on a test system and it's not working.
12:10 | mmorgan | We were using a sort order which always sent holds home, and I just changed it to Traditional, and it's still sending holds home.
12:11 | mmorgan | I've restarted services and run autogen; am I missing something? Traditional should capture a hold where an item lands, right?
12:22 | | mantis2 joined #evergreen
12:22 | Dyrcona | mmorgan: Have you run the hold targeter since making the change?
12:23 | | sandbergja joined #evergreen
12:24 | mmorgan | Dyrcona: I have manually retargeted the individual holds I'm testing, and confirmed that the non-owned item I'm checking is in the hold copy map for the hold at the pickup point.
12:25 | Dyrcona | TBH, I don't know how that works any more. It's too complicated.
12:26 | mmorgan | Agreed, it is complicated :)
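For reference, the best-hold selection sort order mmorgan is changing is stored per org unit; if I recall the stock names correctly, it is the circ.hold_capture_order setting, whose value points at a row in config.best_hold_order. A minimal Perl/DBI sketch (connection details are placeholders; verify the setting name against your Evergreen version) for seeing what each org unit is actually configured to use:

```perl
#!/usr/bin/perl
# Sketch: list which best-hold selection sort order each org unit has set.
# Assumes the stock setting name 'circ.hold_capture_order' (whose value is the
# id of a config.best_hold_order row); connection details are placeholders.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect(
    'dbi:Pg:dbname=evergreen;host=localhost', 'evergreen', 'secret',
    { RaiseError => 1 }
);

my $rows = $dbh->selectall_arrayref(q{
    SELECT org_unit, value
    FROM actor.org_unit_setting
    WHERE name = 'circ.hold_capture_order'
}, { Slice => {} });

printf "org unit %s -> config.best_hold_order id %s\n", $_->{org_unit}, $_->{value}
    for @$rows;
```

(As it turns out later in the log, the setting wasn't the problem here; a hold policy was affecting the test.)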
13:08 | | sandbergja joined #evergreen
13:10 | jeffdavis | My test server with rel_3_5 and a few backports ran out of disk space due to 34159161 "Could not launch a new child" messages in the logs.
13:11 | Dyrcona | jeffdavis: Yes, that happens.
13:12 | jeffdavis | It didn't happen on a beta server with the same configuration.
13:13 | | sandbergja joined #evergreen
13:13 | Dyrcona | jeffdavis: Could be different work loads.
13:15 | jeffdavis | Not likely in this case.
13:18 | jeffdavis | Sorry to be terse, I'm on a call and also looking for more info on causes of the server issue.
13:30 | berick | if i'm reading the code right, there appears to be no speedbump between attempts to pass a request to a child when a request is read from the backlog queue and no child is available to process it.
13:31 | berick | which could result in spewing that warning message
13:35 | berick | looks like the Perl code adds a 1 second delay for that same scenario
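In other words, the "speedbump" berick describes is just a short wait before retrying when a backlogged request finds no idle child, instead of looping and logging as fast as possible. A hedged sketch of that pattern in Perl, with made-up names rather than the actual OpenSRF prefork code:

```perl
# Sketch of a backlog "speedbump": if no child is free, pause briefly before
# retrying rather than spinning (and logging a warning) in a tight loop.
# Names here are illustrative; this is not the actual osrf_prefork code.
use strict;
use warnings;

sub dispatch_backlog {
    my ($backlog, $find_idle_child, $send_to_child) = @_;
    while (my $request = shift @$backlog) {
        my $child;
        until ($child = $find_idle_child->()) {
            warn "no idle child available; waiting before retrying\n";
            sleep 1;    # the delay the Perl prefork code already applies
        }
        $send_to_child->($child, $request);
    }
}

# Toy usage: a pool that only frees a child on every other poll.
my @backlog = ('req1', 'req2', 'req3');
my $free = 0;
dispatch_backlog(
    \@backlog,
    sub { $free = !$free; $free ? 'child-1' : undef },
    sub { my ($child, $req) = @_; print "sent $req to $child\n" },
);
```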
13:37 | csharp | so the logs filled up the disk? or something filled it up and the logs are showing you what?
13:38 | berick | i read that as the logs filled up the disk
13:38 | * csharp | 5p34ks l337 at 13:37
13:38 | csharp | yeah, that's what I landed on :-)
13:38 | jeffdavis | open-ils.cstore 2020-05-13 08:49:38 [WARN:25277:osrf_prefork.c:1051:] Could not launch a new child as 30 children were already running; consider increasing max_children for this application higher than 30 in the OpenSRF configuration if this message occurs frequently
13:38 | jeffdavis | ^ 34 million of those messages in the logs, which chewed up all available disk
13:39 | csharp | the lack of a threadtrace probably means it's something utility-ish
13:40 | csharp | oh wait - are PG connections saturated too?
13:40 | jeffdavis | good question, I'll check
13:43 | jeffdavis | The first error message was at 08:36:40; the only util process likely to be active is hold_targeter.pl, which runs at 5, 25, and 45 minutes past the hour on that server.
13:43 | jeffdavis | There was some Vandelay activity happening around that time.
13:50 | jeffdavis | I can't rule out PG connection saturation, but the dev db server has max_connections=1000 (which historically was enough for production), usage wouldn't have been that high, and other servers sharing the cluster don't seem to have had any issues.
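A quick sanity check on the saturation question csharp raised: compare the live connection count on the cluster to max_connections. A Perl/DBI sketch with placeholder connection details:

```perl
#!/usr/bin/perl
# Sketch: how close is the cluster to max_connections right now?
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect(
    'dbi:Pg:dbname=evergreen;host=db-host', 'evergreen', 'secret',
    { RaiseError => 1 }
);

my ($in_use) = $dbh->selectrow_array('SELECT count(*) FROM pg_stat_activity');
my ($max)    = $dbh->selectrow_array("SELECT current_setting('max_connections')");

printf "%d of %s connections in use (%.0f%%)\n", $in_use, $max, 100 * $in_use / $max;
```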
13:51 | csharp | might be DB I/O depending on the storage configuration
14:01 | berick | specifically regarding the log spewing, this should fix it. noting here in case this continues to be a thing.
14:01 | berick | https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/berick/lpxxx-c-backlog-speedbump
14:01 | berick | (though probably should fix it regardless)
14:01 | jeffdavis | thanks Bill, we'll try that out
14:05 | jeffdavis | FWIW, we have two test servers, upgrade1 (~3.5beta) and upgrade2 (~ current rel_3_5). They are configured pretty much identically, use the same db cluster, etc. upgrade1 has been in use for weeks without issues; upgrade2 ran into the max_children problem within a day or two of first use.
14:05 | Dyrcona | I've seen the fine generator and/or hold targeter go ballistic when there's too much to do, like running the fine generator on old data when circulations haven't happened in a month and there's a ton of overdue and lost stuff. I had that fill the disk on a training server. IIRC it logged cstore messages like those.
14:07 | Dyrcona | I may have mentioned it in here. I'm pretty sure that I did....
14:07 | jeffdavis | It could be just a transient usage or db issue, but given how similar the environments are, I am taking a look at the EG/OpenSRF code differences to see if there are any plausible causes there. We're upgrading this weekend so I want to know if the upgrade2 version has important bugs. :)
14:08 | jeffdavis | upgrade2's db snapshot is only 3 days old and hold targeter/fine generator have been running consistently.
14:08 | | jihpringle joined #evergreen
14:09 | jeffdavis | I've definitely seen problems starting up holds/fines on servers with stale data, but I don't think it's the case here.
14:10 | Dyrcona | Well, I had systemd-journald lose its mind because of a lack of cstore drones on April 27. I know the training server ran out of disk space in the past couple of months and it looked like the fine generator. I didn't mention that here, it turns out.
14:14 | Dyrcona | FWIW: Wednesday, April 1 it looks like our training server crashed because the database partition filled up while the fine generator and other things were running. We had an internal email discussion about it, which is probably why I thought I had mentioned it here.
14:15 | Dyrcona | FWIW, I left Evergreen running on a vm over a weekend at the "undocuments" log level that logs everything, and it ran out of disk space with nothing going on once.
14:16 | Dyrcona | bleh.. undocumented....
15:27 | | mantis2 left #evergreen
15:36 | sandbergja | Has anybody used autorenewal in conjunction with hard due dates? If not, any potential gotchas?
15:36 | sandbergja | (we are thinking about autorenewal, but want everything back before Summer starts)
15:38 | mmorgan | sandbergja: We have used autorenewal with hard due dates. What happened was this:
15:39 | mmorgan | With a hard date of May 13, as a random example, all items are due.
15:40 | mmorgan | Autorenewal runs on May 13, and the renewed items are now due - on May 13.
15:40 | mmorgan | At that point they have fallen off the autorenewal train, and won't get renewed again.
15:41 | sandbergja | mmorgan: that's a little goofy! But I guess it would have the effect we want
15:41 | sandbergja | Do patrons get a confusing successful autorenewal message?
15:41 | mmorgan | Yes, it seemed like a gotcha at first, but the alternative would be that things would autorenew past your hard date, which is not ideal.
15:42 | mmorgan | Yes, they did get messages that indicated success, but the same due date. You could add language to the message about the final due date.
15:45 | sandbergja | mmorgan: that really helps! Thanks so much!
15:45 | mmorgan | YW, good luck!
15:46 | jihpringle | we're going to be coming up to this too soon so thanks sandbergja for asking the question and mmorgan for the answer :)
16:14 | sandbergja | mmorgan++
16:34 | sandbergja | mmorgan: just to check, when you say they've fallen off the autorenewal train, does that mean that they just happen to have hit their limit of autorenewals? Or is there something specific about autorenewals that makes them fall off the train?
16:38 | mmorgan | sandbergja: It relates to the processing delay in the autorenewal trigger, not the autorenew limit. Our items autorenew early in the morning the day they are due.
16:38 | mmorgan | Because the item gets the due date of May 13 when it is autorenewed on May 13, it won't be picked up by the autorenew trigger that runs on May 14th.
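A toy sketch of the fall-off mmorgan describes, under the simplifying assumption that the autorenewal trigger run on a given morning only considers circs due that same day (the real selection window depends on your action/trigger event definition):

```perl
# Toy illustration: an item re-due on the hard date is no longer a candidate
# the next day. Simplified; not the real action/trigger selection logic.
use strict;
use warnings;

sub is_autorenew_candidate {
    my ($due_date, $run_date) = @_;    # 'YYYY-MM-DD' strings
    return $due_date eq $run_date;
}

my $hard_due = '2020-05-13';

# May 13 run: due that day, so it autorenews -- and the hard due date means
# the new due date is still May 13.
print is_autorenew_candidate($hard_due, '2020-05-13') ? "renewed\n" : "skipped\n";

# May 14 run: still due May 13, so it is skipped from now on.
print is_autorenew_candidate($hard_due, '2020-05-14') ? "renewed\n" : "skipped\n";
```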
16:45 | | jihpringle joined #evergreen
16:54 | mmorgan | Regarding my question earlier about best hold selection sort order: a hold policy was affecting my test. Changing the sort order to Traditional worked as expected.
17:02 | | mmorgan left #evergreen
18:00 | pinesol | News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
18:06 | | drigney joined #evergreen
18:24 | | rjackson_isl_hom joined #evergreen
18:51 | | rjackson_isl_hom joined #evergreen
18:55 | | jvwoolf joined #evergreen
19:41 | | book` joined #evergreen
19:56 | | Christineb joined #evergreen
20:14 | | book` joined #evergreen
21:35 | | sandbergja joined #evergreen
22:31 | | sandbergja joined #evergreen