
IRC log for #evergreen, 2020-05-13


All times shown according to the server's local time.

Time Nick Message
06:00 pinesol News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
07:20 rjackson_isl_hom joined #evergreen
07:27 Dyrcona joined #evergreen
07:48 rfrasur joined #evergreen
08:31 mmorgan joined #evergreen
08:39 mantis1 joined #evergreen
08:47 jvwoolf joined #evergreen
08:54 jvwoolf1 joined #evergreen
09:45 jvwoolf joined #evergreen
09:55 mantis1 left #evergreen
09:57 dbwells_ joined #evergreen
09:59 mantis1 joined #evergreen
10:31 dbwells joined #evergreen
10:35 rfrasur joined #evergreen
10:36 sandbergja joined #evergreen
10:54 pinesol [evergreen|Bill Erickson] LP1837656 Org proximity admin disable org filter - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=07971c7>
11:13 berick joined #evergreen
12:04 jihpringle joined #evergreen
12:09 mmorgan I'm trying to change a best hold selection sort order on a test system and it's not working.
12:10 mmorgan We were using a sort order which always sent holds home, and I just changed it to Traditional, and it's still sending holds home.
12:11 mmorgan I've restarted services and run autogen; am I missing something? Traditional should capture a hold where an item lands, right?
12:22 mantis2 joined #evergreen
12:22 Dyrcona mmorgan: Have you run the hold targeter since making the change?
12:23 sandbergja joined #evergreen
12:24 mmorgan Dyrcona: I have manually retargeted the individual holds I'm testing, and confirmed that the non-owned item I'm checking is in the hold copy map for the hold at the pickup point.
12:25 Dyrcona TBH, I don't know how that works any more. It's too complicated.
12:26 mmorgan Agreed, it is complicated :)
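For reference, an individual hold can be retargeted outside the scheduled run with a small Perl client script along the lines of the sketch below. The service and method names (open-ils.hold-targeter / open-ils.hold-targeter.target), the {hold => $id} argument shape, and the config path are assumptions based on a stock post-3.0 install.

```perl
#!/usr/bin/perl
# Minimal sketch: ask the hold targeter service to (re)target one hold by ID.
# Method name and argument shape are assumptions about the modern targeter API.
use strict;
use warnings;
use Data::Dumper;
use OpenSRF::System;
use OpenSRF::AppSession;

my $hold_id = shift @ARGV or die "usage: $0 <hold_id>\n";

# Bootstrap an OpenSRF client from the standard core config.
OpenSRF::System->bootstrap_client(
    config_file => '/openils/conf/opensrf_core.xml');

my $ses  = OpenSRF::AppSession->create('open-ils.hold-targeter');
my $resp = $ses->request(
    'open-ils.hold-targeter.target', {hold => $hold_id})->gather(1);

print Dumper($resp);
```

Checking the resulting rows in action.hold_copy_map afterward, as mmorgan describes, confirms whether the copy in question is actually eligible for the hold.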
13:08 sandbergja joined #evergreen
13:10 jeffdavis My test server with rel_3_5 and a few backports ran out of disk space due to 34159161 "Could not launch a new child" messages in the logs.
13:11 Dyrcona jeffdavis: Yes, that happens.
13:12 jeffdavis It didn't happen on a beta server with the same configuration.
13:13 sandbergja joined #evergreen
13:13 Dyrcona jeffdavis: Could be different workloads.
13:15 jeffdavis Not likely in this case.
13:18 jeffdavis Sorry to be terse, I'm on a call and also looking for more info on causes of the server issue.
13:30 berick if i'm reading the code right, there appears to be no speedbump between attempts to pass a request to a child when the request is read from the backlog queue and no child is available to process it.
13:31 berick which could result in spewing that warning message
13:35 berick looks like the Perl code adds a 1 second delay for that same scenario
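The pattern being described is a short wait before re-checking the backlog when no child is free, instead of retrying immediately and logging a warning on every pass. A toy sketch of that speedbump follows; the variable names are hypothetical and this is not the actual OpenSRF prefork code.

```perl
# Toy illustration of the backlog "speedbump": when a queued request is pulled
# off the backlog and no child is idle, pause briefly rather than spinning and
# emitting the same warning on every iteration.
use strict;
use warnings;

my @idle_children = ();                        # pretend every child is busy
my @backlog       = ('request-1', 'request-2');

while (@backlog) {
    my $child = shift @idle_children;
    if (!$child) {
        warn "no idle children; sleeping 1s before re-checking the backlog\n";
        sleep 1;      # the ~1 second speedbump the Perl server reportedly uses
        last;         # sketch only: give up instead of waiting forever
    }
    my $request = shift @backlog;
    print "dispatching $request to $child\n";
}
```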
13:37 csharp so the logs filled up the disk? or did something else fill it up, and the logs are just showing you what did?
13:38 berick i read that as the logs filled up the disk
13:38 * csharp 5p34ks l337 at 13:37
13:38 csharp yeah, that's what I landed on :-)
13:38 jeffdavis open-ils.cstore 2020-05-13 08:49:38 [WARN:25277:osrf_prefork.c:1051:] Could not launch a new child as 30 children were already running; consider increasing max_children for this application higher than 30 in the OpenSRF configuration if this message occurs frequently
13:38 jeffdavis ^ 34 million of those messages in the logs, which chewed up all available disk
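The warning text itself points at max_children for open-ils.cstore in the OpenSRF configuration (opensrf.xml on a stock install). A hedged way to confirm what the running config actually says, assuming the usual /openils/conf layout and that the setting lives under a unix_config element:

```perl
# Report the configured max_children for open-ils.cstore so it can be compared
# against the "30 children were already running" warning above.  The config
# path and XPath are assumptions about a stock install.
use strict;
use warnings;
use XML::LibXML;

my $conf = shift @ARGV || '/openils/conf/opensrf.xml';
my $doc  = XML::LibXML->load_xml(location => $conf);

my ($node) = $doc->findnodes('//open-ils.cstore/unix_config/max_children');
printf "open-ils.cstore max_children = %s\n",
    $node ? $node->textContent : '(not found)';
```

Raising the value only buys headroom, of course; the rest of the discussion is about what was tying up all 30 drones in the first place.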
13:39 csharp the lack of a threadtrace probably means it's something utility-ish
13:40 csharp oh wait - are PG connections saturated too?
13:40 jeffdavis good question, I'll check
13:43 jeffdavis The first error message was at 08:36:40; the only util process likely to be active is hold_targeter.pl, which runs at 5, 25, and 45 minutes past the hour on that server.
13:43 jeffdavis There was some Vandelay activity happening around that time.
13:50 jeffdavis I can't rule out PG connection saturation, but the dev db server has max_connections=1000 (which historically was enough for production), usage wouldn't have been that high, and other servers sharing the cluster don't seem to have had any issues.
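One quick way to answer the saturation question is to compare pg_stat_activity against max_connections on the database server; both are plain PostgreSQL rather than anything Evergreen-specific. A hedged sketch with placeholder connection details:

```perl
# Compare current PostgreSQL connection usage to the configured limit.
# The DSN, user, and password are placeholders.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect(
    'dbi:Pg:dbname=evergreen;host=db.example.org',
    'evergreen', 'password', { RaiseError => 1 });

my ($in_use) = $dbh->selectrow_array('SELECT count(*) FROM pg_stat_activity');
my ($max)    = $dbh->selectrow_array('SHOW max_connections');
printf "%d of %s connections in use\n", $in_use, $max;

$dbh->disconnect;
```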
13:51 csharp might be DB I/O depending on the storage configuration
14:01 berick specifically regarding the log spewing, this should fix it.  noting here in case this continues to be a thing.
14:01 berick https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/berick/lpxxx-c-backlog-speedbump
14:01 berick (though probably should fix it regardless)
14:01 jeffdavis thanks Bill, we'll try that out
14:05 jeffdavis FWIW, we have two test servers, upgrade1 (~3.5beta) and upgrade2 (~ current rel_3_5). They are configured pretty much identically, use the same db cluster, etc. upgrade1 has been in use for weeks without issues; upgrade2 ran into the max_children problem within a day or two of first use.
14:05 Dyrcona I've seen the fine generator and/or hold targeter go ballistic when there's too much to do, like running the fine generator on old data when circulations haven't happened in a month, so there's a ton of overdue and lost stuff. I had that fill the disk on a training server. IIRC it was cstore messages like those.
14:07 Dyrcona I may have mentioned it in here. I'm pretty sure that I did....
14:07 jeffdavis It could be just a transient usage or db issue, but given how similar the environments are, I am taking a look at the EG/OpenSRF code differences to see if there are any plausible causes there. We're upgrading this weekend so I want to know if the upgrade2 version has important bugs. :)
14:08 jeffdavis upgrade2's db snapshot is only 3 days old and hold targeter/fine generator have been running consistently.
14:08 jihpringle joined #evergreen
14:09 jeffdavis I've definitely seen problems starting up holds/fines on servers with stale data, but I don't think it's the case here.
14:10 Dyrcona Well, I had systemd-journald lose its mind because of a lack of cstore drones on April 27. I know the training server ran out of disk space in the past couple of months and it looked like the fine generator. I didn't mention that here, it turns out.
14:14 Dyrcona FWIW: Wednesday, April 1 it looks like our training server crashed because the database partition filled up while the fine generator and other things were running. We had an internal email discussion about it, which is probably why I thought I had mentioned it here.
14:15 Dyrcona FWIW, I left Evergreen running on a vm over a weekend at the "undocuments" log level that logs everything, and it ran out of disk space with nothing going on once.
14:16 Dyrcona bleh.. undocumented....
15:27 mantis2 left #evergreen
15:36 sandbergja Has anybody used autorenewal in conjunction with hard due dates?  If not, any potential gotchas?
15:36 sandbergja (we are thinking about autorenewal, but want everything back before Summer starts)
15:38 mmorgan sandbergja: We have used autorenewal with hard due dates. What happened was this:
15:39 mmorgan With a hard due date of May 13, as a random example, all items are due May 13.
15:40 mmorgan Autorenewal runs on May 13, and the renewed items are now due - on May 13.
15:40 mmorgan At that point they have fallen off the autorenewal train, and won't get renewed again.
15:41 sandbergja mmorgan: that's a little goofy!  But I guess it would have the effect we want
15:41 sandbergja Do patrons get a confusing successful autorenewal message?
15:41 mmorgan Yes, it seemed like a gotcha at first, but the alternative would be that things would be autorenewed past your hard date, which is not ideal.
15:42 mmorgan Yes, they did get messages that indicated success, but with the same due date. You could add language to the message about the final due date.
15:45 sandbergja mmorgan: that really helps!  Thanks so much!
15:45 mmorgan YW, good luck!
15:46 jihpringle we're going to be coming up to this too soon so thanks sandbergja for asking the question and mmorgan for the answer :)
16:14 sandbergja mmorgan++
16:34 sandbergja mmorgan: just to check, when you say they've fallen off the autorenewal train, does that mean that they just happen to have hit their limit of autorenewals?  Or is there something specific about autorenewals that makes them fall off the train?
16:38 mmorgan sandbergja: It relates to the processing delay in the autorenewal trigger, not the autorenew limit. Our items autorenew early in the morning the day they are due.
16:38 mmorgan Because the item gets the due date of May 13 when it is autorenewed on May 13, it won't be picked up by the autorenew trigger that runs on May 14th.
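As I understand mmorgan's explanation, the cutoff comes from the delay/max_delay window on the autorenewal action_trigger event definition, measured from its delay_field (the circulation due date), so a circ renewed onto a due date that has already arrived falls outside the next run's window. The window can be inspected directly; the reactor name 'Circ::AutoRenew' is an assumption based on stock seed data.

```perl
# Inspect the autorenewal event definition's processing window (delay /
# max_delay, relative to the delay_field, normally the circulation due date).
# The reactor name is an assumption about stock Evergreen seed data.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', 'password',
    { RaiseError => 1 });

my $sth = $dbh->prepare(q{
    SELECT id, name, hook, delay, max_delay, delay_field, active
      FROM action_trigger.event_definition
     WHERE reactor = 'Circ::AutoRenew'
});
$sth->execute;

while (my $row = $sth->fetchrow_hashref) {
    printf "%s: hook=%s delay=%s max_delay=%s delay_field=%s active=%s\n",
        map { defined $_ ? $_ : '(null)' }
            @{$row}{qw/name hook delay max_delay delay_field active/};
}

$dbh->disconnect;
```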
16:45 jihpringle joined #evergreen
16:54 mmorgan Regarding my question earlier about best hold selection sort order: there was a hold policy affecting my test. Changing the sort order to Traditional worked as expected.
17:02 mmorgan left #evergreen
18:00 pinesol News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
18:06 drigney joined #evergreen
18:24 rjackson_isl_hom joined #evergreen
18:51 rjackson_isl_hom joined #evergreen
18:55 jvwoolf joined #evergreen
19:41 book` joined #evergreen
19:56 Christineb joined #evergreen
20:14 book` joined #evergreen
21:35 sandbergja joined #evergreen
22:31 sandbergja joined #evergreen
