Time | Nick | Message
06:00 | pinesol | News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
07:20 | | rjackson_isl_hom joined #evergreen
07:27 | | Dyrcona joined #evergreen
07:48 | | rfrasur joined #evergreen
08:31 | | mmorgan joined #evergreen
08:39 | | mantis1 joined #evergreen
08:47 | | jvwoolf joined #evergreen
08:54 | | jvwoolf1 joined #evergreen
09:45 | | jvwoolf joined #evergreen
09:55 | | mantis1 left #evergreen
09:57 | | dbwells_ joined #evergreen
09:59 | | mantis1 joined #evergreen
10:31 | | dbwells joined #evergreen
10:35 | | rfrasur joined #evergreen
10:36 | | sandbergja joined #evergreen
10:54 | pinesol | [evergreen|Bill Erickson] LP1837656 Org proximity admin disable org filter - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=07971c7>
11:13 | | berick joined #evergreen
12:04 | | jihpringle joined #evergreen
12:09 | mmorgan | I'm trying to change a best hold selection sort order on a test system and it's not working.
12:10 | mmorgan | We were using a sort order which always sent holds home, and I just changed it to Traditional, and it's still sending holds home.
12:11 | mmorgan | I've restarted services and run autogen; am I missing something? Traditional should capture a hold where an item lands, right?
12:22 | | mantis2 joined #evergreen
12:22 | Dyrcona | mmorgan: Have you run the hold targeter since making the change?
12:23 | | sandbergja joined #evergreen
12:24 | mmorgan | Dyrcona: I have manually retargeted the individual holds I'm testing, and confirmed that the non-owned item I'm checking is in the hold copy map for the hold at the pickup point.
12:25 | Dyrcona | TBH, I don't know how that works any more. It's too complicated.
12:26 | mmorgan | Agreed, it is complicated :)
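For reference, the best-hold selection sort order mmorgan is changing is stored per org unit; if I recall the stock names correctly, it is the circ.hold_capture_order setting, whose value points at a row in config.best_hold_order. A minimal Perl/DBI sketch (connection details are placeholders; verify the setting name against your Evergreen version) for seeing what each org unit is actually configured to use:

```perl
#!/usr/bin/perl
# Sketch: list which best-hold selection sort order each org unit has set.
# Assumes the stock setting name 'circ.hold_capture_order' (whose value is the
# id of a config.best_hold_order row); connection details are placeholders.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect(
    'dbi:Pg:dbname=evergreen;host=localhost', 'evergreen', 'secret',
    { RaiseError => 1 }
);

my $rows = $dbh->selectall_arrayref(q{
    SELECT org_unit, value
    FROM actor.org_unit_setting
    WHERE name = 'circ.hold_capture_order'
}, { Slice => {} });

printf "org unit %s -> config.best_hold_order id %s\n", $_->{org_unit}, $_->{value}
    for @$rows;
```

(As it turns out later in the log, the setting wasn't the problem here; a hold policy was affecting the test.)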
13:08 | | sandbergja joined #evergreen
13:10 | jeffdavis | My test server with rel_3_5 and a few backports ran out of disk space due to 34159161 "Could not launch a new child" messages in the logs.
13:11 | Dyrcona | jeffdavis: Yes, that happens.
13:12 | jeffdavis | It didn't happen on a beta server with the same configuration.
13:13 | | sandbergja joined #evergreen
13:13 | Dyrcona | jeffdavis: Could be different work loads.
13:15 | jeffdavis | Not likely in this case.
13:18 | jeffdavis | Sorry to be terse, I'm on a call and also looking for more info on causes of the server issue.
13:30 | berick | if i'm reading the code right, there appears to be no speedbump between attempts to pass a request to a child when a request is read from the backlog queue and no child is available to process it.
13:31 | berick | which could result in spewing that warning message
13:35 | berick | looks like the Perl code adds a 1 second delay for that same scenario
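In other words, the "speedbump" berick describes is just a short wait before retrying when a backlogged request finds no idle child, instead of looping and logging as fast as possible. A hedged sketch of that pattern in Perl, with made-up names rather than the actual OpenSRF prefork code:

```perl
# Sketch of a backlog "speedbump": if no child is free, pause briefly before
# retrying rather than spinning (and logging a warning) in a tight loop.
# Names here are illustrative; this is not the actual osrf_prefork code.
use strict;
use warnings;

sub dispatch_backlog {
    my ($backlog, $find_idle_child, $send_to_child) = @_;
    while (my $request = shift @$backlog) {
        my $child;
        until ($child = $find_idle_child->()) {
            warn "no idle child available; waiting before retrying\n";
            sleep 1;    # the delay the Perl prefork code already applies
        }
        $send_to_child->($child, $request);
    }
}

# Toy usage: a pool that only frees a child on every other poll.
my @backlog = ('req1', 'req2', 'req3');
my $free = 0;
dispatch_backlog(
    \@backlog,
    sub { $free = !$free; $free ? 'child-1' : undef },
    sub { my ($child, $req) = @_; print "sent $req to $child\n" },
);
```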
13:37 | csharp | so the logs filled up the disk? or something filled it up and the logs are showing you what?
13:38 | berick | i read that as the logs filled up the disk
13:38 | * csharp | 5p34ks l337 at 13:37
13:38 | csharp | yeah, that's what I landed on :-)
13:38 | jeffdavis | open-ils.cstore 2020-05-13 08:49:38 [WARN:25277:osrf_prefork.c:1051:] Could not launch a new child as 30 children were already running; consider increasing max_children for this application higher than 30 in the OpenSRF configuration if this message occurs frequently
13:38 | jeffdavis | ^ 34 million of those messages in the logs, which chewed up all available disk
13:39 | csharp | the lack of a threadtrace probably means it's something utility-ish
13:40 | csharp | oh wait - are PG connections saturated too?
13:40 | jeffdavis | good question, I'll check
13:43 | jeffdavis | The first error message was at 08:36:40; the only util process likely to be active is hold_targeter.pl, which runs at 5, 25, and 45 minutes past the hour on that server.
13:43 | jeffdavis | There was some Vandelay activity happening around that time.
13:50 | jeffdavis | I can't rule out PG connection saturation, but the dev db server has max_connections=1000 (which historically was enough for production), usage wouldn't have been that high, and other servers sharing the cluster don't seem to have had any issues.
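A quick sanity check on the saturation question csharp raised: compare the live connection count on the cluster to max_connections. A Perl/DBI sketch with placeholder connection details:

```perl
#!/usr/bin/perl
# Sketch: how close is the cluster to max_connections right now?
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect(
    'dbi:Pg:dbname=evergreen;host=db-host', 'evergreen', 'secret',
    { RaiseError => 1 }
);

my ($in_use) = $dbh->selectrow_array('SELECT count(*) FROM pg_stat_activity');
my ($max)    = $dbh->selectrow_array("SELECT current_setting('max_connections')");

printf "%d of %s connections in use (%.0f%%)\n", $in_use, $max, 100 * $in_use / $max;
```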
13:51 | csharp | might be DB I/O depending on the storage configuration
14:01 | berick | specifically regarding the log spewing, this should fix it. noting here in case this continues to be a thing.
14:01 | berick | https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/berick/lpxxx-c-backlog-speedbump
14:01 | berick | (though probably should fix it regardless)
14:01 | jeffdavis | thanks Bill, we'll try that out
14:05 | jeffdavis | FWIW, we have two test servers, upgrade1 (~3.5beta) and upgrade2 (~ current rel_3_5). They are configured pretty much identically, use the same db cluster, etc. upgrade1 has been in use for weeks without issues; upgrade2 ran into the max_children problem within a day or two of first use.
14:05 | Dyrcona | I've seen the fine generator and/or hold targeter go ballistic when there's too much to do, like running the fine generator on old data when circulations haven't happened in a month and there's a ton of overdue and lost stuff. I had that fill the disk on a training server. IIRC it logged cstore messages like those.
14:07 | Dyrcona | I may have mentioned it in here. I'm pretty sure that I did....
14:07 | jeffdavis | It could be just a transient usage or db issue, but given how similar the environments are, I am taking a look at the EG/OpenSRF code differences to see if there are any plausible causes there. We're upgrading this weekend so I want to know if the upgrade2 version has important bugs. :)
14:08 | jeffdavis | upgrade2's db snapshot is only 3 days old and hold targeter/fine generator have been running consistently.
14:08 | | jihpringle joined #evergreen
14:09 | jeffdavis | I've definitely seen problems starting up holds/fines on servers with stale data, but I don't think it's the case here.
14:10 | Dyrcona | Well, I had systemd-journald lose its mind because of a lack of cstore drones on April 27. I know the training server ran out of disk space in the past couple of months and it looked like the fine generator. I didn't mention that here, it turns out.
14:14 | Dyrcona | FWIW: Wednesday, April 1 it looks like our training server crashed because the database partition filled up while the fine generator and other things were running. We had an internal email discussion about it, which is probably why I thought I had mentioned it here.
14:15 | Dyrcona | FWIW, I left Evergreen running on a vm over a weekend at the "undocuments" log level that logs everything, and it ran out of disk space with nothing going on once.
14:16 | Dyrcona | bleh.. undocumented....
15:27 | | mantis2 left #evergreen
15:36 | sandbergja | Has anybody used autorenewal in conjunction with hard due dates? If not, any potential gotchas?
15:36 | sandbergja | (we are thinking about autorenewal, but want everything back before Summer starts)
15:38 | mmorgan | sandbergja: We have used autorenewal with hard due dates. What happened was this:
15:39 | mmorgan | With a hard date of May 13, as a random example, all items are due.
15:40 | mmorgan | Autorenewal runs on May 13, and the renewed items are now due - on May 13.
15:40 | mmorgan | At that point they have fallen off the autorenewal train, and won't get renewed again.
15:41 | sandbergja | mmorgan: that's a little goofy! But I guess it would have the effect we want
15:41 | sandbergja | Do patrons get a confusing successful autorenewal message?
15:41 | mmorgan | Yes, it seemed like a gotcha at first, but the alternative would be that things would autorenew past your hard date, which is not ideal.
15:42 | mmorgan | Yes, they did get messages that indicated success, but the same due date. You could add language to the message about the final due date.
15:45 | sandbergja | mmorgan: that really helps! Thanks so much!
15:45 | mmorgan | YW, good luck!
15:46 | jihpringle | we're going to be coming up to this too soon so thanks sandbergja for asking the question and mmorgan for the answer :)
16:14 | sandbergja | mmorgan++
16:34 | sandbergja | mmorgan: just to check, when you say they've fallen off the autorenewal train, does that mean that they just happen to have hit their limit of autorenewals? Or is there something specific about autorenewals that makes them fall off the train?
16:38 | mmorgan | sandbergja: It relates to the processing delay in the autorenewal trigger, not the autorenew limit. Our items autorenew early in the morning the day they are due.
16:38 | mmorgan | Because the item gets the due date of May 13 when it is autorenewed on May 13, it won't be picked up by the autorenew trigger that runs on May 14th.
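A toy sketch of the fall-off mmorgan describes, under the simplifying assumption that the autorenewal trigger run on a given morning only considers circs due that same day (the real selection window depends on your action/trigger event definition):

```perl
# Toy illustration: an item re-due on the hard date is no longer a candidate
# the next day. Simplified; not the real action/trigger selection logic.
use strict;
use warnings;

sub is_autorenew_candidate {
    my ($due_date, $run_date) = @_;    # 'YYYY-MM-DD' strings
    return $due_date eq $run_date;
}

my $hard_due = '2020-05-13';

# May 13 run: due that day, so it autorenews -- and the hard due date means
# the new due date is still May 13.
print is_autorenew_candidate($hard_due, '2020-05-13') ? "renewed\n" : "skipped\n";

# May 14 run: still due May 13, so it is skipped from now on.
print is_autorenew_candidate($hard_due, '2020-05-14') ? "renewed\n" : "skipped\n";
```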
16:45 | | jihpringle joined #evergreen
16:54 | mmorgan | Regarding my question earlier about best hold selection sort order: a hold policy was affecting my test. Changing the sort order to Traditional worked as expected.
17:02 | | mmorgan left #evergreen
18:00 | pinesol | News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
18:06 | | drigney joined #evergreen
18:24 | | rjackson_isl_hom joined #evergreen
18:51 | | rjackson_isl_hom joined #evergreen
18:55 | | jvwoolf joined #evergreen
19:41 | | book` joined #evergreen
19:56 | | Christineb joined #evergreen
20:14 | | book` joined #evergreen
21:35 | | sandbergja joined #evergreen
22:31 | | sandbergja joined #evergreen