| Time | Nick | Message |
| 00:03 | | JBoyer joined #evergreen |
| 08:35 | | mmorgan joined #evergreen |
| 09:20 | | Dyrcona joined #evergreen |
| 09:38 | Dyrcona | Interesting.... With no parallel for react and collect in open-ils.trigger, neither Ejabberd nor Redis had issues with the mark item lost events last night. I'll leave it running like this for at least 1 more day to see if that changes. |
| 09:40 | Dyrcona | Apparently, neither did the auto-renewals. I think there's a bug with parallel event processing. |
| 09:45 | csharp_ | the only times I've seen trouble with parallel event processing ended up being a RAM issue |
| 09:46 | csharp_ | (as in, insufficient RAM for what I was trying to do) |
| 09:46 | Dyrcona | csharp_: I've been able to reliably reproduce it on VMs with 16GB of RAM, and basically all that's going on is the mark item lost process. |
| 09:47 | csharp_ | looks like our A/T server has 24GB right now |
| 09:47 | Dyrcona | If it takes > 16GB of RAM to process 1,000 lost items, then something is seriously wrong with the memory management of our code. |
| 09:48 | csharp_ | we have 3 parallel procs configured |
| 09:48 | Dyrcona | That's what I had: 3 collect and 3 react. |
| 09:48 | csharp_ | same |
| 09:49 | csharp_ | four CPU cores: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz |
| 09:49 | csharp_ | that's the VM host's processor |
| 09:49 | csharp_ | (fwiw) |
| 09:52 | Dyrcona | csharp_: 16GB RAM, 16 CPUs for the vms where I'm doing my testing. |
| 09:53 | * csharp_ | nods |
| 09:53 | Dyrcona | Bmagic will have to say how the production docker image that does the mark item lost is configured, but I'm considering disabling parallel processing for now. |
| 09:53 | Dyrcona | But, seriously, I will repeat: something is very wrong if it takes more than 16GB of RAM to process 300 to 1,000 events in parallel. |
| 09:55 | Dyrcona | Yes, 16: <vcpu placement='static'>16</vcpu> |
| 09:55 | csharp_ | also I've seen it fail in the collection stage when there are too many fleshed items or whatever |
| 09:55 | csharp_ | 1 patron, 50 lost items or something |
| 09:55 | Dyrcona | I've seen that, but it was rare. |
| 09:55 | csharp_ | though that's usually circ notice related, especially 1 day overdue or something |
| 09:56 | Dyrcona | Usually it's some "magic" card that the libraries use for some purpose: book club, tracking lost things in some way because the ILS they had 4 ILSes ago didn't track lost items, stuff like that. |
| 09:56 | csharp_ | ah yes |
| 09:57 | csharp_ | "we don't need no stinking buckets! we have this handy card!" |
| 09:57 | Dyrcona | I hate when I see workflows from 30 years ago being used. |
| 09:57 | csharp_ | at my first public library job we had a fake card named MS. E PIECES |
| 09:58 | Dyrcona | I recall seeing the notebook with the steps for some process at a library, and they were complaining that Evergreen was broken. I told them, those steps didn't work in Horizon either. (This was at MVLC.) |
| 09:59 | Dyrcona | I also "like" notes saying things like "number in the field above is..." Really? You think that's gonna be accurate after an update, never mind a migration? |
| 10:00 | Dyrcona | Anyway, I came here to bash (heh) Perl and Evergreen, not staff..... :) |
| 10:00 | csharp_ | @decide bash or perl or evergreen |
| 10:00 | pinesol | csharp_: That's a tough one... |
| 10:02 | Dyrcona | Our production utility vm that still runs auto-renewals with parallel processing has 32GB of RAM and 8 CPUs. I guess CWMARS is just too big for our hardware. |
| 10:02 | Dyrcona | We used to have 2 vms configured that way for utility stuff. We still had problems, but not as frequently. |
| 10:03 | Dyrcona | Well, one of our production utility servers was actual hardware, IIRC. |
| 10:06 | Dyrcona | @decide lua or lisp |
| 10:06 | pinesol | Dyrcona: That's a tough one... |
| 10:06 | Dyrcona | pinesol: Why so indecisive today? |
| 10:06 | pinesol | Dyrcona: What do you mean? An African or European swallow? |
| 10:09 | Dyrcona | "It's not a question of 'ow 'e grips it." |
| 10:09 | Dyrcona | Anyway, what am I gonna do? Rewrite open-ils.trigger in Rust? |
| 10:34 | Dyrcona | csharp_: I can't find any evidence of the OOM Killer running on either vm where I've been testing this. |
| 10:36 | csharp_ | hmm |
| 10:40 | Bmagic | Dyrcona: your findings agree with my suspicions. The root issue is likely memory management, and when running the same Evergreen code over the top of Redis, it shows the issue more often. Probably* because ejabberd is slower and gives time for garbage collection? |
| 10:53 | Dyrcona | Yeah, but I'm not finding any OOM Killer messages in the logs nor with journalctl. I'm still looking. |
| 10:55 | Dyrcona | Anyway, I think I'm just going to disable the parallel settings for open-ils.trigger. Guess I'll open a ticket with our hosting vendor. :) |
| 11:13 | Bmagic | I don't see OOM either when the issue arises. A couple of theories: maybe ulimit, or Redis automatically starts dropping children when it starves. |
| 11:18 | Dyrcona | ulimit: unlimited |
| 11:18 | jmurray-isl | EGIN's utility server uses 32 CPUs and 32 GB RAM (though we probably don't exceed 16GB), with 8 collect and 8 react. We typically see a 75% CPU load only during morning notice processing, and it doesn't last long. |
| 11:19 | Dyrcona | We're just cursed. :) |
| 11:24 | Dyrcona | We could be blowing out the 8MB stack limit. Not sure how I'd find that, segmentation faults? |
| 11:27 | Dyrcona | Nope. No segfaults either. |
| 12:02 | | jihpringle joined #evergreen |
| 12:23 | | jihpringle joined #evergreen |
| 12:46 | | mantis1 joined #evergreen |
| 13:14 | | mantis1 joined #evergreen |
| 15:12 | | jihpringle joined #evergreen |
| 15:21 | | mantis1 left #evergreen |
| 17:08 | | mmorgan left #evergreen |
| 20:22 | | beardicus joined #evergreen |
| 22:28 | | beardicus4 joined #evergreen |
| 22:38 | | jeff joined #evergreen |
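
The 3 collect / 3 react settings that Dyrcona and csharp_ compare at 09:48 are the action/trigger parallel settings for open-ils.trigger in opensrf.xml. A minimal way to check them, assuming a default /openils install path and the <parallel>/<collect>/<react> element layout from the stock opensrf.xml.example (verify against your version):

```sh
# Show the open-ils.trigger parallel settings (path assumes a default install):
grep -A 4 '<parallel>' /openils/conf/opensrf.xml
# Expected shape, with the 3/3 values both sites report above:
#   <parallel>
#     <collect>3</collect>
#     <react>3</react>
#   </parallel>
```

Disabling parallel processing, as Dyrcona proposes at 10:55, generally amounts to removing or commenting out this block and restarting the open-ils.trigger service.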
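
Dyrcona's check for OOM Killer activity at 10:34 and 10:53 can be done with standard kernel-log queries; a quick sketch (the message wording varies by kernel version, hence the broad patterns):

```sh
# Search the journal's kernel messages and the kernel ring buffer for OOM-killer activity:
journalctl -k | grep -iE 'out of memory|oom-killer|killed process'
dmesg -T | grep -iE 'oom|killed process'
```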
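
For the ulimit and 8MB stack theories raised between 11:13 and 11:27, limits can be checked per shell and per running process; a sketch, where <pid> is a placeholder for the process ID of a running open-ils.trigger drone:

```sh
ulimit -a                            # all soft limits for the current shell
ulimit -s                            # stack size in KB; 8192 is the 8MB limit mentioned above
cat /proc/<pid>/limits               # effective limits of a specific running process
journalctl -k | grep -i segfault     # kernel-logged segmentation faults (none were found above)
```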