
IRC log for #evergreen, 2026-01-08


All times shown according to the server's local time.

Time Nick Message
00:03 JBoyer joined #evergreen
08:35 mmorgan joined #evergreen
09:20 Dyrcona joined #evergreen
09:38 Dyrcona Interesting... With no parallel react and collect settings for open-ils.trigger, neither ejabberd nor Redis had issues with the mark item lost events last night. I'll leave it running like this for at least one more day to see if that changes.
09:40 Dyrcona Apparently, neither did the auto-renewals. I think there's a bug with parallel event processing.
09:45 csharp_ the only times I've seen trouble with parallel event processing ended up being a RAM issue
09:46 csharp_ (as in, insufficient RAM for what I was trying to do)
09:46 Dyrcona csharp_: I've been able to reliably reproduce it on VMs with 16GB of RAM, and basically all that's going on is the mark item lost process.
09:47 csharp_ looks like our A/T server has 24GB right now
09:47 Dyrcona If it takes > 16GB of RAM to process 1,000 lost items, then something is seriously wrong with the memory management of our code.
09:48 csharp_ we have 3 parallel procs configured
09:48 Dyrcona That's what I had: 3 collect and 3 react.
09:48 csharp_ same
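
For reference: the "3 collect and 3 react" counts being compared here normally live in opensrf.xml under the open-ils.trigger app_settings. A minimal sketch of that stanza, assuming a stock layout (element names can vary between Evergreen versions); removing or commenting out the parallel block is one way to fall back to serial event processing:

    <open-ils.trigger>
      <app_settings>
        <!-- drone counts for the collect and react phases of A/T processing -->
        <parallel>
          <collect>3</collect>
          <react>3</react>
        </parallel>
      </app_settings>
    </open-ils.trigger>
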
09:49 csharp_ four CPU cores: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
09:49 csharp_ that's the VM host's processor
09:49 csharp_ (fwiw)
09:52 Dyrcona csharp_: 16GB RAM, 16 CPUs for the vms where I'm doing my testing.
09:53 * csharp_ nods
09:53 Dyrcona Bmagic will have to say how the production docker image that does the mark item lost is configured, but I'm considering disabling parallel processing for now.
09:53 Dyrcona But, seriously, I will repeat: something is very wrong if it takes more than 16GB of RAM to process 300 to 1,000 events in parallel.
09:55 Dyrcona Yes, 16: <vcpu placement='static'>16</vcpu>
09:55 csharp_ also I've seen it fail in the collection stage when there are too many fleshed items or whatever
09:55 csharp_ 1 patron, 50 lost items or something
09:55 Dyrcona I've seen that, but it was rare.
09:55 csharp_ though that's usually circ notice related, especially 1 day overdue or something
09:56 Dyrcona Usually it's some "magic" card that the libraries use for some purpose: book club, tracking lost things in some way because the system they ran 4 ILSes ago didn't track lost items, stuff like that.
09:56 csharp_ ah yes
09:57 csharp_ "we don't need no stinking buckets!  we have this handy card!"
09:57 Dyrcona I hate when I see workflows from 30 years ago being used.
09:57 csharp_ at my first public library job we had a fake card named MS. E PIECES
09:58 Dyrcona I recall seeing the notebook with the steps for some process at a library, and they were complaining that Evergreen was broken. I told them, those steps didn't work in Horizon either. (This was at MVLC.)
09:59 Dyrcona I also "like" notes saying things like "number in the field above is..." Really? You think that's gonna be accurate after an update, never mind a migration?
10:00 Dyrcona Anyway, I came here to bash (heh) Perl and Evergreen, not staff..... :)
10:00 csharp_ @decide bash or perl or evergreen
10:00 pinesol csharp_: That's a tough one...
10:02 Dyrcona Our production utility vm that still runs auto-renewals with parallel processing has 32GB of RAM and 8 CPUs. I guess CWMARS is just too big for our hardware.
10:02 Dyrcona We used to have 2 vms configured that way for utility stuff. We still had problems, but not as frequently.
10:03 Dyrcona Well, one of our production utility servers was actual hardware, IIRC.
10:06 Dyrcona @decide lua or lisp
10:06 pinesol Dyrcona: That's a tough one...
10:06 Dyrcona pinesol: Why so indecisive today?
10:06 pinesol Dyrcona: What do you mean? An African or European swallow?
10:09 Dyrcona "It's not a question of 'ow 'e grips it."
10:09 Dyrcona Anyway, what am I gonna do? Rewrite open-ils.trigger in Rust?
10:34 Dyrcona csharp_: I can't find any evidence of the OOM Killer running on either vm where I've been testing this.
10:36 csharp_ hmm
10:40 Bmagic Dyrcona: your findings agree with my suspicions. The root issue is likely memory management, and running the same Evergreen code on top of Redis just exposes it more often. Probably* because ejabberd is slower and gives time for garbage collection?
10:53 Dyrcona Yeah, but I'm not finding any OOM Killer messages in the logs nor with journalctl. I'm still looking.
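
Generic checks for OOM Killer activity on a systemd host (not Evergreen-specific); kernel messages are where it shows up:

    # OOM events in the kernel log for the last day
    journalctl -k --since "24 hours ago" | grep -iE 'out of memory|oom-kill|killed process'
    # same information from the kernel ring buffer
    dmesg -T | grep -iE 'out of memory|oom'
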
10:55 Dyrcona Anyway, I think I'm just going to disable the parallel settings for open-ils.trigger. Guess I'll open a ticket with our hosting vendor. :)
11:13 Bmagic I don't see OOM either when the issue arises. Couple of theories: ulimit maybe, or Redis automatically starts dropping children when it starves.
11:18 Dyrcona ulimit: unlimited
11:18 jmurray-isl EGIN's utility server uses 32 CPUs and 32 GB RAM (though we probably don't exceed 16GB), with 8 collect and 8 react. We only see around 75% CPU load during morning notice processing, and even that typically doesn't last long.
11:19 Dyrcona We're just cursed. :)
11:24 Dyrcona We could be blowing out the 8MB stack limit. Not sure how I'd detect that; segmentation faults, maybe?
11:27 Dyrcona Nope. No segfaults either.
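
To rule out limits and stack blowouts, it can help to inspect the limits of the running process itself rather than the login shell, since an interactive ulimit may not match what the service inherits. The action_trigger_runner.pl process name below is an assumption about how the A/T job is launched:

    # limits that actually apply to a running A/T process (process name is an assumption)
    cat /proc/$(pgrep -f action_trigger_runner.pl | head -n1)/limits
    # current shell's stack limit in kB (8192 = the default 8MB)
    ulimit -s
    # any segfaults recorded by the kernel
    journalctl -k | grep -i segfault
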
12:02 jihpringle joined #evergreen
12:23 jihpringle joined #evergreen
12:46 mantis1 joined #evergreen
13:14 mantis1 joined #evergreen
15:12 jihpringle joined #evergreen
15:21 mantis1 left #evergreen
17:08 mmorgan left #evergreen
20:22 beardicus joined #evergreen
22:28 beardicus4 joined #evergreen
22:38 jeff joined #evergreen
