| 09:44 |
|
sandbergja joined #evergreen |
| 09:50 |
Dyrcona |
I tried searching Lp with fixed committed bugs enabled, and nothing came up there either. |
| 09:51 |
Dyrcona |
Production could have a patch applied from elsewhere... |
| 09:51 |
Dyrcona |
I suppose that I can test that locally. I do have the list of commits applied. |
| 09:54 |
sandbergja |
Has anybody run into a situation where logging in to the staff client starts to fail most (but not all) of the time? The browser console complains about open-ils.circ.offline.data.retrieve failing because it can't connect to the server to get stat cats. This is with redis, recent main with some bonus commits that don't seem related, and ubuntu |
| 09:54 |
sandbergja |
jammy. osrf_control --diagnostic reported that everything was happy. osrf_control --restart-all fixed the issue (at least for now...) |
| 09:54 |
sandbergja |
The error: https://gist.githubusercontent.com/sandbergja/c938c9a63286bededec1bb5a9e18a162/raw/bfed9be845978f0a21b518c53fe896d056585394/log%2520entries%2520-%2520login%2520issue |
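For anyone chasing a similar failure, the two osrf_control invocations mentioned above look roughly like this on a stock install; the opensrf user and the --localhost flag are assumptions about the deployment, not details from the report:

    # summarize the state of every registered OpenSRF service
    sudo -u opensrf osrf_control --localhost --diagnostic
    # bounce the router and all Perl/C services in one pass
    sudo -u opensrf osrf_control --localhost --restart-all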
| 11:20 |
|
Christineb joined #evergreen |
| 11:21 |
jeffdavis |
We've run into the lost db connection thing before too. I've opened bug 2098507 as a feature request. |
| 11:21 |
pinesol |
Launchpad bug 2098507 in Evergreen "Respond gracefully when database connection is lost" [Undecided,New] https://launchpad.net/bugs/2098507 |
| 12:52 |
Dyrcona |
Well, applying all of the Pg commits that I could find did not help my issue with the XML parser error on the test vm. |
| 13:08 |
Dyrcona |
I am tempted to delete the VM and rebuild it. |
| 13:29 |
Dyrcona |
And that's what I'm doing. |
| 13:57 |
* Dyrcona |
is about to find out if OpenSRF 3.3.2 works with Evergreen 3.7.4. |
| 14:12 |
csharp_ |
we are upgrading from 3.12 to 3.14 on Saturday |
| 14:12 |
csharp_ |
oh |
| 14:12 |
csharp_ |
hmm - maybe we can save ourselves the pain |
| 14:13 |
jeffdavis |
So simply omitting 1416 would in theory mean that an upgraded system is a match for a clean install. |
| 14:13 |
jeffdavis |
I haven't tested this yet *at all* so please don't rely on me being right about this :) |
| 14:13 |
jeffdavis |
(and good luck with the upgrade either way!) |
| 14:13 |
csharp_ |
thanks! |
| 14:14 |
Dyrcona |
I think you can skip it without danger. |
| 14:17 |
csharp_ |
Dyrcona++ # gonna skip it! |
| 14:25 |
Dyrcona |
Ugh. Half of a script just blew up because I did one of the steps early. |
| 14:27 |
Dyrcona |
Well, not quite half. |
| 14:28 |
Dyrcona |
I think I should be able to test OpenSRF now. |
| 14:30 |
Bmagic |
does the SIP 98/99 keepalive refresh the memcached authtoken? |
| 14:30 |
Dyrcona |
Bmagic: Not sure. |
| 14:50 |
jeffdavis |
Bmagic: Are you using SIPServer or SIP2Mediator? |
| 10:40 |
Dyrcona |
I resized the Evergreen logo and that's all I care about right now. |
| 10:40 |
redavis |
++ |
| 10:41 |
Dyrcona |
This is a junked up dev system, so I probably have settings to do or something else is broken. |
| 10:44 |
Dyrcona |
I'll throw this on the actual test system and see what happens. It's only a template change. |
| 10:45 |
* sleary |
*always* forgets the self-check URL |
| 10:47 |
redavis |
Hah. I didn't remember it at all. Just went to docs and found it there. |
| 10:47 |
* redavis |
has NO memory for such things |
| 16:02 |
Dyrcona |
I think you need OpenSRF main for Redis to actually work. |
| 16:02 |
csharp_ |
OpenSRF main & Evergreen 3.14.3 |
| 16:02 |
Dyrcona |
OK, that combo should be OK. |
| 16:02 |
csharp_ |
yeah, I have it working on our test servers, but I did a whole bunch of manual sh... stuff |
| 16:03 |
Dyrcona |
Ah, that'll do it. |
| 16:24 |
Bmagic |
Dyrcona: I have a git question that I think you will probably know the answer to. I have a freshly cloned Evergreen repo and the working repo added. git fetch --all has been run on it. But when I issue this command: git show 60fee76598effde1b800fdad8ed23eed03c853a9 - bad object |
| 16:24 |
Bmagic |
but I can see it in gitolite https://git.evergreen-ils.org/?p=working/Evergreen.git;a=commit;h=60fee76598effde1b800fdad8ed23eed03c853a9 |
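One common cause of "bad object" here: the commit only exists under a ref pushed after the last fetch, or under a ref the remote's fetch refspec does not cover. A minimal sketch, assuming the working repository was added as a remote named "working"; the branch name is a placeholder, not taken from the log:

    # confirm the remote is configured, then refresh its refs
    git remote -v
    git fetch working
    # or fetch only the branch that contains the commit
    git fetch working <branch-that-contains-the-commit>
    # the object should now resolve
    git show 60fee76598effde1b800fdad8ed23eed03c853a9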
| 10:47 |
Bmagic |
Another one was Load PO records from MARC, with an especially beefy Order, like 900 records |
| 10:48 |
Bmagic |
It was those two tickets that put me over the edge, and changed all production back to ejabberd for the time being |
| 10:49 |
Bmagic |
csharp_: but my issue is likely compounded by docker. Though, it's interesting (and a bit of a relief) that it's not only containers that make the problem OOM/resource issue bubble up |
| 10:49 |
berick |
was about to ask.. in any event, offer stands to log in to test systems and poke around |
| 10:50 |
Bmagic |
berick++ # I will likely take you up on that as soon as I can get something that breaks reliably. Oh and that's another thing: my test machine worked fine with the same MARC export test. It was only a problem on the production machine. Likely due to hardware differences |
| 10:52 |
Bmagic |
ejabberd being slower might be a feature, not a bug, lol |
| 10:52 |
berick |
it's not unheard of |
| 10:52 |
Bmagic |
redis makes the hardware burn bright like a star until it burns out |
| 15:07 |
abneiman |
the other item is still pending |
| 15:08 |
shulabramble |
#action awaiting word from gmcharlt concerning bug 2051946 |
| 15:08 |
pinesol |
Launchpad bug 2051946 in Evergreen "institute a Git commit message template" [Wishlist,New] https://launchpad.net/bugs/2051946 - Assigned to Galen Charlton (gmc) |
| 15:08 |
shulabramble |
#action gmchartl will reach out about POeditor git stuff prior to the next point release cycle |
| 15:09 |
shulabramble |
#action sleary and sandbergja will report progress on test writing wiki pages next month |
| 15:09 |
sleary |
some small progress has been made! more to come. |
| 15:09 |
shulabramble |
sleary++ sandbergja++ |
| 15:10 |
shulabramble |
#topic sleary will make a new LP tag denoting bugs that involve string changes |
| 16:06 |
jeff |
(I haven't confirmed that theory yet, but noticed the symptoms and the deviation from expected-by-me behavior.) |
| 16:22 |
berick |
jeff: hm, i wonder if that's the real source of the issue. at a glance, the Angular is not looking at the 'context' on the eg.print.config.$context settings |
| 16:22 |
berick |
which makes sense, since it's redundant info |
| 16:23 |
jeff |
Testing further, I can end up with the same "JSON value contains unexpected default context on non-default setting names", but I can't make it cause a problem with context confusion on a test machine, so the issue on the problem machine may be subtly different. |
| 16:24 |
berick |
the context can be overridden per template + workstation via workstation setting |
| 16:24 |
jeff |
I wonder if Hatch (java) was confused by the presence of, and deletion of, a duplicate printer name. |
| 16:24 |
jeff |
*nod* |
| 16:43 |
csharp_ |
@band add The Way of the Dojo |
| 16:43 |
pinesol |
csharp_: Band 'The Way of the Dojo' added to list |
| 16:45 |
mmorgan |
berick: Ok, thanks! So is that is still set in the client via workstation settings - print templates - force printer context? |
| 16:47 |
jeff |
testing on this machine shows that the change to forced context did not take effect until I logged out of, then back into the client. it's possible also that it's cached server-side and I'm just lucking out by logging out then back in. |
| 16:48 |
jeff |
haven't poked any further yet. |
| 16:48 |
jeff |
we don't use this interface to print hold pull lists, and a library recently started making use of it, so I haven't experienced the sharp edges before now. :-) |
| 16:49 |
berick |
i believe there's also an issue sharing workstation setting values across angular and angjs within the same login session |
| 16:49 |
berick |
what with the caching |
| 16:49 |
jeff |
also, judging by the 419 error on printing a larger hold pull list, I'm guessing that the entire set of data is round tripped from the server to the client then back up to the server for formatting and back down to the client to print? |
| 09:36 |
* redavis |
attempts to shift the earworm |
| 09:36 |
redavis |
I mean, it is...but is it? |
| 09:36 |
abneiman |
hahahaha |
| 09:37 |
redavis |
I never stop thinkin' about tomorrow (except during meditation and mindfulness practices) |
| 09:39 |
* redavis |
is now thinking about cults and ashrams and gurus and psychedelics and testing documentation. |
| 09:39 |
redavis |
You're welcome |
| 09:41 |
* redavis |
also pictures Gandalf wearing a Flavor Flav neck chain that has a big gold pendant with the letters "ADHD" instead of a clock as he stands on the bridge over the precipice screaming at the "other things" my brain wants to think about screaming "YOU SHALL NOT PASS." |
| 09:42 |
* redavis |
might have had too much caffeine and too little sleep. |
| 09:42 |
redavis |
maybe |
| 09:42 |
redavis |
or...it could just be Thursday |
| 09:45 |
berick |
i was on a flight w/ Flavor Flav once. sadly none of those things happened. |
| 09:46 |
redavis |
I mean, just being on a flight with him is something of note. even without LOTR integrations. |
| 09:49 |
Dyrcona |
S'OK. I woke up with "Harden my Heart" by Quarterflash playing in my mind this morning. |
| 15:54 |
berick |
Bmagic: i exported 3k bibs from :dev docker image, no issues :\ |
| 15:55 |
berick |
docker running within a VM on my desktop. not using docker desktop. |
| 16:48 |
Bmagic |
berick: go 32k |
| 16:49 |
Bmagic |
I haven't tried it on my test docker machine yet, but darn it, I thought that would be a good test to find an error |
| 16:52 |
berick |
oh i thought you said 2k was the breaking point |
| 17:06 |
Bmagic |
It did seem to not work at 2k. I think we want it to break so we can figure out why |
| 17:07 |
Bmagic |
I'm off until Monday but if I have some time, I'll see if I can get some more information |
| 09:55 |
Dyrcona |
So, I'm just throwing that out there for no real reason. |
| 10:43 |
Bmagic |
interesting |
| 10:43 |
Bmagic |
Still using Redis? |
| 10:45 |
Bmagic |
Along the Redis lines: I think I found a new reliable test. Exporting bibs via CSV. On Redis, in a container, it dies pretty reliably with CSVs of more than 2k bibs. Which is nice, because it was looking like the only way to test the Redis issue was with high volumes of AT events, which means I'd need production data. |
| 10:46 |
Dyrcona |
No, not using Redis. That's on the production utility vm with Ejabberd. |
| 10:47 |
berick |
Bmagic: are things crashing or just not working? |
| 10:48 |
Bmagic |
berick: I would like to answer that more thoroughly, but right now I don't have anything captured |
| 10:49 |
csharp_ |
@band add Trash Folder |
| 10:49 |
pinesol |
csharp_: Band 'Trash Folder' added to list |
| 10:49 |
Bmagic |
Dyrcona: interesting |
| 10:50 |
Bmagic |
Dyrcona: I don't expect any issues with Redis outside of a container. berick: same for you, if you're testing the bib export idea on a VM, I don't expect an issue. |
| 10:50 |
Dyrcona |
Also, FWIW, acquisitions does NOT have that issue on another machine running Redis with the same code, almost the same data. The big difference: the database is on a different machine. On the vm with the loading issues, the db runs on the same "hardware." |
| 10:50 |
csharp_ |
berick: we're testing 3.14 + RediSRF and it's going well so far, but we did have an OOM kill for OpenSRF when someone was working with buckets apparently |
| 10:50 |
berick |
Trash Folder's first song: I Gotta Bad Feeling About This |
| 10:50 |
csharp_ |
I don't have much data about that yet, though |
| 10:50 |
csharp_ |
berick++ |
| 16:02 |
berick |
bleh, forgot docker within lxd does not play well. starting over w/ qemu/kvm |
| 16:05 |
Bmagic |
Docker in docker? |
| 16:06 |
Dyrcona |
You can run vms in vms, but I wouldn't recommend it.... |
| 16:07 |
Dyrcona |
I have an Ubuntu Desktop image that I run in lxd when I want to test the Evergreen client with a different timezone or whatever. |
| 16:14 |
csharp_ |
make sure to upgrade rsync everybody! https://learn.cisecurity.org/webmail/799323/2363710855/710d456a01842242223c665efab6fe7e542b968b56c3fa24c95977779ba85770 |
| 16:14 |
csharp_ |
Ubuntu version: https://ubuntu.com/security/CVE-2024-12084 |
| 16:14 |
pinesol |
The LearnDash LMS plugin for WordPress is vulnerable to Sensitive Information Exposure in all versions up to, and including, 4.10.2 via API. This makes it possible for unauthenticated attackers to obtain access to quiz questions. (http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2024-1208) |
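For Ubuntu hosts, checking and applying the rsync fix is a quick apt operation; this is a sketch under the assumption of a stock apt setup:

    # current version and whether a patched package is pending
    rsync --version | head -1
    apt list --upgradable 2>/dev/null | grep rsync
    # pull in only the updated rsync package
    sudo apt-get update && sudo apt-get install --only-upgrade rsync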
| 09:53 |
mantis1 |
Does anyone use Long Overdue? |
| 10:02 |
* csharp_ |
raises hand on behalf of PINES |
| 10:03 |
csharp_ |
mantis1: I'm in a meeting, but ask your question and I'll see if I can help |
| 10:06 |
mantis1 |
I'm trying to do some testing with it. We never tried it before. I just wanted to see if anyone has an example of their AT for it. |
| 10:06 |
mantis1 |
I guess that isn't really a question |
| 10:08 |
|
Christineb joined #evergreen |
| 10:17 |
Bmagic |
mantis1: the first thing to know about Long Overdues is that it's the exact same thing as LOST. It just results in a different status for the item. Some people like the word "Long Overdue" instead of LOST. In its current implementation, it's not possible to also have LOST AT setup. |
| 10:18 |
mantis1 |
ok so it's either one or the other is what you're saying Bmagic? |
| 10:18 |
mantis1 |
Bmagic++ |
| 10:20 |
Bmagic |
I created a branch that allows the LOST AT to react to a previously-marked Long Overdue. It's a bit crusty, and none of our libraries are using it at this time: bug 1331174 |
| 10:20 |
pinesol |
Launchpad bug 1331174 in Evergreen "Long Overdue processing needs org unit settings separate from Lost Processing" [Wishlist,Confirmed] https://launchpad.net/bugs/1331174 |
| 10:21 |
mantis1 |
Bmagic: I can look at it since we're testing now |
| 10:21 |
mantis1 |
thank you though! |
| 10:21 |
Bmagic |
mantis1++ |
| 10:29 |
mantis1 |
Should the docs be changed to reflect this? There just isn't anything in there right now. |
| 10:29 |
mantis1 |
*that at least explains the situation with the bug |
| 15:07 |
abneiman |
he has been poked |
| 15:08 |
shulabramble |
#action abneiman will reach out to gmcharlt concerning bug 2051946 and POEditor git integration |
| 15:08 |
pinesol |
Launchpad bug 2051946 in Evergreen "institute a Git commit message template" [Wishlist,New] https://launchpad.net/bugs/2051946 - Assigned to Galen Charlton (gmc) |
| 15:08 |
shulabramble |
#action sleary and sandbergja will report progress on test writing wiki pages next month |
| 15:08 |
sleary |
carry over, please! sorry to keep kicking this down the road |
| 15:08 |
shulabramble |
that's fine! I mistyped so it's already kicked :) |
| 15:09 |
sleary |
:) |
| 15:09 |
shulabramble |
#topic sleary will make a new LP tag denoting bugs that involve string changes |
| 15:09 |
sleary |
whoops, that slipped my mind. Will do it this afternoon |
| 15:09 |
shulabramble |
#action sleary will make a new LP tag denoting bugs that involve string changes |
| 15:09 |
shulabramble |
sleary++ |
| 15:09 |
shulabramble |
#topic sandbergja and sleary will revisit feasibility of automated testing for string changes |
| 15:10 |
sandbergja |
also please carry over! |
| 15:10 |
sleary |
second verse, same as the first |
| 15:10 |
shulabramble |
#action sandbergja and sleary will revisit feasibility of automated testing for string changes |
| 15:10 |
shulabramble |
#topic abneiman will poll concerning moving the developer's meeting from IRC to a different platform |
| 15:10 |
abneiman |
ask not for whom the can kicks |
| 15:10 |
abneiman |
it kicks for thee! |
| 15:10 |
terranm |
lol |
| 10:59 |
|
sandbergja joined #evergreen |
| 11:16 |
|
smayo joined #evergreen |
| 12:01 |
|
jihpringle joined #evergreen |
| 13:15 |
Bmagic |
berick: back to my Acq order load issue: I switched my test VM back to ejabberd and it imported fine |
| 13:16 |
Bmagic |
So that sucks |
| 13:19 |
Dyrcona |
Bmagic: What, exactly, did you do to test the acq order load? We might want to give it a try on our dev vm. |
| 13:19 |
Bmagic |
I'm formulating a theory that it's Docker+redis that is introducing [something]. I couldn't find anything in the logs that helped me, but I think I'll run it again on Redis and capture the whole osrf log and post it. Before I do that, I wonder if you have any other logs that I should be looking at or perhaps changing settings for more verbosity? |
| 13:21 |
Bmagic |
I have a marc file with 891 marc records tailored for a certain library's shelving locations/Circ lib/etc. And it loads the order fine, but the bib records aren't created nor matched. I ran through the exact* same steps with the same exact data on the same exact database, and success on ejabberd and fail on redis. A little more info: loading the same file via MARC import (not acq) works fine. |
| 13:22 |
Dyrcona |
Bmagic: Thanks. That sounds like something we could test. |
| 13:23 |
Bmagic |
I'm not sure the MARC file matters all that much. When I cut the file down to smaller chunks, it worked on Redis. So there's another clue |
| 13:24 |
Dyrcona |
Yeah. Number of records might matter. |
| 13:26 |
Bmagic |
For a little while I was thinking that there was a record somewhere in the file that was tripping the code. Splitting and splitting and splitting the file down to 10 record chunks eventually started giving me successful loads. Once I got it down to 10, I figured the other 10-record half would fail because the 20-record file with both halves failed, but to my surprise, both halves worked |
| 13:26 |
Bmagic |
3.14 |
| 13:27 |
Bmagic |
which is an interesting point: The ticket was sent to us when the system was still on 3.13 with Redis. And in the interim, we've upgraded the system to 3.14. The version of Evergreen doesn't seem to make a difference here. I'm seeing the same issue on 3.14 |
| 13:28 |
Dyrcona |
OK. I |
| 13:28 |
Dyrcona |
I'll ask our catalogers what they've tested in acquisitions. |
| 13:28 |
Dyrcona |
I'm pretty sure someone was looking at acq the other day. |
| 13:28 |
Bmagic |
Dyrcona++ |
| 13:49 |
Dyrcona |
I asked, and it looks like someone is trying it out right now. |
| 15:16 |
berick |
Bmagic: for the imports, there are no errors or crashes anywhere? |
| 15:17 |
Bmagic |
nadda |
| 15:17 |
Bmagic |
you'd think it would have an error somewhere. That's why I was asking if I needed to look in a different log, something for Redis specifically maybe |
| 15:19 |
berick |
there is a redis log, and journalctl |
| 15:20 |
berick |
i have a hunch redis itself is fine. it's pretty battle tested. more likely some other moving part. |
| 15:22 |
Bmagic |
the clues are: size and docker. In both cases, it was large action triggers (at least in that case we got a SEGFAULT), and large MARC file in ACQ PO order load. One guess is that Redis is bumping up against some kind of docker artificial ceiling in terms of resources, and when it's denied, it silently breaks |
| 15:23 |
Bmagic |
I had a problem several years ago having to do with ulimits. This feels similar |
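If the container-ceiling theory is right, the limits and any OOM activity are at least inspectable. A rough sketch of where to look; the container name is a placeholder and which limit matters is an assumption:

    # memory cap and ulimits applied to the container
    docker inspect -f '{{.HostConfig.Memory}} {{.HostConfig.Ulimits}}' <evergreen-container>
    # live per-container resource usage
    docker stats --no-stream
    # kernel-level evidence of OOM kills
    dmesg -T | grep -iE 'oom|killed process'
    journalctl -k | grep -i oom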
| 15:24 |
Dyrcona |
Bmagic: I saw an open-ils.trigger drone using 3.3GB briefly yesterday. I think it was hanging around from Tuesday. |
| 15:25 |
Bmagic |
maybe ejabberd did a better job of normalizing the bursts? |
| 15:26 |
Bmagic |
probably the interplay with the OpenSRF code |
| 15:26 |
Dyrcona |
You might be on to something. It could be that ejabberd spread the work out more. I haven't tried quantifying it, but it looks like fewer drones are used with Redis. |
| 15:27 |
Dyrcona |
I have no data for that, but it would be worth figuring out how to test it. |
| 15:32 |
Dyrcona |
I suppose we could test it with two comparable systems and then run things like the hold targeter and fine generator with the same settings on the same data. |
| 15:37 |
berick |
i can promise higher/faster throughput for redis vs. ejabberd, even more so for larger messages |
| 15:39 |
berick |
Bmagic: fwiw, imported 1k records via vandelay with no issue on a local non-docker vm. not a 1-to-1 experiment, but still |
| 15:40 |
Bmagic |
oddly enough, vandelay works, it's acq that fails |
| 16:56 |
berick |
thanks Bmagic |
| 17:05 |
|
mmorgan left #evergreen |
| 17:29 |
berick |
Bmagic: 1k recs into a new PO worked ok here. bibs and line items. records only, though, no copies |
| 17:29 |
Bmagic |
that's pretty huge, just 891 for my test. But good to know. Though, I am loading copy info via vendor tag mapping |
| 08:34 |
|
mmorgan joined #evergreen |
| 08:38 |
|
dguarrac joined #evergreen |
| 09:10 |
|
Dyrcona joined #evergreen |
| 09:13 |
Dyrcona |
Going to run all of the same "tests" as yesterday on the dev machine. Running autorenew now, and it looks good so far. I realized that yesterday's run may have had very little to do since it was probably doing things due on New Year's Day. (There shouldn't have been any, but there almost always are a couple of things due on closed days because of manually adjusted due dates.) |
| 09:13 |
Dyrcona |
...or missing closed dates. |
| 10:19 |
* Dyrcona |
waits on autorenew to finish. I'm going to schedule the hold_targeter at X:30 and the fine generator at (X+1):00 once the autorenew finishes. This will reflect how they're run from cron, so they might overlap. |
| 10:19 |
Dyrcona |
Depending on how that goes, I may also reduce the hold_targeter parallel number from 6 to 3 or 4. |
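The staggered schedule described above would look something like this in cron; paths are assumptions, and each script's usual config/lockfile arguments and the hold targeter's parallel option are omitted since they vary by install:

    # hold targeter at half past the hour, fine generator on the hour
    30 * * * *  /openils/bin/hold_targeter.pl
    0  * * * *  /openils/bin/fine_generator.pl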
| 10:43 |
Dyrcona |
hm.. maybe I should have skipped the autorenew. Looks like there's a bit over 32,700 of them, and it's completed 13,501 of them so far. |
| 11:00 |
Dyrcona |
I estimate it will take another 2 hours for the autorenew to complete, so I could schedule the other things for 1:30 and 2:00 pm. They don't run while autorenew is running normally, so it wouldn't be a fair test to start them now. |
| 11:07 |
Dyrcona |
Scheduled a run-pending a/t runner at 11:30. It does run while autorenew is going, so we'll see what happens. It may push things over the top. |
| 11:13 |
|
sandbergja joined #evergreen |
| 12:04 |
|
mmorgan left #evergreen |
| 12:30 |
|
collum joined #evergreen |
| 13:08 |
Dyrcona |
Looks like the action_trigger_runners did OK. No out of memory errors while they both were running. Looks like they're done. |
| 13:11 |
Dyrcona |
I might try one of the aspen jobs later. |
| 13:28 |
Dyrcona |
It looks like someone is testing offline circulation, and I see redis is using 30MB of memory at the same time. |
| 14:22 |
Dyrcona |
Bmagic: Are you around today? I've got a question about setting sysctl entries in the ansible playbook for the dev/test system. |
| 14:43 |
* Dyrcona |
wonders if the sysctl ansible module is available. |
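The module does exist (ansible.posix.sysctl). A minimal sketch of an ad-hoc call; the inventory group and the particular key/value (Redis's usual overcommit recommendation) are illustrative assumptions, and the same parameters work in a playbook task:

    # persistently set a kernel parameter on the dev/test hosts
    ansible devtest -b -m ansible.posix.sysctl \
      -a "name=vm.overcommit_memory value=1 state=present sysctl_set=yes reload=yes"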
| 15:02 |
Dyrcona |
Well, I'll be back on Thursday. |
| 19:20 |
|
kworstell-isl joined #evergreen |
| 09:21 |
Dyrcona |
As for redis crashing, I suspect, but do not know, that it is likely making an additional copy of the data as messages get passed around, thereby increasing the memory pressure. |
| 09:22 |
Dyrcona |
BTW, beam.smp running Ejabberd on the one system is using 4224472K virt. That's 4GB! |
| 09:23 |
Dyrcona |
Redis doesn't even show up when top -c is sorted by memory consumption when idle. |
| 09:26 |
Dyrcona |
On the mostly unused test system Redis has VSZ of 74520K or 74MB. On the one where I just restarted it, the VSZ is 76460K. |
| 09:28 |
Dyrcona |
Bmagic || berick : See my monologue above if you get this notification. If not, it's in the logs for future reference. |
| 09:29 |
Dyrcona |
So, I'm going to rung the fine generator and keep an eye on top sorted by memory consumption. |
| 09:32 |
Dyrcona |
s/rung/run/ but you get it. |
| 09:37 |
Dyrcona |
OK. Going to restart the VM, first. |
| 09:39 |
Dyrcona |
Huh. Had to 'kill' simple2zoom twice. Once for each process. Guess they're not spawned in the same process group? |
| 09:41 |
Dyrcona |
Gotta remember to start websocketd. It's not setup with systemd. |
| 09:41 |
Dyrcona |
Think I'll do these tests on a couple of other vms to see what difference there is with memory use between Redis and Ejabberd if any. |
| 09:42 |
berick |
Dyrcona: read it, but i'm unclear what the takeaway is |
| 09:42 |
Dyrcona |
Well, I'm not sure there's a takeaway, yet, except that storage drones use a lot of memory regardless. |
| 09:44 |
Dyrcona |
I'm in the process of ferreting out where the high memory use happens, and I'm logging it publicly rather than keeping private notes. It's a quiet day in IRC, so I don't think anyone will really mind much. If they do, no one has said anything, yet. |
| 11:43 |
Dyrcona |
Doing anything with the database cranks up the buffers/cache. |
| 11:44 |
jeff |
expected, since you're reading from disk and the kernel is caching that read data. |
| 11:45 |
Dyrcona |
Yes. |
| 11:47 |
Dyrcona |
My testing points the finger away from redis and more towards running everything on 1 machine. |
| 11:47 |
jeff |
how much physical memory and how much swap does this machine/vm/whatnot have? |
| 11:47 |
jeff |
can you share the output of this query somewhere? SELECT name, setting, unit, source FROM pg_settings WHERE name ~ 'mem'; |
| 11:48 |
Dyrcona |
It says 30G of memory, no swap. (I've considered adding swap, but haven't bothered because it would only delay the inevitable.) |
| 15:46 |
Dyrcona |
For certain definitions of crazy. :) |
| 15:46 |
Dyrcona |
No, I mostly agree. |
| 15:47 |
Dyrcona |
Looks like they've been actually doing more stuff based on the run time. |
| 15:49 |
Dyrcona |
I plan to set shared buffers back to the default in a bit and restart Pg. I'll run the same tests again tomorrow. If that's all good, I'll enable all but our Aspen scheduled jobs on Thursday. |
| 15:52 |
Dyrcona |
Looking at the overcommit documentation, adding the swap may have helped in ways other than just being an overflow space. It looks like swap amount is used in the overcommit calculations for the heuristic and don't overcommit methods. |
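The inputs to that calculation are all visible from /proc; a quick sketch of what to check (nothing here is specific to this VM):

    # 0 = heuristic, 1 = always overcommit, 2 = never exceed CommitLimit
    cat /proc/sys/vm/overcommit_memory /proc/sys/vm/overcommit_ratio
    # under mode 2, CommitLimit = SwapTotal + (RAM * overcommit_ratio / 100)
    grep -E 'Commit|SwapTotal' /proc/meminfo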
| 15:54 |
Dyrcona |
I'm curious to look at that part of the kernel code, but not so curious that I'm going to do it right now. |
| 16:00 |
Dyrcona |
Without doing anything buffers/cached dropped by half just now. I think the cache pressure setting is also helping. |
| 09:01 |
|
dguarrac joined #evergreen |
| 09:04 |
|
Dyrcona joined #evergreen |
| 09:05 |
Dyrcona |
Hm. Had issues with a couple of overnight jobs on the dev system, but the router is still running. |
| 09:06 |
Dyrcona |
No problems with Autorenew on the other test vm, either. |
| 09:07 |
Dyrcona |
Got this in the email from the badge_score_generator: [auth] WRONGPASS invalid username-password pair, at /usr/share/perl5/Redis.pm line 311. |
| 09:08 |
Dyrcona |
Haven't seen that one before. All the services are running OK. |
| 09:09 |
Dyrcona |
refresh carousels reported "Unable to bootstrap client for requests." When I saw that I thought it might have been because of a crash. |
| 10:21 |
Dyrcona |
I did recently switch the fine generator to run at 30 minutes past the hour because we were getting regular emails from the monitor about the number of processes running when it and the hold targeter started at the same time. |
| 10:22 |
Dyrcona |
We're running password reset every minute, and the pending a/t running every half hour. |
| 10:23 |
|
redavis joined #evergreen |
| 10:23 |
Dyrcona |
We have a SQL script sending data to Bywater for our test Aspen installation every 5 minutes. That could be using a lot of RAM..... |
| 10:25 |
Dyrcona |
The person testing acq yesterday said that they did experience Evergreen stop working at that time. They also encountered Lp 2086786. I don't think it's related. |
| 10:25 |
pinesol |
Launchpad bug 2086786 in Evergreen 3.13 "Acq: Multi-Branch Libraries Can't See All Their Branch Funds in Invoices" [High,Fix committed] https://launchpad.net/bugs/2086786 |
| 10:26 |
Dyrcona |
Someone else mentioned Evergreen "crashing" at that time. |
| 10:29 |
berick |
Dyrcona: mind checking periodically to see if Redis memory usage is slowly climbing? if that's the case, I have a script that will help |
| 10:35 |
Dyrcona |
And, that was easy. |
| 10:38 |
Dyrcona |
Hm... it's 85MB of VM, rss is 13MB at the moment. |
| 10:41 |
Dyrcona |
FWIW, I'm monitoring it this way: ps -o etime,time,rss,size,vsize 15997 [that's the actual PID] |
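To watch for slow growth over hours, the same call can be wrapped in a loop; the PID, interval, and log path are assumptions, and redis-cli may need credentials on this setup given the WRONGPASS error noted earlier:

    # sample redis-server memory every 5 minutes
    while sleep 300; do
      date
      ps -o etime,time,rss,size,vsize -p 15997
      redis-cli INFO memory | grep -E 'used_memory_human|used_memory_peak_human'
    done >> /tmp/redis-mem.log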
| 10:45 |
Dyrcona |
I have not had it crash on the other test server, but I don't usually run cron, and it's almost never used. |
| 10:46 |
Dyrcona |
It should be quiet next week. I'll try recompiling OpenSRF and Evergreen with debug options turned on. That might help, particularly if I can figure out how to reproduce a crash. |
| 10:57 |
Dyrcona |
Think I'll do that on my other test vm, too, if I haven't already. I'll do that after I test the 3.12.10 tarball. |
| 11:03 |
|
sandbergja joined #evergreen |
| 11:19 |
|
Christineb joined #evergreen |
| 12:24 |
|
kworstell-isl joined #evergreen |
| 09:24 |
mmorgan |
Dyrcona: re: open-ils.cat.biblio.record.metadata.retrieve, suspect it's to access the creator and editor. |
| 09:29 |
berick |
Dyrcona: pcrud to metabib.metarecord and metabib.metarecord_source_map will do it. |
| 09:35 |
Dyrcona |
mmorgan++ berick++ |
| 09:36 |
Dyrcona |
I want to get the metarecord info for Aspen, so I'm going to see if the mods is enough. I think something like that goes on with copy holds, but I'm now in the middle of updating my test/dev vms. |
| 09:39 |
Dyrcona |
Hm.. looks like my db server is still booting or failed to boot all the way... |
| 09:39 |
|
sandbergja joined #evergreen |
| 09:42 |
Dyrcona |
Yeah. Power was off. I'm pretty sure I did 'reboot' and not 'shutdown' |
| 12:47 |
Dyrcona |
berick++ that worked! |
| 12:48 |
Dyrcona |
JBoyer: I think you're correct when it comes to pcrud and cstore, now that I've played with pcrud a bit. I don't usually try this from srfsh. |
| 12:53 |
Dyrcona |
This particular record's mods field is null anyway. |
| 12:55 |
Dyrcona |
I wonder if we missed an update somewhere? My test database has 1,623,928 metarecord entries with null mods, and 591 where it isn't null. |
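For anyone wanting to check their own database, a sketch of the count; the database name is an assumption:

    # metarecords with and without a cached mods blob
    psql evergreen -c "SELECT (mods IS NOT NULL) AS has_mods, count(*)
                       FROM metabib.metarecord GROUP BY 1;"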
| 12:58 |
Dyrcona |
Production has similar numbers. |
| 13:00 |
|
BDorsey_ joined #evergreen |
| 13:21 |
|
jvwoolf joined #evergreen |
| 13:59 |
csharp_ |
redavis: I'm here - sorry, was in meetings all morning and just back from lunch |
| 14:00 |
redavis |
No worries at all. csharp_, we're just waiting on 3.13.7 and testing tarballs |
| 14:00 |
csharp_ |
onky donky |
| 14:22 |
|
kworstell-isl joined #evergreen |
| 14:32 |
|
BDorsey__ joined #evergreen |
| 15:32 |
|
mantis left #evergreen |
| 15:38 |
Dyrcona |
Likely Autorenew blew it up. |
| 15:38 |
Dyrcona |
That starts at 2:30 am. |
| 15:47 |
Dyrcona |
I have a way to test that hypothesis. I have another machine with half the ram set up to use Redis. It has the same dump loaded. If I run autorenew right now, it should get almost the same data to process. |
| 15:48 |
Bmagic |
Dyrcona++ |
| 15:51 |
Dyrcona |
And, they're off! |
| 15:51 |
redavis |
sandbergja++ |
| 15:07 |
shulabramble |
Okay, then |
| 15:07 |
shulabramble |
#action gmcharlt - create a Git commit message template and update bug 2051946 |
| 15:07 |
pinesol |
Launchpad bug 2051946 in Evergreen "institute a Git commit message template" [Wishlist,New] https://launchpad.net/bugs/2051946 - Assigned to Galen Charlton (gmc) |
| 15:07 |
shulabramble |
#action waiting on gmcharlt for access to POEditor for git integration |
| 15:08 |
shulabramble |
#info sleary and sandbergja will report progress on test writing wiki pages next month |
| 15:08 |
sleary |
I've been out sick; please carry forward |
| 15:08 |
sleary |
(sorry sandbergja!) |
| 15:08 |
shulabramble |
Aww! Hope you're feeling better. |
| 15:08 |
sandbergja |
no problem! |
| 15:08 |
redavis |
Just throwing the agenda here for new people - https://wiki.evergreen-ils.org/doku.php?id=dev:meetings:2024-12-10 |
| 15:09 |
sandbergja |
one note: outside of what sleary and I were working on, I figured out how to run perl unit tests with the perl debugger |
| 15:09 |
shulabramble |
redavis++ thanks, I've had an interesting day and that slipped my mind. |
| 15:09 |
sandbergja |
and added my notes here in case they are useful for anyone: https://wiki.evergreen-ils.org/doku.php?id=dev:testing:debugging_perl_unit_tests |
| 15:09 |
shulabramble |
sandbergja++ |
| 15:09 |
sleary |
sandbergja++ |
| 15:09 |
berick |
sandbergja++ |
| 15:12 |
sleary |
whoops! on it |
| 15:13 |
shulabramble |
#action sleary will make a new LP tag denoting bugs that involve string changes |
| 15:13 |
redavis |
sleary++ |
| 15:13 |
shulabramble |
#topic revisit feasibility of automated testing for string changes |
| 15:13 |
shulabramble |
berick++ sleary++ |
| 15:15 |
shulabramble |
anything on this? |
| 15:16 |
abneiman |
tbh I don't recall whose action item that is |
| 15:16 |
shulabramble |
neither do i. |
| 15:16 |
|
ajarterburn joined #evergreen |
| 15:18 |
sleary |
sandbergja is that something we can add to our QA list? |
| 15:18 |
sandbergja |
Sure! Sounds fun |
| 15:18 |
shulabramble |
#action sandbergja and sleary will revisit feasibility of automated testing for string changes |
| 15:18 |
sleary |
sandbergja++ |
| 15:19 |
shulabramble |
sandbergja++ sleary++ |
| 15:19 |
sandbergja |
sleary++ |
| 15:21 |
abneiman |
shulabramble: yes I have your email, thanks |
| 15:21 |
shulabramble |
zipping through this agenda at top speed. |
| 15:22 |
Dyrcona |
Lp 2055796 |
| 15:22 |
pinesol |
Launchpad bug 2055796 in Evergreen "Have github actions run pgtap tests for us" [Medium,Fix committed] https://launchpad.net/bugs/2055796 |
| 15:22 |
Dyrcona |
That's done. |
| 15:22 |
berick |
yeah |
| 15:23 |
shulabramble |
dyrcona++ berick++ |
| 15:37 |
redavis |
phasefx++ |
| 15:37 |
sandbergja |
phasefx++ |
| 15:38 |
phasefx |
Hand holding, pep talk, chocolate... |
| 15:38 |
sandbergja |
I can build and test a tarball |
| 15:38 |
shulabramble |
sandbergja++ |
| 15:38 |
* mmorgan |
can probably help with point releases next week |
| 15:38 |
redavis |
I'll call a meeting for the point releases once some more spots are filled out. So, email imminent today or tomorrow for that. |
| 11:07 |
Dyrcona |
berick: Is that in your rust repo on GH? |
| 11:07 |
berick |
if it crashes, there will be a file + line number + crash message in the kernel logs. none of this C segfault hunting madness. |
| 11:08 |
berick |
Dyrcona: yes |
| 11:08 |
Dyrcona |
I'll see about testing it on one of my vms. I'll put it on the one that I plan to use with a test Aspen instance. That should give it a work out. |
| 11:09 |
berick |
Dyrcona++ |
| 11:10 |
Bmagic |
berick++ # This is production, albeit a single cronjob forked off to its own machine for analyzing. I'm willing to use it as a test bed to hunt down this bug. Because, IMO, the only real way to do it is on production |
| 11:13 |
Dyrcona |
I think I'll get Aspen hooked up, first, and try the Rust router later. I'm already using Redis on that vm. I still have to figure out how to hook Aspen up to Evergreen and get the indexer going. I'll probably pop into Slack to ask questions. |
| 11:15 |
* Dyrcona |
*mumbles* Right. Lp bug about camera not working... |
| 11:17 |
Dyrcona |
Wait a minute... Maybe I read that update wrong and it autoremoved the drivers? |
| 10:14 |
Dyrcona |
I don't think setting that to 0 was ever meant to be fine. Looks like an unreported bug was fixed. |
| 10:22 |
|
stephengwills joined #evergreen |
| 10:25 |
mmorgan |
Bmagic++ |
| 10:48 |
Dyrcona |
Well, I have Aspen running, but it's not talking to a test Evergreen instance, yet. I suppose I could get answers in Slack, but I'll peruse the docs and the code first. |
| 10:51 |
|
kworstell-isl joined #evergreen |
| 11:26 |
pinesol |
News from commits: LP#2089419: fix parsing of offset/limit in C-based DB search methods <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=0458ae01ef6a84734efb7232bc1cdaf479dd3be8> |
| 11:53 |
|
jihpringle joined #evergreen |
| 12:14 |
jeffdavis |
If I'm reading Open-ILS/src/c-apps/oils_auth_internal.c correctly, seems like 0 is the default timeout value if auth.staff_timeout is unset. |
| 12:14 |
jeffdavis |
Does the login issue occur on a 3.14 system where auth.staff_timeout is unset? |
| 12:15 |
Bmagic |
here's another finding: if I was using a browser that already had a workstation registered, I could login! (even with the timeout setting set to 0) - but if I needed to do the workstation registration dance, I would be in a login loop |
| 12:15 |
Bmagic |
jeffdavis: I didn't test that |
| 12:16 |
Bmagic |
I'll try it on a test machine |
| 12:16 |
mmorgan |
jeffdavis: I found that the issue does NOT happen when auth.staff_timeout is unset, but it would be great to see confirmation. |
| 12:17 |
Bmagic |
mmorgan jeffdavis: confirmed, no value (no row) in actor.org_unit_setting, I can still login |
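A quick way to tell which state a given system is in (unset, zero, or a real interval); only the database name is an assumption:

    # no rows = unset (login works); a value of "0" is the state that triggers the loop
    psql evergreen -c "SELECT org_unit, name, value
                       FROM actor.org_unit_setting
                       WHERE name = 'auth.staff_timeout';"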
| 12:18 |
Dyrcona |
jeffdavis: What branch are you looking at? |
| 12:19 |
Bmagic |
FYI: I'm on main OpenSRF with the redis stuff merged |
| 12:19 |
|
jihpringle joined #evergreen |
| 12:19 |
Bmagic |
however, I've also tested on opensrf 3.3.2 (ejabberd) and I had the same problem/solution with the setting |
| 12:19 |
* mmorgan |
tested on 3.14.0 |
| 12:20 |
Bmagic |
mmorgan: can you confirm that a zero setting will result in the login loop? |
| 12:21 |
mmorgan |
Bmagic: Yes, I did observe that on 3.14.0, so can confirm. |
| 12:21 |
Bmagic |
and! you have to be sure you're not using a browser that has the workstation registered |
| 12:36 |
Dyrcona |
0 seems like a bad value for an infinite time out when the setting is meant to be an interval. -1 is probably better. |
| 12:36 |
Bmagic |
thinking about it more, the magic trick was probably having "route to" in the query string on the login page. Where the route to was an eg page and* the login page was eg |
| 12:37 |
Dyrcona |
I suspect you're overthinking it, and the problem is likely simpler than that. Some eg2 code is getting the timeout in seconds and treating 0 as 0, not as infinity. |
| 12:38 |
Bmagic |
I get it, yes, eg2 JS is treating 0 differently than eg. I understand that. I'm pontificating and going over in my head the various ways I was able to overcome the issue during testing. And I think it was when I was able to never touch eg2 during the auth process |
| 12:38 |
Dyrcona |
OK. |
| 12:41 |
Dyrcona |
The core eg2 auth service looks like it gets the timeout from the backend. |
| 12:43 |
Dyrcona |
Staff component looks like it uses the auth service. |
| 15:13 |
Dyrcona |
pinesol: No, but I did try turning it off and back on again. |
| 15:13 |
pinesol |
Dyrcona: http://images.cryhavok.org/d/1291-1/Computer+Rage.gif |
| 15:13 |
Dyrcona |
Yeah, pretty much. |
| 15:14 |
Dyrcona |
Oh, right. I was in the middle of installing some backports on a 3.7.4 test installation. |
| 15:15 |
Dyrcona |
I had just run the 'chown' command and when it seemed to take too long, that's when I knew something was up. :) |
| 15:23 |
Dyrcona |
Sometimes, you just gotta run desktop-clear.... |
| 15:30 |
|
mantis left #evergreen |
| 10:59 |
csharp_ |
redis and sip2-mediator and reports oh my! |
| 11:00 |
csharp_ |
"wellsir..." <slaps the roof of EG 3.12> "this Evergreen version has served us very well" |
| 11:01 |
berick |
and we (kcls) have moved on a bit from the stock EG mediator code, which would put you on the bleeding edge csharp_ fyi |
| 11:02 |
csharp_ |
I'll keep SIPServer around to crank up if needed |
| 11:03 |
csharp_ |
also, willing to test bleeding edge at this stage of our upgrade (targeted for mid February) |
| 11:07 |
berick |
in particular, the rust variant forgoes HTTP communication, a key facet of the original design, and opts for a more evergreen-centric opensrf client approach (i.e. direct redis communication). in the end, it's just more efficient and in some ways more practical. |
| 11:12 |
berick |
that is to say, though the http approach has its own benefits, there are no examples of it in use in the wild. |
| 11:12 |
berick |
that i'm aware of |
| 11:27 |
Bmagic |
csharp_: I feel your pain on the bot battle |
| 11:34 |
|
Dyrcona joined #evergreen |
| 11:37 |
mmorgan |
berick: jeff: Looking at bug 2076921. jeff's testing looks favorable, any reason not to roll it out? Our pc support tech is seeing changes being rolled out that disable the current extension. |
| 11:37 |
pinesol |
Launchpad bug 2076921 in Evergreen "Hatch: Chrome Extension Requires Redevelopment" [High,Confirmed] https://launchpad.net/bugs/2076921 - Assigned to Jeff Godin (jgodin) |
| 11:40 |
berick |
mmorgan: i see no reason to delay any longer |
| 11:41 |
|
sandbergja joined #evergreen |