| 10:59 |
|
sandbergja joined #evergreen |
| 11:16 |
|
smayo joined #evergreen |
| 12:01 |
|
jihpringle joined #evergreen |
| 13:15 |
Bmagic |
berick: back to my Acq order load issue: I switched my test VM back to ejabberd and it imported fine |
| 13:16 |
Bmagic |
So that sucks |
| 13:19 |
Dyrcona |
Bmagic: What, exactly, did you do to test the acq order load? We might want to give it a try on our dev vm. |
| 13:19 |
Bmagic |
I'm formulating a theory that it's Docker+redis that is introducing [something]. I couldn't find anything in the logs that helped me, but I think I'll run it again on Redis and capture the whole osrf log and post it. Before I do that, I wonder if you have any other logs that I should be looking at or perhaps changing settings for more verbosity? |
| 13:21 |
Bmagic |
I have a marc file with 891 marc records tailored for a certain library's shelving locations/Circ lib/etc. And it loads the order fine, but the bib records aren't created nor matched. I ran through the exact* same steps with the same exact data on the same exact database, and success on ejabberd and fail on redis. A little more info: loaded the same file via MARC import (not acq) works fine. |
| 13:22 |
Dyrcona |
Bmagic: Thanks. That sounds like something we could test. |
| 13:23 |
Bmagic |
I'm not sure the MARC file matters all that much. When I cut the file down to smaller chunks, it worked on Redis. So there's another clue |
| 13:24 |
Dyrcona |
Yeah. Number of records might matter. |
| 13:26 |
Bmagic |
For a little while I was thinking that there was a record somewhere in the file that was tripping the code. Splitting and splitting and splitting the file down to 10 record chunks eventually started giving me successful loads. Once I got it down to 10, I figured the other 10-record half would fail because the 20 record file with both halves failed, but to my surprise, both halves worked |
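The file-splitting approach described above could be scripted along these lines. This is only a rough sketch, not the procedure Bmagic actually used: it assumes the input is binary MARC 21 (ISO 2709), where each record ends with the 0x1D record terminator, and the function names and the `.partNNN` output naming are made up for illustration.

```python
from typing import Iterator, List

RECORD_TERMINATOR = b"\x1d"  # MARC 21 (ISO 2709) end-of-record byte


def split_marc(data: bytes) -> List[bytes]:
    """Return each binary MARC record, with its terminator restored."""
    return [chunk + RECORD_TERMINATOR
            for chunk in data.split(RECORD_TERMINATOR)
            if chunk.strip()]  # skip the empty tail after the last record


def chunked(records: List[bytes], size: int) -> Iterator[bytes]:
    """Yield concatenated groups of at most `size` records."""
    for start in range(0, len(records), size):
        yield b"".join(records[start:start + size])


def write_chunks(path: str, size: int = 10) -> List[str]:
    """Write <path>.part001, <path>.part002, ... (hypothetical naming)."""
    with open(path, "rb") as handle:
        records = split_marc(handle.read())
    names = []
    for number, blob in enumerate(chunked(records, size), start=1):
        name = f"{path}.part{number:03d}"
        with open(name, "wb") as out:
            out.write(blob)
        names.append(name)
    return names
```

Loading each part separately, then pairs of parts together, would help separate a "bad record" explanation from a "total size" explanation.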
| 13:26 |
Bmagic |
3.14 |
| 13:27 |
Bmagic |
which is an interesting point: The ticket was sent to us when the system was still on 3.13 with Redis. And in the interim, we've upgraded the system to 3.14. The version of Evergreen doesn't seem to make a difference here. I'm seeing the same issue on 3.14 |
| 13:28 |
Dyrcona |
OK. I |
| 13:28 |
Dyrcona |
I'll ask our catalogers what they've tested in acquisitions. |
| 13:28 |
Dyrcona |
I'm pretty sure someone was looking at acq the other day. |
| 13:28 |
Bmagic |
Dyrcona++ |
| 13:49 |
Dyrcona |
I asked, and it looks like someone is trying it out right now. |
| 15:16 |
berick |
Bmagic: for the imports, there are no errors or crashes anywhere? |
| 15:17 |
Bmagic |
nadda |
| 15:17 |
Bmagic |
you'd think it would have an error somewhere. That's why I was asking if I needed to look in a different log, something for Redis specifically maybe |
| 15:19 |
berick |
there is a redis log, and journalctl |
| 15:20 |
berick |
i have a hunch redis itself is fine. it's pretty battle tested. more likely some other moving part. |
| 15:22 |
Bmagic |
the clues are: size and docker. In both cases, it was large action triggers (at least in that case we got a SEGFAULT), and large MARC file in ACQ PO order load. One guess is that Redis is bumping up against some kind of docker artificial ceiling in terms of resources, and when it's denied, it silently breaks |
| 15:23 |
Bmagic |
I had a problem several years ago having to do with ulimits. This feels similar |
| 15:24 |
Dyrcona |
Bmagic: I saw an open-ils.trigger drone using 3.3GB briefly yesterday. I think it was hanging around from Tuesday. |
| 15:25 |
Bmagic |
maybe ejabberd did a better job of normalizing the bursts? |
| 15:26 |
Bmagic |
probably the interplay with the OpenSRF code |
| 15:26 |
Dyrcona |
You might be on to something. It could be that ejabberd spread the work out more. I haven't tried quantifying it, but it looks like fewer drones are used with Redis. |
| 15:27 |
Dyrcona |
I have no data for that, but it would be worth figuring out how to test it. |
| 15:32 |
Dyrcona |
I suppose we could test it with two comparable systems and then run things like the hold targeter and fine generator with the same settings on the same data. |
| 15:37 |
berick |
i can promise higher/faster throughput for redis vs. ejabberd, even more so for larger messages |
| 15:39 |
berick |
Bmagic: fwiw, imported 1k records via vandelay with no issue on a local non-docker vm. not a 1-to-1 experiment, but still |
| 15:40 |
Bmagic |
oddly enough, vandelay works, it's acq that fails |
| 16:56 |
berick |
thanks Bmagic |
| 17:05 |
|
mmorgan left #evergreen |
| 17:29 |
berick |
Bmagic: 1k recs into a new PO worked ok here. bibs and line items. records only, though, no copies |
| 17:29 |
Bmagic |
that's pretty huge, just 891 for my test. But good to know. Though, I am loading copy info via vendor tag mapping |
| 08:34 |
|
mmorgan joined #evergreen |
| 08:38 |
|
dguarrac joined #evergreen |
| 09:10 |
|
Dyrcona joined #evergreen |
| 09:13 |
Dyrcona |
Going to run all of the same "tests" as yesterday on the dev machine. Running autorenew now, and it looks good so far. I realized that yesterday's may have very little to do since it was probably doing things due on New Year's Day. (There shouldn't have been any, but there almost always are a couple of things due on closed days because of manually adjusted due dates.) |
| 09:13 |
Dyrcona |
...or missing closed dates. |
| 10:19 |
* Dyrcona |
waits on autorenew to finish. I'm going to schedule the hold_targeter at X:30 and the fine generator at (X+1):00 once the autorenew finishes. This will reflect how they're run from cron, so they might overlap. |
| 10:19 |
Dyrcona |
Depending on how that goes, I may also reduce the hold_targeter parallel number from 6 to 3 or 4. |
| 10:43 |
Dyrcona |
hm.. maybe I should have skipped the autorenew. Looks like there's a bit over 32,700 of them, and it's completed 13,501 of them so far. |
| 11:00 |
Dyrcona |
I estimate it will take another 2 hours for the autorenew to complete, so I could schedule the other things for 1:30 and 2:00 pm. They don't run while autorenew is running normally, so it wouldn't be a fair test to start them now. |
| 11:07 |
Dyrcona |
Scheduled a run-pending a/t runner at 11:30. It does run while autorenew is going, so we'll see what happens. It may push things over the top. |
| 11:13 |
|
sandbergja joined #evergreen |
| 12:04 |
|
mmorgan left #evergreen |
| 12:30 |
|
collum joined #evergreen |
| 13:08 |
Dyrcona |
Looks like the action_trigger_runners did OK. No out of memory errors while they both were running. Looks like they're done. |
| 13:11 |
Dyrcona |
I might try one of the aspen jobs later. |
| 13:28 |
Dyrcona |
It looks like someone is testing offline circulation, and I see redis is using 30MB of memory at the same time |
| 14:22 |
Dyrcona |
Bmagic: Are you around today? I've got a question about setting sysctl entries in the ansible playbook for the dev/test system. |
| 14:43 |
* Dyrcona |
wonders if the sysctl ansible module is available. |
| 15:02 |
Dyrcona |
Well, I'll be back on Thursday. |
| 19:20 |
|
kworstell-isl joined #evergreen |
| 09:21 |
Dyrcona |
As for redis crashing, I suspect, but do not know, that it is likely making an additional copy of the data as messages get passed around, thereby increasing the memory pressure. |
| 09:22 |
Dyrcona |
BTW, beam.smp running Ejabberd on the one system is using 4224472K virt. That's 4GB! |
| 09:23 |
Dyrcona |
Redis doesn't even show up when top -c is sorted by memory consumption when idle. |
| 09:26 |
Dyrcona |
On the mostly unused test system Redis has VSZ of 74520K or 74MB. On the one where I just restarted it, the VSZ is 76460K. |
| 09:28 |
Dyrcona |
Bmagic || berick : See my monologue above if you get this notification. If not, it's in the logs for future reference. |
| 09:29 |
Dyrcona |
So, I'm going to rung the fine generator and keep an eye on top sorted by memory consumption. |
| 09:32 |
Dyrcona |
s/rung/run/ but you get it. |
| 09:37 |
Dyrcona |
OK. Going to restart the VM, first. |
| 09:39 |
Dyrcona |
Huh. had to 'kill' simple2zoom twice. Once for each process. Guess they're not spawned in the same process group? |
| 09:41 |
Dyrcona |
Gotta remember to start websocketd. It's not set up with systemd. |
| 09:41 |
Dyrcona |
Think I'll do these tests on a couple of other vms to see what difference there is with memory use between Redis and Ejabberd if any. |
| 09:42 |
berick |
Dyrcona: read it, but i'm unclear what the takeaway is |
| 09:42 |
Dyrcona |
Well, I'm not sure there's a takeaway, yet, except that storage drones use a lot of memory regardless. |
| 09:44 |
Dyrcona |
I'm in the process of ferreting out where the high memory use happens, and I'm logging it publicly rather than keeping private notes. It's a quiet day in IRC, so I don't think anyone will really mind much. If they do, no one has said anything, yet. |
| 11:43 |
Dyrcona |
Doing anything with the database cranks up the buffers/cache. |
| 11:44 |
jeff |
expected, since you're reading from disk and the kernel is caching that read data. |
| 11:45 |
Dyrcona |
Yes. |
| 11:47 |
Dyrcona |
My testing points the finger away from redis and more towards running everything on 1 machine. |
| 11:47 |
jeff |
how much physical memory and how much swap does this machine/vm/whatnot have? |
| 11:47 |
jeff |
can you share the output of this query somewhere? SELECT name, setting, unit, source FROM pg_settings WHERE name ~ 'mem'; |
| 11:48 |
Dyrcona |
It says 30G of memory, no swap. (I've considered adding swap, but haven't bothered because it would only delay the inevitable.) |
| 15:46 |
Dyrcona |
For certain definitions of crazy. :) |
| 15:46 |
Dyrcona |
No, I mostly agree. |
| 15:47 |
Dyrcona |
Looks like they've been actually doing more stuff based on the run time. |
| 15:49 |
Dyrcona |
I plan to set shared buffers back to the default in a bit and restart Pg. I'll run the same tests again tomorrow. If that's all good, I'll enable all but our Aspen scheduled jobs on Thursday. |
| 15:52 |
Dyrcona |
Looking at the overcommit documentation, adding the swap may have helped in ways other than just being an overflow space. It looks like swap amount is used in the overcommit calculations for the heuristic and don't overcommit methods. |
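As a rough illustration of the point about swap, the kernel's overcommit documentation gives the "don't overcommit" (mode 2) limit as swap plus a percentage of physical RAM, the percentage being `vm.overcommit_ratio` (default 50). The sketch below ignores huge pages. The 30 GiB figure comes from the conversation above; the 8 GiB swap size is purely hypothetical.

```python
GIB_IN_KIB = 1024 * 1024


def commit_limit_kib(ram_kib: int, swap_kib: int,
                     overcommit_ratio: int = 50) -> int:
    """Approximate CommitLimit for vm.overcommit_memory=2, ignoring huge pages."""
    return ram_kib * overcommit_ratio // 100 + swap_kib


# 30 GiB of RAM and no swap, as reported for this VM.
no_swap = commit_limit_kib(30 * GIB_IN_KIB, 0)

# Same machine with a hypothetical 8 GiB of swap added.
with_swap = commit_limit_kib(30 * GIB_IN_KIB, 8 * GIB_IN_KIB)

print(no_swap // GIB_IN_KIB, with_swap // GIB_IN_KIB)
```

On a live system the kernel's own figure is the `CommitLimit` line in /proc/meminfo, which is the number to trust over this approximation.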
| 15:54 |
Dyrcona |
I'm curious to look at that part of the kernel code, but not so curious that I'm going to do it right now. |
| 16:00 |
Dyrcona |
Without doing anything buffers/cached dropped by half just now. I think the cache pressure setting is also helping. |
| 09:01 |
|
dguarrac joined #evergreen |
| 09:04 |
|
Dyrcona joined #evergreen |
| 09:05 |
Dyrcona |
Hm. Had issues with a couple of overnight jobs on the dev system, but the router is still running. |
| 09:06 |
Dyrcona |
No problems with Autorenew on the other test vm, either. |
| 09:07 |
Dyrcona |
Got this in the email from the badge_score_generator: [auth] WRONGPASS invalid username-password pair, at /usr/share/perl5/Redis.pm line 311. |
| 09:08 |
Dyrcona |
Haven't seen that one before. All the services are running OK. |
| 09:09 |
Dyrcona |
refresh carousels reported "Unable to bootstrap client for requests." When I saw that I thought it might have been because of a crash. |
| 10:21 |
Dyrcona |
I did recently switch the fine generator to run at 30 minutes past the hour because we were getting regular emails from the monitor about the number of processes running when it and the hold targeter started at the same time. |
| 10:22 |
Dyrcona |
We're running password reset every minute, and the pending a/t running every half hour. |
| 10:23 |
|
redavis joined #evergreen |
| 10:23 |
Dyrcona |
We have a SQL sending data to Bywater for our test Aspen installation every 5 minutes. That could be using a lot of RAM..... |
| 10:25 |
Dyrcona |
The person testing acq yesterday said that they did experience Evergreen stop working at that time. They also encountered Lp 2086786. I don't think it's related. |
| 10:25 |
pinesol |
Launchpad bug 2086786 in Evergreen 3.13 "Acq: Multi-Branch Libraries Can't See All Their Branch Funds in Invoices" [High,Fix committed] https://launchpad.net/bugs/2086786 |
| 10:26 |
Dyrcona |
Someone else mentioned Evergreen "crashing" at that time. |
| 10:29 |
berick |
Dyrcona: mind checking periodically to see if Redis memory usage is slowly climbing? if that's the case, I have a script that will help |
| 10:35 |
Dyrcona |
And, that was easy. |
| 10:38 |
Dyrcona |
Hm... it's 85MB of VM, rss is 13MB at the moment. |
| 10:41 |
Dyrcona |
FWIW, I'm monitoring it this way: ps -o etime,time,rss,size,vsize 15997 [that's the actual PID] |
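For checking whether Redis memory is slowly climbing over time, the same numbers `ps` reports can be sampled from /proc on Linux. This is a minimal sketch and not berick's script: the helper names are invented, and what counts as "climbing" (every sample larger than the one before) is an assumption.

```python
import re
from typing import Dict, List


def parse_status(text: str) -> Dict[str, int]:
    """Extract VmRSS and VmSize (in kB) from /proc/<pid>/status text."""
    found = {}
    for line in text.splitlines():
        match = re.match(r"(VmRSS|VmSize):\s+(\d+)\s+kB", line)
        if match:
            found[match.group(1)] = int(match.group(2))
    return found


def sample(pid: int) -> Dict[str, int]:
    """Read the current memory figures for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_status(handle.read())


def is_climbing(rss_samples: List[int], tolerance_kb: int = 0) -> bool:
    """True if every sample exceeds the previous one by more than the tolerance."""
    pairs = zip(rss_samples, rss_samples[1:])
    return len(rss_samples) > 1 and all(b - a > tolerance_kb for a, b in pairs)
```

Calling `sample(15997)` from cron or a sleep loop and appending the `VmRSS` value to a list would give `is_climbing` something to work with.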
| 10:45 |
Dyrcona |
I have not had it crash on the other test server, but I don't usually run cron, and it's almost never used. |
| 10:46 |
Dyrcona |
It should be quiet next week. I'll try recompiling OpenSRF and Evergreen with debug options turned on. That might help, particularly if I can figure out how to reproduce a crash. |
| 10:57 |
Dyrcona |
Think I'll do that on my other test vm, too, if I haven't already. I'll do that after I test the 3.12.10 tarball. |
| 11:03 |
|
sandbergja joined #evergreen |
| 11:19 |
|
Christineb joined #evergreen |
| 12:24 |
|
kworstell-isl joined #evergreen |
| 09:24 |
mmorgan |
Dyrcona: re: open-ils.cat.biblio.record.metadata.retrieve, suspect it's to access the creator and editor. |
| 09:29 |
berick |
Dyrcona: pcrud to metabib.metarecord and metabib.metarecord_source_map will do it. |
| 09:35 |
Dyrcona |
mmorgan++ berick++ |
| 09:36 |
Dyrcona |
I want to get the metarecord info for Aspen, so I'm going to see if the mods is enough. I think something like that goes on with copy holds, but I'm now in the middle of updating my test/dev vms. |
| 09:39 |
Dyrcona |
Hm.. looks like my db server is still booting or failed to boot all the way... |
| 09:39 |
|
sandbergja joined #evergreen |
| 09:42 |
Dyrcona |
Yeah. Power was off. I'm pretty sure I did 'reboot' and not 'shutdown' |
| 12:47 |
Dyrcona |
berick++ that worked! |
| 12:48 |
Dyrcona |
JBoyer: I think you're correct when it comes to pcrud and cstore, now that I've played with pcrud a bit. I don't usually try this from srfsh. |
| 12:53 |
Dyrcona |
This particular record's mods field is null anyway. |
| 12:55 |
Dyrcona |
I wonder if we missed an update somewhere? My test database has 1,623,928 metarecord entries with null mods, and 591 where it isn't null. |
| 12:58 |
Dyrcona |
Production has similar numbers. |
| 13:00 |
|
BDorsey_ joined #evergreen |
| 13:21 |
|
jvwoolf joined #evergreen |
| 13:59 |
csharp_ |
redavis: I'm here - sorry, was in meetings all morning and just back from lunch |
| 14:00 |
redavis |
No worries at all. csharp_, we're just waiting on 3.13.7 and testing tarballs |
| 14:00 |
csharp_ |
onky donky |
| 14:22 |
|
kworstell-isl joined #evergreen |
| 14:32 |
|
BDorsey__ joined #evergreen |
| 15:32 |
|
mantis left #evergreen |
| 15:38 |
Dyrcona |
Likely Autorenew blew it up. |
| 15:38 |
Dyrcona |
That starts at 2:30 am. |
| 15:47 |
Dyrcona |
I have a way to test that hypothesis. I have another machine with half the ram set up to use Redis. It has the same dump loaded. If I run autorenew right now, it should get almost the same data to process. |
| 15:48 |
Bmagic |
Dyrcona++ |
| 15:51 |
Dyrcona |
And, they're off! |
| 15:51 |
redavis |
sandbergja++ |
| 15:07 |
shulabramble |
Okay, then |
| 15:07 |
shulabramble |
#action gmcharlt - create a Git commit message type and update bug 2051946 |
| 15:07 |
pinesol |
Launchpad bug 2051946 in Evergreen "institute a Git commit message template" [Wishlist,New] https://launchpad.net/bugs/2051946 - Assigned to Galen Charlton (gmc) |
| 15:07 |
shulabramble |
#action waiting on gmcharlt for access to POEditor for git integration |
| 15:08 |
shulabramble |
#info sleary and sandbergja will report progress on test writing wiki pages next month |
| 15:08 |
sleary |
I've been out sick; please carry forward |
| 15:08 |
sleary |
(sorry sandbergja!) |
| 15:08 |
shulabramble |
Aww! Hope you're feeling better. |
| 15:08 |
sandbergja |
no problem! |
| 15:08 |
redavis |
Just throwing the agenda here for new people - https://wiki.evergreen-ils.org/doku.php?id=dev:meetings:2024-12-10 |
| 15:09 |
sandbergja |
one note: outside of what sleary and I were working on, I figured out how to run perl unit tests with the perl debugger |
| 15:09 |
shulabramble |
redavis++ thanks, I've had an interesting day and that slipped my mind. |
| 15:09 |
sandbergja |
and added my notes here in case they are useful for anyone: https://wiki.evergreen-ils.org/doku.php?id=dev:testing:debugging_perl_unit_tests |
| 15:09 |
shulabramble |
sandbergja++ |
| 15:09 |
sleary |
sandbergja++ |
| 15:09 |
berick |
sandbergja++ |
| 15:12 |
sleary |
whoops! on it |
| 15:13 |
shulabramble |
#action sleary will make a new LP tag denoting bugs that involve string changes |
| 15:13 |
redavis |
sleary++ |
| 15:13 |
shulabramble |
#topic revisit feasibility of automated testing for string changes |
| 15:13 |
shulabramble |
berick++ sleary++ |
| 15:15 |
shulabramble |
anything on this? |
| 15:16 |
abneiman |
tbh I don't recall whose action item that is |
| 15:16 |
shulabramble |
neither do i. |
| 15:16 |
|
ajarterburn joined #evergreen |
| 15:18 |
sleary |
sandbergja is that something we can add to our QA list? |
| 15:18 |
sandbergja |
Sure! Sounds fun |
| 15:18 |
shulabramble |
#action sandbergja and sleary will revisit feasibility of automated testing for string changes |
| 15:18 |
sleary |
sandbergja++ |
| 15:19 |
shulabramble |
sandbergja++ sleary++ |
| 15:19 |
sandbergja |
sleary++ |
| 15:21 |
abneiman |
shulabramble: yes I have your email, thanks |
| 15:21 |
shulabramble |
zipping through this agenda at top speed. |
| 15:22 |
Dyrcona |
Lp 2055796 |
| 15:22 |
pinesol |
Launchpad bug 2055796 in Evergreen "Have github actions run pgtap tests for us" [Medium,Fix committed] https://launchpad.net/bugs/2055796 |
| 15:22 |
Dyrcona |
That's done. |
| 15:22 |
berick |
yeah |
| 15:23 |
shulabramble |
dyrcona++ berick++ |
| 15:37 |
redavis |
phasefx++ |
| 15:37 |
sandbergja |
phasefx++ |
| 15:38 |
phasefx |
Hand holding, pep talk, chocolate... |
| 15:38 |
sandbergja |
I can build and test a tarball |
| 15:38 |
shulabramble |
sandbergja++ |
| 15:38 |
* mmorgan |
can probably help with point releases next week |
| 15:38 |
redavis |
I'll call a meeting for the point releases once some more spots are filled out. So, email imminent today or tomorrow for that. |
| 11:07 |
Dyrcona |
berick: Is that in your rust repo on GH? |
| 11:07 |
berick |
if it crashes, there will be a file + line number + crash message in the kernel logs. none of this C segfault hunting madness. |
| 11:08 |
berick |
Dyrcona: yes |
| 11:08 |
Dyrcona |
I'll see about testing it on one of my vms. I'll put it on the one that I plan to use with a test Aspen instance. That should give it a work out. |
| 11:09 |
berick |
Dyrcona++ |
| 11:10 |
Bmagic |
berick++ # This is production, albeit a single cronjob forked off to its own machine for analyzing. I'm willing to use it as a test bed to hunt down this bug. Because, IMO, the only real way to do it is on production |
| 11:13 |
Dyrcona |
I think I'll get Aspen hooked up, first, and try the Rust router later. I'm already using Redis on that vm. I still have to figure out how to hook Aspen up to Evergreen and get the indexer going. I'll probably pop into Slack to ask questions. |
| 11:15 |
* Dyrcona |
*mumbles* Right. Lp bug about camera not working... |
| 11:17 |
Dyrcona |
Wait a minute... Maybe I read that update wrong and it autoremoved the drivers? |
| 10:14 |
Dyrcona |
I don't think setting that to 0 was ever meant to be fine. Looks like an unreported bug was fixed. |
| 10:22 |
|
stephengwills joined #evergreen |
| 10:25 |
mmorgan |
Bmagic++ |
| 10:48 |
Dyrcona |
Well, I have Aspen running, but it's not talking to a test Evergreen instance, yet. I suppose I could get answers in Slack, but I'll peruse the docs and the code first. |
| 10:51 |
|
kworstell-isl joined #evergreen |
| 11:26 |
pinesol |
News from commits: LP#2089419: fix parsing of offset/limit in C-based DB search methods <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=0458ae01ef6a84734efb7232bc1cdaf479dd3be8> |
| 11:53 |
|
jihpringle joined #evergreen |
| 12:14 |
jeffdavis |
If I'm reading Open-ILS/src/c-apps/oils_auth_internal.c correctly, seems like 0 is the default timeout value if auth.staff_timeout is unset. |
| 12:14 |
jeffdavis |
Does the login issue occur on a 3.14 system where auth.staff_timeout is unset? |
| 12:15 |
Bmagic |
here's another finding: if I was using a browser that already had a workstation registered, I could login! (even with the timeout setting set to 0) - but if I needed to do the workstation registration dance, I would be in a login loop |
| 12:15 |
Bmagic |
jeffdavis: I didn't test that |
| 12:16 |
Bmagic |
I'll try it on a test machine |
| 12:16 |
mmorgan |
jeffdavis: I found that the issue does NOT happen when auth.staff_timeout is unset, but it would be great to see confirmation. |
| 12:17 |
Bmagic |
mmorgan jeffdavis: confirmed, no value (no row) in actor.org_unit_setting, I can still login |
| 12:18 |
Dyrcona |
jeffdavis: What branch are you looking at? |
| 12:19 |
Bmagic |
FYI: I'm on main OpenSRF with the redis stuff merged |
| 12:19 |
|
jihpringle joined #evergreen |
| 12:19 |
Bmagic |
however, I've also tested on opensrf 3.3.2 (ejabberd) and I had the same problem/solution with the setting |
| 12:19 |
* mmorgan |
tested on 3.14.0 |
| 12:20 |
Bmagic |
mmorgan: can you confirm that a zero setting will result in the login loop? |
| 12:21 |
mmorgan |
Bmagic: Yes, I did observe that on 3.14.0, so can confirm. |
| 12:21 |
Bmagic |
and! you have to be sure you're not using a browser that has the workstation registered |
| 12:36 |
Dyrcona |
0 seems like a bad value for an infinite time out when the setting is meant to be an interval. -1 is probably better. |
| 12:36 |
Bmagic |
thinking about it more, the magic trick was probably having "route to" in the query string on the login page. Where the route to was an eg page and* the login page was eg |
| 12:37 |
Dyrcona |
I suspect you're overthinking it, and the problem is likely simpler than that. some eg2 code is getting the time out in seconds and treating 0 as 0, not as infinity. |
| 12:38 |
Bmagic |
I get it, yes, eg2 JS is treating 0 differently than eg. I understand that. I'm pontificating and going over in my head the various ways I was able to overcome the issue during testing. And I think it was when I was able to never touch eg2 during the auth process |
| 12:38 |
Dyrcona |
OK. |
| 12:41 |
Dyrcona |
The core eg2 auth service looks like it gets the timeout from the backend. |
| 12:43 |
Dyrcona |
Staff component looks like it uses the auth service. |
| 15:13 |
Dyrcona |
pinesol: No, but I did try turning it off and back on again. |
| 15:13 |
pinesol |
Dyrcona: http://images.cryhavok.org/d/1291-1/Computer+Rage.gif |
| 15:13 |
Dyrcona |
Yeah, pretty much. |
| 15:14 |
Dyrcona |
Oh, right. I was in the middle of installing some backports on a 3.7.4 test installation. |
| 15:15 |
Dyrcona |
I had just run the 'chown' command and when it seemed to take too long, that's when I knew something was up. :) |
| 15:23 |
Dyrcona |
Sometimes, you just gotta run desktop-clear.... |
| 15:30 |
|
mantis left #evergreen |
| 10:59 |
csharp_ |
redis and sip2-mediator and reports oh my! |
| 11:00 |
csharp_ |
"wellsir..." <slaps the roof of EG 3.12> "this Evergreen version has served us very well" |
| 11:01 |
berick |
and we (kcls) have moved on a bit from the stock EG mediator code, which would put you on the bleeding edge csharp_ fyi |
| 11:02 |
csharp_ |
I'll keep SIPServer around to crank up if needed |
| 11:03 |
csharp_ |
also, willing to test bleeding edge at this stage of our upgrade (targeted for mid February) |
| 11:07 |
berick |
in particular, the rust variant forgoes HTTP communication, a key facet of the original design, and opts for a more evergreen-centric opensrf client approach (i.e. direct redis communication). in the end, it's just more efficient and in some ways more practical. |
| 11:12 |
berick |
that is to say, though the http approach has its own benefits, there are no examples of it in use in the wild. |
| 11:12 |
berick |
that i'm aware of |
| 11:27 |
Bmagic |
csharp_: I feel your pain on the bot battle |
| 11:34 |
|
Dyrcona joined #evergreen |
| 11:37 |
mmorgan |
berick: jeff: Looking at bug 2076921. jeff's testing looks favorable, any reason not to roll it out? Our pc support tech is seeing changes being rolled out that disable the current extension. |
| 11:37 |
pinesol |
Launchpad bug 2076921 in Evergreen "Hatch: Chrome Extension Requires Redevelopment" [High,Confirmed] https://launchpad.net/bugs/2076921 - Assigned to Jeff Godin (jgodin) |
| 11:40 |
berick |
mmorgan: i see no reason to delay any longer |
| 11:41 |
|
sandbergja joined #evergreen |
| 12:02 |
Bmagic |
ok, I'm game |
| 12:03 |
berick |
oh right, we can attach to running processes... |
| 12:03 |
berick |
so just restart everything like normal |
| 12:03 |
Bmagic |
I've isolated the issue to this machine by trial and error. The "main" util server isn't having issues, now that I've divided the crons up. And this one cron seems to be the one that consistently causes the segfault. In other words: we can use this as a test bed. |
| 12:04 |
Bmagic |
ok, I'm leaving the machine alive, and restarting everything like normal |
| 12:06 |
berick |
once it's up, get the PIDs of the 2 router processes, open 2 terminals and run this for each pid: gdb /openils/bin/opensrf_router <pid> |
| 12:06 |
berick |
i /think/ that will do it |
| 12:31 |
berick |
oh, you mean the binary.. |
| 12:31 |
berick |
maybe? |
| 12:31 |
Bmagic |
I'll try |
| 12:32 |
Bmagic |
that would be nice, so I'm testing the same situation where I've had this segfault a few times |
| 12:34 |
berick |
Bmagic: also https://stackoverflow.com/questions/21395106/how-can-i-gdb-attach-to-a-process-running-in-a-docker-container |
| 12:35 |
berick |
hm, don't see how that's really different from what you already tried |
| 12:35 |
Bmagic |
lxc-attach is the magic |
| 10:32 |
csharp_ |
no, it should be considered the new CentOS |
| 10:32 |
Dyrcona |
OK. I haven't been keeping up with the RedHats.... |
| 10:32 |
Dyrcona |
I thought it was the new Fedora. |
| 10:33 |
csharp_ |
Fedora's still Fedora - I may try to support it too - but Fedora is the testing ground for RHEL/Rocky |
| 10:34 |
csharp_ |
like a more stable Debian sid |
| 10:34 |
Dyrcona |
Yeah. I would not object to adding Rocky support, but I'm a bit leery of adding Fedora install support officially. |
| 10:34 |
Dyrcona |
I mean, *I* don't want to support it. :) |
| 10:34 |
csharp_ |
agreed - I can't imagine anyone running Fedora in production given how quickly the release cycles run |
| 17:02 |
mantis |
mmorgan++ |
| 17:03 |
|
mantis left #evergreen |
| 17:04 |
Dyrcona |
mmorgan++ redavis++ csharp_++ |
| 17:06 |
csharp_ |
starting my test of 3.13 |
| 17:06 |
mmorgan |
3.12.9 tarball is set also, files shared. |
| 17:11 |
|
mmorgan left #evergreen |
| 17:31 |
* Dyrcona |
signs out for now. |
| 15:56 |
Bmagic |
ok, I see. Looking at a fresh installation of OpenSRF (main) - comes packed with that line CONFIG SET save "" |
| 16:04 |
Dyrcona |
berick++ Bmagic++ |
| 16:04 |
Dyrcona |
I let do-release-upgrade overwrite /etc/redis.conf, so wondered when I didn't see anything in the README about it. |
| 16:06 |
Dyrcona |
After doing do-release-upgrade, I `rm -rf /openils/*` and reinstalled OpenSRF main and my test branch of Evergreen based on 3.14.0. |
| 16:07 |
Dyrcona |
It's working so far. |
| 16:12 |
* Dyrcona |
is also ready for translations and release building tomorrow. |
| 16:13 |
redavis |
Dyrcona++ |
| 16:25 |
redavis |
Bmagic, yep. If you can. No pressure if you can't though. |
| 16:25 |
Bmagic |
:) I'd love to |
| 16:25 |
redavis |
Rock on. Thank you. |
| 16:28 |
eeevil |
re the reporter, their last "one more failure" test was not related to the problem at hand, but it's also (as abneiman says) fixed, pending testing. LP may not have sent my email to that effect yet |
| 16:29 |
Dyrcona |
eeevil: I got the email with your comment. I'm Ok with that going in if someone signs off tonight. |
| 16:30 |
redavis |
eeevil, I'm just refreshing for comments anyway :-). I think we're waiting on Llewellyn for the actual ticket. Is there another ticket for the "one more failure"? |
| 16:30 |
redavis |
Oh, eeevil, n/m my question. I see the answer in your comment. |
| 16:30 |
Bmagic |
I was about to say |
| 16:30 |
Dyrcona |
:) |
| 16:31 |
Bmagic |
it seems like we should wait until this tree.js branch receives its additional commit, and maybe Llewellyn can double check it on his test machine, the same as he did with the first commit |
| 16:33 |
Dyrcona |
My answer is it depends on how long we have to wait. |
| 16:33 |
Dyrcona |
I would like to get started tomorrow morning. |
| 16:34 |
Bmagic |
agreed |
| 16:44 |
Bmagic |
that's fine, it means we'll wait |
| 16:54 |
* Dyrcona |
signs out for the day. |
| 17:01 |
|
mmorgan left #evergreen |
| 17:01 |
eeevil |
Bmagic: ok! looking at the clock, I'm going to suggest that I push a branch with just (what we've been calling) the followup commit -- it's separate, but discovered during tree.ts testing -- and if you get the signoffs you want/need on the current branch, I recommend we proceed with merging that. then, if you also like the cut of the new branch's jib, and feel like pulling that in, I'm comfortable with it going in quickly. if it has to sit, I won't |
| 17:01 |
eeevil |
cry too hard. |
| 17:02 |
Bmagic |
ok then, we have something we can test? |
| 17:03 |
redavis |
Bmagic, I think it's the current branch listed at https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp-2087562-dynamic-tree-node-parent-links |
| 17:03 |
Bmagic |
that's what I'm looking at. So, roll with that for now, and we'll catch the rest next cycle, is that right? |
| 17:04 |
eeevil |
to that end, here's the followup commit on a branch by itself: https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/fix-virtual-field-link-joins-in-upgraded-templates |
| 17:05 |
Bmagic |
eeevil: thanks! And you're not comfortable with that second branch just yet? |
| 17:05 |
eeevil |
Bmagic: next cycle, or this if it looks sane to you. I'm mainly trying to get it out there for others to test/use asap |
| 17:05 |
eeevil |
oh, /I/ am comfortable, but it's not been tested by not-mike. |
| 17:05 |
Bmagic |
understandable |
| 17:06 |
redavis |
How about this, maybe Llewellyn signs off on the first branch tonight.. but probably not. Even if he doesn't, we can get to testing the second branch and have both ready for next month...and for anyone that wants to apply in the meantime. |
| 17:06 |
Bmagic |
well, like you said, it's three lines. Seems like we should be able to test that with some humans. Sure wish we had some e2e tests to do the job for us |
| 17:07 |
* eeevil |
imagines the selenium script that would drive the reporter to test this ... shudders, passes out |
| 17:07 |
redavis |
LOL!!! |
| 17:08 |
Bmagic |
yeah, it'd be one heck of a test |
| 17:09 |
eeevil |
the test is ... clone a report that uses a might_have link on a virtual idl field and make sure it uses the left-side pkey column instead of the virtual column name in the join |
| 17:09 |
abneiman |
I'd be delighted to have the first one go in this evening, for point releases this month |
| 17:09 |
Bmagic |
it's radio silence over there at Cardinal. Maybe they'll get back with me before the "morning" |
| 17:10 |
abneiman |
but! I've put my finger(s) on the scale(s) too many times on this particular issue so with that imma head out and go swimmin |
| 17:13 |
redavis |
lol, or close to the Y |
| 17:13 |
redavis |
also, rock on for the first branch. So, unless anyone wants to stop me (please stop me), I'm going to open a new LP ticket for the second branch to make sure it doesn't get neglected. |
| 17:15 |
eeevil |
redavis: this is me, not stopping you! :) |
| 17:15 |
Bmagic |
yeah, that's a sane course of action I think. Considering the test is nuanced |
| 17:15 |
eeevil |
redavis++ |
| 17:15 |
redavis |
Excellent! This is also you not smellin' a burning clutch. |
| 17:15 |
Bmagic |
digging up an old template that " uses a might_have link on a virtual idl field and make sure it uses the left-side pkey column instead of the virtual column name in the join" |
| 11:19 |
csharp_ |
ick |
| 11:19 |
csharp_ |
pinesol: be better |
| 11:19 |
pinesol |
csharp_: Have you tried taking it apart and putting it back together again? |
| 11:19 |
Dyrcona |
berick: I'm signing off on your commits for Lp 2083856 and looking into the test failure. I swear it did not do that last Wednesday. Also, I swear I added all of the prerequisites, but maybe I didn't or switched branches and lost 'em? |
| 11:19 |
pinesol |
Launchpad bug 2083856 in Evergreen "Add Support for PostgreSQL 17" [Wishlist,Confirmed] https://launchpad.net/bugs/2083856 - Assigned to Jason Stephenson (jstephenson) |
| 11:20 |
csharp_ |
berick++ # don't blame me, I voted for Kodos |
| 11:20 |
berick |
Dyrcona: cool, yeah, i didn't get a chance to follow up on the failures. |
| 11:21 |
|
Christineb joined #evergreen |
| 11:28 |
Dyrcona |
berick: It looks like the same output with the 100 and 600 tags in opposite order. |
| 11:29 |
Dyrcona |
I'm going to reverse them in one of the copies and see what diff says after. |
| 11:31 |
Dyrcona |
Yeah, that's it. If I modify a copy of what we're looking for from the test to have the 100 before the 600, and the 905 (to put 110 before 600), the output is the same. |
| 11:32 |
Dyrcona |
I wonder if we have been depending on undefined behavior again? |
| 11:32 |
Dyrcona |
Bleh that 110 should be 100... Anyway.... |
| 11:33 |
Dyrcona |
I suppose I can make the test check the Pg version and do something different if it is 17+. It wouldn't be the first time we've done that. |
| 11:35 |
Dyrcona |
I wonder though if the auth overlay function needs fixing for more predictable output? |
| 11:38 |
pinesol |
News from commits: LP#1721026: Default owner for copy tags and types <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=79980aff138c39b6056f8c0795fa308a8293fa79> |
| 11:42 |
csharp_ |
oh, I didn't realize the next Ubuntu release is "noble" - that kind of synergy doesn't happen often, does it NOBLE? |
| 11:44 |
Dyrcona |
I name my test vms for the release codenames: bullseye, bookworm, jammy, noble, etc. So, I have a test vm named noble.cwmars.org. :) |
| 11:45 |
csharp_ |
SO CONFUSING |
| 11:45 |
Dyrcona |
I suppose I could have used the animal name: numbat.cwmars.org.... |
| 11:47 |
* mmorgan |
wants to say it's cool, but agrees with csharp_ SO CONFUSING! |
| 13:29 |
abowling |
Dyrcona: It's what csharp_ referenced earlier; my 5.26.1 folder has the dependencies, but not the 5.30.0 folder |
| 13:30 |
Dyrcona |
Well, 5.26.1 should still be in the Perl path... You could try csharp_'s suggestion of deleting the 5.26.1 stuff and then reinstalling the prerequisites. |
| 13:31 |
|
jihpringle joined #evergreen |
| 13:36 |
Dyrcona |
heh.. The test still fails...... |
| 13:38 |
Bmagic |
tests++ |
| 13:48 |
Dyrcona |
extraneous space from where I inserted the new output. |
| 13:50 |
Dyrcona |
The db test passes on pg 17. I'll test on an earlier pg release before committing the changes. |
| 13:50 |
Dyrcona |
I mean committing them locally and pushing to a working branch. |
| 14:20 |
|
BDorsey joined #evergreen |
| 14:34 |
|
collum joined #evergreen |
| 15:08 |
shulabramble |
#topic waiting on gmcharlt for access to POEditor for git integration |
| 15:08 |
shulabramble |
And I assume the same for this unless there's any discussion to be had? |
| 15:08 |
abneiman |
I will ping gmcharlt in our channel about that 2nd one since Dyrcona is awaiting access |
| 15:09 |
shulabramble |
abneiman++ |
| 15:09 |
shulabramble |
#action waiting on gmcharlt for access to POEditor for git integration |
| 15:09 |
shulabramble |
#topic sleary and sandbergja will report progress on test writing wiki pages next month / at hackaway |
| 15:10 |
|
smorrison joined #evergreen |
| 15:10 |
sandbergja |
ooh! We did some work in the wiki. We haven't had a chance to coordinate something for the hackaway |
| 15:10 |
sleary |
I have not made much progress on my part of that, so let's kick that to next month, please! |
| 15:10 |
abneiman |
sandbergja++ sleary++ |
| 15:10 |
shulabramble |
sandbergja++ sleary++ got it |
| 15:11 |
shulabramble |
and a belated eeevil++ |
| 15:11 |
shulabramble |
#action sleary and sandbergja will report progress on test writing wiki pages next month / at hackaway |
| 15:11 |
sandbergja |
very happy if people discuss tests at hackaway though :-D |
| 15:11 |
shulabramble |
#topic bug 2076921 expected to get more testing and merged, and beta uploaded to store |
| 15:11 |
pinesol |
Launchpad bug 2076921 in Evergreen "Hatch: Chrome Extension Requires Redevelopment" [High,Confirmed] https://launchpad.net/bugs/2076921 - Assigned to Jeff Godin (jgodin) |
| 15:12 |
berick |
jeff did some testing, many thanks |
| 16:12 |
pinesol |
Launchpad bug 2055796 in Evergreen "Have github actions run pgtap tests for us" [Undecided,Confirmed] https://launchpad.net/bugs/2055796 - Assigned to Bill Erickson (berick) |
| 16:12 |
smayo |
shulabramble++ |
| 16:12 |
shulabramble |
sandbergja? |
| 16:12 |
sandbergja |
yes, pasting some text |
| 16:13 |
sandbergja |
The back story: we used to have our tests automatically running regularly (2x/day?), and alerting IRC if there was a test failure. This has not worked for some time, so currently we don't find out about those bugs we've added to Evergreen until somebody runs the tests manually (which may be a while later). Github offers free infrastructure for running tests, so a year ago, we decided to explore |
| 16:13 |
sandbergja |
I have a pull request to get our pgtap (database) tests running automatically on github actions. berick has reviewed it but mentioned "unclear if there's any additional decision processes re: adding github actions". |
| 16:13 |
sandbergja |
I figured this was the place to ask. :-) |
| 16:13 |
phasefx |
testing would be easier with simpler installations... *runs away* |
| 16:13 |
sandbergja |
phasefx++ |
| 16:14 |
sandbergja |
100% agree |
| 16:14 |
redavis |
phasefx++ #also hahahahahah |
| 16:14 |
sandbergja |
trying to install an eg-compatible ejabberd in github actions has me totally stumped |
| 16:14 |
sandbergja |
so I can't wait for redis to be merged |
| 16:14 |
sandbergja |
but I'm getting off topic hahaha |
| 16:15 |
berick |
any objections to running pgtap tests via github actions? going once? |
| 16:15 |
Bmagic |
berick says "strong +1 to merging" |
| 16:15 |
Bmagic |
no objections at all |
| 16:15 |
phasefx |
no objections to anyone maintaining testing infrastructure |
| 16:16 |
abneiman |
tests++ |
| 16:16 |
berick |
didn't think so, but it's kind of new, so.. |
| 16:16 |
berick |
sandbergja++ |
| 16:16 |
shulabramble |
sandbergja++ |
| 16:17 |
redavis |
berick++ |
| 16:18 |
shulabramble |
berick++ |
| 16:18 |
phasefx |
berick: just main? |
| 16:18 |
Bmagic |
I find it interesting that the server on which the tests run presumably doesn't have the perl dependencies? But that's fine for the scope of the tests? |
| 16:18 |
* phasefx |
belatedly pulls up the ticket |
| 16:19 |
shulabramble |
#action it is okay to merge lp2055796, sandberg and berick will attend |
| 16:19 |
shulabramble |
wrangling before we go a full half hour over |
| 17:40 |
pinesol |
Launchpad bug 2032835 in OpenSRF "Discussion: Merge OpenSRF Into Evergreen?" [Wishlist,New] https://launchpad.net/bugs/2032835 |
| 17:41 |
sandbergja |
+1 |
| 17:42 |
Bmagic |
+1 |
| 17:48 |
pinesol |
News from commits: LP2055796: run pgtap tests on Github Actions <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=7e83c5de8ca1e7530853878bc7562d961ce348c4> |
| 17:50 |
Bmagic |
sandbergja++ berick++ |
| 18:08 |
|
phasefx joined #evergreen |
| 18:08 |
|
abneiman joined #evergreen |