[07:08] *** collum joined #evergreen
[07:13] *** book` joined #evergreen
[08:43] *** mmorgan joined #evergreen
[08:47] *** dguarrac joined #evergreen
[09:04] *** Dyrcona joined #evergreen
[09:07] <Dyrcona> So, our dev/training system has action_trigger_runner.pl piling up for the same jobs (i.e. granularity) and never finishing. I didn't count, but there were at least 5 password-reset granularity a/t runners going and at least two that we run every half hour. I hope a vacuum full analyze helps.
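A quick way to see how badly the runners have piled up is to count them from the shell; this is a generic sketch, not a command taken from the conversation:

```sh
# Count lingering action_trigger_runner.pl processes.
# The [a] trick keeps grep from matching its own process.
ps ax | grep '[a]ction_trigger_runner.pl' | wc -l
```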
[09:07] <Dyrcona> This is with Evergreen 3.14.4
[09:10] <Dyrcona> Oh, and the srfsh jobs all reported that they could not bootstrap the client overnight. The opensrf_core.xml file is in the usual place.
[09:11] <Dyrcona> We've had issues with this system before, but I wonder if anyone else has encountered something like this. I suspect it's just a lack of resources, RAM or CPU, but thought I'd bring it up in case someone else has seen something like this before.
[09:21] <Dyrcona> this vacuum full analyze is taking a long time.
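One way to keep an eye on a long-running VACUUM FULL from a second psql session is to query pg_stat_activity; a minimal sketch, not taken from the log:

```sql
-- How long has the vacuum been running, and is it still active?
SELECT pid,
       now() - query_start AS runtime,
       state,
       left(query, 60)     AS query
  FROM pg_stat_activity
 WHERE query ILIKE 'vacuum%'
 ORDER BY runtime DESC;
```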
[09:22] <Dyrcona> Caught error from 'run' method: Exception: OpenSRF::EX::JabberDisconnected 2025-03-26T06:37:57 OpenSRF::Application /usr/local/share/perl/5.34.0/OpenSRF/Application.pm:240 JabberDisconnected Exception: This JabberClient instance is no longer connected to the server
[09:22] <Dyrcona> So, jabber settings, maybe?
[09:25] <Dyrcona> Everything looks OK. I'm going to bump the max stanza size.
[09:36] <Dyrcona> I bumped it to 1MB + 6,144 bytes because it looks like the prior settings were something like 256,000 + 6,144.
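For reference, max_stanza_size is an option on the ejabberd_c2s listener in ejabberd.yml; the fragment below is an illustrative sketch with assumed values (the stock 5222 port and the 1 MiB + 6,144 figure mentioned above), not the actual config from this system:

```yaml
# Illustrative only; port and value are assumptions, not from this log.
listen:
  -
    port: 5222
    module: ejabberd_c2s
    # 1 MiB plus a 6,144-byte cushion; later in the conversation this
    # gets bumped to 10 MiB + 6,144 (10491904).
    max_stanza_size: 1054720
```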
[09:38] <JBoyer> srfsh doesn't use opensrf_core.xml directly unless you setup a symlink to it from ~/.srfsh.xml
[09:38] <JBoyer> Though if ~/.srfsh.xml is there and accurate it's hard to say what's wrong
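As a reference point, ~/.srfsh.xml is a small standalone config; this sketch follows the shape of the stock OpenSRF example, with placeholder values rather than anything from this system:

```xml
<?xml version="1.0"?>
<!-- Placeholder values; copy the real ones from opensrf_core.xml. -->
<srfsh>
  <router_name>router</router_name>
  <domain>private.localhost</domain>
  <username>opensrf</username>
  <passwd>opensrf-private-password</passwd>
  <port>5222</port> <!-- ejabberd c2s; a Redis-backed install typically points at 6379 instead -->
  <logfile>/tmp/srfsh.log</logfile>
  <loglevel>4</loglevel>
</srfsh>
```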
[09:39] <Dyrcona> JBoyer: I'll double check .srfsh.xml. It might be wrong.
[09:39] <Dyrcona> Well, out of date.
[09:41] <Dyrcona> Ah, yeahp. It has the redis password in it and not the ejabberd password.
[09:41] <Dyrcona> JBoyer++
[09:41] <Dyrcona> That explains the srfsh jobs not starting.
[09:42] <Dyrcona> I want the opensrf user's private password in there, right?
[09:44] <Dyrcona> Well, duh. It says so right in .srfsh.xml.
[09:44] <Dyrcona> Oh, and the port needs to change.
[09:46] <Dyrcona> the vacuum full is still chugging away after 40 minutes.
[09:59] <Dyrcona> Either GCP storage is really slow or this database needs a lot of cleaning up....54 minutes and counting.
[10:10] <Dyrcona> Never seen this before: 2025-03-25 13:32:37.409902-04:00 [error] <0.88.0>@ejabberd_system_monitor:do_kill/2:290 Killed 1 process(es) consuming more than 10310 message(s) each
[10:13] <Dyrcona> 2025-03-26 09:01:56.882977-04:00 [info] <0.4136.0>@ejabberd_c2s:process_terminated/2:292 (tcp|<0.4136.0>) Closing c2s session for opensrf private.localhost/open-ils.supercat_drone_at_localhost_67134: Connection failed: connection closed
[10:14] <Dyrcona> A lot of lines like that. I wonder if that's bots or Aspen?
[10:14] <Dyrcona> We do have a test instance of Aspen talking to this server.
[10:35] <Rogan> Dyrcona am I imagining it or were you looking at pulling digital bookplates into Aspen at some point?
[10:41] <Rogan> ignore ^, found it
[10:43] *** Christineb joined #evergreen
[11:15] <Dyrcona> Two hours and 10 minutes so far. I opened a ticket about the db storage.
[11:33] <jeff> Dyrcona: what type of storage volume? PD, Hyperdisk, or Local SSD?
[11:33] <jeff> and how much data are you rewriting?
[11:33] <jeff> (er, asked another way, "how large is the db on disk?")
[11:34] <jeff> oh, and I forgot about Extreme PDs as an option! :-)
[11:35] <Dyrcona> jeff: According to df, 387GB.
[11:36] <Dyrcona> Hm. Now it says 425GB.
[11:39] <Dyrcona> it's growing while the vacuum full makes copies of tables.
[11:45] *** jihpringle joined #evergreen
[11:50] <Dyrcona> Yeah. Down to 403GB now.
[11:55] *** mantis joined #evergreen
[11:57] *** jihpringle joined #evergreen
[12:05] *** jihpringle38 joined #evergreen
[12:08] <csharp_> Dyrcona: we ended up setting max_stanza_size to max_stanza_size: 10000000 on our A/T server
[12:08] <Dyrcona> csharp_: I used to set it to 10 * 1024 * 1024.
[12:08] <csharp_> Dyrcona: you should see the erroring out in the ejabberd logs
[12:08] <Dyrcona> So 10MB
[12:09] <csharp_> haven't run A/T on redis yet, we'll see if that's better eventually
[12:09] <Dyrcona> yeah, I used to grep for something like stanza too large or whatever. I tried looking for stanza, and just came up with someone named Costanza in the logs.
[12:09] <berick> haha
[12:09] * csharp_ is off but couldn't resist responding
[12:09] <Dyrcona> I have run A/T on redis. It's fine.
[12:09] <csharp_> @who is killing independent George?
[12:09] <pinesol> scottangel_ is killing independent George.
[12:10] <Dyrcona> This machine used to be set up for Redis, but after an update/rebuild in December, it started having issues so we switched back to Ejabberd.
[12:10] * Dyrcona is leaning toward the db storage being too slow.
[12:10] <csharp_> Dyrcona: also something like "large message" in the osrf logs
[12:10] <Dyrcona> yeah...
[12:12] <Dyrcona> I'll look for large message.
[12:12] <csharp_> we learned from pain that A/T processing needs 1) tons of RAM 2) unreasonably large max_stanza_size on ejabberd and 3) granularity to distribute things more evenly
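On point 3, granularity is typically handled by giving each granularity its own cron entry so the runs don't all stack on one process; a hypothetical sketch loosely modeled on Evergreen's crontab example (paths, schedule, and granularity names are assumptions):

```
# Hypothetical crontab entries; paths and granularities are illustrative.
0  *  * * *  /openils/bin/action_trigger_runner.pl --osrf-config /openils/conf/opensrf_core.xml --run-pending --granularity Hourly
30 4  * * *  /openils/bin/action_trigger_runner.pl --osrf-config /openils/conf/opensrf_core.xml --run-pending --granularity Daily
```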
[12:12] <csharp_> all of that assumes a fast-ish DB, yes
[12:13] <Dyrcona> Yeah. I've learned that, too.
[12:13] <Dyrcona> Maybe I'll set max_stanza_size to 10MB anyway. There are no services running. I want this vacuum to run unhampered.
[12:29] <Dyrcona> It finished after 3 hours and 18 minutes.
[12:29] <Dyrcona> 363GB are used on the database partition.
[12:36] <Dyrcona> I really like Emacs regex replace because you can execute elisp code: \(max_stanza_size: \)[0-9]+ → \1\,(format "%d" (+ 6144 (* 1024 1024 10)))
[12:36] <Dyrcona> \, is used to indicate that the following is elisp code that returns a string.
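Spelled out as the interactive command, with the pattern and replacement exactly as pasted above (only the keystroke framing is added):

```
M-x query-replace-regexp RET
\(max_stanza_size: \)[0-9]+ RET
\1\,(format "%d" (+ 6144 (* 1024 1024 10))) RET
```

The elisp form evaluates to 10491904, i.e. 10 MiB plus the 6,144-byte cushion; note that \, only works in the interactive replace commands.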
[12:44] <Dyrcona> csharp_: I finally found "Stream closed by local host: XML stanza is too big (policy-violation)" in the Ejabberd logs.
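For anyone hunting the same message, a case-insensitive grep sidesteps the Costanza problem; the log path is a guess and varies by install:

```sh
# Log path is an assumption; adjust for your install.
grep -i 'stanza is too big' /var/log/ejabberd/ejabberd.log
```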
[12:45] <Dyrcona> open-ils.trigger drone was the last one to do it.
[13:04] <Dyrcona> I haven't found anything in osrfsys.log that obviously corresponds to the XML stanza is too big message, but there are 3,247 log lines for the second of the timestamp of the ejabberd message in osrfsys.log.
[13:05] <Dyrcona> The open-ils.trigger stderr log has the timestamp of the minute that the error occurred. I pasted the last entry earlier.
[13:06] <Dyrcona> I think the stderr logs should include timestamps for individual log entries. Maybe I should Lp that?
[13:08] <Dyrcona> I seem to recall having to adjust the log settings in opensrf_core.xml before the ejabberd messages would pass through to osrfsys.log.
[13:25] <Dyrcona> The message from the stderr log shows up in osrfsys.log, so there's tat.
[13:26] <Dyrcona> s/tat/that/
[13:37] *** jihpringle joined #evergreen
[13:58] *** jihpringle joined #evergreen
[14:05] *** jihpringle24 joined #evergreen
[14:20] *** jihpringle joined #evergreen
[15:13] <Rogan> to everyone in channel, I have not sent out a formal announcement yet but we will be looking for a host for this year's Hack-A-Way. If anyone is interested keep an eye out for the announcement.
[15:18] *** mantis left #evergreen
[15:55] <Dyrcona> Oh crap. We're still under the commit moratorium, aren't we? I totally forgot and pushed something to main, rel_3_13, and rel_3_14.....
[15:56] <Dyrcona> I was just about to add it to rel_3_15 when it struck me what I was doing.
[15:57] <Dyrcona> I think I can actually fix it. let me give it a shot.
[15:57] <Dyrcona> Nope. Maybe that rule only applies to main...
[15:58] <Dyrcona> Eh, no. Only applies to certain repos, not Evergreen.
[16:10] <abneiman> Dyrcona: we are still under the moratorium but it will be lifted by the end of today, hopefully
[16:11] <abneiman> apologies for the process taking longer than usual, but I was last minute out of town at one conference (while presenting at another conference) and we have a lot of people (happily) learning new steps in this process, myself included. Appreciate your patience.
[16:14] <Dyrcona> abneiman: Thanks. I don't think much actual harm is done, since rel_3_15 wasn't touched. Whoever is working on branches should be able to add the db upgrade to main in more or less the usual way. If whoever that is needs help, contact me, and I'll gladly do it.
[16:14] <abneiman> thanks, will do
[16:14] <mmorgan> abneiman++
[16:14] <mmorgan> release_team++
[16:14] <mmorgan> Dyrcona++
[16:16] <pinesol> News from commits: LP#2051946: Add Co-authored-by to commit-template <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=dd23b9bb54d324ea5e0cb250501857458e4ac9f0>
[16:46] *** jihpringle joined #evergreen
[16:49] * Dyrcona is clocking out, but I'll keep an eye on email.
[17:25] *** mmorgan left #evergreen
[18:35] <abneiman> redavis++
[18:49] *** jihpringle joined #evergreen
[19:26] *** jihpringle joined #evergreen
[21:28] *** abowling1 joined #evergreen
[22:16] *** abowling joined #evergreen