Time |
Nick |
Message |
04:15 |
|
JBoyer_ joined #evergreen |
04:51 |
|
rjackson_isl_hom joined #evergreen |
05:30 |
|
alynn26 joined #evergreen |
05:39 |
|
gmcharlt joined #evergreen |
06:02 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |
08:11 |
|
mantis1 joined #evergreen |
08:30 |
|
Dyrcona joined #evergreen |
08:38 |
|
mmorgan joined #evergreen |
09:00 |
|
rfrasur joined #evergreen |
09:26 |
|
jvwoolf joined #evergreen |
09:26 |
Dyrcona |
Went and made a custom type just returning ROW would probably work. |
09:31 |
Dyrcona |
Hmm. Missing a conjunction there.... Or, RECORD maybe... |
10:36 |
|
awitter joined #evergreen |
10:48 |
csharp_ |
I'm still scouring opensrf/ejabberd logs and I'm not finding anything obvious |
10:48 |
csharp_ |
it's like the client dies off and no one cares, log-wise |
10:48 |
csharp_ |
not sure where to start adding debug to perl/C whatever |
10:49 |
csharp_ |
you would think if ejabberd was under some sort of duress it would be showing error log messages |
10:49 |
csharp_ |
I think "timeout" is a symptom, not a cause |
10:50 |
csharp_ |
(as in 2022-01-20 11:12:01.958 [info] <0.28199.1>@ejabberd_c2s:process_terminated:271 (tcp|<0.28199.1>) Closing c2s session for opensrfprivate.brick01-head.gapines.org/open-ils.actor_listener_brick01-head.gapines.org_118630: Connection failed: timeout ) |
10:50 |
Dyrcona |
Time to add more bricks? |
10:50 |
csharp_ |
and looking at the code, "just deactivate new cataloging UIs" is not as easy as I thought it might be |
10:50 |
Dyrcona |
Never is. |
10:51 |
csharp_ |
my lack of Angular/AngJS foo is working against me there too |
10:51 |
Dyrcona |
Have some lasagna, by the way we only have spaghetti noodles. |
10:51 |
csharp_ |
:-/ |
10:51 |
* csharp_ |
irons the noodles in hopes they get flat enough |
10:52 |
Dyrcona |
I'm getting reports that searching OCLC via Z39.50 is not working. I look in the logs and see "search returned 0 hits" and no errors. Anyone else getting similar reports from libraries? |
10:55 |
csharp_ |
this doesn't feel like a networking threshold issue either - it's random |
10:55 |
csharp_ |
and it only happens during the workday |
10:56 |
Dyrcona |
It still feels like a resource limit to me, maybe not one you can adjust. |
10:56 |
csharp_ |
I liked the old days when I didn't have to give a sh*t about ejabberd :-( |
10:56 |
Dyrcona |
I see lines like this, and I wonder if that is what was really meant: my $count = $$res{count} = $results->size; |
10:58 |
csharp_ |
if it's a resource limit, would I see something at 7:30 p.m.? 2022-01-26 19:33:06.051 [info] <0.6239.0>@ejabberd_c2s:process_terminated:271 (tcp|<0.6239.0>) Closing c2s session for opensrfprivate.brick05-head.gapines.org/open-ils.actor_listener_brick05-head.gapines.org_92370: Connection failed: timeout |
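[Editor's sketch: one way to test the "only during the workday" impression is to bucket the timeout lines by hour. The regular expression below is inferred only from the two log lines quoted in this conversation, and the names `TIMEOUT_RE` and `timeouts_by_hour` are hypothetical. Other ejabberd versions may format these lines differently.]

```python
import re
from collections import Counter

# Pattern inferred from the two sample lines quoted in the chat (an assumption,
# not an official ejabberd log grammar).
TIMEOUT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2}\.\d+ .*"
    r"Closing c2s session for (?P<jid>\S+): Connection failed: timeout"
)


def timeouts_by_hour(lines):
    """Count 'Connection failed: timeout' session closures per hour of day."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[int(match.group("hour"))] += 1
    return counts


# The two log lines quoted in the conversation, used as sample input.
sample = [
    "2022-01-20 11:12:01.958 [info] <0.28199.1>@ejabberd_c2s:process_terminated:271 "
    "(tcp|<0.28199.1>) Closing c2s session for "
    "opensrfprivate.brick01-head.gapines.org/open-ils.actor_listener_brick01-head.gapines.org_118630: "
    "Connection failed: timeout",
    "2022-01-26 19:33:06.051 [info] <0.6239.0>@ejabberd_c2s:process_terminated:271 "
    "(tcp|<0.6239.0>) Closing c2s session for "
    "opensrfprivate.brick05-head.gapines.org/open-ils.actor_listener_brick05-head.gapines.org_92370: "
    "Connection failed: timeout",
]

for hour, count in sorted(timeouts_by_hour(sample).items()):
    print(f"{hour:02d}:00  {count}")
```

[Run against a full day's log, a cluster of counts in business hours would support the load theory, while a flat spread would argue against it.]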
10:59 |
Dyrcona |
Yeah, maybe. It's hard to say. |
10:59 |
Dyrcona |
I know of lot of things don't get logged by any part of the chain that I wish were logged. |
11:00 |
Dyrcona |
s/of/a/ |
11:00 |
csharp_ |
I've thought that for a long time and it's really kicking my ass right now |
11:01 |
csharp_ |
too many of the less useful messages, not enough of the useful ones (of course, that's pretty subjective and contextual) |
11:01 |
Dyrcona |
Yeah. |
11:02 |
Dyrcona |
I'm just grepping logs for things remotely related to the OCLC situation, and I see lots of bad input in the error logs. :) |
11:02 |
Dyrcona |
Not for the OCLC/Z39.50, though, but for other searches and some circ calls. |
11:03 |
Dyrcona |
This isn't going to match an ISBN: *&0062993151 |
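[Editor's sketch: a generic ISBN-10 check, showing why the junk-prefixed input fails while the bare digits pass. This is not Evergreen's matching code, which does not appear in the log. The function name `is_valid_isbn10` is hypothetical.]

```python
import re

# Nine digits followed by a digit or X (the check character).
ISBN10_RE = re.compile(r"\d{9}[\dXx]")


def is_valid_isbn10(text):
    """Return True if text is exactly a well-formed ISBN-10 with a valid check digit."""
    if not ISBN10_RE.fullmatch(text):
        return False
    total = 0
    for position, char in enumerate(text):
        value = 10 if char in "Xx" else int(char)
        # Weights run from 10 down to 1; a valid ISBN-10 sums to a multiple of 11.
        total += (10 - position) * value
    return total % 11 == 0


print(is_valid_isbn10("0062993151"))    # bare digits from the log entry
print(is_valid_isbn10("*&0062993151"))  # the input as it appeared in the log
```

[A more lenient search could strip non-ISBN characters before validating. Whether that is desirable for input that looks like fuzzing is a separate question.]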
11:04 |
Dyrcona |
Some of it looks like someone trying to fuzz one of the gateways. |
11:09 |
csharp_ |
@blame THE FUZZ |
11:09 |
pinesol |
csharp_: THE FUZZ WILL PERISH UNDER MAXIMUM DELETION! DELETE. DELETE. DELETE! |
11:10 |
Dyrcona |
In my case, I don't see any signs of a problem, just that OCLC searches are returning 0 results. Again, this is basically useless, though not as serious. |
11:14 |
Dyrcona |
csharp_: Maybe it is time to ditch ejabberd for something else? |
11:16 |
berick |
i'll be proposing a conf. session on just that topic |
11:16 |
alynn26 |
+1 for ditching ejabberd |
11:17 |
berick |
(which obv. won't help csharp_ in the short term) |
11:17 |
Dyrcona |
berick: You have something specific in mind? |
11:17 |
berick |
Dyrcona: i have a proof of concept using Redis |
11:20 |
Dyrcona |
berick: Adding a new Transport, or is it more complicated than that? |
11:22 |
csharp_ |
berick: interesting |
11:22 |
berick |
Dyrcona: adding a new transport, while also scrapping the xmpp xml wrapper, which requires a few additional changes, but not huge changes. |
11:23 |
Dyrcona |
Yes, definitely sounds interesting. |
11:24 |
berick |
well, other changes too. i'm still experimenting and writing up the proposal |
11:25 |
Dyrcona |
berick++ |
11:26 |
csharp_ |
berick++ |
11:26 |
csharp_ |
well, the PINES starr are talking about halting cataloging for a day to rule those out as possible causes - feels extreme to me, but without much to go on from the logging end, I'm getting desperate |
11:27 |
csharp_ |
er.. s/starr/staff/ |
11:28 |
berick |
csharp_: like Dyrcona asked, have you considered adding a server or two to help spread the load until this can be resolved? seems like one of those things that can't hurt... |
11:31 |
csharp_ |
berick: no, I haven't really considered that - given the fact that the servers don't appear to be under any sort of duress it might be adding to my babysitting duties |
11:35 |
berick |
csharp_: yeah, i get that. just thinking if the problem only happens during the workday, more load seems at least partially to blame |
11:35 |
berick |
load that could be spread |
11:37 |
csharp_ |
interesting - I'm thinking about that |
11:52 |
Dyrcona |
Is Mercury in retrograde? |
11:54 |
rjackson_isl_hom |
quick question to see if this rings a bell and any helpful hints on how to resolve: db server replaced overnight due to hardware issues - up and running but seeing errors when translate_isbn1013 is called and error indicates can't locate Business/ISBN.pm - which is part of the db function |
11:54 |
rjackson_isl_hom |
is this a path from within the postgres install on the db server that needs adjusted, or ??? |
11:55 |
Dyrcona |
rjackson_isl_hom: You probably need to install the db server prerequisites. |
11:55 |
rjackson_isl_hom |
actual error looks like this |
11:55 |
rjackson_isl_hom |
DBD::Pg::st execute failed: ERROR: Can't locate Business/ISBN.pm in @INC (you may need to install the Business::ISBN module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.26.1 /usr/local/share/perl/5.26.1 /usr/lib/x86_64-linux-gnu/perl5/5.26 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.26 /usr/share/perl/5.26 /usr/local/lib/site_perl) at line 2.\nBEGIN failed--compilation aborted at line |
11:55 |
rjackson_isl_hom |
2.\nCONTEXT: compilation of PL/Perl function "translate_isbn1013" [for Statement " -- bib search: |
11:57 |
rjackson_isl_hom |
Dyrcona I am back seat driver but trying to assist if I can! What are the prerequisites that might be missing and can this be fixed post db load? System is "up" relatively speaking |
11:59 |
Dyrcona |
rjackson_isl_hom: Look in the Open-ILS/extras/install/Makefile.<distro>-<release> for your Evergreen version, distro, and release. You need to install the packages listed for DEB_PGSQL_COMMON_MODS and the cpan modules from CPAN_MODULES_PGSQL. |
12:00 |
rjackson_isl_hom |
OK thanks I will pass that on Dyrcona++ |
12:03 |
Dyrcona |
rjackson_isl_hom: I omitted a level in the path before. It is supposed to be Open-ILS/src/extras/install/Makefiel.<distro>-<release> |
12:03 |
Dyrcona |
typos-- |
12:03 |
Dyrcona |
Makefile..... |
12:03 |
Dyrcona |
Anyway, back to the mystery of why Vandelay is suddenly slow today..... |
12:04 |
|
jihpringle joined #evergreen |
12:25 |
Dyrcona |
@dunno |
12:25 |
pinesol |
Dyrcona: No, you're a puzzleheaded kraken! |
12:32 |
|
nfBurton joined #evergreen |
12:36 |
jeffdavis |
No need to get personal, pinesol. |
12:48 |
jeffdavis |
It would be interesting to see if that same PINES problem exists with a different version of ejabberd (e.g. by putting an Ubuntu 20.04 server in rotation temporarily - might be a lot of work for low risk of reward though) |
12:50 |
|
abowling joined #evergreen |
12:57 |
csharp_ |
jeffdavis: that occurred to me too |
13:02 |
rjackson_isl_hom |
still pounding head to wall - we have app servers that did not change and the ISBN.pm module is installed there |
13:03 |
Dyrcona |
rjackson_isl_hom: You need to install the things that I mentioned on the database server. It needs those Perl modules. |
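[Editor's sketch: a quick way to confirm whether a Perl module loads on a given host, to be run on the database server itself. It assumes that the `perl` found on PATH uses the same module paths as the interpreter PL/Perl uses. The @INC list in the error above suggests the system Perl, but that is not confirmed here. The function name `perl_module_available` is hypothetical.]

```python
import shutil
import subprocess


def perl_module_available(module, perl="perl"):
    """Return True if `perl -M<module> -e 1` exits cleanly on this host."""
    if shutil.which(perl) is None:
        # No such interpreter on PATH, so the module cannot be loaded with it.
        return False
    result = subprocess.run(
        [perl, f"-M{module}", "-e", "1"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


# Business::ISBN is the module named in the error message above.
for name in ("Business::ISBN",):
    status = "ok" if perl_module_available(name) else "MISSING"
    print(f"{name}: {status}")
```

[The full list of modules to check is the one in the Makefile that Dyrcona points to. It is not reproduced here.]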
13:04 |
rjackson_isl_hom |
OK - looking further |
13:08 |
jeff |
csharp_: have you captured network traffic (specifically / especially XMPP traffic) during one of the events? |
13:11 |
|
jihpringle joined #evergreen |
13:15 |
Dyrcona |
I'm also being told that bib records won't overlay. Anything useful in the logs? Doesn't look like it so far. |
13:16 |
Dyrcona |
There is way too much "noise" in the logs. |
13:18 |
|
awitter joined #evergreen |
13:24 |
Dyrcona |
Ah. Read the new ticket more carefully, and it's the same as the other OCLC not working ticket. They just worded it as overlays not working because the user intended to overlay with records from OCLC. |
14:12 |
|
jvwoolf joined #evergreen |
14:13 |
|
jvwoolf1 joined #evergreen |
14:15 |
Dyrcona |
phasefx: If we're using the new Stripe API, should Stripe.pm come into play? I'm getting reports of internal server errors when people try to pay with a credit card, and I'm seeing "Can't connect to api" errors from Stripe.pm. |
14:15 |
Dyrcona |
It has been one of those days. |
14:16 |
Dyrcona |
Since all of the problems seem to be related to us talking to outside vendors, I'm starting to think something is wrong with the network at the colocation facility. |
14:22 |
rfrasur |
one_of_those_days-- |
14:50 |
csharp_ |
jeff: not yet |
14:50 |
csharp_ |
I was experimenting with tshark earlier (TCP on port 5222 listening on "any" or "lo") |
14:53 |
Dyrcona |
Well, don't reboot your load balancer, ever.... :) |
14:54 |
Dyrcona |
My networking issues were caused by iptables rules not loading when the load balancer was rebooted last night. I think an update obliterated the script that was loading the rules. |
14:55 |
|
rfrasur joined #evergreen |
14:55 |
csharp_ |
@praise The Rules |
14:55 |
* pinesol |
The Rules is very kind and good-looking and always does what's best for the project |
14:58 |
csharp_ |
https://www.reddit.com/r/ProgrammerHumor/comments/sdhsaf/programming/ - saw this last night and it hit home |
14:58 |
csharp_ |
still in the I HATE PROGRAMMING stage of grief |
15:01 |
Dyrcona |
That gets me right in the feels. :) |
15:30 |
csharp_ |
*sigh* - with tshark running I'm not seeing the issue |
15:31 |
csharp_ |
could be coincidence, but I haven't seen an event in 30 mins |
15:38 |
csharp_ |
ha! got one |
15:38 |
csharp_ |
...and immediately another |
15:42 |
Dyrcona |
Any clue what the problem is? |
16:20 |
|
jvwoolf1 left #evergreen |
17:11 |
|
mmorgan left #evergreen |
18:00 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |