Time |
Nick |
Message |
03:22 |
|
jamesrf joined #evergreen |
03:22 |
|
bshum joined #evergreen |
03:22 |
|
mrisher joined #evergreen |
03:22 |
|
abowling joined #evergreen |
03:22 |
|
akilsdonk joined #evergreen |
03:22 |
|
rhamby joined #evergreen |
03:22 |
|
miker joined #evergreen |
03:22 |
|
phasefx joined #evergreen |
03:22 |
|
abneiman joined #evergreen |
03:22 |
|
JBoyer joined #evergreen |
03:22 |
|
laurie joined #evergreen |
03:22 |
|
jeffdavis joined #evergreen |
03:22 |
|
jonadab joined #evergreen |
03:22 |
|
yar joined #evergreen |
03:22 |
|
drigney joined #evergreen |
03:22 |
|
jweston joined #evergreen |
03:22 |
|
pinesol joined #evergreen |
03:22 |
|
csharp joined #evergreen |
03:22 |
|
pastebot joined #evergreen |
03:22 |
|
eby joined #evergreen |
03:22 |
|
troy__ joined #evergreen |
03:22 |
|
devted joined #evergreen |
03:22 |
|
ejk_ joined #evergreen |
03:22 |
|
Bmagic joined #evergreen |
03:22 |
|
dickreckard joined #evergreen |
03:22 |
|
awitter joined #evergreen |
03:22 |
|
book`_ joined #evergreen |
03:22 |
|
jeff joined #evergreen |
03:22 |
|
egbuilder joined #evergreen |
03:22 |
|
genpaku joined #evergreen |
03:22 |
|
kip joined #evergreen |
03:22 |
|
gmcharlt joined #evergreen |
03:22 |
|
yeats joined #evergreen |
03:22 |
|
dbs joined #evergreen |
03:22 |
|
RBecker joined #evergreen |
03:22 |
|
berick joined #evergreen |
03:22 |
|
dluch joined #evergreen |
06:01 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |
07:26 |
|
Dyrcona joined #evergreen |
07:35 |
|
rjackson_isl_hom joined #evergreen |
08:05 |
|
mantis1 joined #evergreen |
08:34 |
|
collum joined #evergreen |
08:38 |
|
mmorgan joined #evergreen |
08:57 |
|
rfrasur joined #evergreen |
09:19 |
|
nfBurton joined #evergreen |
09:46 |
|
tlittle joined #evergreen |
10:18 |
csharp |
can anyone advise me on the best way to troubleshoot NOT CONNECTED TO THE NETWORK errors? |
10:18 |
csharp |
2020-12-14 09:24:44 brick04-head open-ils.actor: [ERR :103194:EX.pm:66:16079558581032824] Exception: OpenSRF::EX::Session 2020-12-14T09:24:44 OpenSRF::Utils::Logger /usr/local/share/perl/5.22.1/OpenSRF/Utils/Logger.pm:243 Session Error: o |
10:18 |
csharp |
pensrfpublic.brick04-head.gapines.org/_brick04-head_1607955858.346341_103282 IS NOT CONNECTED TO THE NETWORK!!! |
10:19 |
csharp |
what can I glean from that that might point me to potential causes? |
10:19 |
csharp |
_brick04-head_1607955858.346341_103282 - does this contain useful information? |
10:19 |
csharp |
it doesn't seem consistent from log error to log error |
10:20 |
csharp |
oh wait - maybe it is |
10:21 |
* csharp |
rolls up sleeves to dive into C code |
10:23 |
berick |
csharp: the main info of value there is the pid, brick, log trace, and time. there's nothing particularly meaningful in the error message, apart from 'not connected' |
10:24 |
berick |
there are likely errors preceding this one w/ more info |
10:25 |
Dyrcona |
csharp: Most of the time when I look into it, the NOT CONNECTED message comes $TIMEOUT_VALUE + 1 second after the request. |
10:25 |
csharp |
berick: Dyrcona: thanks for the pointers |
10:26 |
Dyrcona |
I just accept it as "normal." |
10:27 |
csharp |
our libraries will not accept that :-( |
10:27 |
csharp |
well, I mean that the resulting "system instability" on the end users' end is not acceptable to them |
10:28 |
|
dbwells joined #evergreen |
10:28 |
csharp |
and this is a "new" problem that popped up in the last couple of months, happening a couple of times per week |
10:28 |
csharp |
starting to feel like the bad old days of PINES |
10:28 |
Dyrcona |
We get it multiple times a day, and always have AFAIK. |
10:29 |
Dyrcona |
Throw more hardware at it. ;) |
10:29 |
csharp |
I did up the actor max_children in hopes of forestalling it |
10:29 |
Dyrcona |
I find it is not really related to running out of drones. |
10:29 |
csharp |
but it's like adding lanes to the freeway - they just fill up the new lanes |
10:30 |
Dyrcona |
Sometimes, it may be. |
10:30 |
csharp |
in these cases, when I discover the problem, actor drones are near 100% |
10:30 |
Dyrcona |
Usually looks like a client timing out, waiting on CStore (i.e. the database), going away, and the router not having anywhere to send the response that finally comes back. |
10:31 |
csharp |
2020-12-14 09:24:12 brick04-head open-ils.actor: [WARN:108187:Server.pm:200:16076285143703913435] server: no children available, waiting... consider increasing max_children for this application higher than 192 in the OpenSRF configuration if this message occurs frequently |
10:31 |
csharp |
that preceded the outage by about 30 seconds |
10:31 |
Dyrcona |
I don't always find that those messages coincide. |
10:31 |
Dyrcona |
They often do, though. |
10:32 |
Dyrcona |
And, I just now see this email from Nagios, so maybe I'll take a look: CRIT: 4 NOT CONNECTEDs returned this hour: (Top server this hour: 3 bd2-bh5) |
10:37 |
Dyrcona |
It's something else in our case. bd2-bh5 is only running 8% of open-ils.actor drones, 12/150. |
10:38 |
csharp |
ok |
10:38 |
csharp |
so is this only discoverable by filtering the logs for NOT CONNECTED errors? |
10:39 |
csharp |
osrf_control --diagnostic doesn't seem to know about this problem |
10:39 |
Dyrcona |
Well, no. Diagnostic only reports how many are currently running. And, the not connected often look like clients to me, but what do I know? :) |
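(Editor's note: since the discussion above concludes that these errors only show up by filtering the logs, here is a minimal sketch of that filtering. It assumes each error is written on a single line laid out like the sample csharp pasted at 10:18, i.e. date, time, host, service, then a bracketed "[LEVEL:pid:file:line:trace]" block. The function name `count_not_connected` is made up for illustration and is not part of OpenSRF or Evergreen.)

```python
import re
from collections import Counter

# Assumed layout, based on the sample line pasted in the channel:
#   2020-12-14 09:24:44 brick04-head open-ils.actor: [ERR :103194:EX.pm:66:...] ...
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2} (\S+) (\S+): \[(\w+)\s*:(\d+):"
)


def count_not_connected(lines):
    """Count NOT CONNECTED errors per (date, hour, host, service)."""
    counts = Counter()
    for line in lines:
        if "NOT CONNECTED" not in line:
            continue
        match = LINE_RE.match(line)
        if match:
            date, hour, host, service, _level, _pid = match.groups()
            counts[(date, hour, host, service)] += 1
    return counts
```

Feeding it an open log file (for example `count_not_connected(open(path))`, where `path` is wherever your syslog setup writes the OpenSRF logs) gives an hourly tally per brick, similar in spirit to the Nagios summary Dyrcona quoted.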
10:40 |
Dyrcona |
Also, I think I just lost the firewall or something at CW MARS.... |
11:22 |
* mmorgan |
reads the backscroll with interest |
11:24 |
mmorgan |
We see the NOT CONNECTEDs at times as well. Does this manifest to the user as a frozen client? |
11:25 |
Dyrcona |
mmorgan: It could depending on what's going on. |
11:26 |
* mmorgan |
nods |
11:34 |
|
Christineb joined #evergreen |
11:46 |
csharp |
mmorgan: in our multi-brick setup, it appears to the end user to be "unstable" because it sometimes works (when they're hitting working bricks) |
11:47 |
mmorgan |
csharp: Ok, thanks, we have had that experience. |
11:48 |
berick |
csharp: testing the fixes for bug 1896285 might help |
11:48 |
pinesol |
Launchpad bug 1896285 in Evergreen "Use batch methods for multi-row grid actions" [Medium,Confirmed] https://launchpad.net/bugs/1896285 |
12:09 |
csharp |
berick: that's on my to-do for this exact reason - thanks |
12:09 |
|
jihpringle joined #evergreen |
12:14 |
Dyrcona |
Using nginx as a proxy on the brick head, I find it seems to be hit or miss with logging the remote IP in the apache logs. It appears to only be logging it for certain errors, but I need to do a more thorough investigation. |
12:16 |
Dyrcona |
Y'know, maybe it's my log format. Never mind. |
12:21 |
Dyrcona |
Eh, no. RemoteIP appears to not be working. I'm getting 127.0.0.1 for pretty much all of the log entries. |
12:24 |
Dyrcona |
Hmm. Could be my nginx configuration is wrong..... |
12:34 |
Dyrcona |
Yeah. I think I have the wrong variable being used. |
12:34 |
* Dyrcona |
will fix it for tomorrow morning. |
13:11 |
|
alynn26 joined #evergreen |
13:21 |
tlittle |
In the Angular fm-editor, it defines what should be shown as placeholder text. Can you modify that per modal, or do you just currently auto-inherit that and that's the end of it? I'm looking at bug 1906862 again and want to confirm that I haven't just missed it somewhere that you can do that. |
13:21 |
pinesol |
Launchpad bug 1906862 in Evergreen "Angular Providers: Should not show text in entry fields" [Undecided,New] https://launchpad.net/bugs/1906862 |
13:55 |
|
sandbergja joined #evergreen |
14:00 |
berick |
tlittle: the placeholders are hard coded. i could imagine a feature that disables placeholders, though, and/or lets you specify them |
14:04 |
tlittle |
berick thanks! When I was poking around earlier, I was pondering whether that would be something that you could do through the fm-editor TS file, kind of like how you can specify fieldorder. Even if it was just "show placeholders"=yes/no. |
14:05 |
berick |
tlittle: yep, .ts and the .html file. could add a @Input() hidePlaceholders = false then avoid adding them in the html if the value is true |
14:07 |
tlittle |
Oh neat! Maybe I'll take a crack at that. :) berick++ |
14:20 |
|
jihpringle joined #evergreen |
14:35 |
jeff |
How it started: UPDATE action.circulation AS circ SET due_date = '2020-04-16 23:59:59-0400' WHERE [...] |
14:35 |
jeff |
How it's going: UPDATE action.circulation AS circ SET due_date = '2020-12-28 23:59:59-0500' WHERE […] |
14:35 |
jeff |
(oh, we were so optimistic back in March!) |
14:45 |
mmorgan |
jeff: Or maybe just naive |
14:46 |
Dyrcona |
So, I'm not getting remote ips in my Apache logs with nginx as the proxy. I tried fixing the configuration on a test vm, but I'm still getting 127.0.0.1 after restarting both nginx and apache. |
14:46 |
Dyrcona |
Does anyone have an example configuration that works? |
14:47 |
Dyrcona |
jeff: We still have rolling updates for due dates at two of our member libraries. |
14:52 |
|
jihpringle joined #evergreen |
14:55 |
Dyrcona |
I've tried passing $remote_addr and $proxy_protocol_addr in the X-Forwarded-For header but neither works. |
14:57 |
Dyrcona |
I started out with $proxy_add_x_forwarded_for. |
14:59 |
Dyrcona |
mod_remoteip is enabled and the directives to use the X-Forwarded-For header are set up, along with the internal proxy ip address. |
14:59 |
berick |
was about to ask |
15:05 |
|
laurie joined #evergreen |
15:05 |
berick |
Dyrcona: are you only seeing this issue with websockets requests? |
15:09 |
Dyrcona |
No. It's with regular Apache requests. |
15:10 |
Dyrcona |
AFAICT, it should be working, and I thought it was working once. |
15:10 |
Dyrcona |
I do see the remote ip on some SSL info/error messages. |
15:23 |
Dyrcona |
I guess I'll stick with what I've got since none of the other changes seem to work, either. |
15:29 |
|
mantis1 left #evergreen |
15:46 |
jeff |
I laughed: |
15:46 |
jeff |
Due to a recent COVID-19 exposure, the library is closed until Dec 28. Curbside service is also suspended. Your item(s) including TEN LESSONS FOR A POST-PANDEMIC WORLD will be held until service resumes. More info will be posted on the library web site. |
15:48 |
berick |
heh |
15:49 |
mmorgan |
Probably won't need that book for a while yet, anyway :-/ |
15:59 |
csharp |
jeff++ |
16:09 |
|
Cocopuff2018 joined #evergreen |
16:57 |
csharp |
happened again - at 4:30 p.m. EST, our open-ils.actor drone count was 12/192 - just now it's 192/192 |
16:57 |
csharp |
and a wall of NOT CONNECTED errors |
16:58 |
|
sandbergja joined #evergreen |
16:59 |
csharp |
berick: I tested your branches for bug 1896285 on a smallish test server and it kept the open-ils.actor count low but I saw a spike in pcrud drones (small use case) |
16:59 |
pinesol |
Launchpad bug 1896285 in Evergreen "Use batch methods for multi-row grid actions" [Medium,Confirmed] https://launchpad.net/bugs/1896285 |
16:59 |
csharp |
berick: would the fix for the patron buckets (which I don't think are widely used in PINES) help? |
17:00 |
berick |
the fixes only address the specific work flows |
17:00 |
csharp |
I'm interested in tracking down the exact call(s) that spiked this brick |
17:00 |
csharp |
ok |
17:00 |
csharp |
that's what I thought |
17:00 |
berick |
if you find more, i'll do what I can to patch |
17:00 |
csharp |
berick++ |
17:02 |
|
sandbergja joined #evergreen |
17:02 |
|
mmorgan left #evergreen |
17:05 |
csharp |
berick: looks like a crazy sh*t ton of these: 2020-12-14 16:45:37 brick01-head gateway: [ACT:61996:osrf-websocket-stdio.c:559:16079823286199641] [127.0.0.1] [] open-ils.actor open-ils.actor.ou_setting.ancestor_default.batch 178, ["cat.default_copy_status_normal"], |
17:05 |
csharp |
they started about 10-12 seconds before the NOT CONNECTED errors |
17:07 |
berick |
yeah, looks like it's called with each new copy, which could be a lot |
17:10 |
csharp |
I can confirm that the same call is repeated over and over during this morning's problems too |
17:10 |
berick |
csharp: mind adding a note to https://bugs.launchpad.net/evergreen/+bug/1896285 ? |
17:10 |
pinesol |
Launchpad bug 1896285 in Evergreen "Use batch methods for multi-row grid actions" [Medium,Confirmed] |
17:12 |
csharp |
berick: done - thanks! |
17:19 |
|
Dyrcona joined #evergreen |
17:20 |
Dyrcona |
I signed back in to say that we ran out of open-ils.actor drones this afternoon on brick 6, the one that I replaced this morning. It happened just a bit after I clocked out for the day. |
17:20 |
csharp |
Dyrcona: did you see the scrollback from the last 20-30 mins? |
17:20 |
Dyrcona |
We need to fix the cause of this instead of increasing the number of drones to paper over it. |
17:21 |
csharp |
yeah, that didn't help us at all |
17:21 |
csharp |
open-ils.actor.ou_setting.ancestor_default.batch 178, ["cat.default_copy_status_normal"] |
17:21 |
csharp |
see if that's happening in crazy numbers in your activity log |
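(Editor's note: a minimal sketch of how one might check an activity log for this kind of burst. It assumes gateway activity lines look like the one csharp pasted at 17:05: date, time, host, "gateway:", an "[ACT:...]" block, two bracketed fields, then the service and method names. The name `busiest_calls` is hypothetical, and the log path is up to your own setup.)

```python
import re
from collections import Counter

# Assumed layout, based on the activity line pasted in the channel:
#   2020-12-14 16:45:37 brick01-head gateway: [ACT:61996:...] [127.0.0.1] [] <service> <method> <params>
ACT_RE = re.compile(
    r"^(\S+ \d{2}:\d{2}):\d{2} (\S+) \S+: \[ACT:\d+:[^\]]*\] "
    r"\[([^\]]*)\] \[[^\]]*\] (\S+) (\S+)"
)


def busiest_calls(lines, top=5):
    """Return the most frequent (minute, host, method) combinations."""
    counts = Counter()
    for line in lines:
        match = ACT_RE.match(line)
        if match:
            minute, host, _client, _service, method = match.groups()
            counts[(minute, host, method)] += 1
    return counts.most_common(top)
```

Grouping by minute makes a flood of identical calls stand out against normal traffic. The client address is captured but not used as a key, because behind a local proxy it may simply read 127.0.0.1, as in the pasted line.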
17:22 |
berick |
i've reproduced it and am working on a patch now |
17:23 |
csharp |
berick: awesome |
17:25 |
Dyrcona |
Oh. I've seen that and more. Over 30,000 requests for the same setting from the same client within a matter of minutes. |
17:25 |
Dyrcona |
We're still on 3.2. |
17:26 |
csharp |
Dyrcona: good to know |
17:26 |
csharp |
we're on 3.4, heading to 3.6 next month |
17:40 |
Dyrcona |
We'll be going to 3.6 in April, probably. |
17:40 |
Dyrcona |
We make big jumps these days. |
17:41 |
|
sandbergja_ joined #evergreen |
17:41 |
|
sandbergja joined #evergreen |
17:43 |
sandbergja |
I have a hold that really seems like it should target a specific copy. action.hold_request_permit_test says that everything's good. But retargeting the hold never targets that (or any other) copy. Any tips for my next line of troubleshooting? |
17:45 |
sandbergja |
Never mind, it actually is targeting it properly. It just won't go into transit when we check it in, despite being the targeted copy |
17:46 |
sandbergja |
And never mind my never mind -- I was looking at the wrong column |
17:47 |
sandbergja |
nothing in current_copy |
17:56 |
|
dbwells joined #evergreen |
18:00 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |
18:01 |
Dyrcona |
So, I tried restarting the service, and the Listener would not die. I had to kill it with -9, i.e. fire. |
18:01 |
|
sandbergja__ joined #evergreen |
18:03 |
|
jihpringle joined #evergreen |
18:04 |
berick |
csharp: fix pushed |
18:50 |
csharp |
berick: rock on - will test very soon |
20:21 |
|
sandbergja__ joined #evergreen |
20:33 |
csharp |
berick - tested fine on my test server - I'll let you know tomorrow how it looks with PINES data |
22:34 |
|
sandbergja__ joined #evergreen |