Evergreen ILS Website

IRC log for #evergreen, 2020-12-14


All times shown according to the server's local time.

Time Nick Message
03:22 jamesrf joined #evergreen
03:22 bshum joined #evergreen
03:22 mrisher joined #evergreen
03:22 abowling joined #evergreen
03:22 akilsdonk joined #evergreen
03:22 rhamby joined #evergreen
03:22 miker joined #evergreen
03:22 phasefx joined #evergreen
03:22 abneiman joined #evergreen
03:22 JBoyer joined #evergreen
03:22 laurie joined #evergreen
03:22 jeffdavis joined #evergreen
03:22 jonadab joined #evergreen
03:22 yar joined #evergreen
03:22 drigney joined #evergreen
03:22 jweston joined #evergreen
03:22 pinesol joined #evergreen
03:22 csharp joined #evergreen
03:22 pastebot joined #evergreen
03:22 eby joined #evergreen
03:22 troy__ joined #evergreen
03:22 devted joined #evergreen
03:22 ejk_ joined #evergreen
03:22 Bmagic joined #evergreen
03:22 dickreckard joined #evergreen
03:22 awitter joined #evergreen
03:22 book`_ joined #evergreen
03:22 jeff joined #evergreen
03:22 egbuilder joined #evergreen
03:22 genpaku joined #evergreen
03:22 kip joined #evergreen
03:22 gmcharlt joined #evergreen
03:22 yeats joined #evergreen
03:22 dbs joined #evergreen
03:22 RBecker joined #evergreen
03:22 berick joined #evergreen
03:22 dluch joined #evergreen
06:01 pinesol News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
07:26 Dyrcona joined #evergreen
07:35 rjackson_isl_hom joined #evergreen
08:05 mantis1 joined #evergreen
08:34 collum joined #evergreen
08:38 mmorgan joined #evergreen
08:57 rfrasur joined #evergreen
09:19 nfBurton joined #evergreen
09:46 tlittle joined #evergreen
10:18 csharp can anyone advise me on the best way to troubleshoot NOT CONNECTED TO THE NETWORK errors?
10:18 csharp 2020-12-14 09:24:44 brick04-head open-ils.actor: [ERR :103194:EX.pm:66:16079558581032824] Exception: OpenSRF::EX::Session 2020-12-14T09:24:44 OpenSRF::Utils::Logger /usr/local/share/perl/5.22.1/OpenSRF/Utils/Logger.pm:243 Session Error: o
10:18 csharp pensrf@public.brick04-head.gapines.org/_brick04-head_1607955858.346341_103282 IS NOT CONNECTED TO THE NETWORK!!!
10:19 csharp what can I glean from that that might point me to potential causes?
10:19 csharp _brick04-head_1607955858.346341_103282 - does this contain useful information?
10:19 csharp it doesn't seem consistent from log error to log error
10:20 csharp oh wait - maybe it is
10:21 * csharp rolls up sleeves to dive into C code
10:23 berick csharp: the main info of value there is the pid, brick, log trace, and time.  there's nothing particularly meaningful in the error message, apart from 'not connected'
10:24 berick there are likely errors preceding this one w/ more info
10:25 Dyrcona csharp: Most of the time when I look into it, the NOT CONNECTED message comes $TIMEOUT_VALUE + 1 second after the request.
10:25 csharp berick: Dyrcona: thanks for the pointers
10:26 Dyrcona I just accept it as "normal."
10:27 csharp our libraries will not accept that :-(
10:27 csharp well, I mean that the resulting "system instability" on the end users' end is not acceptable to them
10:28 dbwells joined #evergreen
10:28 csharp and this is a "new" problem that popped up in the last couple of months, happening a couple of times per week
10:28 csharp starting to feel like the bad old days of PINES
10:28 Dyrcona We get it multiple times a day, and always have AFAIK.
10:29 Dyrcona Throw more hardware at it. ;)
10:29 csharp I did up the actor max_children in hopes of forestalling it
10:29 Dyrcona I find it is not really related to running out of drones.
10:29 csharp but it's like adding lanes to the freeway - they just fill up the new lanes
10:30 Dyrcona Sometimes, it may be.
10:30 csharp in these cases, when I discover the problem, actor drones are near 100%
10:30 Dyrcona Usually looks like a client timing out, waiting on CStore (i.e. the database), going away, and the router not having anywhere to send the response that finally comes back.
10:31 csharp 2020-12-14 09:24:12 brick04-head open-ils.actor: [WARN:108187:Server.pm:200:16076285143703913435] server: no children available, waiting... consider increasing max_children for this application higher than 192 in the OpenSRF configuration if this message occurs frequently
10:31 csharp that preceded the outage by about 30 seconds
10:31 Dyrcona I don't always find that those messages coincide.
10:31 Dyrcona They often do, though.
10:32 Dyrcona And, I just now see this email from Nagios, so maybe I'll take a look: CRIT: 4 NOT CONNECTEDs returned this hour: (Top server this hour:       3 bd2-bh5)
10:37 Dyrcona It's something else in our case. bd2-bh5 is only running 8% of open-ils.actor drones, 12/150.
10:38 csharp ok
10:38 csharp so is this only discoverable by filtering the logs for NOT CONNECTED errors?
10:39 csharp osrf_control --diagnostic doesn't seem to know about this problem
10:39 Dyrcona Well, no. Diagnostic only reports how many are currently running. And, the not connected often look like clients to me, but what do I know? :)
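A minimal sketch of the log filtering discussed above, in the spirit of the hourly Nagios count Dyrcona mentions. It assumes the OpenSRF log lines follow the layout quoted earlier in the channel (date, time, host, service, then a bracketed level and pid); the function name `count_not_connected` and the sample lines are hypothetical, and nothing here is part of Evergreen or OpenSRF.

```python
import re
from collections import Counter

# Assumed layout, inferred from the lines quoted in the channel:
#   DATE TIME HOST SERVICE: [LEVEL:PID:FILE:LINE:TRACE] message
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2} "
    r"(?P<host>\S+) (?P<service>\S+): "
    r"\[(?P<level>[A-Z]+)\s*:(?P<pid>\d+):"
)


def count_not_connected(lines):
    """Count NOT CONNECTED errors per (date, hour, host)."""
    counts = Counter()
    for line in lines:
        if "NOT CONNECTED TO THE NETWORK" not in line:
            continue
        match = LINE_RE.match(line)
        if match:
            counts[(match["date"], match["hour"], match["host"])] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines modeled on the error quoted in the channel.
    sample = [
        "2020-12-14 09:24:44 brick04-head open-ils.actor: "
        "[ERR :103194:EX.pm:66:1] Session Error: a IS NOT CONNECTED TO THE NETWORK!!!",
        "2020-12-14 09:24:12 brick04-head open-ils.actor: "
        "[WARN:108187:Server.pm:200:2] server: no children available, waiting...",
    ]
    for key, total in sorted(count_not_connected(sample).items()):
        print(key, total)
```

Grouping by hour and host makes it easy to see whether one brick is responsible for most of the errors, and the pid captured by the pattern can be used to look for earlier messages from the same process.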
10:40 Dyrcona Also, I think I just lost the firewall or something at CW MARS....
11:22 * mmorgan reads the backscroll with interest
11:24 mmorgan We see the NOT CONNECTEDs at times as well. Does this manifest to the user as a frozen client?
11:25 Dyrcona mmorgan: It could depending on what's going on.
11:26 * mmorgan nods
11:34 Christineb joined #evergreen
11:46 csharp mmorgan: in our multi-brick setup, it appears to the end user to be "unstable" because it sometimes works (when they're hitting working bricks)
11:47 mmorgan csharp: Ok, thanks, we have had that experience.
11:48 berick csharp: testing the fixes for bug 1896285 might help
11:48 pinesol Launchpad bug 1896285 in Evergreen "Use batch methods for multi-row grid actions" [Medium,Confirmed] https://launchpad.net/bugs/1896285
12:09 csharp berick: that's on my to-do for this exact reason - thanks
12:09 jihpringle joined #evergreen
12:14 Dyrcona Using nginx as a proxy on the brick head, I find it seems to be hit or miss with logging the remote IP in the apache logs. It appears to only be logging it for certain errors, but I need to do a more thorough investigation.
12:16 Dyrcona Y'know, maybe it's my log format. Never mind.
12:21 Dyrcona Eh, no. RemoteIP appears to not be working. I'm getting 127.0.0.1 for pretty much all of the log entries.
12:24 Dyrcona Hmm. Could be my nginx configuration is wrong.....
12:34 Dyrcona Yeah. I think I have the wrong variable being used.
12:34 * Dyrcona will fix it for tomorrow morning.
13:11 alynn26 joined #evergreen
13:21 tlittle In the Angular fm-editor, it defines what should be shown as placeholder text. Can you modify that per modal, or do you just currently auto-inherit that and that's the end of it? I'm looking at bug 1906862 again and want to confirm that I haven't just missed it somewhere that you can do that.
13:21 pinesol Launchpad bug 1906862 in Evergreen "Angular Providers: Should not show text in entry fields" [Undecided,New] https://launchpad.net/bugs/1906862
13:55 sandbergja joined #evergreen
14:00 berick tlittle: the placeholders are hard coded.  i could imagine a feature that disables placeholders, though, and/or lets you specify them
14:04 tlittle berick thanks! When I was poking around earlier, I was pondering whether that would be something that you could do through the fm-editor TS file, kind of like how you can specify fieldorder. Even if it was just "show placeholders"=yes/no.
14:05 berick tlittle: yep, .ts and the .html file.  could add a @Input() hidePlaceholders = false then avoid adding them in the html if the value is true
14:07 tlittle Oh neat! Maybe I'll take a crack at that. :)  berick++
14:20 jihpringle joined #evergreen
14:35 jeff How it started: UPDATE action.circulation AS circ SET due_date = '2020-04-16 23:59:59-0400' WHERE [...]
14:35 jeff How it's going: UPDATE action.circulation AS circ SET due_date = '2020-12-28 23:59:59-0500' WHERE […]
14:35 jeff (oh, we were so optimistic back in March!)
14:45 mmorgan jeff: Or maybe just naive
14:46 Dyrcona So, I'm not getting remote ips in my Apache logs with nginx as the proxy. I tried fixing the configuration on a test vm, but I'm still getting 127.0.0.1 after restarting both nginx and apache.
14:46 Dyrcona Does anyone have an example configuration that works?
14:47 Dyrcona jeff: We still have rolling updates for due dates at two of our member libraries.
14:52 jihpringle joined #evergreen
14:55 Dyrcona I've tried passing $remote_addr and $proxy_protocol_addr in the X-Forwarded-For header but neither works.
14:57 Dyrcona I started out with $proxy_add_x_forwarded_for.
14:59 Dyrcona mod_remoteip is enabled and the directives to use the X-Forwarded-For header are set up, along with the internal proxy ip address.
14:59 berick was about to ask
15:05 laurie joined #evergreen
15:05 berick Dyrcona: are you only seeing this issue with websockets requests?
15:09 Dyrcona No. It's with regular Apache requests.
15:10 Dyrcona AFAICT, it should be working, and I thought it was working once.
15:10 Dyrcona I do see the remote ip on some SSL info/error messages.
15:23 Dyrcona I guess I'll stick with what I've got since none of the other changes seem to work, either.
15:29 mantis1 left #evergreen
15:46 jeff I laughed:
15:46 jeff Due to a recent COVID-19 exposure, the library is closed until Dec 28. Curbside service is also suspended. Your item(s) including TEN LESSONS FOR A POST-PANDEMIC WORLD will be held until service resumes. More info will be posted on the library web site.
15:48 berick heh
15:49 mmorgan Probably won't need that book for a while yet, anyway :-/
15:59 csharp jeff++
16:09 Cocopuff2018 joined #evergreen
16:57 csharp happened again - at 4:30 p.m. EST, our open-ils.actor drone count was 12/192 - just now it's 192/192
16:57 csharp and a wall of NOT CONNECTED errors
16:58 sandbergja joined #evergreen
16:59 csharp berick: I tested your branches for bug 1896285 on a smallish test server and it kept the open-ils.actor count low but I saw a spike in pcrud drones (small use case)
16:59 pinesol Launchpad bug 1896285 in Evergreen "Use batch methods for multi-row grid actions" [Medium,Confirmed] https://launchpad.net/bugs/1896285
16:59 csharp berick: would the fix for the patron buckets (which I don't think are widely used in PINES) help?
17:00 berick the fixes only address the specific work flows
17:00 csharp I'm interested in tracking down the exact call(s) that spiked this brick
17:00 csharp ok
17:00 csharp that's what I thought
17:00 berick if you find more, i'll do what I can to patch
17:00 csharp berick++
17:02 sandbergja joined #evergreen
17:02 mmorgan left #evergreen
17:05 csharp berick: looks like a crazy sh*t ton of these: 2020-12-14 16:45:37 brick01-head gateway: [ACT:61996:osrf-websocket-stdio.c:559:16079823286199641] [127.0.0.1] [] open-ils.actor open-ils.actor.ou_setting.ancestor_default.batch 178, ["cat.default_copy_status_normal"],
17:05 csharp they started about 10-12 seconds before the NOT CONNECTED errors
17:07 berick yeah, looks like it's called with each new copy, which could be a lot
17:10 csharp I can confirm that the same call is repeated over and over during this morning's problems too
17:10 berick csharp: mind adding a note to https://bugs.launchpad.net/evergreen/+bug/1896285 ?
17:10 pinesol Launchpad bug 1896285 in Evergreen "Use batch methods for multi-row grid actions" [Medium,Confirmed]
17:12 csharp berick: done - thanks!
17:19 Dyrcona joined #evergreen
17:20 Dyrcona I signed back in to say that we ran out of open-ils.actor drones this afternoon on brick 6, the one that I replaced this morning. It happened just a bit after I clocked out for the day.
17:20 csharp Dyrcona: did you see the scrollback from the last 20-30 mins?
17:20 Dyrcona We need to fix the cause of this instead of increasing the number of drones to paper over it.
17:21 csharp yeah, that didn't help us at all
17:21 csharp open-ils.actor.ou_setting.ancestor_default.batch 178, ["cat.default_copy_status_normal"]
17:21 csharp see if that's happening in crazy numbers in your activity log
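One rough way to check an activity log for this kind of flood, sketched under assumptions: the gateway activity lines are taken to look like the one csharp pasted above (timestamp, host, `gateway:`, an `[ACT:...]` block, client address, an empty bracket, then service and method). The function name `busiest_calls` and the sample data are hypothetical, and this is not an Evergreen tool.

```python
import re
from collections import Counter

# Assumed layout, based on the single activity line quoted in the channel:
#   DATE TIME HOST gateway: [ACT:...] [CLIENT] [] SERVICE METHOD params...
ACT_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2} "
    r"(?P<host>\S+) \S+: "
    r"\[ACT:[^\]]*\] \[(?P<client>[^\]]*)\] \[[^\]]*\] "
    r"(?P<service>\S+) (?P<method>\S+)"
)


def busiest_calls(lines, top=5):
    """Return the most frequent (minute, host, method) combinations."""
    counts = Counter()
    for line in lines:
        match = ACT_RE.match(line)
        if match:
            counts[(match["minute"], match["host"], match["method"])] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    # Made-up sample: the same call repeated within one minute.
    call = (
        "2020-12-14 16:45:{sec:02d} brick01-head gateway: "
        "[ACT:61996:osrf-websocket-stdio.c:559:1] [127.0.0.1] [] "
        "open-ils.actor open-ils.actor.ou_setting.ancestor_default.batch "
        '178, ["cat.default_copy_status_normal"],'
    )
    sample = [call.format(sec=s) for s in range(30, 35)]
    for key, total in busiest_calls(sample):
        print(total, key)
```

Bucketing by minute keeps the output small while still showing a burst. Note that behind a local proxy the client field may always read 127.0.0.1, so it is captured by the pattern but left out of the grouping key.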
17:22 berick i've reproduced and working on a patch now
17:23 csharp berick: awesome
17:25 Dyrcona Oh. I've seen that and more. Over 30,000 requests for the same setting from the same client within a matter of minutes.
17:25 Dyrcona We're still on 3.2.
17:26 csharp Dyrcona: good to know
17:26 csharp we're on 3.4, heading to 3.6 next month
17:40 Dyrcona We'll be going to 3.6 in April, probably.
17:40 Dyrcona We make big jumps these days.
17:41 sandbergja_ joined #evergreen
17:41 sandbergja joined #evergreen
17:43 sandbergja I have a hold that really seems like it should target a specific copy.  action.hold_request_permit_test says that everything's good.  But retargeting the hold never targets that (or any other) copy.  Any tips for my next line of troubleshooting?
17:45 sandbergja Never mind, it actually is targeting it properly.  It just won't go into transit when we check it in, despite being the targeted copy
17:46 sandbergja And never mind my never mind -- I was looking at the wrong column
17:47 sandbergja nothing in current_copy
17:56 dbwells joined #evergreen
18:00 pinesol News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
18:01 Dyrcona So, I tried restarting the service, and the Listener would not die. I had to kill it with -9, i.e. fire.
18:01 sandbergja__ joined #evergreen
18:03 jihpringle joined #evergreen
18:04 berick csharp: fix pushed
18:50 csharp berick: rock on - will test very soon
20:21 sandbergja__ joined #evergreen
20:33 csharp berick - tested fine on my test server - I'll let you know tomorrow how it looks with PINES data
22:34 sandbergja__ joined #evergreen
