Time |
Nick |
Message |
00:03 |
|
degraafk joined #evergreen |
07:19 |
|
collum joined #evergreen |
08:12 |
|
BDorsey joined #evergreen |
08:38 |
|
kworstell-isl joined #evergreen |
08:46 |
|
mmorgan joined #evergreen |
09:07 |
|
Dyrcona joined #evergreen |
09:09 |
|
dguarrac joined #evergreen |
09:34 |
|
sandbergja joined #evergreen |
10:01 |
Bmagic |
you guys |
10:02 |
Bmagic |
remember the login repeat thing? well, it didn't go away. And I went deeper. I was chasing down a possible idea that it had to do with an older database upgraded to 3.14. So I tried starting with an older concerto set, from EG 3.11.1, and upgraded it up to 3.14.1, and set up EG to connect to it. No problem! |
10:04 |
Bmagic |
Because on a fresh concerto set, with EG 3.14.1, it's fine. So I figured it had something to do with an older db, and the upgrade scripts. But that wasn't it |
10:05 |
Bmagic |
long story short: it was this setting: auth.staff_timeout, was set to "0" |
10:05 |
Bmagic |
which apparently was fine in older versions of Evergreen, but not 3.14 |
10:14 |
Dyrcona |
I don't think setting that to 0 was ever meant to be fine. Looks like an unreported bug was fixed. |
10:22 |
|
stephengwills joined #evergreen |
10:25 |
mmorgan |
Bmagic++ |
10:48 |
Dyrcona |
Well, I have Aspen running, but it's not talking to a test Evergreen instance, yet. I suppose I could get answers in Slack, but I'll peruse the docs and the code first. |
10:51 |
|
kworstell-isl joined #evergreen |
11:26 |
pinesol |
News from commits: LP#2089419: fix parsing of offset/limit in C-based DB search methods <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=0458ae01ef6a84734efb7232bc1cdaf479dd3be8> |
11:53 |
|
jihpringle joined #evergreen |
11:55 |
Bmagic |
what's the best way to get that message out, about 3.14 respecting the auth timeout, even with a zero value? Email list? bug report? |
11:56 |
sandbergja |
Bmagic++ |
11:56 |
sleary |
maybe a little info callout in the MFA docs? |
11:57 |
sandbergja |
Could we add a db upgrade script that deletes settings that are 0? So that it happens automatically and people don't have to think about it/dig through the docs? |
12:00 |
* mmorgan |
likes the idea of an upgrade script. Also an upgrade note in the release notes? |
12:02 |
mmorgan |
Prior to mfa, what would be the effect of auth.staff_timeout = 0? Would that prevent timing out? Just wondering why someone might choose to set it that way. |
12:07 |
Bmagic |
I figured it meant "infinity" but I haven't looked at the code. |
12:08 |
|
Christineb joined #evergreen |
12:09 |
Bmagic |
this is what we came up with |
12:09 |
Bmagic |
update actor.org_unit_setting set value='"28800"' where value='"0"' and name='auth.staff_timeout'; |
12:09 |
Bmagic |
that's 8 hours, the thinking was if staff were used to never getting logged out, at least they wouldn't be too surprised in a single work day |
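
Editor's sketch: the UPDATE above can be tried against a throwaway table before running it on a real database. This is a minimal illustration using Python's built-in sqlite3, not PostgreSQL. The table `org_unit_setting` is a simplified stand-in for `actor.org_unit_setting`, and the row values and the `some.other_setting` name are made up for the demo. It only shows that the WHERE clause touches rows where `auth.staff_timeout` is exactly `"0"` and nothing else.

```python
import sqlite3

# Simplified stand-in for actor.org_unit_setting (assumption: only the
# columns the UPDATE touches are modeled; the real table is in PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE org_unit_setting (org_unit INTEGER, name TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO org_unit_setting VALUES (?, ?, ?)",
    [
        (1, "auth.staff_timeout", '"0"'),      # should be rewritten
        (2, "auth.staff_timeout", '"7200"'),   # non-zero, left alone
        (3, "some.other_setting", '"0"'),      # hypothetical name, left alone
    ],
)

# Same SET/WHERE logic as the statement in the log; 28800 s = 8 hours.
cur = conn.execute(
    "UPDATE org_unit_setting SET value = ? WHERE value = ? AND name = ?",
    ('"28800"', '"0"', "auth.staff_timeout"),
)
updated = cur.rowcount

rows = conn.execute(
    "SELECT org_unit, name, value FROM org_unit_setting ORDER BY org_unit"
).fetchall()
for row in rows:
    print(row)
```

Only the first row changes. Note the value is stored as a JSON string with its own double quotes, which is why the comparison is against `'"0"'` rather than `'0'`.
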
12:14 |
jeffdavis |
If I'm reading Open-ILS/src/c-apps/oils_auth_internal.c correctly, seems like 0 is the default timeout value if auth.staff_timeout is unset. |
12:14 |
jeffdavis |
Does the login issue occur on a 3.14 system where auth.staff_timeout is unset? |
12:15 |
Bmagic |
here's another finding: if I was using a browser that already had a workstation registered, I could login! (even with the timeout setting set to 0) - but if I needed to do the workstation registration dance, I would be in a login loop |
12:15 |
Bmagic |
jeffdavis: I didn't test that |
12:16 |
Bmagic |
I'll try it on a test machine |
12:16 |
mmorgan |
jeffdavis: I found that the issue does NOT happen when auth.staff_timeout is unset, but it would be great to see confirmation. |
12:17 |
Bmagic |
mmorgan jeffdavis: confirmed, no value (no row) in actor.org_unit_setting, I can still login |
12:18 |
Dyrcona |
jeffdavis: What branch are you looking at? |
12:19 |
Bmagic |
FYI: I'm on main OpenSRF with the redis stuff merged |
12:19 |
|
jihpringle joined #evergreen |
12:19 |
Bmagic |
however, I've also tested on opensrf 3.3.2 (ejabberd) and I had the same problem/solution with the setting |
12:19 |
* mmorgan |
tested on 3.14.0 |
12:20 |
Bmagic |
mmorgan: can you confirm that a zero setting will result in the login loop? |
12:21 |
mmorgan |
Bmagic: Yes, I did observe that on 3.14.0, so can confirm. |
12:21 |
Bmagic |
and! you have to be sure you're not using a browser that has the workstation registered |
12:21 |
Bmagic |
oddly, if you've previously registered a workstation, you can login with the zero setting |
12:21 |
Dyrcona |
I wouldn't be surprised if MFA introduced JavaScript that's handling the timeout. |
12:22 |
jeffdavis |
Dyrcona: rel_3_14 (look for OILS_ORG_SETTING_STAFF_TIMEOUT) |
12:22 |
Bmagic |
it seems so straightforward talking about it here, but holy cow it too me so long to figure this out |
12:22 |
Dyrcona |
jeffdavis: I see the code in main, and it is indeed using 0 to mean no timeout in the c code. |
12:22 |
Bmagic |
too/\s |
12:23 |
mmorgan |
Bmagic: I had a workstation registered, BR1, then updated the setting to 0. Logged out and could NOT login again with that workstation. Login page just reloaded. |
12:23 |
Bmagic |
interesting |
12:24 |
jeffdavis |
I guess my point is that 0 seems to be a legit value for the staff timeout, so forcing it to a non-zero value is maybe not what we want to do here. |
12:27 |
|
collum joined #evergreen |
12:31 |
Dyrcona |
There isn't much in eg2 dealing with auth timeout. |
12:33 |
mmorgan |
Interesting. I can login and get successfully routed to a page that is not angular, like /eg/staff/circ/patron/search. I can search for a patron, but navigating to an eg2 page gives me the login screen. |
12:34 |
Bmagic |
mmorgan++ - that's what I was seeing |
12:34 |
Dyrcona |
Sounds like a case for git bisect.... |
12:34 |
Bmagic |
I think it's because the workstation registration was eg2 |
12:35 |
Bmagic |
funny though, I could manually open the eg workstation registration page (sometimes) and register a workstation on the eg side, then re-login with the workstation and not have the issue |
12:36 |
Dyrcona |
0 seems like a bad value for an infinite time out when the setting is meant to be an interval. -1 is probably better. |
12:36 |
Bmagic |
thinking about it more, the magic trick was probably having "route to" in the query string on the login page. Where the route to was an eg page and* the login page was eg |
12:37 |
Dyrcona |
I suspect you're overthinking it, and the problem is likely simpler than that. some eg2 code is getting the time out in seconds and treating 0 as 0, not as infinity. |
12:38 |
Bmagic |
I get it, yes, eg2 JS is treating 0 differently than eg. I understand that. I'm pontificating and going over in my head the various ways I was able to overcome the issue during testing. And I think it was when I was able to never touch eg2 during the auth process |
12:38 |
Dyrcona |
OK. |
12:41 |
Dyrcona |
The core eg2 auth service looks like it gets the timeout from the backend. |
12:43 |
Dyrcona |
Staff component looks like it uses the auth service. |
12:45 |
Dyrcona |
The staff mfa component does not appear to check the staff timeout. Looks like it has timeouts for webauth requests, though. |
12:49 |
Dyrcona |
The login component doesn't do anything directly with the timeout, either. |
12:49 |
Dyrcona |
There's code to log you out if the route_to path fails to match a given regex. |
12:50 |
Dyrcona |
Well, the oninit logs you out. |
12:50 |
Bmagic |
interesting |
12:52 |
Dyrcona |
I wonder what calls the login not allowed component? Not having staff login permission? |
12:57 |
Dyrcona |
I can't find anything that uses the auth service's authtime method other than itself. |
12:58 |
Dyrcona |
The MFA perl code doesn't seem to care about the staff timeout either. |
13:06 |
Dyrcona |
I wonder if the provisional auth session code might have something to do with it? |
13:07 |
Bmagic |
I assume it's setting the memcached expire date |
13:07 |
Dyrcona |
Line 441 of oils_auth_internal.c does this: osrfCachePutObject(authKey, provisionalSessionObject, (time_t) timeout); |
13:07 |
Dyrcona |
I was just looking at that when you said.... |
13:07 |
Bmagic |
I bet that's it |
13:08 |
Dyrcona |
Might be. I want to look again at how timeout is set in this case. |
13:08 |
Bmagic |
subsequent page loads, checks memcached for an auth token and doesn't find one: back to the login screen |
13:09 |
Dyrcona |
I guess the question is: Does regular login session do that? |
13:10 |
|
abowling joined #evergreen |
13:10 |
abowling |
Anyone have any quick thoughts? |
13:10 |
abowling |
[perl:error] [pid 61825] [client 67.59.82.208:0] Apache2::RequestIO::print: (32) Broken pipe at /usr/lib/x86_64-linux-gnu/perl5/5.30/Template.pm line 180 |
13:10 |
abowling |
[perl:error] [pid 61825] [client 67.59.82.208:0] Apache2::RequestIO::print: (32) Broken pipe at /usr/lib/x86_64-linux-gnu/perl5/5.30/Template.pm line 180 |
13:11 |
abowling |
It's only happening on one brick |
13:11 |
abowling |
related... |
13:12 |
abowling |
[perl:error] [pid 61346] [client 98.142.39.107:0] get_suggestions() failed: DATABASE_QUERY_FAILED, referer: https://catalog.sage.eou.edu/eg/opac/home?query=new%20moon;qtype=keyword;fi%3Asearch_format=book;locg=129;detail_record_view=1 |
13:14 |
Dyrcona |
Bmagic: oilsAuthInternalCreateSession does pretty much the same thing on line 405. |
13:15 |
Dyrcona |
That line has been around since 2015 at least. |
13:16 |
Dyrcona |
abowling: Are all of the dependencies installed on that one brick? "Broken pipe" usually means two applications can't talk to each other. Maybe 1 is missing? |
13:17 |
Dyrcona |
Or maybe 1 crashed. |
13:17 |
abowling |
Dyrcona: that seems to be the right track. However, I can't find what's missing. |
13:19 |
Bmagic |
abowling: does this command produce any "ERR" osrf_control -l --diagnostic ? |
13:19 |
Bmagic |
(ran as opensrf) |
13:19 |
abowling |
Bmagic: Nope. Clean as a whistle. |
13:20 |
Bmagic |
and you've restarted the stack? After that, you're still broken? |
13:20 |
abowling |
yes, sir |
13:20 |
abowling |
It's a scratcher |
13:20 |
Bmagic |
I'd track backward then, and run the configure step, followed by make, make install |
13:21 |
Bmagic |
what version of Evergreen? and is it running in a docker container? |
13:24 |
abowling |
3.13.5; no docker |
13:25 |
|
jihpringle joined #evergreen |
13:26 |
Dyrcona |
Bmagic: There is a difference between the normal create session code and the upgrade provisional session code. The former calls oilsGetAuthTimeout before setting it in the cache. The latter does not. I'm not saying that's the bug, but it might be worth looking into. |
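
Editor's sketch: a toy model of the kind of divergence Dyrcona describes, where one code path normalizes the timeout before caching the session and the other passes the raw value through. Everything here is hypothetical: `ToyCache`, `resolve_timeout`, the fallback value, and the assumption that a TTL of 0 expires immediately are illustrative only. It does not reproduce `oils_auth_internal.c` or the real cache behavior, and the log itself does not confirm this is the bug.

```python
class ToyCache:
    """Tiny TTL cache with a manual clock.

    Assumption for this sketch: an entry stored with ttl <= 0 is already
    expired on the next lookup. The real cache may behave differently.
    """

    def __init__(self):
        self.now = 0
        self._items = {}

    def put(self, key, value, ttl):
        # Store the value together with its absolute expiry time.
        self._items[key] = (value, self.now + ttl)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None or self.now >= entry[1]:
            return None
        return entry[0]


# Hypothetical fallback, borrowed from the 8-hour value earlier in the log.
FALLBACK_SECONDS = 28800


def resolve_timeout(raw):
    # Hypothetical normalizer: treat 0 or a missing value as "use fallback".
    return raw if raw and raw > 0 else FALLBACK_SECONDS


def create_session(cache, key, raw_timeout):
    # Path that normalizes the timeout before caching.
    cache.put(key, "session", resolve_timeout(raw_timeout))


def upgrade_provisional(cache, key, raw_timeout):
    # Path that caches with the raw value, skipping normalization.
    cache.put(key, "session", raw_timeout)


cache = ToyCache()
create_session(cache, "normal", 0)
upgrade_provisional(cache, "provisional", 0)

normal_hit = cache.get("normal")
provisional_hit = cache.get("provisional")
print(normal_hit, provisional_hit)
```

Under these assumptions the normalized path still finds its session, while the raw path comes back empty on the very next lookup. That would look like being bounced straight back to the login page.
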
13:32 |
Bmagic |
abowling: not sure then, rebuild? |
13:32 |
Bmagic |
Dyrcona++ |
13:32 |
Dyrcona |
abowling: Anything in the system logs about segfaults or running out of ram, oomkiller and the like? |
13:33 |
* Dyrcona |
suspects Redis OpenSRF of being a bit hungry when it comes to RAM. |
13:40 |
abowling |
Dyrcona++ |
13:41 |
abowling |
Dyrcona: it's always something! Ran out of f---ing space |
13:42 |
Dyrcona |
abowling++ They tell me drives are cheap. :) |
13:48 |
|
jonadab joined #evergreen |
13:49 |
|
mantis joined #evergreen |
13:52 |
Bmagic |
disk was full, classic |
14:29 |
|
Dyrcona joined #evergreen |
14:30 |
Dyrcona |
Nice. My home Internet is out. |
14:30 |
Bmagic |
home internet out, classic |
14:30 |
Dyrcona |
Right in the middle of running a command on a remote server no less. |
14:31 |
Bmagic |
emotions run high when the internet drops |
14:32 |
Dyrcona |
Maybe. I'm being pretty cool about it. It's not my equipment, I can ssh to my router's internal IP over the WiFi, but it cannot ping anything, so it is the telco equipment. |
14:33 |
Bmagic |
Dyrcona++ # level head |
14:39 |
Dyrcona |
Nice to have a phone to use as a fallback. |
14:40 |
Bmagic |
yes! I've often thought about the one major difference between <2010 and >2010, and it's phone tethering has helped (at least me) a lot |
14:41 |
Dyrcona |
I'm also on the same phone with Verizon support. |
14:42 |
Bmagic |
it's nice to be able to use the cell network to find Google results about your home internet carrier outage reports :) |
14:43 |
Bmagic |
hearing this for the first time, caught my ears: https://www.youtube.com/watch?v=6S4ToE8oGVw |
15:07 |
|
Dyrcona joined #evergreen |
15:12 |
Dyrcona |
Always the simple things.... |
15:12 |
Dyrcona |
@dunno |
15:12 |
pinesol |
Dyrcona: have you tried local mean solar time for the named city as the reference point? |
15:13 |
Dyrcona |
pinesol: No, but I did try turning it off and back on again. |
15:13 |
pinesol |
Dyrcona: http://images.cryhavok.org/d/1291-1/Computer+Rage.gif |
15:13 |
Dyrcona |
Yeah, pretty much. |
15:14 |
Dyrcona |
Oh, right. I was in the middle of installing some backports on a 3.7.4 test installation. |
15:15 |
Dyrcona |
I had just run the 'chown' command and when it seemed to take too long, that's when I knew something was up. :) |
15:23 |
Dyrcona |
Sometimes, you just gotta run desktop-clear.... |
15:30 |
|
mantis left #evergreen |
17:01 |
|
mmorgan left #evergreen |
18:24 |
|
jihpringle joined #evergreen |
21:38 |
|
stephengwills left #evergreen |
22:19 |
|
csharp_ joined #evergreen |
22:19 |
|
scottangel joined #evergreen |
22:19 |
|
Jaysal joined #evergreen |
22:22 |
|
book` joined #evergreen |