Time |
Nick |
Message |
06:01 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |
07:42 |
|
collum joined #evergreen |
08:07 |
|
rjackson_isl_hom joined #evergreen |
08:15 |
|
mantis joined #evergreen |
08:36 |
|
mmorgan joined #evergreen |
09:14 |
|
rfrasur joined #evergreen |
09:20 |
|
Dyrcona joined #evergreen |
09:21 |
Dyrcona |
Who else thinks the pivot option should be removed from the reporter? |
09:26 |
csharp_ |
I think it doesn't offer a lot of value - Excel (or whatever) does it so much better |
09:27 |
csharp_ |
most end users have no idea what it is, even if they wanted to use it |
09:27 |
rhamby |
I can agree with that. |
09:27 |
csharp_ |
we've tried to train our libraries to see the Evergreen reporter as just a source of raw data and not to depend on it for end-user-friendly output |
09:27 |
Dyrcona |
It also crashes on our reports server with only 32GB of RAM. |
09:28 |
csharp_ |
oh - hmm |
09:28 |
rhamby |
If you're going to use a pivot table it's better to do it in the spreadsheet software. |
09:28 |
csharp_ |
huh - our reporter currently has 16GB of RAM and we don't see that trouble - it's running on blazing fast blades at ITS though |
09:29 |
Dyrcona |
I'm looking at what it takes to rip it out. I'll open a Lp bug after I do that. |
09:29 |
csharp_ |
storage is basically RAM in that case |
09:29 |
csharp_ |
Dyrcona: I'm for it |
09:29 |
csharp_ |
as long as it doesn't have tendrils leading into a full redesign of the reporter |
09:29 |
Dyrcona |
Someone tried to run two circ reports yesterday with the pivot table options, and both reports exhausted the RAM on the server and the reports were put down by the OOM killer. |
09:30 |
Dyrcona |
Well, a full redesign of the reporter would not be a terrible idea. |
09:30 |
csharp_ |
very much agreed - just didn't want the removal of an annoying value-add as a driver for that redesign :-) |
09:31 |
Dyrcona |
Pretty much any time a report dies, two conditions have been met: 1) the reports server ran out of RAM and 2) the pivot options were used. |
09:31 |
csharp_ |
how many concurrent reports, btw? |
09:32 |
Dyrcona |
We allow up to 6 concurrent reports, usually never have more than two or three, except after several hours down time for an upgrade or whatever. |
09:32 |
Dyrcona |
I'll check the schedule for yesterday. |
09:32 |
csharp_ |
gotcha |
09:32 |
csharp_ |
we do 6 at a time too |
09:32 |
csharp_ |
at one point we were doing 12, but that was trouble |
09:37 |
Dyrcona |
We only had these two reports running at the time. Same report, probably. Looks like user scheduled it again at 7:00 pm when the one started at 5:35 pm hadn't finished, yet. |
09:38 |
Dyrcona |
Started getting memory warning emails around 6:30 PM, but I wasn't paying attention to work email at that time or after. |
09:39 |
Dyrcona |
Oh, well. I have a meeting in 20 minutes. I should at least look at the agenda. |
09:41 |
|
jvwoolf joined #evergreen |
09:48 |
|
Keith-isl joined #evergreen |
09:58 |
|
bshum joined #evergreen |
10:35 |
miker |
fwiw, removing pivot should be a matter of hiding some html elements. making it smarter (at UI reimplementation time) and showing it as "compare [aggregate column] across [non-agg column]", and only when it's reasonable, would be possible, too |
11:19 |
csharp_ |
following up on yesterday's ejabberd findings - I can't tie the occurrences I'm looking at with a specific OpenSRF call yet |
11:19 |
csharp_ |
the message in the ejabberd log is 2021-12-15 10:20:42.390 [info] <0.31885.41>@ejabberd_c2s:process_terminated:271 (tcp|<0.31885.41>) Closing c2s session for opensrfprivate.brick03-head.gapines.org/open-ils.actor_listener_brick03-head.gapines.org_32971: Connection failed: timeout |
11:20 |
csharp_ |
when I've turned up ejabberd logging to debug it's apparently too much data and I end up with a truncated log where logging just stops midstream |
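One way to correlate these timeouts with other logs without turning on debug logging is to pull the timestamp and session name out of the info-level lines. The sketch below is only an illustration: the regular expression is derived from the single log line quoted above, so the assumption that every such line has this shape is unverified, and the function name is hypothetical.

```python
import re

# Pattern modelled on the one ejabberd line quoted in the chat; other
# ejabberd versions or log formats may differ (assumption).
C2S_TIMEOUT = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \[info\].*?"
    r"Closing c2s session for (\S+): Connection failed: timeout"
)


def parse_c2s_timeout(line):
    """Return (timestamp, session) for a c2s timeout line, else None."""
    match = C2S_TIMEOUT.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

The returned timestamp could then be used to look for OpenSRF log entries from the same second on the same brick.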
11:23 |
Dyrcona |
csharp_: I only ever use debug logging for brief periods, and I typically truncate the log before. |
11:24 |
Dyrcona |
It's not so useful in production. Better in a controlled environment where yours are the only requests happening. |
11:25 |
|
jihpringle joined #evergreen |
11:25 |
Dyrcona |
miker: The widgets look like their hidden to start with, and there is code to unhide them. I didn't get very far before my 10am meeting. I'm inclined to just yank all the pivot code. Let them use Excel (or LibreOffice)! |
11:26 |
Dyrcona |
there they're their, choose wisely. :) |
11:29 |
* Dyrcona |
sidles off to get lunch. |
11:29 |
csharp_ |
Dyrcona: yeah, that's the problem - this happens only sporadically and I have no idea what's causing it |
11:29 |
csharp_ |
so it's all or nothing :-/ |
11:29 |
* csharp_ |
digs around in the ejabberd source code for clues to where timeout is defined/handled |
11:36 |
miker |
csharp_: did you recently upgrade your perl? |
11:36 |
csharp_ |
yes - from 5.22? to 5.26 |
11:36 |
csharp_ |
(16.04 to 18.04 ubuntu) |
11:36 |
csharp_ |
currently on v5.26.1 |
11:38 |
csharp_ |
yes, 16.04 was on 5.22 |
11:40 |
miker |
csharp_: can you scan your logs for instances of "server: died with error" ? of particular interest are "Use of freed value" and "Can't kill a non-numeric process ID" |
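A scan like the one requested here could be sketched as follows. The three quoted phrases come from the message above; the function name is hypothetical, and the sketch assumes the error text appears on the same line as "server: died with error", which may not hold for every log format.

```python
from collections import Counter

# Phrases quoted in the chat message; everything else is illustrative.
DIED_MARKER = "server: died with error"
SIGNATURES = (
    "Use of freed value",
    "Can't kill a non-numeric process ID",
)


def count_crash_signatures(lines):
    """Count 'died with error' lines, bucketed by known signature."""
    counts = Counter()
    for line in lines:
        if DIED_MARKER not in line:
            continue
        for signature in SIGNATURES:
            if signature in line:
                counts[signature] += 1
                break
        else:
            # A crash line that matches neither signature of interest.
            counts["other"] += 1
    return counts
```

It accepts any iterable of lines, so an open log file can be passed directly.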
11:40 |
miker |
re https://bugs.launchpad.net/opensrf/+bug/1953047 and https://bugs.launchpad.net/opensrf/+bug/1953044 |
11:40 |
pinesol |
Launchpad bug 1953047 in OpenSRF "Perl services can crash with a "Can't kill a non-numeric process ID" error" [Medium,New] |
11:40 |
pinesol |
Launchpad bug 1953044 in OpenSRF "Perl services can crash with a "Use of freed value in iteration" error" [Medium,Confirmed] |
11:41 |
miker |
in a high-drone-turnover environment, '044 can happen when a drone exits after max-requests at an inopportune time |
11:41 |
csharp_ |
I see the first two, but not 'Use of freed value' |
11:42 |
csharp_ |
I would *gladly* update OpenSRF to fix this though |
11:44 |
miker |
fwiw, perl 5.24 seems to be the boundary version where those two fixes are relevant. |
11:44 |
csharp_ |
ok, then I will apply them, by god |
11:45 |
csharp_ |
since they're perl, I can just hot patch them and rollback if necessary |
11:46 |
miker |
and they're both restricted to 1 file, so, easy peasy |
11:46 |
csharp_ |
yep |
12:23 |
mmorgan |
Has anyone gathered some experience with 3.7+ in a production database with saving/loading bib records? We're trying to get an idea of the effect of the symspell dictionaries for did you mean. |
12:26 |
csharp_ |
mmorgan: we're going to 3.8 next month - running it on a training/testing server that is production-ish, datawise |
12:26 |
csharp_ |
what should we be looking out for? |
12:26 |
jeff |
We're not relying on "did you mean" and did not populate the related tables as part of our upgrade. I've noticed a deadlock or two which have made me want to look into disabling the relevant triggers until I have time to look into it more. |
12:28 |
jihpringle |
mmorgan: we're running 3.7.0 and do not have "did you mean" turned on. In testing with it on we couldn't save any MARC records |
12:29 |
jihpringle |
we haven't tested with any of the post 3.7.0 "did you mean" fixes yet |
12:29 |
mmorgan |
csharp_: Our big concern is loading records with 856 links. We do a lot of that, but general saving and loading of bib records via vandelay is our concern, too. |
12:31 |
mmorgan |
We've just begun testing with production data and so far are seeing the records with 856 links taking MUCH longer, also getting general.unknown error for some records that have failed to load. |
12:31 |
mmorgan |
jeff: Not sure what you mean by a 'deadlock' ? |
12:32 |
mmorgan |
jihpringle: We've loaded some of the post 3.7.0 fixes. |
12:32 |
berick |
jeff: if you decide to disable, mind sharing what all you disable? |
12:33 |
mmorgan |
We're very early in testing, and were just wondering if others were ahead of us and had experiences regarding saving/loading records. |
12:38 |
mmorgan |
jihpringle: Have you done anything other than turning off "did you mean"? My thinking is that even with it off, the symspell dictionaries in the database would still be updated without disabling triggers like jeff mentions. |
12:38 |
csharp_ |
mmorgan: a deadlock is a postgresql level problem where two processes are waiting on each other to release the lock on a particular tuple ("row") |
12:39 |
mmorgan |
csharp_: Ah. Ok, thanks. I think I've seen log entries complaining about tuples. |
12:39 |
jeff |
mmorgan: the postgresql logs will log a deadlock when the database detects that process X and process Y (and maybe process Z) are all waiting on things that each other are waiting on. It's one of the reasons that pingest doesn't do certain bib-related ingest things in parallel. |
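The "waiting on each other" condition can be pictured as a cycle in a wait-for graph. The sketch below is a toy model of that idea, not PostgreSQL's actual detector; the function name is hypothetical and each process is assumed to wait on at most one other process.

```python
def find_wait_cycle(waits_for):
    """Return the processes forming a wait cycle, or None if there is none.

    waits_for maps a process to the single process it is waiting on.
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain of waits until it ends or revisits a process.
        while current in waits_for and current not in seen:
            seen.append(current)
            current = waits_for[current]
        if current in seen:
            # Everything from the first repeated process onward is the cycle.
            return seen[seen.index(current):]
    return None
```

With `{"X": "Y", "Y": "X"}` the function returns `["X", "Y"]`, the two-process case csharp_ describes; a chain such as `{"X": "Y", "Y": "Z"}` is plain waiting and returns `None`.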
12:39 |
|
collum joined #evergreen |
12:40 |
jeff |
looks like we had one weird one on actor.usr_setting (which I don't have enough information to reproduce at this point), and the only other one was: |
12:40 |
jeff |
Process 456530: INSERT INTO biblio.record_entry... |
12:40 |
jeff |
Process 456523: SELECT asset.merge_record_assets("bre"... |
12:41 |
jeff |
that one's close enough that I'd look into symspell triggers being a contributing factor, but it might be something else also. |
12:41 |
csharp_ |
fwiw, when addressing bug 1931737 we disabled all the maintain_symspell_entries_tgr triggers on the metabib.*_entry tables |
12:41 |
pinesol |
Launchpad bug 1931737 in Evergreen "Did you mean breaks parallel reingest" [Undecided,Confirmed] https://launchpad.net/bugs/1931737 |
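Disabling a trigger on several tables comes down to one ALTER TABLE statement per table. The sketch below only builds the statements; the trigger name is the one csharp_ mentions, while the helper name is hypothetical and the list of tables that actually carry the trigger is an assumption to check against the database.

```python
TRIGGER = "maintain_symspell_entries_tgr"  # name taken from the chat message


def trigger_toggle_sql(tables, enable=False):
    """Build ALTER TABLE statements to disable (or re-enable) TRIGGER.

    Table names are interpolated directly, so pass only trusted,
    schema-qualified names.
    """
    action = "ENABLE" if enable else "DISABLE"
    return [f"ALTER TABLE {table} {action} TRIGGER {TRIGGER};" for table in tables]
```

For example, `trigger_toggle_sql(["metabib.keyword_field_entry"])` yields a single DISABLE statement for that table, and passing `enable=True` yields the matching statement to turn the trigger back on afterwards.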
12:41 |
jeff |
If I improperly maligned symspell triggers in my deadlock mention earlier, I'm sorry. :-) |
12:42 |
csharp_ |
@blame symspell triggers |
12:42 |
pinesol |
csharp_: symspell triggers caused the white screen of death! |
12:42 |
jeff |
berick: I hadn't looked into it yet, but the triggers mentioned in the bug csharp_ just linked are the ones I had first in mind to look at disabling. It doesn't currently look like we're having enough issues to warrant me looking at it immediately, though. |
12:42 |
jeff |
now pinesol is improperly maligning the triggers! |
12:43 |
csharp_ |
@blame pinesol for blaming the triggers |
12:43 |
pinesol |
csharp_: itself wants the TRUTH?! itself CAN'T HANDLE THE TRUTH!! for blaming the triggers |
12:43 |
jeff |
is "Improperly maligning triggers..." the new "Reticulating splines..."? |
12:43 |
csharp_ |
@ana Improperly maligning triggers |
12:43 |
pinesol |
csharp_: Gleamingly gripping terrorism |
12:44 |
jeff |
Ah, that's the problem right there: your triggers are out of malignment! |
12:45 |
csharp_ |
@quote add <jeff> Ah, that's the problem right there: your triggers are out of malignment! |
12:45 |
pinesol |
csharp_: The operation succeeded. Quote #219 added. |
12:46 |
csharp_ |
when it comes down to it, aren't we all just "waiting for more data from parent"? |
12:48 |
* csharp_ |
is referring to the server error mentioned yesterday http://irc.evergreen-ils.org/evergreen/2021-12-14#i_497049 |
13:00 |
mmorgan |
testing a file of 1000 records with 856 links on our 3.7 test system is not looking good. 6% progress after an hour, and most failed. |
13:03 |
JBoyer |
There are a couple dym-related patches that you may not have depending on your 3.7.X version. Since they're all just function updates they can be applied anytime and absolutely should be if you don't have them. |
13:03 |
JBoyer |
Sadly, I don't actually have a list handy to refer to... |
13:05 |
jeff |
...and Launchpad search returns 101 open tickets on a search for "did you mean"... :-P |
13:05 |
jeff |
bug 1931626 |
13:05 |
pinesol |
Launchpad bug 1931626 in Evergreen "Did you mean: search suggestions exist for deleted records and can result in no hits" [Medium,Confirmed] https://launchpad.net/bugs/1931626 |
13:05 |
jeff |
bug 1931625 |
13:05 |
pinesol |
Launchpad bug 1931625 in Evergreen "Did you mean: diacritics cause erroneous search suggestions, resulting in no hits" [Undecided,New] https://launchpad.net/bugs/1931625 |
13:06 |
JBoyer |
The big ones are lp 1931162 (3.7.2) and lp 1947173 (3.7.3) |
13:06 |
pinesol |
Launchpad bug 1931162 in Evergreen "Did You Mean optimization fails for some data sets" [High,Fix released] https://launchpad.net/bugs/1931162 |
13:06 |
pinesol |
Launchpad bug 1947173 in Evergreen 3.7 "Did You Mean Symspell dictionary updates can significant slow record ingest" [High,Fix committed] https://launchpad.net/bugs/1947173 |
13:07 |
jeff |
bug 1947173 |
13:07 |
JBoyer |
sometimes Launchpad *can* search if you only want to look at fix committed and fix released. :D |
13:07 |
jeff |
looks like bug 1931162 made it into 3.7.2 |
13:07 |
pinesol |
Launchpad bug 1931162 in Evergreen "Did You Mean optimization fails for some data sets" [High,Fix released] https://launchpad.net/bugs/1931162 |
13:07 |
mmorgan |
1947173 we don't have in place, so that's one place to start |
13:08 |
JBoyer |
Yeah, the () was the version they were released in. I couldn't remember if they both made it out yet. |
13:08 |
mmorgan |
We should already have 1931162, but will double check |
13:11 |
* mmorgan |
confirms we do have 1931162 |
13:14 |
JBoyer |
taking another look at the search results it looks like the released patches are the only ones I was really worried about, but if you don't have both you really, really want to get both asap. |
13:15 |
JBoyer |
The two jeff posted aren't fixed yet but that appears to be it for now. |
13:16 |
mmorgan |
So we can apply 1947173, and next try disabling the maintain_symspell_entries_tgr triggers to see if those steps affect the behavior we're seeing. That should shed more light. |
13:16 |
mmorgan |
jeff++ |
13:16 |
mmorgan |
JBoyer++ |
13:16 |
mmorgan |
csharp_++ |
13:16 |
mmorgan |
jihpringle++ |
13:16 |
mmorgan |
If anyone discovers anything more, I'd be interested! |
13:19 |
JBoyer |
Fun fact about 1931162: it was initially found when accidentally searching for the word "metarecord" but it turns out that pretty much every ISBN search will trigger it, leaving you with Pg processes spinning for ages as they gather ISBNs in your system to recommend... |
13:24 |
jeff |
Well that's interesting... Chrome tells me that my https://host.foo.example.org/eg/opac/home and https://host.foo.example.org/eg/staff/home connections are secure and have valid certs, but if I dig down into [lock icon] -> Connection is secure -> Certificate is valid, for the /eg/staff/home tab it shows an expired certificate. |
13:25 |
jeff |
The certificate has not changed in the lifetime of my browser session. |
13:25 |
jeff |
The hostname in question points at a single IP fronted by a single nginx instance pointing at a single backend server. |
13:26 |
JBoyer |
I've seen that before also, but not tracked it down. How's the cert that apache's using look? |
13:26 |
jeff |
If I had to guess, I'd say that the expired certificate in question might be on the backend host... I'm just wondering how it's making its way to the browser, yet in a non-fatal way. |
13:27 |
jeff |
Hrm. Also need to determine if the paths in question are both going to the same backend. I think they are, but I should check. Again though, same question: why is my browser even getting a whiff of the backend cert? |
13:30 |
jeff |
co-worker reproduced, then did a shift-reload on the staff url, and the expired cert was no longer there. |
13:32 |
JBoyer |
Fun caching interactions, perhaps. I think it can take a shift-reload to force certs to be redownloaded |
13:32 |
jeff |
nope, backend apache instance is using a self-signed cert. |
13:33 |
jeff |
the cert in question would have been replaced in... February. |
13:33 |
jeff |
which predates this laptop (but not the account that Chrome is currently using for sync, so...) |
14:44 |
jeff |
Interesting. Record doesn't appear in keyword search but does show in series search for same terms. metabib.keyword_field_entry appears to have the keywords in the search, which is: who would win |
15:03 |
jeff |
somewhere along the process of looking at this, I found a special number of metabib.keyword_field_entry rows that matched one of my debug queries: 404. |
15:03 |
jeff |
:-P |
15:05 |
jeff |
also, I'm not sure I noticed before today that there's a reference in PostgreSQL documentation to The Sudbury Neutrino Detector. |
15:07 |
mmorgan |
:) |
15:17 |
* mmorgan |
is having another Weird Wednesday issue. Has anyone else had issues emailing bib records from the opac? |
15:17 |
mmorgan |
I get the preview, but when I click Email Now, no email, ever. |
15:19 |
mmorgan |
I can see the preview action trigger event in the database, that is complete, but that preview trigger is the only one I see. |
15:26 |
mmorgan |
Can anyone else confirm that they can successfully email bib records from their opac? |
15:38 |
|
jihpringle100 joined #evergreen |
15:54 |
collum |
mmorgan: I can confirm. I tried several times in our opac and did not receive an email. The preview event is in the database. |
15:56 |
mmorgan |
collum++ |
15:56 |
mmorgan |
Thanks, I will open a LP bug. |
15:56 |
* mmorgan |
was hoping it was just me :-( |
16:36 |
csharp_ |
@decide everyone or just me |
16:36 |
pinesol |
csharp_: go with just me |
16:36 |
|
jvwoolf left #evergreen |
16:38 |
berick |
almost sounds like a Prince song |
16:39 |
csharp_ |
and log4j sounds like a Prince logging utility :-) |
16:39 |
csharp_ |
Nothing Compares 2 Log4J |
16:49 |
JBoyer |
I quite enjoyed this log4j take: https://twitter.com/leanrum/status/1470954707120181253 |
17:01 |
|
mmorgan left #evergreen |
18:00 |
|
jihpringle joined #evergreen |
18:00 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |
19:01 |
|
eglogbot joined #evergreen |
19:01 |
|
Topic for #evergreen is now Welcome to #evergreen (https://evergreen-ils.org). This channel is publicly logged. |
19:32 |
|
jihpringle joined #evergreen |