Time |
Nick |
Message |
01:56 |
|
awitter joined #evergreen |
06:00 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |
07:18 |
|
rjackson_isl_hom joined #evergreen |
07:51 |
|
mantis1 joined #evergreen |
08:00 |
|
mantis1 joined #evergreen |
08:04 |
|
rfrasur joined #evergreen |
08:39 |
|
jvwoolf joined #evergreen |
08:40 |
|
mmorgan joined #evergreen |
08:50 |
|
collum joined #evergreen |
09:03 |
|
Dyrcona joined #evergreen |
09:57 |
Dyrcona |
Truncate action_trigger.event and a test of the daily a/t runner goes really fast. |
10:00 |
|
Christineb joined #evergreen |
10:08 |
Bmagic |
Dyrcona++ # burn it all down |
11:14 |
Dyrcona |
Well, it still takes a while to churn through 14,286 events. But collecting them into the table was really fast. :) |
11:48 |
csharp |
berick: new fix applied to PINES production - so far so good, drone-wise - I'll let you know if we hear complaints |
11:53 |
berick |
csharp: cool |
12:03 |
|
jihpringle joined #evergreen |
13:01 |
jeffdavis |
csharp++ # sometimes you gotta test in production |
13:06 |
csharp |
jeffdavis: yep, it's the only way to see if something that only emerges in production actually works! |
13:36 |
jeffdavis |
updated bug 1896285 - I think the 3.5 version is OK to go in rel_3_5 |
13:36 |
pinesol |
Launchpad bug 1896285 in Evergreen 3.5 "Use batch methods for multi-row grid actions" [Medium,Confirmed] https://launchpad.net/bugs/1896285 |
13:46 |
csharp |
jeffdavis: +1 |
14:17 |
|
alynn26 joined #evergreen |
14:33 |
|
sandbergja joined #evergreen |
14:53 |
Bmagic |
jeffdavis: I rolled that patch into our production 3.5 system last night. It's been more stable today |
14:58 |
Bmagic |
csharp: did you put opensrf_bug 1912834 on production as well? |
14:59 |
|
mantis1 left #evergreen |
15:06 |
|
pinesol` joined #evergreen |
15:11 |
|
sandbergja joined #evergreen |
15:20 |
csharp |
wow - something appears very broken here - something sent apparently hundreds of null ids to a staff search and it went for 6.5 minutes with the accompanying NOT CONNECTED errors |
15:21 |
csharp |
this is the activity.log call: https://pastebin.com/VnNexDjf |
15:23 |
csharp |
Bmagic: yes, we've applied that too |
15:24 |
Dyrcona |
@blame tsbere |
15:24 |
pinesol |
Dyrcona: tsbere stole bshum's tux doll! |
15:24 |
csharp |
(well, that's the "new fix" I was telling berick about earlier) |
15:24 |
csharp |
@seen tsbere |
15:24 |
pinesol |
csharp: tsbere was last seen in #evergreen 3 years, 36 weeks, 4 days, 1 hour, 3 minutes, and 17 seconds ago: <tsbere> er, not anoning |
15:25 |
berick |
csharp: that api is from the staff catalog. should be a quick fix |
15:25 |
Dyrcona |
It's nothing major, just you have to restart open-ils.circ after changing the circ.opac_renewal.use_original_circ_lib global flag. |
15:26 |
Dyrcona |
According to git blame, tsbere added the code that caches the setting. |
15:26 |
csharp |
Dyrcona: UNBELIEVABLE |
15:26 |
csharp |
berick: awesome |
15:28 |
Dyrcona |
The code has been there since 2011, so you'd think I would have known.... |
15:28 |
Dyrcona |
@blame Dyrcona |
15:28 |
pinesol |
Dyrcona: Dyrcona is probably integrated with systemd |
15:28 |
Bmagic |
csharp: did you determine the reason for the cataloging (bucket?) issue? |
15:31 |
csharp |
Bmagic: berick did the work, I just applied it this morning |
15:31 |
berick |
the issue was the patch needed fixing |
15:31 |
Bmagic |
right, so, are you saying the bucket issue was related? |
15:31 |
csharp |
the bucket issue? sorry I may not be following |
15:32 |
csharp |
what the patch does is throttle concurrent OpenSRF requests so as not to overwhelm services like open-ils.actor |
15:33 |
csharp |
so a cataloger adding 50+ items to a bucket was creating 50+ actor drones and if you multiply that by $pines_catalogers, it gets hairy real fast |
15:34 |
berick |
csharp: you said something about buckets yesterday when you were finding misc. bugs from the patch |
15:34 |
berick |
(though none of the bugs were really bucket related) |
15:37 |
csharp |
berick: ah - thanks |
15:37 |
Bmagic |
right on - well, I'm going to merge that opensrf branch into production this evening as well. Having bug 1896285 - we noticed an improvement, but today, we still saw a couple of machines get out of control and die |
15:37 |
pinesol |
Launchpad bug 1896285 in Evergreen 3.5 "Use batch methods for multi-row grid actions" [Medium,Confirmed] https://launchpad.net/bugs/1896285 |
15:37 |
csharp |
Bmagic: yeah, I think the bucket thing was just shrinking resources affecting all kinds of things |
15:37 |
csharp |
the complaints were varied but all dealt with batch actions |
15:37 |
|
sandbergja joined #evergreen |
15:38 |
csharp |
and today has been relatively quiet after applying the latest version of the fix |
15:38 |
Bmagic |
throttling the number of requests to 5 at a time should help A LOT |
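[Editor's sketch] The throttling described above (no more than 5 requests in flight at once) can be illustrated with a small worker-pool helper. This is a minimal sketch under stated assumptions, not the actual OpenSRF patch: `runThrottled` and `fakeRequest` are hypothetical names, and the timer stands in for a real network call.

```typescript
// Hypothetical helper: run async tasks with at most `limit` in flight at once.
// Illustrative only; this is not the code from the OpenSRF branch.
async function runThrottled<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started yet

  // Each worker keeps pulling the next unstarted task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const idx = next++;
      results[idx] = await tasks[idx]();
    }
  }

  // Start at most `limit` workers (and at least one).
  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}

// Demo: 50 simulated requests (think "50 items added to a bucket").
let inFlight = 0;
let maxInFlight = 0;

function fakeRequest(id: number): () => Promise<number> {
  return async () => {
    inFlight++;
    maxInFlight = Math.max(maxInFlight, inFlight);
    // Stand-in for a network round trip.
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    inFlight--;
    return id;
  };
}

const fakeTasks: Array<() => Promise<number>> = [];
for (let i = 0; i < 50; i++) {
  fakeTasks.push(fakeRequest(i));
}

// `maxInFlight` never exceeds the limit of 5 passed here.
const demo: Promise<number[]> = runThrottled(fakeTasks, 5);
```

With a pool like this, 50 queued actions occupy at most 5 server-side workers at a time instead of 50, at the cost of the batch taking longer to finish on the client.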
15:38 |
csharp |
yes, system-side has been back to what I think of as "normal" |
15:38 |
Bmagic |
that's encouraging, patching now... |
15:39 |
berick |
csharp: mind checking something? that null-blast issue, do you see vandelay api calls around the same time (just prior)? |
15:39 |
Bmagic |
Thinking back - I noticed a huge impact on the servers moving from XUL to the web based client |
15:39 |
berick |
just found that vandelay also uses that api, and may be more likely the culprit |
15:40 |
csharp |
berick: looking now |
15:40 |
Bmagic |
at that time, we upped the hardware on all of the bricks, fleet-wide, to cope with the onslaught of requests that the web client seemed to be making. It seems this "shotgun" blast from the Evergreen web client is not new |
15:40 |
berick |
Bmagic: IIRC, the XUL client natively limited the number of XHR requests to something like 8 at a time, so it always had a baked in limit |
15:41 |
Bmagic |
berick: I was wondering |
15:44 |
csharp |
berick: this looks like the culprit: open-ils.search open-ils.search.biblio.multiclass.query.staff {"limit":1001,"offset":0}, "(keyword:the prophets) site(PINES) |
15:45 |
berick |
csharp: huh, I was trying stuff like that and couldn't make it happen. it also looks protected in the code, but i'm clearly missing something.. |
15:45 |
csharp |
I'll try that call from srfsh to see what happens |
15:45 |
csharp |
or maybe it won't act the same from srfsh? |
15:46 |
csharp |
that came back super fast, so I guess not |
15:49 |
berick |
yeah, i can't get the catalog to send nulls, strange |
15:50 |
csharp |
open-ils.acq open-ils.acq.purchase_order.retrieve "<REDACTED>",, 7030, {"flesh_price_summary":true,"flesh_provider":true,"flesh_lineitem_count":true} |
15:50 |
csharp |
that's right before as well |
15:51 |
csharp |
well, 2 seconds before |
15:52 |
berick |
csharp: does this return 0? select count(*) from vandelay.bib_match where eg_record is null; |
15:53 |
berick |
0 is expected, but just to rule that out |
15:53 |
Bmagic |
zero for me |
15:53 |
csharp |
0 here too |
15:53 |
berick |
k |
15:54 |
csharp |
I can confirm that the record IDs before all the nulls all lead to records with "the prophet" somewhere in the record |
15:54 |
berick |
csharp: ok, good, that settles that |
15:56 |
berick |
csharp: any local diffs to origin/master for Open-ILS/src/eg2/src/app/share/catalog/catalog.service.ts or search-context.ts ? |
15:56 |
csharp |
lemme look |
15:56 |
csharp |
2021-01-26 14:59:41 brick01-head gateway: [ACT:60271:osrf-websocket-stdio.c:559:1611691160602718] [127.0.0.1] [] open-ils.search open-ils.search.biblio.record.catalog_summary.staff 191, [null,null,null,null,null,null,null,null,null,null] |
15:56 |
|
jonadab joined #evergreen |
15:56 |
csharp |
another example, with fewer nulls |
15:56 |
csharp |
it's only happened seven times today |
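[Editor's sketch] One defensive option for calls like the one logged above is to drop null IDs on the client before building the request, and to skip the request entirely when nothing is left. This is only a sketch under assumptions: `compactIds` and `buildSummaryRequest` are hypothetical names, `contextId` is a placeholder for the first argument in the logged call (191), and none of this is the actual staff catalog fix.

```typescript
// Hypothetical guard: keep only real numeric IDs.
function compactIds(ids: Array<number | null | undefined>): number[] {
  return ids.filter(
    (id): id is number => typeof id === "number" && Number.isFinite(id)
  );
}

// Hypothetical request builder: returns the argument list for the call,
// or null when every ID was null/undefined and no call should be made.
function buildSummaryRequest(
  contextId: number,
  ids: Array<number | null | undefined>
): [number, number[]] | null {
  const clean = compactIds(ids);
  return clean.length > 0 ? [contextId, clean] : null;
}
```

A caller would check for a `null` return and simply not send anything, so an all-null list never reaches open-ils.search.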
15:57 |
berick |
interesting |
15:58 |
berick |
or bib-record.service.ts |
15:59 |
csharp |
ah... |
15:59 |
csharp |
I think I found the issue |
16:00 |
csharp |
to allow searching in the (goddamned) search box on the staff page, I changed "let query = ts.query[idx];" to "let query = decodeURIComponent(ts.query[idx]);" around line 510 or so |
16:01 |
csharp |
I have a feeling that's causing an unexpected side effect |
16:01 |
csharp |
we may just have to remove that box until we can safely redirect to the eg2 version |
16:02 |
csharp |
that's in Open-ILS/src/eg2/src/app/share/catalog/search-context.ts btw |
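[Editor's sketch] The log does not say what the side effect of the added `decodeURIComponent` call actually was. One known hazard of that function, shown here as a possibility only, is that it throws a `URIError` on malformed percent sequences, which includes plain text that was already decoded and contains a literal `%`. `safeDecode` is a hypothetical helper, not the code in search-context.ts.

```typescript
// Hypothetical helper: decode if possible, otherwise return the input as-is.
function safeDecode(raw: string): string {
  try {
    return decodeURIComponent(raw);
  } catch (e) {
    // A lone "%" or an invalid escape such as "% o" raises URIError.
    if (e instanceof URIError) {
      return raw;
    }
    throw e;
  }
}

// An encoded query decodes normally.
const decodedOnce: string = safeDecode("the%20prophets");

// Decoding twice is where plain decodeURIComponent would throw:
// "50%25 off" becomes "50% off", and "50% off" is not valid percent-encoding.
const decodedTwice: string = safeDecode(safeDecode("50%25 off"));
```

Whether the query in `ts.query[idx]` can arrive already decoded is something only the surrounding code can answer; the sketch just shows why decoding unconditionally can misbehave.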
16:05 |
csharp |
apparently we can't win on the OpenSRF throttling either - getting complaints of things not loading in acq - I'll try to get more specifics |
16:06 |
csharp |
argh! it was such a smooth upgrade server-side - these piddly issues are going to be the death of me |
16:07 |
|
sandbergja joined #evergreen |
16:56 |
|
sandbergja joined #evergreen |
17:00 |
Bmagic |
csharp: JS cache client-side might account for the patch not "taking hold" on some workstations |
17:26 |
|
mmorgan left #evergreen |
17:49 |
|
Cocopuff2018 joined #evergreen |
18:00 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |
18:02 |
|
book` joined #evergreen |
19:04 |
|
Dyrcona joined #evergreen |
19:09 |
|
sandbergja joined #evergreen |
20:21 |
|
jonadab joined #evergreen |
20:41 |
|
Dyrcona joined #evergreen |
21:10 |
|
sandbergja joined #evergreen |
21:20 |
|
sandbergja joined #evergreen |
23:14 |
|
jamesrf joined #evergreen |
23:38 |
|
Cocopuff2018 joined #evergreen |