Time |
Nick |
Message |
02:54 |
|
beanjammin joined #evergreen |
06:30 |
pinesol_green |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |
07:01 |
|
agoben joined #evergreen |
07:13 |
|
rjackson_isl joined #evergreen |
07:40 |
|
rlefaive joined #evergreen |
08:05 |
|
collum joined #evergreen |
08:17 |
|
Dyrcona joined #evergreen |
08:27 |
|
littlet joined #evergreen |
08:33 |
|
rlefaive joined #evergreen |
08:34 |
|
idjit joined #evergreen |
08:53 |
* Dyrcona |
wonders if he should ask in here or in #postgresql..... |
08:53 |
Dyrcona |
But, first, a paste! |
09:01 |
Dyrcona |
So, I came up with this script to remove old action_trigger events and event output this week: https://pastebin.com/34t5AY1z |
09:02 |
Dyrcona |
I have tested it on a copy of production data from Wednesday around midnight. |
09:02 |
Dyrcona |
It works, and it drops about 30GB of useless data from our database. |
09:03 |
Dyrcona |
My question is, when I run this in production, should I shut down cron jobs and anything else that may touch the action_trigger tables while it runs? |
09:03 |
Dyrcona |
Or, will normal database locking take care of any problems? |
09:04 |
Dyrcona |
My final, working, test ran in about 6 minutes 46 seconds on my test db server. |
09:06 |
|
jvwoolf joined #evergreen |
09:08 |
|
jvwoolf1 joined #evergreen |
09:12 |
|
lsach joined #evergreen |
09:17 |
JBoyer |
Dyrcona, I'd probably still pause them just out of an overabundance of caution, but I don't have a definitive answer. |
09:18 |
Dyrcona |
JBoyer: That's what I'm thinking, plus it would give me a chance to install a small customization to the web client. |
09:18 |
JBoyer |
and you don't have to drop those constraints only to truncate, if I don't drop them it takes minutes to delete a single output. :/ |
09:19 |
Dyrcona |
I don't know how to install js changes without doing the full install dance, and that requires me to stop services and unmount NFS because of changes I made to how we share. |
09:19 |
JBoyer |
(maybe not minute(s), I can't remember right now, but an absurd amount of time) |
09:20 |
Dyrcona |
JBoyer: It would not truncate action_trigger.event_output with the constraints in place. I even put the truncate of both tables on 1 line with action_trigger.event first. |
09:20 |
JBoyer |
I *think* you can just do the web client bits in the repo and then cp the results into place. I haven't tried it myself but everything in there is pretty well self-contained. |
09:20 |
Dyrcona |
I thought a simple copy didn't work for js changes. |
09:20 |
JBoyer |
I'm not saying you don't need to drop them, I'm saying you need to drop them for more things than just truncating. :) |
09:21 |
Dyrcona |
Oh, OK! I misunderstood. My bad. :) |
09:21 |
JBoyer |
a cp after doing the npm build-prod and all of that. then cp |
09:21 |
Dyrcona |
I'm on 3.0 so grunt all... :) |
09:21 |
JBoyer |
cp'ing the whole tree. You can't copy over one of the individual files. |
09:22 |
Dyrcona |
OK. I'll give that a try on a vm. I have one that I still need to update to 3.0. |
09:22 |
JBoyer |
Oh, in that case I think you can just copy over the top of them. you may have to restart apache though |
09:22 |
|
yboston joined #evergreen |
09:22 |
Dyrcona |
I think the delete might be helped if the triggers were deferrable as per my comment in the script. |
09:25 |
JBoyer |
I can do a quick check on that. |
09:27 |
Dyrcona |
So, over in #postgresql nickb suggests doing an explicit table lock at the beginning of the transaction to prevent updates to action_trigger.event. |
09:28 |
Dyrcona |
I think shutting down services and putting up a "we're down for maintenance page" is likely a better option, though I should probably lock the tables anyway. |
09:29 |
|
kmlussier joined #evergreen |
09:30 |
JBoyer |
Yeah, it might not be many, but some events could be updated between the select and the truncate. And locking people out of circ'ing for 5+ minutes will make you unpopular with staff. :) |
09:30 |
Dyrcona |
:) |
09:31 |
Dyrcona |
This will require planning.... |
09:33 |
Dyrcona |
Oh, good. Postgres died on the replicant server.... |
09:42 |
Dyrcona |
We've also had some issues with Apache processes spinning with high CPU since the upgrade, and I don't think it's a websockets issue, or at least not the same one that was patched, because we are on OpenSRF 3.0.1. |
09:47 |
Dyrcona |
Also, Overdrive integration appears to be not working, or sort of not working.... |
09:48 |
Dyrcona |
Overdrive's test environment requiring different credentials and configuration from production is less than ideal. |
09:56 |
JBoyer |
re: deferrable, it is helpful if you're doing much else in a transaction, but if deleting from action_trigger.event_output is the last thing before the commit it doesn't really make a difference. |
09:57 |
JBoyer |
Since it's ~20s per ateo for me I'll probably only ever drop constraint; delete; alter table; to clean it up. :/ |
09:57 |
|
terran joined #evergreen |
09:59 |
Dyrcona |
JBoyer: Ok. Have you tried the cleanup db function and configure the retention_interval? |
09:59 |
Dyrcona |
I was wondering if the function would work without deferred constraints. |
10:00 |
Dyrcona |
I want to test that but I have more immediate things going on. |
10:00 |
berick |
we're using retention_interval locally, but only after a big initial cleanup |
10:00 |
berick |
w/ truncates |
10:01 |
Bmagic |
Dyrcona++ # nice script |
10:01 |
Dyrcona |
Bmagic: I have an improvement to make. |
10:01 |
* Bmagic |
googling "ON COMMIT DROP AS" |
10:05 |
Bmagic |
very cool |
10:05 |
Dyrcona |
Well, ON COMMIT DROP and AS are separate things. :) |
10:06 |
Dyrcona |
Also, I just edited the past to include an exclusive lock on both tables. |
10:06 |
* Dyrcona |
wishes he could edit the past. :) |
10:06 |
Bmagic |
"When rows are dropped that meet this criteria: copy them to this table" ? |
10:06 |
Dyrcona |
No. |
10:07 |
Dyrcona |
ON COMMIT DROP works with CREATE TEMP TABLE, to drop the temp table on a commit. |
10:07 |
Bmagic |
I see, drop the temp table with the commit at the bottom of this transaction |
10:07 |
Dyrcona |
AS works with CREATE [TEMP] TABLE to create the table from the results of the following SQL query. |
10:07 |
Dyrcona |
Right. |
10:08 |
Dyrcona |
The on commit drop is not necessary because the tables will be dropped when the script exits anyway, but I like to be explicit. |
10:08 |
Bmagic |
neat |
10:08 |
Dyrcona |
Plus, it saves space when you have multiple transactions in a script file. |
10:09 |
Bmagic |
What was your edit? It looks like the paste is updated. This looks new "LOCK TABLE action_trigger.event, action_trigger.event_output IN EXCLUSIVE MODE;" |
10:11 |
Dyrcona |
Yeah, that's the edit. I decided to lock the tables to prevent outside updates. |
10:12 |
Bmagic |
good idea |
10:12 |
Dyrcona |
I considered preventing selects, too. |
10:14 |
Bmagic |
These tables are a problem for me and probably many others as well! |
10:14 |
Bmagic |
Dyrcona++ |
10:14 |
Dyrcona |
Yeah. 3.0 adds a function to keep them cleaned up and the release notes recommend doing something like my script before starting that process. |
10:15 |
Bmagic |
yep |
10:15 |
Dyrcona |
As I mentioned before, that script drops 30GB from my copy of production. |
10:15 |
Bmagic |
What a great payoff |
10:17 |
Dyrcona |
Hmm. On my Apache's using a lot of CPU... I didn't apply the patches for the hash values being different when I installed OpenSRF for the upgrade. I wonder if that is the problem? |
10:18 |
Dyrcona |
Well, the cache key. Every brick/server has a different cache key. |
10:18 |
Bmagic |
my production bricks still report different keys |
10:18 |
Bmagic |
Have for years |
10:19 |
Dyrcona |
OK. I didn't really think it was a big deal. Back in November, miker indicated that it should really cause a problem. |
10:19 |
Dyrcona |
s/should/shouldn't/ ... :) |
10:20 |
* miker |
reads up |
10:23 |
terran |
berick: Any chance you could take a look at this? https://bugs.launchpad.net/evergreen/+bug/1774427 |
10:23 |
pinesol_green |
Launchpad bug 1774427 in Evergreen "Date of Birth sometimes off by a month in patron edit form" [High,New] |
10:24 |
miker |
Dyrcona: re your first thing this morn', yes, pause a/t while running that, lest the locks cause opensrf-level timeouts and cause things to stick in a weird state (because of the explicit exclusive lock) |
10:25 |
Dyrcona |
miker: Yeap. Thanks for confirming. |
10:25 |
miker |
as for different cache keys, it just means every browser may cache up to $brick_count identical copies of the data under different url keys |
10:27 |
berick |
terran: that's... fun |
10:27 |
miker |
and, now that the cache key is being used more liberally, the same goes for a lot of js and css, and maybe images. not just the autogen-generated stuff |
10:27 |
terran |
berick: :( |
10:29 |
Dyrcona |
miker: So, it would be a good idea, though not critical, to get all cache keys to be the same? |
10:32 |
miker |
terran: ew ... that looks like something is adding 1 based on the fact that Date.prototype.getMonth() returns the month as a number in the range 0-11, but getDate() isn't actually being used... |
10:32 |
miker |
Dyrcona: aye |
10:33 |
terran |
miker: wouldn't that make all the months be off by one instead of only some of the months? |
10:33 |
miker |
hrm, yes |
10:35 |
miker |
AH HA |
10:35 |
miker |
months with 31 days are fine |
10:35 |
miker |
weird! |
10:37 |
terran |
!!! |
10:37 |
Dyrcona |
Something adding 30 days? |
10:38 |
miker |
Dyrcona: that or similar, it would seem |
10:38 |
miker |
well, no |
10:38 |
miker |
because adding 30 days to 71-01-05 would still shift the month |
10:39 |
berick |
ok, it's calling setMonth before setDate |
10:39 |
berick |
if the day is too high, it shifts the month to accommodate |
10:39 |
berick |
d = new Date() ; d.setMonth(1) |
10:40 |
berick |
demonstrates the issue |
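A minimal sketch of the rollover berick describes. It assumes a fixed "today" of 31 May 2018 (a hypothetical date, chosen so the result is repeatable) instead of `new Date()`, and the helper name `setMonthThenDate` is made up for illustration; it is not the web client's actual code.

```javascript
// Mimic code that starts from "today" and sets the month before the day.
// While setMonth() runs, the day-of-month is still today's day, so a day
// that does not exist in the target month spills into the following month.
function setMonthThenDate(start, monthIndex, day) {
  const d = new Date(start.getTime()); // copy so the caller's Date is untouched
  d.setMonth(monthIndex);              // e.g. "31 February" becomes 3 March
  d.setDate(day);                      // the day is fixed up, the month is not
  return d;
}

const today = new Date(2018, 4, 31);          // 31 May 2018, local time
const dob = setMonthThenDate(today, 1, 5);    // intended: 5 February
console.log(dob.getMonth(), dob.getDate());   // prints: 2 5 (5 March, not 5 February)
```

With the same starting date, a target month that has 31 days (for example `setMonthThenDate(today, 0, 5)`) comes out correct, which matches the observation that months with 31 days are fine.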
10:40 |
miker |
berick: ah, and by default the DOM is today |
10:40 |
berick |
yeah |
10:41 |
dbwells |
So this bug only happens on the 31st of each month? |
10:41 |
miker |
well, but if you're in, say feb and the dob was march 31, the reverse would still be a problem. the DOM would shift |
10:41 |
Dyrcona |
More or less. |
10:41 |
Dyrcona |
And, yeah, I was thinking February would be special. |
10:41 |
miker |
so, we just need to use the constructor that looks like this: var d = new Date(year, month, day, hours, minutes, seconds, milliseconds); |
10:41 |
terran |
dbwells: In my testing it didn't matter what the day is |
10:42 |
berick |
miker: yeah, that's what I was thinking... |
10:42 |
miker |
terran: but it matters what day you test on! :) |
10:42 |
dbwells |
terran: right, what miker said :) |
10:42 |
terran |
miker: oh! |
10:42 |
terran |
:D |
10:42 |
berick |
ugh, but that creates dates in UTC. |
10:42 |
* berick |
is reminded why we don't do that now |
10:43 |
dbwells |
date-time stuff gets all the best bugs :D |
10:43 |
Bmagic |
I vote we change the way humans keep track of time |
10:44 |
berick |
we'd learn to parse epochs in no time |
10:44 |
Bmagic |
It should be base 2 |
10:44 |
JBoyer |
There's a reason there are more than 4 editions of this book: https://www.amazon.com/Calendrical-Calculations-Ultimate-Edward-Reingold/dp/1107683165/ |
10:45 |
JBoyer |
Also getMonth() returning 0-11 is a math crime. |
10:45 |
Bmagic |
JBoyer++ |
10:45 |
JBoyer |
(I have a copy, it's a little more dry than I was hoping) |
10:46 |
Dyrcona |
Dates are hard. (I'm not being facetious.) |
10:46 |
terran |
Bmagic: time is just a social construct anyway |
10:46 |
dbwells |
JBoyer: Just make sure to avoid the "Millennium Edition", that stuff's never good |
10:47 |
Bmagic |
It should be more like Date of birth: noonish, in the 80's. Leaving a little wiggle room for error |
10:47 |
JBoyer |
The edition of the book stops working each day at midnight. |
10:47 |
JBoyer |
dbwells++ |
10:48 |
miker |
berick: we can use moment-js I bet |
10:48 |
miker |
I have a call in a few minutes, so can't look up the details right now, but moment is pretty rad, and already loaded |
10:49 |
JBoyer |
Dyrcona, re: using the A/T cleanup script with any constraints in place: I'm not sure it matters if they're deferrable or not because you're waiting the same amount of time before completion, the only question is when you wait. |
10:50 |
Dyrcona |
JBoyer: I'm not clear how constraints affect truncate. I know that deferred constraints will allow deletes in any order as long as data is deleted from both tables in the same transaction. |
10:51 |
Dyrcona |
My concern is just getting the truncate to succeed and not how fast or slow it was. |
10:53 |
berick |
miker: i think i'm overthinking. Date(y,m,d) returns the correct value, it's just represented by default as UTC (at 5am) w/ toString. toLocaleString() shows correct value. |
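A short sketch of the constructor approach discussed here, assuming only the calendar date matters. The helper name `makeLocalDate` and the sample date are hypothetical. Passing year, month, and day in one call means no intermediate, out-of-range day ever exists, and the local getters return what was passed in; only the UTC rendering of the same instant can differ.

```javascript
// Build the date in one step. monthIndex is 0-11, as with getMonth().
function makeLocalDate(year, monthIndex, day) {
  return new Date(year, monthIndex, day); // midnight in the local time zone
}

const d = makeLocalDate(1971, 1, 5);                      // 5 February 1971
console.log(d.getFullYear(), d.getMonth(), d.getDate());  // local fields, as passed in
console.log(d.toISOString());        // the same instant expressed in UTC
console.log(d.toLocaleDateString()); // local calendar date, locale-dependent format
```

The exact strings printed by the last two lines depend on the machine's time zone and locale, so they are not shown here.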
10:53 |
berick |
i suspect that will work fine. i'll post a patch |
10:54 |
terran |
berick++ |
10:54 |
terran |
Thanks everyone!!! |
10:54 |
Dyrcona |
berick++ |
10:54 |
kmlussier |
berick++ |
10:54 |
miker |
berick: ah, because we only care about the date, we can just lop off the time part. of course |
10:54 |
miker |
berick++ |
10:56 |
Dyrcona |
So, I have apache2-websockets instances spinning out of control. |
10:58 |
Dyrcona |
Oh, that's lovely: [Wed May 30 10:53:18.447792 2018] [core:notice] [pid 3491] AH00051: child pid 14598 exit signal Segmentation fault (11), possible coredump in /etc/apache2-websockets |
11:00 |
Dyrcona |
Of course, those are dead and not spinning doing nothing. |
11:13 |
JBoyer |
Dyrcona, oh, I wasn't following the actual question, I assumed the truncate was a one-time cleanup before using the built-in cleanup. I would assume that deferrable would work, but I wasn't testing that before. I may give that a shot soon. |
11:20 |
Dyrcona |
So, we appear to be having issues with websockets. |
11:21 |
csharp |
berick++ |
11:21 |
Dyrcona |
I guess that's too vague. I'll have to do some research and open a Lp bug. |
11:21 |
csharp |
10 minutes from your push to applying to PINES production :-) |
11:22 |
berick |
csharp: nice :) |
11:23 |
* csharp |
rocks it like it's 2008-era PINES |
11:23 |
berick |
Dyrcona: strace output showing what the looping WS processes are doing is helpful, especially if you can clarify (for example) if it's looping on an open socket, etc. and what that socket points to. |
11:23 |
terran |
I don't say it enough, but y'all rock. |
11:29 |
Dyrcona |
berick: Apparently no dice: strace: [ Process PID=3068 runs in x32 mode. ] |
11:30 |
Dyrcona |
I'm googling to see what I can do about that. I'll try the other pids, too. |
11:30 |
berick |
huh, i've never seen that before |
11:32 |
Dyrcona |
Interestingly, it attaches to one that isn't stuck, but produces only partial output. |
11:32 |
Dyrcona |
All of the stuck ones report running in x32 mode from strace. |
11:33 |
Dyrcona |
And, they're still running after issuing a kill to each one. |
11:34 |
Dyrcona |
Better luck on the other brick. |
11:34 |
Dyrcona |
I'm getting futex with a FUTEX_WAIT_PRIVATE |
11:35 |
miker |
csharp: I was going to say something about "like back in 2006" but back then we just DID IT LIVE ;) |
11:36 |
berick |
Dyrcona: for one that's spinning? |
11:36 |
Dyrcona |
yes. |
11:36 |
berick |
k, nothing else in the strace? |
11:37 |
|
rlefaive joined #evergreen |
11:37 |
Dyrcona |
berick: futex(0x7f162de1a1c8, FUTEX_WAIT_PRIVATE, 2, NULL |
11:37 |
Dyrcona |
On all three. |
11:38 |
miker |
Dyrcona: I often have to kill spinning websockets twice (threads, I think) if I use sig 15 (this is pre-patch websockets) |
11:38 |
Dyrcona |
I'm using strace -C -p 22381 should I use any other options? |
11:38 |
Dyrcona |
miker: OK. sometimes I've had to do it twice, sometimes not. |
11:39 |
berick |
Dyrcona: any gateway logs spewing? |
11:39 |
berick |
Dyrcona: and I assume all opensrf patches applied? |
11:40 |
miker |
Dyrcona: yes, it's not always for me either |
11:40 |
Dyrcona |
berick: I'll check the logs, and we're basically on OpenSRF 3.0.1. I installed rel_3_0 on Sunday night. |
11:41 |
Dyrcona |
I also verified that it's up to date with what's on the community repos. |
11:41 |
berick |
k |
11:43 |
Dyrcona |
It's also hit or miss finding the pids in the logs on the syslog server, or even the local host. |
11:43 |
Dyrcona |
Our logs are logging pids to apache logs, but I'm only finding some of these. |
11:44 |
berick |
Dyrcona: are all the futex calls identical? |
11:44 |
berick |
in the strace |
11:44 |
Dyrcona |
berick: They are. |
11:45 |
Dyrcona |
That could be strace trying to attach? |
11:46 |
Dyrcona |
https://meenakshi02.wordpress.com/2011/02/02/strace-hanging-at-futex/ |
11:46 |
Dyrcona |
That looks useful. I'm reading it. |
11:46 |
berick |
good call |
11:49 |
Dyrcona |
So, It looks like there are 3 threads spawned by one of these. |
11:50 |
berick |
sounds right |
11:50 |
Dyrcona |
Attaching to 1 gives me the futex output, another says it is in x32 mode, the third looks like it hung in select. |
11:50 |
berick |
web -> osrf thread; osrf -> web thread; idle timeout thread |
11:51 |
Dyrcona |
select ... wait4 repeat over and over again. |
11:51 |
berick |
ah, ok, so it's looping on select() ? |
11:52 |
Dyrcona |
yeah, lost ejabberd connection, maybe? |
11:52 |
berick |
maybe. latest patches should have addressed that though |
11:52 |
berick |
could have missed something... |
11:52 |
csharp |
miker: yep :-) |
11:54 |
Dyrcona |
berick: Same thing on all of them when I get to the right child process. |
11:55 |
csharp |
Dyrcona: berick: miker: this is an interesting discussion, because I've been seeing spinning websocket procs since applying the hotfix (compiling on another server and moving the .so file into place, restarting services), but I thought it was a Just Me™ problem |
11:55 |
csharp |
and I have to kill them twice too |
11:55 |
* berick |
nods |
11:56 |
Dyrcona |
Oh, cool. It's started on a 3rd brick head again. :) |
11:57 |
Dyrcona |
What's interesting (and it may just be coincidence) is I've only seen this on 4 out of 5 bricks so far. |
11:57 |
JBoyer |
This is not doing me any favors as I plan for an upgrade in 2 days. |
11:58 |
|
sandbergja joined #evergreen |
11:58 |
csharp |
JBoyer: for us, it's just a minor headache - nothing serious |
11:58 |
JBoyer |
Dyrcona, deferrable constraints doesn't allow you to do the truncates the way you do in your script, though it does work to do both at once: TRUNCATE TABLE action_trigger.event, action_trigger.event_output; does work, and will likely work with constraints as-is. |
11:59 |
berick |
Dyrcona: the looping select() calls are identical too? |
11:59 |
berick |
... and what are they exactly? |
11:59 |
csharp |
futex(0x7fad78e911c8, FUTEX_WAIT_PRIVATE, 2, NULL |
12:00 |
berick |
csharp: read up for futex / strace discussion |
12:00 |
JBoyer |
csharp, so kind of like the occasional dead spinning connection we have now? That's less alarming, downgraded to Bummer Status. ;) |
12:00 |
berick |
csharp: https://meenakshi02.wordpress.com/2011/02/02/strace-hanging-at-futex/ |
12:00 |
csharp |
JBoyer: yeah, definitely in the "bummer" category for us |
12:01 |
Dyrcona |
csharp: Do you have pre-3.0 OpenSRF libraries hanging around? |
12:02 |
Dyrcona |
I do, and I wonder if that might be an issue. (I doubt it, though.) |
12:02 |
Dyrcona |
berick: they are not all identical. |
12:03 |
Dyrcona |
The first looks like: select(0, NULL, NULL, NULL, {0, 481231}) = 0 (Timeout) |
12:03 |
Dyrcona |
wait4(-1, 0x7ffc153bfe0c, WNOHANG|WSTOPPED, NULL) = 0 |
12:03 |
Dyrcona |
After that the 481231 changes to 0. |
12:04 |
csharp |
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout) |
12:04 |
csharp |
wait4(-1, 0x7ffe3e7f64ec, WNOHANG|WSTOPPED, NULL) = 0 |
12:04 |
Dyrcona |
And, the 0 changes to 1. |
12:04 |
Dyrcona |
Yeah, like csharp's example. |
12:04 |
csharp |
select(0, NULL, NULL, NULL, {0, 185014}) = 0 (Timeout) |
12:04 |
csharp |
wait4(-1, 0x7ffe3e7f64ec, WNOHANG|WSTOPPED, NULL) = 0 |
12:04 |
csharp |
^^ my first lines |
12:04 |
gmcharlt |
berick: here's one you might find interesting: https://bugs.launchpad.net/evergreen/+bug/1774448 |
12:05 |
pinesol_green |
Launchpad bug 1774448 in Evergreen "web staff client can spam open-ils.auth.session.retrieve requests" [Undecided,New] |
12:05 |
Dyrcona |
The number in the {0, NUMBER} changes each time I strace the same process. |
12:05 |
Dyrcona |
And, it's different for each process. |
12:06 |
csharp |
Dyrcona: confirmed |
12:06 |
berick |
thanks Dyrcona csharp |
12:06 |
csharp |
@band add Dyrcona csharp |
12:06 |
pinesol_green |
csharp: Band 'Dyrcona csharp' added to list |
12:07 |
|
rlefaive_ joined #evergreen |
12:07 |
* Dyrcona |
is going to have some lunch. BBIAB. |
12:08 |
|
jihpringle joined #evergreen |
12:10 |
|
beanjammin joined #evergreen |
12:15 |
|
khuckins joined #evergreen |
12:19 |
|
rlefaive joined #evergreen |
12:20 |
|
dbs joined #evergreen |
12:24 |
Dyrcona |
berick: I don't know how useful the summary is, but this is what I got from letting strace run for a while: https://pastebin.com/hjLGkN2k |
12:26 |
|
yboston joined #evergreen |
12:30 |
|
rlefaive joined #evergreen |
12:38 |
|
dbs joined #evergreen |
12:52 |
Bmagic |
rhamby: above_the_treeline.pl needs a line removed from commit http://git.evergreen-ils.org/?p=contrib/equinox.git;a=blobdiff;f=above_the_treeline/above_treeline_export.pl;h=8c89a26734824a85e79af86a0f2cc249b388019f;hp=7397bd63ca063a5ceb2f661851645263b2d3f93f;hb=d152a4c3f0c1e8c44266a420d0aa56c1799dcd43;hpb=66c58ca349a077cfdcd3f257fea4823c40025e29 |
12:52 |
Bmagic |
if (defined $ftp_host and defined $ftp_user) { |
12:52 |
Bmagic |
is redundant and causes parsing error, FYI |
12:54 |
rhamby |
hmm must have snuck in when copying from a buffer at some point, fixed |
12:59 |
Bmagic |
right on |
13:01 |
csharp |
@blame the buffer |
13:01 |
pinesol_green |
csharp: the buffer musta been an Apple employee. |
13:02 |
rhamby |
Blame the Buffer will be my Weird Al Yankovic tribute band ..... |
13:06 |
* csharp |
's eyes cross trying to track the changes in the cataloging omnibus branch :-/ (bug 1773417) |
13:06 |
pinesol_green |
Launchpad bug 1773417 in Evergreen 3.0 "Webstaff Cataloging Cleanup Omnibus, May 2018" [High,New] https://launchpad.net/bugs/1773417 |
13:07 |
csharp |
I understand the rationale for the comprehensive approach, but I do much better with one issue per bug :-( |
13:14 |
Dyrcona |
csharp: Same here. |
13:15 |
Dyrcona |
The web client bugs have had a tendency to go comprehensive. :) |
13:16 |
Dyrcona |
@band add Blame the Buffer |
13:16 |
pinesol_green |
Dyrcona: Band 'Blame the Buffer' added to list |
13:18 |
JBoyer |
aw, shucks. I always end up on Blame the Buffer's overflow list. :( |
13:23 |
berick |
Dyrcona: interesting, more stuff going on there... is that just one thread? |
13:25 |
Dyrcona |
berick: Yes, just 1 thread. |
13:25 |
|
rlefaive joined #evergreen |
13:25 |
berick |
but in the raw output, it's just showing select(...) and wait4() looping? |
13:25 |
Dyrcona |
One that was spinning on select most of the time and the parent thread is still using between 95 and 100% cpu. |
13:26 |
Dyrcona |
No, the raw output eventually shows the write() and other calls along with a sigchld. |
13:27 |
Dyrcona |
berick: Here's a sample: https://pastebin.com/MrLuDkqu |
13:28 |
jeffdavis |
we've got spinning websockets procs here too, I thought it might be connected to the jabber error reported in bug 1773249 but haven't confirmed that yet |
13:28 |
pinesol_green |
Launchpad bug 1773249 in Evergreen "Retrieving copy templates in web client can result in NOT CONNECTED errors" [Undecided,New] https://launchpad.net/bugs/1773249 |
13:30 |
miker |
jeffdavis: that, and saving overlarge copy template blobs, can both cause the issue, IIUC |
13:31 |
Dyrcona |
I did see some messages about copy templates when looking at the logs earlier. |
13:31 |
Dyrcona |
One of the pids that I managed to find in the apache logs, the last productive thing it did was retrieve copy templates. |
13:31 |
Dyrcona |
After that, tons of SSL errors. |
13:32 |
miker |
related, I've been investigating a chunking /request/ feature to split overlarge request messages up and have them reassembled in the listener |
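For illustration only, the general idea of splitting an overlarge message and reassembling it on the other side can be sketched as below. This is not OpenSRF's wire format or miker's design; the function names `chunkByBytes` and `reassemble`, the 16-byte limit, and the sample payload are all assumptions. The split is done by UTF-8 byte size, since multi-byte characters make a string larger on the wire than its character count suggests.

```javascript
// Split text into pieces whose UTF-8 size is at most maxBytes,
// never cutting a character in half.
function chunkByBytes(text, maxBytes) {
  const encoder = new TextEncoder();
  const chunks = [];
  let current = "";
  let currentBytes = 0;
  for (const ch of text) { // for...of iterates by code point
    const size = encoder.encode(ch).length;
    if (size > maxBytes) {
      throw new RangeError("maxBytes is smaller than a single character");
    }
    if (currentBytes + size > maxBytes) {
      chunks.push(current); // current piece is full, start a new one
      current = "";
      currentBytes = 0;
    }
    current += ch;
    currentBytes += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// The receiving side only has to concatenate the pieces in order.
function reassemble(chunks) {
  return chunks.join("");
}

const payload = JSON.stringify({ name: "template", note: "é".repeat(10) });
const parts = chunkByBytes(payload, 16);
console.log(parts.length, reassemble(parts) === payload);
```

A real implementation would also need sequence numbers and an end marker so the listener knows when a request is complete; those are left out of this sketch.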
13:32 |
csharp |
huh - hadn't put it together before, but I know we're getting complaints from catalogers (those brave enough to risk using the scary new web client) about copy template "weirdness" |
13:33 |
csharp |
intermittent issues that are basically unreproducible in our office environment |
13:33 |
csharp |
I'm sure we're dealing with massive JSON blobs |
13:33 |
Dyrcona |
Well, I'm going to finally kill some of these on one of the bricks. I'm getting load warnings for the brick head, now. |
13:33 |
Dyrcona |
I still have 'em spinning on two other bricks if more strace data is needed. |
13:35 |
jeffdavis |
It's not consistently reproducible in our environment either. Retrieving the same big JSON blob works sometimes and fails other times. I want to test the specific step of copying XUL templates to the new web client copy template user setting (see the load_remote_acp_templates() function in cat/volcopy/app.js), but haven't had time yet. |
13:43 |
Dyrcona |
Wow! |
13:44 |
Dyrcona |
One of them is now using 192.7% cpu and refuses to TERM. Time to KILL. |
13:44 |
|
rlefaive joined #evergreen |
13:45 |
Dyrcona |
And, a new one has joined the gang... :) |
13:48 |
Dyrcona |
So, I should look for NOT CONNECTED messages in the logs? |
13:48 |
miker |
Dyrcona: no, those come from the other side of the conversation. best to look in the ejabberd logs, actually |
13:49 |
|
rlefaive joined #evergreen |
13:50 |
miker |
that is, if the cause is ejabberd dropping connections to clients that send too-big stanzas in your case |
13:50 |
dbs |
On that chunking front, is it worth increasing max_stanza_size again? (ours is at 100000 in 2.12 to handle 65536 + multi-byte chars) |
13:51 |
dbs |
But I think we were in the millions before some of the chunking repairs in the 2.12 time frame resolved the worst of the issues? So maybe new moles popping up their heads. Ahh, memories... |
13:51 |
* Dyrcona |
looks. I don't think I changed max stanza size. |
13:52 |
csharp |
nothing in my ejabberd log for one that just popped up |
13:52 |
Dyrcona |
My max_stanza_size is 2097152. |
13:52 |
Dyrcona |
So, yeah, I don't think that's it. |
13:53 |
csharp |
max_stanza_size: 2000000 |
13:53 |
dbs |
haha, yay millions :) |
13:53 |
csharp |
@sing R.E.M. : 1,000,000 |
13:53 |
pinesol_green |
csharp: I see nothing, I know nothing! |
13:54 |
csharp |
pinesol_green: learn to sing |
13:54 |
pinesol_green |
csharp: Down time is a fact of business when you're a poor 501c3 corporation. |
13:58 |
dbs |
csharp++ |
13:59 |
dbs |
@eightball is mobius richer than sfc? |
13:59 |
pinesol_green |
dbs: Maybe... |
13:59 |
jeffdavis |
not finding ejabberd errors here either |
14:00 |
jeffdavis |
and our max_stanza_size is 4194304, I think we win? |
14:00 |
Dyrcona |
heh |
14:01 |
csharp |
@sing The Smiths : Some max_stanza_sizes are Bigger Than Others |
14:01 |
pinesol_green |
csharp: Have you tried taking it apart and putting it back together again? |
14:06 |
dbs |
@quote add <han_solo> I've got a baaad feeling about this... upgrade |
14:06 |
pinesol_green |
dbs: Error: You must be registered to use this command. If you are already registered, you must either identify (using the identify command) or add a hostmask matching your current hostmask (using the "hostmask add" command). |
14:07 |
csharp |
dbs++ |
14:08 |
csharp |
@quote add <han_solo> I've got a baaad feeling about this... upgrade |
14:08 |
pinesol_green |
csharp: The operation succeeded. Quote #187 added. |
14:09 |
|
dbs joined #evergreen |
14:10 |
dbs |
@quote add < jeffdavis> and our max_stanza_size is 4194304, I think we win? |
14:10 |
pinesol_green |
dbs: Error: You must be registered to use this command. If you are already registered, you must either identify (using the identify command) or add a hostmask matching your current hostmask (using the "hostmask add" command). |
14:10 |
dbs |
bah |
14:11 |
|
jeffdavis joined #evergreen |
14:11 |
|
jeffdavis joined #evergreen |
14:15 |
Dyrcona |
dbs did you identify yourself with bot to add the quote? if not, I can add it. |
14:16 |
berick |
strace is not helping me too much, unfortunately. suffice to say if someone can reliably reproduce, I'm all over it. |
14:17 |
Dyrcona |
That's the thing. I haven't seen it on a test environment, so I'm not sure what is triggering it. I'll look into the copy templates angle later. |
14:21 |
|
kmlussier joined #evergreen |
14:25 |
* Dyrcona |
wishes he could copy and paste text from an image, but doesn't have OCR for the clipboard. |
14:25 |
* berick |
wishes he could copy/paste from his eyes |
14:25 |
berick |
some day |
14:26 |
csharp |
@blame Black Mirror |
14:26 |
pinesol_green |
csharp: Black Mirror must eat cottage cheese! |
14:30 |
frank_g |
Hi all, I have a question: why do the search result item details display different information in the public OPAC and in the EG staff client window? For example, in the staff window an item doesn't display Record details >> "Physical Description: regular print", but in the public OPAC it appears. |
14:30 |
|
abowling joined #evergreen |
14:31 |
pinesol_green |
[evergreen|Remington Steed] Docs: Add bullet number images removed in 2.10 - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=55d62d4> |
14:35 |
frank_g |
I think I found a bug, I guess: when you enable/disable highlighting, the record details displayed change. Or is this correct? |
14:36 |
frank_g |
Try to disable highlighting in this record: http://biblioteca.ipicyt.edu.mx/eg/opac/record/43693?contains=contains;_special=1;detail_record_view=0;qtype=item_barcode;query=LCI01143;locg=1;expand=marchtml |
14:40 |
JBoyer |
frank_g, I think there's a bug about that. Basically when using the highlight code it uses the new Display Fields entries, but the non-highlight code path still uses the old MODS-based data extraction. I'm not sure what the progress is on that one right now. |
14:40 |
terran |
frank_g: Weird! I can confirm the behavior on a demo server here: https://sb2.missourievergreen.org/eg/opac/record/248?query=ready;qtype=keyword;locg=1;detail_record_view=0 |
14:46 |
gmcharlt |
JBoyer: frank_g: I believe that the bug in question is https://bugs.launchpad.net/evergreen/+bug/1752434 |
14:46 |
pinesol_green |
Launchpad bug 1752434 in Evergreen "Search highlighting affects OPAC title display" [Low,Fix released] |
14:47 |
JBoyer |
gmcharlt++ |
14:47 |
* JBoyer |
was distracted |
14:48 |
frank_g |
JBoyer: terran gmcharlt pinesol_green thanks for the support |
14:54 |
kmlussier |
There are still issues even with that patch applied. bug 1770454 |
14:54 |
pinesol_green |
Launchpad bug 1770454 in Evergreen "Strange behavior for subject display in staff catalog" [Medium,Confirmed] https://launchpad.net/bugs/1770454 |
14:59 |
|
jihpringle joined #evergreen |
15:12 |
|
mmorgan joined #evergreen |
15:13 |
JBoyer |
So I'm fruitlessly looking for the template to edit to alter the html output of the supercat browse list. (server/opac/extras/browse/html/item-age/aou/offset/count) |
15:14 |
JBoyer |
Anyone have a pointer for me to follow? I've noticed PINES has customized theirs. |
15:15 |
dbs |
JBoyer: I believe it's all in the Perl code, IIRC |
15:16 |
JBoyer |
That would explain my difficulty in tracking it down. I'll go grep about under perlmods/lib |
15:16 |
JBoyer |
dbs++ |
15:16 |
dbs |
http://git.evergreen-ils.org/?p=Evergreen.git;a=blob;f=Open-ILS/src/perlmods/lib/OpenILS/WWW/SuperCat/Feed.pm;h=56146cb988b39ca87db3eb5a5a2dfb55f0d84f5f;hb=refs/heads/rel_3_1#l635 |
15:16 |
dbs |
oh right, and then there's an XSLT that gets applied |
15:17 |
JBoyer |
Thanks! those 2 pieces of info have likely saved me a nice pile of time. |
15:17 |
JBoyer |
dbs++ # again! |
15:17 |
dbs |
my pleasure |
15:20 |
dbwells |
I don't know why we don't just use xsl for all of our templates. Oh wait, YES I DO. |
15:21 |
jeff |
dbwells++ |
15:22 |
JBoyer |
dbwells++ |
15:22 |
JBoyer |
The first thing I'm likely to change is the output format: HTML5. The second thing I'm likely to change is my mind. ;) |
15:23 |
Dyrcona |
:) |
15:23 |
JBoyer |
(Note: sloppy joke is sloppy. I'm not changing how any of this works, I'm just thinking about changing the doctype in the xsl.) |
15:26 |
terran |
frank_g: If you want |
15:27 |
terran |
frank_g: Sorry, was called away. If you put a bug report in launchpad, I'll confirm it. |
15:38 |
mmorgan |
I'm having trouble with action triggers after a server relocation. After running for a while they seem to be stalled. |
15:39 |
mmorgan |
They're not changing state. |
15:40 |
mmorgan |
If I have a bunch of triggers in collected state, are they stuck? Or will they get processed? |
15:42 |
mmorgan |
s/triggers/events |
15:45 |
miker |
dbwells: we actually attempted that at the very beginning. we even experimented with delivering XML that referenced a stylesheet for the browser to apply, instead of doing it server side! |
15:47 |
miker |
XSLT is Turing-complete, what's the matter?! |
15:48 |
dbwells |
miker: They were the best of times, they were the worst of times :) |
15:49 |
* berick |
sends a 70M file through websockets to try and break it. |
15:50 |
miker |
berick: as a method param? |
15:50 |
dbwells |
berick: I first read that as "70 million files". berick doesn't mess around! |
15:50 |
berick |
miker: no, as a part of the response data |
15:50 |
miker |
ah |
15:50 |
miker |
poor browser! |
15:51 |
berick |
though maybe i'll try as a param too... |
15:51 |
berick |
dbwells: it may come to that :) |
15:53 |
|
rlefaive joined #evergreen |
16:01 |
berick |
finally finished. no issues, websockets fine. |
16:01 |
berick |
that's encouraging, anyway |
16:13 |
mmorgan |
Ok, so some of my triggers are running again, but I have a bunch of predue notices that are still stuck in state collected and reacting. Can I run the trigger command to pick those up or do I have to set their state back to pending in order to get them to process? |
16:15 |
|
rlefaive joined #evergreen |
16:19 |
|
rlefaive joined #evergreen |
16:25 |
|
rlefaive joined #evergreen |
16:42 |
|
rlefaive joined #evergreen |
16:48 |
|
rlefaive joined #evergreen |
17:02 |
mmorgan |
For the logs, I reset the states of the events to pending and was able to process them. Whew! |
17:04 |
kmlussier |
mmorgan++ |
17:05 |
|
rlefaive joined #evergreen |
17:09 |
|
yboston joined #evergreen |
17:19 |
|
mmorgan left #evergreen |
17:33 |
|
abowling left #evergreen |
18:05 |
|
rlefaive joined #evergreen |
18:30 |
pinesol_green |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |
19:30 |
|
rlefaive joined #evergreen |
20:16 |
|
book` joined #evergreen |
20:25 |
|
rlefaive joined #evergreen |
20:28 |
|
rlefaive joined #evergreen |
20:57 |
|
gsams joined #evergreen |
21:13 |
|
Dyrcona joined #evergreen |