Time |
Nick |
Message |
02:06 |
|
Bmagic joined #evergreen |
02:51 |
|
jvwoolf joined #evergreen |
04:31 |
pinesol_green |
News from qatests: Test Success <http://testing.evergreen-ils.org/~live> |
06:50 |
|
JBoyer joined #evergreen |
07:09 |
|
rjackson_isl joined #evergreen |
07:18 |
|
agoben joined #evergreen |
08:17 |
|
jvwoolf joined #evergreen |
08:19 |
csharp |
ugh - another action trigger explosion over the weekend |
08:20 |
csharp |
something fails, kills the cstore drone and they spawn more cstores into oblivion and nothing else works |
08:20 |
* csharp |
works on re-running processes, then trying to nail down the "something" referred to in last statement |
08:37 |
csharp |
we might need to re-evaluate using A/T for anything other than truly convenience functions - something this fragile can't be depended on for anything mission critical |
08:37 |
csharp |
(sorry to all those I recommended this to at the conference) |
08:41 |
|
mmorgan joined #evergreen |
08:44 |
JBoyer |
csharp, just to check, how many granularities are you using? it's not really reasonable to only stick to a single Daily, for instance, once you reach a large size. |
08:44 |
JBoyer |
(I also have a lot of questions re: max children size and etc. but I realize you might not be looking for troubleshooting help right now.) |
08:44 |
csharp |
we've got it as granular as it can go |
08:45 |
csharp |
JBoyer: thanks, I will definitely appreciate the help after I finish the rescue/salvage operation :-) |
08:47 |
csharp |
honestly, I need to mop up all the mess, then create something that nagios/icinga can use to parse osrf_control --diagnostic to warn/crit on drone number thresholds |
08:48 |
csharp |
if someone already has something like that and is willing to share, that would rock |
08:49 |
csharp |
despite my temper tantrums this morning, I don't see us abandoning A/T anytime soon |
08:49 |
|
bos20k joined #evergreen |
08:53 |
|
jonadab joined #evergreen |
08:56 |
JBoyer |
There was an LP bug about adding nagios / icinga check support to osrf_control at some point, but I don't remember the status currently. |
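A minimal sketch of the kind of Nagios/Icinga check csharp describes above. It assumes that each service line printed by `osrf_control --diagnostic` carries a `#drones=<active>/<max>` field; that line format is an assumption and should be checked against real output before use. The function name and the percentage thresholds are hypothetical.

```python
import re

# ASSUMPTION: each service line names the service and later contains
# "#drones=<active>/<max>". Verify this against real osrf_control output.
LINE_RE = re.compile(
    r"^\*?\s*(?P<service>[\w.\-]+).*#drones=(?P<active>\d+)/(?P<max>\d+)"
)

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_drones(diagnostic_text, warn_pct=75.0, crit_pct=90.0):
    """Return (exit_code, message) based on drone usage per service."""
    worst = OK
    problems = []
    seen = 0
    for line in diagnostic_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not look like service lines
        seen += 1
        active, maximum = int(m.group("active")), int(m.group("max"))
        pct = 100.0 * active / maximum if maximum else 0.0
        if pct >= crit_pct:
            level = CRITICAL
        elif pct >= warn_pct:
            level = WARNING
        else:
            level = OK
        if level != OK:
            problems.append(f"{m.group('service')} {active}/{maximum}")
            worst = max(worst, level)
    if seen == 0:
        return UNKNOWN, "UNKNOWN: no service lines parsed"
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[worst]
    detail = ", ".join(problems) if problems else f"{seen} services within thresholds"
    return worst, f"{label}: {detail}"
```

A wrapper script would feed the command's output to `check_drones`, print the message, and exit with the returned code so the monitoring system can warn or go critical on drone counts.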
09:08 |
|
jvwoolf1 joined #evergreen |
09:13 |
|
jonadab joined #evergreen |
09:21 |
|
Dyrcona joined #evergreen |
09:22 |
|
terran joined #evergreen |
09:38 |
|
yboston joined #evergreen |
09:43 |
|
kmlussier joined #evergreen |
09:45 |
|
jonadab joined #evergreen |
09:59 |
|
maryj joined #evergreen |
10:03 |
|
collum joined #evergreen |
10:04 |
|
ningalls joined #evergreen |
10:11 |
|
mmorgan joined #evergreen |
10:35 |
|
Christineb joined #evergreen |
10:36 |
|
jonadab joined #evergreen |
10:42 |
bshum |
Evergreen 2.10 EOL this week. Hmm, that always creeps up so fast :) |
10:42 |
|
mmorgan1 joined #evergreen |
10:53 |
|
sandbergja joined #evergreen |
10:58 |
csharp |
hmm - looks like my A/T outage might have been caused by the fine generator timing out (cstore) |
10:58 |
csharp |
still doing forensics |
10:58 |
csharp |
seriously considering creating an A/T server that is separate from other utility functions |
11:01 |
JBoyer |
1. Do they overlap? and 2. What all is done on this server now? holds / fines / A/T? |
11:01 |
JBoyer |
and single process or parallel? |
11:03 |
csharp |
1. yes 2. all of the above and holds/fines run parallel procs |
11:04 |
csharp |
this is a pretty beefy server, but it's possible they're competing for resources |
11:07 |
JBoyer |
We still haven't bothered to go parallel here, but it may not hurt. I do remember that the parallel processing loop had a timeout that wasn't present at all for the single process run. Moving all of the A/T stuff well after the fine generator may also help. |
11:08 |
csharp |
I'm just going to create another cluster node that runs A/T |
11:21 |
Dyrcona |
Something in 2.12 requires Pg 9.3, doesn't it? |
11:21 |
Dyrcona |
Or, is that going into 3.0. |
11:23 |
bshum |
I think 2.12 has something in it that requires PG 9.3 |
11:23 |
bshum |
Which is why we made it the min version |
11:24 |
Dyrcona |
That's what I seem to be remembering. |
11:24 |
Dyrcona |
Thanks. |
11:31 |
kmlussier |
I don't remember there being a specific feature that led to the decision. But it became the minimum required version in 2.11, and features may have been built since then that require 9.3. |
11:42 |
Dyrcona |
CONFLICT (content): Merge conflict in ChangeLog |
11:42 |
Dyrcona |
Well, that's fun.... Going from rel_2_12 to rel_2_12_2, but guess I'm actually ahead. |
11:57 |
|
mllewellyn joined #evergreen |
12:04 |
bshum |
kmlussier: I guess you're right, in re-reading old emails, it looks like the main reason we named PG 9.3 the minimum version was due to 9.1 being EOL by PostgreSQL community. |
12:04 |
bshum |
And that new feature stuff for 3.0 needs PG 9.4+ ? |
12:05 |
kmlussier |
bshum: Yes, for 9.4, there definitely is a new feature that requires it. |
12:06 |
csharp |
@marc 022 |
12:06 |
pinesol_green |
csharp: The ISSN, a unique identification number assigned to a continuing resource. (Repeatable) [a,y,z,2,6,8] |
12:13 |
csharp |
how would I find what's generating this call?: request en-US open-ils.cstore.json_query.atomic {"where":{"-or":[{"-and":[{"tag":"020"},{"subfield":"a"}]},{"-and":[{"tag":"022"},{"subfield":"a"}]},{"-and":[{"tag":"024"},{"subfield":"a"},{"ind1":1}]}],"record":"2048562?title=For%20the%20Philo%20of%20Philosophy&description=Thursday%2C%20June%2015%2C%206%3A00%20-%207%3A30%20p.m.%20Multipurpose%20Room%20C.%20 |
12:13 |
csharp |
Ever%20wonder%20what%20philosophers%20wonder%20about%3F%20Then%20our%20new%20book%20discussion%20group%2C%20For%20the%20Philo%20of%20Philosophy%2C%20is%20for%20you!%20We'll%20read%20philosophy%20books%20from%20the%20ancient%20Greeks%20to%20modern%20times.&enclosure=http%3A%2F%2Fgapines.org%2Fopac%2Fextras%2Fac%2Fjacket%2Fmedium%2Fr%2F2048562&pubDate=Thu%2C%2008%20Jun%202017%2019%3A40%3A50%20-0400"},"from |
12:13 |
csharp |
":"mfr","select":{"mfr":["tag","value"]},"order_by":[{"class":"mfr","field":"id"}]} |
12:14 |
csharp |
it's happening once every several seconds |
12:14 |
csharp |
if it were in activity.log, I'd be able to find the IP, but no such luck |
12:15 |
jeff |
that's likely to be an AC lookup. |
12:16 |
berick |
csharp: looks like maybe an added content isbn, etc. lookup |
12:16 |
berick |
but using a bogus record id |
12:16 |
berick |
from a poorly formatted tpac url |
12:16 |
csharp |
ah |
12:16 |
jeff |
i've been meaning to either strip or reject "bad" looking values there. |
12:17 |
jeff |
before they make it so far as to make those annoying log entries. |
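The idea jeff mentions, rejecting values that do not look like record IDs before they reach cstore, can be sketched as below. This is an illustration in Python rather than Evergreen's own Perl code, and `clean_record_id` is a hypothetical name, not an existing Evergreen function.

```python
import re

# A record id is assumed here to be a plain run of ASCII digits.
RECORD_ID_RE = re.compile(r"[0-9]+")


def clean_record_id(raw):
    """Return the record id as an int, or None if the value looks bad."""
    if raw is None:
        return None
    candidate = raw.strip()
    if RECORD_ID_RE.fullmatch(candidate):
        return int(candidate)
    # Anything else (query strings, encoded text, empty input) is rejected.
    return None
```

With a check like this, a value such as `2048562?title=...` would be rejected up front instead of being passed along as the `record` in a `json_query`.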
12:17 |
csharp |
this is where the text included comes from: http://www.athenslibrary.org/books-more/book-clubs/philo-athens |
12:17 |
jeff |
(and indeed in this case they should be just annoying log entries, thankfully) |
12:17 |
csharp |
and the record is for the top title listed there |
12:17 |
csharp |
but I can't see where the URL is coming from |
12:24 |
JBoyer |
csharp, a grep '2048562?' gateway.<whatever>.log doesn't find anything? |
12:26 |
csharp |
JBoyer: nope :-( |
12:26 |
JBoyer |
Ick. :/ |
12:31 |
berick |
what about apache access log? |
12:31 |
mmorgan |
csharp: https://www.screencast.com/t/dyjnugMcFY |
12:31 |
mmorgan |
"Multipurpose Room" suggested a calendar entry somewhere. Could that be messed up? |
12:32 |
csharp |
mmorgan: oh - interesting |
12:32 |
csharp |
berick: yes! found the IP |
12:32 |
csharp |
for some reason I forget about those :-/ |
12:34 |
jeff |
yeah, those AC lookups would not hit the gateway, thus no gateway logs. |
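Since these lookups show up in the Apache access log rather than the gateway logs, a small script can tally which client IPs are sending the malformed requests. The sketch below assumes the common/combined log format (client IP first, request line as the first quoted field); the function name is hypothetical.

```python
import re
from collections import Counter

# Common/combined log format: the client IP is the first field and the
# request line is the first double-quoted string after the timestamp.
LOG_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)"')


def ips_requesting(lines, needle):
    """Count client IPs whose request line contains `needle`."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and needle in m.group("request"):
            counts[m.group("ip")] += 1
    # Most frequent requesters first.
    return counts.most_common()
```

Passing an open access log file as `lines` and `"2048562?"` as `needle` gives a list of `(ip, count)` pairs, which is enough to see who to call.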
12:41 |
* csharp |
calls Athens, drops mic, walks away |
12:42 |
csharp |
... and back to A/T fixin' |
13:19 |
csharp |
for the logs, regarding the crazy AC search: "It was a hotlink that someone pasted into our digital signage. It was pulling down a book cover image every time it refreshed. Sorry about that! We're making sure it doesn't happen again." |
13:26 |
|
jonadab joined #evergreen |
14:20 |
|
b_bonner left #evergreen |
14:22 |
Dyrcona |
@blam Somebody |
14:22 |
pinesol_green |
Dyrcona: We're going to need a bigger boat. |
14:22 |
Dyrcona |
@blame Somebody |
14:22 |
pinesol_green |
Dyrcona: Somebody stole bradl's tux doll! |
14:22 |
kmlussier |
@blame [somebody] |
14:22 |
pinesol_green |
kmlussier: I see nothing, I know nothing! WILL PERISH UNDER MAXIMUM DELETION! DELETE. DELETE. DELETE! |
14:23 |
JBoyer |
@blame [who] |
14:23 |
pinesol_green |
JBoyer: (who [<channel>] <question>) -- Answers <question> with a random nick from <channel>. <channel> is only necessary if the message isn't sent in the channel itself. stole bshum's tux doll! |
14:23 |
JBoyer |
Sad Trombone. |
14:24 |
csharp |
@blame [someone] |
14:24 |
pinesol_green |
berick forgot to give the gerbils their chocolate-frosted sugar bombs |
14:24 |
JBoyer |
csharp++ # irc hero |
14:26 |
bshum |
@who is [someone]'s hero |
14:26 |
pinesol_green |
is wsmoak 's hero. |
14:27 |
wsmoak |
:) |
14:30 |
kmlussier |
There are several people in here whom I might characterize as a hero. serflog is not one of them. |
14:31 |
bshum |
Since serflog is not a person, per se, I would agree with that statement kmlussier :) |
14:31 |
Dyrcona |
bshum: How bio-centric of you... ;) |
14:31 |
Dyrcona |
bots are people, too. :) |
14:32 |
kmlussier |
bshum: Then again, I'm always very happy when I find an answer in the logs. So maybe serflog is a hero. |
14:46 |
|
kmlussier joined #evergreen |
15:02 |
|
mmorgan1 joined #evergreen |
15:41 |
|
collum_ joined #evergreen |
15:41 |
|
Jillianne joined #evergreen |
16:11 |
JBoyer |
jeffdavis, since I know you've also seen services crash, can you check something for me? I happened to catch this go by today in my osrferror log: |
16:11 |
JBoyer |
No initial XMPP response from server |
16:11 |
JBoyer |
Exception: OpenSRF::EX::Jabber 2017-06-12T11:55:13 OpenSRF::Transport::SlimJabber::Client /usr/local/share/perl/5.18.2/OpenSRF/Transport/SlimJabber/Client.pm:162 Jabber Exception: Could not authenticate with Jabber server: |
16:11 |
JBoyer |
server: child process died: Exception: OpenSRF::EX::Jabber 2017-06-12T11:55:13 OpenSRF::Server /usr/local/share/perl/5.18.2/OpenSRF/Server.pm:495 Jabber Exception: Could not authenticate with Jabber server: |
16:12 |
JBoyer |
I'm wondering if it's possible for a higher-use service like search to get caught out between a successful fork() but the child dying before the syswrite() call. |
16:12 |
JBoyer |
Also, |
16:12 |
JBoyer |
berick++ |
16:13 |
JBoyer |
for the OpenSRF logging improvements. I'm going to get those in place ASAP to see what's what. |
16:17 |
berick |
JBoyer: interesting.. since jabber connecting dance is async, i could see that being the problem. |
16:18 |
JBoyer |
Yeah, the fork is fine and the child is returned because it's a higher demand period (i.e. not on the inactive list), for some reason it can't auth, boom, open-ils.search blows up. :/ |
16:19 |
JBoyer |
erlang.log is useless, ejabberd.log shows several connections, but they all appear to have the expected "Accepted legacy authentication for opensrf@..." lines with them. |
16:20 |
* JBoyer |
is planning to put together a big ol' text document for the LP bug |
16:31 |
pinesol_green |
News from qatests: Test Success <http://testing.evergreen-ils.org/~live> |
16:31 |
jeffdavis |
JBoyer: I'm not seeing errors like that in our logs |
16:32 |
Dyrcona |
JBoyer: I've only ever seen messages like that when ejabberd has crashed or hit some limit, that I recall. |
16:33 |
Dyrcona |
And, I haven't seen them in a while, except once recently, when I missed one of the settings on a test vm. |
16:33 |
Dyrcona |
You still may have encountered something new. |
16:34 |
berick |
so, i was testing hold targeter here recently and each parallel process was fetching 100k+ hold ids in "substream" mode (1 jabber message per id). this caused ejabberd to lock up for several seconds while it routed all the messages. I wonder if the new bundling/chunking osrf 2.5 changes are having a similar effect. messages that used to be large, are now a blast of smaller messages, adding load to |
16:34 |
berick |
ejabberd. |
16:34 |
Dyrcona |
That would make a lot of sense. |
16:34 |
Dyrcona |
Or does make a lot of sense. |
16:35 |
miker |
berick: have you removed all shapers? |
16:36 |
miker |
because the "locking up" or pausing is, IME, the shaper saying "HOLY CRAP you're over the limit. I'll just pause you for a bit while the avg-over-time comes down" |
16:37 |
berick |
miker: we're not on osrf 2.5 here. just upgraded to eg 2.9 this weekend, woot! I mentioned as an example of ejabberd pooping itself. we have typical 2.4-era ejabberd confis |
16:38 |
berick |
er, configs |
16:39 |
berick |
@band Ejabberd Confit |
16:39 |
miker |
berick: oh, you can remove shapers from ejabberd whenever you like. no opensrf changes required |
16:39 |
Dyrcona |
pinesol_green: Are you awake? |
16:40 |
pinesol_green |
Dyrcona: You keep using that word. I do not think it means what you think it means. |
16:40 |
jeff |
pinesol_green++ appropriate |
16:40 |
pinesol_green |
jeff: I eat more coconut cream pie before breakfast than most people eat all day |
16:40 |
berick |
miker: hmm, wouldn't the point of such a shaper be to prevent that type of lockup? you got too much resources, I'm going to talk to someone else now? |
16:40 |
berick |
oh, but it's probably per user |
16:40 |
berick |
and everyone's opensrf.. |
16:40 |
gmcharlt |
berick: exactly |
16:40 |
miker |
berick: and, in fact, the traffic profile over ejabberd becomes smoother with 2.5, because there are no 2M+ stanzas floating around |
16:41 |
jeff |
yeah, the ejabberd shapers really aren't meant to be for resource utilization so much as flood/spam protection. similar to IRC sendq limits and rate limiting. |
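To illustrate the "pause until the average comes back down" behaviour miker describes, here is a toy rate shaper. It is a sketch of the general idea only and is not ejabberd's actual shaper algorithm; the class name and the bytes-per-second units are assumptions made for the example.

```python
class AverageRateShaper:
    """Toy shaper: pause the sender whenever its average rate exceeds max_rate.

    Illustration only; this is NOT how ejabberd implements its shapers.
    """

    def __init__(self, max_rate, now=0.0):
        self.max_rate = float(max_rate)  # allowed bytes per second
        self.start = now                 # when this connection started
        self.total = 0                   # bytes seen so far

    def update(self, size, now):
        """Record `size` bytes sent at time `now`; return seconds to pause."""
        self.total += size
        elapsed = now - self.start
        # Time this much traffic would take if sent exactly at max_rate.
        needed = self.total / self.max_rate
        return max(0.0, needed - elapsed)
```

A burst of many small messages pushes `total` up quickly and produces a pause. Because every OpenSRF process in the scenario above connects as the same user, a per-user limit of this kind would apply to all of that traffic at once.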
16:41 |
berick |
JBoyer: so maybe try removing ejabberd traffic shapers? ^-- |
16:41 |
Dyrcona |
@band add Ejabberd and the Shapers |
16:41 |
pinesol_green |
Dyrcona: Band 'Ejabberd and the Shapers' added to list |
16:44 |
|
b_bonner joined #evergreen |
16:44 |
Dyrcona |
@blame [band] |
16:44 |
pinesol_green |
Dyrcona: Forget it, Jake. It's just National Donut Day. |
16:44 |
Dyrcona |
hah! |
16:44 |
jeff |
though now having made that claim i'm not finding an authoritative reference to point to, so... here's some salt. |
16:45 |
Dyrcona |
jeff: It makes sense though, given that the primary purpose of ejabberd seems to be chat. |
17:01 |
|
mmorgan joined #evergreen |
17:04 |
|
mmorgan left #evergreen |
17:10 |
|
maryj joined #evergreen |
18:31 |
|
b_bonner left #evergreen |
19:51 |
|
jvwoolf joined #evergreen |
21:11 |
|
Freddy_Enrique joined #evergreen |
21:19 |
Freddy_Enrique |
Hi Everyone! :) |
22:20 |
|
genpaku joined #evergreen |
22:34 |
|
Jillianne joined #evergreen |