Time | Nick | Message
00:54 |
|
sandbergja joined #evergreen |
03:23 |
|
jamesrf joined #evergreen |
03:27 |
|
yar joined #evergreen |
05:02 |
pinesol |
News from qatests: Failed configuring websockets <http://testing.evergreen-ils.org/~live/test.22.html#2019-07-23T04:57:34,222253532-0400 -0> |
07:00 |
|
Dyrcona joined #evergreen |
07:01 |
Dyrcona |
Well, I just installed the configuration in production to proxy websocket connections through Apache with mod_proxy_wstunnel. Guess I'll know in a few hours how well that works. |
07:02 |
Dyrcona |
If this works well for us after a few days/weeks, I'm going to make a branch to remove the other proxy configurations and recommend this one in OpenSRF. |
07:09 |
|
rjackson_isl joined #evergreen |
07:12 |
jeff |
Dyrcona: why? |
07:13 |
Dyrcona |
Why, what? Why proxy websockets? Why remove the other proxy methods in favor of Apache? |
07:17 |
Dyrcona |
jeff: If the question is why remove the other proxy methods and proxy with Apache, the answer is that doing it with Apache is much simpler: It's three or four lines of configuration in eg_vhost.conf and installing an Apache module.
07:19 |
Dyrcona |
With the other proxy methods running on the same host, you have to install another software package, copy a configuration file, set up that configuration so it works, change your Apache ports configuration, and restart everything. Why install something else and maintain it when what you've already got can do it?
07:19 |
csharp |
Dyrcona: so does that alleviate the need for alternate ports? does websocketd still run on 7682? |
07:20 |
Dyrcona |
csharp: Yes. |
07:20 |
csharp |
interesting... |
07:20 |
Dyrcona |
It's easier if you configure the websocket port as 443.
07:20 |
csharp |
right |
07:20 |
Dyrcona |
I'm proxying websockets from 7682 to 7680, but the simpler configuration is from 443 to 7682. |
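For reference, a minimal sketch of the eg_vhost.conf change being described, assuming websocketd is listening without SSL on localhost:7682 and that browsers use the /osrf-websocket-translator path (both are assumptions, not taken from the channel):

    # enable the proxy modules once:  sudo a2enmod proxy proxy_wstunnel
    # then, inside the SSL (443) vhost:
    ProxyPass        "/osrf-websocket-translator" "ws://localhost:7682/osrf-websocket-translator"
    ProxyPassReverse "/osrf-websocket-translator" "ws://localhost:7682/osrf-websocket-translator"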
07:21 |
* csharp |
looks forward to seeing your branch |
07:21 |
Dyrcona |
I've tested this on training and on all of my virtual machines, so I know it works. |
07:21 |
Dyrcona |
The question is, what happens under load? |
07:21 |
csharp |
yeah |
07:22 |
Dyrcona |
Since we get the most complaints about white screens on Tuesday afternoons, I figure I'll find out pretty quickly if it's a problem or not. |
07:23 |
Dyrcona |
I want to point out that this is completely different from doing the apache2-websockets thing. That was necessary with Apache 2.2, but Apache 2.4 has a different module to proxy websockets.
07:24 |
Dyrcona |
I think apache2-websockets worked better with 2.2 than with 2.4, at least that seemed to be my experience, but since all of the supported distros ship with 2.4, why not use mod_proxy_wstunnel if it works? |
07:59 |
Dyrcona |
Hmm... I have two action trigger runners going with identical options... I thought they were supposed to make a lockfile to prevent that. |
08:41 |
Dyrcona |
Cute! This is going on, too: 2019-07-23 07:15:18 util2 open-ils.cstore: [ERR :18048:oils_sql.c:5898:1563880501180463] open-ils.cstore: Error with query [SELECT * FROM action.hold_request_regen_copy_maps( '14549535', '{18390375,18390375}' ) AS "action.hold_request_regen_copy_maps" ;]: 3505685 3505685: ERROR: duplicate key value violates unique constraint "copy_once_per_hold"#012DETAIL: Key (hold, target_copy)=(14549535, 183 |
08:41 |
Dyrcona |
eady exists.#012CONTEXT: SQL function "hold_request_regen_copy_maps" statement 2 |
08:43 |
|
mmorgan joined #evergreen |
08:49 |
|
jvwoolf joined #evergreen |
08:55 |
Dyrcona |
berick: Did you ever open an Lp bug with your findings regarding problems with creating a/t events from other a/t events? |
08:55 |
Dyrcona |
I also seem to recall someone saying that storage(?) drones could somehow take on the listener's PID.... |
08:56 |
Dyrcona |
I'm definitely having problems on the utility server for the past few days, but no clear reason why. |
08:56 |
Dyrcona |
I can't just reboot it, either, since it is also the bricks' NFS server. |
08:56 |
Dyrcona |
I did just restart services, and I'm dealing with the aftermath of that. |
08:57 |
|
yboston joined #evergreen |
09:00 |
|
jvwoolf1 joined #evergreen |
09:45 |
Dyrcona |
Interestingly, people are reporting getting logged out of Evergreen "randomly" and this happened before I made the proxy change this morning. |
09:45 |
Dyrcona |
When I checked memcached yesterday, everything looked OK... |
09:46 |
Dyrcona |
I can't really tell if things are better on my utility server, either. When things go wrong, they all go wrong at once, and often when I change something totally unrelated, just to pickle the red herrings. |
09:48 |
Dyrcona |
Oh, nice. memcached died over night. |
09:49 |
Dyrcona |
Eh, no... too many open files on both of them... |
09:51 |
berick |
Dyrcona: i didn't open an LP. i've mentioned it a few times and no one else seemed to have the problem, and I never could figure out what the problem was, so I let it be. |
09:54 |
Dyrcona |
berick: OK. |
09:54 |
|
khuckins joined #evergreen |
09:55 |
Dyrcona |
Well, proxying via apache is a bust. I'm running max apache children. I was afraid that might happen. |
10:00 |
|
Christineb joined #evergreen |
10:11 |
csharp |
Dyrcona: :-( |
10:12 |
Dyrcona |
And, everything else seems to be going wrong at the same time, well nearly.... |
10:13 |
Dyrcona |
We also got a lot of these on both memcached servers: Jul 23 09:49:54 mem1 systemd-memcached-wrapper[962]: message repeated 30 times: [ accept4(): Too many open files] |
10:13 |
|
sandbergja joined #evergreen |
10:14 |
Dyrcona |
However, the latter actually started yesterday before I made the proxy change, so it may or may not be related, but I think with 450 apaches going.... |
10:15 |
Dyrcona |
Wow! curr_connections 699 |
10:15 |
Dyrcona |
curr_connections 708 |
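A quick way to watch those numbers and the file-descriptor ceiling from the shell (the memcached hostname comes from the log above; the systemd override is an assumption about how memcached is started):

    # ask memcached for its stats; "quit" makes nc exit cleanly
    printf 'stats\r\nquit\r\n' | nc mem1 11211 | grep -E 'curr_connections|listen_disabled_num'
    # check the per-process open-file ceiling of the running memcached
    grep 'open files' /proc/$(pgrep -o memcached)/limits
    # sketch of raising it under systemd: systemctl edit memcached, then add
    #   [Service]
    #   LimitNOFILE=65536
    # and raise memcached's own -c connection limit (default 1024) if needed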
10:17 |
|
jvwoolf joined #evergreen |
10:21 |
Dyrcona |
What's also interesting is now that I've undone the proxy setup, load is higher on the brick heads, but we're running our normal 60 or so apache processes per brick head.
10:22 |
Dyrcona |
I think we need to replace ldirectord with haproxy, but I suppose I'll look at using haproxy or just nginx for proxying websockets only. I've set that up on a test server, but dunno how it works under load.
10:25 |
Dyrcona |
I assume that everyone is using nginx as a proxy per the README instructions? |
10:26 |
Dyrcona |
And, you're doing it on each brick head? |
10:28 |
Dyrcona |
Oh, nice.... /tmp/action-trigger-LOCK exists for a process that isn't running, and that didn't stop our --run-pending a/t runner from starting up. I assumed it was the one that used that particular lock file, since it has no granularity.
10:30 |
berick |
we run nginx on each brick, but they are not yet under full browser client load |
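For reference, roughly what the nginx approach from the OpenSRF README looks like (the backend scheme, port, and timeout here are assumptions, not anyone's actual config):

    location /osrf-websocket-translator {
        proxy_pass http://localhost:7682;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 5m;   # keep idle websocket connections open
    }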
10:30 |
Dyrcona |
Oh, and there are two of them running, again.... WTH happened? This started on Saturday. I didn't do anything on the server at all last week. |
10:31 |
Dyrcona |
berick: I don't know how many of our libraries are using the web staff client, but I'm not even sure that all of the small ones know it's a thing.
10:31 |
Dyrcona |
Our bigger libraries are using it. |
10:32 |
Dyrcona |
csharp: How many of your members are using the web staff client? Do you know? |
10:34 |
mmorgan |
Dyrcona: FWIW all of our libraries are using the web client exclusively. |
10:34 |
Dyrcona |
mmorgan: You have 28 members? |
10:34 |
berick |
so far, nginx rarely even appears in 'top'. it's practically napping. i don't expect it to change significantly as the data (it's already essentially proxying) moves to websockets. |
10:36 |
mmorgan |
Dyrcona: close, 26 |
10:36 |
Dyrcona |
Oh, you lost a couple to HELM, too. :) |
10:36 |
* Dyrcona |
considers switching to Koha. |
10:37 |
mmorgan |
Sounds like you're having a Bad Evergreen Day. |
10:38 |
|
stephengwills joined #evergreen |
10:38 |
Dyrcona |
mmorgan: Do your members report "white screen" problems? Do you use a proxy? If so, which one?
10:40 |
Dyrcona |
Cool! Now, I've got 3 competing a/t runners.... |
10:41 |
mmorgan |
Dyrcona: We have not gotten many reports of white screens in production. We expected it to be a problem because we saw them on our training server, and warned our members, but it doesn't seem to be much of an issue.
10:41 |
mmorgan |
I need to defer to mdriscoll (in a meeting) about the setup. |
10:44 |
Dyrcona |
mmorgan: Thanks. |
10:48 |
csharp |
Dyrcona: all are using the web client |
10:48 |
Dyrcona |
And what about your proxy setup? |
10:48 |
csharp |
we built a XUL client just in case but no one in the field has indicated that they need it |
10:49 |
csharp |
using nginx |
10:49 |
Dyrcona |
nginx on each brick head, right? |
10:49 |
csharp |
Dyrcona: yep |
10:50 |
csharp |
no complaints about white screens since moving away from Hatch local storage |
10:52 |
Dyrcona |
I probably need to reboot my utility server, but I can't. |
10:53 |
Dyrcona |
That means rebooting all of the brick heads and drones because of NFS. I need to set up a separate NFS server.
10:53 |
berick |
Dyrcona: doesn't help you now, but we just migrated nfs and memcached to a VM that only does those 2 things specifically so we could more easily update/manage the utility server. |
10:53 |
berick |
you beat me to it :) |
10:54 |
Dyrcona |
:) |
10:54 |
Dyrcona |
We have memcached on two vms. |
10:55 |
Dyrcona |
I was considering hardware for NFS, but I suppose a VM could handle it. |
10:57 |
berick |
we used to run 2 memcacheds, but losing 1 of 2 was practically as bad as losing 1 of 1 (almost worse, since it was slightly less obvious what the problem was). and either way, you have to stop everything and modify configs to get up and running again.
10:57 |
berick |
after a while I failed to see the point of running 2 |
11:04 |
jeff |
Did I create a wishlist for persistent auth backend? |
11:05 |
* jeff |
looks |
11:05 |
jeff |
If not, I think I will try and do that today. |
11:06 |
Dyrcona |
Ah ha! The hourly action trigger runner removes the lock file, or at least the lock file for the one that I started manually has disappeared and the hourly a/t runner is also running. |
11:07 |
Dyrcona |
berick: We had 2.5GB in memcached between the two. I suppose I could cram all of that into 1 vm, but do I want to? |
11:08 |
Dyrcona |
This is the first time I recall having trouble with memcached here, and it was too many connections on both of them. |
11:08 |
Dyrcona |
Err, I worded that wrong. It was 2.5GB on EACH memcached server, so 5GB between the two. |
11:11 |
Dyrcona |
I thought the --granularity a/t runners are supposed to make their own lockfiles? |
11:11 |
Dyrcona |
Like action-trigger-HOURLY-LOCK or summat like that. |
11:13 |
jeff |
Something I noticed the other day is that action_trigger_runner.pl removes its lockfile as soon as it starts getting responses from open-ils.trigger.event.run_all_pending. That should happen after the events have moved from pending to collected, but before they've been fired / reacted to. I don't think that that's a problem (because I don't think any other runner will pick up those events now that they're |
11:13 |
jeff |
no longer pending), but it is something to keep in mind when looking at things. |
11:15 |
jeff |
Dyrcona: and yes, action_trigger_runner.pl should append the granularity to the lockfile name if --granularity and --granularity-only are set (and the latter is set always in current code if --granularity is true) |
11:15 |
jeff |
(don't use granularity of "0" :-) |
11:15 |
Dyrcona |
jeff: I don't think you're right about that. |
11:16 |
Dyrcona |
I know you looked yesterday, but last time I looked --granularity without --granularity-only behaves differently, and I'm witnessing it right now. |
11:16 |
jeff |
it does behave differently with --run-pending vs --process-hooks |
11:17 |
jeff |
i finished most of my digging the other day and re-read your words and i'm pretty sure you were correct in your statements from last month. |
11:17 |
Dyrcona |
I'm running --granularity without --granularity-only and I'm getting no lock file, plus it is deleting the lock file for the process with no granularity. |
11:18 |
* Dyrcona |
adds --granularity-only to the crontab entries. |
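For reference, the sort of crontab entries in question; each one gets its own lockfile once --granularity-only is in play (paths, schedule, and granularity names here are illustrative, not the actual crontab):

    30 * * * * /openils/bin/action_trigger_runner.pl --osrf-config /openils/conf/opensrf_core.xml --run-pending --granularity hourly --granularity-only
    15 0 * * * /openils/bin/action_trigger_runner.pl --osrf-config /openils/conf/opensrf_core.xml --run-pending --granularity daily --granularity-only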
11:18 |
jeff |
the final thing that surprised me (and you noted it in your comments) was that a --process-hooks without a granularity would create events for event defs that had a granularity value, which is different from how --run-pending works (you have to specify the granularity to get those events run)
11:18 |
jeff |
weeird. |
11:18 |
jeff |
because: |
11:18 |
jeff |
$opt_gran_only = $opt_granularity ? 1 : 0; |
11:18 |
jeff |
$opt_lockfile .= '.' . $opt_granularity if ($opt_granularity && $opt_gran_only); |
11:19 |
Dyrcona |
jeff: That's not what happens. |
11:19 |
Dyrcona |
no granularity makes action-trigger-LOCK |
11:19 |
Dyrcona |
granularity without granularity-only makes no lock file. |
11:21 |
Dyrcona |
How 'bout we rewrite action_trigger_runner.pl to use Cronscript.pm? That would solve the lock file thing.
11:22 |
Dyrcona |
And this comment suggests programming by coincidence and not design: #XXX need to figure out why this is required...
11:25 |
Dyrcona |
And, those two lines, being where they are, *should* make --granularity-only obsolete, and I should get a .hourly lockfile, but I don't. |
11:25 |
Dyrcona |
I do get it for password reset, which runs with --granularity-only, so something is definitely not happening with that code.
11:25 |
jeff |
are you sure the lockfile isn't just being removed quickly? have you confirmed with inotify or strace or something that the lockfile isn't being created at all? |
11:26 |
jeff |
hrm. |
11:29 |
Dyrcona |
No, I haven't confirmed that, but the processes were both still running when it was gone. |
11:31 |
jeff |
That's expected if they had gotten to the point where open-ils.trigger.event.run_all_pending had returned a response (and it returns an early "found" response after collecting and before reacting). |
11:32 |
jeff |
This has relevance to something I'm currently working on, so I'm trying an empirical test or two here to compare results. |
11:34 |
Dyrcona |
So, why would I have no-granularity a/t runners going with a lock file for a defunct process in it?
11:34 |
Dyrcona |
It seems to be working with a --granularity-only process. |
11:35 |
Dyrcona |
There are two places in there where the lock file can be unlinked. That's bad. |
11:35 |
* jeff |
nods |
11:35 |
Dyrcona |
And, that's probably a part of my trouble.
11:36 |
jeff |
Also, there's no trap for removing the lockfile when the script is interrupted by something unexpected (or unusual, like Ctrl-C). |
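A sketch of the kind of trap jeff means (not the script's actual code): remove the lockfile on exit and on SIGINT/SIGTERM, but only if this process is the one that created it.

    my $created_lock = 0;
    # ... after the existing code writes $$ to $opt_lockfile:
    # $created_lock = 1;
    sub cleanup_lock {
        unlink $opt_lockfile if $created_lock and -e $opt_lockfile;
    }
    $SIG{INT} = $SIG{TERM} = sub { cleanup_lock(); exit 1; };
    END { cleanup_lock(); }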
11:36 |
Dyrcona |
Right, I think that's where my stray lock file came from. |
11:36 |
jeff |
Could be! |
11:36 |
Dyrcona |
But that should have stopped the later ones from starting. |
11:37 |
Dyrcona |
Well, I know it is. I ran kill.... |
11:37 |
jeff |
And action_trigger_runner.pl would not have unlinked it unless it contained its pid. |
11:37 |
Dyrcona |
Right, but it should also not have started another runner, either. |
11:39 |
Dyrcona |
jeff: What do you think about reimplementing this with Cronscript.pm? |
11:39 |
Dyrcona |
You implied you were working on something related to a/t runner earlier. |
11:40 |
jeff |
It will start one, but if either --granularity or --process-hooks is in use on the new one, the lockfile will not cause the script to die, but will cause it to wait for $max_sleep, hoping that the lockfile will go away. |
11:42 |
Dyrcona |
That's just wrong. It should either die immediately or wait max sleep and then die if the lockfile is still there. |
11:42 |
Dyrcona |
It also looks like the lock files are deleted too soon, at least for my taste. |
11:42 |
jeff |
So if you have a stale lockfile sitting around, and action_trigger_runner.pl jobs that use that same lockfile name running from cron with --process-hooks or --granularity, they'll pile up doing little else but sleeping and then eventually die()ing with the "Someone else has been holding the lockfile..." message. |
11:43 |
Dyrcona |
Right, but that should go to an email that someone competent should get so they can see what's wrong and delete the lock file if need be. That's how everything else works on UNIX. |
11:43 |
jeff |
That's what I was calling attention to above, that the lock file is deleted after the events have been collected / had build_environment run, but before firing / reacting. |
11:43 |
Dyrcona |
Right. It should only be removed at the end, in my opinion. |
11:45 |
jeff |
That "Someone has been holding" die message gets emailed to the MAILTO in the cron job, in my experience. Also, I've seen the pattern elsewhere of monitoring the age of lock files and warning on long-lived ones. Not sure I've seen anyone look for lockfiles containing PIDs of no-longer-running processes, though it might help in this scenario. |
11:45 |
Dyrcona |
Right, I get them and I check. |
11:46 |
Dyrcona |
I'm not sure I'd want it to check the pid to see if the pid is running and proceed if the pid isn't running. The condition that caused the previous one to die may still hold true. |
11:46 |
jeff |
It is a bit unusual/surprising in terms of what I think of as a lockfile, and there isn't a comment in code to explain why, though I suspect it might be an optimization for large A/T runs where it's acceptable to have two sets of events being reacted to as long as no events are being handled by two instances of the script. |
11:47 |
jeff |
Oh, I wasn't suggesting that -- I was just suggesting monitoring on the "lockfile exists with PID that isn't running" and alerting on it, not saying "I can break this lock, the PID's not here anymore".
11:48 |
jeff |
But it was a musing, not a proposed solution. |
11:48 |
Dyrcona |
Ok. gotcha. |
11:48 |
Dyrcona |
Well, I'm gonna eat and clean up the mess I made after lunch. :) |
11:48 |
jeff |
Enjoy! |
11:48 |
Dyrcona |
Have you opened a Lp for your ideas? I would subscribe to your newsletter. :) |
11:48 |
jeff |
heh |
11:49 |
|
yboston joined #evergreen |
12:01 |
|
aabbee joined #evergreen |
12:38 |
Dyrcona |
Still don't know why it's so slow all of a sudden. |
12:39 |
Dyrcona |
Couldn't be the 19,511,527 rows in action_trigger.event could it? :) |
13:12 |
|
jvwoolf joined #evergreen |
13:28 |
* dbs |
wonders how Dyrcona enjoyed eating his mess |
13:29 |
|
yboston joined #evergreen |
14:07 |
Dyrcona |
The hourly granularity runner seems to be doing something, while the no-granularity --run-pending one that I started an hour and a half ago seems to have stopped doing anything after the lock file went away.
14:18 |
jeff |
What kind of events is the non-granular one expected to catch? Is the action_trigger_runner.pl process still present for that one? |
14:19 |
Dyrcona |
jeff: We run a non-granular --run-pending for things like hold notifications, etc. |
14:19 |
Dyrcona |
I've been looking at counts of the event states in the database as the runners are going. |
14:30 |
jeff |
your non-granular --run-pending should have a line in logs ending in: CALL: open-ils.trigger open-ils.trigger.event.run_all_pending , 0 |
14:33 |
jeff |
grepping for run_all_ in your logs for the utility server (wherever open-ils.trigger called by your utility server is logging) shouldn't have too much extra noise. You should have a "trigger: run_all_events completed firing events" line that matches your open-ils.trigger PID that was logged on the "run_all_pending , 0" pattern above. |
14:33 |
jeff |
Though, if you see another call with that same PID before you see the "completed firing events" line, it may not have made it that far. |
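Roughly what that log check looks like (the log path is an assumption; adjust it to wherever open-ils.trigger logs on your system):

    grep 'run_all_pending , 0' /var/log/openils/osrfsys.log
    grep 'run_all_events completed firing events' /var/log/openils/osrfsys.log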
14:34 |
|
khuckins joined #evergreen |
14:34 |
jeff |
I'm not sure what would stall it in a way that you'd then not see counts change as states changed. |
14:34 |
jeff |
But I'd be interested to find out. :-) |
14:36 |
Dyrcona |
jeff: States change for a while, then nothing changes, except the number of pending events, until I start another runner. |
14:36 |
Dyrcona |
The stalled one is still sitting there doing I don't know what.
14:38 |
jeff |
At this point I'd try to find the PID for the open-ils.trigger backend that was servicing that action_trigger_runner.pl (using the "open-ils.trigger.event.run_all_pending , 0" pattern) and spend some time looking at logs. |
14:39 |
jeff |
Do you have events stuck in a 'found' state? |
14:40 |
Dyrcona |
Yes, 2. |
14:40 |
jeff |
recent dates (any) on those two events? if so, and if they stick around for much longer, I'd look and see if they were related to your stalled action_trigger_runner.pl |
14:42 |
Dyrcona |
Jeff, I know they are. They were added today, and I've reset them to pending at least once.
14:42 |
jeff |
ah |
14:42 |
Dyrcona |
Last time I tried resetting them to pending, they went straight to found. |
14:42 |
* Dyrcona |
is starting to really hate Perl. It looks like someone let a chimpanzee loose on a typewriter.
14:43 |
Dyrcona |
Ok. That last comment wasn't helpful. |
14:44 |
Dyrcona |
jeff: Do you have any suggestions? |
14:44 |
jeff |
interesting. are they unusual or uncommon in any other way that you can see, like based on an infrequently-encountered event def that has a cleanup of DeleteTempBiblioBucket or something? |
14:44 |
Dyrcona |
They're courtesy notices. |
14:44 |
jeff |
the idea there being that something like a cleanup might be blowing up. |
14:44 |
jeff |
huh. |
14:48 |
Dyrcona |
They're for the same user, two different circs, so they have the same output. |
14:49 |
Dyrcona |
Interesting, I fired the one and it went to valid. |
14:50 |
Dyrcona |
I fire it again and it goes to found. |
14:50 |
jeff |
I'd be tempted to kill off the hung runner, reset the events to pending, and then re-run with --debug-stdout and perhaps --verbose for good measure. |
14:50 |
Dyrcona |
I need to fire group events for the two, don't I?
14:51 |
Dyrcona |
Well, looks like I may have two hung runners at this point. |
14:52 |
Dyrcona |
Be nice if the Trigger methods had some documentation for OpenSRF to show. |
14:54 |
Dyrcona |
That put them to complete. |
14:55 |
jeff |
debugging a running system is fun. you probably want to turn off the normally scheduled non-granularity --run-pending runner, but you want to make sure to not forget it in its off state. :-) |
14:57 |
Dyrcona |
Well, I had it off, but turned it back on to pick up the backlog, but it looks like all 3 are stuck.
14:57 |
Dyrcona |
I've got a lot of collected events that aren't moving.
15:02 |
jeff |
I'd again recommend looking at logs, but that's going to be time consuming. |
15:02 |
|
mmorgan1 joined #evergreen |
15:05 |
csharp |
I find it useful to throw an SQL query into "watch" (e.g. select ed.id, ed.name, e.state, count(*) from action_trigger.event_definition ed join action_trigger.event e on (e.event_def = ed.id) where ed.id = 23 group by 1, 2, 3 order by 2, 3; ) |
15:06 |
jeff |
csharp++ \watch is an underutilized tool, imo. :-) |
15:06 |
csharp |
agreed |
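For reference, csharp's query dropped into a psql session; \watch re-runs the buffered query every 5 seconds (the interval is arbitrary):

    select ed.id, ed.name, e.state, count(*)
      from action_trigger.event_definition ed
      join action_trigger.event e on (e.event_def = ed.id)
     where ed.id = 23
     group by 1, 2, 3
     order by 2, 3
    \watch 5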
15:07 |
|
sandbergja joined #evergreen |
15:11 |
Dyrcona |
Trigger seems to still be working on the events even though the runners were stopped. |
15:12 |
Dyrcona |
I'm using a csore script to reset the events to pending and some are coming back complete. |
15:13 |
Dyrcona |
heh... s/csore/cstore/, though I feel sore about now. :) |
15:34 |
Dyrcona |
So far no debug output though events are changing in the database. |
15:35 |
jeff |
I wouldn't expect debug output until after collection is done and the first response comes back from open-ils.trigger.event.run_all_pending, which will be {"status":"found"} |
15:35 |
jeff |
That's roughly when the lock file is removed by action_trigger_runner.pl |
15:38 |
jeff |
I should have asked before, but do you run values >1 for collection and/or reaction? |
15:39 |
jeff |
(defined in opensrf.xml, in app_settings for open-ils.trigger) |
15:39 |
Dyrcona |
Yeah, OK. |
15:39 |
|
Dyrcona joined #evergreen |
15:40 |
* Dyrcona |
sighs. |
15:41 |
Dyrcona |
Glad I know enough to use tmux. |
15:42 |
Dyrcona |
Yeah, now there's a lot of junk on the screen. :) |
15:46 |
Dyrcona |
This isn't as useful as a real debugger. I should try getting the Perl debugger working on this again. |
15:48 |
Dyrcona |
There just seems to be a long pause in between each output. |
15:51 |
jeff |
during the phase where it's reacting? SendEmail passing to something that's taking a while to return because of something like a DNS lookup timeout? |
15:53 |
Dyrcona |
Dunno. We're only using parallel of 3. |
15:53 |
Dyrcona |
We have history with that. Perhaps, I should set it to 6? |
15:54 |
Dyrcona |
It could be email taking a while. |
15:54 |
Dyrcona |
We've been having fun with Baker & Taylor's FTP lately; maybe something's wonky on the network.
15:56 |
Dyrcona |
I'm seeing 3 or 4 emails hit the mail server about the same time. |
16:16 |
jeff |
Forgot to quantify: what's "a long pause" in seconds, and is "between each output" lines of --debug-stdout? I don't know offhand how those are batched, especially in a parallel setup. Sorry. I was mostly thinking it was going to be useful when you had a pair of events that seemed to be hanging the runner. |
16:17 |
jeff |
But it would be interesting to see if telnetting to port 25 of your configured smtp server from your utility server returns a 220 banner quickly or takes a while to time out on trying to look up the hostname for the ip of your utility server or similar. |
16:19 |
jeff |
(our utility server is the configured smtp server, and then it relays to another smtp server.) |
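A quick way to do the check jeff suggests (the hostname is a placeholder for the configured relay):

    # the 220 banner should appear almost immediately; a long pause before it
    # often points at a reverse-DNS lookup timing out on the mail server
    printf 'QUIT\r\n' | nc -w 5 localhost 25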
16:30 |
Dyrcona |
Ours is the same. |
16:30 |
Dyrcona |
I'm tailing the mail log and that seems to be going by much more quickly than the action trigger output. |
16:31 |
Dyrcona |
It's getting about 3 to 4 messages at a time and taking about 5 to 6 seconds to complete them. |
16:32 |
Dyrcona |
That's total. It's maybe a second or two per message. We're relaying through someone else. |
16:49 |
|
b_bonner joined #evergreen |
16:49 |
|
mnsri_away joined #evergreen |
16:51 |
|
rashma_away joined #evergreen |
17:05 |
|
mmorgan joined #evergreen |
17:06 |
|
jvwoolf left #evergreen |
17:07 |
|
mmorgan left #evergreen |
17:32 |
pinesol |
News from qatests: Failed configuring websockets <http://testing.evergreen-ils.org/~live/test.22.html#2019-07-23T17:03:45,457223219-0400 -0> |
17:41 |
|
sandbergja joined #evergreen |
18:09 |
jeffdavis |
So, it looks like the master branch in working/Evergreen.git hasn't received updates in a couple of months. |
18:10 |
jeffdavis |
It used to auto-update from the main Evergreen repo, I believe. |
18:22 |
|
sandbergja joined #evergreen |
19:01 |
Dyrcona |
So, forty minutes after the last output from the a/t runner, and about that long after it stopped, it looks like events are still being processed. |
19:01 |
bshum |
jeffdavis: Oh that's kind of interesting to see |
19:01 |
bshum |
I wonder if something changed when we moved the git server |
19:10 |
|
homerpubliclibra joined #evergreen |
19:12 |
jeff |
likely! |
19:13 |
Dyrcona |
Um... working and origin are on the same server, though...
19:14 |
Dyrcona |
Is github up to date? |
19:14 |
Dyrcona |
Could be some post push hooks are missing. |
19:35 |
* csharp |
looks |
19:35 |
csharp |
appears to be firewall denials - I'll put in a ticket |
19:35 |
csharp |
*sigh* |
19:37 |
csharp |
github is up to date, which adds credibility to my firewall theory |
19:38 |
csharp |
oh - interesting - the git server can clone from github, but not from itself |
19:39 |
csharp |
git clone https://github.com/evergreen-library-system/Evergreen.git works fine |
19:39 |
csharp |
git clone git://git.evergreen-ils.org/Evergreen.git hangs indefinitely |
19:46 |
csharp |
ah - not firewall - hosts file |
19:47 |
csharp |
/etc/hosts had the old IP so it wasn't resolving |
19:48 |
csharp |
not sure how it updates... I'll dig into it |
19:51 |
bshum |
csharp++ |
19:53 |
Dyrcona |
I think it fires when someone pushes. |
19:54 |
csharp |
quick, somebody push!! |
19:54 |
Dyrcona |
It's a post receive hook in git. |
19:54 |
csharp |
I see |
19:54 |
csharp |
I was just looking at that |
19:55 |
csharp |
well, if it doesn't fix itself after the next push, and I don't notice, please someone let me know and I'll resume investigation |
19:55 |
csharp |
pretty sure the bad IP was the issue though |
21:27 |
|
stephengwills joined #evergreen |