07:06 *** collum joined #evergreen
07:58 *** Rogan joined #evergreen
08:20 *** BDorsey joined #evergreen
08:36 *** mmorgan joined #evergreen
08:47 *** mantis1 joined #evergreen
08:53 *** mantis2 joined #evergreen
09:05 *** rfrasur joined #evergreen
09:16 *** Dyrcona joined #evergreen
09:45 <Dyrcona> Does the lock file for an a/t runner only stick around during the process-hooks part? (I suppose I should check the code.)
09:54 <Dyrcona> Yeah, the run_pending function removes the lockfile if the PID matches the running process. I recall some discussion about this before. I'll try the IRC logs.
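The behavior described above (remove the lockfile only when it records our own PID) can be sketched roughly as follows. The real logic lives in Perl in the Evergreen a/t runner code; this Python rendering and its function name are purely illustrative:

```python
import os

def remove_lockfile_if_ours(lockfile):
    """Delete the lockfile only when it records our own PID.

    Illustrative sketch: the runner drops its lockfile early, but only
    if the stored PID matches the current process, so a stale file
    left by a different run is not touched.
    """
    try:
        with open(lockfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing or malformed lockfile: nothing to do
    if pid == os.getpid():
        os.remove(lockfile)
        return True
    return False
```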
09:56 <Dyrcona> My problem is that I set up a process to "fix" stuck auto-renewal events to run at 8:00 am every day. Looking in the database, some of those events didn't complete until 8:21 am this morning. My fix program uses the same lockfile as the a/t runner so they won't run at the same time.
09:56 <jeff> grabbing some old notes, I have:
09:56 <jeff> > lockfile is checked and removed (if it contains our pid), generally on first response, then check_lockfile is set to zero and we don't check again.
09:56 <jeff> > Do we delete the lock file early because once we've gotten one response all the work has been done and there's no need for the lock file to prevent something like overlap?
09:56 <Dyrcona> However, if the run-pending stage removes the lock file, my program may break stuff.
09:57 <Dyrcona> jeff: I'll look more closely, but I think pending events are still being processed.
09:58 <Dyrcona> Also, today is an exceptional day. Because of the long weekend we had nearly 40,000 auto-renewals to process. Long weekends happen often enough that I should somehow take them into account.
09:58 <Dyrcona> I ran the fix process manually today around 9:20 am because the check process said 8 events were stuck. Tomorrow would be the first day that it runs automatically.
09:59 <sharpsie> Dyrcona: I've seen the lockfile absent with the A/T process still running many times
09:59 <sharpsie> it's always confused me but I've never dug in
10:00 <Dyrcona> So, jeff's observation that the lockfile is removed at the first response is true, but I don't think all of the work has been done.
10:03 <jeff> yeah, I mentally had "work" in quotes there. :-)
10:03 <Dyrcona> I think the logic goes that it does no harm if another process starts working on pending events, assuming it is using open-ils.trigger calls to do so. My process, however, doesn't do that. It "resets" some events using cstore and then fires them again.
10:03 * jeff nods
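The "reset, then fire again" idea can be sketched like this. It is a hypothetical model, not the real fix program (which works through cstore against the database): the event states mirror the a/t states mentioned in this discussion, but the stuck-detection rule (intermediate state with an old update time) is an assumption for illustration:

```python
from datetime import datetime, timedelta

# A/T event states seen in this discussion: pending -> collecting ->
# collected -> (reacting) -> complete, with error as a failure state.
STUCK_STATES = {"collecting", "collected", "reacting"}

def reset_stuck_events(events, now, max_age=timedelta(hours=2)):
    """Reset events stuck in an intermediate state back to 'pending'.

    Hypothetical sketch: an event counts as "stuck" if it has sat in
    an intermediate state longer than max_age. Returns the ids that
    were reset so they can be fired again.
    """
    reset_ids = []
    for ev in events:
        if ev["state"] in STUCK_STATES and now - ev["update_time"] > max_age:
            ev["state"] = "pending"
            reset_ids.append(ev["id"])
    return reset_ids
```

The danger Dyrcona raises follows directly: because this bypasses open-ils.trigger, running it while the real runner is mid-flight could reset events the runner is still working on.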
10:06 <Dyrcona> Having the lockfile stick around until it's finished might also interfere with a scheduled --run-pending, like CWMARS has for every half hour....
10:08 <Dyrcona> If that starts up, and there's a lockfile, it might delay some events from being processed.
10:09 <Dyrcona> It shouldn't hurt things like a daily granularity-only run, which I'm looking at in my case. Do we want to complicate the lock file code, though, to do different things depending on whether there's a granularity or not?
10:13 <Dyrcona> I have some options: 1) revert my crontab change so that a check SQL runs automatically and I still run the fix program manually, 2) change the time that the fix program runs in the schedule, 3) change the fix program to somehow wait for the currently running a/t process to finish (there are a couple of ways that I can do this besides the lock file), or 4) leave things as they are and deal with the mess on days when the daily events run for a long time.
10:15 |
Dyrcona |
I think I'll go with #3 and have it also wait for a running a/t runner process with the "--granularity daily" option to finish before it starts its work. First thing it does is run a cstore json query to see if it has any work to do. |
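Option 3 (wait for the daily-granularity runner to finish) could look something like the sketch below. The probe is injected as a callable so the sketch stays generic and testable; a real version might shell out to `pgrep -f` for the runner's command line, or check the lockfile's PID:

```python
import time

def wait_for_runner(is_running, poll_interval=5.0, timeout=3600.0):
    """Block until is_running() reports False, or raise on timeout.

    is_running: callable returning True while the a/t runner (e.g. one
    started with "--granularity daily") is still active. How the probe
    detects the runner is left to the caller; pgrep or a lockfile PID
    check are two obvious choices.
    """
    waited = 0.0
    while is_running():
        if waited >= timeout:
            raise TimeoutError("a/t runner still active after timeout")
        time.sleep(poll_interval)
        waited += poll_interval
    return waited
```

Polling like this avoids sharing the runner's lockfile at all, which sidesteps the early-removal behavior discussed above.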
10:16 |
Dyrcona |
jeff++ sharpsie++ |
10:38 |
|
dguarrac joined #evergreen |
10:51 |
|
BrianK joined #evergreen |
11:08 |
|
kworstell-isl joined #evergreen |
11:09 |
|
smayo joined #evergreen |
11:20 |
|
kworstell-isl joined #evergreen |
11:30 |
|
jihpringle joined #evergreen |
11:54 |
|
Stompro joined #evergreen |
12:31 |
|
collum joined #evergreen |
12:41 |
|
collum joined #evergreen |
12:49 |
Dyrcona |
I am testing my modified fix script on a VM with data from Sunday, so it has plenty of auto-renewals. |
12:51 |
Dyrcona |
Thousands of events have gone from "pending" to "collected" state, and the lock file is still there. I guess I can look at the run all pending events code to see when the first response comes back. |
12:52 |
Dyrcona |
My use of the same lock file does prevent my fix program from running: Script already running with lockfile /tmp/action-trigger-LOCK.daily at /usr/local/share/perl/5.30.0/OpenILS/Utils/Cronscript.pm line 151. |
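The guard that produced that message presumably follows an acquire-or-die pattern. The real implementation is Perl in OpenILS::Utils::Cronscript; this Python sketch is only an assumption about its shape, pairing with the remove-if-ours check discussed earlier:

```python
import os

def acquire_lock(lockfile):
    """Create lockfile containing our PID, or fail if one exists.

    Illustrative acquire-or-die sketch: if the file is already
    present, refuse to run (compare the "Script already running"
    error above); otherwise record our PID so a later check can
    remove the file only when it belongs to us.
    """
    if os.path.exists(lockfile):
        raise RuntimeError(
            f"Script already running with lockfile {lockfile}")
    with open(lockfile, "w") as f:
        f.write(str(os.getpid()))
```

Note the consequence for Dyrcona's setup: sharing the runner's lockfile name means the fix program dies while the runner holds the lock, but the lock disappearing early (at the first response) means that protection ends before the runner's work does.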
13:12 *** pinesol joined #evergreen
13:14 *** sharpsie joined #evergreen
13:16 <Dyrcona> The answer to "how long does it take for the lock file to go away?" is "it depends." The first event has to be processed and respond.
13:19 <Dyrcona> First, they are all collected, and then they're passed off to the reactor(s). One of them has to finish, so they all have to be "collected" first.
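That ordering (collect everything first, then react, with the lock disappearing at the first reactor response) can be modeled with a toy like this. It is only a sketch of the observed behavior, not the runner's actual code, but it captures why the lock can vanish while most events are still mid-flight:

```python
def run_grouped_events(events, remove_lock):
    """Toy two-phase model of an a/t run.

    Phase 1 collects every event; phase 2 fires the reactors, removing
    the lock file immediately after the first reactor response. The
    returned log records the order things happened in.
    """
    log = []
    for ev in events:                  # phase 1: collect everything
        ev["state"] = "collected"
        log.append(("collected", ev["id"]))
    lock_dropped = False
    for ev in events:                  # phase 2: reactors fire
        ev["state"] = "complete"
        log.append(("complete", ev["id"]))
        if not lock_dropped:
            remove_lock()              # first response: lock goes away
            log.append(("lock-removed", None))
            lock_dropped = True
    return log
```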
13:20 <Dyrcona> Yeahp. I now have some "complete" and other states, and the lock file is gone.
13:21 <Dyrcona> And, my fix-autorenewals script is now just hanging out, waiting on the process to go away.
13:22 <Dyrcona> I wonder if some of the issues with events being left in strange states have to do with the Multisession code. I seem to recall someone suggesting that before.
13:33 <berick> Dyrcona: easy enough to experiment by changing opensrf.xml configs for that.
13:34 <Dyrcona> berick: Yeah. I'm thinking about it. I have multiple VMs and multiple copies of the data to play with.
13:38 <Dyrcona> I should set two VMs up and run the production cron on them for a week or so, then compare the results.
13:39 <Dyrcona> It looks like the first grouped event going to "found" may be enough to get rid of the lock file. I wasn't watching that closely.
13:41 <Dyrcona> Hm.. maybe not "the first." It looks more like after all of the grouped events are "found".
13:51 *** _collum joined #evergreen
14:12 *** eby joined #evergreen
14:16 <Dyrcona> I'm also testing that filter on 2-day pre-due courtesy notices that I mentioned last week. On the VM where I'm using it, it's processing about 15,000 fewer of those events than on the other.
14:17 <Dyrcona> Guess I'll find something else to do and come back to that in an hour or so. :)
14:52 *** berick joined #evergreen
15:00 <Bmagic> chromeos impresses me
15:03 <Bmagic> 4GB RAM, a dinky 2GHz "MediaTek" ARM CPU. < $300. 13-hour battery, 14-inch screen. Install the Linux subsystem, then you install any* Ubuntu GUI program you need, e.g. Quassel client, remote desktop, whatever. It's fast, boots in a few seconds.
15:05 <Bmagic> You're not going to run Evergreen on it, but that's what the remote desktop is for. Remote into a big rig somewhere.
15:10 <berick> Bmagic: i've run EG on one :) but it was stupid slow
15:10 <Bmagic> I can imagine
15:11 <Bmagic> It's the hardware more than ChromeOS's fault (I would bet) - if you bought the heavy machinery, with ChromeOS + Linux subsystem + Evergreen, I bet it would work better?
15:12 <berick> yeah. I had a cheapo chromebook
15:12 <Bmagic> I'm excited about the 13-hour battery more than anything. No more lugging around a power adapter
15:13 <Bmagic> of course, this all requires other infrastructure for me to do my work - another computer with lots of memory and CPU. The chromebook is just a looking glass
15:23 <Dyrcona> I use a Pinebook with similar specs to do real work, other than running VMs on it.
15:23 <Dyrcona> Just replace ChromeOS with Linux and you're good to go.
15:29 *** mantis2 left #evergreen
15:47 <Dyrcona> So far, I've got two that failed the hooks lookup to create the AutorenewNotify events. I'm running the daily events along with the 2-day courtesy notices because they often overlap.
15:49 <Dyrcona> I think I'll set it up tomorrow so that one VM is configured with parallel reactors/collectors and the other not. Then, I'll put the production crontab in place to run these events and see what happens for a few days.
16:00 <Dyrcona> Unrelated: I'm starting to rethink many things that I thought were a good idea at the time, like letting users specify per-hold email and SMS notification addresses. We get a few tickets now and then from staff who don't understand why they get bounce emails for addresses not in the patron record.
16:03 <jeff> per-hold email addresses aren't a thing that I'm aware of.
16:04 <jeff> email notification on holds is a boolean per-hold, but not a text field.
16:04 <jeff> you can have a different phone number per hold and a different text number (AND a different carrier) per hold, though.
16:05 <jeff> (we do none of that)
16:09 <Dyrcona> The ticket is about the bounce from a text. We recently added real addresses as the senders of texts through the email-to-SMS gateway, because it seems to help.
16:10 <Dyrcona> And, jeff, you're right. I should have looked it up before spouting off in channel, but different numbers seemed like a good idea once. Now, I think the cons outweigh the pros.
16:58 *** smayo joined #evergreen
17:17 *** mmorgan left #evergreen
19:01 *** pinesol joined #evergreen
19:03 *** sharpsie joined #evergreen
20:01 <Bmagic> something is up with the git repo. I'm getting connection timed out for OpenSRF and Evergreen (same server, I understand) - I wonder if there is some kind of auto-blocking mechanism? I've tried from two different IPs with no luck. However, from a different location in the world, no problem. Just AWS's Virginia data center.
20:05 <Bmagic> it was working a few hours ago from AWS. It could be AWS.
20:30 *** jonadab joined #evergreen