Time |
Nick |
Message |
06:00 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |
07:30 |
|
rjackson_isl_hom joined #evergreen |
07:47 |
|
collum joined #evergreen |
08:36 |
|
mantis joined #evergreen |
08:36 |
|
mmorgan joined #evergreen |
08:53 |
|
rfrasur joined #evergreen |
09:34 |
|
jvwoolf joined #evergreen |
09:36 |
|
Dyrcona joined #evergreen |
09:50 |
Dyrcona |
I'm getting "DATABASE UPDATE FAILED" for some attempted vandelay imports, but PostgreSQL is not logging any errors? Anyone have any suggestions? |
09:55 |
berick |
Dyrcona: i'd check for errors in opensrf logs. sometimes cstore/pcrud will error before they actually issue any queries |
10:15 |
Dyrcona |
berick, i was about to do that but got sidetracked by somethings else. |
10:20 |
Dyrcona |
Trouble with that is having action triggers running produces a lot of "noise." |
10:31 |
Dyrcona |
marc_stream_importer stopped working on our training server after we upgraded to 3.7.2, and I'm trying to figure out why. It's more difficult when the hold targeter is running and it adds about 10MB of log data in a minute. |
10:40 |
Dyrcona |
So, after stopping everything, dropping the crontab, restarting services, starting the stream importer, and trying again, still nothing. The queued bib record entry is there with a database update failed, but no error in the postgresql log and none in the syslog, or /openils/var/log/* files. |
10:41 |
berick |
Dyrcona: nothing in vandelay.queued_bib_record.error_detail (or .import_error) |
10:41 |
berick |
? |
10:43 |
|
rjackson_isl_hom joined #evergreen |
10:44 |
Dyrcona |
I was looking at vandelay.queued_bib_record.error_detail and seeing this "DATABASE_UPDATE_FAILED : The attempt to write to the DB failed." That's why I asked. |
10:45 |
berick |
ah, we've come full circle |
10:45 |
Dyrcona |
However, it looks like the restart of services may have done the trick. The last two records I sent don't have error detail. I'm going to add more fields to my select. |
10:50 |
Dyrcona |
Ah, yes... Bilbio Record-Entry, Frodo's librarian cousin. |
10:51 |
Dyrcona |
So, it didn't automatically create a bib record or appear to update any. I was kind of expecting that, but I don't really know what to expect. I'm sure they'll tell me it is still broken. |
10:55 |
Dyrcona |
Nope. import_time is null. Hasn't been a successful import since 2021-11-30 11:15:15.851648-05. So, it worked once a few days ago, but not since, and I have not touched it until this morning. |
10:57 |
Dyrcona |
This Evergreen 3.7.2 on Pg 10, though it worked on Evergreen 3.5.3 with Pg 10. |
10:57 |
Dyrcona |
IDK, maybe I'll try a reinstall later. Sometimes, rm -rf /openils/* and reinstalling everything helps. |
11:00 |
Dyrcona |
Still curious that I get a DATABASE_UPDATE_FAILED and no error in the postresql-10-main.log. |
11:01 |
Dyrcona |
It logged my bilbio.record_entry typo with a relation does not exist. |
11:10 |
Dyrcona |
I wonder if my patch for bib match quality ratio is interfering, but our cataloger specified a ratio of 1.0 and it looked dead simple to add it. |
11:11 |
Dyrcona |
I think the patch was there on 3.5.3. It was in the branch, but doesn't mean the importer with that feature got installed. |
11:25 |
Dyrcona |
Think I'll back that code out and try it that way. |
11:25 |
Dyrcona |
Well, I'll edit it in situ on training. |
11:32 |
Dyrcona |
Nah, didn't help. :( |
11:33 |
Dyrcona |
Another mysterious database error. |
11:34 |
Dyrcona |
I ran autogen during all of this at one point, too. I'm pretty sure the IDL is up to date. |
11:40 |
Dyrcona |
Yeah, the installed IDLs are the same as in the branch.... |
12:16 |
|
jihpringle joined #evergreen |
12:56 |
Dyrcona |
berick: I'm starting to think that this could be a case of something unexpected happened, and we're falling back to reporting a database error happened on the presumption that is the problem. IIRC, the XUL client used to do that in some of its switch/case type error checking things. The default/else was to throw a network error happened. |
13:10 |
Dyrcona |
Oh ho! I missed this earlier: Dec 7 10:35:20 training open-ils.vandelay: [ERR :21108:CStoreEditor.pm:155:1638891159210936] editor[1|1873642] error rolling back database transaction |
13:12 |
Dyrcona |
2021-12-06 10:35:23.668 EST [19925] evergreenevergreen WARNING: there is no transaction in progress |
13:14 |
Dyrcona |
The fact that they're 3 seconds apart kind of bugs me. I ignored those before in the PostgreSQL logs because they're warnings, not errors in PG. |
13:29 |
Dyrcona |
I wonder if it is failing to update the queued_bib_record table during the import process. |
14:12 |
|
jihpringle joined #evergreen |
15:05 |
Dyrcona |
So, I gave up on the previous issue for a bit, but while watching command output scroll by on a different server, I started wondering if maybe the "did you mean" code is slowing down the creation and updates of bibs to the point where the Vandelay code ends up timing out. |
15:05 |
alynn26 |
Dyrcona, that had that issue with it in SITKA |
15:06 |
jeffdavis |
yeah there's a bug for it |
15:08 |
jeffdavis |
bug 1947173 - there's a fix but it's not in 3.7.2 |
15:08 |
pinesol |
Launchpad bug 1947173 in Evergreen 3.7 "Did You Mean Symspell dictionary updates can significant slow record ingest" [High,Fix committed] https://launchpad.net/bugs/1947173 |
15:08 |
Dyrcona |
alynn26++ jeffdavis++ |
15:09 |
Dyrcona |
At one point today, I was thinking there was a bug that I might want to look at. |
15:09 |
Dyrcona |
I'll throw that patch on our training server and see what happens. |
15:13 |
mmorgan |
So back here, I asked about white screen issues: http://irc.evergreen-ils.org/evergreen/2021-11-18#i_495687 |
15:14 |
mmorgan |
We've had scattered reports of the same thing from many of our libraries. Some workstations, not all. |
15:15 |
alynn26 |
we have traced most of our white screen issues down to Hatch and Chrome. We do not see them with just Chrome or hatch and firefox. |
15:16 |
mmorgan |
We've had some success disabling all plugins and add ons that are not used for Evergreen, but we're still not sure what the exact issue is. This morning I observed the phenomenon on my own workstation. |
15:17 |
mmorgan |
It does seem to have something to do with Hatch. My Chrome has the Hatch Chrome plugin installed, but Hatch is not enabled in Evergreen. |
15:18 |
mmorgan |
This doc shows screenshots of what the screens look like, and the console: https://docs.google.com/document/d/1d5dC1LhgZgPbXeFDWcmbCjRCotFl3fZSC5-GlJVyBic/edit?usp=sharing |
15:18 |
jeff |
do you have steps that you can reliably reproduce? |
15:18 |
mmorgan |
When we see the whilte screen, this is the last line in the console: |
15:18 |
mmorgan |
sending to Hatch: {"key":"eg.workstation.all","action":"get","msgid":1} |
15:20 |
mmorgan |
I didn't want to destroy my test environment, but I *think* this is happening only on angularjs screens. |
15:20 |
mmorgan |
Any thoughts? |
15:20 |
jeff |
are there any clues in hatch.log? |
15:20 |
alynn26 |
That is what we are seeing. We are also seeing it with some of the Angular screens. but not much. |
15:22 |
mmorgan |
dumb question time: Where will I find hatch.log? |
15:22 |
csharp_ |
mmorgan: you can also look at the "background page" for the Hatch extension |
15:22 |
jeff |
on Windows, it will likely be in %userprofile%\.evergreen |
15:23 |
jeff |
on masOS/Linux, likely in ~/.evergreen |
15:23 |
jeff |
you can usually enter %userprofile% in Start -> Run |
15:23 |
jeff |
unless you've changed the location in logging.properties |
15:24 |
jeff |
or unless your file permissions on logging.properties prevent it from being read, in which case I forget what the default is. might be "no log" |
15:26 |
berick |
cd 'C:\Program Files(x86)\Hatch' ; \.hatch.bat test |
15:27 |
berick |
can also be useful |
15:27 |
mmorgan |
Hmm. Not seeing a .evergreen directory for my userprofile |
15:28 |
berick |
correction: .\hatch.bat |
15:29 |
mmorgan |
csharp_: I see the background page for the Hatch extension, but don't see anything that looks like an error there. |
15:32 |
JBoyer |
mmorgan, the Hatch logs are in C:\ProgramData\Hatch\hatch.log. ProgramData may be hidden since it's not for touching. |
15:33 |
mmorgan |
JBoyer: Ok, thanks! hatch.bat also just told me that :) |
15:34 |
jeff |
ah. THAT must be the default when the native message host can't read logging.config. :-) |
15:35 |
JBoyer |
That's the default period unless you change it, since "you" don't have any hatch settings, only the system: https://git.evergreen-ils.org/?p=Hatch.git;a=commit;h=0c0265fa967f7038818639a92b8d099401653a18 |
15:36 |
jeff |
ah. lies, damned lies, and comments in config files. |
15:39 |
mmorgan |
Went to Checkin, see the white screen. hatch.log shows: |
15:39 |
mmorgan |
2021-12-07 15:35:31 FINEST MessageWriter sending: {"clientid":155,"msgid":1,"message":"OK","content": |
15:39 |
mmorgan |
followed by all the workstations I have registered (of which there are a lot) |
15:40 |
mmorgan |
"status":200} |
15:41 |
jeff |
and no corresponding "Hatch responded to message ID 1" in the browser console? |
15:42 |
mmorgan |
When I refresh the screen, it loads. I refresh my view of the log, and see exactly the same entry after the screen successfully loads. |
15:43 |
mmorgan |
jeff: I think that is true, but will confirm. |
15:43 |
jeff |
what version of Evergreen and Hatch are you running? |
15:44 |
jeff |
guessing Hatch 0.3.2, but not wanting to assume. |
15:45 |
mmorgan |
jeff: Confirmed: no corresponding "Hatch responded to message ID 1" in the browser console. |
15:47 |
mmorgan |
I refresh, the screen loads, and I see "Hatch responded to message ID 1" in the browser console. |
15:48 |
mmorgan |
jeff: Hatch 0.3.2 |
15:50 |
mmorgan |
I logged into the client on 3 different servers this morning, production, test, and dev. I am no longer seeing the problem in production, but am still seeing it in test and dev. |
15:51 |
mmorgan |
Neglected to say that this was all in the same Chrome window, different tabs, and all showed the problem. |
15:53 |
berick |
i wonder if it's getting hung up on something else.. |
15:53 |
berick |
mmorgan: go to chrome://inspect/#workers and click on 'inspect' for https://HOSTNAME/js/ui/default/staff/offline-db-worker.js |
15:54 |
berick |
see if anything in there complains when the page fails to load |
15:57 |
JBoyer |
mmorgan, what's "a lot" of workstation names? I wonder if there may be some kind of data size limit for native extension communication. |
15:57 |
mmorgan |
Hmm. When I go to chrome://inspect/#workers, then navigate to the checkout page that doesn't load, I DO NOT SEE that server listed. I refresh the checkout page, it loads, and now I see that server listed. |
15:58 |
JBoyer |
or it's that. :) |
15:58 |
jeff |
mmorgan: what version of Evergreen? did you mention already? |
15:58 |
mmorgan |
3 servers, 3 versions :) |
15:59 |
berick |
might have to refresh the /inspect/ page after opening the checkout tab |
16:00 |
mmorgan |
3.6.4, 3.7.2 and another flavor of 3.6, I believe. |
16:01 |
mmorgan |
berick: the inspect page seems to refresh itself. The listed servers blink on and off depending on whether the screens load or not. |
16:01 |
Dyrcona |
If I apply that patch for the symspell stuff on a system where it has already been ingested once before, do I have to do it again? |
16:02 |
berick |
mmorgan: hm, it's inconsistent for me. just tested again and had to refresh. *shrug* |
16:03 |
jeff |
i wonder what you'd see if you go to chrome://extensions, enable Developer mode, and then select the "background page" link for Hatch Native Messenger. |
16:04 |
mmorgan |
berick: tried refreshing several times when on an incomplete screen. The server doesn't show under Shared workers. |
16:05 |
jeff |
I get the behavior you've described 100% of the time when I have the hatch browser extension installed but no native messaging host (no Java bits). 100% as in it's not intermittent / fixed with a reload, it just fails every time. |
16:06 |
berick |
mmorgan: ok, i see. my next step would be: get to a successfull page load -> open offline inspect -> refresh and hope it breaks -> then see if the offline inspector shows anything interesting |
16:06 |
berick |
refresh the EG page, i mean, and hope it breaks |
16:06 |
jeff |
so I wonder if when inspecting the extension you'll have any Hatch disconnected / Reconnecting to hatch / Connecting to native messaging host... type logs. |
16:07 |
Dyrcona |
Looks like I do.... |
16:08 |
jeff |
I just wonder if (given a not-intentionally-broken environment like I just created) the extension console logs are different when you see / don't see the symptoms. |
16:09 |
jeff |
I also suspect that this is going to be annoying to us, because I'm fairly certain that we have chrome profiles where Hatch (the extension) is present but the native messaging host is not. |
16:09 |
jeff |
Hrm. |
16:09 |
mmorgan |
jeff: I've seen that behavior too, but this is definitely different. |
16:10 |
* jeff |
nods |
16:10 |
jeff |
I'm not saying that your issue is caused by the native messaging host not being present -- it wouldn't ever work then. I just wonder if the extension's console log messages give you any clues. |
16:10 |
berick |
jeff: indeed i have to disable the hatch extenion when i hop on my mac. we probably need an escape *cough* hatch |
16:10 |
jeff |
berick: *groan* but yes. |
16:11 |
berick |
if nothing else, a timeout on the first hatch query -- it should be fast -- before marking in inaccessible |
16:12 |
berick |
hm, no, it has to be intentional |
16:12 |
jeff |
don't set isAvailable unless you can ping the native messaging host itself through the extension? |
16:13 |
berick |
... or fail very loudly so you know you don't have access to your Hatch if you are accustomed to having access |
16:13 |
berick |
jeff: yeah pretty much |
16:13 |
mmorgan |
berick: Ok, got a successful page load on checkout, opened Inspect, navigated to checkin, which broke. Then got a pop-up: |
16:14 |
mmorgan |
DevTools was disconnected from the page. Once page is reloaded, DevTools will automatically reconnect. |
16:15 |
mmorgan |
I reload, the screen loads, and the console is now logging again. |
16:15 |
berick |
mmorgan: ok. that popup is normal for a reload, so no surprise there. if it's the last thing you see there, then the worker probably isn't the problem |
16:15 |
berick |
ugh |
16:16 |
berick |
i've never seen this type of behavior where Hatch says it's resonding but it's not |
16:17 |
berick |
which makes me think something else might be gumming up the works before the browser client can check for hatch responses |
16:17 |
jeff |
mmorgan: are you able to share a screenshot of the extension console log contents when you have the problem / don't have the problem? |
16:18 |
mmorgan |
jeff: Just a sec. |
16:18 |
* mmorgan |
has waaay too many windows open now :) |
16:18 |
jeff |
:-) |
16:22 |
mmorgan |
jeff: I added the screenshot to the top of https://docs.google.com/document/d/1d5dC1LhgZgPbXeFDWcmbCjRCotFl3fZSC5-GlJVyBic/edit?usp=sharing |
16:23 |
mmorgan |
All the entries shown on the screenshot are the result of the successful load. The popup happens when I go to a screen that does not load. |
16:23 |
jeff |
Hrm. that doesn't look like the console for the extension. |
16:24 |
berick |
yeah that's shared worker console |
16:24 |
jeff |
did you get there by going to chrome://extensions, ensuring that Developer Mode was on, and then selecting "background page" for the Hatch extension? |
16:24 |
mmorgan |
Oh. Right. Too many windows :) Just a sec. |
16:30 |
mmorgan |
Sorry, called away, just another sec. |
16:30 |
alynn26 |
I am seeing it constantly with Evergreen 3.7.1 /Chrome Version 95.0.4638.69/ Hatch 0.3.2 |
16:32 |
alynn26 |
This is all I see in the Extension Console. new port connected with id 327 |
16:32 |
alynn26 |
extension.js:113 Removing port 327 on tab disconnect |
16:32 |
alynn26 |
extension.js:92 Received message from browser on port 327 |
16:33 |
mmorgan |
jeff: Just replaced the image in the doc. The 3 highlighted rows are the result of the unsuccessful load, the subsequent 3 are the successful load. |
16:34 |
|
jvwoolf left #evergreen |
16:35 |
jeff |
hah! |
16:37 |
jeff |
interesting. I wonder if the onDisconnect listener *inside* the onConnect listener is removing our tabid from browserPorts, thus breaking the attempt to postMessage in onNativeMessage above. |
16:38 |
jeff |
(all in extension.js) |
16:38 |
jeff |
just a guess based on the ordering of the messages in your console output. |
16:44 |
berick |
jeff: it sure looks that way. which raises the question, why is the extension getting a disconnect event.. |
16:44 |
berick |
confirmed my extension logs look more saner.. new -> receive -> remove |
16:45 |
jeff |
I think it's the disconnect event from the previous tab reload. |
16:46 |
jeff |
the disconnect event from the tab before-reload arriving / being handled "after" the post-reload connect message -- maybel. |
16:46 |
jeff |
s/maybel/maybe/ |
16:47 |
alynn26 |
I blame Google Chrome. |
16:48 |
berick |
enabling timestamps in the console log could provide some clarity |
16:48 |
berick |
jeff: unable to reproduce, but i see what you are saying. |
16:49 |
berick |
that would be odd, though, if the browser disconnected a tab after it reconnected the same tab |
16:49 |
mmorgan |
So I have just "fixed" one of my sessions by clicking around on other screens. Went to Action triggers, opened an event def, and then cancelled. Now my screens are loading fine. |
16:50 |
mmorgan |
In the extension background page, the order is remove -> new -> received for every page load |
16:51 |
berick |
mmorgan: i assume you've tried a full "clear site data" on the domain? |
16:51 |
mmorgan |
alynn26: I'm with you! |
16:52 |
mmorgan |
berick: No, is that in dev tools? |
16:52 |
berick |
mmorgan: yeah, dev tools -> Application tab -> Storage -> Clear Site Data button |
16:55 |
mmorgan |
berick: Cleared site data. All pages are still loading fine. |
16:58 |
* mmorgan |
thinks this is not going to get solved today. |
16:59 |
mmorgan |
Hope when I boot up my computer tomorrow, I'll have another opportunity to poke. |
16:59 |
mmorgan |
jeff++ |
16:59 |
mmorgan |
berick++ |
17:00 |
mmorgan |
alynn26++ |
17:00 |
mmorgan |
JBoyer++ |
17:05 |
|
mmorgan left #evergreen |
18:00 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |