Time |
Nick |
Message |
06:00 |
pinesol |
News from qatests: Failed Installing Evergreen database pre-requisites <http://testing.evergreen-ils.org/~live//archive/2022-02/2022-02-15_04:00:02/test.39.html> |
08:00 |
|
collum joined #evergreen |
08:36 |
|
mantis1 joined #evergreen |
08:38 |
|
mmorgan joined #evergreen |
09:12 |
|
Dyrcona joined #evergreen |
09:19 |
Dyrcona |
If I get this message while loading MARC records "no mapping found for [0x80]": a) is that a warning or an error (i.e. does the record load anyway) and b) anyone had to fix these before and got any tips that might save me an hour or so of fumbling around? |
09:20 |
Dyrcona |
The records in the file claim to be UTF-8, but we all know how that works out in reality. |
09:26 |
Dyrcona |
Well, I can say that those messages are not making it to MARC::Record->warnings. Because I would log those to my log file. |
09:28 |
Dyrcona |
Grr. Because I botched the shortname in 856$9, they're not showing up in my custom view.... |
09:29 |
Dyrcona |
That means I can't get a quick count. |
09:30 |
Dyrcona |
I can also say that the messages didn't trigger my error handler in the eval because there's no error log. |
09:31 |
Dyrcona |
So, I guess they were inserted, even with the junk characters. |
09:31 |
Dyrcona |
@blame Vendor records |
09:31 |
pinesol |
Dyrcona: Vendor records is probably integrated with systemd |
09:35 |
|
jvwoolf joined #evergreen |
09:39 |
* Dyrcona |
reloads the database to give it another go. |
09:49 |
Dyrcona |
Think I'll redirect stderr to a file. I'm not sure where these messages are coming from, either when I create the MARC::Record from the raw marc data, or when doing the insert. |
09:49 |
Dyrcona |
Most likely the former. |
09:58 |
JBoyer |
Bmagic_, I'm told the cert for the MOBIUS bugsquash server could use a refresh sometime. |
10:01 |
Bmagic_ |
oh! will do |
10:41 |
csharp_ |
jvwoolf: re: slow reports, have you done any query analysis to see what it's doing? (e.g. EXPLAIN/EXPLAIN ANALYZE) |
10:41 |
csharp_ |
note that EXPLAIN ANALYZE actually runs the query, so if it's timing out, that may not work |
10:55 |
|
rfrasur joined #evergreen |
10:59 |
Dyrcona |
So, back to my MARC::Charset thing from earlier. I basically copied the example from the MARC::Lint manpage and ran that on my MARC input file, and it doesn't report any character set issues, though it does complain about punctuation and the subfield 9 in the 856. |
11:06 |
Dyrcona |
Oh, interesting. It looks like MARC::File::USMARC->next returns raw MARC and doesn't create a MARC::Record object. |
11:09 |
Dyrcona |
Oh, never mind. It does. I was looking at _next(). |
11:09 |
csharp_ |
Dyrcona: I've seen that kind of thing when processing records with garbage characters too |
11:10 |
csharp_ |
pretty sure we were able to find them in the text and replace/remove them |
11:10 |
Dyrcona |
csharp_: Yeah. It happens a lot. I may just jam the records in anyway. |
11:10 |
csharp_ |
abowling had to do some of that during our most recent library migration |
11:11 |
Dyrcona |
I'm still waiting on a database reload before I do another run. |
11:11 |
Dyrcona |
If it looks like single characters, I may just use sed or something like that on the file. |
11:12 |
csharp_ |
yeah - pretty sure we had a cataloger load the file in MARCEdit at some point and do a simple find/replace |
11:12 |
csharp_ |
that was back before I was more comfortable with regexes |
11:13 |
Dyrcona |
So, I don't see the bad character message from MARC::File::USMARC because it builds the MARC::Record differently. It adds the fields as it finds them, does some checking of its own. My program reads the raw MARC from the file and does MARC::Record->new_from_usmarc(). |
11:14 |
Dyrcona |
Some of these look like "smart" quotes. Probably Windows-1252, again. :( |
11:14 |
csharp_ |
yeppers |
11:14 |
Dyrcona |
I wish people would learn that you can't just copy and paste into a MARC record. |
11:14 |
csharp_ |
edit in Word, paste into MARC |
11:15 |
csharp_ |
WYSIWY(don't actually)G |
11:15 |
Dyrcona |
:) |
11:15 |
Dyrcona |
Everyone should just use Unicode. It's 2022 already. |
11:16 |
* Dyrcona |
uses en_US.UTF-8 on all his accounts. |
11:18 |
csharp_ |
ǝpoɔıu∩ ƃuısn ǝq pןnoɥs ǝuoʎɹǝʌƎ |
11:18 |
Dyrcona |
:) |
11:19 |
berick |
heh |
11:25 |
mmorgan |
csharp_++ |
11:26 |
mmorgan |
though my head is now spinning a bit :) |
11:37 |
Dyrcona |
csharp_: Part of the point of what I'm testing is to keep cataloging staff from having to use MarcEdit to edit the 856s.... They say it's the most time-consuming step. |
11:37 |
Dyrcona |
Since I've got a Perl program to do that part, I may just add code to clean up certain junk characters. |
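A minimal sketch of that kind of cleanup, written in Python rather than in the Perl program mentioned above, and assuming the junk is "smart" punctuation that has already been decoded to Unicode. The replacement table and the `clean_junk` name are hypothetical.

```python
# Hypothetical replacement table: "smart" punctuation -> plain ASCII.
SMART_PUNCTUATION = {
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
}

# str.translate needs a table keyed by code point; maketrans builds it from the dict.
_TABLE = str.maketrans(SMART_PUNCTUATION)


def clean_junk(text: str) -> str:
    """Replace common "smart" punctuation in a decoded field value with ASCII."""
    return text.translate(_TABLE)


print(clean_junk("\u201cBest\u201d Book\u2019s Title"))  # "Best" Book's Title
```

This is meant for decoded subfield values, not for the raw ISO 2709 bytes (for example with sed): each curly quote is three bytes in UTF-8 but one in ASCII, and the record directory stores byte lengths and offsets, so editing the raw file in place can leave the directory inconsistent with the data.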
11:38 |
csharp_ |
good plan |
11:39 |
Dyrcona |
Yes, I've assigned myself to look at Bmagic's record loader, but don't think I'll use it for this process, just yet: Bug1947898 |
11:39 |
Bmagic |
ty! |
11:39 |
Bmagic |
Dyrcona++ |
11:39 |
Dyrcona |
Missed a space: Lp 1947898 |
11:39 |
pinesol |
Launchpad bug 1947898 in Evergreen "Enhanced MARC importer script electronic_marc_import.pl" [Wishlist,Confirmed] https://launchpad.net/bugs/1947898 - Assigned to Jason Stephenson (jstephenson) |
11:54 |
Dyrcona |
The reload finished in time for me to start another test run before lunch, so here goes. |
11:55 |
|
jihpringle joined #evergreen |
12:33 |
Dyrcona |
Yeah, after looking up the Windows-1252 character set, I'm pretty sure that what I'm seeing are the Euro symbol (0x80), double dagger (0x87), and right single quotation mark (0x92). |
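A quick way to double-check that reading, sketched in Python (the `identify_cp1252` helper is hypothetical): decode each byte as Windows-1252 and look up its Unicode name. In ISO-8859-1 and Unicode, 0x80 through 0x9F are C1 control codes rather than printable characters, which may be one reason a converter expecting a different character set finds no mapping for them.

```python
import unicodedata


def identify_cp1252(byte: int) -> str:
    """Describe what a single byte means when read as Windows-1252."""
    char = bytes([byte]).decode("cp1252")  # raises UnicodeDecodeError for undefined bytes
    return f"0x{byte:02X} -> U+{ord(char):04X} {unicodedata.name(char)}"


for suspect in (0x80, 0x87, 0x92):
    print(identify_cp1252(suspect))
# 0x80 -> U+20AC EURO SIGN
# 0x87 -> U+2021 DOUBLE DAGGER
# 0x92 -> U+2019 RIGHT SINGLE QUOTATION MARK
```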
12:34 |
Dyrcona |
There may be others, but I remember those three codes from the output. |
12:36 |
Dyrcona |
Hmm. Must be an iconv module for Perl... |
12:38 |
jvwoolf |
csharp_: That (or something close to that) was going to be my next step until the conversation here yesterday. |
12:39 |
jvwoolf |
The thing is that these are templates that I built and nothing about them has changed, they just all of a sudden started timing out. |
12:40 |
Dyrcona |
jvwoolf: Done a reindex or a vacuum on the database lately? |
12:40 |
jvwoolf |
Yep, this weekend |
12:40 |
Bmagic |
JBoyer: bugsquash is back in business. Had to replace the machine |
12:40 |
Dyrcona |
Well, that's not it, then. :) |
12:41 |
jvwoolf |
Dyrcona: Now that you mention it, it was just a vacuum |
12:41 |
Dyrcona |
And, on my character set issue, it looks like Encode has a from_to function. Think I'll give that a whirl. |
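Perl's `Encode::from_to` converts a string of octets from one encoding to another in place. If the file mixes valid UTF-8 with Windows-1252, blindly converting everything would double-encode the parts that were already UTF-8. A minimal Python sketch of a more cautious per-field approach, assuming each field value is available as raw bytes; `to_utf8` is a hypothetical name:

```python
def to_utf8(raw: bytes) -> bytes:
    """Return a field value as UTF-8, converting from Windows-1252 only when needed."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8: converting again would double-encode it
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume Windows-1252. Bytes cp1252 leaves undefined
        # (0x81, 0x8D, 0x8F, 0x90, 0x9D) become U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace").encode("utf-8")


print(to_utf8(b"\x93Smart\x94 quotes"))  # b'\xe2\x80\x9cSmart\xe2\x80\x9d quotes'
```

Trying UTF-8 first tends to work because Windows-1252 text containing accented letters or smart quotes rarely happens to form valid UTF-8 sequences by accident.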
12:41 |
jvwoolf |
It's been a while since we reindexed |
12:42 |
Dyrcona |
Well, reindex can take a long time, though IIRC you can reindex just 1 table. It should almost never be necessary. |
12:43 |
Dyrcona |
Except when Pg or O/S upgrades break things, like character collation.... :) |
12:43 |
mmorgan |
jvwoolf: All of a sudden after a system change or upgrade? Or just all of a sudden with no changes. |
12:45 |
|
collum joined #evergreen |
12:46 |
jvwoolf |
mmorgan: I'm trying to piece that together myself. We had call number reports time out sometimes prior to upgrading, but prior to our last major upgrade they started to time out every time. |
12:46 |
csharp_ |
jvwoolf: I would definitely take a problem query, put EXPLAIN in front of it, and see what it's trying to do - feel free to share via pastebin or https://explain.depesz.com/ |
12:47 |
jvwoolf |
We had both a minor upgrade and an OS upgrade within that time, so I'm trying to pin down if this started happening before or after either one of those. |
12:47 |
jvwoolf |
Minor Evergreen upgrade, that is |
12:47 |
Dyrcona |
jvwoolf: Did you upgrade to Ubuntu 20.04 perchance? That would definitely require a reindex in Pg. |
12:47 |
csharp_ |
jvwoolf: tables can grow past the point where the planner still makes the index vs. sequential scan decision correctly |
12:48 |
csharp_ |
but the planner doesn't "know" that so it works on bad statistics |
12:48 |
Dyrcona |
And, that's also likely it. jvwoolf said she had 30 million? deleted call number rows. |
12:48 |
jvwoolf |
Dyrcona: No, 18.04 |
12:48 |
csharp_ |
which should be fixable via ANALYZE but isn't always in my experience |
12:49 |
csharp_ |
jvwoolf: also, which version of PG? |
12:49 |
jvwoolf |
csharp_: 9.6 |
12:49 |
csharp_ |
ok, same here |
12:49 |
Dyrcona |
For Ubuntu 20.04, there's this: https://elephanttamer.net/?p=61#brokenindexes |
12:50 |
Dyrcona |
Also, upgrading to Pg 14, it's best to do a dump and restore, rather than in place upgrade, IIRC. |
12:50 |
csharp_ |
was about to say that a dump/restore isn't a bad idea occasionally anyway, though we've done in-place for the last several |
12:51 |
Dyrcona |
We're using Pg 10. Haven't noticed any real performance differences with Pg 9.6. |
12:51 |
Dyrcona |
At Pg 11, most things start getting faster, but others seem to get hit. |
12:52 |
Dyrcona |
We do a dump and restore at least whenever we get new hardware. |
12:52 |
Dyrcona |
I do them weekly to keep some test databases up to date. Anyway, getting off topic. |
12:54 |
Dyrcona |
Going back to my record load/character issues, the message isn't coming from MARC::Record, either. I've got a program to dump the 856s that uses the same code to read the records and it doesn't peep. |
12:56 |
Dyrcona |
I wonder if I can just convert the raw MARC data or if I need to do it field by field.... |
13:04 |
Dyrcona |
Now, it's just looking like some garbage and not necessarily Windows 1252... Probably a mix.... :( |
13:07 |
Dyrcona |
I think I know the source of the messages though. It looks like they're possibly coming from NFD() when the MARC is cleaned to go in the database. I can check that hypothesis real quick. |
13:09 |
Dyrcona |
And, no. That's not it, either. |
13:11 |
Dyrcona |
g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/local/share/perl/5.26.1/MARC/Charset.pm line 308, <GEN1> chunk 1752. |
13:18 |
Dyrcona |
Still don't know where that's coming from. I'm not using MARC::Charset, and doesn't look like the modules that I use do either. It must be the database throwing that at me.... |
13:57 |
JBoyer |
Dyrcona, MARC::XML and friends are used in the ingest trigger functions, so depending on where you're actually seeing that message output the database is a likely source. |
13:59 |
Dyrcona |
JBoyer: Yeah. It doesn't seem to be coming from anything running in my Perl code outside of the database, but I didn't think warnings from the database would just show up in my output. |
14:00 |
Dyrcona |
Oh, never mind. They will because of the DBI options. |
14:05 |
Dyrcona |
Hrm.... Doesn't show up in the postgresql logs, either..... Also, I'm using: {PrintError => 0, RaiseError => 1, AutoCommit => 1} |
14:05 |
Dyrcona |
Puzzling evidence. |
14:18 |
JBoyer |
I'm not sure what "stderr" translates to in a pg loaded process like those, which is where I assume MARC::Charset sends that. |
14:23 |
Dyrcona |
Yeah... |
14:26 |
* Dyrcona |
sings along with Sting: "Mais nous pouvons faire ce que nous voulons..." ("But we can do what we want...") |
14:30 |
|
Keith_isl joined #evergreen |
14:57 |
|
collum joined #evergreen |
15:00 |
|
Keith__isl joined #evergreen |
15:34 |
Dyrcona |
Looking at the output in a file, some appear to be a multibyte sequence, but not something that is valid in UTF-8: 0x0080 0x009D. |
15:40 |
Dyrcona |
Aight, so it is a mangled UTF-8 "smart quote." The sequence should probably be 3 bytes: \xe2\x80\x9d. |
15:41 |
Dyrcona |
Ah, this character appears before it: â |
15:43 |
Dyrcona |
Which is \xe2..... |
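The pattern described here, `â` (U+00E2) followed by U+0080 and U+009D, is exactly what the three UTF-8 bytes of U+201D RIGHT DOUBLE QUOTATION MARK look like when each byte is read as a separate Latin-1 character. A small Python sketch of that mix-up and of reversing it; the helper name is hypothetical, and the reversal only helps if the text was misread exactly once, as Latin-1:

```python
def repair_latin1_mojibake(text: str) -> str:
    """Undo one round of "UTF-8 bytes read as Latin-1"; return text unchanged otherwise."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of damage, so leave it alone


quote = "\u201d"                          # RIGHT DOUBLE QUOTATION MARK
utf8_bytes = quote.encode("utf-8")        # b'\xe2\x80\x9d'
mangled = utf8_bytes.decode("latin-1")    # 'â' followed by two C1 control codes
print([f"U+{ord(c):04X}" for c in mangled])      # ['U+00E2', 'U+0080', 'U+009D']
print(repair_latin1_mojibake(mangled) == quote)  # True
```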
15:43 |
Dyrcona |
MARC::Charset is apparently not dealing with it correctly, or I need to update my Unicode support.... |
15:54 |
Dyrcona |
Ugh.... I somehow killed the program.... Not sure what I did. |
16:00 |
Dyrcona |
gmcharlt: It looks like MARC::Charset doesn't handle \xe2\x80\x9d correctly. |
16:02 |
Dyrcona |
I wonder what happens if I replace those in Perl with a "? |
16:11 |
Dyrcona |
Also, if the file is UTF-8, and that is a valid UTF-8 sequence, what's MARC::Charset got to do with it? |
16:12 |
Dyrcona |
I'ma just leave it alone and reload the file again tomorrow after a db reload. |
16:13 |
mmorgan |
Going home and coming back again is always a good approach :) |
16:14 |
Dyrcona |
Unfortunately, I am at home. |
16:14 |
Dyrcona |
And I'll be at home tomorrow, too. :) |
16:15 |
mmorgan |
There's always turning it off and on again! |
16:15 |
Dyrcona |
I don't think this should be my problem. Things actually look good for a change with the data. I was blaming the vendor, but it looks like MARC::Charset and/or Evergreen's use of it is at fault. Unless this one record isn't set to UTF-8 or something when it should be. |
16:16 |
Dyrcona |
Also, I have little context for the warnings. I'd have to search the MARC for the offending text/codes. |
16:16 |
Dyrcona |
No record number, which tells me this isn't coming from my code because all errors and warnings are logged with the record number from the file. |
16:23 |
|
abowling joined #evergreen |
16:24 |
abowling |
after a 3.7 update, links in 856 are no longer appearing in the opac. the library has custom templates, but i diffed the relevant ones and it seems nothing has changed. any ideas on what i might be misssing? |
16:25 |
Dyrcona |
Is it the bootstrap OPAC? I think there's a bug for that. |
16:26 |
abowling |
*besides including an s too many? |
16:26 |
abowling |
Dyrcona: yes |
16:26 |
abowling |
checking launchpad now... |
16:26 |
Dyrcona |
abowling: Lp 1950394 |
16:26 |
pinesol |
Launchpad bug 1950394 in Evergreen 3.7 "Electronic resource links can fail to display in Bootstrap OPAC" [Medium,New] https://launchpad.net/bugs/1950394 |
16:27 |
abowling |
Dyrcona++ |
16:28 |
Dyrcona |
I'm not sure if Elaine's comment regards the patch (it sounds like it), or if she's saying that she doesn't see the bug. |
16:29 |
Dyrcona |
I guess since I |
16:29 |
Dyrcona |
I'm having fun loading resources in a test database, I could check that one out, too... |
16:38 |
Dyrcona |
Alright, so, some of the records were not set to UTF-8 in the LDR, and I suspect that at least one of them was giving me this warning/error. |
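In MARC 21, Leader position 09 records the character coding scheme: `a` means UCS/Unicode and a blank means MARC-8. A record with a blank there is plausibly what sends the load down a MARC-8 conversion path (MARC::Charset) even though its bytes are really UTF-8. A minimal Python sketch for finding such records in a raw binary MARC file; the function name is hypothetical, and it assumes well-formed records that each end with the 0x1D record terminator:

```python
from typing import List

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte
LEADER_LENGTH = 24


def non_unicode_records(data: bytes) -> List[int]:
    """Return 1-based positions of records whose Leader/09 is not 'a' (Unicode)."""
    positions = []
    for number, record in enumerate(data.split(RECORD_TERMINATOR), start=1):
        if len(record) < LEADER_LENGTH:
            continue  # empty chunk after the last terminator, or a truncated record
        if record[9:10] != b"a":
            positions.append(number)
    return positions
```

For example, `non_unicode_records(open("records.mrc", "rb").read())` (the filename is a placeholder) returns record numbers that line up with a log keyed by position in the file.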
16:39 |
abowling |
Dyrcona: thanks again! that fixed it. |
16:40 |
Dyrcona |
abowling: Cool. If you could update the Lp bug, that would be great. |
16:41 |
abowling |
update re: ehardy's comment, or just confirm that it works? |
16:41 |
abowling |
the latter of which i will do, gladly |
16:44 |
Dyrcona |
Yeah, just confirm it works. I'll take a look at it tomorrow. |
16:47 |
|
jvwoolf left #evergreen |
16:47 |
abowling |
Dyrcona: will do |
17:04 |
|
mmorgan left #evergreen |
18:00 |
pinesol |
News from qatests: Failed Installing Angular web client <http://testing.evergreen-ils.org/~live//archive/2022-02/2022-02-15_16:00:02/test.29.html> |
18:23 |
|
jeff joined #evergreen |
18:59 |
|
jonadab joined #evergreen |