Time | Nick | Message
07:14 | | rjackson_isl_hom joined #evergreen
07:20 | | collum joined #evergreen
07:54 | | BDorsey joined #evergreen
08:38 | | mmorgan joined #evergreen
08:47 | | mantis1 joined #evergreen
09:10 | | Dyrcona joined #evergreen
09:40 | * Dyrcona | wishes grep had an option to change the "newline" character.
09:41 | Dyrcona | Or the line/record separator. I guess that's partly why awk exists.
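Perl's $/ variable (the input record separator) does what Dyrcona is wishing for here; a minimal sketch of grepping binary MARC records, assuming a file where each record ends with \x1E\x1D (the pattern and filename are hypothetical):

    # Treat each binary MARC record as one "line" and grep inside it;
    # $. then counts records instead of lines.
    perl -ne 'BEGIN { $/ = "\x1e\x1d" } print "record $. matches\n" if /\xc3\xb1/' records.mrc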
09:48 | Dyrcona | Turns out that they aren't copyright characters causing my issues. Looks like they're supposed to be n with tilde and e with acute accent; at least the majority seem to be those two.
09:50 | Dyrcona | However, they show up in my output as a combination of two other characters.
09:50 | Dyrcona | At least when I grep the rejected records converted to XML.
09:55 | Dyrcona | Yeahp.... \xC3\xA9 is rendered as A with circumflex plus the copyright symbol, when it should be e with acute.
09:56 | Dyrcona | This is not a problem with the input as far as I can tell, and may not be an issue until it hits the database.
10:00 | Dyrcona | I'm not finding any actually invalid Unicode sequences in the files.
10:03 | Dyrcona | Lowercase n with tilde is showing up in my console as the two single-byte characters represented by its component bytes. Something isn't handling multibyte UTF-8 properly somewhere.
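What Dyrcona describes is classic mojibake: each byte of a multibyte UTF-8 sequence interpreted as a standalone Latin-1 character. A minimal sketch of the effect in Perl:

    use strict;
    use warnings;
    use Encode qw(decode);

    my $bytes = "\xC3\xB1";                    # UTF-8 encoding of U+00F1, n with tilde
    my $right = decode('UTF-8', $bytes);       # one character
    my $wrong = decode('ISO-8859-1', $bytes);  # two characters: the mojibake pair
    printf "decoded as UTF-8: %d char(s); as Latin-1: %d char(s)\n",
        length($right), length($wrong);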
10:04 | Dyrcona | So, it's probably the load program that I had so much trouble getting Unicode to work with in the past few months.
10:09 | | jvwoolf joined #evergreen
10:15 | Dyrcona | Definitely coming from MARC::Record->new_from_usmarc() in the load program, regardless of whether or not I set the file handle to UTF-8.
10:22 | Dyrcona | MARC::Charset is up to date (1.35).
10:28 | Dyrcona | MARC::Charset shouldn't be involved, and it doesn't look like it is. The error is not coming directly from MARC::File::USMARC::decode() either.
10:29 | Dyrcona | I tried installing the latest Encode.pm, and it made no difference.
10:30 | Dyrcona | Oh, I may stand corrected: https://metacpan.org/module/MARC::File::USMARC/source#L172
10:31 | Dyrcona | Bingo! Source of my error, and it is Encode.pm or the version of Unicode available to Perl on my system.
10:32 | Dyrcona | https://metacpan.org/module/MARC::File::Encode/source#L35
10:39 | Dyrcona | Or, maybe it's just "The Unicode Bug..." :(
10:45 | Dyrcona | Looks like I might be able to avoid this by converting the records to MARCXML first.
10:53 | Dyrcona | Outside of a MARC context, I can't make decode crash on those characters.
11:16 | | BDorsey joined #evergreen
11:20 | Dyrcona | Well, it blows up elsewhere using MARC::File::XML: 2 :129: parser error : Input is not proper UTF-8, indicate encoding !
11:20 | Dyrcona | Bytes: 0xA9 0x22 0x20 0x69
11:21 | | jihpringle joined #evergreen
11:25 | Dyrcona | @monologue
11:25 | pinesol | Dyrcona: Your current monologue is at least 23 lines long.
11:39 | Dyrcona | Ha! I want a goto for a valid reason. I want a label outside my main loop that I can branch to when there is an error. Otherwise, I don't want to interfere with the flow.
11:42 | Dyrcona | Maybe I just need to change my loop to a do-while.
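Perl's loop labels already cover the branch-on-error case without goto; a minimal sketch, where $batch, process(), and load() are hypothetical stand-ins:

    RECORD: while ( my $record = $batch->next() ) {
        my $ok = eval { process($record); 1 };
        if ( !$ok ) {
            warn "skipping bad record: $@";
            next RECORD;    # jump straight to the next iteration, no goto needed
        }
        load($record);
    }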
11:46 | Dyrcona | Well, MARC::File::XML just makes it worse. More records get spit out that way.
11:52 | Dyrcona | I think the increase in errors comes from the records with invalid lengths and indicators getting mangled when converted to XML.
12:31 | csharp_ | Dyrcona: fwiw, I usually learn from your monologues :-)
12:31 | csharp_ | Dyrcona++
12:36 | mmorgan | Dyrcona++
12:37 | Dyrcona | Thanks!
12:41 | Dyrcona | I used yaz-marcdump to convert the binary MARC to XML, and that was after I had preprocessed the file from the vendor. So it could be that my preprocessor program is writing junk, but I can't find bad UTF-8 in it.
12:42 | Dyrcona | It just looks like when going through the MARC modules, Encode suddenly doesn't like otherwise valid \xC2 and \xC3 sequences.
12:56 | Dyrcona | Very interesting: If I use a program to split the binary file into records using \x1E\x1D as the input record separator and then run decode('UTF-8', $raw_record), I get no errors, so the issue is definitely coming from the MARC modules somehow.
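A minimal sketch of that record-splitting check, with a hypothetical filename; Encode::FB_CROAK makes decode() die loudly on the first bad sequence:

    use strict;
    use warnings;
    use Encode ();

    local $/ = "\x1E\x1D";    # MARC end-of-record: field terminator + record terminator
    open my $fh, '<:raw', 'preprocessed.mrc' or die $!;
    while ( my $raw_record = <$fh> ) {
        eval { Encode::decode('UTF-8', $raw_record, Encode::FB_CROAK); 1 }
            or warn "record $. failed to decode: $@";
    }
    close $fh;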
12:57 | Dyrcona | That's using the preprocessed file. I'll see what happens with the files directly from the vendor.
12:58 | Dyrcona | Ditto... Zero errors.
13:06 | | jihpringle joined #evergreen
13:13 | csharp_ | Dyrcona: I know you've been working on this for days and have probably ruled this out, but I've seen stupid stuff where the \XC2 literal characters were themselves mis-encoded somehow
13:14 | csharp_ | as in literally "\XC2" where one of those was some Unicode character that escaped notice
13:16 | Dyrcona | csharp_: I originally thought that was the problem, or rather a MARC-8 \xC2 that got into the UTF-8 data. \xC2 in MARC-8 is the circled-P (sound recording copyright) symbol, and \xC3 is the regular copyright symbol.
13:17 | Dyrcona | However, the input actually has valid UTF-8 sequences, and using Encode::decode on the raw data on a record-by-record basis does not output any errors. The errors come when MARC::File::USMARC::decode() is run on a record.
13:18 | csharp_ | ah
13:19 | Dyrcona | Hmm... I have another idea.....
13:23 | Dyrcona | I tried MARC::File::USMARC::decode() on the files from the vendor and there are no errors. When I run it on the preprocessed file, I get errors. So, my preprocessor must be doing something wrong, even though its output decodes as UTF-8 just like the raw input....
13:25 | Dyrcona | If I have to set the output stream to UTF-8, I'll be upset. I spent several days fiddling with that before, and I swear that I got it right.....
13:27 | Dyrcona | So, I'm already setting binmode on the output to :utf8. Maybe I should do :bytes or :raw?
13:32 | Dyrcona | I guess :raw isn't a thing....
13:33 | Dyrcona | We may have a winner. Using :bytes produced a smaller output file.
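So the apparent culprit: the preprocessor's output handle carried a :utf8 layer, which re-encodes data that is already UTF-8 bytes, hence the double encoding and the larger file. A minimal sketch of the distinction (filename hypothetical); note that :raw actually is a valid PerlIO layer and would also work here:

    # Wrong for already-encoded data: $raw holds UTF-8 *bytes*, and a :utf8
    # layer encodes each byte again, double-encoding everything above ASCII.
    open my $bad, '>:utf8', 'out.mrc' or die $!;

    # Right: pass the encoded bytes through untouched.
    open my $good, '>:bytes', 'out.mrc' or die $!;

    # Use :utf8/:encoding(UTF-8) only when printing *decoded* character strings.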
13:33 | Dyrcona | I get the same number of records.
13:35 | Dyrcona | Huzzah! USMARC::decode and Encode::decode both like all of the records now!
13:35 | Dyrcona | csharp_++ mmorgan++ #evergreen++ For putting up with me.
13:38 | Dyrcona | csharp_++ again for suggesting an encoding issue. Looks like my preprocessor was double-encoding some characters.
14:03 | csharp_ | Dyrcona: oh wow
14:11 | Dyrcona | This line in the perlunicode documentation is misleading: Use the ":encoding(...)" layer to read from and write to filehandles using the specified encoding.
14:11 | Dyrcona | I suspect it only applies if you're not manually decoding the data, which the MARC code does.
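A minimal sketch of the interaction suspected here: if the handle carries an :encoding(...) layer and the code then decodes again by hand, as the MARC modules do, the second decode() is fed character data instead of bytes and can mangle or die on anything non-ASCII (filename hypothetical):

    use Encode qw(decode);

    open my $fh, '<:encoding(UTF-8)', 'records.xml' or die $!;  # layer decodes once
    my $line = <$fh>;                    # already a character string, not bytes
    my $oops = decode('UTF-8', $line);   # decoding a second time garbles non-ASCII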
14:32 | jeffdavis | bug 1979345 adds a new permission to govern the hold pull list; is that an OK change to include in a point release, assuming there's a release note?
14:32 | pinesol | Launchpad bug 1979345 in Evergreen "Angular Holds Pull List Doesn't Scope" [Medium,Confirmed] https://launchpad.net/bugs/1979345
14:34 | csharp_ | I would probably wait until the next release
14:34 | csharp_ | unless the Ang list is so broken that we need it more urgently
14:34 | Dyrcona | I was typing something along the lines of what csharp_ said.
14:46 | jeffdavis | I wouldn't say it's broken, it just has no access controls. 3.8 added the ability to view any library's pull list (previously you could only view it for your workstation location).
14:46 | jeffdavis | We're going live with the new perm when we upgrade to 3.9 this weekend, I'll reconcile myself to having to renumber our permissions at some point. :)
14:48 | | rfrasur joined #evergreen
14:59 | Dyrcona | We've had to do things like that after upgrades before because we've backported things from future releases.
15:00 | Dyrcona | On the subject of MARC and encoding, just to complicate things: if you're pulling records from Evergreen via DBI as MARCXML and then converting them to USMARC to write to a file, you have to set the output stream to utf8 encoding, or you get errors reading the output file.
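A minimal sketch of that round trip, assuming MARC::File::XML is in use and $marcxml arrived from DBI as an already-decoded character string (filename and variable are hypothetical):

    use MARC::Record;
    use MARC::File::XML ( BinaryEncoding => 'utf8' );

    my $record = MARC::Record->new_from_xml($marcxml, 'UTF-8');
    # as_usmarc() here yields a character string, so the handle must encode:
    open my $out, '>:utf8', 'export.mrc' or die $!;
    print {$out} $record->as_usmarc();
    close $out;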
15:02 | Dyrcona | It's always fun to relearn things like this every 3 or so years. :)
15:10 | Dyrcona | jeffdavis: One thing that I often do is wait for the code to make it into master, then I cherry-pick the commits into my local branch so I have the correct id numbers and db upgrade codes. This makes generating the db upgrade script for future upgrades easier.
15:11 | jeffdavis | hm, and I guess there's nothing preventing that commit from going into master even if it doesn't get backported to 3.8/3.9
15:12 | Dyrcona | If I add it before that, I'll go back and cherry-pick the commits that changed the upgrade script name and assigned the upgrade number for the same reason.
15:13 | Dyrcona | Yes, that's true about going into master. Also, keep in mind you've only heard from two of us here. I'm not totally opposed to backporting permissions, particularly if the thing is broken without it or if it was an oversight in a new feature.
15:13 | Dyrcona | If the permission was something that someone thought would be nice to have later, then it's more of a feature than a bug fix to my mind.
15:49 | * mmorgan | reads up
15:51 | mmorgan | jeffdavis: Not an answer to your question, but the Holds Shelf has always allowed viewing the Shelves of other org units.
16:03 | jeffdavis | I think we ultimately want a perm to control that too, I'll have to check.
16:07 | | jvwoolf left #evergreen
16:14 | jihpringle | yes, if we haven't already, we're planning on submitting the hold shelf scope as a bug too
16:14 | jihpringle | it causes patron privacy issues for us
16:23 | mmorgan | jihpringle: Understood. Patron privacy is a priority for us also, but our libraries share patrons.
16:25 | mmorgan | Do you limit access to patrons in other parts of Evergreen with permissions? Like in Patron search?
16:27 | jihpringle | mmorgan: we do limit in other places, but a lot of it is based on opt-in boundaries through the library settings
16:28 | mmorgan | Ah. Ok.
16:30 | * mmorgan | was wondering about something like the depth of the VIEW_USER permission
16:38 | jihpringle | we have that perm set pretty high up our org tree, I think so that staff are able to see the patrons who've opted in from other libraries
16:39 | jihpringle | we want Library A to be able to see patron X from Library B that's opted into Library A, but we don't want Library A to be able to see the entire hold shelf or pull list for Library B
16:45 | jeffdavis | we do also use VIEW_USER depth to restrict access in some cases - for example, public library staff can't view users at post-secondary libraries
16:46 | mmorgan | jihpringle: Makes sense. A little too "in your face". We strongly discourage showing patron names on the pull list at all. We have default columns set for the pull list that exclude patron information.
16:46 | mmorgan | Can't prevent people from adding it though.
16:47 | jihpringle | ya, we try and discourage people from including patron info on the pull list, but as you said, we can't prevent people from adding it
16:52 | | mmorgan1 joined #evergreen
17:15 | | mmorgan1 left #evergreen
17:50 | | degraafk_ joined #evergreen
17:51 | | troy_ joined #evergreen
22:55 | | jeffdavis_ joined #evergreen
22:56 | | jmurray_isl joined #evergreen
23:04 | | akilsdonk joined #evergreen