Time |
Nick |
Message |
06:00 |
pinesol |
News from qatests: Failed Running pgTAP tests <http://testing.evergreen-ils.org/~live//archive/2022-03/2022-03-01_04:00:02/test.42.html> |
07:13 |
|
rjackson_isl_hom joined #evergreen |
07:16 |
|
collum joined #evergreen |
08:00 |
|
mantis joined #evergreen |
08:01 |
JBoyer |
^ A test about spaces around values was *also* testing whether or not URIs were deleted when records are changed. We don't do that anymore, and now the test is updated. |
08:23 |
pinesol |
News from commits: Update Test for LP1722827 After LP1482757 <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=e9c6750312c9c1ba001119e918ac84f4bcd017a6> |
08:33 |
|
mmorgan joined #evergreen |
08:40 |
csharp_ |
JBoyer++ |
08:40 |
|
jvwoolf joined #evergreen |
08:41 |
mmorgan |
JBoyer++ |
08:43 |
|
Dyrcona joined #evergreen |
08:47 |
|
JBoyer joined #evergreen |
09:00 |
|
Keith-isl joined #evergreen |
09:01 |
|
rfrasur joined #evergreen |
09:09 |
|
mmorgan1 joined #evergreen |
09:24 |
* Dyrcona |
mumbles "smart quotes...." |
09:27 |
Dyrcona |
Also, just plain junk in these records. |
09:33 |
Dyrcona |
I wonder if we're using an outdated Unicode standard?--That "we" is meant to be vague, i.e. not necessarily Evergreen. |
09:39 |
Dyrcona |
So, it looks like the preprocessing script does something to the records. Maybe I need to tell MARC::Record not to mangle the characters, somehow? |
09:56 |
Dyrcona |
Or, maybe it doesn't.... Comparing dumps of the processed versus raw records, the relevant bits of the busted records look the same. |
10:08 |
Dyrcona |
I wonder if converting the records to marcxml will make a difference? |
10:10 |
Dyrcona |
So, do I convert them with yaz-marcdump or with MARC::File::XML? |
10:15 |
Dyrcona |
Right, so when I use yaz-marcdump to convert the input records to marcxml, my editor shows the characters correctly. For some reason, I think my editor is treating the dumps as latin-1, even when I tell it they are UTF-8. What I suspect is MARC::Record and friends are mangling the characters because I'm working with binary MARC. |
10:16 |
Dyrcona |
I have little proof, other than what I see in the files through my editor, and that the records get mangled by my preprocessor Perl program. |
10:17 |
Dyrcona |
I will adapt my preprocessor and loader to work with marcxml and see what happens. |
10:18 |
Dyrcona |
BTW, the input records say they are UTF-8 in the leader. |
10:18 |
* Dyrcona |
quacks. |
10:19 |
Dyrcona |
Hmm... Should I use MARC::File::XML on these records, or should I use LibXML? I can do what I want with either..... |
10:23 |
* Dyrcona |
should write a MARC mode for Emacs. It couldn't be that hard... :) |
10:26 |
Dyrcona |
@monologue |
10:26 |
pinesol |
Dyrcona: Your current monologue is at least 15 lines long. |
10:28 |
Dyrcona |
So, yeah, the MARC::Record code that I'm using is mangling the characters. |
10:30 |
Dyrcona |
When I tell Emacs to use UTF-8 with one of the files, I get this: "...encountered characters it couldn’t encode..." followed by a list of characters that won't paste into my IRC client. |
10:30 |
Dyrcona |
The error message is much more detailed. |
10:31 |
Dyrcona |
I open the original MARC file and the mangled characters show up correctly in Emacs. |
10:31 |
Dyrcona |
Proof! |
10:55 |
|
rjackson_isl_hom joined #evergreen |
11:01 |
Dyrcona |
I wonder if the problem is how I'm reading the binary files. I open it via IO::File with the record separator set to \x1e\x1d because records with smart quotes would break with MARC::Batch or MARC::File::USMARC. After I get the raw MARC, I feed that to MARC::Record. I suspect that is where the breakage occurs. |
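Editor's sketch of the reading approach described above, written in Python rather than the Perl used in the discussion. It assumes the file is handled as raw bytes and split on the two-byte sequence `\x1e\x1d` (field terminator followed by record terminator). The function name `split_marc_records` and the sample data are hypothetical, and the sample is not a valid MARC record; it only shows that splitting on bytes leaves multi-byte UTF-8 characters such as smart quotes intact.

```python
RECORD_END = b"\x1e\x1d"  # field terminator + record terminator


def split_marc_records(data: bytes) -> list[bytes]:
    """Split a raw byte buffer into chunks that each end with RECORD_END."""
    records = []
    start = 0
    while True:
        end = data.find(RECORD_END, start)
        if end == -1:
            break  # no further complete record in the buffer
        stop = end + len(RECORD_END)
        records.append(data[start:stop])
        start = stop
    return records


# Hypothetical sample: two fake "records", the first containing smart quotes.
sample = (
    "first \u201cquoted\u201d".encode("utf-8") + RECORD_END
    + b"second" + RECORD_END
)

records = split_marc_records(sample)

# Decode exactly once, and only when text is needed for display.
decoded = [r.decode("utf-8") for r in records]
```

Because the separator bytes are below 0x80, they cannot occur inside a multi-byte UTF-8 sequence, so splitting at the byte level does not cut a character in half.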
11:01 |
Dyrcona |
Could be that I need to decode the data before passing it to MARC::Record? |
11:01 |
|
rjackson_isl_hom joined #evergreen |
11:01 |
Dyrcona |
Or would that be encode? |
11:03 |
Dyrcona |
Well, I can try it and see. |
11:03 |
Dyrcona |
That would be quicker than switching to MARCXML. |
11:03 |
Dyrcona |
@monologue |
11:03 |
pinesol |
Dyrcona: Your current monologue is at least 26 lines long. |
11:07 |
Dyrcona |
Another clue: using decode on the raw input gives me several "wide character at" messages. |
11:07 |
Dyrcona |
I may need to set the encoding to UTF-8 with IO::File.... |
11:08 |
Dyrcona |
So, decode on the input doesn't work, at least not without encode on the output. |
11:09 |
Dyrcona |
So, I'm already setting binmode on the input and output. |
11:09 |
Dyrcona |
Maybe that's my problem? |
11:11 |
Dyrcona |
Right. Removing the binmodes and doing the manual decode seems to work. I'm going to remove the decode to see what happens. |
11:12 |
Dyrcona |
Ok, so the decode is not necessary, either. |
11:14 |
Dyrcona |
Let's see what happens if I set the input file to utf-8 binmode. I suspect it was the output file that was the problem. |
11:16 |
Dyrcona |
Yeah, that still produces the correct output. |
11:17 |
Dyrcona |
Ok. setting the output to utf-8, but not the input, also works. I think it's doing both that causes the problem, and if not, I'm being gaslighted..... |
11:18 |
Dyrcona |
No! why does it work, now? |
11:18 |
Dyrcona |
What else changed? |
11:19 |
Dyrcona |
Oh, wait... was it just using the Encode module? |
11:22 |
Dyrcona |
I'm stumped..... |
11:23 |
Dyrcona |
*facepalm* |
11:28 |
Dyrcona |
All right, I take that facepalm back. It wasn't that obvious.... Encoding in Perl can be tricky.... |
11:31 |
Dyrcona |
So, it is double encoding. Looks like I can specify encoding on either the input or output, but not both. |
11:33 |
Dyrcona |
Or, even not at all, since Perl defaults to UTF-8. |
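Editor's sketch of the double-encoding effect diagnosed above, in Python rather than Perl. It is an analogy under stated assumptions, not a reproduction of the preprocessor: it assumes the data is already UTF-8 bytes, and that the faulty step treats each of those bytes as a Latin-1 character before encoding to UTF-8 a second time. All variable names are hypothetical.

```python
# A string with smart quotes, like the ones in the problem records.
text = "\u201csmart quotes\u201d"

# Correctly encoded once: U+201C becomes the three bytes E2 80 9C.
utf8_bytes = text.encode("utf-8")

# The mistake: the bytes are taken to be Latin-1 characters and are
# encoded to UTF-8 again, so every byte >= 0x80 turns into two bytes.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")

# The fix: decode as UTF-8 exactly once on input, encode exactly once on output.
round_trip = utf8_bytes.decode("utf-8").encode("utf-8")

# If the only damage is one extra encoding pass, it can be reversed.
repaired = double_encoded.decode("utf-8").encode("latin-1")

print(utf8_bytes[:3].hex(" "))
print(double_encoded[:6].hex(" "))
```

The first `print` shows the three bytes of the opening quote and the second shows the six bytes they become after the extra pass, which is what an editor then renders as junk characters.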
11:33 |
Dyrcona |
@monologue |
11:33 |
pinesol |
Dyrcona: Your current monologue is at least 45 lines long. |
11:34 |
Dyrcona |
Thanks, ducky! |
11:34 |
Dyrcona |
I should check the load script. Not sure it does any encoding. |
11:38 |
Dyrcona |
Also, TRAMP++ EMACS++ # I edited the file on the remote test server, committed it to the repo on that test server, then pushed it to the main repo, all using Emacs running on my laptop.... BTW, I know you can do that in vim, but most vim users don't know that. :) |
11:41 |
Dyrcona |
It would be nice if there was an easy way in git to apply a commit to a different, almost identical file. There probably is, and I just don't have the magic sauce. |
11:41 |
Dyrcona |
@monologue |
11:41 |
pinesol |
Dyrcona: Your current monologue is at least 50 lines long. |
12:10 |
|
jihpringle joined #evergreen |
12:45 |
|
collum joined #evergreen |
17:04 |
mmorgan1 |
qexit |
17:04 |
mmorgan1 |
Oops. wrong window. Broke Dyrcona's monologue. :-( |
17:04 |
|
mmorgan1 left #evergreen |
17:18 |
|
jvwoolf left #evergreen |
17:48 |
|
Keith_isl joined #evergreen |
18:00 |
pinesol |
News from qatests: Testing Success <http://testing.evergreen-ils.org/~live> |
18:25 |
|
jihpringle joined #evergreen |
20:25 |
|
JBoyer joined #evergreen |
21:11 |
|
Keith-isl joined #evergreen |