Evergreen ILS Website

IRC log for #evergreen, 2022-03-01

All times shown according to the server's local time.

Time Nick Message
06:00 pinesol News from qatests: Failed Running pgTAP tests <http://testing.evergreen-ils.org/~live//archive/2022-03/2022-03-01_04:00:02/test.42.html>
07:13 rjackson_isl_hom joined #evergreen
07:16 collum joined #evergreen
08:00 mantis joined #evergreen
08:01 JBoyer ^ A test about spaces around values was *also* testing whether or not URIs were deleted when records are changed. We don't do that anymore, and now the test is updated.
08:23 pinesol News from commits: Update Test for LP1722827 After LP1482757 <https://git.evergreen-ils.org/?p=Evergreen.git;a=commitdiff;h=e9c6750312c9c1ba001119e918ac84f4bcd017a6>
08:33 mmorgan joined #evergreen
08:40 csharp_ JBoyer++
08:40 jvwoolf joined #evergreen
08:41 mmorgan JBoyer++
08:43 Dyrcona joined #evergreen
08:47 JBoyer joined #evergreen
09:00 Keith-isl joined #evergreen
09:01 rfrasur joined #evergreen
09:09 mmorgan1 joined #evergreen
09:24 * Dyrcona mumbles "smart quotes...."
09:27 Dyrcona Also, just plain junk in these records.
09:33 Dyrcona I wonder if we're using an outdated Unicode standard?--That "we" is meant to be vague, i.e. not necessarily Evergreen.
09:39 Dyrcona So, it looks like the preprocessing script does something to the records. Maybe I need to tell MARC::Record not to mangle the characters, somehow?
09:56 Dyrcona Or, maybe it doesn't.... Comparing dumps of the processed versus raw records, the relevant bits of the busted records look the same.
10:08 Dyrcona I wonder if converting the records to marcxml will make a difference?
10:10 Dyrcona So, do I convert them with yaz-marcdump or with MARC::File::XML?
10:15 Dyrcona Right, so when I use yaz-marcdump to convert the input records to marcxml, my editor shows the characters correctly. For some reason, I think my editor is treating the dumps as latin-1, even when I tell it they are UTF-8. What I suspect is MARC::Record and friends are mangling the characters because I'm working with binary MARC.
10:16 Dyrcona I have little proof, other than what I see in the files through my editor, and that the records get mangled by my preprocessor Perl program.
10:17 Dyrcona I will adapt my preprocessor and loader to work with marcxml and see what happens.
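The plan to move the preprocessor to MARCXML can be sketched in Python using only the standard library. This is an illustrative stand-in. It is not Dyrcona's Perl, and it is not how yaz-marcdump or MARC::File::XML work internally. The function name `marc_to_marcxml` is hypothetical. The sketch assumes well-formed MARC 21 records whose leader/09 is `a` (UTF-8) and does not handle MARC-8. What it shows is that the leader and directory give lengths and offsets in bytes. The record therefore has to be cut up as bytes, and each piece decoded only afterwards. If the whole record is decoded first, every offset after the first multi-byte character (such as a smart quote) is wrong.

```python
import xml.etree.ElementTree as ET

FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"
MARCXML_NS = "http://www.loc.gov/MARC21/slim"


def marc_to_marcxml(raw: bytes) -> str:
    """Convert one binary MARC 21 record to a MARCXML string (hypothetical helper).

    Assumes a well-formed record declared as UTF-8 in leader/09; MARC-8 is
    not handled. All slicing uses byte offsets, and each field is decoded
    only after it has been cut out of the raw bytes.
    """
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])  # leader/12-16: base address of data, in bytes
    record = ET.Element("record", {"xmlns": MARCXML_NS})
    ET.SubElement(record, "leader").text = leader

    # The directory runs from byte 24 up to the first field terminator;
    # each entry is 12 bytes: tag (3), field length (4), start offset (5).
    directory = raw[24:raw.index(FIELD_TERMINATOR, 24)]
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length].rstrip(FIELD_TERMINATOR)

        if tag < "010":  # control fields 001-009 carry no indicators/subfields
            ET.SubElement(record, "controlfield", {"tag": tag}).text = data.decode("utf-8")
            continue

        field = ET.SubElement(record, "datafield",
                              {"tag": tag, "ind1": chr(data[0]), "ind2": chr(data[1])})
        # Subfields start with the 0x1F delimiter followed by a one-byte code.
        for chunk in data[2:].split(SUBFIELD_DELIMITER)[1:]:
            sub = ET.SubElement(field, "subfield", {"code": chr(chunk[0])})
            sub.text = chunk[1:].decode("utf-8")  # decode after slicing

    return ET.tostring(record, encoding="unicode")
```

Because the output is a Python `str` and contains no raw MARC bytes, a later stage can write it with a single explicit UTF-8 encode.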
10:18 Dyrcona BTW, the input records say they are UTF-8 in the leader.
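MARC 21 declares the character coding scheme in leader position 09. A value of `a` means UCS/Unicode, and a blank means MARC-8. The leader is only a declaration, though. A record can claim UTF-8 and still contain bytes that are not valid UTF-8, such as single-byte Windows-1252 smart quotes. That particular source of "junk" is an assumed example; the log does not confirm it. A small Python sketch with hypothetical helper names checks the flag and the bytes separately:

```python
def leader_encoding(raw: bytes) -> str:
    """Report the coding scheme declared in leader/09 (hypothetical helper)."""
    flag = raw[9:10]
    if flag == b"a":
        return "UTF-8"   # 'a' = UCS/Unicode
    if flag == b" ":
        return "MARC-8"  # blank = MARC-8
    return "unknown (%r)" % flag


def body_is_utf8(raw: bytes) -> bool:
    """Check whether the bytes actually decode as UTF-8, whatever the leader says."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```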
10:18 * Dyrcona quacks.
10:19 Dyrcona Hmm... Should I use MARC::File::XML on these records, or should I use LibXML? I can do what I want with either.....
10:23 * Dyrcona should write a MARC mode for Emacs. It couldn't be that hard... :)
10:26 Dyrcona @monologue
10:26 pinesol Dyrcona: Your current monologue is at least 15 lines long.
10:28 Dyrcona So, yeah, the MARC::Record code that I'm using is mangling the characters.
10:30 Dyrcona When I tell Emacs to use UTF-8 with one of the files, I get this: "...encountered characters it couldn’t encode..." followed by a list of characters that won't paste into my IRC client.
10:30 Dyrcona The error message is much more detailed.
10:31 Dyrcona I open the original MARC file and the mangled characters show up correctly in Emacs.
10:31 Dyrcona Proof!
10:55 rjackson_isl_hom joined #evergreen
11:01 Dyrcona I wonder if the problem is how I'm reading the binary files. I open it via IO::File with the record separator set to \x1e\x1d because records with smart quotes would break with MARC::Batch or MARC::File::USMARC. After I get the raw MARC, I feed that to MARC::Record. I suspect that is where the breakage occurs.
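Setting the record separator to `\x1e\x1d` splits the file on the terminator of the last field (0x1E) followed by the record terminator (0x1D). This does not rely on the record length stored in leader/00-04. The Python sketch below is a rough equivalent, using a hypothetical `split_records` helper. It keeps each record as raw bytes, so nothing is decoded before a MARC parser sees it:

```python
from typing import Iterator

# Last field terminator (0x1E) immediately followed by the record terminator (0x1D).
RECORD_END = b"\x1e\x1d"


def split_records(data: bytes) -> Iterator[bytes]:
    """Yield raw MARC records (hypothetical helper).

    Splits on the end-of-record sequence instead of trusting leader/00-04,
    and returns bytes; any trailing data after the last terminator is ignored.
    """
    start = 0
    while True:
        end = data.find(RECORD_END, start)
        if end == -1:
            break
        yield data[start:end + len(RECORD_END)]
        start = end + len(RECORD_END)
```

The input would be read in binary mode, for example `open(path, "rb")`, so that Python does not decode the file before it is split.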
11:01 Dyrcona Could be that I need to decode the data before passing it to MARC::Record?
11:01 rjackson_isl_hom joined #evergreen
11:01 Dyrcona Or would that be encode?
11:03 Dyrcona Well, I can try it and see.
11:03 Dyrcona That would be quicker than switching to MARCXML.
11:03 Dyrcona @monologue
11:03 pinesol Dyrcona: Your current monologue is at least 26 lines long.
11:07 Dyrcona Another clue: using decode on the raw input gives me several "wide character at" messages.
11:07 Dyrcona I may need to set the encoding to UTF-8 with IO::File....
11:08 Dyrcona So, decode on the input doesn't work, at least not without encode on the output.
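The decode/encode pairing is easier to see in Python, where bytes and characters are separate types. This is an analogy to Perl's `Encode::decode` and `Encode::encode`, not Perl itself, and the helper names are hypothetical. Decoding turns UTF-8 bytes into characters, so a right single quotation mark becomes U+2019. That code point is above U+00FF and has no single-byte form. Perl warns "Wide character" when a string like that is printed to a handle with no encoding layer, which is consistent with the messages above.

```python
def roundtrip(raw: bytes) -> bytes:
    """Decode UTF-8 bytes to characters, then encode them back for output."""
    text = raw.decode("utf-8")   # bytes -> characters (cf. Encode::decode)
    # ...character-level processing would happen here...
    return text.encode("utf-8")  # characters -> bytes (cf. Encode::encode)


def has_wide_chars(text: str) -> bool:
    """True if any character is above U+00FF and so has no one-byte form."""
    return any(ord(ch) > 0xFF for ch in text)


raw = "Smart \u2019quotes\u2019".encode("utf-8")
print(has_wide_chars(raw.decode("utf-8")))    # True: U+2019 is a wide character
print(has_wide_chars(raw.decode("latin-1")))  # False: every byte became one char
print(roundtrip(raw) == raw)                  # True
```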
11:09 Dyrcona So, I'm already setting binmode on the input and output.
11:09 Dyrcona Maybe that's my problem?
11:11 Dyrcona Right. Removing the binmodes and doing the manual decode seems to work. I'm going to remove the decode to see what happens.
11:12 Dyrcona Ok, so the decode is not necessary, either.
11:14 Dyrcona Let's see what happens if I set the input file to utf-8 binmode. I suspect it was the output file that was the problem.
11:16 Dyrcona Yeah, that still produces the correct output.
11:17 Dyrcona OK. Setting the output to UTF-8, but not the input, also works. I think it's doing both that causes the problem, and if not, I'm being gaslighted.....
11:18 Dyrcona No! why does it work, now?
11:18 Dyrcona What else changed?
11:19 Dyrcona Oh, wait... was it just using the Encode module?
11:22 Dyrcona I'm stumped.....
11:23 Dyrcona *facepalm*
11:28 Dyrcona All right, I take that facepalm back. It wasn't that obvious.... Encoding in Perl can be tricky....
11:31 Dyrcona So, it is double encoding. Looks like I can specify encoding on either the input or output, but not both.
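Double encoding can be reproduced in a few lines of Python. This is an analogy, not Dyrcona's Perl, and the function names are hypothetical. Bytes that are already UTF-8 are treated as one Latin-1 character per byte and then encoded to UTF-8 a second time. As a result, each non-ASCII byte grows into two bytes. This is the symptom of one encoding step too many.

```python
def double_encode(raw: bytes) -> bytes:
    """Encode already-UTF-8 bytes a second time: each byte is taken as one
    Latin-1 character, and those characters are encoded to UTF-8 again."""
    return raw.decode("latin-1").encode("utf-8")


def undo_double_encoding(mangled: bytes) -> bytes:
    """Reverse exactly one extra round of encoding."""
    return mangled.decode("utf-8").encode("latin-1")


quote = "\u2019".encode("utf-8")              # b'\xe2\x80\x99'
mangled = double_encode(quote)
print(mangled)                                 # b'\xc3\xa2\xc2\x80\xc2\x99'
print(undo_double_encoding(mangled) == quote)  # True
```

Viewed as UTF-8 text, the mangled bytes read as "â" followed by two invisible control characters (U+0080 and U+0099). This kind of garbage typically appears when an encoding layer is applied on both the input side and the output side.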
11:33 Dyrcona Or, even not at all, since Perl defaults to UTF-8.
11:33 Dyrcona @monologue
11:33 pinesol Dyrcona: Your current monologue is at least 45 lines long.
11:34 Dyrcona Thanks, ducky!
11:34 Dyrcona I should check the load script. Not sure it does any encoding.
11:38 Dyrcona Also, TRAMP++ EMACS++ # I edited the file on the remote test server, committed it to the repo on that test server, then pushed it to the main repo, all using Emacs running on my laptop.... BTW, I know you can do that in vim, but most vim users don't know that. :)
11:41 Dyrcona It would be nice if there was an easy way in git to apply a commit to a different, almost identical file. There probably is, and I just don't have the magic sauce.
11:41 Dyrcona @monologue
11:41 pinesol Dyrcona: Your current monologue is at least 50 lines long.
12:10 jihpringle joined #evergreen
12:45 collum joined #evergreen
17:04 mmorgan1 qexit
17:04 mmorgan1 Oops. Wrong window. Broke Dyrcona's monologue. :-(
17:04 mmorgan1 left #evergreen
17:18 jvwoolf left #evergreen
17:48 Keith_isl joined #evergreen
18:00 pinesol News from qatests: Testing Success <http://testing.evergreen-ils.org/~live>
18:25 jihpringle joined #evergreen
20:25 JBoyer joined #evergreen
21:11 Keith-isl joined #evergreen
