13:26 | Bmagic | small ones tend to load completely. But this interface is still Dojo, so I don't imagine anyone is interested until it's Angular |
13:41 | pinesol | [opensrf|kenstir] Fix LP#1883169 by using growing_buffer - <http://git.evergreen-ils.org/?p=OpenSRF.git;a=commit;h=a3368f9> |
13:42 | jvwoolf | Interestingly, we had one with 6 items fail to load |
13:44 | Dyrcona | jvwoolf: How big is the MARC associated with those? |
13:45 | Dyrcona | I'm pretty sure there is some MARC pulled over as well, though I might be thinking of something else. |
13:45 | Dyrcona | Oops! |
13:45 | jvwoolf joined #evergreen | |
13:46 | Dyrcona | jvwoolf: Check the public irc log. I think you might have missed something that I said. |
13:47 | jvwoolf | Dyrcona: Will do, my client keeps crashing today :( |
13:48 | jvwoolf | Cataloger says the MARC is "nothing out of the ordinary" |
13:52 | Dyrcona | Default max stanza size on Ubuntu 18.04 is 64K, IIRC, so if the MARC is around 10K each, or even for a few of them, thee you go. |
13:57 | Dyrcona | s/thee/there/ |
13:58 | jvwoolf | Dyrcona: Fair enough |
14:00 | jvwoolf joined #evergreen |
12:54 | berick | Dyrcona: it should pull from settings, but you can also override with --tempdir |
12:54 | collum joined #evergreen | |
13:01 | Dyrcona | berick: I see that, and I'm not sure that's my problem. The file in /tmp may or may not be related. It could be some Net::Server thing. |
13:04 | Dyrcona | Yeah, the 0 byte file in /tmp seems to get dropped when I kill the marc stream importer and then shows up when I start it. |
13:05 | Dyrcona | I think I remembered reading in the Net::Server docs somewhere that something required Socket::Linux, so I've installed that module via apt packages. We'll see if that makes a difference. |
13:09 | Dyrcona | ah ha. Maybe this is the problem: opensrf.settings.host_config.get training.cwmars.org, |
13:14 | mixo joined #evergreen | |
13:16 | mixo | thank you for help |
13:18 | Dyrcona | So, that opensrf.settings.host_config.get call seems to be coming from SettingsClient, but the subsequent opensrf.settings.default_config.get call does not appear to be logged, so I don't think getting the settings is the problem. |
13:18 | Dyrcona | mixo: On behalf of those who helped you, "You're most welcome!" |
13:22 | Dyrcona | berick: Can I just throw a binary marc file at the port to test it? (It looks like it, but thought I'd ask.) |
13:33 | Dyrcona | Well, I tried lobbing a MARC record at it using netcat and nothing happened. |
13:36 | Dyrcona | Interesting, it actually got something in my case: Sep 14 13:27:23 training /openils/bin/marc_stream_importer.pl: [INFO:5664:marc_stream_importer.pl:449:163163899556641] stream parser read 1603 bytes |
13:37 | Dyrcona | Whatever happens after that, I don't see authentication nor vandelay calls in the logs, and yes, I'm using syslog for everything. |
13:48 | Dyrcona | So, it is failing to create the temp file, but that failure is not logged. That's what it looks like right now. |
13:20 | abneiman joined #evergreen | |
13:20 | jweston joined #evergreen | |
13:38 | collum joined #evergreen | |
14:47 | Dyrcona | Does anyone use the MARC stream importer with OCLC? |
14:47 | Dyrcona | I have questions. |
14:51 | berick | Dyrcona: affirmative |
14:52 | JBoyer | claiming 1284 for (void *) |
08:52 | Dyrcona | I'll run my output through yaz-marcdump and see what it says. Most of the length errors seem to involve records with multibyte characters and we say the record is two to four bytes longer than the vendor does. I suspect that they might not be counting lengths correctly. |
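One way to picture the suspicion above: the record length in leader positions 00-04 counts bytes, so a producer that counts characters would come up short by one or more bytes for every multibyte character. Below is a minimal Python sketch of that gap. The helper names and the sample string are made up for illustration, and UTF-8 is used for convenience; MARC-8 combining characters produce the same kind of byte-versus-character difference.

```python
def declared_length(record: bytes) -> int:
    """Record length as stated in leader positions 00-04."""
    return int(record[0:5].decode("ascii"))


def length_mismatch(record: bytes) -> int:
    """Actual byte length minus declared length; 0 means they agree."""
    return len(record) - declared_length(record)


# Hypothetical sample text with two 2-byte characters (ñ and é).
sample = "Señor Café"
char_count = len(sample)                  # counts characters
byte_count = len(sample.encode("utf-8"))  # counts bytes, as the leader should
```

Here the byte count exceeds the character count by two, which is the same order of discrepancy described in the chat.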
08:53 | Dyrcona | I know we do have a bunch of garbage records with bad indicators and other junk. |
08:55 | mmorgan joined #evergreen | |
08:58 | Dyrcona | I've also asked this vendor if they can accept records as UTF-8 and not MARC-8. |
09:04 | Bmagic | JBoyer++ |
09:07 | Dyrcona | yaz-marcdump's messages are next to useless. |
09:08 | Dyrcona | "Separator but not at end of field...." |
13:01 | Dyrcona | jeffdavis: What Eg version? |
13:01 | jeffdavis | 3.7 beta-ish |
13:02 | Dyrcona | Ok. I've not seen that on 3.5.3, but we may also have very different Z39.50 use patterns. |
13:09 | Dyrcona | I wonder if I should open this bug on Evergreen or on marc-perl on github? I've got a bib that consistently produces a record with a bad length when exported in MARC8. |
13:10 | jeffdavis | bug 1940698 for the Z39.50 thing |
13:10 | pinesol | Launchpad bug 1940698 in Evergreen "Duplicate open-ils.search.z3950.search_class calls lead to drone exhaustion" [Undecided,New] https://launchpad.net/bugs/1940698 |
13:17 | Dyrcona | FYI: I just used the --pipe option on marc_export in production and it did exactly what I expected: Lp 1940662. |
16:36 | jeff | the changelog entry notes it as: "This restores sudo handling of $HOME to what everyone else does" |
16:41 | jeff | (and the actual change to "what everyone else does" probably only affects 19.10 (Eoan Ermine) and later.) |
16:53 | Bmagic | How does Evergreen "know" which index an authority record belongs to? Subject/Author etc. |
16:57 | Dyrcona | Bmagic: It starts with the headings fields in the MARC. |
16:58 | Bmagic | Let's say the authority record was both an author and a subject. Like Mark Twain. Books about him and books by him |
16:58 | Dyrcona | It should show up in both. |
16:59 | Dyrcona | It depends on a number of factors that would require me to do some digging. |
13:35 | Dyrcona | Thanks, terranm. I'm not sure we'd want to allow this, but I think it could be a useful feature with appropriate org unit settings. |
13:36 | terranm | But yes, the vendor only verifies the address against the USPS database and puts the data into a standardized format for us. We're not doing actual verification that the person lives at that address because the cost is much higher. |
13:42 | terranm | Dyrcona - yes, I agree. |
13:44 | Dyrcona | On an unrelated note, the split command is pretty much useless for splitting a file of binary MARC records. At least, I can't figure out how to use \x1E as the separator character. |
13:45 | Dyrcona | csplit also looks like it is not very useful for this task. |
13:56 | jvwoolf joined #evergreen | |
14:30 | terranm | JBoyer - I'm trying to test this patch on festivus but I'm getting browser console errors and an "Editing users in this group is disallowed " message when I try to edit a patron (logged in as admin) - https://bugs.launchpad.net/evergreen/+bug/1937299 |
13:00 | tlittle | terranm++ |
14:28 | Bmagic | Does Evergreen have a way to include the "call numbers" that are in the MARC into the "expert search" -> Call number? |
14:29 | Bmagic | 090a and 090b and 099a |
14:29 | Dyrcona | Bmagic: You can search by MARC field in advanced search, though I know that's not what you're asking. |
14:30 | Bmagic | sure, I'm aware of the tag searching. This question is specifically about the "Call Number" Search |
14:30 | Dyrcona | Yeahp. You'd have to write some custom code to do what you're asking. |
14:31 | Bmagic | That search is searching asset.call_number right? |
14:36 | Bmagic | That's what I'm coming up with too. Dyrcona++ |
14:37 | Bmagic | I thought I understood this cataloger to say that this used* to work. But I don't think it ever could have |
14:37 | JBoyer | Heads up, Dev meeting is scheduled to be in ~30. There's nothing but placeholders and LP updates currently; if there's nothing else anyone wants to discuss I'd recommend we give this month a pass. |
14:38 | Dyrcona | Bmagic: Yeah, I don't think that ever worked. Only way to search those fields is MARC expert search. |
14:38 | Bmagic | TY, thanks for confirming |
14:38 | Dyrcona | You could add an index for those fields and add it as an advanced search option, but it would still only search the MARC, not asset.call_number and MARC. |
14:39 | Dyrcona | You'd have to modify the backend somewhere to search both. |
14:39 | Dyrcona | JBoyer: I'm cool with having the meeting or skipping it. |
14:39 | Bmagic | JBoyer: skipping is fine with me as well |
08:49 | dguarrac joined #evergreen | |
09:06 | Dyrcona joined #evergreen | |
09:14 | jvwoolf joined #evergreen | |
09:27 | Dyrcona | So, for those following along (or not), I have run my db upgrade script on Pg 9.6, Pg 10, and Pg 11 so far. This includes a partial ingest of record attributes for MARC item type g. |
09:31 | Dyrcona | It took 28 hours 48 minutes on Pg 9.6. On Pg 10, it ran for 25 hours 3 minutes. Pg 11 finished after 12 hours 7 minutes. This is with the db being "optimized" and the server rebooted in between each configuration change. |
09:31 | berick | pg11 FTW |
09:32 | Dyrcona | I probably won't get around to "testing" Pg 12 and Pg 13 until next week. I want to restore the configuration for Pg 10 for an automated production restore over the weekend. |
13:51 | jeff | also, i think the desktop client may initiate the connections from the client PC. |
13:52 | jeff | the desktop connexion client also has another option, "OCLC Gateway Export". this lets the ILS return a status message that's displayed to the end user. |
13:52 | jeff | OCLC and Koha have some documentation on that. I think I looked at that a while ago also. |
13:53 | Dyrcona | The marc stream importer also appears to be aimed more at loading a file via command line, since it requires a username, password, workstation, queue, etc as command line options. This does not seem to be optimized to run as a server process. |
13:56 | JBoyer | I thought those might be optional in server mode (though I haven't looked in a long time) |
13:57 | Dyrcona | I'll keep looking. Thanks, again. |
13:59 | jeff | not sure about "optimized" to run as a server process, but it certainly supports running as a server process. |
09:43 | stephengwills | the insight into your process is helpful. |
09:49 | Dyrcona | stephengwills: Lp 1936662 if you'd like to confirm |
09:49 | pinesol | Launchpad bug 1936662 in Evergreen 3.7 "Missing Did You Mean Prequisites in Pg Server Installation Make Targets" [Undecided,New] https://launchpad.net/bugs/1936662 |
09:53 | Dyrcona | Hm... After looking more closely, they are there. I should double check if they're getting installed or not. It could be that the CPAN_MODULES_PGSQL is just not being called. In which case MARC::File::XML would also be missing on a new installation. |
09:55 | Dyrcona | Y'know what. I was too quick to open the Lp bug. They *should* be getting installed. |
09:55 | Dyrcona | I'm going to check on a clean VM before I mark the bug invalid, though. |
09:57 | stephengwills | Dyrcona++ |
13:03 | jeffdavis | I should read what I type before I hit Enter |
13:24 | jvwoolf joined #evergreen | |
13:52 | jvwoolf1 joined #evergreen | |
14:08 | Dyrcona | Bmagic: RE the trouble you reported with parsing MARC records last week, I heard today that OCLC has an issue that they're generating corrupted MARC records sometimes with cat express. I'm also told that they're working on it. |
14:11 | Bmagic | Dyrcona: interesting! |
14:16 | dickreckard joined #evergreen | |
14:17 | rhamby | yeah, someone mentioned it on the evergreen catalogers list last week, what they reposted from oclc wasn't much just "catexpress and connexion are having issues and we're working on it" if I remember the essence correctly |
09:52 | jvwoolf joined #evergreen | |
09:55 | awitter joined #evergreen | |
10:47 | Stompro joined #evergreen | |
11:08 | Dyrcona | Grr. marc-- |
11:08 | Dyrcona | gmcharlt: The badldr.usmarc test data also has an extra record terminator in it. |
11:09 | Dyrcona | tsbere's code causes it to error out, but the normal code doesn't. I think that's a different bug in the normal code. |
11:11 | Dyrcona | Record number 2 ends like this: \x1e\x1d\x1d, or you could say there's an extra \x1d between records 2 and 3. |
11:25 | Dyrcona | \x1e\x1d as the record separator and consider anything that doesn't conform to that as junk. |
11:44 | jihpringle joined #evergreen | |
11:59 | Stompro joined #evergreen | |
12:21 | Dyrcona | Heh. You get amusing results trying to run a MARC file as a Perl script. :) |
12:31 | collum joined #evergreen | |
12:52 | Keith-isl joined #evergreen | |
13:18 | jvwoolf joined #evergreen |
12:36 | rhamby | I usually scan for that and have it print them to visually eyeball and spot problems pretty fast |
12:37 | Dyrcona | Bmagic: I did a quick perusal of my scripts and I don't see anything like what you mentioned. |
12:37 | collum joined #evergreen | |
12:38 | Dyrcona | The real problem is when 1 MARC file contains text in different encodings, or worse Windows-1252 with smart quotes. |
12:38 | rhamby | yeah, that's why my solution does a decent job of finding the issues |
12:39 | rhamby | note that breaking strings into arrays and scanning with ord can be really really slow on big files but I cheat and do titles and authors usually and that is usually a good indicator if I need to dig deeper |
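A rough Python equivalent of the "scan with ord and eyeball it" approach is sketched below. What exactly gets scanned for is not stated in the log, so this assumes the target is anything outside printable ASCII; the function name and sample title are hypothetical.

```python
def suspicious_chars(text: str) -> list:
    """Return (position, character, code point) for anything outside printable ASCII."""
    hits = []
    for position, char in enumerate(text):
        code = ord(char)
        if code < 0x20 or code > 0x7E:
            hits.append((position, char, hex(code)))
    return hits


# Hypothetical title containing a "smart" apostrophe (U+2019).
title = "It\u2019s a title"
report = suspicious_chars(title)
```

Running this over only titles and authors, as described above, keeps the scan cheap while still flagging records that deserve a closer look.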
12:39 | rhamby | and yeah, the declaration if the file is marc8 vs unicode is usually more a hopeful statement than anything factual |
12:39 | Dyrcona | I've used python chardet with non-marc data. It could be used on a field by field basis with pymarc. |
12:40 | Dyrcona | Unicode is often spelt ISO8859-X (where X is a number). :) |
12:42 | Dyrcona | That should be misspelt, not spelt. :) |
12:48 | Dyrcona | I do have a script that spits out the record warnings and the encoding as understood by MARC::Record. |
12:49 | collum joined #evergreen | |
12:50 | Dyrcona | Bmagic: https://pastebin.com/Pef0KLeL |
13:00 | sandbergja joined #evergreen | |
15:47 | Bmagic | correction "$file = MARC::File::USMARC->in($filename)" |
15:48 | Dyrcona | Are you having any particular problems other than encoding? |
15:49 | Bmagic | I'm thinking of potential issues with the way this script reads the records to make it more "compatible" for the masses |
15:50 | Dyrcona | My script reads the records that way because that is a) how MARC::File::USMARC->in() does it, but also b) when you have "smart quotes" pasted into a field, you actually need to split records on \x1E\x1D because \x1D is in the smart quote sequence. |
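The splitting rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script under discussion: the function name is made up, and trailing bytes with no terminator are simply discarded as junk.

```python
RECORD_END = b"\x1e\x1d"  # field terminator followed by record terminator


def split_marc(data: bytes) -> list:
    """Split raw MARC data on the two-byte sequence, keeping the terminator."""
    records = []
    start = 0
    while True:
        end = data.find(RECORD_END, start)
        if end == -1:
            break  # anything left over has no terminator; treat it as junk
        stop = end + len(RECORD_END)
        records.append(data[start:stop])
        start = stop
    return records


# A lone \x1d inside the first record does not end it.
data = b"rec1 \x1d quote\x1e\x1d" + b"rec2\x1e\x1d" + b"junk"
records = split_marc(data)
```

Because a lone \x1D is not treated as a record boundary, a stray one inside a field stays inside its record instead of truncating it.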
15:51 | Dyrcona | If you're having encoding issues with some records, I'd suggest trying pymarc and chardet to go over each field. You can then convert the data field by field if necessary. |
15:51 | Bmagic | wow, I guess the question is: should I write this to support "smart quotes" |
15:52 | Bmagic | maybe the contribution needs to land in MARC::File instead of my script? |
15:52 | Dyrcona | MARC::File may already work with smart quotes. I know that tsbere opened a ticket on CPAN about it. I don't know if his patch ever went in. |
15:53 | Dyrcona | gmcharlt should know, as I think he is one of the maintainers of MARC::File. |
15:55 | gmcharlt | I should check the patch queue, but yeah, at the moment smart quotes would break MARC::File::USMARC's expectations |
15:56 | Dyrcona | I was just looking at rt.cpan.org. |
15:56 | Dyrcona | I couldn't find the bug report. |
16:01 | Dyrcona | Well, it's not reported by tsbere, but here it is: https://rt.cpan.org/Ticket/Display.html?id=70169 |
16:01 | Dyrcona | Looks like rt.cpan.org is being shutdown. |
16:01 | gmcharlt | I'll take (another) look |
16:03 | Dyrcona | gmcharlt: If you want a patch, I could probably provide. I recall tsbere writing one for this. Maybe it was for MARC::Batch? |
16:03 | Bmagic | gmcharlt's comments from 2011 on that ticket are great: "not that encouraging such sloppy MARC records is a good idea. :)" |
16:03 | gmcharlt | Dyrcona: sure, happy to take a patch from you |
16:05 | Bmagic | just to be clear: I don't need to do anything special when I pass a UTF8 or a MARC8 or a MARC21 file into MARC::File::USMARC ? |
16:05 | Dyrcona | Ha! I thought rt.cpan.org would be closed by now, but the latest bug on MARC::File is 20 minutes, and it's a spam. |
16:06 | Bmagic | MARC::File::USMARC does all the work for me? Figuring out which character set to use and whatnot? |
16:06 | Dyrcona | Bmagic: Usually, yes. |
16:06 | Dyrcona | If the encoding is set correctly in the file. |
16:07 | Dyrcona | What are you actually trying to do? Load records from a migration/new library, a vendor? |
16:15 | Dyrcona | gmcharlt: Is there a git repository for MARC::Record & company? |
16:16 | gmcharlt | Dyrcona: yeah: https://github.com/perl4lib/marc-perl |
16:17 | Dyrcona | Cool. I'll make an issue and pull request there. |
16:18 | Bmagic | Dyrcona: This sprang from the electronic_bib_import.pl work I'm doing |
16:19 | Bmagic | answer: probably not migration, but yes on the vendor |
16:21 | Dyrcona | Bmagic: USMARC records should be in MARC8 unless the leader says UTF-8. Trouble is, I've seen just about anything in actual MARC records, and it is difficult to tell at run time. |
16:24 | Dyrcona | For the logs and anyone else following along at home: It turns out that tsbere made a pull request on github for the issue, but his code breaks the tests. I'll take that up and see if I can fix it so it doesn't break the tests. |
16:27 | Dyrcona | Also, for Bmagic, and those following along, generally, the only way to detect the encoding of a MARC record that says it is MARC8 is to assume it is MARC8, convert it to UTF-8 for Evergreen and catch any errors. That's another reason why I often read the records individually and convert them to MARC::Record inside an eval, so that one or two bad records don't spoil the whole batch. |
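The "convert each record inside an eval" pattern translates to a per-record try/except in Python. The sketch below is an assumption-laden illustration: `marc8_to_text` is a hypothetical stand-in for whatever real MARC-8 converter is available, and the only MARC fact relied on is that leader position 09 is `a` for Unicode records and blank for MARC-8.

```python
from typing import Callable, Iterable


def claims_utf8(record: bytes) -> bool:
    """Leader position 09 is 'a' for UCS/Unicode, blank for MARC-8."""
    return record[9:10] == b"a"


def convert_batch(records: Iterable[bytes], marc8_to_text: Callable[[bytes], str]):
    """Convert records one at a time so a bad record cannot spoil the batch."""
    good, bad = [], []
    for number, record in enumerate(records, start=1):
        try:
            if claims_utf8(record):
                text = record.decode("utf-8")  # raises on invalid UTF-8
            else:
                text = marc8_to_text(record)   # hypothetical converter
            good.append(text)
        except Exception as error:
            bad.append((number, str(error)))
    return good, bad
```

The `bad` list keeps the record number and the error message, so the failures can be reported or inspected after the rest of the batch has loaded.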
16:30 | jihpringle joined #evergreen | |
16:32 | jeff | indication of marc8 vs utf8 is at the record level, not the file level, right? |
16:33 | Dyrcona | jeff: Yes. |
11:06 | collum joined #evergreen | |
11:59 | dbwells joined #evergreen | |
11:59 | Dyrcona | It would be nice if there were a way, in SQL, to say that you want to select all but 1 or 2 fields from a table rather than having to type out or copy and edit a list of columns. |
12:01 | Dyrcona | It would be really useful for skipping the marc column(s) for instance. |
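Standard SQL has no "all columns except" syntax, but the column list can be generated from the catalog rather than typed. The sketch below is only an illustration of that workaround: it uses Python's built-in sqlite3 as a stand-in (Evergreen runs on PostgreSQL, where information_schema.columns would play the role of the PRAGMA), and the table and function names are made up.

```python
import sqlite3


def select_all_but(conn: sqlite3.Connection, table: str, skip: set) -> str:
    """Build a SELECT listing every column of `table` except those in `skip`."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # The table name is interpolated directly, so pass trusted names only.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    kept = [name for name in columns if name not in skip]
    return f"SELECT {', '.join(kept)} FROM {table}"


# Hypothetical table with a bulky marc column to leave out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record_entry (id INTEGER, tcn_value TEXT, marc TEXT)")
query = select_all_but(conn, "record_entry", {"marc"})
```

The generated string can then be pasted into a query tool or executed directly.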
12:02 | sandbergja joined #evergreen | |
12:08 | tlittle | I totally agree, Dyrcona! Google helpfully reminds me that I've googled many times if I can do that in SQL, but alas |
12:54 | mantis left #evergreen |
14:23 | rhamby | copy and pasting into marc is just evil, but I also have a special disdain for copy and pastes (with various flavors of end of lines) that end up in csv files |
14:24 | Dyrcona | Yeahp, and particularly with "smart" quotes. |
14:25 | rhamby | oh yeah, another thing to add to my csv correction script .... |
14:26 | Dyrcona | It's even worse in MARC, since one of them looks like the end of record character. |
14:32 | Dyrcona | I wonder if we should limit authority_full_rec_value_index to the first 2000 characters or so, using substring? |
14:33 | Dyrcona | I saw one suggestion online to use GIN for the index, but we can't since the field is not a tsvector. |
14:34 | Dyrcona | The error output suggests full text indexing as an option: Consider a function index of an MD5 hash of the value, or use full text indexing. |
09:50 | bshum | For same reasons |
09:50 | Dyrcona | Bmagic: How big are these invoices, i.e. # of line items? |
09:51 | Bmagic | 60 or 70 lines |
09:51 | Dyrcona | Oh, and I guess that includes a chunk of MARC for each one, so if the records are detailed..... |
09:51 | bshum | Yep |
09:52 | Dyrcona | Assuming 15K per entry, you're looking at about 1MB to retrieve all that in one go. |
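The back-of-the-envelope figure works out as follows. This is just the arithmetic from the chat written down in Python; the 15K per entry is Dyrcona's assumption, and the 64K stanza limit is the default he recalls from memory elsewhere in this log.

```python
KIB = 1024


def payload_bytes(line_items: int, kib_per_entry: int) -> int:
    """Estimated response size for an invoice with MARC attached to each line."""
    return line_items * kib_per_entry * KIB


low = payload_bytes(60, 15)    # 60 line items
high = payload_bytes(70, 15)   # 70 line items
stanza_limit = 64 * KIB        # recalled default max stanza size
```

At 70 line items the estimate is a little over 1 MiB, roughly sixteen times the recalled stanza limit.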
09:53 | Bmagic | I solved the stanza size issue with a config tweak to ejabberd. But this other thing about incompletely loading the UI is a different thing. An issue I've seen for many months. Maybe years. But shrugged it off because the interface is still dojo |
12:36 | Dyrcona | Time is hard. |
12:43 | Dyrcona | When I add dates to milestones on Launchpad, I sometimes have to change the day because they can be off by a day. |
12:46 | * Dyrcona | wonders how difficult it would be for "Use Now" on workstation registration to just work without having to log in again. |
12:50 | Dyrcona | So, I just tried using MARC batch import with master, and I get upload progress 100%, but enqueue progress is 0%. There's also nothing in the /tmp directory AFAICT. |
12:51 | jeff | have to teach open-ils.auth how to assign a workstation to a workstationless login and change type. Might also violate some other assumptions. |
12:51 | jeff | Dyrcona: is your web server running with private temp, and you're using /tmp as the queue location? You'll need to either change to a different dir or stop using private tmp for the apache service. |
12:52 | jeff | (many public services like apache are launched by systemd with private /tmp by default on a lot of systems now) |
13:03 | jeff | apache's /tmp is not your /tmp :-) |
13:04 | jeffdavis | I believe private /tmp is a change between Ubuntu 16.04 and 18.04 fwiw |
13:07 | Dyrcona | Well, I hadn't noticed because I don't use it much on development, and we've been using /openils/var/tmp mounted via NFS in production for years. |
13:08 | Dyrcona | So, I'm actually testing a security bug and trying to import a MARC record to trigger it, but the record won't import. |
13:08 | Dyrcona | When I select it and hit Import Selected Records, the screen goes back to the main Import view, and the record does NOT end up in biblio.record_entry. |
13:09 | Dyrcona | FWIW, I have no idea what I'm doing in the staff client, particularly the Angular interface. |
13:28 | Dyrcona | So, maybe Vandelay is broken in master on Ubuntu 20.04? |
09:09 | berick | csharp: JBoyer: we also do monthly batch exports to ebsco, which I hadn't put 2-2 together at the time, but in the end we also don't need/want the real time data. |
09:11 | JBoyer | Ah, ok. That is a lot closer to ok than "we're being scraped by a vendor we don't even use." Still not great but the eyebrow has been lowered. |
09:11 | berick | exactly |
09:17 | Dyrcona | We do a weekly ISBN dump for Novelist and a monthly MARC dump for EDS and send them to EBSCO. |
09:20 | Dyrcona | I suspect that the EBSCO "attacks" are them looking up data for the on-the-shelf feature of Novelist. |
09:33 | Bmagic | I can say with 95% certainty that unapi was the catalyst for the DOS yesterday. That's 6 bricks at 8 CPU's each, 48 CPU's maxed out. Today, we have a limit on that URL at nginx |
09:35 | berick | Bmagic: mind sharing your nginx rule? just curious |
15:58 | mmorgan | Another colleague wrote a tool to remove 856 fields from the MARC when links need to be removed. |
15:59 | * mmorgan | doesn't do this directly but it's often discussed as it's so time consuming |
16:12 | Dyrcona | I have a script and some extra tables to make removal of located URIs by vendor/library easier. |
16:13 | Dyrcona | My script finds the records to be edited and removes the 856 tags from the MARC and then updates the MARC. |
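For anyone wanting to see the shape of the "remove the 856 tags from the MARC" step, here is a minimal Python sketch against MARCXML. It is not the script described above: it removes every 856 field rather than selecting by vendor or library, assumes the root element is a single `record`, and leaves finding the records and writing them back to the database out entirely.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
# Serialize with a default namespace instead of generated ns0: prefixes.
ET.register_namespace("", MARC_NS)


def strip_856(marcxml: str) -> str:
    """Return the MARCXML record with all 856 datafields removed."""
    record = ET.fromstring(marcxml)
    for field in record.findall(f"{{{MARC_NS}}}datafield[@tag='856']"):
        record.remove(field)
    return ET.tostring(record, encoding="unicode")


# Hypothetical two-field record for illustration.
sample = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<datafield tag="245" ind1="1" ind2="0"><subfield code="a">Title</subfield></datafield>'
    '<datafield tag="856" ind1="4" ind2="0"><subfield code="u">http://example.org</subfield></datafield>'
    "</record>"
)
cleaned = strip_856(sample)
```

A more selective version would inspect each field's subfields before deciding whether to remove it.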
16:43 | dbwells | mmorgan: Dyrcona: Thanks for the responses. In our case, we are deleting the bib records as well. Is there any reason to be concerned about leaving located URIs in deleted records? I'd be surprised, but you never know. |
16:44 | dbwells | In my ideal workflow, deleting the record would make the URI bits go poof, but we weren't that lucky :) |
16:47 | Dyrcona | Yeah, I haven't looked into what happens to located URIs when you delete a bib record. |
11:04 | Dyrcona | I'll generate another profile of undoing the delete with a title and description and share that if anyone else wants to see what I'm talking about. |
11:04 | Dyrcona | s/delete/update/ |
11:04 | sandbergja joined #evergreen | |
11:04 | Dyrcona | I have a "revert" SQL with the original authority record marc handy. |
11:08 | Bmagic | jeff: yeah, mostly that. I can confirm that you shoild do it, lol. |
11:08 | Bmagic | Shouldn't* |
11:10 | Dyrcona | This time it was faster, probably because of database cache, but still too long.... |
09:42 | Dyrcona | Either space or -. |
09:44 | Bmagic | alright - thems the breaks I guess |
09:44 | Dyrcona | We have a Genre entry. |
09:46 | Dyrcona | Put this in xpath: //marc:datafield[@tag='600'] |
09:47 | Dyrcona | Put this in browse_xpath and display_xpath: //*[local-name()='subfield' and contains('abcdfgklnpstuvxyz',@code)] |
09:48 | Dyrcona | Then whatever you want for the joiner, I'd recommend '-- '. NOTE the space after --. |
09:48 | Dyrcona | Try that see if it's close. |
10:25 | * miker | looks up |
10:25 | Dyrcona | Oh no, never mind. xpath syntax sucks. |
10:25 | Bmagic | xpath_syntax-- |
10:26 | Dyrcona | I think you should remove this from the xpath field: and marc:subfield[@code="a"] and marc:subfield[@code="d"] |
10:26 | Bmagic | I threw that in there later. It was* identical to your suggestion above |
10:27 | miker | if you have a special-purpose cmf there's no reason you couldn't create an xslt to do something special |
10:28 | Bmagic | It sounds like we can't get subfield a+b and the rest of them joined with dashes - which is probably ok. At this point, I am trying to figure out why the index STILL has dashes with joiner set to null |
11:19 | Bmagic | I believe that setting used to be false which would explain how the bib exists in the first place |
11:20 | Dyrcona | Bmagic: Maybe. It's supposed to replace the 001 with the bib id and move the original 001 to an 035 on import. |
11:20 | Dyrcona | You can try setting it back to false to see if that helps. |
11:21 | Dyrcona | BTW, bug 1859191 indicates that the web staff client MARC editor ignores that flag. |
11:21 | pinesol | Launchpad bug 1859191 in Evergreen "Editing and saving MARC record changes the TCN value" [High,Confirmed] https://launchpad.net/bugs/1859191 |
11:21 | Bmagic | Dyrcona++ |
11:23 | Dyrcona | Too many moving parts-- |
10:23 | * dbs | for this specific library doesn't want hotkeys for anything, but there's no easy way to turn them off |
10:24 | dbs | short of creating a custom web staff client build at nohotkeys.concat.ca :) |
10:26 | Dyrcona | You should never trap the keys used by the browser in my opinion. F1 is almost always the help key. |
10:27 | * Dyrcona | thinks aloud: Hmm... GNU Emacs mode for MARC editing that can talk to Evergreen.....that *might* work.... |
10:28 | * Dyrcona | mumbles as he walks off. |
10:45 | dbs | Yeah I've been bitten by hitting F5 to reload :) |
12:01 | abowling joined #evergreen |
08:42 | Dyrcona | ...since we're doling out the karma. :) |
08:42 | Dyrcona | chardet++ |
08:42 | mmorgan joined #evergreen | |
08:45 | Dyrcona | Email can be worse than MARC when it comes to character set issues. |
08:48 | dbwells | agoben++ |
08:48 | * Dyrcona | goes back to writing an email to summarize hack-away activity for CW MARS staff. |
08:53 | * mmorgan | would love to see a message summarizing the hackaway go out to the general list. |
08:59 | Dyrcona | Grr. Trying to convert a do block that uses exception/when into a function and Pg keeps saying syntax error at or near END for the END closing the function's begin. Doesn't matter if I add a semicolon or remove it. |
09:01 | Dyrcona | OIC.... I need another BEGIN... |
09:04 | Dyrcona | Thanks, ducky! :) |
09:15 | Dyrcona | Whee! Nothing like call number labels to contain garbage, well, after MARC.... :P |
09:17 | aabbee joined #evergreen | |
09:32 | jvwoolf joined #evergreen | |
09:33 | rhamby | Dyrcona: you forget addresses :) |
11:06 | berick | yeah, Wilmington got a nice blast. |
11:07 | berick | i watched some Outer Banks webcams this morning and it's just a grey windy mess |
11:07 | Dyrcona | Yeah.... |
11:08 | Dyrcona | 2019-09-06 10:32:53 bd1-bh4 open-ils.vandelay: [ERR :3810:Vandelay.pm:272:1567780218267323] unable to read MARC file |
11:08 | berick | Dyrcona: that's likely the client_max_body_size issue |
11:08 | Dyrcona | Was typing the question! |
11:08 | Dyrcona | berick++ |
15:30 | csharp | the problem didn't exist in the new ng staff catalog when I tested that on one of the "problem" records |
15:30 | csharp | but does in the AngJS version on current-ish master |
15:33 | Dyrcona | csharp: Maybe that's what they were trying to explain to me yesterday. I was told it happens with tags that have a second indicator. |
15:34 | Dyrcona | I looked at the MARC edit view and not that one, let me look at the bibs they sent me again. |
15:34 | Dyrcona | I also was not sent screen shots. |
15:35 | Dyrcona | Oh, yeah. They also added this as a comment on a totally unrelated ticket, though I guess to the staff it could seem related when you don't know how it all works. |
15:38 | Dyrcona | Yeahp. I see it with one of the two sample records but not the other. |
08:39 | rhamby | I'm still on my first cup so my intelligent well composed response is: ug |
08:43 | Dyrcona | So, I'm explaining the problem and giving the ticket back to the cat center. |
08:44 | * Dyrcona | catalogs by editing the marcxml in a text file and then updating the database via psql. :) |
08:45 | Dyrcona | Heh. I should write a MARC mode for emacs. :) |
08:45 | * Dyrcona | checks if one already exists. |
08:49 | Dyrcona | Heh. Bmagic might like this: marcopolo - Emacs client for Docker API |
09:08 | jvwoolf joined #evergreen |
12:44 | berick | beware MARC records are patron-visible |
12:44 | Dyrcona | That, or you could modify the database trigger. |
12:44 | Bmagic | Dyrcona: that sounds like a good place - are you thinking in perl? or postgres? |
12:44 | Dyrcona | And what berick said is one reason I wouldn't do this in the MARC itself. |
12:45 | Dyrcona | I was thinking to add the hook mechanism in the Perl, but you'd probably need some database support to configure it. |
12:45 | Bmagic | yeah, maybe I could convince them not to use the tag |
12:46 | Dyrcona | Another reason that I wouldn't do it in the MARC is it makes the records longer, and doesn't add anything "useful" to the bibliographic data. |
12:47 | Dyrcona | It's metadata, and should be recorded elsewhere. |
12:47 | Dyrcona | Just my opinion.... |
12:47 | berick | and in this case, meta-metadata |
10:13 | collum joined #evergreen | |
10:59 | khuckins joined #evergreen | |
11:10 | khaun joined #evergreen | |
11:27 | Dyrcona | Seems like after each Evergreen release messing with MARC records in the database gets slower. |
11:28 | Dyrcona | It's gotten to the point where it would be more efficient for staff to load records via Vandelay than using my Perl DBI scripts. |
11:40 | csharp | our match sets create ugly and super slow MARC loading through vandelay too |
11:43 | sandbergja joined #evergreen | |
11:54 | dbs | Also, I thought most of our vendors still use FTP or the like, rather than encrypted transfer methods... |
11:57 | sandbergja_ joined #evergreen | |
12:01 | dbs | Dyrcona: hrm, how do we spin that as a feature in the release notes? "Indexing now 5% slower!" |
12:02 | Dyrcona | dbs: Pretty much. Last time I tried timing things, it took 2 seconds to update a MARC record. |
12:03 | dbs | There was a lot of wisdom to the slim MODS approach for indexing, but I guess the demands for incredibly fine-grained search pushed us in a different direction |
12:03 | jihpringle joined #evergreen | |
12:03 | dbs | Also maybe complex Perl inside the database... |
12:08 | Dyrcona | "FTP was fine in 1999!" (No, it wasn't, but still....) |
12:08 | dbs | Folio went with a MARC-centric approach to indexing and display and are now thinking about how to integrate BIBFRAME in parallel; their path will likely lead to a common intermediate format as well |
12:08 | dbs | Dyrcona++ |
12:09 | * Dyrcona | begins to suspect that MARC is part of the problem. |
12:09 | csharp | @quote add * Dyrcona begins to suspect that MARC is part of the problem. |
12:09 | pinesol | csharp: The operation succeeded. Quote #198 added. |
12:09 | dbs | (I believe slim MODS was supposed to be an intermediate format for Evergreen too but we've hard-coded MARC into frontend, middle layers, and backend all over now) |
10:14 | Dyrcona | Same time as the reshelving complete job, that is. They now run on different util servers, but still the same database. |
10:15 | Dyrcona | Could be some deadlocks going on there, too. |
10:15 | Dyrcona | I'll check the pg logs. |
10:19 | Dyrcona | Bleh... whole MARC fields showing up in the logs. Probably a long-running query or some other failure. |
10:21 | Dyrcona | And, no. Nothing about locks in the logs for the relevant time period. We may not be logging locks in Postgres. |
10:34 | JBoyer | Calling 1163, let's see how this goes. |
10:34 | * berick | holds JBoyer's beer |
13:42 | jeffdavis | check the postgres docs for the exact syntax for that command - the point is that "evergreen" needs to be listed first in your search_path |
13:42 | jeffdavis | since it's not, you are probably running into old but still-installed versions of db functions/triggers as a result |
13:43 | abowling | jeffdavis: that's almost certainly what i'm running into |
13:50 | * Dyrcona | suspects non-UTF8 in MARC records, but the search path, etc., are good places to start since they're quicker to resolve if that is the case. |
13:51 | berick | gmcharlt: at minimum, a sign off on the Hatch.git changes would suffice. I can upload the changes to the chrome store. |
13:51 | berick | and a FF store upload volunteer as well |
14:11 | jeff | When using the hatch installer, the Chrome extension is installed. Does that extension come from the installer or from the chrome store? |
08:44 | aabbee joined #evergreen | |
09:05 | sandbergja joined #evergreen | |
09:22 | _bott_ joined #evergreen | |
09:24 | Dyrcona | So, I'm being asked to make an Opac Icon Format and Search Format from a RDA field value. Can that even be done? The only examples I've seen so far come from MARC fixed fields. |
09:28 | Dyrcona | I guess that is possible.... |
09:28 | JBoyer | You can do it, I've added a bunch here based on the 753 (for gaming systems) |
09:29 | Dyrcona | JBoyer: Yeah, I just found it in the documentation, which tells me how to do it in the client. I would rather something that I can do in the database, but I can always set it up on a test server and extract the new table entries. |
09:31 | JBoyer | Getting them to actually show up will pretty much require tearing down and rebooting all of your memcache servers though, so that's a lot of fun. |
09:31 | nfBurton | Yup mine have sometimes taken 2 weeks to show |
09:32 | nfBurton | And no one seems to know why |
09:32 | Dyrcona | JBoyer: I've set them up by hand before, but don't recall if I used arbitrary MARC fields in them that way. |
09:32 | Dyrcona | JBoyer: If I can set this up today, I can do it during the upgrade when I'm rebooting memcached anyway. :) |
09:33 | JBoyer | Dyrcona++ |
09:33 | Dyrcona | Is something supposed to happen when you click "Save" in the Coded Value Maps interface? |
11:36 | Dyrcona | Which will shower you with binary. |
11:36 | bshum | That'd be my guess too |
11:37 | Dyrcona | You might want to "set_marcdump <somefile>" if all you care about is timing. |
11:37 | Dyrcona | That will dump the marc to somefile and not your screen. |
11:38 | abowling joined #evergreen | |
11:40 | bshum | csharp: When you guys moved servers, what version of OpenSRF do you have? And what's your ejabberd max_stanza_size set to |
11:41 | bshum | I just wonder if it's some weird collision with chunking/bundling that's not being logged well |
15:31 | makohund | A bit better at handling large batches? |
15:35 | Dyrcona | Yes. You can run an arbitrary number of records through it without having to do smaller batches through Vandelay. |
15:36 | Dyrcona | Things may have improved in Vandelay recently, but we used to not be able to load tens of thousands of records at once in Vandelay, at least not in a manner reasonable for the cataloging staff. |
15:38 | Dyrcona | I'd put the MARC file on a utility vm, and then schedule the program to run sometime at night. |
15:40 | makohund | Cool, thank you. Will have to take a bit of time to suss it out (I don't know the MARC functions very well), but bypassing chunking it up for Vandelay's sake sounds great to me. |
15:41 | Dyrcona | You'll notice it doesn't actually use MARC to parse the file. That's because "smart quotes" will break the MARC parsers in Perl. |
15:42 | Dyrcona | And, yes, you will eventually find MARC records with smart quotes in them. Copy and paste.... |
15:45 | makohund | Ack. Sounds like they need an "eat smart quotes, spit out regular quotes" filter of sorts. |
15:46 | jonadab | That would in principle not be a difficult regular expression. |
15:46 | Dyrcona | Well, the record terminator character in MARC, 0x1D, is apparently part of a Windows smart quote regular expression. |
15:46 | jonadab | Oh. |
15:46 | jonadab | Eww. |
15:46 | Dyrcona | bleh.... reading and typing at the same time.... :) |
15:47 | Dyrcona | That's why I set the Perl record separator to the combination of the stop and start characters for the record. I think tsbere opened a bug on this for MARC::Record and/or MARC::Batch over on CPAN some years ago. |
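The record-separator trick described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the Perl script under discussion. It assumes the separator is the two-byte sequence of field terminator (0x1E) followed by record terminator (0x1D) that closes every ISO 2709 record, so a stray 0x1D inside a field does not split a record. The function name `split_marc_records` is hypothetical.

```python
# Assumption: each MARC (ISO 2709) record ends with a field terminator (0x1E)
# immediately followed by the record terminator (0x1D).
RECORD_END = b"\x1e\x1d"


def split_marc_records(data: bytes) -> list:
    """Split raw MARC bytes into records without running a MARC parser."""
    records = []
    start = 0
    while True:
        end = data.find(RECORD_END, start)
        if end == -1:
            break  # trailing bytes with no terminator are ignored
        end += len(RECORD_END)
        records.append(data[start:end])
        start = end
    return records
```

Each returned chunk still includes its terminator bytes, so it can be handed to a MARC parser one record at a time, and a parse failure then affects only that record.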
15:47 | makohund | Wow... what an ugly coincidence. |
15:51 | makohund | I'd wondered previously if perl might not be a better route for messing with the incoming 856's & $9's than MarcEdit. Any thoughts on that before I investigate either option? |
15:53 | Dyrcona | Well, it could be, but you'd have to lay out what you want to add somehow. |
13:47 | aabbee | nope. did not realize that existed. |
13:47 | aabbee | dbwells++ |
13:52 | aabbee | that worked perfectly. (but if i have to make code changes to hide the field anyway, i might as well just set the default field visibility to -1 and not bother with the org unit settings at all) |
14:42 | Dyrcona | hmm... Do we have an unflatten marc command? |
15:00 | yboston joined #evergreen | |
15:07 | sandbergja_ joined #evergreen | |
15:11 | pinesol | [evergreen|Jeff Davis] LP#1801191: ensure recall does not extend due date - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=9c90558> |
08:45 | mdriscoll joined #evergreen | |
08:51 | Dyrcona joined #evergreen | |
08:51 | aabbee joined #evergreen | |
09:10 | Dyrcona | @later tell yboston The problem is that there is no way to match subfields to a particular field in metabib.full_rec, nor in the table metabib.real_full_rec. I recommend pulling out the marc from biblio.record_entry and extracting the fields with XPATH or using the MARC::Record Perl module. |
09:10 | pinesol | Dyrcona: The operation succeeded. |
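The XPath suggestion above could look something like this. It is a sketch only, in Python with the standard library. It assumes the `marc` column holds MARCXML in the `http://www.loc.gov/MARC21/slim` namespace; the helper name `fields_with_subfields` is hypothetical. It keeps each field's subfields grouped together, which is the part the flattened tables cannot do.

```python
import xml.etree.ElementTree as ET

# Assumption: the stored MARCXML uses the standard MARC21 slim namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def fields_with_subfields(marcxml: str, tag: str) -> list:
    """Return one list of (code, value) pairs per occurrence of the tag."""
    root = ET.fromstring(marcxml)
    results = []
    for field in root.findall(f".//marc:datafield[@tag='{tag}']", MARC_NS):
        subfields = [
            (sf.get("code"), sf.text or "")
            for sf in field.findall("marc:subfield", MARC_NS)
        ]
        results.append(subfields)
    return results
```

Calling `fields_with_subfields(xml, "856")` on a record with two 856 fields returns two separate lists, so a `$u` can be paired with the `$9` from the same field.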
09:14 | yboston joined #evergreen | |
09:26 | jvwoolf joined #evergreen |
12:46 | aabbee | great, thanks! |
13:01 | bos20k joined #evergreen | |
13:08 | csharp | Dyrcona: if this is too complicated to get into or is documented elsewhere, feel free to ignore, but what is the difference between what pingest.pl does and the "UPDATE biblio.record_entry set id = id" approach? Does pingest.pl care about the "reingest on same marc" flag? |
13:09 | Dyrcona | csharp: pingest.pl runs the ingest database functions, so it does not care about reingest on same marc flag. |
13:10 | Dyrcona | It also allows you to skip certain ingests based on command line switches. |
13:10 | Dyrcona | So, if the changes don't affect browse, you can skip the browse ingest, for instance. |
13:14 | Dyrcona | It also handles splitting the ingests up into parallel batches for you, so you don't have to write out separate SQL files and run them yourself. |
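The batching idea mentioned above can be illustrated with a short sketch. This is not pingest.pl's code; it is a Python approximation under stated assumptions. `ingest_batch` is a hypothetical placeholder for a worker that would open its own database connection and call the ingest functions for each record ID in its batch.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(record_ids, batch_size):
    """Split record IDs into consecutive batches of at most batch_size."""
    ids = list(record_ids)
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


def ingest_batch(batch):
    # Hypothetical placeholder: a real worker would run the ingest
    # database functions for every ID in the batch.
    return len(batch)


def run_parallel(record_ids, batch_size=1000, workers=4):
    """Process the batches with a pool of workers; return records handled."""
    batches = make_batches(record_ids, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(ingest_batch, batches))
```

The point of the design is that each batch is independent, so the number of workers can be matched to the spare capacity of the database server.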
13:26 | dickreckard | default behavior. |
13:27 | dickreckard | I looked on Launchpad and couldn't find any mention of it, so let me know if I should report this somewhere. |
13:29 | jeff | It might be reasonable to review changing the default value of <importer>/tmp</importer> in openils.xml.example |
13:31 | Dyrcona | dickreckard: You should be able to get marc records out of the URL that miker shared earlier. |
13:31 | sandbergja joined #evergreen | |
13:32 | dickreckard | damn! |
13:32 | dickreckard | didn't see that jeff . *facepalm* |
11:42 | Dyrcona | Funny you should mention that. |
11:43 | csharp | getting the core_query from the logs... |
11:43 | Dyrcona | I have noticed what appears to be a slow down in our electronic resource URL mechanism since we upgraded to 3.0 back in May, though I've only done two "big" loads since then. |
11:43 | Dyrcona | My current load seems to take 18 hours to process approximately 1,000 MARC records. |
11:44 | Dyrcona | I don't recall it being that slow when we were on 2.12, but the Evergreen version may be a red herring and it is likely to be bad statistics. |
11:44 | Bmagic | the code I wrote is very slow. It's due to the fuzzy matching query it uses to attempt to match the incoming bib. It's not strictly matching fields |
11:45 | Dyrcona | I will share my code in a minute. There are two database queries used to search for matching records. It took me a long time to get the performance acceptable on 2.10/2.12. |
14:28 | csharp | I am still trying to discern which records are causing reingest to fail, so I don't have enough data yet to know |
14:28 | jeff | fwiw, the marc field from the full paste of the record (psql extended view) seems to parse without warning with xmllint and yaz-marcdump. didn't try throwing it against anything else, like MARC::Record yet. |
14:28 | csharp | I haven't added any indexes - this is on our freshly upgraded 3.2 server |
14:29 | Dyrcona | You said update.... Is the MARC being changed? |
14:29 | csharp | shouldn't be |
14:29 | csharp | oh, well in this case, it was an overlay I think |
14:29 | csharp | as in one of our catalogers attempted to overlay it |
10:51 | Dyrcona | harmless wrong tab. :) |
10:52 | Dyrcona | And, since I just restarted apache on my test vm I have a question: |
10:53 | Dyrcona | Has anyone noticed that you have to restart/reload apache after restarting the opensrf.settings service lately? |
10:54 | Dyrcona | Since 3.0 or possibly the web client, if I restart opensrf.settings and open-ils.cat to pickup new MARC templates, for instance, neither the web client nor XUL will let me login until I restart apache2. |
10:54 | Dyrcona | That didn't used to happen, IIRC. |
10:57 | kmlussier | jeff++ # Continued efforts to make #evergreen usable while keeping the spammers out. |
11:08 | Christineb joined #evergreen | |
15:17 | berick | er, translate-toolkit |
15:18 | berick | the xliff files are a lot more expressive, hopefully we can use them directly in the near future |
15:19 | Dyrcona | Well, po files were designed to work with compiled software in the last century, before XML was a thing. |
15:20 | Dyrcona | Kind of like using MARC for bibliographic record data.... |
15:20 | berick | *zing* |
15:22 | nfburton joined #evergreen | |
15:32 | yboston joined #evergreen |
12:09 | mmorgan joined #evergreen | |
12:12 | bshum | Dyrcona: Well that's special... |
12:12 | bshum | I guess I hadn't gotten that far in my testing yet, since I couldn't get OpenSRF working, I ended there |
12:13 | Dyrcona | I wasn't even trying to install Evergreen. I just wanted MARC::File::XML to run a script. |
12:13 | Dyrcona | Looks like someone (me?) joins Masters of the Universe and repackages it, or we're back to using CPAN for more things. |
12:19 | rlefaive joined #evergreen | |
12:40 | khuckins joined #evergreen |
10:34 | berick | i don't know, it's almost elevensies |
10:40 | collum | https://www.youtube.com/watch?v=T8XeDvKqI4E |
10:46 | Dyrcona | https://en.wikipedia.org/wiki/Elevenses#United_States |
10:47 | Dyrcona | Anyway, has anyone noticed that the record summary in web staff client view of a MARC record doesn't honor the cat.default_classification_scheme? |
10:47 | Dyrcona | We're getting LoC regardless of the ou setting value. |
10:56 | Dyrcona | Hm... I think I see something, but need to look a bit more. |
10:58 | Dyrcona | Ok. Not what I thought. |
14:27 | Dyrcona | And, if you get that working, adding a branch on the above-mentioned bug would be most helpful for everyone else. |
14:37 | Dyrcona | Jaswinder: You typically need to create the object via Fieldmapper, get a pcrud or cstore transaction and then create the object. |
14:38 | frank_g | What I see is that the patron photo is only visible in the web client when the URL contains https instead of http |
14:39 | Dyrcona | Jaswinder: Here's an example function that creates a skeletal MARC record, call number, and copy: http://git.evergreen-ils.org/?p=NCIPServer.git;a=blob;f=lib/NCIP/ILS/Evergreen.pm;h=3fc3ff19309b91c7a8f91acfd9ed6a1cdb268513;hb=master#l2103 |
14:40 | Dyrcona | It's in Perl and uses pcrud. |
14:40 | Dyrcona | frank_g: Mixed content warnings/errors? |
14:40 | miker | sandbergja: fwiw, it was intentional that some fields that are only used by db-layer code (stored functions, views) are not exposed in the IDL. but it doesn't hurt to expose them, per se |
16:37 | pinesol_green | Launchpad bug 1754455 in Evergreen "marc_export: want to delete fields/subfields" [Undecided,New] https://launchpad.net/bugs/1754455 - Assigned to Jason Stephenson (jstephenson) |
16:38 | dbwells | jaswinder: This may help you with anonymous pcrud (specific slide, and the whole presentation in general): http://git.evergreen-ils.org/?p=working/random.git;a=blob_plain;f=api_presentation/web_apis.html;hb=refs/heads/collab/berick/eg2015#(20) |
16:38 | Dyrcona | I was going to completely revamp dpearl's second implementation to make it more generally useful. |
16:39 | Dyrcona | I think we're going to end up with a mini-language for removing stuff from MARC records on export. |
16:40 | Dyrcona | dbwells++ berick++ |
16:40 | Dyrcona | Dyrcona-- # I should have remembered that... |
16:41 | Bmagic | Dyrcona: if we asked the vendor to leave it alone and we are provided the MARC back to us with the $0's intact, will the authority link survive the import? |
14:50 | Dyrcona | What's your overall goal? |
14:50 | dbwells | I think he is adding "ebooks.com" as a provider. |
14:51 | Jaswinder | Dyrcona: My goal is to add an additional vendor that will be used for basic search, advanced search, checkouts, etc. The vendor API will return book or audio data, and that will be displayed in the results. |
14:52 | Dyrcona | Jaswinder: The best way to get the results in search is to add MARC records supplied by the vendor, then you use the oneclick digital settings with Evergreen. |
14:52 | Dyrcona | If you want to use a search API for recorded books to add to Evergreen search results, a different approach would be in order. |
14:53 | Dyrcona | I think you'd want to add a module to search that would use the vendor's api and add the results to your evergreen search results. |
14:53 | Dyrcona | I'm not sure if anyone has ever done that. |
14:19 | rjackson_isl | probably just wait and see if it happens again/regularly going forward - nothing spotted in logs |
14:25 | Bmagic | I think it's the ->as_usmarc that isn't giving me the utf character conversions |
14:26 | Bmagic | maybe I need to encode_utf8($marc->as_usmarc()) ? |
14:31 | Dyrcona | Bmagic: There's a way to do it, but I usually don't have to. man MARC::Charset. |
14:32 | Bmagic | ah, maybe this is it $record->encoding( 'UTF-8' ); |
14:32 | Dyrcona | Bmagic: I don't think so. |
14:33 | Dyrcona | It isn't that simple. |
14:37 | Bmagic | I don't think these are weird records. I am just not handling them correctly. |
14:38 | Dyrcona | Bmagic: Are you ignoring me? |
14:38 | Bmagic | no? lol |
14:38 | Dyrcona | Bmagic: man MARC::Charset or use yaz-iconv |
14:39 | Dyrcona | Your best bet is probably converting the file, first, with yaz-iconv. |
14:40 | Bmagic | Dyrcona++ # I'll try some stuff |
14:41 | dbwells | Bmagic: If you can share your existing records and code, I don't mind helping with sanity checks. We've all been thrown for loops figuring out encoding issues at one time or another. |
14:41 | Bmagic | ty, if I get stuck using some of these other avenues, I will take you up on that! |
14:42 | Dyrcona | My favorites are the records that contain Windows-1252 with "smart" quotes. One of them looks like an end of record marker to MARC. |
14:42 | Bmagic | I remember you telling me about those! |
14:42 | Bmagic | yuk |
14:42 | dbwells | My all time favorite case was when a "zero-width space" somehow made its way into the record header. Try to find that! |
10:47 | pinesol_green | Launchpad bug 1755502 in Evergreen "Alternate hold pickup popup displays when checking out item to hold patron" [High,New] https://launchpad.net/bugs/1755502 |
10:47 | csharp | is there an EG db function that strips out marc fields? for instance, I have a large group of records and I want to strip a 9XX field out indiscriminately |
10:48 | bshum | csharp: Kind of like https://wiki.evergreen-ils.org/doku.php?id=scratchpad:random_magic_spells#how_to_prune_a_tag_under_the_hood ? |
10:48 | Dyrcona | csharp: No there isn't, but we're working on adding that feature to marc export. |
10:49 | csharp | Dyrcona: oh cool |
10:49 | csharp | bshum: I'll look - I forget that page exists :-) |
10:50 | bshum | Yeah it's an oldie, but I just always remember using that SQL to strip unwanted tags from our bibs too |
15:28 | kmlussier | That doesn't sound familiar to me, but I think my newest master system is probably a couple of weeks old. |
15:30 | Dyrcona | I built the stock system last night, and the other this afternoon. |
15:30 | agoben joined #evergreen | |
15:31 | Dyrcona | I have only seen it so far on the bills and messages. I just had a quick look at marc edit and some other interfaces and they look OK. |
15:36 | Dyrcona | I hesitate to Lp it because maybe I messed something up. |
15:36 | Dyrcona | But, I've done so many installations..... |
15:38 | Dyrcona | Heh, and we're getting internal tickets about how to add the "You saved $XXX" to the checkout receipts in the web client. :) |
15:19 | JBoyer-alt | I was under the impression that the woes were "I want to make this UTF-8" not "My UTF-8 is ügly" |
15:23 | Dyrcona | My impression is UTF-8 was wanted but ISO8859-1 was received. |
15:24 | miker | Dyrcona: well, who knows what shenanigans are going on between his clipboard and your screen ... but yes, I see a-with-circumflex in my irc client also |
15:24 | Dyrcona | Or, perhaps, MARC-8 was expected and ISO8859-1 was interpreted. |
15:24 | Dyrcona | And, yeah, hard to say. |
15:24 | miker | or, perhaps there are strings in various encodings in one record. that's my FAVORITE |
15:25 | Dyrcona | Plus, who knows, I've seen MARC records change character sets in different fields without warning. |
15:25 | Dyrcona | :) |
15:25 | Dyrcona | Smart Quotes are the BEST!!!! |
15:25 | miker | who DOESN'T catalog in Word(tm)? |
12:25 | JBoyer joined #evergreen | |
12:26 | khuckins_ joined #evergreen | |
12:58 | jwoodard joined #evergreen | |
13:08 | Dyrcona | @marc 050 |
13:08 | pinesol_green | Dyrcona: A classification or call number that is taken from Library of Congress Classification or LC Classification Additions and Changes. The brackets that customarily surround alternate class/call numbers are not carried in the MARC record; they may be generated based on the presence of repeated $a subfields. (Repeatable) [a,b,3,6,8] |
13:13 | berick | miker: fyi, put this together. checking now to make sure it doesn't break anything. |
13:13 | berick | http://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/berick/ws-gateway-connection-check |
16:21 | Dyrcona | rlefaive: Nothing that I've shared publicly, and most of what I have now are pretty specific to how C/W MARS manages URIs. |
16:22 | rlefaive | Dyrcona… ooh, nice… when you say URIs are you referring to matching 856s? |
16:22 | Dyrcona | Sort of. We usually match on 035 and 020. |
16:23 | Dyrcona | Then, I pull the marc out and mess with the 856s in Perl. |
16:24 | rlefaive | Dyrcona: i see. interesting. For some reason our match sets ignore the 035, but that seems like a good identifier to use. |
16:24 | Dyrcona | I get best results from building a tsvector of the ISBNs in the incoming record and doing a ts_search on the index_vector column of metabib.real_full_rec. |
16:25 | Dyrcona | Well, our code does two searches and adds up points based on the matches, choosing the first record with the highest score. |
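The "add up points and take the first record with the highest score" step described above might be sketched like this. It is a Python illustration, not the C/W MARS code. The point values and the `best_match` name are hypothetical; the input is assumed to be the combined hits from both searches, in the order they were found.

```python
def best_match(search_hits):
    """Pick a record ID from (record_id, points) hits.

    Points for the same record are summed across searches. On a tie,
    the record that was seen first wins.
    """
    scores = {}
    for record_id, points in search_hits:
        scores[record_id] = scores.get(record_id, 0) + points

    best_id, best_score = None, 0
    for record_id, score in scores.items():  # dicts keep first-seen order
        if score > best_score:  # strict comparison keeps the earlier record
            best_id, best_score = record_id, score
    return best_id
```

For example, a record that matches on both an ISBN and an 035 outranks one that matches on ISBN alone, and `None` is returned when neither search finds anything.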
12:38 | jeff | or what Dyrcona suggested, which will give you a JSON payload containing the biblio.record_entry object |
12:39 | jeff | (including a "marc" field with the marcxml of the record. |
12:39 | jeff | ) |
12:39 | Dyrcona | And the marc is in the marc field. |
12:39 | Dyrcona | :) |
12:39 | ejk | Thanks! I'll try the pcrud call. |
12:40 | Dyrcona | I think there are other ways, but pcrud came to mind first. |
12:44 | ejk | Thanks so much! Dyrcona++ jeff++ |
12:45 | Dyrcona | ejk: Is this written in Perl? |
12:45 | ejk | *cough* PHP *cough* |
12:45 | Dyrcona | OK. Never mind. I don't know any MARC frameworks in PHP. ;) |
12:45 | Dyrcona | If you want to switch to Perl, Python, or Java, though..... |
12:46 | ejk | jeff: I think I actually started this library based on your Opensrf PHP library from way back when; but it's been expanded quite a bit from there. |
12:47 | jeff | good. mine was incomplete garbage. ;-) |
12:49 | Dyrcona | Speaking of MARC.....We have a record that shows nothing in the View MARC window, but shows up OK if you click the display MARC link in the OPAC. |
12:50 | * Dyrcona | wonders what is wrong... To the Batlogs! |
12:50 | Dyrcona | Oh. Now it works.... |
12:58 | khuckins joined #evergreen | |
14:14 | ejk | Material Type code and Additional Authors were the two that I could only find in the MARC record |
14:14 | Dyrcona | It works fine on my Ubuntu 16.04 test vm, but I can't get it to work on training server with Debian 7. |
14:15 | jvwoolf left #evergreen | |
14:15 | Dyrcona | ejk: There are tables with mappings to get values from MARC, material type is one of them. |
14:17 | Dyrcona | You want to look at config.marc21_ff_pos_map. |
14:17 | Dyrcona | Here's an example in Perl of how it might be used: https://github.com/Dyrcona/evergreen_utilities/blob/master/perl/loaderecords.pl#L359 |
14:18 | Dyrcona | Added authors, you'll have to pull from the appropriate fields. |
14:18 | Dyrcona | @marc 700 |
14:18 | pinesol_green | Dyrcona: An added entry in which the entry element is a personal name. (Repeatable) [a,b,c,d,e,f,g,h,j,k,l,m,n,o,p,q,r,s,t,u,x,3,4,5,6,8] |
14:18 | Dyrcona | And so on... |
14:21 | ejk | Any way I can get that table through an OpenSRF call? |
14:21 | Dyrcona | I usually see that referred to as item type. :) |
14:22 | Dyrcona | ejk: What you could do is write some code to build a static look up table from the database and pop that into your code. |
14:23 | jeff | ejk: for some purposes, i extract that kind of thing by transforming the MARCXML to MODS -- the MODS XSLT from LoC is what many parts of the Evergreen ingest/index process use -- with some modifications in a few places. |
14:23 | Dyrcona | Those entries are based on the LoC MARC docs and almost never change. |
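A static lookup table of the kind suggested above could be sketched as follows, in Python for illustration. The three entries are assumptions based on the MARC 21 positions for books (leader/06 type of record, leader/07 bibliographic level, 008/07-10 Date 1) and are laid out in the spirit of config.marc21_ff_pos_map; a real table would be generated from the database. The names `FF_POS_MAP` and `fixed_field` are hypothetical.

```python
# (fixed field name, record type) -> (tag, start position, length)
# "ldr" means the leader. Illustrative entries only; build the real
# table from config.marc21_ff_pos_map.
FF_POS_MAP = {
    ("Type", "BKS"): ("ldr", 6, 1),
    ("BLvl", "BKS"): ("ldr", 7, 1),
    ("Date1", "BKS"): ("008", 7, 4),
}


def fixed_field(leader, field_008, name, rec_type="BKS"):
    """Extract a fixed-field value from the leader or the 008 field."""
    tag, start, length = FF_POS_MAP[(name, rec_type)]
    source = leader if tag == "ldr" else field_008
    return source[start:start + length]
```

Since the positions come from the MARC 21 documentation and rarely change, baking the table into client code is a reasonable trade against making an extra service call per record.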
14:23 | Dyrcona | MODS would be handy for the authors, for instance. |
14:24 | jeff | ejk: some messy python is here that might give you a sense of how that works -- see the various XPATH bits in the indexes{} hash: https://github.com/tadl/marc-indexing-for-es |
14:24 | jeff | specifically, https://github.com/tadl/marc-indexing-for-es/blob/master/index.py#L70 |
15:51 | csharp | s/quick and handy/super damn slow in your case/ |
15:51 | Dyrcona | rsync with --bwlimit= is my best bet. |
15:52 | Dyrcona | Well, at one location the bandwidth is asymmetrical and that's the one that I'd be sending from, so I don't want to crush what others are doing. |
15:52 | Dyrcona | I've had that happen just sending marc records to the masslnc servers for testing. |
15:53 | Dyrcona | So, yeah, gymnastics are required. :) |
15:53 | Dyrcona | Hm... It happened on this testing server again with -j 4. |
15:54 | Dyrcona | I should see if it happens on the development server. |
10:43 | JBoyer | @marc w00t? |
10:43 | pinesol_green | JBoyer: unknown tag w00t? |
10:44 | JBoyer | Database is lacking. |
10:44 | Dyrcona | @marc w00 t |
10:44 | pinesol_green | Dyrcona: unknown field/subfield combination (w00/t) |
10:44 | Dyrcona | :) |
10:55 | csharp | @marc 1337 |
11:57 | LSachjen joined #evergreen | |
12:01 | Dyrcona | Nope. vmbuilder just don't work no more. |
12:03 | bwicksall | Has 2.12.5-3.0-beta1-upgrade-db.sql blown up for anyone with the following? https://pastebin.com/ysHaaBXb |
12:12 | Dyrcona | No, but it looks like you have bad MARC records. |
12:13 | Dyrcona | I haven't run the upgrade script. |
12:13 | * Dyrcona | thinks it is time to investigate lxd. |
12:23 | bwicksall | I'll run through the update step by step |
12:24 | khuckins joined #evergreen | |
12:29 | jeffdavis | bwicksall: At a glance it doesn't look like a bug in the upgrade script, but an issue with updating some record in your system which has bad character encoding (the "maintain_control_numbers" function usually runs when a biblio.record_entry record is updated). |
12:30 | jeffdavis | I'm trying to think of a good way to find the record(s). |
12:32 | Dyrcona | write something to iterate through all of the biblio.record_entry table entries, spit out the id, then call maintain_control_numbers or just update the record with update on same marc set to true. |
12:32 | Dyrcona | The last record id you see will be the first one that causes a problem. |
12:33 | Dyrcona | It would help to do them in id order with a way to specify a starting record id. |
12:33 | Dyrcona | repeat that until none blow up. |
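The hunt for bad records described above can be sketched in Python. This is an approximation under stated assumptions: `process` is a hypothetical callable that would update one biblio.record_entry row (in its own transaction) and raise if the database rejects it. Catching the error and continuing stands in for the manual "restart after the last ID you saw" loop.

```python
def find_bad_records(record_ids, process):
    """Run process() on each ID in ascending order and collect failures.

    Returns a list of (record_id, error message) pairs for every record
    whose processing raised an exception.
    """
    bad = []
    for record_id in sorted(record_ids):
        try:
            process(record_id)
        except Exception as exc:  # record the failure and keep going
            bad.append((record_id, str(exc)))
    return bad
```

As noted in the discussion, this amounts to a full reingest, so it is slow, but it reports every failing record in one pass.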
12:34 | jeffdavis | That's basically a full reingest though right? |
12:34 | Dyrcona | Yeah, pretty much. |
12:34 | Dyrcona | But, that's how you'll find all the bad records. |
12:35 | Dyrcona | Or, I guess you could pull the marc out and try to make MARC::Record(s) out of 'em, but I've seen that succeed even on some bad records. |
12:36 | jeffdavis | I wonder if xml_is_well_formed(marc) would catch encoding issues? |
12:37 | Dyrcona | jeffdavis: Don't think so, since it runs every time a record is inserted or updated. |
12:38 | Dyrcona | Of course, if triggers were disabled during a batch load, then..... ;) |
10:26 | berick | mdriscoll++ |
10:26 | mdriscoll | Could the update in 1075 be run in parallel? I have 16 cores and 15 have nothing to do. |
10:28 | mdriscoll | s/1075/1057/g |
10:34 | Dyrcona | Seems to me it would make sense to modify the triggers so that maintain_901 and maintain_control_numbers are only called if the MARC is changed. |
10:39 | Dyrcona | mdriscoll: You could try doing it in parallel to see. I think it could be. |
10:52 | * kmlussier | agrees with berick, et al. on disabling the trigger for the upgrade. |
10:56 | miker | there are no triggers that need to fire during that particular script. I'm still in favor of wrapping the relevant guts of 1057 in set session_replication_role replica/origin ... pg naming decisions aside, all that does is say "don't fire triggers / do fire triggers" |
16:05 | roycroft | does evergreen have a way of doing those imports? |
16:05 | gmcharlt | yeah, that's a reasonable plan |
16:05 | roycroft | i.e. can i upload the csv to evergreen and it will go suck down the details? |
16:06 | Dyrcona | roycroft: You'll need to convert the CSV to MARC. |
16:06 | roycroft | there's a library system that covers most of eastern oregon (i'm in western oregon) who use evergreen, and folks seem to think it's a really nice package, which is what got me looking at it in the first place |
16:06 | roycroft | so csv to incomplete marc, and then evergreen can get the full marc records? |
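A CSV-to-MARC conversion of the sort mentioned above might start from something like this. It is a rough Python sketch producing a skeletal MARCXML record per row. The column names (`title`, `author`, `isbn`), the placeholder leader, and the function name `row_to_marcxml` are all assumptions for illustration, and the result is deliberately incomplete.

```python
import csv
import io
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
PLACEHOLDER_LEADER = "00000nam a2200000 a 4500"  # assumed stub leader


def _datafield(record, tag, ind1, ind2, code, value):
    """Append one datafield with a single subfield to the record."""
    field = ET.SubElement(record, "datafield", tag=tag, ind1=ind1, ind2=ind2)
    ET.SubElement(field, "subfield", code=code).text = value


def row_to_marcxml(row):
    """Build a minimal MARCXML string from one CSV row (a dict)."""
    record = ET.Element("record", {"xmlns": MARC_NS})
    ET.SubElement(record, "leader").text = PLACEHOLDER_LEADER
    if row.get("isbn"):
        _datafield(record, "020", " ", " ", "a", row["isbn"])
    if row.get("author"):
        _datafield(record, "100", "1", " ", "a", row["author"])
    _datafield(record, "245", "1", "0", "a", row["title"])
    return ET.tostring(record, encoding="unicode")


def csv_to_marcxml(csv_text):
    """Convert CSV text with a header row into a list of MARCXML strings."""
    return [row_to_marcxml(r) for r in csv.DictReader(io.StringIO(csv_text))]
```

Records like these carry only enough data to identify the title, so they would still need to be matched against fuller records from another source.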
16:06 | Dyrcona | Also, if you want to test on stretch use the branches on the bug I referenced earlier. |
13:20 | Dyrcona | Yeah. We could be approaching an event horizon. |
13:21 | * Dyrcona | greps the code for examples of xslt_process used in the database, specifically for mods transformation. |
13:24 | Dyrcona | Easy enough... :) |
14:00 | Dyrcona | @marc 035 |
14:00 | pinesol_green | Dyrcona: A control number of a system other than the one whose control number is contained in field 001 (Control Number), field 010 (Library of Congress Control Number) or field 016 (National Bibliographic Agency Control Number). (Repeatable) [a,z,6,8] |
14:00 | * Dyrcona | ponders what to call it in the spreadsheet. |
14:03 | dbs | That needs a 'q' as well. Dang old data. |
14:55 | Dyrcona | lasse_: Evergreen uses MARC21 with a heavy emphasis on Library of Congress standards, but it is used successfully in other nations, such as the Czech Republic. |
14:55 | csharp | berick: works great! I'll create a signoff branch so it can get into the 3.0 mix :-) |
14:55 | berick | csharp: cool |
14:56 | Dyrcona | lasse_: You might want to look at Koha, too. IIRC, they have support for different MARC formats. |
14:56 | lasse_ | Dyrcona: thanks - I'm already looking :) |
14:57 | rhamby | I once saw a presentation about a map library in Denmark using a MARC variant called danMARK but I don't know how widely it's used there |
14:57 | lasse_ | rhamby: that would be a little on the nose methinks :) |
11:14 | Dyrcona | Bmagic: If you want one, it's probably the other. ;) |
11:14 | Bmagic | I tried both |
11:15 | Dyrcona | Is there a date in the 240? |
11:15 | Dyrcona | @marc 240 |
11:15 | pinesol_green | Dyrcona: The uniform title for an item when the bibliographic description is entered under a main entry field that contains a personal (field 100), corporate (110), or meeting (111) name. [a,d,f,g,h,k,l,m,n,o,p,r,s,6,8] |
11:15 | Dyrcona | No.... I'm thinking of the 260.... |
11:15 | Dyrcona | :) |