Evergreen ILS Website

IRC log for #evergreen, 2014-02-19


All times shown according to the server's local time.

Time Nick Message
01:48 davidcandlestick joined #evergreen
03:08 v101 joined #evergreen
03:08 v101 left #evergreen
03:11 mrpeters joined #evergreen
07:04 timlaptop joined #evergreen
07:55 akilsdonk joined #evergreen
08:14 collum joined #evergreen
08:21 kbeswick joined #evergreen
08:38 Shae joined #evergreen
08:56 berick joined #evergreen
08:59 mmorgan joined #evergreen
09:02 Dyrcona joined #evergreen
09:06 rfrasur joined #evergreen
09:09 rfrasur rjackson-isl, is there something up with reports?
09:11 rjackson-isl reports are not back - it was decided to wait another day due to the response issues
09:11 rfrasur Okay, gotcha.  Then I'll stop querying the db ;)
09:40 yboston joined #evergreen
09:57 jl- I'm having some difficulties here, following this import example: http://docs.evergreen-ils.org/dev/_migrating_your_bibliographic_records.html -- 1) my marc records already are marcxml but not one record per line
09:57 jl- and I can't use bash or anything to parse/format xml
09:58 jl- what tool can I use to make it line format for a staging table?
09:59 paxed there is that piece of python code there...
10:00 jeff ...which could probably be easily modified to consume marcxml rather than marc
10:00 Dyrcona jl-: There is actually a Perl routine in the Evergreen utility code that you can use: OpenILS::Utils::Normalize::clean_marc.
10:00 jeff hrm. pymarc has MARCReader and JSONReader, but not a marcxml variant.
10:01 Dyrcona clean_marc is used by most of Evergreen to clean marcxml before insert into the database.
10:03 Dyrcona This little bit of Perl will make the marcxml be all on one line: $xml =~ s/\n//sog;
10:03 paxed or perhaps something (untested) like sed -i -e 's/\n//g' records.xml  && sed -i -e 's#</record><#</record>\n<#g' records.xml
10:03 Dyrcona TIMTOWTDI!
10:03 jl- jeff correct
10:04 dluch joined #evergreen
10:04 Dyrcona blarg. python. :)
10:04 jl- paxed I don't think bash should be used to parse xml?
10:04 jl- Dyrcona please explain that routine
10:04 * Dyrcona doesn't think xml should be used, but that's another debate. :)
10:04 dbs yaz-marcdump -i marcxml -o marc filename.xml > filename.mrc :)
10:05 * dbs suspects Dyrcona's regex would mash all of the records onto 1 line, instead of one per line
10:05 * Dyrcona assumes you're processing 1 record at a time at the point you use the regex.
10:05 dbs TIMTOWTDI and unfortunately It All Depends
10:06 dbs Dyrcona: oh, but jl- doesn't have anything parsing the marcxml qua marcxml records yet. just a marcxml file
10:06 Dyrcona jl-: The code is in Open-ILS/src/perlmod/lib/OpenILS/Utils/Normalize.pm if you want to look at it.
10:07 dbs ergo easiest method might be to roll back to MARC binary via yaz-marcdump and just use that little bit of python
10:07 jl- hmm
10:07 jeff a well reasoned and tested "what to do if your data is already in marcxml format, but not one-record-per-line" (and that might be use yaz-marcdump to go marcxml to MARC) would probably be a useful addition to http://docs.evergreen-ils.org/dev/_migrating_your_bibliographic_records.html
10:07 Dyrcona No. Write a little something with MARC::Batch.
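
A minimal sketch of the MARC::Batch route Dyrcona suggests, assuming the Evergreen perlmods are on @INC (clean_marc lives in OpenILS::Utils::Normalize, as noted above); untested against this thread's data:

    #!/usr/bin/perl
    # Read binary MARC; print one cleaned MARCXML record per line,
    # ready to load into a staging table.
    use strict;
    use warnings;
    use MARC::Batch;
    use MARC::File::XML (BinaryEncoding => 'UTF-8');
    use OpenILS::Utils::Normalize qw/clean_marc/;

    my $batch = MARC::Batch->new('USMARC', $ARGV[0]);
    $batch->strict_off;    # skip bad records instead of dying
    while (my $record = $batch->next) {
        # clean_marc() normalizes the XML and strips embedded newlines
        print clean_marc($record->as_xml_record), "\n";
    }
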
10:08 jl- dbs so roll back to .mrc
10:08 yboston jeff++
10:08 jeff but updating the documentation doesn't help jl-'s immediate question :-)
10:08 yboston jeff: I am about to deal with that scenario,  "what to do if your data is already in marcxml format, but not one-record-per-line", any tips?
10:09 jeff none better than those already suggested here by Dyrcona and dbs and paxed
10:10 jl- rolling back
10:11 jeff paxed: your suggestion of sed is probably fast and could work for most scenarios, but i fear it may summon a hybrid marcthulu/xmlthulu, similar to http://stackoverflow.com/a/1732454/157515
10:12 dbs and there's always the possibility of xml namespaces ruining the generic sed fun
10:12 jl- I noticed that the .mrc file is about 1/3 the size of the xml
10:12 dbs jl-: sounds about right
10:13 jl- now running the python script to parse xml
10:13 paxed jeff: possibly. perhaps i should've put a slightly stronger disclaimer in my suggestion... "may summon any of the Elder Gods"
10:14 jl- "may make you cry in the bathroom and reconsider your career"
10:14 dbs "What have I gotten myself into?"
10:15 Wyuli joined #evergreen
10:15 dbs "What have libraries done to themselves?"
10:15 jl- ^
10:19 ericar joined #evergreen
10:19 jeff every phone call for a patron password reset that i handle is another reminder to get moving on bug 1264746
10:19 pinesol_green jeff: Error: Could not gather data from Launchpad for bug #1264746 (https://launchpad.net/bugs/1264746). The error has been logged
10:20 jeff huh. well, that's ``Add "email password reset" button to user editor''
10:25 berick if anyone wants to help me quickly fix this : http://testing.evergreen-ils.org/~live/test.20.html
10:25 berick see working/user/berick/pgtap-db-debs-check-sort-repair
10:27 jeff but after that is "phone rings, web browser brings up list of matching patrons in web staff client, options for usual things including triggering a suggested username and triggering an e-mailed password reset" :-)
10:27 * jeff looks at berick's url
10:28 jl- ergreen=# SELECT staging_importer();
10:28 jl- ERROR:  could not extend file "base/16385/371863.7": wrote only 4096 of 8192 bytes at block 969998
10:28 jl- dbs
10:28 jeff jl-: you may have run out of disk space.
10:28 paxed out of free space
10:29 jl- ergreen=# SELECT staging_importer();
10:29 jl- ERROR:  could not extend file "base/16385/371863.7": wrote only 4096 of 8192 bytes at block 969998
10:29 jl- HINT:  Check free disk space.
10:29 jl- CONTEXT:  SQL statement "UPDATE staging_records_import SET dest = currval('biblio.record_entry_id_seq')"
10:29 jl- PL/pgSQL function "staging_importer" line 6 at SQL statement
10:29 jl- yes that's what it said
10:29 jl- sorry for the spam
10:30 jeff berick: quick fix looks good (i have not run it through pgtap). any opinion on whether evergreen.upgrade_list_applied_deprecates should be updated to return sorted values itself?
10:32 berick jeff: thought about that, but couldn't think of a case where it mattered outside of the test
10:33 eeevil jeff / berick: and order may be important (but, of course, that's available from config.upgrade_log, I suppose)
10:40 _bott_ joined #evergreen
10:43 berick eeevil: as in, the order of entries in supersedes[] or deprecates[] may be important, so evergreen.upgrade_list_applied_deprecates (etc) should not apply asciibetical sorting of patches it returns?
10:45 eeevil berick: it's not important today, but may be in the future. but yes, that's what I meant
10:45 berick right
10:45 berick thanks
10:45 berick ok, so i stand by my quick fix, then
10:46 jeff have a launchpad bug yet?
10:46 jeff or would this just be on the original bug?
10:46 berick no, seemed like a waste of pixels
10:46 berick original bug
10:47 berick https://bugs.launchpad.net/evergreen/+bug/1262722
10:47 pinesol_green Launchpad bug 1262722 in Evergreen "DB patch supersede / deprecate func repairs; unit test" (affected: 1, heat: 6) [Undecided,Fix committed]
10:47 * jeff waits for new two-factor token
10:49 senator paxed: congratulations to your national hockey team (if you're into that sort of thing)
10:49 paxed senator: i'm not really into hockey, but it's always nice to beat russia... :P
10:49 mrpeters osrfsys.10.log:2014-02-19 10:39:08 brick01-head osrf_http_translator: [ERR :2223:CStoreEditor.pm:114:1392803222222329] editor[0|0] request error open-ils.cstore.json_query.atomic : {"from":["metabib.suggest_browse_entries","\"","title","StartSel=\"<span class='oils_AS'>\", StopSel=</span>","-1","10","14"]} : Exception: OpenSRF::DomainObject::oilsMethodException 2014-02-19T10:39:08 OpenILS::Utils::CStoreEditor /usr/local/share/pe
10:49 mrpeters osrfsys.10.log:2014-02-19 10:39:08 brick01-head osrf_http_translator: [ERR :2223:EX.pm:66:1392803222222329] Exception: OpenSRF::DomainObject::oilsMethodException 2014-02-19T10:39:08 OpenILS::Utils::CStoreEditor /usr/local/share/perl/5.14.2/OpenILS/Utils/CStoreEditor.pm:453 <500>  Severe query error -- see error log for more details
10:49 mrpeters osrfwarn.log:2014-02-19 10:39:08 brick01-head open-ils.cstore: [ERR :2228:oils_sql.c:5585:1392803222222329] open-ils.cstore: Error with query [SELECT * FROM metabib.suggest_browse_entries( '"', 'title', 'StartSel="<span class=''oils_AS''>", StopSel=</span>', '-1', '10', '14' ) AS "metabib.suggest_browse_entries" ;]: 0 ERROR:  syntax error in tsquery: ":*"#012CONTEXT:  PL/pgSQL function "suggest_browse_entries" line 12 at assignm
10:49 mrpeters those ring any bells for anyone?  newly upgraded 2.5.2 system...
10:49 senator paxed: :-)
10:50 mrpeters basically just stacking the system with queries from that and blocking normal activity from functioning
10:50 * jeff hands mrpeters a paste.evergreen-ils.org or gist.github.com or something
10:50 mrpeters argh yeah sorry, that was longer than i thought
10:51 jeff mrpeters: "severe query error" is a good clue to consult your postgresql logs to see what the error was from postgresql's POV.
10:51 senator mrpeters: that's an autosuggest query for a quotation mark
10:51 senator short term, you may consider disabling autosuggest until you figure why there are so many of those or why they take a long time/error out/whatever
10:51 berick @isitdown docs.evergreen-ils.org
10:51 pinesol_green berick: http://cat.evergreen-ils.org.meowbify.com/
10:51 mrpeters senator: aha, interesting
10:53 jeff of course, if the "severe query error" is that you killed the query because it was long running, your postgresql logs may not be helpful. i don't know for certain that severe query error is shown in the opensrf logs when you kill a query from pg_cancel_backend or similar
10:54 mrpeters jeff:  yeah, apparently they have some process that runs and kills things after 3 minutes
10:54 mrpeters i want to knock that off for now
10:55 jeff i'd probably go with senator's suggestion of disabling autosuggest as a stop-gap.
10:55 mrpeters yeah, i have --- restarting bricks now
10:58 * dbs suspects that killing pg backends leads to a cascade of other errors in cstore/storage thereafter, based on recent experience
10:59 berick and anything above the cstore level (middle-layer, client) would just see a request timeout, since they usually give up after a minute or so
11:00 jl- I'm assuming I can resolve the space issue by expanding the rootfs ?
11:01 mrpeters dbs: yeah?  I'd definitely like to get that little "sniper" stopped
11:03 paxed jl-: or there's a way to tell postgres to save the database files on somewhere else. (but, iirc, that has to be done when you create the db)
11:04 jl- paxed that's why I was wondering if it will even help postgres if I clean up rootfs spce
11:05 jl- or if it has allocated space set somewhere
11:05 eeevil mrpeters: I'd suggest looking at the schema, make sure all the indexes supporting autosuggest (which also support browse) are in place. you can configure autosuggest to not kick in until you have more than a certain number of characters from the user, IIRC
11:05 mrpeters eeevil: thanks, great info.  i hope autosuggest really is the culprit!
11:05 paxed jl-: it just uses as much space as you've got available.
11:07 eeevil mrpeters: and, short of disabling autosuggest, you can just disable visibility testing for autosuggest. I'd point you to the docs for that, but the community docs site seems broken at the moment...
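
For reference, a hedged sketch of the stop-gap: in 2.5-era Evergreen, autosuggest is governed by a config.global_flag; the flag name and the 'opac_visible' value semantics here are from memory, so verify against your seed data before running anything like this:

    -- Turn autosuggest off entirely:
    UPDATE config.global_flag
       SET enabled = FALSE
     WHERE name = 'opac.use_autosuggest';

    -- Or, per eeevil, keep it but skip the (expensive) visibility test,
    -- assuming the flag's value field carries 'opac_visible':
    UPDATE config.global_flag
       SET value = ''
     WHERE name = 'opac.use_autosuggest';
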
11:08 mrpeters no i know how, thanks mike
11:10 mrpeters keyword searches are just bunk....title/author/etc. come back snappy....any particular index that might be corrupt/missing/etc.
11:10 jboyer-laptaupe joined #evergreen
11:16 dbs mrpeters: maybe your database just tipped over some point that triggers a different query optimizer path? That occasionally happens :/
11:18 mrpeters hmm
11:19 dbs Unfortunately hard to diagnose without EXPLAINs from prior to that tipping point
11:19 mrpeters sure
11:19 mrpeters im wondering if the autosuggest stuff, and the process sniping deal WAS the tipping point
11:24 mcooper joined #evergreen
11:27 mmorgan left #evergreen
11:28 jeff berick: tested your fix, pushed to master, commented in bug 1262722
11:28 pinesol_green Launchpad bug 1262722 in Evergreen "DB patch supersede / deprecate func repairs; unit test" (affected: 1, heat: 6) [Undecided,Fix committed] https://launchpad.net/bugs/1262722
11:28 jeff berick++
11:28 jeff tests++
11:29 berick jeff++
11:30 pinesol_green [evergreen|Bill Erickson] LP#1262722 Fix DB supersedes/deprecates pgtap test - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=6c16a44>
11:44 phasefx if we can get this one too, then we'll have a clean QA test run: https://bugs.launchpad.net/evergreen/+bug/1279420
11:44 pinesol_green Launchpad bug 1279420 in Evergreen "need Test::Output prerequisite" (affected: 1, heat: 6) [Medium,Triaged]
11:46 jwoodard joined #evergreen
11:46 jeff heh. i was just testing the waters in preparation for arguing for that. :-)
11:46 jeff phasefx: that system currently builds once nightly, around midnight?
11:46 phasefx correct, though I can fire it off practically at will over here
11:48 phasefx in #openils-evergreen, pinesol_green is following an RSS feed for it
11:48 * dbs fears what adding a thin plperl wrapper around xml::libxslt is going to do to ingest times
11:48 dbs all that I know is that solr is RAD
11:50 jl- wow I ran out of space again
11:50 jl- 200,000 records, -- 15 GB weren't enough
11:50 jl- any idea how much is necessary
11:52 jeff "how much can you give it?"
11:52 dbs seems like it would be a good research area for planning docs - rough memory / storage requirements
11:53 * jeff nods
11:53 dbs for various levels of records, etc
11:53 mrpeters jl-: yeah, that sounds low for that many records
11:53 mrpeters i have a customer about that size, let me tell you what their db size is
11:53 jeff jl-: for your immediate purposes, if you can give it 30-60 GB, try that. :-)
11:54 mrpeters jl-: is that items or bibs?
11:56 mrpeters i have a customer with 121,070 bibs and 147,094 items and their DB hovers around 8GB
11:56 mrpeters just to give you a little perspective
11:57 jl- mrpeters bibs
12:02 mrpeters how many patrons do you plan to load?
12:02 mrpeters is 80GB feasible for you?  i'm thinking about circulation, auditor, etc. buildup over time as well
12:02 jl- possibly
12:03 jl- testing on a VM right now
12:03 jl- gonna try 60
12:03 mrpeters ah, i see
12:03 mrpeters just don't run your production DB in a VM, right eeevil ;)
12:03 mrpeters i think we've all seen the pain that can cause
12:04 jl- everything is virtualized these days
12:05 mrpeters i hear you, but we tried running a BIG production database in vmware and it just didn't perform well
12:05 mrpeters but, maybe for a smaller size consortium or library it would perform better, honestly not sure
12:05 mrpeters things just got WAY better for us when we went bare metal
12:05 mrpeters (still virtualizing everything else)
12:06 jl- all our library services are virtualized
12:06 jl- so we'll have to go with that
12:06 jl- ;)
12:09 mrpeters soooo, back to my keyword searching issues --- i'm the only one with access to the system --- keyword searches running over 7 minutes  --- http://pastie.org/8749161
12:09 mrpeters ill ask again, is there a particular index that may need rebuilding?
12:10 mrpeters title/author/subject/etc. work to perfection
12:11 dbs mrpeters: 7 minutes for NASCAR? That seems extreme, unless "NASCAR" or a stemmed version thereof is in half of your records. What does EXPLAIN say for that query?
12:12 mrpeters dbs: is that the full query there, or is that truncated?
12:13 eeevil dbs: re plperl wrapper, we gain the benefit of being able to cache the xslt object ... that will likely pay for any perl parsing overhead, I'd bet
12:13 dbs appears to be truncated
12:13 mrpeters i think so too
12:13 mrpeters let me find it in the pg logs
12:13 mrpeters appreciate the input dan
12:14 eeevil mrpeters: that's not a complex query. it suggests either tuning or schema deficiencies ... I'm guessing this is the same instance as the autosuggest one above?
12:14 dbs mrpeters: as far as "tipping points" go, that's usually associated with a growth in the amount of data that pushes postgresql to suddenly choose a different plan (from reading pgsql-performance on and off over the years)
12:15 mrpeters eeevil: yes, it is
12:15 mrpeters no change in amount of data, just the upgrade from 2.2.2 to 2.5.2
12:15 dbs So as eeevil says, could be tuning (various cost estimates) or table / index bloat or (maybe?) corrupted / missing index
12:16 mrpeters thats what im fearing -- a missing index
12:16 eeevil mrpeters: heh ... that changes a LOT of data :)
12:16 eeevil both size and shape
12:16 dbs ah, I didn't realize the 2.2.2 -> 2.5.2 was in the mix
12:16 mrpeters yeah, it is
12:17 mrpeters eeevil: true....those ingests are pretty massive
12:17 mrpeters i meant as far as new libraries, etc.
12:17 mrpeters as far as tuning, it's using the same hardware as before, and the same pgtuned values that have been in place for many years on this cluster
12:18 dbs seems very likely to be upgrade schema related. possibly reingest bloat?
12:19 mrpeters we did run a VACUUM FULL ANALYZE VERBOSE after the reingest
12:25 dbs mrpeters: if you can get an EXPLAIN ANALYZE of a keyword query, that's really going to help
12:25 sunnysnowcat joined #evergreen
12:26 mrpeters yep, im on it
12:27 mrpeters thought this was the query, but it's not seeming to work -- http://pastie.org/8749204
12:31 dbs needs more $core_query_25686$ methinks
12:31 mrpeters okk maybe i cut it off somewhere
12:31 mrpeters its a big honkin select :P
12:32 eeevil mrpeters: syslog mixes up incoming messages. that's what the [XXXX|YY] block is for before the incoming message itself, on each line. the XXXX part is the syslog message id, and the YY part orders the lines of the message
12:34 eeevil find the XXXX for the line containing "-- bib search", then get all the lines for that XXXX, then stitch them together
12:34 eeevil you might find that the query you thought you were tracing was not the one that had the long duration...
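
A rough Perl sketch of the stitching eeevil describes, assuming the [XXXX|YY] marker layout above; adjust the regex to match your actual syslog format:

    # Reassemble multi-line syslog messages keyed on their message id.
    use strict;
    use warnings;
    my %parts;
    while (my $line = <>) {
        # capture message id, line order, and the payload after the marker
        next unless $line =~ /\[(\d+)\|(\d+)\]\s*(.*)$/;
        $parts{$1}[$2] = $3;
    }
    for my $id (sort { $a <=> $b } keys %parts) {
        print join('', grep { defined } @{ $parts{$id} }), "\n";
    }
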
12:35 mrpeters could be, im kind of curious to run it by hand now and see how it responds (and get you an explain)
12:44 mrpeters ok, query is churning, then ill get you the explain
12:46 mrpeters dbs:  damn, sorry!
12:46 mrpeters QUERY PLAN
12:46 mrpeters ----------------------------------------------------------------------------------------------------
12:46 mrpeters Function Scan on query_parser_fts  (cost=0.25..10.25 rows=1000 width=64) (actual time=34352.543..34352.622 rows=949 loops=1)
12:46 mrpeters Total runtime: 34352.730 ms
12:46 mrpeters (2 rows)
12:47 mrpeters 34 seconds is pretty damn reasonable
12:47 mrpeters but that seems to go up and up and up as load increases
12:48 mrpeters but, i did find out the indexes weren't primed into RAM (actually was able to fit the entire DB in ram this time) so maybe that's helped things
12:52 jihpringle joined #evergreen
12:55 kbeswick joined #evergreen
12:58 mrpeters and they've gone to crap again
12:58 mrpeters i can't even run that explain again now that the system had load again
12:59 stevenyvr joined #evergreen
12:59 mrpeters is there a particular table/index we should focus in on and run a vacuum on for keyword searches?
13:03 dbs metabib.keyword_field_entry would be the key table to focus on
13:04 * dbs slipped away for lunch
13:05 mrpeters no worries dan
13:05 dbs reingests going on? other things that might be trashing cache?
13:05 mrpeters no ingests
13:05 mrpeters my explain analyze verbose has been running for about 7 minutes now
13:06 mrpeters just getting nowhere... before we let people back in it was returning within 30 seconds
13:06 mrpeters i assume what i gave above isnt enough info (i probably needed verbose?)
13:07 dbs IIRC you want everything from within the query_parser_fts() function; the function itself masks all of the real work going on
13:08 mrpeters http://pastie.org/8749326 is what im trying to get you
13:12 dbs mrpeters: try http://pastie.org/8749337
13:12 eeevil mrpeters: explain just the core query
13:12 mrpeters can do
13:12 eeevil (what dbs said)
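
Illustrative only: the point of dbs's and eeevil's advice is to EXPLAIN the core full-text query itself, because an EXPLAIN over search.query_parser_fts() reports a single opaque function scan (as in the plan pasted above). Substitute the real core query stitched from your logs; the tsquery config name below is an assumption:

    EXPLAIN ANALYZE
    SELECT fe.id
      FROM metabib.keyword_field_entry fe
     WHERE fe.index_vector @@ to_tsquery('keyword', 'nascar');
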
13:12 * mrpeters realizes how little he knows about QP :(
13:13 mrpeters gonna be a few minutes on this...waiting until 1:20 to take the system down again from the public
13:14 mrpeters it won't budge right now
13:15 dbs :(
13:16 mrpeters thank you for bearing with me
13:16 mrpeters i sincerely appreciate the help....ive not run into anything like this before
13:17 mrpeters can we stop sip/reports/utility
13:17 mrpeters no reason for reports to be running
13:17 mrpeters utility should be fine too
13:19 dbs mrpeters: we had great pain over the last week or two due to very broad queries like "A" and "The" appearing to result in lock escalation
13:19 mrpeters interesting
13:19 dbs Pretty much a denial of service attack. Amusingly, due to our own IT dept running a 3rd party link checker service.
13:19 dbs But could affect anyone :(
13:20 mrpeters sounds like a great adventure haha
13:24 mrpeters dbs: http://pastie.org/8749393 -- ah, finally
13:25 * dbs has eyes pop at "Seq Scan on keyword_field_entry fe" - twice
13:25 mrpeters drop some knowledge on me dbs!
13:25 dbs That's visiting every row in metabib.keyword_field_entry
13:25 dbs It's ignoring the index (if there is one)
13:25 mrpeters interesting......
13:26 * dbs would expect to see "Index Scan using metabib_keyword_field_entry_index_vector_idx on keyword_field_entry fe"
13:27 mrpeters http://pastie.org/8749400
13:27 dbs eeevil: you concur?
13:27 mrpeters is that missing an index then?
13:27 dbs it looks a lot like it
13:28 dbs what does \d metabib.keyword_field_entry say?
13:28 mrpeters thats in the paste, dbs
13:28 mrpeters just above
13:29 dbs yeah
13:29 dbs you're missing metabib_keyword_field_entry_index_vector_idx
13:29 mrpeters ok
13:29 dbs CREATE INDEX metabib_identifier_field_entry_index_vector_idx ON metabib.identifier_field_entry USING GIST (index_vector);
13:30 mrpeters dbs++
13:30 mrpeters eeevil++
13:30 mrpeters hrm
13:30 dbs realllllly weird that's missing
13:30 mrpeters evergreen=# CREATE INDEX metabib_identifier_field_entry_index_vector_idx ON metabib.identifier_field_entry USING GIST (index_vector);
13:30 mrpeters ERROR:  relation "metabib_identifier_field_entry_index_vector_idx" already exists
13:30 jeffdavis s/identifier/keyword/g in dbs' command
13:30 dbs jeffdavis++ # good catch, sorry!
13:30 mrpeters ah, thanks jeffdavis
13:30 dbs both uses of identifier there
13:30 * eeevil reads up
13:31 dbs CREATE INDEX metabib_keyword_field_entry_index_vector_idx ON metabib.keyword_field_entry USING GIST (index_vector);
13:31 eeevil well, around 2.4/2.5, the indexes are dropped and recreated (IIRC) to speed up the upgrade
13:31 dbs -- to be sure :)
13:31 eeevil the recreate seems to have failed... ?
13:31 dbs sounds like
13:31 eeevil you might want to switch to GIN indexes while you're at it
13:32 eeevil you can do that with CONCURRENTLY and drop the GIST ones after hours
13:32 mrpeters you think?  csharp mentioned that to me too
13:32 * dbs thought about mentioning that but also thought about just keeping things as close to stock as possible until the firestorm dies
13:32 mrpeters it was about 4AM and i wasn't up for tinkering
13:32 jihpringle joined #evergreen
13:32 mrpeters ^^^ what dbs said :)
13:33 mrpeters i know csharp has been happy with GIN indexes
13:34 jeffdavis we've been using GIN for that index too quite happily, I believe it requires more disk space but is faster
13:34 mrpeters just swap GIST for GIN in that query?
13:35 jeffdavis I believe so
13:35 dbs yep
13:35 dbs we've been using GIN too, except when upgrades switch them back to GIST
13:35 mrpeters good to know, awesome.   is there a list anywhere of which indexes are best to convert?
13:36 mrpeters 2.4.3 > 2.5 upgrade script maybe?  where it drops/recreates them?
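
A sketch of eeevil's GIN suggestion; the new index name is made up for illustration:

    -- Build the GIN replacement without blocking writes
    -- (CONCURRENTLY must run outside a transaction block)...
    CREATE INDEX CONCURRENTLY metabib_keyword_field_entry_index_vector_gin_idx
        ON metabib.keyword_field_entry USING GIN (index_vector);
    -- ...then drop the GIST one after hours.
    DROP INDEX metabib.metabib_keyword_field_entry_index_vector_idx;
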
13:57 kbeswick_ joined #evergreen
14:12 Wyuli Is there a button I can press that ships beer & pizza to devs? Can we add that in the Evergreen interface?
14:13 bshum We should add that to the "about" page link :)
14:14 Wyuli I'm getting stressed just watching the chatter roll through here...
14:15 Wyuli I wouldn't be opposed to shipping refreshments out if I could. :)
14:17 jboyer-laptaupe dbs++
14:17 jboyer-laptaupe eeevil++
14:17 jboyer-laptaupe I suspect we've been missing that index for longer than we know.
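
One hedged way to check whether the upgrade dropped any other index_vector indexes, on the assumption that every metabib *_field_entry table should have one:

    SELECT t.schemaname, t.tablename
      FROM pg_tables t
     WHERE t.schemaname = 'metabib'
       AND t.tablename LIKE '%field_entry'
       AND NOT EXISTS (
             SELECT 1 FROM pg_indexes i
              WHERE i.schemaname = t.schemaname
                AND i.tablename  = t.tablename
                AND i.indexdef LIKE '%index_vector%');
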
14:17 bshum It's alive!  alive!
14:17 bshum :)
14:23 eeevil ha! "laptaupe" ... nice :)
14:30 rjackson-isl system is currently smoking! - seconded on dbs++ and eeevil++
14:31 bshum Calling 0861
14:31 mllewellyn joined #evergreen
14:32 mrpeters yeah, much love for the help guys
14:32 mrpeters fast as can be!
14:37 stevenyvr joined #evergreen
14:37 smyers_ joined #evergreen
14:39 pinesol_green [evergreen|Jason Stephenson] LP1223903 - Add from_bare_hash to Fieldmapper.pm. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=b9edf30>
14:39 pinesol_green [evergreen|Jason Stephenson] LP1223903 - Add indexes to authority.record_entry. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=1108b8b>
14:39 pinesol_green [evergreen|Jason Stephenson] LP1223903 - Rewrite marc_export.in in support-scripts. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=2c28f9b>
14:39 pinesol_green [evergreen|Jason Stephenson] LP1223903 - Add release notes for rewrite of marc_export. - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=e495dbb>
14:39 pinesol_green [evergreen|Ben Shum] LP1223903 - Stamping upgrade script for new authority_record_entry indexes - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=c0a0fc6>
14:41 pinesol_green [evergreen|Ben Shum] LP1223903 - Also bump the schema config - <http://git.evergreen-ils.org/?p=Evergreen.git;a=commit;h=c0c1061>
14:57 smyers_ joined #evergreen
14:59 eeevil Dyrcona: since you've been testing http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dyrcona/lp1234845_ranked_volumes would you mind taking a look at the SQL version in http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/miker/lp1234845_ranked_volumes ?
14:59 * eeevil updates the LP bug too
14:59 Dyrcona No problem. I'll load it in a few.
15:00 Dyrcona It has mostly been kmlussier looking at it and she's on vacation at the moment.
15:00 eeevil Dyrcona: might need to re-drop the intermediate one by hand on your test server, just a head's up
15:00 eeevil thanks!
15:00 Dyrcona ok
15:01 rangi Dyrcona: when you get a chance (non urgent) you might like to do a pull of the ncip stuff too
15:01 Dyrcona rangi: Cool.
15:12 ktomita For browse search, does anyone know if the sample data has any "see" references?  I have seen "See Also" but no "See".
15:12 bshum I don't remember off hand what's the difference...
15:13 mmorgan joined #evergreen
15:13 ktomita bshum: "See" is 4XX for things like alternate names, "See Also" is 5XX for related things.
15:14 ktomita bshum: that is my understanding
15:15 bshum What's a record in the sample data with "see also"?  Just curious.
15:15 ktomita bshum: I did a subject browse for 'organic foods'
15:15 * bshum wonders if this is some sort of authorities thing that he's not using yet
15:16 phasefx in my concerto load, I see authority entries with 400 and 410 tags, but no 5XX
15:16 ktomita bshum: yes, this is authorities related.
15:16 kbeswick joined #evergreen
15:16 bshum ktomita: Well, there isn't any organic foods in the concerto dataset so I'm assuming you must be using something else then.
15:17 ktomita bshum: yeah sorry, I had imported some records to try and see it.  I haven't been able to get that to work either
15:21 ktomita I am going to go back and do some more research and ask again later.
15:23 Dyrcona rangi; Good stuff. I need to do some research on what's been pushed where for the moment.
15:23 Dyrcona eeevil: I loaded the new code. I did have to remove the intermediate one by hand.
15:24 akilsdonk joined #evergreen
15:24 * Dyrcona wonder if MNPALS is doing anything on the NCIP front. I've not seen anything of theirs go into the working repo in some time.
15:30 eeevil ktomita: IIRC, from writing that with senator, in order to see "see" references you need to link your authority records together. see the 2.5 release notes, sections 2.2.3, 2,6.2 and 2.6.3 for more information
15:30 ktomita eeevil: thanks, I will look into those areas
15:32 Dyrcona authority_authority_linker.pl... If memory serves.
15:32 eeevil Dyrcona: aye (2.2.3 in the release notes)
15:33 eeevil though, as with authority_control_fields.pl, it is for priming, not perfect cataloging (because, as we know, MIGO (marc in garbage out))
15:33 ktomita I did run that script.
15:33 eeevil ktomita: if you don't have authority records that would link to one another, it won't do anything for you
15:34 ktomita eeevil: that brings me back to my original question: whether the concerto data has records that will link with "see" and "see also".
15:34 eeevil I doubt it
15:35 ktomita I have tried to setup my own records that would have "see" and "see also" linking but have only been able to see "see also"
15:35 ktomita I was hoping to get an example of "see" linking.  thanks for the responses.  more research for me.
15:46 ktomita Do you know how to setup the authority marc record so that it would have a "see" link?
15:53 dbs SELECT * FROM authority.full_rec WHERE tag LIKE '5%'; says there are no auth records in the --load-all sample set with 5xx fields
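
For orientation, a hedged illustration (not from the sample set): in MARC authority records a 4XX tracing produces a "see" reference for the heading in the same record, while a 5XX produces "see also" and generally wants a matching linked authority record, e.g.:

    100 1# $a Twain, Mark, $d 1835-1910
    400 1# $a Clemens, Samuel Langhorne, $d 1835-1910    <- yields "see"
    500 1# $a Snodgrass, Quintus Curtius                 <- yields "see also"
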
15:55 gsams joined #evergreen
15:55 graced ktomita: I'm curious - are you implementing the authorities work that has been done in the community or writing your own?  I heard a rumor at a library conference that you were writing one from scratch...
15:58 ktomita graced: I am working with the community code
15:58 graced ktomita: cool, thanks for dispelling my rumor   :-)
16:01 ktomita graced: have you dealt with see and see also much?
16:09 Wyuli joined #evergreen
16:10 jl- the staging table is still importing (been about 4 hours).. would it be wise to sneak in and see how many rows there are in bibs_records
16:11 Dyrcona jl-: Probably not.
16:11 Dyrcona Depends on locking.
16:11 jl- k
16:12 jl- the initially available 70 GB have shrunk to 25 GB
16:12 Dyrcona yep.
16:13 jl- I'll start worrying once I'm in the single digits
16:14 Dyrcona I don't know how much space you need to load via a staging table since I've never done it.
16:18 jl- ils curveballs
16:18 jl- wait, I don't even like baseball
16:18 jl- let's call it a scissor tackle
16:19 jl- http://i.minus.com/iL5juqoZUPgkW.gif
16:20 jeffdavis jl-: how many MARC records are you importing?
16:22 jl- jeff 230,000
16:23 jl- jeffdavis
16:26 jeffdavis hm
16:27 jboyer_laptaupe joined #evergreen
16:27 jeffdavis I've got a test dataset with about 278,000 bib records, total size of the database after ingest is 10 GB
16:27 smyers__ joined #evergreen
16:27 jl- something's fishy then
16:27 jl- well
16:27 jl- after ingest
16:27 jl- but during?
16:28 jeffdavis not sure
16:28 jl- btw these are not 'test datasets'
16:28 jeffdavis I tried the staging_importer method that was discussed here recently and it didn't work for me, I ended up killing it after about 4 days and using a slightly different approach
16:28 jl- this is real data
16:28 jl- from a real university
16:28 jl- what worked for you?
16:29 jeffdavis getting a list of all the id's from the staging table and converting it into an SQL script with 278,000 individual 'INSERT INTO biblio.record_entry' statements
16:29 jeffdavis very inelegant, but it worked for me
16:29 jl- interesting
16:30 jeffdavis it's the only time I've tried that staging_importer approach, so I can't say if it normally works better or if it just doesn't work for large numbers of bibs
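
A hedged reconstruction of that approach (jeffdavis posts his actual version later, at 16:48); table and column names follow the staging-table example in the migration docs:

    -- In psql: emit one INSERT per staged record, then run the
    -- generated file in chunks (or in parallel).
    \t on
    \o load_bibs.sql
    SELECT 'INSERT INTO biblio.record_entry (marc, last_xact_id) '
        || 'SELECT marc, ''import'' FROM staging_records_import '
        || 'WHERE id = ' || id || ';'
      FROM staging_records_import;
    \o
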
16:30 jl- what do you usually use?
16:30 jl- staff client?
16:30 jeffdavis fwiw these were real (public library) records too
16:30 jl- kk
16:30 jeffdavis I had been using the old method before (the perl scripts), wanted to try out the new approach
16:32 jl- jeffdavis I thought the perl scripts (like pg loader) take forever?
16:33 Dyrcona jl-: They're fast compared to the staff client.
16:33 jeffdavis that's one reason I wanted to try the other approach :)
16:33 Dyrcona If the staff client were even useful for that sort of thing.
16:34 jl- Dyrcona what's your take on this, 70GB almost used up, ran for 4 hours
16:34 jl- is it the method or the records
16:34 jl- or do I need more space
16:34 Dyrcona jl-: I'd say its the method more than the records.
16:35 Dyrcona Unless your records are huge or you're loading millions.
16:35 jl- the question is if it will work eventually or not..
16:35 jl- or if I should abort it
16:35 jl- I could try with giving it 100 GB
16:35 jl- but that seems bizarre
16:36 phasefx there are pg tuning things you can do temporarily to make imports faster, but that bit is voodoo to me
16:36 phasefx stuff with working mem and checkpoints
16:37 phasefx disable archive mode :)
16:37 bshum Well, tuning postgres is probably a good thing to do anyways.
16:38 phasefx just, differently tuned for normal production use as compared to huge batch imports
16:39 Dyrcona You are also probably having issues with work logs, 'cause your whole import is likely done as a single transaction.
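
Hedged examples of the knobs phasefx alludes to, for PostgreSQL of this era; the values are illustrative, and several are unsafe outside a rebuildable one-off load, so revert them afterwards:

    # postgresql.conf, temporarily, for a big batch import
    maintenance_work_mem = 1GB      # faster index builds
    checkpoint_segments  = 64       # fewer, larger checkpoints (pre-9.5 knob)
    wal_buffers          = 16MB
    synchronous_commit   = off
    archive_mode         = off      # phasefx's "disable archive mode"
    fsync                = off      # only if you can redo the load after a crash
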
16:39 Dyrcona Our entire database with indexes and that sort of thing takes up about 146GB of disk space.
16:40 * bshum keeps trying to shrink his
16:40 bshum :)
16:40 Dyrcona dumped, compressed and encrypted, our database is only 14GB. :)
16:42 Dyrcona jl-: Searching the IRC and dev mailing list logs from three or four years ago might turn up some discussions of what I was doing during our migration.
16:42 Dyrcona I got some tips on tuning postgres for large loads back then.
16:43 jl- Dyrcona I could batch delete all log files
16:43 jl- from the sys?
16:43 Dyrcona jl-: No. Postgresql keeps a work log around in its own data until a transaction is complete or a checkpoint is reached.
16:44 hbrennan joined #evergreen
16:44 Dyrcona this doesn't show up in /var/log/ or anywhere like that.
16:44 Dyrcona I recommend getting a good book on PostgreSQL or do some online research about how it works.
16:45 * dbs used basically what is in the docs, but modified it to import in batches - not as extreme as jeffdavis' 1 insert per, but 100 records or 1000 records per transaction
16:46 * dbs checks -- batches of 5000 records, actually
16:47 dbs this was from an import in the summer of 2012.
16:48 hbrennan We're upgrading and missed the Rose::URI perl module on the db server.... is it safe to rerun the whole 2.3-2.4.0-upgrade-db.sql a second time?
16:48 dbs http://pastie.org/8749954 - ah, cute, I started with 1000, then 5000, then batches of 10000, to get more feedback early...
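
A minimal sketch of the batch-per-transaction variant dbs describes, assuming the docs' staging table layout; the batch size and connection details are placeholders:

    use strict;
    use warnings;
    use DBI;
    # Commit every 5000 records so a failure only rolls back one batch.
    my $dbh = DBI->connect('dbi:Pg:dbname=evergreen', 'evergreen', '',
                           { AutoCommit => 0, RaiseError => 1 });
    my $ids = $dbh->selectcol_arrayref(
        'SELECT id FROM staging_records_import WHERE dest IS NULL');
    my $n = 0;
    for my $id (@$ids) {
        $dbh->do(q{
            INSERT INTO biblio.record_entry (marc, last_xact_id)
            SELECT marc, 'import' FROM staging_records_import WHERE id = ?
        }, undef, $id);
        $dbh->commit unless ++$n % 5000;   # batch boundary
    }
    $dbh->commit;
    $dbh->disconnect;
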
16:48 jeffdavis http://pastebin.com/VXgUiVEj <- this is what I did, allows parallelization without dealing with locks
16:49 Dyrcona http://git.mvlcstaff.org/?p=jason/hz2eg.git;a=summary <- the public version of what I did. :)
16:50 * dbs likes the simplicity of jeffdavis's approach, about 70 fewer files :)
16:51 hbrennan and apologies for being rude and cutting in
16:51 dbs hbrennan: not at all. I didn't answer you because I can't remember where Rose::URI fits in :/
16:51 * Dyrcona was moving a whole database from one ILS to another, not just bibs.
16:52 hbrennan we fixed that part
16:52 * dbs guesses that you probably don't want to rerun the whole upgrade script
16:52 hbrennan but would it hurt?
16:52 dbs Depends, actually; if the whole thing rolled back because Rose:URI was missing, then it wouldn't hurt.
16:52 hbrennan I'm pretending to be smart by typing what the real admin is saying across the room
16:53 dbs But without being there and seeing the logs...
16:54 * dbs used basically what is in the docs for imports because he wrote the docs based on what he did for his last import :)
16:54 hbrennan Ah. Just got a reply back from Equinox. They say it is a-okay to rerun
16:55 hbrennan Hooray! We've been twiddling thumbs for an hour waiting
16:55 hbrennan Best to be safe though
16:55 dbs hbrennan++
16:55 hbrennan Thanks, Dan! dbs++
16:57 dbs But yes, once you get past a small set of demo records, you're not going to avoid PostgreSQL tuning and customized approaches to loading, which is what we've probably said about 10 times now :/
16:58 Wyuli joined #evergreen
17:19 mmorgan left #evergreen
17:36 dcook joined #evergreen
17:37 smyers_ joined #evergreen
17:43 jeff hrm. search errors in the logs on one of my test systems. will investigate further tonight.
17:44 jeff pgtap being present shouldn't break searches, should it?
17:44 jeff seems unlikely.
17:47 jeff oh. i believe i have a disconnect between my database and my services. all is broken as expected. :-)
19:13 nap joined #evergreen
19:14 nap left #evergreen
19:47 nap joined #evergreen
22:26 rjackson_isl joined #evergreen
23:38 zerick joined #evergreen
