IRC log for #evergreen, 2016-10-10

All times shown according to the server's local time.

Time	Nick	Message
04:48		eeevil joined #evergreen
04:48		akilsdonk_ joined #evergreen
04:48		Bmagic_ joined #evergreen
04:49		jeffdavi1 joined #evergreen
04:49		jeff___ joined #evergreen
08:45		Dyrcona joined #evergreen
09:05		maryj joined #evergreen
09:29		yboston joined #evergreen
09:35	Dyrcona	If one changes kpac.xml what services need to be restarted? opensrf.settings?
09:36		jeff___ joined #evergreen
09:53	Dyrcona	Great. cstore is not connected to the network, but I'm having a devil of a time figuring out why.
10:06	Dyrcona	This is weird. All of the servers are using the same opensrf.xml.
10:06	Dyrcona	Two of the sip servers are blowing up because of the application_name field, but all of the sip servers are running the same version of DBI: 1.612, so I'd expect them all to blow up.
11:47		Christineb joined #evergreen
12:35		sciani joined #evergreen
12:43	jeff	Dyrcona: get things all sorted out?
13:23		Christineb joined #evergreen
13:23		sciani joined #evergreen
13:24	Dyrcona	jeff: I don't think so. Other things crop up.
13:41	Dyrcona	So, almost 500GB of pg_xlog files, some timestamped from last night when I started a pingest.
13:41	Dyrcona	My guess is that a lot of these are junk and will never process, but how to tell?
13:42	jeff	I would expect a pingest to generate a lot of WAL traffic like that.
13:44	jeff	Perhaps whatever normally consumes them has run out of disk itself?
13:50	jeff	(In terms of the target of an archive_command if applicable, or something else)
14:00	Dyrcona	Yeah. But, where the WAL files are being copied seems to have plenty of space.
14:10	Dyrcona	And, what logs I can find don't mention running out of space, though every time I google the error, I get results saying how to deal with that or the continuous archiving instructions.
14:10	Dyrcona	I get two errors, maybe the same, but they come out in two lines.
14:11	Dyrcona	One says the .bz2 file exists in the destination and the other says a cp failed with exit code = 131.
14:12	Dyrcona	The space does seem to be freeing up, though.
14:13	Dyrcona	I guess what worked in training doesn't work in production. :/
14:14	csharp	Dyrcona: research pg_resetxlog
14:14	csharp	(pretty sure that's right)
14:15	csharp	you'd want to do a base backup right after so you can make sure you're covered
14:15	csharp	I had to do that during an upgrade several years ago - no fun, but at least we were already down
14:16	jeff	if your archive_command includes a compression step, it's possible that you ran out of resources other than disk (memory, cpu + time) during the pingest run.
14:16	jeff	in any event, good luck and proceed carefully. :-)
14:16	csharp	once the xlog is reset you can remove the old pg_xlog files
14:17	Dyrcona	csharp: Thanks. I'll look into that.
14:17	Dyrcona	jeff: Yeah, could be. There's a bzip2 apparently going on, but that cp exit code has me stumped, and the message related to the bzip2 files is that they already exist. Not helpful, I know.
14:18	Dyrcona	pg_resetxlog sounds drastic from reading the documentation....
14:21	csharp	in my case my disk was full and I had no other choice
14:23	Dyrcona	Yeah, my disk is close to full, but it seems to be clearing up a bit on its own.
14:23	jeff	Dyrcona: any OOM killer logs on the box doing the archiving?
14:25	Dyrcona	jeff: No, but there are disk errors. :(
14:25	jeff	ouch.
14:26	jeff	i repeat my earlier, "good luck and proceed carefully"
14:27	Dyrcona	jeff: For a drive other than the one where I think the logs are going.
14:27	Dyrcona	I had to check the partitions on that machine.
14:27	Dyrcona	failure on sdb, but logs are going to sda, it looks like
14:28	jeff	what is sdb?
14:28	Dyrcona	So much nagios junk in the logs.
14:30	Dyrcona	jeff: It looks like just a mounted drive that's not used, but I'm not so sure now that I have a third look.
14:32	Dyrcona	It's mounted by the db server as /mnt/backups, but /dev/sda8 on the destination machine is mounted as /usr/backups on the db server and that is where backups are going.
14:33		Bmagic joined #evergreen
14:41	Dyrcona	Nothing like a good, old-fashioned hardware failure to go along with the typical upgrade issues.
15:32	Dyrcona	csharp: Do you use continuous WAL archiving? Or, rather who in here does (that's around to answer)?
15:46		remingtron_ joined #evergreen
15:51		dbwells joined #evergreen
16:11		remingtron joined #evergreen
19:24		dcook joined #evergreen