Friday, March 23, 2012

Readin' Dirty

They see me rollin'
A client recently had a data integrity issue when they deployed some ORM code that inadvertently NULL'd some columns during updates. They needed to recover the data quickly and came to me. The first thing that came to mind was to restore from backup and then replay xlog's to right before the deployment happened and see how many would be recoverable that way. They asked me to think outside the box and see if I could come up with a solution to do this more quickly.

Unfortunately I wasn't able to. I looked at pg_filedump and xlogdump, but I didn't really get out of them what I wanted. Ideally I would have liked to say something like 'CREATE TABLE bar AS SELECT * FROM foo WHERE corrupt_col IS NOT NULL USING SNAPSHOT ANY'. For those of you that are unfamiliar, there are a handful of global snapshots internal to Postgres that allow you to peek under the covers of the MVCC. 'SnapshotAny' is one that allows you to see "any" row in the relation, including deleted ones that haven't been vacuumed yet. As far as I could tell, there was no better solution then restoring from a backup.

So after being unable to meet the client's expectations, and some prodding by co-workers, I decided to see what I could come up with that would meet my needs. So today I made publicly available the first version of pg_dirtyread. It's an extension that provides a simple function for reading heap tuples from relations using SnapshotAny. It's not ideal as, among other drawbacks, it requires a full sequential scan of the relation, but at least it's a start.

Example usage:
SELECT * FROM pg_dirtyread('foo'::regclass) as t(bar bigint, baz text);
Where the schema of foo is (bar bigint, baz text)

I'd be interested in any feedback on it as there is plenty of room for improvement. Maybe even some similar projects out there that my google-fu did not find for me. Also, I know the name is fallacious. A dirty read is reading rows that aren't committed. There is a different snapshot for that even. Maybe I should have called it pg_readany, but that name didn't sound as cool. Maybe in the future I will call it something like pg_readsnapshot and allow you to specify which snapshot you would like to use.

6 comments:

  1. I've always missed SELECT ... AS OF SCN Oracle's feature in PostgreSQL

    ReplyDelete
  2. Hello! HELP! I need to install this tool in my postgres.
    I read page http://www.depesz.com/2012/04/04/lets-talk-dirty/ but when I follow the steps I get the following error:
    Makefile: 8: /usr/lib/postgresql/9.1/lib/pgxs/src/makefiles/pgxs.mk: No such file or directory.
    If I go to my /usr/lib/postgresql/9.1/lib I notice that there is no subdirectory there.
    How do I install the developer package without breaking more things?.
    Thank you very much and sorry for my bad English.
    Guillermo

    guillermovil at gmail

    ReplyDelete
    Replies
    1. Make sure pg_config from the install in question is in your path.

      Delete
  3. Sorry Phil, my o.s. is ubuntu and my postgres is 9.1.4. and pg_config is in /usr/bin and my $PATH is:
    /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games

    ReplyDelete
  4. Hi,

    It’s possible make a jdbc version to read the deleted rows in postgresql ?

    Thanks

    Alejandro

    ReplyDelete
    Replies
    1. This can be used with a JDBC postgres client. It's not directly related though.

      Delete