LVM2 snapshots are fully read/write. You can write to either the snapshot or the original volume and the write won’t be seen by the other. In the snapshot volume what is stored is an “exception list”, basically a big hash table(?) recording which blocks are different from the original volume.
Now I was curious how this works, because obviously writes to the original volume must also cause an exception to be added to the snapshot volume. How does the snapshot module see these writes? Does it hook into the original device in some way? It turns out, no.
It’s all handled at higher levels (by the lvcreate command in fact) which inserts a device mapper layer above the original device. This special layer (called a “snapshot-origin”) grabs the write request and passes it to the snapshot code where it causes an exception to be added to the snapshot (or snapshots because several snapshots might have been created against a single origin). [Refer to origin_map and __origin_write in dm-snap.c].
You can see the extra layer added by lvcreate -s by examining the device mapper tables directly. For example, before creating a snapshot:
# dmsetup table | grep F13
vg_pin-F13x64: 0 20971520 linear 253:0 257687936
and after creating a snapshot of that LV (notice the “real” LV has been renamed):
# lvcreate -s -n F13x64snap -L 1G /dev/vg_pin/F13x64
# dmsetup table | grep F13
vg_pin-F13x64snap: 0 20971520 snapshot 253:38 253:39 P 8
vg_pin-F13x64snap-cow: 0 2097152 linear 253:0 605815168
vg_pin-F13x64: 0 20971520 snapshot-origin 253:38
vg_pin-F13x64-real: 0 20971520 linear 253:0 257687936
Resolving the device references into a simpler diagram:
F13x64 (253:4)           F13x64snap (253:37)
      |                     |     |
      |     +---------------+     |
      v     v                     v
F13x64-real (253:38)     F13x64snap-cow (253:39)
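The device references can also be picked out of the dmsetup output mechanically. Here is a rough Python sketch that does so, using the table text from above pasted in as a string. The function name parse_table is made up for illustration, and it assumes each device has a single-line table, which is true here but not in general.

```python
import re

# Output of "dmsetup table | grep F13" from above, pasted in as text.
TABLE = """\
vg_pin-F13x64snap: 0 20971520 snapshot 253:38 253:39 P 8
vg_pin-F13x64snap-cow: 0 2097152 linear 253:0 605815168
vg_pin-F13x64: 0 20971520 snapshot-origin 253:38
vg_pin-F13x64-real: 0 20971520 linear 253:0 257687936
"""


def parse_table(text):
    """Map each device name to (target type, [major:minor devices it refers to]).

    Hypothetical helper: assumes one table line per device.
    """
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, rest = line.split(": ", 1)
        fields = rest.split()
        # fields are: start sector, length, target type, target arguments...
        target = fields[2]
        refs = [f for f in fields[3:] if re.fullmatch(r"\d+:\d+", f)]
        result[name] = (target, refs)
    return result


for name, (target, refs) in parse_table(TABLE).items():
    print(name, target, "->", ", ".join(refs))
```

Both the snapshot and the snapshot-origin lines refer to 253:38, which is the shared arrow into F13x64-real in the diagram.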
F13x64snap-cow at the bottom right is the actual storage used for the snapshot exception list. It is just a plain linear mapping of some blocks from the underlying block device. F13x64snap is the virtual device. A read from it consults the exception list in the snapshot cow and, if the block is not there, falls through to the real device. Writes to F13x64snap go to the exception list. Finally, writes to the virtual origin device F13x64 first cause the old contents of the affected blocks to be saved in the snapshot cow (or snapshot cows plural), and then go on to the real device. There is no explicit connection between the origin and its snapshots here; in fact it goes via a hash table stored in kernel memory.
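To make the read and write paths concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not how dm-snap.c is written: the class names are invented, a dict stands in for the exception list, and every read or write covers exactly one whole block (the real code works in chunks and has to deal with partial writes).

```python
class Origin:
    """Toy stand-in for the snapshot-origin layer sitting above the real device."""

    def __init__(self, blocks):
        self.real = list(blocks)   # plays the role of F13x64-real
        self.snapshots = []        # every snapshot taken against this origin

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def read(self, n):
        return self.real[n]

    def write(self, n, data):
        # Save the old contents into each snapshot that has no exception
        # for this block yet, then let the write through to the real device.
        for snap in self.snapshots:
            snap.exceptions.setdefault(n, self.real[n])
        self.real[n] = data


class Snapshot:
    """Toy stand-in for the snapshot device and its cow storage."""

    def __init__(self, origin):
        self.origin = origin
        self.exceptions = {}       # block number -> data (the "exception list")

    def read(self, n):
        # Exception list first, otherwise fall through to the real device.
        if n in self.exceptions:
            return self.exceptions[n]
        return self.origin.real[n]

    def write(self, n, data):
        # Writes to the snapshot only ever touch the exception list.
        self.exceptions[n] = data


origin = Origin(["a", "b", "c"])
snap = origin.snapshot()
origin.write(0, "A")   # the snapshot keeps seeing "a" in block 0
snap.write(1, "B")     # the origin keeps seeing "b" in block 1
print(origin.read(0), snap.read(0))
print(origin.read(1), snap.read(1))
```

Block 2 is never written on either side, so it never gets an exception and the snapshot reads it straight from the real device.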
Gotta love open source =)
I have always wondered why it is not possible to drop an LVM snapshot without accepting the changes that have happened since. lvremove accepts the changes…
What you have found out today, is that part of the reason for this?
btw. How would you drop a snapshot without rolling the changes back?
dd if=/dev/vg00/snap of=/snap
lvremove /dev/vg00/snap
virsh stop vm
dd if=/snap of=/dev/vg00/vm
?
Hello sir,
Good evening.
Your article is very good and full of in-depth information. Could you help me understand the difference between read-only and read-write snapshots, with proper examples, so that I can make my understanding of snapshots clearer?
Thanks in advance.