It is possible that flash drives are handled specially, since they are well-known devices. However, the usual way I've seen that sort of thing handled is by remoting a block device from the controlling machine.
Jitter is the variation in the inter-arrival time of voice frames. Ideally you want them to arrive at close to 20 ms intervals, and certainly not in big bunches at 100 ms intervals.
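To make the 20 ms vs. 100 ms contrast concrete, here is a minimal sketch that measures jitter as the mean absolute deviation of each inter-arrival gap from the nominal 20 ms spacing. This is a simplified measure chosen for illustration. RTP (RFC 3550) instead keeps a running estimate smoothed with a gain of 1/16. The function names and the sample arrival times are hypothetical.

```python
from statistics import mean

FRAME_INTERVAL_MS = 20.0  # nominal voice frame spacing


def interarrival_gaps(arrivals_ms):
    """Time between consecutive frame arrivals, in ms."""
    return [b - a for a, b in zip(arrivals_ms, arrivals_ms[1:])]


def mean_jitter(arrivals_ms, nominal_ms=FRAME_INTERVAL_MS):
    """Mean absolute deviation of each gap from the nominal spacing (simplified, not RFC 3550)."""
    return mean(abs(g - nominal_ms) for g in interarrival_gaps(arrivals_ms))


# Ten frames arriving exactly every 20 ms: no jitter.
steady = [20.0 * i for i in range(10)]

# The same ten frames delivered in bunches of five every 100 ms.
bunched = [100.0 * (i // 5) for i in range(10)]

print(mean_jitter(steady))   # 0.0
print(mean_jitter(bunched))  # about 26.7 ms
```

The bunched stream delivers the same average rate (one frame per 20 ms), yet every gap is either 0 ms or 100 ms, so a receiver's playout buffer has to absorb the difference.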
Properly configured means dedicating enough CPU power and disk bandwidth, and guaranteeing that this capacity is delivered to the VM within milliseconds, even if that comes at the expense of processing on other VMs. Original VM designs would schedule VMs in quite long time slices, and manipulate the time each VM saw so as to make it think it was running continuously.
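A rough way to see why long time slices hurt voice traffic is to simulate it. The sketch below assumes strict round-robin among equally weighted VMs, zero processing time per frame, and frames arriving every 20 ms. All of these are simplifying assumptions, and the function names are hypothetical, not any hypervisor's actual scheduler. Each frame is "handled" at the first moment its VM next holds the CPU.

```python
def next_run_time(t, vm_index, num_vms, slice_ms):
    """Earliest time >= t at which vm_index holds the CPU under strict round-robin."""
    cycle = num_vms * slice_ms
    offset = vm_index * slice_ms
    k, pos = divmod(t - offset, cycle)  # pos is always in [0, cycle)
    if pos < slice_ms:
        return t  # the VM is already running at time t
    return offset + (k + 1) * cycle  # start of this VM's next slice


def handling_times(num_vms, slice_ms, frames=10, frame_interval_ms=20, vm_index=0):
    """Times at which each voice frame gets processed (processing assumed instantaneous)."""
    return [
        next_run_time(i * frame_interval_ms, vm_index, num_vms, slice_ms)
        for i in range(frames)
    ]


# Two VMs sharing a CPU with 100 ms slices: the second five frames pile up.
print(handling_times(2, 100))  # [0, 20, 40, 60, 80, 200, 200, 200, 200, 200]

# Same two VMs with 1 ms slices: frames are handled as they arrive.
print(handling_times(2, 1))    # [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]
```

With 100 ms slices, the frames that arrive while the VM is descheduled are all processed in one burst when it resumes, which is exactly the bunching the guest cannot hide even if its view of time is manipulated. Short slices, or a guarantee that the VM is scheduled within milliseconds, keep the handling times close to the 20 ms arrival pattern.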