###########
KeyFollower
###########

The KeyFollower module is designed to facilitate live processing of datasets contained within an hdf5 file. It achieves this through the use of two main classes:

* Follower
* FrameGrabber

This tutorial assumes a basic level of skill using the h5py library. Specifically, you should be comfortable with using h5py to:

* Open and create hdf5 files
* Navigate files using Python dictionary methods, *e.g.* the get() method
* Create groups and datasets

If you are unfamiliar with any of this, we recommend reading the h5py quick start guide: https://docs.h5py.org/en/stable/quick.html

Follower
========

The Follower class is used to create Python iterator objects. It is central to everything that swmr_tools does: most other classes either use it directly or depend on the keys it produces.

Example - Iteration through an all non-zero key dataset
-------------------------------------------------------

We will create a dataset of non-zero integers, representing a complete set of scans all flushed to disk ::

    import h5py
    from swmr_tools.KeyFollower import Follower
    import numpy as np

    # create a sequential array of the numbers 1-8 and reshape it into an
    # array of shape (2,4,1,1)
    complete_key_array = np.arange(8).reshape(2,4,1,1) + 1

We will create an empty hdf5 file, create a group called "keys" and create a dataset in that group called "key_1", to which we will add our array of non-zero keys ::

    with h5py.File("test_file.h5", "w", libver = "latest") as f:
        f.create_group("keys")
        f["keys"].create_dataset("key_1", data = complete_key_array)

Next, we shall create an instance of the Follower class and demonstrate a simple example of its use. At a minimum we must pass the h5py.File object we wish to read from and a list containing the paths to the hdf5 groups containing our keys.
Shown below is an example of using an instance of Follower within a for loop, as you would with any standard iterable object. For this basic example of a dataset containing only non-zero values, the loop runs 8 times and stops as expected ::

    # using an instance of Follower in a for loop
    with h5py.File("test_file.h5", "r", swmr = True) as f:
        kf = Follower(f, ["keys"])
        for key in kf:
            print(key)

    0
    1
    2
    3
    4
    5
    6
    7

Example - Iteration through a dataset containing zeros
------------------------------------------------------

The key dataset is a form of metadata which (as we will see in detail when looking at the FrameGrabber class) represents whether a frame of a given dataset is complete and has been flushed to disk. Non-zero key values represent frames that have been completely written and flushed to disk, while a value of zero represents a frame that has not. We therefore expect the iterator to pause when the next key is zero, and then either continue once the key updates to a non-zero value or stop iteration entirely if a termination condition is met. We will demonstrate a simple example of this below using a timeout as the termination condition. Timeout is the default method used by Follower (although others can be set) ::

    with h5py.File("test_file.h5", "r+") as f:
        # set all values in the second row to zero
        f["keys/key_1"][1,:,:,:] = 0

    with h5py.File("test_file.h5", "r", swmr = True) as f:
        kf = Follower(f, ["keys"], timeout = 1)
        for key in kf:
            print(key)

    0
    1
    2
    3

The example above shows that the follower iterates through the first row, waits for the timeout, and then halts iteration when the key at index [1,0] does not change to a non-zero value within the 1 second timeout.

Example - Using other termination methods
-----------------------------------------

The timeout method is the default for halting iteration.
Other methods can be used by passing a list of method names (as strings) as an argument when instantiating the Follower ::

    with h5py.File("test_file.h5", "r", swmr = True) as f:
        kf = Follower(f, ["keys"], termination_conditions = ["always_true"])
        for key in kf:
            print(key)

    0
    1
    2
    3

As expected, we see the same outcome as when a timeout was used. As long as there were non-zero keys, the iterator behaved as normal. As soon as the next available key was zero, the iterator stopped straight away (rather than waiting for a timeout).

FrameGrabber
============

Indices produced by instances of the Follower class correspond to frames of relevant datasets. To understand how the FrameGrabber class works it is important to understand that instances of Follower do **not** return the value of a key; they return the index of the key in a flattened version of the key array. We will demonstrate this with an example ::

    complete_key_array = np.random.randint(low = 10, high = 20000, size = (2,4))
    with h5py.File("test_file.h5", "w", libver = "latest") as f:
        f.create_group("keys")
        f["keys"].create_dataset("key_1", data = complete_key_array)
        # print dataset to demonstrate the non-sequential nature of the keys
        print(f["keys/key_1"][...])

    [[15083 15092 15918 11475]
     [10070  9500 15115  8331]]

As you can see above, the key values are all non-zero; however, they are not in sequential order and many of the values are quite high.
When using an instance of Follower to iterate through this we simply receive an index ::

    with h5py.File("test_file.h5", "r", swmr = True) as f:
        kf = Follower(f, ["keys"], timeout = 1)
        for key in kf:
            print(key)

    0
    1
    2
    3
    4
    5
    6
    7

If we just want to access the value corresponding to the index we can use NumPy's unravel_index() function ::

    with h5py.File("test_file.h5", "r", swmr = True) as f:
        # convert flat index 6 into a (row, column) position in the (2,4) array
        print(f["keys/key_1"][np.unravel_index(6, (2,4))])

    15115

This is fine for extracting a scalar, but does not help when trying to extract a vector-valued frame from a dataset. For this purpose we have created the FrameGrabber class.

Using FrameGrabber to Extract Frames from a key index
-----------------------------------------------------

First, we will create a small dataset with a corresponding key dataset in which all values are non-zero ::

    complete_key_dataset = np.arange(4).reshape(2,2,1,1) + 1
    complete_data_dataset = np.random.randint(low = 0, high = 1000, size = (2,2,5,10))
    with h5py.File("test_file.h5", "w", libver = "latest") as f:
        f.create_group("keys")
        f.create_group("data")
        f["keys"].create_dataset("key_1", data = complete_key_dataset)
        f["data"].create_dataset("data_1", data = complete_data_dataset)

FrameGrabber takes two arguments: the full path to the dataset you want to extract frames from and an open h5py.File object containing the dataset.
To extract a frame, call the method FrameGrabber.Grabber() with the key index ::

    # FrameGrabber is the second class provided by the KeyFollower module
    from swmr_tools.KeyFollower import FrameGrabber

    with h5py.File("test_file.h5", "r", swmr = True) as f:
        kf = Follower(f, ["keys"], timeout = 1)
        fg = FrameGrabber("data/data_1", f)
        for key in kf:
            frame = fg.Grabber(key)
            print(f"Printing frame {key}:")
            print(frame)
            print(f"Shape: {frame.shape}\n")

    Printing frame 0:
    [[[[913  25 989  89 425 221 634 947 510 616]
       [819  56 268 162 474 543 471 368 948 295]
       [723 453 937 548 473 463 542 230 759 567]
       [517 821 388 941 523 420 564 606 491 985]
       [427 967 845 115 526 812 742 419 411 531]]]]
    Shape: (1, 1, 5, 10)

    Printing frame 1:
    [[[[533 411 801 739 470 908 493 634 137 678]
       [862 382 633 113 952 152 520 937 413 685]
       [414 985  69 161  69  53 453 978 846 953]
       [ 94 346 223 891 499 992 888 846 573 507]
       [139 345 834 396 445 789 361  73 504 500]]]]
    Shape: (1, 1, 5, 10)

    Printing frame 2:
    [[[[492 428 465 627 165 583 558 868 133  64]
       [926 732 564 725 424 144 991 139 114 356]
       [941 653 303 665 768 384 894 239 720 510]
       [663 815 228 888 325 356 293 225 481 700]
       [155 506 906  29 307 589  16 264 616  88]]]]
    Shape: (1, 1, 5, 10)

    Printing frame 3:
    [[[[376  22 142 805 266 176 824  85 886 771]
       [403 795 603 528 349 117 384 176 186 324]
       [561 467 322 430 792 977 606 906 833 243]
       [954 466 125 597 959 245 699  36 254 410]
       [943 629 468 131 657 717 734 482 657 895]]]]
    Shape: (1, 1, 5, 10)

The above example demonstrates the ability of the FrameGrabber class to return the vector-valued dataset frame corresponding to each key index, with the correct shape. This lets us operate on the data frame by frame, live, as frames are being written.
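
To make the mapping from a flat key index to a frame concrete, here is a minimal sketch in plain Python. It is an illustration only and is not taken from the swmr_tools source: the helper ``unravel``, the name ``scan_shape`` and the labelled ``frames`` grid are all hypothetical, and the sketch assumes the leading (scan) dimensions are traversed in row-major order, as in the examples above.

```python
def unravel(flat_index, scan_shape):
    """Convert a flat, row-major index into one index per scan axis."""
    position = []
    # Work from the fastest-varying (last) axis back to the first.
    for size in reversed(scan_shape):
        flat_index, remainder = divmod(flat_index, size)
        position.append(remainder)
    return tuple(reversed(position))


# Hypothetical 2 x 2 scan; each frame is represented by a short label
# instead of a real (5, 10) array.
scan_shape = (2, 2)
frames = [["frame_a", "frame_b"],
          ["frame_c", "frame_d"]]

for key in range(4):
    row, col = unravel(key, scan_shape)
    print(key, (row, col), frames[row][col])
```

For the (2,4) key array used earlier, ``unravel(6, (2, 4))`` gives ``(1, 2)``, the same position that NumPy's unravel_index() returns for index 6.
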
Below is a simple data reduction example where we return the sum of each frame ::

    with h5py.File("test_file.h5", "r", swmr = True) as f:
        kf = Follower(f, ["keys"], timeout = 1)
        fg = FrameGrabber("data/data_1", f)
        for key in kf:
            current_frame = fg.Grabber(key)
            # reduce the frame to a single value, keeping four dimensions
            data_reduced_frame = current_frame.sum()
            data_reduced_frame = data_reduced_frame.reshape((1,1,1,1))
            print(f"Printing frame number {key}")
            print(f"Frame = {data_reduced_frame}\nShape = {data_reduced_frame.shape}\n")

    Printing frame number 0
    Frame = [[[[25616]]]]
    Shape = (1, 1, 1, 1)

    Printing frame number 1
    Frame = [[[[25727]]]]
    Shape = (1, 1, 1, 1)

    Printing frame number 2
    Frame = [[[[23705]]]]
    Shape = (1, 1, 1, 1)

    Printing frame number 3
    Frame = [[[[28003]]]]
    Shape = (1, 1, 1, 1)
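
The overall pattern used in this tutorial (follow the keys, stop when a zero key does not update in time, reduce each available frame) can be pictured with the plain-Python sketch below. This is a conceptual model under stated assumptions, not the swmr_tools implementation: the generator ``follow`` and its ``timeout`` and ``poll`` parameters are hypothetical, and the keys and frames are ordinary lists rather than hdf5 datasets.

```python
import time


def follow(keys, timeout=0.05, poll=0.01):
    """Yield flat indices while keys are non-zero.

    If the next key is zero, poll until it changes; give up once it has
    stayed zero for `timeout` seconds (hypothetical parameters).
    """
    index = 0
    while index < len(keys):
        deadline = time.monotonic() + timeout
        while keys[index] == 0:
            if time.monotonic() >= deadline:
                return  # termination condition met: stop iterating
            time.sleep(poll)
        yield index
        index += 1


# Two frames are flagged as written (non-zero keys), two are not.
keys = [1, 2, 0, 0]
frames = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

# Reduce each available frame to its sum, as in the example above.
sums = {index: sum(frames[index]) for index in follow(keys)}
print(sums)  # {0: 6, 1: 15}
```

Only the frames whose keys are non-zero are reduced; the loop ends once the third key has remained zero for the length of the timeout.
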