# Changelog

## 0.9.2

- Fixed non-deterministic ordering when running jobs. Jobs now always run starting with the lowest JID again.

## 0.9.1

- Add the `--stagger` flag for machine setup. This helps avoid resource bottlenecks during setup tasks for many machines.
- Add some more columns to `job stat` for convenience.
- Add `--only_done` flag to `job stat` to only include jobs that are done.
- Add retry limits to `job add` and `job matrix add`.
- All command line arguments that expect a JID now also accept the value `last` to indicate the JID of the last job.

## 0.9.0

- Renamed `job matrix stat` to `job matrix ls` for better consistency with `job ls`.
- The following have all been removed in favor of the new `job stat` command:
  - The `--output` flags of `job matrix ls` and `job ls`
  - The `job matrix csv` subcommand
  - The `job results` subcommand
- The `job stat` command has gained significant super powers. It is now vastly more useful for post-processing results and outputting data into a number of useful formats.
  See the help message for more info, but here are some examples:
  - Print a plain text table of the given experiments with only the given columns:

    ```sh
    > j job stat --text --jid --time --machine $EXPERIMENT_JIDS
     JID    TIME   MACHINE
     14923  1h34m  clnode199.clemson.cloudlab.us:22
     14951  1h4m   clnode201.clemson.cloudlab.us:22
     14956
     14963  46m2s  clnode212.clemson.cloudlab.us:22
    ```

  - Print a JSON array of the ID and log path for all running jobs:

    ```sh
    > j job stat --json --jid --log --running
    [{"jid":"15778","log":"/path/to/my.log\n"},{"jid":"15781","log":"/path/to/my.log\n"},{"jid":"15787","log":"/path/to/my.log\n"},{"jid":"15792","log":"/path/to/my.log\n"},{"jid":"15798","log":"/path/to/my.log\n"},{"jid":"15831","log":"/path/to/my.log\n"},{"jid":"15832","log":"/path/to/my.log\n"},{"jid":"15833","log":"/path/to/my.log\n"},{"jid":"15834","log":"/path/to/my.log\n"},{"jid":"15835","log":"/path/to/my.log\n"},{"jid":"15836","log":"/path/to/my.log\n"},{"jid":"15837","log":"/path/to/my.log\n"}]
    ```

  - Print a CSV generated by mapping each job's info with the given script, which takes a JSON of all info about a job:

    ```sh
    > j job stat --id 14740 --jid --results --cmd --csv --mapper /nobackup/extract.py
    Data filename,Huge page,Runtime (s),cpu_clk_unhalted.thread_any,cs,dtlb_load_misses.miss_causes_a_walk,dtlb_load_misses.walk_active,dtlb_store_misses.miss_causes_a_walk,dtlb_store_misses.walk_active,faults,inst_retired.any,migrations
    /nobackup/scratch/page-value/exp_10__bare_metal___hacky_spec17__-transparent_hugepage_huge_addr140721422073856-transparent_hugepage_huge_addr_mode_Less_-2020-10-13-10-25-24-316505881.mmu,TODO,356.387142448,3742290180352,5889,4265220134,116025476938,271985527,8132321876,11494749,5143124399584,6
    /nobackup/scratch/page-value/exp_10__bare_metal___hacky_spec17__-transparent_hugepage_huge_addr140721422073856-transparent_hugepage_huge_addr_mode_Less_-2020-10-13-10-25-24-414504453.mmu,TODO,360.590801957,3756669277437,3031,4273389066,116331409111,277050659,8249010131,11494750,5142777500448,10
    /nobackup/scratch/page-value/exp_10__bare_metal___hacky_spec17__-transparent_hugepage_huge_addr140721422073856-transparent_hugepage_huge_addr_mode_Less_-2020-10-13-10-25-24-822510504.mmu,TODO,356.487874315,3742004270438,5900,4274810097,116482238285,273613875,8186410000,11494750,5142880002204,4
    /nobackup/scratch/page-value/exp_10__bare_metal___hacky_spec17__-transparent_hugepage_huge_addr140721422073856-transparent_hugepage_huge_addr_mode_Less_-2020-10-13-10-27-43-860570478.mmu,TODO,359.158573514,3752229639845,2997,4276891646,116319096098,276830049,8289222935,11494752,5142941529188,9
    /nobackup/scratch/page-value/exp_10__bare_metal___hacky_spec17__-transparent_hugepage_huge_addr140721583554560-transparent_hugepage_huge_addr_mode_Less_-2020-10-13-10-34-03-577381879.mmu,TODO,358.304955228,3751894627436,3188,4267564111,116262044231,274054941,8203832653,11258667,5141843316464,14
    ```

  - Print a plain text table of the given experiments, using the given script to map over a particular column of the data as plain text:

    ```sh
    > j job stat --running --cmd --cmd_map /tmp/replace_a_with_unk.sh --text
     cmd
     exp00010 --mmu_overheunkd hunkcky_spec17 xunklunkncbmk
     exp00010 --mmu_overheunkd hunkcky_spec17 xunklunkncbmk
     exp00010 --mmu_overheunkd hunkcky_spec17 xunklunkncbmk
     exp00010 --mmu_overheunkd hunkcky_spec17 xunklunkncbmk
     exp00010 --mmu_overheunkd hunkcky_spec17 xunklunkncbmk
     exp00010 --mmu_overheunkd hunkcky_spec17 xunklunkncbmk
     exp00010 --mmu_overheunkd hunkcky_spec17 xunklunkncbmk
     exp00010 --mmu_overheunkd hunkcky_spec17 xunklunkncbmk
     exp00010 --mmu_overheunkd hunkcky_spec17 xunklunkncbmk
     exp00010 --mmu_overheunkd hunkcky_spec17 xunklunkncbmk
     exp00010 --mmu_overheunkd hunkcky_spec17 xunklunkncbmk
    ```

  - Combine with shell to restart all failed jobs among the listed experiments:

    ```sh
    > j job restart $(j job stat --text --jid --status $EXPERIMENT_JIDS | grep Failed | awk '{print $1}')
    ```

- Added the `job mvresults` subcommand to copy all files associated with a task to a new location.
- Fixed issue where copying results hangs due to SSH host key verification failure. This was a long-standing and annoying issue. We now detect this case and print a specific error message encouraging the user to add the given host to their `known_hosts` file. Additionally, we move the host out of the class so that further experiments won't error out wastefully. The user can move it back when the host has been added to `known_hosts`.

## 0.8.1

- Fixes a panic on "narrow" terminals.

## 0.8.0

- Change the way results files are identified. The runner should now return a common prefix of all files to be copied, and the jobserver will copy all files with that prefix. In the past, by contrast, you had to return a filepath with a glob.
- The client now has some better support for manipulating said prefixes.

## 0.7.1

- Added `machine mv` subcommand.
- Fix minor bugs.

## 0.7.0

- Matrices that have become empty because all of their jobs were forgotten will also be forgotten. This is different from prior behavior, so I'm bumping the major version.
- Added support for timing out jobs.
- Added a shortcut for restarting a job.
- Added `-r` flag to list all running jobs.
- Fix some bugs.
- Bump the optimization level a bit.

## 0.6.1

- Minor backwards-compatible changes to client-server protocol and vast refactoring of client-side printing for `job ls`. These produce a major improvement in the format of job listings for matrices.

## 0.6

- Changes to client-side `j machine rm` arguments to allow removing classes of machines more easily. This allows removing expired reservations more easily.

## 0.5

- Add `j job results` subcommand.
- Major improvements to handling of failed/cloned jobs in matrices:
  - When a matrix job is cloned, the clone also ends up in the matrix.
  - Matrix jobs automatically repeat on failure.
- `j job matrix add` now supports the `-x` flag.
- `j job ls` now prints a summary of the printed jobs.
- Internal rearchitecting of the thread that copies results back to the host. This may allow future improvements to handling of failed/timed out/hanging copying tasks.

## 0.4

- Reimplemented the server state serialization for snapshots. This fixes weird errors where tasks would become corrupted after a server restart for no apparent reason. Unfortunately, this is a breaking change to the format of the server snapshots, so tasks that were already in the snapshot will show as `Unknown` after restarting the server into version 0.4.

## 0.3

- This is the first version I published on crates.io.