- [Notes](#notes)
  - [Tensor](#tensor)
    - [Todo](#todo)
    - [Done](#done)
    - [won't do](#wont-do)
    - [Notes](#notes-1)
  - [Resources](#resources)
    - [backprop](#backprop)
    - [Autodiff](#autodiff)
    - [Rust](#rust)
    - [SIMD](#simd)

# Notes

- no ints or primitive types in generic parameters
  - there is an RFC, but it was opened in 2017
- https://www.lpalmieri.com/

## Tensor

### Todo

### Done

- use vectors to store data
- separate the view and data parts of the tensor to avoid copying everything?
  - use pointers?
  - can't use a (mut) reference because the data needs to be initialized somehow, so it must belong to the tensor that created it
  - the tensor must be able to mutate its data to do in-place operations
  - option to do in-place or copy
    - autograd needs a copy
      - https://discuss.pytorch.org/t/how-in-place-operation-affect-autograd/31173
    - but some ops like `x[:10] = 0` should be able to be done in place
  - so there's no need to separate data and tensor, but keep it in mind if you later want to do more (or multiple) in-place ops
    - make two methods, `slice` and `slice_mut`
  - just make copies immutable to keep it simple
- right now, `calc_shape_from_slice` will slice a tensor of shape `[2, 2, 2]` to `[1, 1, 1]` when slicing it with `[[0, 1], [0, 1], [0, 1]]`
  - this means that strides for 3+d tensors can be wrong
- the vector's overhead shouldn't be much
  - maybe set the vec to not allocate extra memory
  - a vec is contiguous in memory
- naive slicing is too slow
  - numpy sometimes makes a copy when slicing to keep things contiguous
  - the naive approach slows down with size
  - it looks like numpy will make a copy when slicing
- don't use Python's `[start:stop]` syntax in Rust; use something like `.slice(vec![start..stop])` instead
- you should be able to do the following (**will make copies to keep it simple**):
  - `x[1:2, 3:4] = y`
    - this has to be in place
    - return a tensor containing a vector of mut references to values in the original tensor?
      - returning a vec of mut references to the original data is a bad idea; instead:
        - use strides and the slice's start and stop points to access the right parts of the data vec (see the stride sketch before the Resources section)
        - return a mut reference to the tensor
  - `y = x[1:2, 3:4]`
    - this can be a copy or a view
    - should be able to do either easily
- how ndarray does it: https://stackoverflow.com/questions/50400966/is-there-a-rust-ndarray-equivalent-for-numpy-arithmetic-on-a-slice
- numpy notes
  - http://scipy-lectures.org/advanced/advanced_numpy/#indexing-scheme-strides
  - https://ipython-books.github.io/45-understanding-the-internals-of-numpy-to-avoid-unnecessary-array-copying/
- ndarray stack and heap memory (https://users.rust-lang.org/t/ndarray-stack-and-heap-memory-and-overhead/25254)
- pre-allocate the size of new vectors
- add checks for slicing and tensor creation
  - validation should happen at initialization
- fix error handling and structure
  - `panic!("Invalid slice")`
  - should panicking be done in main?
- use smarter indexing
  - `-1` for all values
  - not needing to specify either the first or last values
- slicing with an empty slice will return a copy of the original tensor
- reduce the number of `&[..]`s
- reshape
- benchmarks for the previous work
- broadcasting
- ops
  - should be usable as both `let c = &a + &b;` and `let c = l2::add(a, b);` (see the operator sketch before the Resources section)
    - elementwise ops are comparable to numpy until ~4096x4096
  - the other operand should be a reference
  - ops won't return `Result`
    - that would mean you would need to write `let x = (a + b).unwrap();`
    - will panic instead
  - all ops are tensor-tensor ops
  - element-wise ops
  - self-ops
    - pow
    - sqrt
    - exp
    - log
      - base e
      - base 10
    - abs
    - sin
    - cos
    - tan
  - other ops
    - _dim ops are slower than numpy (~300us vs numpy's ~30us); numpy seems to cache things_
    - _argmax and argmin return f32 tensors_
    - over one dim or all dims
      - sum
      - mean
      - max
      - min
      - argmax
      - argmin
    - use an enum for ops?
  - matmul
    - _about 100x slower than numpy_
    - _won't implement broadcasting on matmul_
    - batch matmul
    - check errors
  - concat
  - transpose
  - clone
  - normal
  - uniform
- autodiff (see the autodiff sketch before the Resources section)
  - accumulate gradients
  - hold references to parents
    - lhs and rhs
  - hold the gradient
  - know the grad function for each op
  - account for creators
  - do the tensor-tensor op without grad tracking for the backend
  - the derivative is a `RefCell<...>`
    - `borrow_mut()` the derivative and assign to it
  - use `Rc::new(Tensor::new(Rc::clone(&t)))` so references can point to more than one tensor?
    - don't need to use `Option`? no
  - use a wrapper for grad-tracking tensors?
    - saves memory on normal tensors
  - mark nodes as evaluated?
    - prevents having to recurse through a shared graph multiple times
  - topological sort (done)
    - in `backward`, mutate the lhs and rhs parents' grads, not the node's own
    - works

```
      a
     / \
    *   *
   /     \
  b       c
   \     /
    +   +
     \ /
      d

da = dd/db + dd/dc

but this will recompute the backwards pass for all of the graph above a

a topological sort accumulates the gradients for a before going further up the computation graph
```

- printing of tensors and the graph
- combine 1, 2, 3, and 4d ops into one function
  - figure out macros and crates
- ops
  - div
  - pow
  - sqrt
  - exp
  - log10
  - log
  - abs
  - sin
  - cos
  - tan
  - slice
    - allocates a small (1 element) tensor to satisfy the match arms
  - transpose
    - allocates a small (1 element) tensor to satisfy the match arms
  - view
  - concat
  - sum
  - mean
- figure out whether to use tensor ops or vec ops in `backward`
  - decide what to do with `new_with_parents`
  - used tensor ops
- clear the gradient
- blas
- autograd
  - benchmarks for each operator
    - use correct shapes
    - compare to pytorch and ndarray
  - only hundreds of ns are spent in the topological sort; the rest is spent calling `backward`
- cargo.rs

### won't do

- change col-major to row-major
- impl `Iterator`
- replace the indices slice with an enum
- `Tensor::new` shouldn't return a `Result`
- use an enum to store the difference between the two types of indices
  - `[start, end)`
  - `-1`
- prevent having to reallocate memory on each backwards pass
  - clear unneeded memory as soon as you can?
- impl `==` on tensors
- the derivative vec for a mul broadcast to a one-element tensor is 6 elements long
  - problem with derivatives when broadcasting
- ops
  - matmul
    - 3 and 4d backwards don't work yet because transpose should only work on two dims
  - max
  - min
  - argmax
  - argmin
- fix transpose
- redo error handling
  - const generics for compile-time errors
- fix the blas CI

### Notes

- L2 is competitive with numpy for small tensors
- L2 copies slices since that's what's needed for autograd
  - by default, numpy returns a view
- it takes about 6s to slice all elements from a 64x64x64x64 tensor
- the speed of slicing/allocating cannot be optimized; numpy takes about 2x the time that L2 does because L2 will always copy a slice. numpy's native slices are views, but copies are needed for autograd
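As a reference for the stride-based slicing notes in the Done list above, here is a minimal sketch of row-major strides and a copying slice over a contiguous `Vec<f32>`. This is not the L2 implementation: the names `strides_for`, `offset_of`, and `slice_copy` are made up for illustration, and it assumes half-open `[start, stop)` bounds, which may not match L2's slice convention.

```rust
// A minimal sketch of stride-based indexing over a contiguous Vec<f32>.
// The names below are hypothetical and NOT the L2 API; bounds are assumed
// to be half-open [start, stop).

/// Row-major strides for a shape, e.g. [2, 2, 2] -> [4, 2, 1].
fn strides_for(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Map a multi-dimensional index to a flat offset into the data vec.
fn offset_of(index: &[usize], strides: &[usize]) -> usize {
    index.iter().zip(strides).map(|(i, s)| i * s).sum()
}

/// Copy a slice given [start, stop) bounds per dimension, returning the new
/// contiguous data and its shape (L2 copies slices so autograd can own the result).
fn slice_copy(data: &[f32], shape: &[usize], bounds: &[[usize; 2]]) -> (Vec<f32>, Vec<usize>) {
    assert_eq!(shape.len(), bounds.len(), "one [start, stop) pair per dimension");
    let strides = strides_for(shape);
    let out_shape: Vec<usize> = bounds.iter().map(|[s, e]| e - s).collect();
    let mut out = Vec::with_capacity(out_shape.iter().product());
    if out_shape.iter().any(|&d| d == 0) {
        return (out, out_shape); // an empty bound produces an empty slice
    }

    // Walk every index inside the bounds, odometer-style, innermost dim first.
    let mut idx: Vec<usize> = bounds.iter().map(|[s, _]| *s).collect();
    loop {
        out.push(data[offset_of(&idx, &strides)]);
        let mut dim = idx.len();
        loop {
            if dim == 0 {
                return (out, out_shape); // carried past the outermost dim: done
            }
            dim -= 1;
            idx[dim] += 1;
            if idx[dim] < bounds[dim][1] {
                break; // no carry needed
            }
            idx[dim] = bounds[dim][0]; // reset this dim and carry to the next
        }
    }
}

fn main() {
    let shape = [2, 2, 2];
    let data: Vec<f32> = (0..8).map(|i| i as f32).collect();
    assert_eq!(strides_for(&shape), vec![4, 2, 1]);

    // Take index 1 along the first dim and everything along the other two.
    let (out, out_shape) = slice_copy(&data, &shape, &[[1, 2], [0, 2], [0, 2]]);
    assert_eq!(out_shape, vec![1, 2, 2]);
    assert_eq!(out, vec![4.0, 5.0, 6.0, 7.0]);
}
```

The same offset arithmetic is what a view would use to avoid the copy entirely; L2 copies anyway because autograd needs to own the sliced data.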
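The ops notes above call for `let c = &a + &b;` alongside a free-function form, with panics instead of `Result`. A minimal sketch of how that can be wired up by implementing `Add` on references; the `Tensor` struct here is a stand-in, not L2's actual type.

```rust
use std::ops::Add;

// Stand-in tensor: flat data plus a shape. Not L2's actual struct.
#[derive(Debug, Clone)]
pub struct Tensor {
    pub data: Vec<f32>,
    pub shape: Vec<usize>,
}

// Implement `Add` on references so `&a + &b` works without moving the operands.
impl<'a, 'b> Add<&'b Tensor> for &'a Tensor {
    type Output = Tensor;

    fn add(self, other: &'b Tensor) -> Tensor {
        // Ops don't return `Result`: panic on a shape mismatch so callers
        // never have to write `(a + b).unwrap()`.
        assert_eq!(self.shape, other.shape, "shape mismatch in add");
        Tensor {
            data: self
                .data
                .iter()
                .zip(&other.data)
                .map(|(x, y)| x + y)
                .collect(),
            shape: self.shape.clone(),
        }
    }
}

// Free-function form, so an `l2::add(&a, &b)`-style call is also possible.
pub fn add(lhs: &Tensor, rhs: &Tensor) -> Tensor {
    lhs + rhs
}

fn main() {
    let a = Tensor { data: vec![1.0, 2.0], shape: vec![2] };
    let b = Tensor { data: vec![3.0, 4.0], shape: vec![2] };
    let c = &a + &b; // `a` and `b` stay usable afterwards
    let d = add(&a, &b);
    assert_eq!(c.data, d.data);
}
```

Implementing the trait on `&Tensor` rather than `Tensor` is what keeps the operands usable after the addition, which matters once the autograd graph holds references to them.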
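The topological-sort diagram in the Done list above is the key autograd idea: accumulate a shared node's gradient from all of its consumers before pushing it further up the graph. Here is a scalar-valued, micrograd-style sketch using `Rc`/`RefCell` as the notes suggest; it is illustrative only and not the L2 implementation.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::rc::Rc;

// Scalar node; the same structure extends to tensors. Not the L2 code.
struct Node {
    value: f32,
    grad: RefCell<f32>,            // accumulated d(output)/d(self)
    parents: Vec<(Rc<Node>, f32)>, // (parent, local gradient d(self)/d(parent))
}

type Var = Rc<Node>;

fn leaf(value: f32) -> Var {
    Rc::new(Node { value, grad: RefCell::new(0.0), parents: vec![] })
}

fn add(a: &Var, b: &Var) -> Var {
    Rc::new(Node {
        value: a.value + b.value,
        grad: RefCell::new(0.0),
        parents: vec![(Rc::clone(a), 1.0), (Rc::clone(b), 1.0)],
    })
}

fn mul(a: &Var, b: &Var) -> Var {
    Rc::new(Node {
        value: a.value * b.value,
        grad: RefCell::new(0.0),
        parents: vec![(Rc::clone(a), b.value), (Rc::clone(b), a.value)],
    })
}

// Post-order DFS: every node ends up after all of its parents.
fn topo_sort(node: &Var, visited: &mut HashSet<*const Node>, order: &mut Vec<Var>) {
    if visited.insert(Rc::as_ptr(node)) {
        for (parent, _) in &node.parents {
            topo_sort(parent, visited, order);
        }
        order.push(Rc::clone(node));
    }
}

// Walk the reversed topological order so a node's gradient is fully
// accumulated from all of its consumers before it is pushed to its parents.
fn backward(output: &Var) {
    let mut order = Vec::new();
    topo_sort(output, &mut HashSet::new(), &mut order);
    *output.grad.borrow_mut() = 1.0; // dd/dd = 1
    for node in order.iter().rev() {
        let g = *node.grad.borrow();
        for (parent, local) in &node.parents {
            // mutate the parents' grads, not the node's own, and accumulate
            *parent.grad.borrow_mut() += local * g;
        }
    }
}

fn main() {
    // d = (a * b) + (a * c): `a` is shared by two paths, like the diagram above.
    let (a, b, c) = (leaf(2.0), leaf(3.0), leaf(4.0));
    let d = add(&mul(&a, &b), &mul(&a, &c));
    backward(&d);
    // da = b + c = 7, db = a = 2, dc = a = 2
    println!("da={} db={} dc={}", *a.grad.borrow(), *b.grad.borrow(), *c.grad.borrow());
}
```

Because the reversed topological order visits every consumer of `a` before `a` itself, `a`'s gradient is accumulated once per path without re-walking the shared subgraph.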
## Resources

- https://dev.to/erikinapeartree/rust-for-machine-learning-simd-blas-and-lapack-4dcg
- https://docs.rs/rayon/1.3.0/rayon/
- https://www.google.com/search?q=rust+ndarray+simd&oq=rust+ndarray+simd&aqs=chrome..69i57.3773j0j7&sourceid=chrome&ie=UTF-8
- https://stackoverflow.com/questions/39477684/should-i-avoid-unwrap-in-production-application/39478185#39478185
- https://medium.com/@GolDDranks/things-rust-doesnt-let-you-do-draft-f596a3c740a5

### backprop

- https://datascience.stackexchange.com/questions/20139/gradients-for-bias-terms-in-backpropagation
- https://cs231n.github.io/optimization-2/
- https://cs231n.github.io/neural-networks-case-study/#grad
- https://stackoverflow.com/questions/38082835/backpropagation-in-gradient-descent-for-neural-networks-vs-linear-regression
- https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
- https://github.com/bkkaggle/L2/tree/master#acknowledgements

### Autodiff

- https://github.com/karpathy/micrograd
- https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation
- https://github.com/ibab/rust-ad
- https://github.com/Rufflewind/revad/blob/eb3978b3ccdfa8189f3ff59d1ecee71f51c33fd7/revad.py
- https://github.com/srirambandi/ai
- https://discuss.pytorch.org/t/is-pytorch-autograd-tape-based/13992/3
- https://www.reddit.com/r/MachineLearning/comments/8ep130/d_how_does_autograd_work/
- https://github.com/mattjj/autodidact
- https://github.com/karpathy/recurrentjs
- https://github.com/karpathy/randomfun
- https://medium.com/@ralphmao95/simple-autograd-implementation-understand-automatic-differentiation-hand-by-hand-9e86f6d703ab
- https://evcu.github.io/ml/autograd/
- https://blog.paperspace.com/pytorch-101-understanding-graphs-and-automatic-differentiation/
- https://github.com/maciejkula/wyrm
- https://medium.com/@maciejkula/building-an-autodifferentiation-library-9ccf32c7a658
- https://github.com/evcu/numpy_autograd/blob/master/my_autograd.py#L147
- https://github.com/evcu/numpy_autograd/blob/master/Autograd.ipynb
- https://cs231n.github.io/optimization-2/

### Rust

- https://nora.codes/post/what-is-rusts-unsafe/

### SIMD

- https://opensourceweekly.org/issues/7/