Logic Analyzer in Rust Entry: la.rs Date: Sat Jan 31 11:40:42 EST 2015 So the basic ideas: - Think of Rust as the implementation + scripting language - Keep integration in sigrok in mind This is a built off of the ideas of pyla, a Python + C++ logic analyzer. It worked fine, but still was clumsy in its aproach so I started thinkig that Rust might be good as both a core implementation language for the signal processors and the dataflow glue. Entry: Synchronicity Date: Sat Jan 31 11:43:28 EST 2015 Meaning: is it really necessary to process events from multiple sources at the same time? This is about the only obviou reason to _not_ use a task/process abstraction for handling communication protocols. Entry: Threads Date: Sat Jan 31 23:51:03 EST 2015 Solve the rest of the dataflow programming using concurrent programming? It seems the most natural approach. A good trade-off between speed and ease of use: - front-end: processes a lot of data but produces likely very little. a synchronous state machine seems best here. - back-end: processes little data but might need more elaborate code/data structure to do its job: higher abstraction seems best here. The question is about task switch granularity. For 20MHz data rate there is no way that this can be anything other than a tight single-task loop over a data array. It would be nice though to abstract the connectivity. I.e. if I want to chain two processors, it will figure out how they pass data. [1] http://doc.rust-lang.org/std/thread/ Entry: Iterators and buffers Date: Sun Feb 1 15:00:12 EST 2015 Is it necessary to include the iterator in the "tick" method of the trait that abstracts analyzer state machines? I'd think that this can all be optimized away. Maybe take the plunge and look at generated code? Entry: Input/Output Date: Sun Feb 1 16:58:56 EST 2015 Problem: data types when connecting processing pipelines. Either avoid it by using byte streams and explicit protocols, or figure out a way to encode it in the type system. There seem to be too many variables here to find a good solution. Attempt to simplify and fix the abstraction levels: - parallel bit streams - sequential byte streams - packet streams - high level data streams Maybe abstract everything as Bus and provide some wrappers? Parallelism is necessary: a bus has multiple channels with time-correlated data. A Bus can be a "packet bus" ? Entry: Finding the right abstraction is hard Date: Sun Feb 1 20:01:15 EST 2015 Trying too much at once. It's probably best to give up on premature optimization and figure out how to type things properly first. A UART is something that takes in a synchronous bit stream and produces a (possibly time-tagged) byte stream. There are two problems here: - The types - The I->O control flow. There are three obvious way of structuring: - As a function (i->o) possibly buffered, leaving connectivity to a different layer. - As a sink, abstracting composite sinks. - As a source (generator), chaining generators. In Pyla I used an i->o approach and added some composition laws to build sinks. DO IT WELL OR DON'T DO IT So I have a design that I already thought about for a long time. It uses byte buffers to communicate, which makes it rather simple, structurally. If types are necessary, why not build those on top of things? I.e. use types as "compile time blessing" like phantom types[1]. So let's do this: two layers: - Implementation, unconstrained uses byte buffers - Phantom layer adds typed-blessed composition [1] http://rustbyexample.com/generics/phantom.html Entry: Iterators Date: Sun Feb 1 22:39:36 EST 2015 Tired, but I really want to get to the bottom of this. Apparently passing around iterators doesn't work very well, maybe because they are stateful, abstract objects? So it seems better to pass something around that can create an iterator. Especially because I want the ability to provide fanout. Entry: Push vs. pull Date: Mon Feb 2 13:57:25 EST 2015 // TL;DR: Call flow represents low level call flow. High level // abstractions build on top of this. // From a first iteration in C++ (see zwizwa/pyla), the simplest // architecture seems to be a "push" approach, i.e. a data-driven // / reactive one where a function call corresponds to data being // available. // This corresponds best to the actual low-level structure when // this runs on a uC: a DMA transfer_complete interrupt. // It is opposed to a "pull" approach where a task blocks until // data is available, which always needs a scheduler. The pull // approach works best on higher levels, i.e. when parsing // protocols. So I took this out. It's clear that the Iterator trait is the way to go from the API side. Entry: Bit generators Date: Fri Feb 13 22:00:25 EST 2015 Makes sense to also put the stream generators in there. At least for SPI it's hard enough to make a test sequence without writing it as a state machine. Entry: Only 80 - 100 MiB/sec for uart.elf Date: Sat Feb 14 14:26:22 EST 2015 This is after adding some generic code. Trying with previous. Isn't better.. EDIT: That's 20-40 cycles per sample. Maybe not too bad? Ok, fixed. The iterator next() wasn't inlined. Now gets 250 - 300 MB/sec Chased the .dasm by adding a "marker const". Entry: Visualize Date: Sat Feb 14 18:37:29 EST 2015 First, get rid of wiggles. Encode as black/white/grey. Allow very fast zoom. Entry: RL UART Date: Tue Feb 17 22:10:21 EST 2015 Sending 0xF0 bytes, it sees 0xE0 bytes and frame errors. So wire: 0 0000 1111 1 It sees: 0 0000 0111 1 Which means it samples to early. Adding a bit more delay helps. Entry: UART sampling Date: Fri Feb 20 02:28:08 EST 2015 dsPIC UART can use 16x oversampling with where majority voting is used for mid-bit and 1 16x clock left and right of mid-bit. I'm still puzzled why mid-bit doesn't work on the current sniff setup. This could also be rounding errors. What about using a fractional bit period? Entry: BeagleBone Black Date: Wed Feb 25 22:15:57 EST 2015 slip.elf can take about 26-27 MByte/sec from /dev/zero on BBB. That's definitely usable. It would be nice though to get that to the full 100MByte/sec of the BeagleLogic. A better way is probably to use RLE. Reduce at the source. This requires the processor API to change. PRU RLE does not work at 100MHz A simpler way is to code an RLE processor in assembler to read from /dev/beaglebone and feed the RLE into rust code. So a loop in C isn't any better. This needs assembly. Entry: beagle logic Date: Thu Feb 26 11:56:00 EST 2015 Looking at the PRU source, the RLE is not done there. A good place would be copy_to_user() in the driver. Replace that with some ASM to do the encoding. Entry: Fast-path RLE Date: Fri Feb 27 02:36:32 EST 2015 The idea is the following: - Fast path: if vector of 16 x 8 bit (8 x 16 bit) is the same as before (= pattern), increment counter by 16 (8). - If not, use scalar ops to scan the vector, read the next vector and check if it's all the same. If so, initialize the pattern. Entry: zero-copy mmap Date: Fri Feb 27 00:14:31 EST 2015 Trying on BeagleLogic. 1. Set size IOCTL_BL_SET_BUFFER_SIZE This allocates multiples of 4MB. 2. Mmap buffers, all buffers are mapped subsequently. Driver uses devm_kzalloc() to allocate buffers, which is automatically freed on unload. 3. Use poll() to synchronize. 4. use lseek(fd,0,SEEK_END) to find the size that's available. 5. Use NEON instructions to do the scanning. [1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0002a/ch01s03s03.html [2] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.swdev.qrc/index.html [3] https://groups.google.com/forum/#!topic/android-ndk/0jDCsFzbmu0 Entry: Shaders for waveform display Date: Sat Feb 28 02:48:57 EST 2015 It should be possible to send raw data to the shaders and have it render a waveform display, i.e. turning a 0/both/neither/1 input into lines, blocks, ... Yes, by feeding waveform data in a dynamic vertex array (1D) and having a static array with x-axis values 0,1,2,... on the renderer. Maybe shader can do logic shifts as well? To pick out different bus values. Seems only in higher versions (1.5?) Hmm... maybe this doesn't work. All attributes go into one buffer, and it is strided? I.e. can't update just part of it? EDIT: It's probably not worth the trouble, since it needs a geometry shader: a logic edge turns one value into two vertices. Entry: Parallel analyzers Date: Tue Jul 31 09:47:25 EDT 2018 I'd like to merge this with Seq from asm_tools. One thing I've not been able to express properly in the past in lars is to display multiple output streams. The simplest way to do this is to use "clock enable" signals. Entry: Tweaks Date: Sun Apr 26 10:57:16 EDT 2020 I've added Erlang support and fixed some build issues. It does seem it is better to make a monolyth application instead of a bunch of small programs. So let's do that first.