Gemini-conduit design notes, from code review 8/2010 number of bounce buffers, make clear the effects of, in the readme see vapi conduit for example use gprofile or oprofile to measure startup of a large job (magic argument for testcollperf? correctnmess and performance iterations both > 0) replace bootstrap exhcange with a faster one replace pmi_barrier with a faster one PHH DONE dont need segment addr and size info in peersegmentdata because gasnet already has that PHH DONE the smsg peer data is almost all constant and doesn't need to be exchanged or kept around in the runtime array of structs in gc_init_messaging, loop around PHH DONE:multiplier sizing completion queue creates endpoints for the local supernode, which are not used when pshm is in use there is a gasneti_pshm_in_supernode to test if we need an endpoint for them PHH DONE: but first must fix shutdown coordination not to use SmsgSend w/i the supernode PHH DONE: and implement non-GNI path for loopback AMs when PSHM is disabled PHH DONE can save memory for smsg buffering and cq sizing as well (and endpoints) round up gc_post_descriptor to mulitple of cache line (only in PAR bulds?) probablyu a good idea everywhere since gni libraries and kernel write fields in it. PHH DONE fix name of gc.h to be longer LCS DONE remove am packet sequence number (only debug field or delete) or maybe a timestamp field PHH DONE eliminate from and to fields, could pack packet type together with numargs PHH DONE (but worth checking again) remove unused and constant fields from all structs LCS DONE insert padding in front of am medium payload if necessary to assure 64 bit alignment, just so the memcpy at destination will be faster (use gasneti_align? call in call to gc_send inAMMediumRequest PHH DONE-or-N/A if you want to piggyback credit return on amreply, AND use the smsg buffer to pass directly to the handler, then be sure to save at least one credit per currently executing handler DONE-or-N/A in medium dispatch, there are two memcpy's one for data and one for header, you could use just one, if the data was 64 aligned. Now there are no memcpys could memcpy the varargs arg list in amrequest, then revers the args in the call to the handler if necessary (too twisty) LCS NOT DONE replace gc_atomic_dec_if_positive with gasneti_semaphore_trydown Paul suggests semaphore is bigger, so it is fine the way it is PHH DONE is a poll needed in the amrequest routines? (yes! see other conduits) put it up front but NOT in the reply fns DONE typeof is not portable, so change gc_post_descriptor_t to put the gni_post_descriptor first, and then cast to a gc_post_descriptor PHH DONE to piggyback credit return, use pointer to struct as token, then the handler can set a flag if it sends a reply struct tokenstruct { gasnet_node_t source; int reply_sent; } then token is pointer to PHH DONE gc_debug is not used PHH DONE gasnete_put_nb always blocks, but it doesn't have to if the data is shorter than the bounce-register cutover (if source is outside segme) PHH DONE if source is inside segment, then if source length is less than 128, then could copy to the immediate bounce buffer and return before the put is complete when there are multiple 1 MB transfrer for GET, make the first one shorter, to end on an aligned boundary so that the transfers are not misaligned first could be MAX - misalignment, pr perhaps very small, try both ways see testalign PHH DONE if add quota for registrations, could ifre off all the 1 MB blocks and synchronize all of them at once, for nbi, just increment the counters PHH DONE for put_nbi_bulk , create a recursive nbi access region and return the handle for it as the handle for the put_nb_bulk PHH DONE for put_nb, for small puts, copy the data to the immediate bounce buffer just so you can return faster PHH DONE for medium size out-of segment puts, return as soon as data copied into bounce buffer but before the rdma is finished PHH N/A bounce for medium in-segmgnet puts, just block - used bounce buffer instead just as for medium sized out-of-seg PHH DONE put_nbi needs the same treatment for medium size as now done in put_nb be careful to get 1M plus 17 bytes correct consider having a 512 byte zero block somewher in mapped semgnet to use as source for short memset (or memset the immediate buffer) DONE PHH: cray requires 4 byte alignment for get, could break up into a small transfer to reach alignment DONE PHH OR together the source, dest, and length, then & 3 only once DONE PHH: short misalighned messages should get an aligned block to the immediate buffer then copy on completion if we fire off multiple chunks then make sure that the registrations don't overlap and break on page boundaries to prevent duplicaton nb. try to understand use of huge_tlb read -h output for the tests testlarge and testsmall have -in to have source be in-segment DONE gc_rdma_get doesn't use immediate bounce buffer! DONE gc_rdma_get doesn't retry memregister the way gc_rdma_put does DONE Separate concept of size of immediate bounce buffer from the concept of when to switch from FMA to RDMA (fma->rdma at 1 K for MPI 4K for libpgas) INLINE the small fry consider changing queue for smsg work queue into a circular buffer, becvause it cannto be more than gc_ranks enqueus and dequeues do not need to share a lock, but they need an atomic coiunter for space available. PHH: but still need to atomically check whether its already queued before enqueue PHH DONE gc_queue push not used PHH DONE on-queue indication could be a flag, rather than a pointer to which queue you are on, since there is only the smsg_work_queue PHH GONE (qi no longer in gpd) note, queue item and gni_post cannot both be first in the gc_post_descriptor If I call handlers from the smsg buffer must assure that a handler cannot internally call poll_smsng_q PHH DONE (but should do again) look for unused fields (initialized only) PHH GONE close smsg_fd PHH GONE remembery why /dev/zero instead of anon either the sync synchronize in handle_am_X isn;t needed, or it doesn't work anyway Alternate atomic thing is a gasnet_semaphore_t implementation is efficient. PHH DONE get rid of loop around printf failed to coordinate shutdown what does PMI_Finalize really mean? Is there a timer that is started? Otherwise, bould move it above the flush_streams and trace_finish Items added post-review: PHH DONE: + Document and enforce a minimum value of GASNET_NETWORKDEPTH PHH DONE: + Does shutdown msg need an am_credit to avoid GNI_SmsgSend() failing w/ NOT_DONE? PHH DONE: + Requests could carry "banked" credits (if we can find header space) + Why do we still see NOT_DONE from SmsgSend (well before reaching shutdown). + If am_credits[] were in aux-seg, then could use AMO for gasnetc_send_am_nop(). + Since pd's *are* in aux-seg, might use FMA for other internal purpose(s).