| Crates.io | tiff2 |
| lib.rs | tiff2 |
| version | 0.0.4 |
| created_at | 2025-01-22 20:16:30.330927+00 |
| updated_at | 2025-03-27 22:12:34.961536+00 |
| description | temporary async implementation of tiff - to be upstreamed into image-tiff |
| homepage | https://github.com/feefladder/tiff2 |
| repository | https://app.radicle.xyz/nodes/ash.radicle.garden/rad:zV8XqcPhA2thndDYWeKkt3WGnqbT |
| max_upload_size | |
| id | 1527068 |
| size | 396,146 |
Similar in function and planned lifespan to the arrow2 crate: a temporary async implementation, to be upstreamed into image-tiff. The code stays close to image-tiff, so code can be copied over easily.

This project is largely derived from image-tiff and planned to be used in the georust ecosystem. Therefore, it is dual-licensed under Apache 2.0 and MIT. I would like to stress that I do not want it to be used for military purposes, and in the agronomical sector only for the promotion or support of agroecological practices. However, the license does not offer such guidance. For more info, please read this blog post (not mine) along with the ethical subcommons starter kit and the CARE principles (Carroll et al., 2020).
This crate is not meant for directly reading tiff files, but rather for building more specialized tiff readers on top of it. However, a rudimentary tiff reader is included to show how that would work.
The following use-cases were taken as examples in the design:
COG layout
|Ifd1|Ifd2|Ifd3|-Ifd1TagData-|-Ifd2TagData-|-Ifd3TagData-|--Image1Data--|--Image2Data--|--Image3Data--|
\--->points to--->/\---------------->points to-------------->/
These are rather different, especially in how eagerly they should read in the tag data. For use-case 1, we'd want to be done within 3 requests, while not loading the complete tag data. For use-case 2, we'd want to eagerly load a single overview, and for use-case 3, we'd want to eagerly load most of the tiff's metadata. That's the case for COGs; other tiffs may have different layouts with different use-cases.
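As a rough sketch of use-case 1 against the layout pictured above: the get_range call mirrors the CogReader usage further down in this document, while parse_ifds, tag_data_range and chunk_location are hypothetical helpers introduced only for illustration.

// sketch: one chunk in three range requests against the pictured COG layout
async fn read_one_chunk(reader: &impl CogReader) -> TiffResult<Bytes> {
    // request 1: the header and IFDs sit at the start of the file
    let head = reader.get_range(0..16 * 1024).await?;
    let ifds = parse_ifds(&head)?; // hypothetical parser
    // request 2: only the tag data the first IFD points to (chunk offsets/byte counts)
    let tags = reader.get_range(ifds[0].tag_data_range()).await?; // hypothetical helper
    // request 3: the chunk itself
    let (offset, len) = chunk_location(&tags, 0)?; // hypothetical helper
    reader.get_range(offset..offset + len).await
}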
The crate is split into three parts: data structures, a decoder, and an encoder. Data structures are shared between decoders and encoders. All three have a further hierarchical structure:
{
    ifd: Ifd,
    opts: Arc<ChunkOpts>, // immutable, since we should decide on those before starting the encoding/decoding process
    chunk_offsets: BufferedEntry, // mutable, since it could be partial
    chunk_bytes: BufferedEntry,
}
Decoder:
Decoding a tiff has multiple steps:
There are some crates that implement similar mechanics:
For some of the container formats I've handled, images included, I've found it convenient to split an initial metadata parse that gives you a read-only, shared source, from readers that share said source - though there's still a lot of room to play in that area.
The pictured use-case is a mapping application, where a user freely moves around the map. Why take &mut self when we could be decoding chunks at the same time? Or use internal mutability - locking hell?

Adding another overview to the source makes it no longer read-only. Data required for tile retrieval and decoding, however, is rather small and doesn't change. Thus a Vec<Arc<Image>> would be enough to read all images contained in that vec. Problems arise when we want to add another Arc<Image> to the vec, or want internal mutability.
That is:
struct Decoder {
    /// OverviewLevel -> Image map (could be a vec)
    images: HashMap<OverviewLevel, Arc<tiff2::Image>>,
    geo_data: Idk,
    reader: Arc<dyn CogReader>,
}
impl CogDecoder {
    /// requiring mutable access to self is suboptimal
    /// (actually solved below)
    fn get_chunk(&mut self, i_chunk: u64, zoom_level: OverviewLevel) -> TiffResult<impl Future<Output = DecodingResult> /* + Send */> {
        match self.images.get(&zoom_level) {
            // this will make the caller load the overview first;
            // in this piece of code, we'd have to await IFD retrieval+decoding
            None => Err(TiffError::ImageNotLoaded(zoom_level)),
            // since this returns a future that doesn't reference self, we are happy
            Some(img) => Ok(img.clone().decode_chunk(i_chunk)),
        }
    }
}
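Because the error case is returned before any future is created, the happy-path future can be handed off to the runtime; a sketch, assuming the Send bound from the commented-out part of the signature:

// sketch: the future outlives the &mut borrow of the decoder,
// so it can be moved onto the runtime
let fut = decoder.get_chunk(42, 0)?; // &mut borrow of decoder ends here
let handle = tokio::spawn(fut); // requires the future to be Send + 'static
let chunk = handle.await.expect("decode task panicked");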
impl Image {
    // better move this to decoder, only make image return the offset and length
    fn decode_chunk<R>(&self, reader: R, i_chunk: u64) -> impl Future<Output = DecodingResult> {
        let chunk_offset = self.chunk_offsets[i_chunk];
        let chunk_bytes = self.chunk_bytes[i_chunk];
        let chunk_opts = self.chunk_opts.clone();
        async move {
            // don't mention `self` in here, see [stackoverflow](https://stackoverflow.com/a/77845970/14681457)
            ChunkDecoder::decode(reader, chunk_offset, chunk_bytes, chunk_opts)
        }
    }
}
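Since the returned future owns the reader, the offsets, and the opts (no &self inside the async block), two chunks can be decoded concurrently from plain shared references; an illustrative usage, assuming a cloneable reader:

// illustrative: both futures own all their data, so they can run concurrently
let fut_a = image.decode_chunk(reader.clone(), 0);
let fut_b = image.decode_chunk(reader.clone(), 1);
let (a, b) = tokio::join!(fut_a, fut_b);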
#[tokio::test]
async fn test_concurrency() {
    let mut decoder = CogDecoder::from_url("https://enourmous-cog.com").await.expect("Decoder should build");
    decoder.read_overviews(vec![0, 5]).await.expect("Decoder should read ifds");
    // get a chunk from the highest resolution image
    let chunk_1 = decoder.get_chunk(42, 0).unwrap();
    // get a chunk from a lower resolution image
    let chunk_2 = decoder.get_chunk(42, 5).unwrap();
    let data = (chunk_1.await, chunk_2.await);
}
#[tokio::test]
async fn test_concurrency_fail() {
    let mut decoder = CogDecoder::from_url("https://enourmous-cog.com").await.expect("Decoder should build");
    decoder.read_overviews(vec![0]).await.expect("decoder should read ifds");
    // get a chunk from the highest resolution image
    let chunk_1 = decoder.get_chunk(42, 0).unwrap();
    // get a chunk from a lower resolution image
    let chunk_2 = decoder.get_chunk(42, 5).unwrap(); // panic!
    let data = (chunk_1.await, chunk_2.await);
}
// how HeroicKatana would do it if I understand correctly:
#[tokio::test]
async fn test_concurrency_recover() {
    let mut decoder = CogDecoder::from_url("https://enourmous-cog.com").await.expect("Decoder should build");
    decoder.read_overviews(vec![0]).await.expect("decoder should read ifds");
    // get a chunk from the highest resolution image
    let chunk_1 = decoder.get_chunk(42, 0).unwrap();
    // get a chunk from a lower resolution image
    if let TiffError::OverviewNotLoadedError(chunk_err) = decoder.get_chunk(42, 5).unwrap_err() {
        // read_overviews changes state of the decoder to LoadingIfds
        decoder.read_overviews(vec![chunk_err]).await;
    }
    let chunk_2 = decoder.get_chunk(42, 5).unwrap();
    let data = (chunk_1.await, chunk_2.await);
}
#[tokio::test]
async fn test_concurrency_recover_problem() {
    let mut decoder = CogDecoder::from_url("https://enourmous-cog.com").await.expect("Decoder should build");
    decoder.read_overviews(vec![0]).await.expect("decoder should read ifds");
    // get a chunk from the highest resolution image
    let chunk_1 = decoder.get_chunk(42, 0).unwrap();
    // get a chunk from a lower resolution image
    if let TiffError::OverviewNotLoadedError(chunk_err) = decoder.get_chunk(42, 5).unwrap_err() {
        // read_overviews changes state of the decoder to LoadingIfds
        decoder.read_overviews(vec![chunk_err]); // no await
    }
    let chunk_2 = decoder.get_chunk(42, 5).unwrap(); // fails: the overviews haven't finished loading
    let data = (chunk_1.await, chunk_2.await);
}
The last problem, with statefulness, would be solved approximately like this:
struct CogDecoder {
    /// OverviewLevel -> Image map (could be a vec)
    images: HashMap<OverviewLevel, tiff2::Image>,
    /// Ifds should all be in the first chunk, so we can load them
    ifds: Vec<Ifd>,
    byte_order: ByteOrder,
    geo_data: Idk,
    reader: Arc<dyn CogReader>,
}
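The DecoderState referenced in the code is never spelled out; a sketch with the variant names used below (the ordering and completeness of the variants are my assumption):

enum DecoderState {
    LoadingIfds,
    LoadingTagData,
    Ready,
}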
impl CogDecoder {
    async fn read_overviews(&mut self, levels: Vec<OverviewLevel>) -> TiffResult<()> {
        // there are only further states, from which we can always return
        self.change_state(DecoderState::LoadingTagData).await;
        let requests = levels
            .iter()
            .filter(|level| !self.images.contains_key(level))
            .map(|&l| (l, Image::check_ifd(&self.ifds[l])))
            .map(|(l, req_tags)| (l, self.reader.read_tags(req_tags)))
            .collect::<Vec<_>>();
        // await the tag-data requests and build an Image per requested level
        // ...
        Ok(())
    }
    async fn get_chunk(&self, i_chunk: u64, zoom_level: OverviewLevel) -> TiffResult<DecodingResult> {
        match self.state {
            // is there some magic that we can await state changes in ourselves?
            DecoderState::Ready => {}
            DecoderState::LoadingIfds => return Err(TiffError::WrongState()),
            _ => return Err(TiffError::WrongState()),
        }
        match self.images.get(&zoom_level) {
            // in this piece of code, we'd have to await IFD retrieval+decoding
            None => Err(TiffError::OverviewNotLoadedError(zoom_level)),
            // since this returns a future that doesn't reference self, we are happy
            Some(img) => Ok(img.decode_chunk(i_chunk).await),
        }
    }
}
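For the "await state changes in ourselves" question above, one option (my suggestion, not part of the crate) is to publish the state through a tokio watch channel, so callers can wait for Ready instead of getting a WrongState error back:

use tokio::sync::watch;

// sketch: callers hold a watch::Receiver<DecoderState> and await Ready
async fn wait_ready(rx: &mut watch::Receiver<DecoderState>) {
    rx.wait_for(|s| matches!(*s, DecoderState::Ready))
        .await
        .expect("state sender dropped");
}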
pub struct BufferedEntry {
    tag_type: TagType,
    count: usize,
    data: Vec<u8>,
}
The core struct is an IFD.
An IFD can hold sub-IFDs.
Therefore, it looks like:
pub struct Ifd {
    sub_ifds: Vec<Ifd>,
    data: BTreeMap<Tag, BufferedEntry>,
}
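For illustration, a lookup that recurses into sub-IFDs then falls out of the BTreeMap API; a sketch, not the crate's API, assuming Tag is Copy:

impl Ifd {
    // sketch: look up a tag here, falling back to the sub-IFDs
    fn find(&self, tag: Tag) -> Option<&BufferedEntry> {
        self.data
            .get(&tag)
            .or_else(|| self.sub_ifds.iter().find_map(|ifd| ifd.find(tag)))
    }
}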
A more specialized version is an Image.
pub struct Image {
    ifd: Ifd,
    chunk_opts: Arc<ChunkOpts>,
    chunk_offsets: BufferedEntry,
    chunk_bytes: BufferedEntry,
}
- ProcessedEntry instead of Value everywhere
- find a better name for the CogReader trait
- harmonize Value between "encoder" and decoder. Options:
  - Value (possibly recursive) enum
  - BufferedValue: stored as a byte sequence <- I like this one, actually
    - Vec<u8> (or Bytes) has alignment issues that will not surface with the default allocator, but could surface with e.g. jemalloc
    - Bytes to allow for a reference-counted, "zero-copy" implementation
  - ProcessedValue: stored as Vec / Value::List
  - &mut [u8] slice
- BufferedEntry as:
/// Entry with tag data
pub struct BufferedEntry {
    tag_type: TagType,
    /// invariant: count * tag_type.size() == data.len()
    count: u64,
    /// data should be aligned to `TagType.primitive_size()`
    data: Bytes,
}
Until then, a possible solution could be:
impl<'a> TryFrom<&'a BufferedEntry> for &'a [u64] {
    type Error = TiffError;
    fn try_from(val: &'a BufferedEntry) -> Result<Self, Self::Error> {
        if val.tag_type.size() * val.count != val.data.len() as u64 {
            return Err(TiffFormatError::InconsistentSizesEncountered.into());
        }
        match val.tag_type {
            TagType::LONG8 => match bytemuck::try_cast_slice::<u8, u64>(&val.data[..]) {
                Ok(v) => Ok(v),
                Err(bytemuck::PodCastError::TargetAlignmentGreaterAndInputNotAligned) => {
                    // magic to cast the slice would cost an alloc, but
                    // how do we create a (temp) value that outlives the current function?
                    // [we don't](https://stackoverflow.com/a/64196091/14681457)
                    // val.data.chunks_exact(val.tag_type.size()).map(|chunk| u64::from_ne_bytes(chunk)).collect()
                    Err(bytemuck::PodCastError::TargetAlignmentGreaterAndInputNotAligned.into())
                }
                Err(e) => Err(e.into()),
            },
            // other tag types elided in this sketch
            _ => unimplemented!(),
        }
    }
}
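Usage would then look like this (assuming entry is a BufferedEntry holding LONG8 chunk offsets):

// borrow the raw tag bytes as u64 chunk offsets; no copy on the happy path
let offsets: &[u64] = (&entry).try_into()?;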
or:
let tag = Tag::from_u16(0x01_01); // ImageLength
let offset = Offset {
    tag_type: TagType::Long8,
    count: 1,
    offset: 42,
};
let tag_range = offset.offset..offset.offset + offset.tag_type.size() * offset.count;
let mut res: Bytes = reader.get_range(tag_range).await?;
// add check here, possibly creating a new, aligned allocation
if !res.has_minimum_alignment(8) {
    res = Bytes::from_with_minimum_alignment(res, 8); // this function doesn't exist (nor does has_minimum_alignment)
}
BufferedEntry {
    tag_type: offset.tag_type,
    count: offset.count,
    data: res,
}
Possible solutions:
- AlignedVec as backing structure for Bytes
- Vec<u64> using bytemuck
pub enum IfdEntry {
    Offset(Offset),
    Single(Value),
    Multiple(ProcessedEntry),
}
- 'static
- Sendness
- chunks_exact(8).map(u64::from_ne_bytes) as safe alternative to casting
- AlignedVec source
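A minimal sketch of the chunks_exact alternative from the list above; it copies into a fresh, properly aligned Vec<u64> instead of borrowing, but needs no unsafe casting:

// safe, allocating alternative to bytemuck casting
// (assumes data.len() is a multiple of 8; trailing bytes are dropped)
fn bytes_to_u64s(data: &[u8]) -> Vec<u64> {
    data.chunks_exact(8)
        .map(|chunk| u64::from_ne_bytes(chunk.try_into().expect("chunks_exact yields 8-byte chunks")))
        .collect()
}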
Carroll, S. R., Garba, I., Figueroa-Rodríguez, O. L., Holbrook, J., Lovett, R., Materechera, S., Parsons, M., Raseroka, K., Rodriguez-Lonebear, D., Rowe, R., Sara, R., Walker, J. D., Anderson, J., & Hudson, M. (2020). The CARE Principles for Indigenous Data Governance. Data Science Journal, 19, 43. https://doi.org/10.5334/dsj-2020-043