Struct Repository

pub struct Repository<ObjectID: FsVerityHashValue> {
    repository: OwnedFd,
    objects: OnceCell<OwnedFd>,
    write_semaphore: OnceCell<Arc<Semaphore>>,
    insecure: bool,
    _data: PhantomData<ObjectID>,
}

A content-addressable repository for composefs objects.

Stores content-addressed objects, splitstreams, and images with fsverity verification. Objects are stored by their fsverity digest, streams by SHA256 content hash, and both support named references for persistence across garbage collection.
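As a rough illustration of content addressing, the sketch below maps a hex digest to a path inside an objects directory. The two-level fan-out (`objects/<first two hex digits>/<rest>`) and the function name `object_path` are assumptions made for this example; the on-disk layout is not stated on this page.

```rust
/// Hypothetical helper: derive a storage path from a hex-encoded digest.
/// The `objects/xx/rest` layout is an assumption, not taken from this page.
pub fn object_path(digest_hex: &str) -> Option<String> {
    // Reject input that is too short to split or that is not hexadecimal.
    if digest_hex.len() < 3 || !digest_hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Safe to split at a byte offset because the input is ASCII.
    let (dir, file) = digest_hex.split_at(2);
    Some(format!("objects/{dir}/{file}"))
}
```

Because the path depends only on the digest, storing the same content twice resolves to the same location, which is what makes deduplication possible.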

Fields

repository: OwnedFd
objects: OnceCell<OwnedFd>
write_semaphore: OnceCell<Arc<Semaphore>>
insecure: bool
_data: PhantomData<ObjectID>

Implementations

impl<ObjectID: FsVerityHashValue> Repository<ObjectID>

pub fn objects_dir(&self) -> ErrnoResult<&OwnedFd>

Return the objects directory.

pub fn write_semaphore(&self) -> Arc<Semaphore>

Return a shared semaphore for limiting concurrent object writes.

This semaphore is lazily initialized with available_parallelism() permits, and shared across all operations on this repository. Use this to limit concurrent I/O when processing multiple files or layers in parallel.
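The lazy, shared initialization described above can be sketched with the standard library alone. `Limiter` and `Permits` are hypothetical stand-ins for the repository and its semaphore, and `OnceLock` is used here in place of whichever `OnceCell` type the crate actually uses.

```rust
use std::sync::{Arc, OnceLock};
use std::thread::available_parallelism;

/// Hypothetical stand-in for a semaphore: it only records its permit count.
#[derive(Debug)]
pub struct Permits(pub usize);

/// Hypothetical holder that mirrors a lazily-initialized shared field.
#[derive(Default)]
pub struct Limiter {
    cell: OnceLock<Arc<Permits>>,
}

impl Limiter {
    /// Return the shared handle, creating it on first use.
    pub fn shared(&self) -> Arc<Permits> {
        self.cell
            .get_or_init(|| {
                // Fall back to a single permit if the parallelism is unknown.
                let n = available_parallelism().map(|n| n.get()).unwrap_or(1);
                Arc::new(Permits(n))
            })
            .clone()
    }
}
```

Every caller receives a clone of the same `Arc`, so the limit applies across all operations that share the holder.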

pub fn open_path(dirfd: impl AsFd, path: impl AsRef<Path>) -> Result<Self>

Open the repository at path, resolved relative to the directory file descriptor dirfd.

pub fn open_user() -> Result<Self>

Open the default user-owned composefs repository.

pub fn open_system() -> Result<Self>

Open the default system-global composefs repository.

fn ensure_dir(&self, dir: impl AsRef<Path>) -> ErrnoResult<()>

pub async fn ensure_object_async(self: &Arc<Self>, data: Vec<u8>) -> Result<ObjectID>

Asynchronously ensures an object exists in the repository.

Same as ensure_object but runs the operation on a blocking thread pool to avoid blocking async tasks. Returns the fsverity digest of the object.

For performance reasons, this function does not call fsync() or similar. After you’re done with everything, call Repository::sync_async().
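The signature takes `self: &Arc<Self>` and an owned `Vec<u8>`, plausibly because work moved to another thread must own everything it touches. The sketch below shows that pattern with a plain `std::thread` instead of an async blocking pool; `Store`, `ensure`, and `spawn_ensure` are hypothetical names, and the returned string is a placeholder rather than a real digest.

```rust
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical store used only to illustrate the ownership pattern.
pub struct Store {
    pub prefix: String,
}

impl Store {
    /// Synchronous operation; returns a placeholder ID, not a real digest.
    pub fn ensure(&self, data: &[u8]) -> String {
        format!("{}-{}", self.prefix, data.len())
    }

    /// Run `ensure` on another thread. The closure must be `'static`, so it
    /// takes a cloned `Arc` and the owned buffer instead of borrowing them.
    pub fn spawn_ensure(self: &Arc<Self>, data: Vec<u8>) -> JoinHandle<String> {
        let this = Arc::clone(self);
        thread::spawn(move || this.ensure(&data))
    }
}
```

The caller keeps its own `Arc` and can continue to use the store while the background work runs.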

pub fn create_object_tmpfile(&self) -> Result<OwnedFd>

Create an O_TMPFILE in the objects directory for streaming writes.

Returns the file descriptor for writing. The caller should write data to this fd, then call spawn_finalize_object_tmpfile() to compute the verity digest, enable fs-verity, and link the file into the objects directory.

pub fn spawn_finalize_object_tmpfile(self: &Arc<Self>, tmpfile_fd: OwnedFd, size: u64) -> JoinHandle<Result<ObjectID>>

Spawn a background task that finalizes a tmpfile as an object.

The task computes the fs-verity digest by reading the file, enables verity, and links the file into the objects directory.

Returns a handle that resolves to the ObjectID (fs-verity digest).

Arguments

tmpfile_fd - The O_TMPFILE file descriptor with data already written
size - The exact size in bytes of the data written to the tmpfile

pub fn finalize_object_tmpfile(&self, file: File, size: u64) -> Result<ObjectID>

Finalize a tmpfile as an object.

This method should be called from a blocking context (e.g., spawn_blocking) as it performs synchronous I/O operations.

This method:

1. Re-opens the file as read-only
2. Enables fs-verity on the file (kernel computes digest)
3. Reads the digest from the kernel
4. Checks if object already exists (deduplication)
5. Links the file into the objects directory

By letting the kernel compute the digest during verity enable, we avoid reading the file an extra time in userspace.
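The deduplication and linking steps can be sketched with standard-library calls. This is a simplified illustration, not the crate's implementation: it uses a named staging file instead of an O_TMPFILE, takes the digest from the caller instead of the kernel, and skips fs-verity entirely. The name `link_object` is hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Link `staged` into `objects_dir` under the name `digest_hex`.
/// Returns the final path and whether a new link was created.
pub fn link_object(
    objects_dir: &Path,
    staged: &Path,
    digest_hex: &str,
) -> io::Result<(PathBuf, bool)> {
    let dest = objects_dir.join(digest_hex);
    match fs::hard_link(staged, &dest) {
        // First writer: the object is now visible under its digest.
        Ok(()) => Ok((dest, true)),
        // Same digest already stored: treat as success (deduplication).
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok((dest, false)),
        Err(e) => Err(e),
    }
}
```

Attempting the link and handling `AlreadyExists` avoids a separate existence check that could race with another writer.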

fn compute_verity_digest(reader: &mut impl BufRead) -> Result<ObjectID>

Compute the fs-verity digest in userspace by reading from a buffered source. Used as a fallback when kernel verity is not available (insecure mode).

fn store_object_with_id(&self, data: &[u8], id: &ObjectID) -> Result<()>

Store an object with a pre-computed fs-verity ID.

This is an internal helper that stores data assuming the caller has already computed the correct fs-verity digest. The digest is verified after storage.

pub fn ensure_object(&self, data: &[u8]) -> Result<ObjectID>

Given a blob of data, store it in the repository.

For performance reasons, this function does not call fsync() or similar. After you’re done with everything, call Repository::sync().

fn open_with_verity(&self, filename: &str, expected_verity: &ObjectID) -> Result<OwnedFd>

pub fn set_insecure(&mut self, insecure: bool) -> &mut Self

By default, fsverity must be enabled on the target filesystem. Setting insecure to true disables digest verification, which allows the repository to be used on a filesystem without fsverity support.

pub fn create_stream(self: &Arc<Self>, content_type: u64) -> SplitStreamWriter<ObjectID>

Creates a SplitStreamWriter for writing a split stream. You should write the data to the returned object and then pass it to .store_stream() to store the result.

fn format_object_path(id: &ObjectID) -> String

fn format_stream_path(content_identifier: &str) -> String

pub fn has_stream(&self, content_identifier: &str) -> Result<Option<ObjectID>>

Check if the provided splitstream is present in the repository; if so, return its fsverity digest.

pub fn write_stream(&self, writer: SplitStreamWriter<ObjectID>, content_identifier: &str, reference: Option<&str>) -> Result<ObjectID>

Write the given splitstream to the repository with the provided content identifier and optional reference name.

This call contains an internal barrier that guarantees that, in the event of a crash, either:

the named stream (by content_identifier) will not be available; or
the stream and all of its linked data will be available

In other words: it will not be possible to boot a system which contained a stream named content_identifier but is missing linked streams or objects from that stream.
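The ordering behind this guarantee (make the data durable first, publish the name last) can be sketched as follows. This is a minimal Unix-only illustration under assumptions: it syncs a single file and its directory with `sync_all`, whereas the repository is documented elsewhere on this page as using syncfs(), and the function name `publish` is hypothetical.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::os::unix::fs::symlink;
use std::path::Path;

/// Write `data` to `root/object_name`, flush it, then expose it as
/// `root/stream_name`. A crash before the last step leaves no name behind.
pub fn publish(
    root: &Path,
    object_name: &str,
    data: &[u8],
    stream_name: &str,
) -> io::Result<()> {
    // 1. Write the payload and flush it to stable storage.
    let mut file = File::create(root.join(object_name))?;
    file.write_all(data)?;
    file.sync_all()?;
    // 2. Flush the directory so the new entry itself is durable.
    File::open(root)?.sync_all()?;
    // 3. Only now create the name that readers look up.
    symlink(object_name, root.join(stream_name))?;
    Ok(())
}
```

Readers that resolve `stream_name` therefore never observe a name whose data was not yet flushed.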

pub async fn register_stream(self: &Arc<Self>, object_id: &ObjectID, content_identifier: &str, reference: Option<&str>) -> Result<()>

Register an already-stored object as a named stream.

This is useful when using SplitStreamBuilder which stores the splitstream directly via finish(). After calling finish(), call this method to sync all data to disk and create the stream symlink.

This method ensures atomicity: the stream symlink is only created after all objects have been synced to disk.

pub async fn write_stream_async(self: &Arc<Self>, writer: SplitStreamWriter<ObjectID>, content_identifier: &str, reference: Option<&str>) -> Result<ObjectID>

Async version of write_stream for use with parallel object storage.

This method awaits any pending parallel object storage tasks before finalizing the stream. Use this when you’ve called write_external_parallel() on the writer.

pub fn has_named_stream(&self, name: &str) -> Result<bool>

Check if a splitstream with a given name exists in the “refs” in the repository.

pub fn name_stream(&self, content_identifier: &str, name: &str) -> Result<()>

Assign the given name to a stream. The stream must already exist. After this operation it will be possible to refer to the stream by its new name ‘refs/{name}’.

pub fn ensure_stream(self: &Arc<Self>, content_identifier: &str, content_type: u64, callback: impl FnOnce(&mut SplitStreamWriter<ObjectID>) -> Result<()>, reference: Option<&str>) -> Result<ObjectID>

Ensures that the stream with the given content identifier exists in the repository.

This tries to find the stream by the content identifier. If the stream is already in the repository, the object ID (fs-verity digest) is read from the symlink. If the stream is not already in the repository, a SplitStreamWriter is created and passed to callback. On return, the object ID of the stream will be calculated and it will be written to disk (if it wasn’t already created by someone else in the meantime).

In both cases, if reference is provided, it is used to provide a fixed name for the object. Any object that doesn’t have a fixed reference to it is subject to garbage collection. It is an error if this reference already exists.

On success, the object ID of the new object is returned. It is expected that this object ID will be used when referring to the stream from other linked streams.
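The lookup-or-build flow described above resembles a get-or-create cache keyed by content identifier. The sketch below models it in memory; `StreamIndex` and `ensure` are hypothetical names, the writer is reduced to a `Vec<u8>`, and the ID is a length-based placeholder rather than an fs-verity digest. References and on-disk symlinks are left out.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory index: content identifier -> object ID.
#[derive(Default)]
pub struct StreamIndex {
    streams: HashMap<String, String>,
}

impl StreamIndex {
    /// Return the existing ID, or run `build` to produce the stream first.
    pub fn ensure(
        &mut self,
        content_identifier: &str,
        build: impl FnOnce(&mut Vec<u8>) -> Result<(), String>,
    ) -> Result<String, String> {
        // Fast path: the stream is already present, so `build` never runs.
        if let Some(id) = self.streams.get(content_identifier) {
            return Ok(id.clone());
        }
        // Slow path: let the caller fill the buffer, then record the result.
        let mut buf = Vec::new();
        build(&mut buf)?;
        let id = format!("len-{}", buf.len()); // placeholder, not a digest
        self.streams.insert(content_identifier.to_string(), id.clone());
        Ok(id)
    }
}
```

Taking the builder as `FnOnce` lets callers skip expensive work, such as downloading a layer, when the stream already exists.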

pub fn open_stream(&self, content_identifier: &str, verity: Option<&ObjectID>, expected_content_type: Option<u64>) -> Result<SplitStreamReader<ObjectID>>

Open a splitstream with the given name.

pub fn open_object(&self, id: &ObjectID) -> Result<OwnedFd>

Given an object identifier (a digest), return a read-only file descriptor for its contents. The fsverity digest is verified (if the repository is not in insecure mode).

pub fn read_object(&self, id: &ObjectID) -> Result<Vec<u8>>

Read the contents of an object into a Vec.

pub fn merge_splitstream(&self, content_identifier: &str, verity: Option<&ObjectID>, expected_content_type: Option<u64>, output: &mut impl Write) -> Result<()>

Merges a splitstream into a single continuous stream.

Opens the named splitstream, resolves all object references, and writes the complete merged content to the provided writer. Optionally verifies the splitstream’s fsverity digest matches the expected value.
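Conceptually, merging interleaves inline bytes with the contents of referenced objects. The sketch below shows that idea with an in-memory model; the `Chunk` enum, the string object IDs, and the `merge` function are assumptions for illustration and do not reflect the actual splitstream format.

```rust
use std::collections::HashMap;
use std::io::{self, Write};

/// Hypothetical chunk model: bytes stored inline, or a reference to an object.
pub enum Chunk {
    Inline(Vec<u8>),
    External(String),
}

/// Write every chunk to `output` in order, resolving external references
/// through `objects`. Fails with `NotFound` if a referenced object is missing.
pub fn merge(
    chunks: &[Chunk],
    objects: &HashMap<String, Vec<u8>>,
    output: &mut impl Write,
) -> io::Result<()> {
    for chunk in chunks {
        match chunk {
            Chunk::Inline(bytes) => output.write_all(bytes)?,
            Chunk::External(id) => {
                let data = objects.get(id).ok_or_else(|| {
                    io::Error::new(io::ErrorKind::NotFound, format!("missing object {id}"))
                })?;
                output.write_all(data)?;
            }
        }
    }
    Ok(())
}
```

Any `Write` implementation works as the sink, for example a `Vec<u8>` in tests or a file when reconstructing an archive.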

pub fn write_image(&self, name: Option<&str>, data: &[u8]) -> Result<ObjectID>

Write data into the repository as an image with the given name.

The fsverity digest is returned.

Integrity

This function is not safe for untrusted users.

pub fn import_image<R: Read>(&self, name: &str, image: &mut R) -> Result<ObjectID>

Import the data from the provided reader into the repository as an image.

The fsverity digest is returned.

Integrity

This function is not safe for untrusted users.

fn open_image(&self, name: &str) -> Result<(OwnedFd, bool)>

Returns the fd of the image and whether or not verity should be enabled when mounting it.

pub fn mount(&self, name: &str) -> Result<OwnedFd>

Create a detached mount of an image. This file descriptor can then be attached via e.g. move_mount.

pub fn mount_at(&self, name: &str, mountpoint: impl AsRef<Path>) -> Result<()>

Mount the image with the provided digest at the target path.

pub fn symlink(&self, name: impl AsRef<Path>, target: impl AsRef<Path>) -> ErrnoResult<()>

Creates a relative symlink within the repository.

Computes the correct relative path from the symlink location to the target, creating any necessary intermediate directories. Atomically replaces any existing symlink at the specified name.
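The relative-path computation can be sketched as a pure function over two paths that share a common root. This is an illustrative version under assumptions: both inputs are relative paths made of plain components (no `..`, no absolute prefix), and the name `relative_target` is hypothetical. Directory creation and atomic replacement are not shown.

```rust
use std::path::{Component, Path, PathBuf};

/// Compute the link text for a symlink located at `link` that should point
/// at `target`, where both paths are relative to the same root directory.
pub fn relative_target(link: &Path, target: &Path) -> PathBuf {
    // The link is resolved relative to the directory that contains it.
    let link_dir: Vec<Component> = link
        .parent()
        .map(|p| p.components().collect())
        .unwrap_or_default();
    let target_parts: Vec<Component> = target.components().collect();

    // Count the leading components the two paths have in common.
    let common = link_dir
        .iter()
        .zip(&target_parts)
        .take_while(|(a, b)| a == b)
        .count();

    let mut out = PathBuf::new();
    // Climb out of the directories that are not shared...
    for _ in common..link_dir.len() {
        out.push("..");
    }
    // ...then descend into the remainder of the target.
    for part in &target_parts[common..] {
        out.push(part);
    }
    out
}
```

For example, a link at `streams/foo` pointing at `objects/ab/cdef` gets the text `../objects/ab/cdef`, which stays valid if the whole repository directory is moved.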

fn read_symlink_hashvalue(dirfd: &OwnedFd, name: &CStr) -> Result<ObjectID>

fn walk_symlinkdir(fd: OwnedFd, objects: &mut HashSet<ObjectID>) -> Result<()>

fn openat(&self, name: &str, flags: OFlags) -> ErrnoResult<OwnedFd>

Open the provided path in the repository.

fn gc_category(&self, category: &str) -> Result<HashSet<ObjectID>>

pub fn objects_for_image(&self, name: &str) -> Result<HashSet<ObjectID>>

Given an image, return the set of all objects referenced by it.

pub fn sync(&self) -> Result<()>

Makes sure all content is written to the repository.

This is currently just syncfs() on the repository’s root directory because we don’t have any better options at present. This blocks until the data is written out.

pub async fn sync_async(self: &Arc<Self>) -> Result<()>

Makes sure all content is written to the repository.

This is currently just syncfs() on the repository’s root directory because we don’t have any better options at present. This won’t return until the data is written out.
