(Version 1.9)
Copyright © 2005 David Flater.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.
TA, the Transparent Archivist, is a front-end program that reduces the fuss and muss involved in building archives under Linux. It is "transparent" in that the archives that it produces can be examined with ls and restored with cp.
You do not need TA to recover an archive built by TA.
TA is not a disc burning program. TA builds disc images but does not burn them. The value added by TA is:
TA offers eight choices of file system for the archives: three variants of ISO 9660, three variants of ext2, squashfs, and UDF.
Not all packages are required in all modes (e.g., you don't need squashfs-tools unless you are making squashfs discs).
Package / program | Version tested |
---|---|
Linux kernel | 6.11.2 |
GNU coreutils (cp and rm) | 9.5 |
e2fsprogs (mke2fs and tune2fs) | 1.46.5 |
libmagic | 5.41 |
GNU find | 4.8.0 |
libdstr | 1.0 |
mhash (libmhash) | 0.9.9.9 |
XZ Utils (liblzma) | 5.2.5 |
cdrtools (mkisofs) | 3.02a09 * |
udftools (mkudffs) | 2.3 |
util-linux (losetup, mount) | 2.37.4 |
squashfs-tools (mksquashfs) | 4.5 |
* In certain Linux distributions, mkisofs is a symbolic link to or simply replaced by a program called genisoimage. This is not cdrtools but a forked project known as cdrkit. Genisoimage has issues and its use with TA is not supported.
To run TA you must have plenty of free disk space. In addition to the room needed for the final images, iso, isoj, isorr and squashfs require room for a temporary copy of all of the files on a given disc.
To build ext2, le2a, le2f and udf archives you must have sufficient privileges to mount and unmount file systems.
Preserving ownership on archived files or archiving files with unfriendly permissions requires TA to be run as root.
Your kernel must support whatever file systems you are using.
Identifier | Description |
---|---|
isoj | ISO-9660:1988 Level 1 extended by Joliet and deep directories. Unless you need compatibility with Windows XP, avoid this legacy file system. |
iso | ISO-9660:1999 version 2, a standard for CD-ROMs and other optical media. |
isorr | ISO-9660:1999 version 2 plus Rock Ridge protocol, an extension to preserve more of the features of Unix file systems. |
ext2 | A normal Linux file system. Although ext4 now is prevalent, ext2 suffices for disc images and avoids needless overhead. |
le2a | LUKS, ext2 plus 256-bit AES (standard encryption). |
le2f | LUKS, ext2 plus 256-bit Twofish (deprecated encryption). Retained only to avoid a major version bump of TA. |
squashfs | Compressed Linux file system. |
udf | Universal Disk Format, a standard for DVDs and Blu-rays. |
The limits stated below for the various file systems are not standard or theoretical limits but actual limits determined by testing under Linux. Of course, the same disc may be read differently by a different operating system or a different version of Linux.
"8-bit agnostic" file name encoding means that file names are recorded as strings of 8-bit characters with no translation. Whether your ambient codeset is ISO 8859-1, UTF-8, or whatever, that is what goes on the disc. As long as you read the disc in the same context in which it was recorded, all file names should survive intact.
Identifier | File name encoding | File name length limit | File size limit | Year limits | File ownership, permissions, symbolic links? | Compression? | Windows problems |
---|---|---|---|---|---|---|---|
isoj | UTF-16* | 206 B (103 char.) | (2^32 − 2) B = 4294967294 B | 1902–2037 | No | No | Lower file size limit, dates messed up |
iso | 8-bit agnostic | 207 B | > 9 GiB | 1902–2037 | No | No | Charset mismatch, forbidden characters, lower file size limit, dates messed up |
isorr | 8-bit agnostic | 248 B | > 9 GiB | 1902–2037 | Yes | No | Rock Ridge ignored, names truncated, charset mismatch, forbidden characters, lower file size limit, dates messed up |
ext2 | 8-bit agnostic | 255 B | > 9 GiB | 1902–2445 | Yes | No | No support |
le2a | 8-bit agnostic | 255 B | > 9 GiB | 1902–2445 | Yes | No | No support |
le2f | 8-bit agnostic | 255 B | > 9 GiB | 1902–2445 | Yes | No | No support |
squashfs | 8-bit agnostic | 255 B | > 9 GiB | 1970–2105 | Yes | xz | No support |
udf | UTF-8 | 254 B | > 9 GiB | 1–65535 | Yes | No | Unreliable, dates messed up |
* The following characters, which are allowed in Linux file names, are lost in translation to Joliet: *:;?\
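Before choosing isoj, it can help to list the names that contain any of those five characters. The following is a minimal sketch, not part of TA; the function name joliet_unsafe_names is made up for illustration, and it relies only on the pattern matching of GNU find.

```shell
# joliet_unsafe_names DIR
# Lists every path under DIR whose last component contains one of the
# characters that the footnote says are lost in Joliet: * : ; ? \
# (hypothetical helper, not shipped with TA)
joliet_unsafe_names() {
    # Inside the bracket expression, "\\" stands for one literal backslash.
    find "$1" -name '*[*:;?\\]*' -print
}
```

For example, joliet_unsafe_names ~/photos prints nothing if every name would survive the translation unchanged.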
TA does not handle any file types other than regular files, directories, and symbolic links.
TA does not preserve empty directories.
TA does not preserve timestamps on directories.
TA does not handle files that are too big to fit on the target media in one piece. TA does not split files and it does not reorder files when filling up discs.
TA does not handle file system race conditions. If a file to be archived changes while TA is running, its hash will be wrong. If a file to be archived is deleted while TA is running, TA will exit with an error.
You may not archive a file in the root directory called ta-hashes.txt because that is where TA stores the hashes.
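The "no splitting, no reordering" limitation can be pictured as a simple in-order fill. The sketch below is one plausible reading of that behavior, not TA's actual code: the function name pack_in_order and the "SIZE NAME" input format are assumptions, and file system overhead is ignored.

```shell
# pack_in_order CAPACITY
# Reads "SIZE NAME" lines (SIZE in bytes) from stdin, in archive order, and
# prints the image each file would land on.  Illustration only: real images
# also lose some capacity to file system overhead, which is ignored here.
pack_in_order() {
    awk -v cap="$1" '
        BEGIN { disc = 1; used = 0; cap += 0 }
        {
            size = $1 + 0
            name = substr($0, length($1) + 2)
            if (size > cap) {
                # A file that cannot fit on an empty disc is fatal: no splitting.
                print "too big for one disc: " name > "/dev/stderr"
                exit 1
            }
            if (used + size > cap) {
                # Start the next disc; earlier discs are never revisited.
                disc++
                used = 0
            }
            used += size
            printf "image%03d.iso %s\n", disc, name
        }'
}
```

With a capacity of 700 and files of 400, 400, 300 and 100 bytes, the last file goes onto a third image even though it would have fit on the first, because nothing is reordered.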
TA is packaged with the popular and portable GNU automake, so all usual GNU tricks should work. Help on configuration options can be found in the INSTALL file or obtained by entering ./configure --help.
Normally, one should only need to do the following to compile and install the programs ta, tahash, and taval:
    $ ./configure
    $ make
    $ su
    # make install
The distribution includes source for a program called createfile that is useful for testing TA. It is not normally built. If for some reason you want to build it, configure with --enable-test-progs.
    Usage: ta [options] discsize filesystem workingdir imagedir src [src...]
    Options:
      -l      Tweak ext2/le2a/le2f to maximize usable space.
      -nornd  Don't initialize encrypted volumes with random data.
      -p      Force file permissions to reasonable defaults.
      -r src  Replicate file src in *every* image.
      -w      Wait for confirmation after completing each image.
    discsize:    cd74, cd80, dvd+r, dvd+rdl, bd-r, or an arbitrary size
                 specified in bytes.
    filesystem:  iso, isoj, isorr, ext2, le2a, le2f, squashfs or udf.
    workingdir:  for ext2, le2a, le2f or udf this is just a mount point that
                 we can use.  For others, this must be an existing, empty
                 directory that we can fill up and then wipe clean.
    imagedir:    disc images will be written here, overwriting any files
                 that are in the way.  Make sure it is on a file system that
                 can handle big files if you are creating DVD-sized images.
    src:         stuff to archive.  Should usually be a directory, but you
                 can do single files if you want.
cd74 and cd80 refer to 74- and 80-minute CD-Rs or RWs. dvd+r and dvd+rdl refer to DVD+R/RW and DVD+R DL. Standard capacities are not available for DVD-R/RW or DVD-R DL. bd-r is single-layer Blu-ray.
The -p option will set the permissions on directories and executable files to rwxr-xr-x and on non-executable files to rw-r--r--. (Although the archive will be read-only, making files unwritable by owner creates more trouble than it is worth.) For iso and isoj this option has no effect.
The -w option is useful if you have inadequate disk space to store all of the images being produced. TA will wait for you to burn and delete the previous image before beginning the next one.
The -l option is useful if you need a few more megabytes to fit a few large files onto a DVD. See details below.
The -nornd option will speed up the creation of encrypted volumes for le2a and le2f file systems at the cost of not obfuscating the location of encrypted data on a less-than-full volume.
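As an illustration of how the arguments fit together, here is a small wrapper around one possible invocation. It is only a sketch: the function name archive_to_dvd and the choice of -p, -w, dvd+r and isorr are assumptions made for the example, not recommendations from TA itself.

```shell
# archive_to_dvd SRC WORKDIR IMAGEDIR
# Hypothetical wrapper: archive SRC as DVD+R-sized isorr images, forcing
# sane permissions (-p) and pausing after each image (-w).
archive_to_dvd() {
    src=$1        # directory to archive
    workdir=$2    # existing, empty scratch directory
    imagedir=$3   # where image001.iso, image002.iso, ... are written
    ta -p -w dvd+r isorr "$workdir" "$imagedir" "$src"
}
```

It might be called as archive_to_dvd "$HOME/photos" /var/tmp/ta-work /var/tmp/ta-images, where both directories are placeholders that must already exist.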
The translation of src paths into paths within the image is done more or less the way that tar does it: /mumble/foo (absolute) and mumble/foo (relative) both translate to mumble/foo in the image. However, references to "." are removed from the final file names, and references to ".." are not allowed.
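The rule can be approximated in a few lines of shell and awk. This is a sketch of the behavior as described in the paragraph above, not TA's implementation; the function name image_path is hypothetical.

```shell
# image_path SRC
# Prints the path that SRC would have inside the image under the rule
# described above: leading "/" dropped, "." components removed,
# ".." rejected.  (Illustrative helper, not part of TA.)
image_path() {
    printf '%s\n' "$1" | awk -F/ '{
        out = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "" || $i == ".") continue      # skip empty and "." parts
            if ($i == "..") {
                print "\"..\" is not allowed: " $0 > "/dev/stderr"
                exit 1
            }
            out = (out == "" ? "" : out "/") $i
        }
        print out
    }'
}
```

Both image_path /mumble/foo and image_path ./mumble/foo print mumble/foo, while image_path ../foo fails with a nonzero exit status.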
TA leaves the disc images in the directory that you specify as imagedir. Disc images are named image001.iso, image002.iso, and so forth. Even non-iso images are called .iso because anything else can confuse disc-burning applications.
If you are building DVD images, imagedir must be on a file system that can support files larger than 4 GiB (i.e., not vfat).
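One way to check this in advance is to ask df which file system holds the directory. A minimal sketch, assuming GNU df (for the --output option); the function name fs_type is made up for illustration.

```shell
# fs_type DIR
# Prints the type of the file system that holds DIR, e.g. ext4 or vfat.
# Relies on the --output option of GNU df.
fs_type() {
    df --output=fstype "$1" | tail -n 1
}
```

For example, if fs_type /path/to/imagedir prints vfat, pick a different imagedir before building DVD-sized images.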
It is not a good idea to do other work on the side while TA is archiving. If you modify a file in TA's list, the hash will be wrong and the file will not validate. If you delete a file in TA's list, TA will fail.
You can use whatever disc burning application you like to burn the images to disc. Following are sample commands that seem to work under Linux. Your mileage may vary.
Target media | Burning command |
---|---|
CD-R | cdrecord -v dev=/dev/cdrom -dao image001.iso |
CD-RW | cdrecord -v blank=fast dev=/dev/cdrom -dao image001.iso |
DVD+R/RW/DL | growisofs -dvd-compat -Z /dev/dvd=image001.iso |
BD-R | growisofs -Z /dev/dvd=image001.iso |
(N.B., the sao and dao options to cdrecord are completely equivalent.)
In most cases, a Gnome or KDE based desktop should figure out how to mount a disc automatically. However, when the iso file system is used, discs must be mounted with the map=o option to avoid case-smashing file names:
File system | Mount command |
---|---|
iso | mount -t iso9660 -o ro,map=o /dev/cdrom /mnt |
isoj, isorr | mount -t iso9660 -o ro /dev/cdrom /mnt |
ext2 | mount -t ext2 -o ro /dev/cdrom /mnt |
udf | mount -t udf -o ro /dev/cdrom /mnt |
squashfs | mount -t squashfs -o ro /dev/cdrom /mnt |
le2a, le2f | MAPNAME=`date +%N` # Pick a unique map name |
Unmounting, all file systems: umount /mnt
le2a, le2f only: cryptsetup luksClose $MAPNAME
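The le2a/le2f entry above shows only the choice of map name. A possible complete sequence is sketched below; it is an assumption modeled on the luksClose command above and on the ext2aes commands later in this document, not a quotation of TA's documentation. The function name mount_luks_disc is hypothetical, and the commands need root privileges.

```shell
# mount_luks_disc DEVICE MOUNTPOINT
# Sketch only: open a LUKS volume read-only under a unique map name, then
# mount the ext2 file system inside it.  MAPNAME is left set so that it can
# be passed to "cryptsetup luksClose $MAPNAME" after unmounting.
mount_luks_disc() {
    dev=$1                 # e.g. /dev/cdrom
    mnt=$2                 # e.g. /mnt
    MAPNAME=$(date +%N)    # pick a unique map name
    cryptsetup -r luksOpen "$dev" "$MAPNAME" &&
        mount -t ext2 -o ro "/dev/mapper/$MAPNAME" "$mnt"
}
```

cryptsetup will prompt for the passphrase; the mount is attempted only if the volume opens successfully.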
To validate a disc, mount the disc on some directory (/mnt in this example) and do taval /mnt. Taval will check the hashes on all regular files.
For a second opinion on the validity of a given file, you can manually compare the contents of /ta-hashes.txt with the output of gpg --print-md sha512.
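If gpg is not at hand, sha512sum from GNU coreutils computes the same SHA-512 digest. The helper below is a sketch; the name sha512_hex is made up, and because the layout of ta-hashes.txt is not described here, searching that file for the digest is an assumption rather than a documented procedure.

```shell
# sha512_hex FILE
# Prints the SHA-512 digest of FILE as 128 lowercase hexadecimal digits.
sha512_hex() {
    sha512sum "$1" | cut -d' ' -f1
}
```

For example, grep -i "$(sha512_hex /mnt/some/file)" /mnt/ta-hashes.txt would show a matching entry, provided the hashes are stored as plain hexadecimal (an unverified assumption).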
Taval only checks the contents of files that are listed in /ta-hashes.txt. It does not ensure that the dates, permissions, or other metadata were correctly preserved, nor does it notice if other files were added.
As a convenience, taval can also validate an archive against a file of MD5 hashes that was produced by some other program, e.g., md5sum. To validate an archive against MD5 hashes instead of ta-hashes.txt, use the -md5 switch of taval: taval -md5 md5file /mnt.
Each line of the MD5 hashfile must consist of the 32-character hexadecimal hash, two spaces, and a filename:
    d41d8cd98f00b204e9800998ecf8427e  null
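A file in that format can be produced with find and md5sum. This is a sketch under one assumption: that names should be written relative to the top of the tree without a leading "./" (how taval resolves the names is not stated here). The function name make_md5_list is hypothetical.

```shell
# make_md5_list DIR
# Prints one "hash, two spaces, name" line per regular file under DIR,
# with names relative to DIR and the leading "./" stripped.
make_md5_list() {
    ( cd "$1" && find . -type f -exec md5sum {} + | sed 's|  \./|  |' )
}
```

For a directory that contains only an empty file called null, the output is exactly the example line shown above. Names containing backslashes or newlines are escaped by md5sum and would need separate handling.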
Since the archives are completely transparent, you can go directly to the disc(s) and directories that you want if you are in a hurry to retrieve something specific.
    mount -o ro /dev/cdrom /mnt
    cd /mnt
    ls
Otherwise, repeat for each disc:
    GLOBIGNORE=".:.."  # Needed for * to match hidden files
    mount -o ro /dev/cdrom /mnt
    cp -a /mnt/* /
    umount /mnt
If not running as root, you might have to change some permissions in order to get all of the files to copy in. When done, delete the extraneous file /ta-hashes.txt.
Sometimes it is handy to generate hashes without getting involved in making disc images. You can do this with tahash dir, and the directory's contents can subsequently be validated with taval dir.
The -l switch causes TA to create ext2/le2a/le2f file systems with options tweaked to maximize usable space. On a single-sided DVD+R, this saves about 78.8 MiB and reduces the overhead of an empty file system to a mere 460 KiB. However, it limits the number of files that can be placed on a single disc to around 566 (576 inodes).
The command used is mke2fs -m 0 -N 1 -O none,sparse_super2,filetype -I 256 …. As usual, no space is reserved for super-user, and the lost+found directory is removed.
The behaviors of non-Linux operating systems are no longer tested. The following issues that were found with Windows XP may or may not persist in more recent versions of Windows.
Charset mismatch: Windows XP interprets agnostic characters according to its own default code page, which unfortunately is usually cp437. In theory, you should be able to say chcp 1252 and then access a disc encoded as ISO 8859-1 with no trouble. In practice, that doesn't work. The problem is avoided by using Joliet or UDF, which specify unambiguous encodings for file names.
Lower file size limit: For ISO 9660 discs, the file size limit under Windows XP is (2^32 − 2048) B = 4294965248 B. Files larger than this produce an "Input/Output error" on attempt to open.
Dates messed up: Dates in the archive are often wrong by an hour (apparently Daylight Saving Time run amok), and years before 1980 are not supported.
Forbidden characters: Windows XP has different rules for what characters are legal in file names. Files whose names contain an asterisk, backslash, or question mark appear to be inaccessible under Windows XP. Files whose names contain a colon or semicolon are accessible from a Cygwin command line, but they cannot be opened in Windows Explorer. If Joliet is used, forbidden characters are suppressed; consequently, the files are accessible under Windows but they do not validate because their names were changed.
Unreliable: Windows XP sometimes has problems reading UDF discs.
Even when -p is used, TA can fail with "permission denied" if a source directory is not writable by owner. This problem is a consequence of how GNU cp propagates permissions and is not efficiently fixable in TA. Running TA as root avoids the problem.
Cause #1: Wrong mount options for iso format. To prevent file names from being case-smashed, you must mount with the option map=o; e.g., mount -t iso9660 -o ro,map=o /dev/cdrom /mnt. This does not apply to isoj or isorr.
Cause #2: If your working directory is on a vfat partition, using the wrong mount options will result in a corrupt disc. To prevent short file names from being case-smashed, you must mount the vfat partition with the option shortname=winnt.
Cause #3: If you are working in a UTF-8 locale and your file names have UTF-8 encoding errors (broken characters), these may get changed to '?' or '_' when making udf or isoj images. Any affected files will fail validation. If you wish to record the broken filenames as-is, run TA in an ISO 8859-1 locale, where every string is valid (even if it is not readable).
Cause #4: The following characters, which are allowed in Linux file names, are lost in translation to Joliet: *:;?\ Any affected files will fail validation.
Cause #5: If file name lengths exceed the limits shown above, the names will be truncated, assuming that the files get written at all.
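Over-long names (Cause #5) can be found before archiving by measuring each name in bytes. The following is a sketch, not part of TA; the function name names_longer_than is made up, and names containing newlines are not handled.

```shell
# names_longer_than DIR LIMIT
# Lists paths under DIR whose last component is longer than LIMIT bytes.
# LC_ALL=C makes awk count bytes rather than multibyte characters, which
# matches the limits given in bytes in the table above.
names_longer_than() {
    find "$1" -print | LC_ALL=C awk -F/ -v max="$2" 'length($NF) > max'
}
```

For example, names_longer_than ~/photos 207 lists the names that exceed the limit given above for iso.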
Your terminal or desktop is using a different codeset than is assumed for the file names being listed. For isoj and udf, you can enable translation to your current codeset using the iocharset or utf8 mount options; e.g., to read a UDF disc in a Latin-1 locale, mount it with -o ro,iocharset=iso8859-1. Otherwise, you have to either change the locale of your terminal or desktop to agree with the data or run the file names through iconv to make them readable.
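As a sketch of the iconv route, the filter below converts a listing recorded in ISO 8859-1 for display in a UTF-8 terminal. The function name latin1_to_utf8 is made up, and the source encoding is an assumption that has to match how the disc was actually recorded.

```shell
# latin1_to_utf8
# Filter: reads text encoded as ISO 8859-1 on stdin and writes UTF-8.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}
```

For example, ls /mnt | latin1_to_utf8 shows readable names without changing anything on the disc.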
    mke2fs 1.46.5 (30-Dec-2021)
    /dev/mapper/TA contains `PGP Secret Sub-key -' data
    Proceed anyway? (y,N)
This is a known problem with mke2fs that will not be fixed. Encrypted data or random filler can be misinterpreted by libmagic, causing a safety check in mke2fs to issue that spurious warning. Just say y and continue.
Versions of TA prior to version 1.4 supported a file system called ext2aes, which was ext2 plus 256-bit AES encryption via the cryptoloop module of the Linux kernel. ext2aes has no LUKS header to tell you what it is; you just have to know.
With old versions of the kernel and the mount command, you could mount an ext2aes disc with mount -t ext2 -o ro,encryption=aes /dev/dvd /mnt. That no longer works since the cryptoloop kernel module has been completely deleted. An ext2aes disc can still be mounted as follows:
    MAPNAME=`date +%N`  # Pick a unique map name
    cryptsetup -r -c aes-cbc-plain -s 256 -h plain create $MAPNAME /dev/dvd
    mount -t ext2 -o ro /dev/mapper/$MAPNAME /mnt
To unmount:
    umount /mnt
    cryptsetup remove $MAPNAME
Versions of TA prior to version 1.5 supported a file system called ziso, which was isorr plus a Linux-specific transparent decompression extension. In version 1.5, ziso was replaced by squashfs.
ziso archives can be mounted and unmounted using the same commands as isorr. However, to read ziso archives with transparent decompression, you must have a Linux kernel that was compiled with support for transparent decompression. As of kernel 6.11.3, the relevant option appears in make menuconfig as File systems → CD-ROM/DVD Filesystems → ISO 9660 CDROM file system support → Transparent decompression extension.
If kernel support is lacking, the content can be non-transparently decompressed using the mkzftree program included in the zisofs-tools package (with the -u option for uncompress).
Any questions, problems, or bug reports for TA should be directed to dave@flaterco.com.
To do if there is a future version:
Version 1.9, 2024-10-20:
Version 1.8, 2013-02-07:
Version 1.7.2, 2011-09-03:
Version 1.7.1, 2011-04-10:
Version 1.7, 2011-03-04:
Version 1.6.1, 2010-12-24:
Version 1.6, 2010-12-15:
Version 1.5.1, 2010-03-29:
Version 1.5, 2010-03-03:
Version 1.4, 2008-08-11:
Version 1.3.1, 2008-03-06:
Version 1.3, 2008-02-29:
Version 1.2.2, 2008-01-25:
Version 1.2.1, 2006-08-25:
Version 1.2, 2006-08-23:
Documentation rev. 2006-07-23: Noted UDF troubles with XP.
Documentation rev. 2006-07-22: Updated troubleshooting info for disk thrashing. Added -v to CD burning command.
Version 1.1.2, 2006-07-04:
Version 1.1.1, 2006-07-04:
Version 1.1, 2006-07-03:
Documentation rev. 2006-05-27: Added example command for burning DVD image. Removed statement about Nero.
Version 1.0, 2006-01-02