(Version 1.8)
Copyright © 2005 David Flater.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
TA, the Transparent Archivist, is a front-end program that reduces the fuss and muss involved in building archives under Linux. It is "transparent" in that the archives that it produces can be examined with ls and restored with cp. You do not need TA to recover an archive built by TA.
TA is not a disc burning program. TA builds disc images but does not burn them. The value added by TA is:
TA offers eight choices of file system for the archives, including three variants of ISO 9660, three variants of ext2, squashfs, and UDF. However, UDF support is experimental.
Not all packages are required in all modes (e.g., you don't need squashfs-tools unless you are making squashfs discs).
Package / program | Version tested |
---|---|
Linux kernel | 3.7.2 |
bash | 4.2 |
GNU coreutils (cp and rm) | 8.19 |
e2fsprogs (mke2fs and tune2fs) | 1.42.6 |
file | 5.11 |
GNU find | 4.4.2 |
libdstr | 1.0 |
mhash (libmhash) | 0.9.9.9 |
zlib | 1.2.6 |
cdrtools (mkisofs) | 3.01a08 * |
udftools (mkudffs) | 1.0.0b3 |
util-linux (losetup, mount) | 2.21.2 |
squashfs-tools (mksquashfs) | 4.2 |
* In certain Linux distributions, mkisofs is a symbolic link to, or simply replaced by, a program called genisoimage. This is not cdrtools but a forked project known as cdrkit. Genisoimage has issues and its use with TA is not supported.
To run TA you must have plenty of free disk space. In addition to the room needed for the final images, iso, isoj, isorr and squashfs require room for a temporary copy of all of the files on a given disc.
To build ext2, le2a, le2f and udf archives you must have sufficient privileges to mount and unmount file systems.
Preserving ownership on archived files or archiving files with unfriendly permissions requires TA to be run as root.
Your kernel must support whatever file systems you are using.
The limits stated below for the various file systems are not standard or theoretical limits, but real limits determined by testing under Linux 2.6.x. Your mileage should not vary.
Identifier | Description | File name encoding | File name length limit | File size limit | Date limits | File ownership, permissions, symbolic links? | Compression? | Problems reading under Windows XP |
---|---|---|---|---|---|---|---|---|
isoj | ISO-9660:1988 Level 1 extended by Joliet and deep directories | UTF-16* | 206 B (103 char.) | (2^32 − 2) B = 4294967294 B | 1970–2027 | No | No | Lower file size limit, dates messed up |
iso | ISO-9660:1999 | 8-bit agnostic | 207 B | (2^32 − 2) B = 4294967294 B | 1970–2027 | No | No | Charset mismatch, forbidden characters, lower file size limit, dates messed up |
isorr | iso plus Rock Ridge protocol | 8-bit agnostic | 248 B | (2^32 − 2) B = 4294967294 B | 1970–2027 | Yes | No | Rock Ridge ignored, names truncated, charset mismatch, forbidden characters, lower file size limit, dates messed up |
ext2 | Linux file system | 8-bit agnostic | 255 B | > 9 GiB | 1902–2037 | Yes | No | No support |
le2a | ext2 plus 256-bit AES encryption, LUKS-compliant | 8-bit agnostic | 255 B | > 9 GiB | 1902–2037 | Yes | No | No support |
le2f | ext2 plus 256-bit Twofish encryption, LUKS-compliant | 8-bit agnostic | 255 B | > 9 GiB | 1902–2037 | Yes | No | No support |
squashfs | Compressed file system | 8-bit agnostic | 255 B | > 9 GiB | 1902–2037 | Yes | gzip | No support |
udf | Universal Disk Format (EXPERIMENTAL) | UTF-8 | 254 B | > 9 GiB | 1970–2037 | Yes | No | Unreliable, dates messed up |
* The following characters, which are allowed in Linux file names, are lost in translation to Joliet: *:;?\
UDF support is experimental. Kernel stability problems have been experienced while populating UDF file systems under Linux.
TA does not handle any file types other than regular files, directories, and symbolic links.
TA does not preserve empty directories.
TA does not preserve timestamps on directories.
TA does not handle files that are too big to fit on the target media in one piece. TA does not split files and it does not reorder files when filling up discs.
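The consequence of "no splitting, no reordering" can be sketched in a few lines of bash. This is only an illustration under stated assumptions, not TA's actual code: plan_discs is a hypothetical helper, sizes are plain byte counts, file system overhead is ignored, and it is assumed that a new image is started as soon as the next file in the list does not fit.

```shell
#!/bin/bash
# plan_discs CAPACITY SIZE...
# Hypothetical helper (not part of TA): assigns file sizes to images
# strictly in the order given, without splitting or reordering.
plan_discs() {
    local capacity=$1; shift
    local disc=1 used=0 size
    for size in "$@"; do
        # A file bigger than a whole disc can never be placed.
        if (( size > capacity )); then
            echo "error: a file of $size bytes cannot fit on one disc" >&2
            return 1
        fi
        # When the next file does not fit, start a new image;
        # earlier images are never revisited.
        if (( used + size > capacity )); then
            disc=$((disc + 1))
            used=0
        fi
        used=$((used + size))
        printf 'image%03d.iso %s\n' "$disc" "$size"
    done
}

# Four files of 60, 30, 50 and 40 units on discs holding 100 units.
plan_discs 100 60 30 50 40
```

Under these assumptions the order of the src arguments matters: a small file that arrives late does not go back to fill a gap left on an earlier image.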
You may not archive a file in the root directory called ta-hashes.txt, because that is where ta stores the hashes.
If a file to be archived changes while TA is running, its hash will be wrong. If a file to be archived is deleted while TA is running, TA will exit with an error.
TA is packaged with the popular and portable GNU automake, so all usual GNU tricks should work. Help on configuration options can be found in the INSTALL file or obtained by entering ./configure --help.
Normally, one should only need to do the following to compile and install the programs ta, tahash, and taval:
bash-3.1$ ./configure
bash-3.1$ make
bash-3.1$ su
bash-3.1# make install
The distribution includes source for a program called createfile that is only useful for testing TA. It is not normally built. If for some strange reason you want to build it, configure with --enable-test-progs.
Usage: ta [options] discsize filesystem workingdir imagedir src [src...]
Options:
  -l       Tweak ext2/le2a/le2f to maximize usable space.
  -nornd   Don't initialize encrypted volumes with random data.
  -p       Force file permissions to reasonable defaults.
  -r src   Replicate file src in *every* image.
  -w       Wait for confirmation after completing each image.
discsize:   cd74, cd80, dvd+r, dvd+rdl, or an arbitrary size specified in bytes.
filesystem: iso, isoj, isorr, ext2, le2a, le2f, squashfs or udf.
workingdir: for ext2, le2a, le2f or udf this is just a mount point that we can use. For others, this must be an existing, empty directory that we can fill up and then wipe clean.
imagedir:   disc images will be written here, overwriting any files that are in the way. Make sure it is on a file system that can handle big files if you are creating DVD-sized images.
src:        stuff to archive. Should usually be a directory, but you can do single files if you want.
The identifiers cd74 and cd80 refer to 74- and 80-minute CD-Rs or RWs. The identifiers dvd+r and dvd+rdl refer to DVD+R/RW and DVD+R DL. Standard capacities are not available for DVD-R/RW or DVD-R DL.
The -p option will set the permissions on directories and executable files to rwxr-xr-x and on non-executable files to rw-r--r--. (Although the archive will be read-only, making files unwritable by owner creates more trouble than it is worth.) For iso and isoj this option has no effect.
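The rule that -p applies can be written down as a small bash function. This is a sketch of the rule as described here, not the logic of the setperm script shipped with TA; reasonable_mode is a hypothetical name, and "executable" is assumed to mean executable by the user running the check.

```shell
#!/bin/bash
# reasonable_mode PATH
# Hypothetical helper: prints the octal mode that the -p rule described
# above would give to PATH (755 = rwxr-xr-x, 644 = rw-r--r--).
reasonable_mode() {
    if [ -d "$1" ] || [ -x "$1" ]; then
        echo 755   # directories and executable files
    else
        echo 644   # everything else
    fi
}
```

For example, chmod "$(reasonable_mode somefile)" somefile would normalize a single file by hand.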
The -w option is useful if you have inadequate disk space to store all of the images being produced. TA will wait for you to burn and delete the previous image before beginning the next one.
The -l option is useful only if you need a few more megabytes to fit a few large files onto a DVD. See details below.
The -nornd option will speed up the creation of encrypted volumes for le2a and le2f filesystems at the cost of not obfuscating the location of encrypted data on a less-than-full volume.
The translation of src paths into paths within the image is done more or less the way that tar does it: /mumble/foo (absolute) and mumble/foo (relative) both translate to mumble/foo in the image. However, references to "." are removed from the final file names, and references to ".." are not allowed.
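To make the translation rule concrete, here is a minimal bash sketch of it. The function name ta_image_path is hypothetical and this is not how TA implements the translation internally; collapsing doubled slashes is an extra assumption.

```shell
#!/bin/bash
# ta_image_path SRC
# Hypothetical helper: prints the path that SRC would have inside the
# image under the rule described above.
ta_image_path() {
    local src=$1 part out=
    local IFS=/
    local -a parts
    # Split on "/"; a leading "/" yields an empty first component.
    read -r -a parts <<< "$src"
    for part in "${parts[@]}"; do
        case $part in
            ''|.) ;;   # drop empty components and references to "."
            ..) echo "error: '..' is not allowed in $src" >&2
                return 1 ;;
            *) out=${out:+$out/}$part ;;
        esac
    done
    printf '%s\n' "$out"
}

ta_image_path /mumble/foo
ta_image_path mumble/foo
```

Both calls print mumble/foo, which also shows why archiving /mumble/foo and mumble/foo in the same run would collide.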
TA leaves the disc images in the directory that you specify as imagedir. Disc images are named image001.iso, image002.iso, and so forth. Even non-iso images are called .iso because anything else can confuse disc-burning applications.
If you are building DVD images, imagedir must be on a file system that can support files larger than 4 GiB (i.e., not vfat).
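A quick pre-flight check of imagedir might look like the following sketch. It relies on GNU stat; the helper names are hypothetical, and the list of type names that stat reports for FAT file systems (msdos, vfat, fat) is an assumption that you should confirm on your own system.

```shell
#!/bin/bash
# is_fat_type FSTYPE
# Hypothetical helper: succeeds if FSTYPE names a FAT-family file system.
is_fat_type() {
    case $1 in
        msdos|vfat|fat) return 0 ;;
        *) return 1 ;;
    esac
}

# check_imagedir DIR
# Hypothetical helper: warns if DIR is on a FAT-family file system.
check_imagedir() {
    local fstype
    # GNU stat: -f reports on the file system, %T is its type name.
    fstype=$(stat -f -c %T "$1") || return 2
    if is_fat_type "$fstype"; then
        echo "warning: $1 is on a $fstype file system, which cannot hold DVD-sized images" >&2
        return 1
    fi
    echo "$1 is on $fstype"
}
```

Run check_imagedir on your intended imagedir before starting a long DVD job.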
It is not a good idea to do other work on the side while TA is archiving. If you modify a file in TA's list, the hash will be wrong and the file will not validate. If you delete a file in TA's list, TA will fail.
You can use whatever disc burning application you like to burn the images to disc. Following are sample commands that seem to work under Linux. Your mileage may vary.
Target media | Burning command |
---|---|
CD-R | cdrecord -v dev=/dev/cdrom -dao image001.iso |
CD-RW | cdrecord -v blank=fast dev=/dev/cdrom -dao image001.iso |
DVD+R/RW/DL | growisofs -dvd-compat -Z /dev/dvd=image001.iso |
(N.B.: judging from the source of cdrtools-2.01, the sao and dao options to cdrecord are completely equivalent.)
In theory, mounting an archive is a simple application of the standard mount command, but there are enough special cases to warrant the following quick reference guide.
File system | Mount command |
---|---|
iso | mount -t iso9660 -o ro,map=o /dev/cdrom /mnt |
isoj, isorr | mount -t iso9660 -o ro /dev/cdrom /mnt (or -o ro,utf8 to read a Joliet disc in a UTF-8 locale) |
ext2 | mount -t ext2 -o ro /dev/cdrom /mnt |
udf | mount -t udf -o ro /dev/cdrom /mnt (or -o ro,iocharset=iso8859-1 to read a UDF disc in a Latin-1 locale) |
squashfs | mount -t squashfs -o ro /dev/cdrom /mnt |
le2a, le2f | MAPNAME=`date +%N` # Pick a unique map name |
Unmounting:
File system | Unmount command |
---|---|
le2a, le2f | umount /mnt |
Everything else | umount /mnt |
Use of le2a and le2f archives can be simplified by adding the following to your ~/.bash_profile:
function mountle2 {
  MAPNAME=`date +%N` # Pick a unique map name
  LOOPDEV=`losetup -r -f --show $1`
  cryptsetup --readonly luksOpen $LOOPDEV $MAPNAME
  mount -t ext2 -o ro /dev/mapper/$MAPNAME /mnt
  echo $1 "is now mounted on /mnt. Exit this shell to unmount it."
  PS1="le2# " bash -i
  umount /mnt
  cryptsetup luksClose $MAPNAME
  losetup -d $LOOPDEV
}
Then you need type only mountle2 /dev/dvd and the passphrase to mount an encrypted archive and exit to unmount it:
bash-3.1# mountle2 /dev/dvd
Enter LUKS passphrase:
key slot 0 unlocked.
Command successful.
/dev/dvd is now mounted on /mnt. Exit this shell to unmount it.
le2# ls -l /mnt
total 177
-rwxr-xr-x 1 root root 178843 2008-08-08 16:46 ta
-rw-r--r-- 1 root root 134 2008-08-11 13:18 ta-hashes.txt
le2# exit
exit
bash-3.1#
To validate a disc, mount the disc someplace and do taval someplace. Taval will check the hashes on all regular files.
mount -o ro,map=o /dev/cdrom /mnt
taval /mnt
For a second opinion on the validity of a given file, you can manually compare the contents of /ta-hashes.txt with the output of gpg --print-md sha512.
Taval only checks the contents of files that are listed in /ta-hashes.txt. It does not ensure that the dates, permissions, or other metadata were correctly preserved, nor does it notice if other files were added.
As a convenience, taval can also validate an archive against a file of MD5 hashes that was produced by some other program, e.g., md5sum. To validate an archive against MD5 hashes instead of ta-hashes.txt, use the -md5 switch of taval: taval -md5 md5file dir.
Each line of the MD5 hashfile must be 32 hexadecimal characters, two spaces, and a filename:
d41d8cd98f00b204e9800998ecf8427e  null
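If a hashfile came from some other tool, it may be worth checking its layout before handing it to taval. The following bash sketch does that; the helper names are hypothetical, and it assumes lowercase hexadecimal digits as written by md5sum in text mode (md5sum's binary mode writes " *" instead of two spaces and would be reported as malformed here).

```shell
#!/bin/bash
# is_md5_line LINE
# Hypothetical helper: succeeds if LINE is 32 lowercase hex digits,
# two spaces, and a non-empty file name.
is_md5_line() {
    local re='^[0-9a-f]{32}  .+$'
    [[ $1 =~ $re ]]
}

# check_md5_file FILE
# Hypothetical helper: reports every malformed line of FILE on stderr
# and fails if there was at least one.
check_md5_file() {
    local line n=0 bad=0
    while IFS= read -r line; do
        n=$((n + 1))
        if ! is_md5_line "$line"; then
            echo "line $n is malformed: $line" >&2
            bad=1
        fi
    done < "$1"
    return "$bad"
}
```

A file that passes check_md5_file has the expected layout, although that says nothing about whether the hashes themselves are right.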
Since the archives are completely transparent, you can go directly to the disc(s) and directories that you want if you are in a hurry to retrieve something specific.
mount -o ro,map=o /dev/cdrom /mnt
cd /mnt
ls
Otherwise, repeat for each disc:
mount -o ro,map=o /dev/cdrom /mnt
cp -a /mnt/* /
umount /mnt
If not running as root, you might have to change some permissions in order to get all of the files to copy in. When done, delete the extraneous file /ta-hashes.txt.
In some cases, e.g. an FTP archive, it is handy to generate hashes without getting involved in making disc images. You can do this with tahash dir, and the directory's contents can subsequently be validated with taval dir.
The -l switch causes TA to create ext2/le2a/le2f file systems with tweaked options to maximize usable space. On a single-sided DVD+R, this reduces the overhead on a newly created, empty filesystem from 81176 KiB to a mere 500 KiB. However, it limits the number of files that can be placed on a single disc to around 1100 (1152 inodes), eliminates extended attributes, and may slow down file access (no B-tree indices).
The command used is mke2fs -m 0 -N 1 -O none,sparse_super,filetype -I 128 ...
As usual, no space is reserved for super-user, and the lost+found directory is removed.
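To put the overhead figures in perspective, the space recovered on a single-sided DVD+R can be worked out with shell arithmetic. The two overhead values are the ones quoted above; the variable names are only illustrative.

```shell
#!/bin/bash
# Overhead of an empty file system on a single-sided DVD+R, in KiB,
# as quoted above.
default_overhead_kib=81176
tweaked_overhead_kib=500

# Space recovered by the tweaked options.
saved_kib=$((default_overhead_kib - tweaked_overhead_kib))
echo "saved: $saved_kib KiB"
echo "saved: about $((saved_kib / 1024)) MiB (rounded down)"
```

That is 80676 KiB, or a little under 79 MiB, bought at the price of the 1152-inode limit.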
Character sets are irrelevant if you never venture beyond the plain old U.S. keyboard for naming your files. However, if your file names have umlauts in them, then you'll need to pay attention to this.
"8-bit agnostic" file name encoding means that file names are
recorded as strings of 8-bit characters with no translation.
Whether your ambient codeset is ISO 8859-1 (the Linux default), UTF-8,
or whatever, that is what goes on the disc. As long as you read
the disc in the same context in which it was recorded, all file names
should survive intact. Alas, Windows XP interprets agnostic
characters according to its own default code page, which unfortunately
is usually cp437. In theory, you should be able to say chcp 1252 and then access a disc encoded as ISO 8859-1 with no trouble. In practice, that doesn't work.
One solution is to switch your Linux environment over to code page 437. Yeah. Right. The better solution is to use isoj.
Joliet and UDF don't have the character set portability problem because they both specify unambiguous encodings for file names. All characters translate correctly under Windows XP as long as the codeset assumed when writing the disc was correct. TA determines your codeset from your locale. If your codeset is supported, whatever you see in a directory listing under Linux is what you should see under Windows.
Joliet has only been tested with ISO 8859-1, but other ISO 8859 codesets should work. UTF-8 is not supported for isoj because it is not supported by mkisofs.
UDF has only been tested with ISO 8859-1 and UTF-8, but other ISO 8859 codesets should work.
Codesets other than ISO 8859 and UTF-8 definitely will not work with isoj or udf because I cannot guess how to map the codeset names reported by nl_langinfo to the codeset names expected by mkisofs and mount.
Ironically, it is less simple to read Joliet and UDF file names correctly under Linux than it is under Windows because the mount program does not automatically detect your locale. Joliet discs default to ISO 8859-1 translation and UDF discs default to UTF-8. To read a Joliet disc in a UTF-8 locale, use the utf8 mount option, e.g., mount -t iso9660 -o ro,utf8 /dev/cdrom /mnt. To read a UDF disc in an ISO 8859-1 locale, use the iocharset=iso8859-1 mount option, e.g., mount -t udf -o ro,iocharset=iso8859-1 /dev/cdrom /mnt.
Lower file size limit: For ISO 9660 discs, the file size limit under Windows XP is (2^32 − 2048) B = 4294965248 B. Files larger than this produce an "Input/Output error" on attempt to open.
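The two ISO 9660 limits quoted in this document are easy to mix up, so here is a small bash sketch that computes them and classifies a file size. The function name iso_size_class and its three labels are hypothetical.

```shell
#!/bin/bash
# The two ISO 9660 file size limits from this document, in bytes.
linux_limit=$(( (1 << 32) - 2 ))     # limit found under Linux
xp_limit=$(( (1 << 32) - 2048 ))     # lower limit under Windows XP

# iso_size_class SIZE
# Hypothetical helper: says whether a file of SIZE bytes is readable
# everywhere, only under Linux, or too big for an ISO 9660 disc.
iso_size_class() {
    if (( $1 <= xp_limit )); then
        echo ok
    elif (( $1 <= linux_limit )); then
        echo linux-only
    else
        echo too-big
    fi
}

echo "Linux limit: $linux_limit B, Windows XP limit: $xp_limit B"
```

With GNU find, something like find src -type f -size +4294965248c should list the files that would land above the Windows XP limit.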
Dates messed up: Dates in the archive are often wrong by an hour (apparently Daylight Saving Time run amok), and years before 1980 are not supported.
Forbidden characters: Windows XP has different rules for what characters are legal in file names. Files whose names contain an asterisk, backslash, or question mark appear to be inaccessible under Windows XP. Files whose names contain a colon or semicolon are accessible from a Cygwin command line, but they cannot be opened in Windows Explorer. If Joliet is used, forbidden characters are suppressed; consequently, the files are accessible under Windows but they do not validate because their names were changed.
Unreliable: UDF support is experimental. Windows XP sometimes has problems reading UDF discs, for reasons unknown.
Even when -p is used, TA can fail with "permission denied" if a source directory is not writable by owner. This problem is a consequence of how GNU cp propagates permissions and is not efficiently fixable in TA. The recommended workaround is to run TA as root.
The following configuration changes suffice to adjust the attitude of Nero Express 6.
Under Configure→General, uncheck "Check for correct disc format before burning" and "Check Joliet file names before burning."
Under Configure→Misc, set "Burn DVD if the data compilation does not fit on CD" to Always.
For some reason, when Nero Express 6 is presented with a CD image that is precisely at the capacity of the disc, it claims that it is 0.01 second too long (79:57.74 instead of 79:57.73). Since by default TA creates ext2, le2a, le2f and udf images to fill the entire disc, you may need to enable overburning to write these types of images, or else manually set the TA discsize slightly smaller.
No solution found except to use different burning software.
Cause #1: The default Linux mount options are wrong for iso. The disc is actually fine; you just need to mount it differently. To prevent file names from being case-smashed, you must mount with the option map=o; e.g., mount -t iso9660 -o ro,map=o /dev/cdrom /mnt. This problem only occurs with iso.
Cause #2: If your working directory is on a vfat partition, using the wrong mount options will result in a corrupt disc. To prevent short file names from being case-smashed, you must mount the vfat partition with the option shortname=winnt.
Cause #3: Character set problems. This is very likely if you are trying to read an isoj disc in a UTF-8 locale or a UDF disc in the default Linux locale. To read a Joliet disc in a UTF-8 locale, use the utf8 mount option, e.g., mount -t iso9660 -o ro,utf8 /dev/cdrom /mnt. To read a UDF disc in an ISO 8859-1 locale, use the iocharset=iso8859-1 mount option, e.g., mount -t udf -o ro,iocharset=iso8859-1 /dev/cdrom /mnt. See About character sets and portability for more information.
Cause #4: The following characters, which are allowed in Linux file names, are lost in translation to Joliet: *:;?\ Any affected files will fail validation.
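Since the affected characters are known in advance, you can look for them before building an isoj archive. The following bash sketch does so; both helper names are hypothetical and nothing like this is built into TA.

```shell
#!/bin/bash
# has_joliet_lossy_char NAME
# Hypothetical helper: succeeds if NAME contains one of the characters
# * : ; ? \ that are lost in translation to Joliet.
has_joliet_lossy_char() {
    case $1 in
        *[\*:\;\?\\]*) return 0 ;;
        *) return 1 ;;
    esac
}

# list_joliet_lossy DIR
# Hypothetical helper: prints every path under DIR whose last component
# contains one of those characters.
list_joliet_lossy() {
    local path
    find "$1" -print0 | while IFS= read -r -d '' path; do
        if has_joliet_lossy_char "${path##*/}"; then
            printf '%s\n' "$path"
        fi
    done
}
```

Running list_joliet_lossy on each src directory shows which files would need renaming, or suggests choosing a file system other than isoj.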
Versions of TA prior to version 1.4 supported a file system called ext2aes, which was ext2 plus 256-bit AES encryption via the now deprecated cryptoloop module of the Linux kernel. ext2aes has no LUKS header to tell you what it is; you just have to know.
To mount an ext2aes disc, use this command: mount -t ext2 -o ro,encryption=aes /dev/dvd /mnt. It will prompt you for a password. Given the correct one, the disc contents should then appear under /mnt. When finished, just umount /mnt.
If your kernel does not include the deprecated cryptoloop module or you just want to do it the hard way, you can also mount an ext2aes disc using dm-crypt, as follows:
MAPNAME=`date +%N` # Pick a unique map name
LOOPDEV=`losetup -f -s /dev/dvd`
cryptsetup -c aes-cbc-plain -s 256 -h plain create $MAPNAME $LOOPDEV
mount -t ext2 -o ro /dev/mapper/$MAPNAME /mnt
To unmount:
umount /mnt
cryptsetup remove $MAPNAME
losetup -d $LOOPDEV
Versions of TA prior to version 1.5 supported a file system called ziso, which was isorr plus a Linux-specific transparent decompression extension. In version 1.5, ziso was replaced by squashfs.
ziso archives can be mounted and unmounted using the same commands as isorr. However, to read ziso archives with transparent decompression, you must have a Linux kernel that was compiled with support for transparent decompression. As of kernel.org kernel 2.6.27.31, the relevant option appears in make menuconfig as File systems → CD-ROM/DVD Filesystems → ISO 9660 CDROM file system support → Transparent decompression extension.
If kernel support is lacking, the content can be non-transparently decompressed using the mkzftree program included in the zisofs-tools package (with the -u option for uncompress).
You must install the setperm script (provided in the TA distribution) somewhere in your path. Setperm is a shell script that automagically decides what permissions a file should get.
If multiple physical hard drives are available, the best solution by far is to locate the source files, workingdir, and imagedir in such a way that all copy operations are going from one drive to another instead of from one place to another place on the same drive. For ext2, le2a, le2f or udf, this means that imagedir should be on a different physical device than the source files. For all other file systems, this means that workingdir should be on a different physical device than the source files, and imagedir should be on a different physical device than workingdir (but it can be on the same device as the source files).
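A partial check of this layout can be scripted with GNU stat. The sketch below compares the device numbers of two paths; same_device is a hypothetical helper. Note the limitation: the device number identifies a mounted file system, so two partitions of the same physical drive will look different, and the check can only catch the case where both paths are on the very same file system.

```shell
#!/bin/bash
# same_device PATH1 PATH2
# Hypothetical helper: succeeds if both paths are on the same mounted
# file system, judged by the device number that GNU stat reports.
same_device() {
    local a b
    a=$(stat -c %d "$1") || return 2
    b=$(stat -c %d "$2") || return 2
    [ "$a" = "$b" ]
}
```

For example, same_device workingdir imagedir succeeding for an iso build means that the copy from workingdir into the image stays on one file system, and therefore on one drive.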
The efficiency of same-drive copies depends heavily on the behavior of the I/O scheduler. As of version 2.6.17.4 of the Linux kernel, three different I/O schedulers are available. Disk thrashing has been a problem with the one called Anticipatory I/O scheduler. If you can switch to the CFQ I/O scheduler, you should do so. If not, the script stop-thrashing.sh, provided in the TA distribution and shown below, will tune the anticipatory scheduler in favor of large file copies to correct the thrashing. The changes that it makes are temporary and will revert after a reboot.
#!/bin/bash
#
# EDIT THIS SCRIPT:
# Replace sda with the name of your hard disk device (e.g., hda).
#
# Documentation about the tuning parameters is in
# /usr/src/linux/Documentation/block/as-iosched.txt.
echo 1000 > /sys/block/sda/queue/iosched/read_expire
echo 1000 > /sys/block/sda/queue/iosched/write_expire
echo 1000 > /sys/block/sda/queue/iosched/read_batch_expire
echo 1000 > /sys/block/sda/queue/iosched/write_batch_expire
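Before tuning anything it helps to know which scheduler is active. The kernel lists the available schedulers in /sys/block/sda/queue/scheduler and marks the active one with square brackets. The following bash sketch extracts that name; active_scheduler is a hypothetical helper, and sda is a placeholder for your own device, as in the script above.

```shell
#!/bin/bash
# active_scheduler LINE
# Hypothetical helper: prints the bracketed name in a line such as
# "noop anticipatory deadline [cfq]"; fails if nothing is bracketed.
active_scheduler() {
    local name
    case $1 in
        *\[*\]*)
            name=${1#*\[}        # drop everything up to the first "["
            name=${name%%\]*}    # drop everything from the first "]"
            printf '%s\n' "$name"
            ;;
        *) return 1 ;;
    esac
}

active_scheduler 'noop anticipatory deadline [cfq]'
```

On a live system you would call active_scheduler "$(cat /sys/block/sda/queue/scheduler)". Writing a scheduler name into that same file as root (e.g., echo cfq > /sys/block/sda/queue/scheduler) switches schedulers until the next reboot.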
It is not uncommon for hard disk drives in PC type systems to enter invalid states if they are thrashed for several minutes. The system locks up and the hard disk either does some looping behavior or takes itself offline. If this happens, you can recover by pressing the reset button or cycling power.
Any questions, problems, or bug reports for TA should be directed to dave@flaterco.com.
Version 1.8, 2013-02-07:
Version 1.7.2, 2011-09-03:
Version 1.7.1, 2011-04-10:
Version 1.7, 2011-03-04:
Version 1.6.1, 2010-12-24:
Version 1.6, 2010-12-15:
Version 1.5.1, 2010-03-29:
Version 1.5, 2010-03-03:
Version 1.4, 2008-08-11:
Version 1.3.1, 2008-03-06:
Version 1.3, 2008-02-29:
Version 1.2.2, 2008-01-25:
Version 1.2.1, 2006-08-25:
Version 1.2, 2006-08-23:
Documentation rev. 2006-07-23: Noted UDF troubles with XP.
Documentation rev. 2006-07-22: Updated troubleshooting info for disk thrashing. Added -v to CD burning command.
Version 1.1.2, 2006-07-04:
Version 1.1.1, 2006-07-04:
Version 1.1, 2006-07-03:
Documentation rev. 2006-05-27: Added example command for burning DVD image. Removed statement about Nero.
Version 1.0, 2006-01-02