Optimizing Coda 6.x

From Codawiki

Notes on the tricky questions in setting up Coda 6. As of Apr 13, 2005, much of this disagrees with the current, outdated documentation.

Server

Number of Servers

Some older documentation talks about using multicast to save bandwidth, but Coda 6.x does not do this. Every replica server you add therefore increases the bandwidth required for each write: in strongly connected mode, when a client writes to a Coda volume, the file is sent to all servers.

This isn't as bad as it sounds. On a local (fast) network our SFTP window is too small to really saturate a fast link, and we also have to wait for the server to write the received data to disk. By pushing to several servers simultaneously we effectively double or triple the SFTP window size and can send data to a second server while the first is busy flushing data to disk. Some measurements I did showed that sending to 2 or 3 servers is as fast as (and in some weird situations faster than) sending to a single server. Only with more than 3 replicas was there a noticeable effect on the total transfer time.

However, when a replica server is offsite or behind a slower connection, the client will consider the whole volume 'weakly connected'. As a result it runs in a degraded consistency mode where it batches writes in the local modification log and sends them to the server that has historically had the best connectivity. Once this server has been updated, the client triggers server-server resolution, which is a far more costly process. It involves about 5-6 phases: data is gathered and differences are analyzed by one server, a solution is pushed to all replicas, and finally the data is refetched to verify that all replicas are now in agreement. This means that the data will end up traversing the slow link at least twice.
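
As a rough illustration of the byte counts described in this section, here is a minimal Python sketch. The function names are hypothetical and are not part of Coda. It only counts bytes; it says nothing about transfer time, which, as noted above, did not grow noticeably until there were more than 3 replicas.

```python
def strong_mode_upload_bytes(file_size: int, replicas: int) -> int:
    """Bytes the client sends for one write in strongly connected mode.

    Assumes, as described above, that the file is sent once to every
    replica server and that multicast is not used.
    """
    if file_size < 0 or replicas < 1:
        raise ValueError("file_size must be >= 0 and replicas >= 1")
    return file_size * replicas


def weak_mode_slow_link_bytes(file_size: int) -> int:
    """Lower bound on bytes crossing the slow link in weakly connected mode.

    The text says the data traverses the slow link at least twice, so this
    is a floor, not an exact figure.
    """
    if file_size < 0:
        raise ValueError("file_size must be >= 0")
    return 2 * file_size


if __name__ == "__main__":
    size = 100 * 1024 * 1024  # a 100 MiB file, chosen only as an example
    for n in (1, 2, 3, 4):
        print(n, "replicas:", strong_mode_upload_bytes(size, n), "bytes sent")
    print("slow link, at least:", weak_mode_slow_link_bytes(size), "bytes")
```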

RVM (Meta)Data and Log sizes

  • Determine the number of files and directories you want to accommodate. Take expected max number of files times 500 plus expected max number of directories times 3000 and you have your rvm size. Use the rvmsizer.c utility to make this calculation on an existing file tree and then use some multiple (I used 15) to allow for growth.
  • It's no longer necessary to use partitions for RVM metadata or logs from a performance standpoint. In fact, there may be better performance from using a file. Either way, the RVM metadata file or partition will be mapped entirely into memory, so you must have enough RAM (+swap, but you don't really want it swapping, do you?) available.
  • A log size of 20M should work for any setup.
  • Note that the size of the RVM is dependent on the number of files and directories across all volumes (not per volume) and has nothing to do with file size.
  • It no longer matters what filesystem type the logs, metadata, and file data are stored under. ext3 is just as efficient as ext2. reiserfs may create problems.
  • A separate partition for file data can be useful in limiting the disk space coda can use and preventing system problems due to disk-full errors.
  • It is desirable to have the RVM Log on a separate disk to speed write operations and reduce disk seeks.
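
The sizing rule in the first bullet can be sketched in a few lines of Python. This is not a reimplementation of rvmsizer.c, whose exact behaviour is not described here. The function names are hypothetical, and the sketch assumes that the 500 and 3000 figures are bytes per file and per directory.

```python
import os

BYTES_PER_FILE = 500   # per-file figure from the rule above (unit assumed)
BYTES_PER_DIR = 3000   # per-directory figure from the rule above (unit assumed)


def estimate_rvm_bytes(num_files: int, num_dirs: int, growth_multiple: int = 1) -> int:
    """Apply files * 500 + directories * 3000, scaled by a growth multiple."""
    return (num_files * BYTES_PER_FILE + num_dirs * BYTES_PER_DIR) * growth_multiple


def count_tree(root: str) -> tuple[int, int]:
    """Count files and directories under root (root itself counts as one directory)."""
    files, dirs = 0, 1
    for _dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return files, dirs


if __name__ == "__main__":
    files, dirs = count_tree(".")
    # 15 is the growth multiple the author mentions using.
    print(estimate_rvm_bytes(files, dirs, growth_multiple=15), "bytes")
```

For example, 100,000 files and 10,000 directories give 80,000,000 before the growth multiple and 1,200,000,000 with a multiple of 15. Remember that the counts must cover all volumes on the server, not a single volume.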

Server Volumes

  • Volumes are used as administrative units to specify quotas and backup scheduling. Note that ACLs (coda permissions) are done per directory, not per volume.
  • Certain operations are done per volume. Coda runs more efficiently with smaller volume sizes and more volumes than the other way around.
  • Volume creation and mounting is arbitrary, but volume names should either match or somehow indicate their mount points (slashes are allowed in volume names).
  • If you will be frequently moving files between two points in a directory tree, it would be advantageous to have those two points in the same volume so that the file doesn't have to be copied byte-by-byte for the move operation.
  • Maximum number of volumes is about 1024 per server.
  • There is no hard limit as to the maximum size of a volume, but according to Jan, “at some point the vnode lookup table becomes unmanageable, probably somewhere between 100K and 200K files [on a single volume].”
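
When planning how to split an existing tree into volumes, it can help to count files per candidate subtree and flag any that approach the range quoted above. The following Python sketch is only a planning aid with hypothetical names. It uses the lower end of the quoted range (100K files) as a soft limit, which is an assumption, not a documented Coda constant.

```python
import os

VNODE_SOFT_LIMIT = 100_000  # lower end of the 100K-200K range quoted above


def count_files(root: str) -> int:
    """Count regular directory entries reported as files under root."""
    return sum(len(filenames) for _dirpath, _dirnames, filenames in os.walk(root))


def volumes_to_split(file_counts: dict[str, int], limit: int = VNODE_SOFT_LIMIT) -> list[str]:
    """Return the names of candidate volumes whose file count reaches the limit."""
    return sorted(name for name, n in file_counts.items() if n >= limit)


if __name__ == "__main__":
    # Treat each top-level directory of the current tree as a candidate volume.
    counts = {
        entry.name: count_files(entry.path)
        for entry in os.scandir(".")
        if entry.is_dir(follow_symlinks=False)
    }
    print(volumes_to_split(counts))
```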

Backup

  • Backup is done by freezing volumes and creating snapshots. Using the Coda backup system allows Coda ACLs, quotas, etc., to be backed up. tar alone will not back up any file metadata or version information.
  • To copy a server manually from one machine to another you could copy your /vicepa files and then use this process to copy the RVM file:

    Old server:
      # norton-reinit -rvm <log> <data> <datasize> -dump rvmcontents

    New server:
      Create empty RVM log and data (vice-setup-rvm).
      # norton-reinit -rvm <log> <data> <datasize> -load rvmcontents

Client

Client Cache

  • The client has its own RVM metadata for the files in its cache. The RVM data file can't be larger than 1 GB, and should be much smaller than the available memory on the client so that Coda is not too resource-intensive. The calculation of RVM size in memory is made in the same way as on the server: 500 times the number of files plus 3000 times the number of directories. However, the client needs a much larger cushion as cache operations also occur in RVM. (Calculation details to be added here.)
  • Be sure to limit the number of cache files as well as the amount of data in the venus.conf file so you don't overflow your RVM.
  • If you put your client cache on a RAM drive, your cache will not be persistent between venus restarts. To accommodate this, put the line “dontuservm=1” in your venus.conf file. This will keep the clients very fast and reduce unnecessary disk writes.


Sorry but almost everything in the previous bullet points is incorrect.

  • Only metadata is stored in RVM and file data is stored on disk.
  • RVM can be 2GB or more. The problem with larger sizes is often that shared libraries, heap, or stack overlap with the area where we try to mmap the RVM data segment. This can be somewhat avoided by static linking.
  • Client allocation for files and directories is totally different from the server. A client actually stores more metadata for each cached object. However most of the data structures are pre-allocated when the client is initialized. I have no idea what the cushion thing is about.
  • The only reason why you would want to limit the number of cache files is that there are several functions with O(N^2) or worse behaviour that will cause periodic stalls and in the worst case frequent disconnections.
  • Finally, when ramfs/ramdisk is used to cache the file contents, it will of course not survive reboots. But it will survive Coda client restarts. If someone uses the 'dontuservm=1' option, then all the metadata will only exist in memory and the cache will not be reusable when the client is restarted.

Jan Harkes 02:45, 17 Apr 2005 (CEST)

Files

  • Coda cannot handle a file larger than 2 GB (due to a 32-bit int storing the file size) and will be pretty slow and resource-intensive for large files under 2 GB.
  • A client cannot read or write a file larger than the amount of RVM they have available.
  • The client must also have enough cache room for the file (which means that a 500 MB file copied to /coda will now be using 1 GB of disk space on the client).
  • Changing unix file or owner permissions (e.g. using chmod or chown) on a file or directory is useless and may break your Coda implementation. Always use the Coda ACL facilities (the cfs command) to manage who has what rights to certain directories.


Again a couple of incorrect points.

  • Since file data is stored on disk, you can read/write files larger than the amount of RVM available. However, there is a 'cacheblocks' setting which is used to control the size of the client cache. One interesting behaviour results from the fact that we only handle open and close operations and leave the individual read/write operations up to the file system that hosts the on-disk client cache. When a new file is opened/created its initial size is 0 bytes, and from then until it is closed the cache manager doesn't know what the size is. So the number of cacheblocks is not enforced on write operations. However, if the file is larger than the number of cache blocks, it will be discarded as soon as we know that it was stored on the server. After that, the client refuses to fetch the file, since it would exceed the size of the cache.
  • Writing a 500MB file can in some cases use up to 1GB of disk space. This can only happen when the client does not immediately write the data back to the server, i.e. if we are in (write-)disconnected mode. The extra space is needed to keep a shadow copy of the file. This allows the user to re-open the file for writing without corrupting the already 'committed' version that is waiting to be reintegrated.
  • Unix mode bits for group and other are ignored. The user bits are only observed in some limited cases: when a file is marked 'r--', it cannot be written to even if the ACL allows write permission. Setting or clearing mode bits is a fairly normal operation and there is no reason why it would break anyone's Coda implementation. The only unusual things are that setuid/setgid bits are intentionally ignored and filtered both when setting and when reading files (basically we enforce nosuid in all cases). Also, directories can still be readable even if the '--x' bits are not set, as long as the user was given lookup rights by the Coda ACL. This can be problematic when trying to copy a tree from Coda to a normal unix filesystem, since the copied directories will be inaccessible on the normal unix filesystem as soon as the permission bits are copied.
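
The mode-bit rules in the last bullet can be expressed as a small Python sketch. This is a toy model of the behaviour described above, not Coda's actual code, and the function names are hypothetical. It covers only the one user-bit case named in the text (a file without the owner write bit is not writable) and the filtering of setuid/setgid bits.

```python
import stat


def visible_mode(mode: int) -> int:
    """Drop setuid/setgid bits, mirroring the 'nosuid in all cases' rule above."""
    return mode & ~(stat.S_ISUID | stat.S_ISGID)


def may_write(acl_allows_write: bool, mode: int) -> bool:
    """A write needs the ACL's write right and the owner write bit.

    Group and other bits are deliberately not consulted, since the text
    says they are ignored.
    """
    return acl_allows_write and bool(mode & stat.S_IWUSR)


if __name__ == "__main__":
    print(oct(visible_mode(0o6755)))   # setuid/setgid bits are filtered out
    print(may_write(True, 0o444))      # 'r--' for the owner blocks writing
    print(may_write(True, 0o644))      # ACL right plus owner write bit
```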

Jan Harkes 03:34, 17 Apr 2005 (CEST)