Tuesday, August 13, 2013

Automatic detection of file renames for Darcs

Over the last few weeks I have been implementing automatic detection of file renames, adding a "--look-for-moves" flag to the amend-record, record, and whatsnew commands.

Darcs has 3 states:
  • The recorded state is the one marked by the last record made.
  • The working state is the actual state of the files in the repository, with all the latest changes.
  • The pending state is the one that marks changes like file adds, moves, replaces, etc., before they are recorded. It is a temporary state between recorded and working that lets darcs know which filenames to track, and about changes that are not common, like replaces.

If a file rename is not marked in the pending state, darcs loses track of the file and can't know where it went, so `darcs whatsnew` and `darcs record` report the file as deleted.
To detect such a rename I chose to use the inode information in the filesystem to check for equality between different filenames in the recorded and working states of the repo. For those who don't know, the inode is an index number assigned by the file system to identify a specific piece of file data. The file name is linked to the data by this number, and directories have one as well. You can look this number up with "ls -i".
⮁ mkdir testdir
⮁ touch testfile
⮁ ln testfile testfile.hardlink
⮁ ln -s testfile testfile.symboliclink
⮁ ls -i1
10567718 testdir
10485776 testfile
10485776 testfile.hardlink
10485767 testfile.symboliclink 
You can see that the hard link shares the same number as the test file. This is because a file name is essentially a hard link to the file data, so when you make a new hard link you are sharing the same file data and therefore the same inode number.
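The same check can be done from a program. Here is a minimal sketch in Python (not Darcs code, which is Haskell) that recreates the files from the listing above and reads their inode numbers; the function name `inode_report` is made up for this example, and it assumes a Unix-like file system where hard links and symbolic links are available.

```python
import os
import tempfile


def inode_report(directory):
    """Create a file, a hard link and a symlink; return each name's inode."""
    target = os.path.join(directory, "testfile")
    hard = os.path.join(directory, "testfile.hardlink")
    soft = os.path.join(directory, "testfile.symboliclink")
    open(target, "w").close()        # same effect as `touch testfile`
    os.link(target, hard)            # same effect as `ln`
    os.symlink("testfile", soft)     # same effect as `ln -s`
    # lstat looks at the name itself, so the symlink is not followed
    return {os.path.basename(p): os.lstat(p).st_ino
            for p in (target, hard, soft)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name, inode in inode_report(d).items():
            print(inode, name)
```

The file and its hard link should print the same number, while the symbolic link gets an inode of its own.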
To have an old inode-to-filename mapping, the files' inodes must be recorded somewhere, so I added the inode info to the hashed-storage index in _darcs/index. The index saves the latest info about the recorded plus the pending state, more or less, so it is a perfect fit for storing this info.
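To give an idea of what such a mapping looks like, here is a rough Python sketch that walks a directory tree and remembers which path each inode belongs to. It only illustrates the idea: the real index in _darcs/index is a binary hashed-storage structure, and the name `snapshot_inodes` is invented for this example.

```python
import os


def snapshot_inodes(root):
    """Map (device, inode) -> path relative to root, for files and directories."""
    mapping = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption for this sketch: skip darcs's own metadata directory
        dirnames[:] = [d for d in dirnames if d != "_darcs"]
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            # If two names are hard links to the same data, the last one wins
            mapping[(st.st_dev, st.st_ino)] = os.path.relpath(full, root)
    return mapping
```

Taking one snapshot when recording and another of the working directory later gives two tables with the same keys for files that were only renamed.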
Then, comparing the RecordedAndPending Tree (from the index) with the Working Tree, I get the file changes as a list of pairs mapping names between the two states. With this list I resolve dependencies between the different moves, making temporary names where necessary, and generate an FL list of move patches to merge with the patches for the changes between pending and working.
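The two steps can be sketched in Python as follows. This is not the Darcs implementation, only one simple way to do it under some assumptions: both states are given as plain inode-to-path dictionaries, only a flat set of files is considered (a renamed directory would also change the paths of everything inside it, which is ignored here), and the `.tmpN` temporary names are assumed not to clash with real files. The names `detect_moves` and `order_moves` are made up for the example.

```python
def detect_moves(recorded, working):
    """Pair the old and new path of every inode whose path changed."""
    return sorted(
        (old_path, working[inode])
        for inode, old_path in recorded.items()
        if inode in working and working[inode] != old_path
    )


def order_moves(moves):
    """Order moves so that no move lands on a name that is still in use."""
    pending = dict(moves)            # source path -> destination path
    ordered = []
    counter = 0
    while pending:
        # A move is safe once its destination is not a pending source
        ready = sorted(s for s, d in pending.items() if d not in pending)
        if ready:
            for src in ready:
                ordered.append((src, pending.pop(src)))
        else:
            # Only cycles are left (e.g. a swap): park one file under a
            # temporary name so the rest of the cycle can proceed
            src = min(pending)
            tmp = f"{src}.tmp{counter}"
            counter += 1
            ordered.append((src, tmp))
            pending[tmp] = pending.pop(src)
    return ordered


if __name__ == "__main__":
    recorded = {1: "foo", 2: "a", 3: "b"}
    working = {1: "foo2", 2: "b", 3: "a"}
    for src, dst in order_moves(detect_moves(recorded, working)):
        print("move", src, dst)
```

In the example, "a" and "b" have swapped names, so neither move can go first; the sketch moves "a" to a temporary name, then "b" to "a", then the temporary name to "b".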
These patches are shown by whatsnew, or selected with record/amend-record to be recorded in the repo.
There is a little more to making this happen, but that's the core idea of the implementation.
The algorithm doesn't care whether the files are modified or not, because it never looks at their content, so it's very robust in that sense.
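A small Python sketch shows why the content doesn't matter: a file that is renamed and then edited in place still has the same inode number. The function name is invented for this example, and it assumes the edit writes to the existing file rather than replacing it with a new one.

```python
import os
import tempfile


def inode_survives_edit_and_rename(directory):
    """Rename a file, append to it, and report whether the inode is unchanged."""
    old = os.path.join(directory, "foo")
    new = os.path.join(directory, "foo2")
    with open(old, "w") as f:
        f.write("first version\n")
    before = os.stat(old).st_ino
    os.rename(old, new)
    with open(new, "a") as f:        # in-place append, no new file is created
        f.write("edited after the move\n")
    return before == os.stat(new).st_ino


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(inode_survives_edit_and_rename(d))
```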
With this implementation you can do any move directly with "mv". It is very lightweight and fast at detecting moves, so it is likely a good decision to make "--look-for-moves" a default flag. You can do things like this:
⮁ darcs init
Repository initialized.
⮁ touch foo
⮁ darcs record -a -m add_file_foo -A x --look-for-adds
Finished recording patch 'add_file_foo'
⮁ mv foo foo2
⮁ darcs whatsnew --look-for-moves
move ./foo ./foo2
This doesn't work on Windows yet, because fileID (the function in unix-compat that gets the inode number) lacks an implementation on Windows. I know the Windows API has GetFileInformationByHandle (it returns a BY_HANDLE_FILE_INFORMATION structure that contains the file index number[1]), so it shouldn't be too hard to add an implementation with some boilerplate code for the interface.
More complicated moves should work, and some do, but I have been having problems with the implementation of the dependency-resolving algorithm. I made some mistakes in the first implementation and I have been dragging them along ever since. I'm confident I know what the error is, so I will fix it soon.
UPDATE: I'm testing a Windows implementation with the Win32 Haskell library on a virtual machine.
