For recreational reasons I wrote a PHP class that classifies files with tags instead of hierarchically. The tags are stored in the filename itself, in the form +tag1+tag2+tagN+MD5.EXTENSION, and thus I’m stuck with the filename length limit (255 chars) imposed by the FS/OS. Here is the class:
<?php
class TagFS
{
    // Root folder (with trailing slash) where content and tag links live.
    public $FS = null;

    function __construct($FS)
    {
        if (is_dir($FS) === true)
        {
            $this->FS = $this->Path($FS);
        }
    }

    // Adds a file (or, recursively, every entry of a folder) under the given tags.
    function Add($path, $tag)
    {
        if (is_dir($path) === true)
        {
            // Skip the first two entries ("." and "..").
            $files = array_slice(scandir($path), 2);
            foreach ($files as $file)
            {
                $this->Add($this->Path($path) . $file, $tag);
            }
            return true;
        }
        else if (is_file($path) === true)
        {
            // The content is stored once, named after its MD5 hash.
            $file = md5_file($path);
            if (is_file($this->FS . $file) === false)
            {
                if (copy($path, $this->FS . $file) === false)
                {
                    return false;
                }
            }
            // The tagged name is a hard link to the stored content.
            return $this->Link($this->FS . $file, $this->FS . '+' . $this->Tag($tag) . '+' . $file . '.' . strtolower(pathinfo($path, PATHINFO_EXTENSION)));
        }
        return false;
    }

    // Returns every tagged name that carries all of the given tags.
    function Get($tag)
    {
        return glob($this->FS . '*+' . str_replace('+', '{+,+*+}', $this->Tag($tag)) . '+*', GLOB_BRACE);
    }

    // Creates a hard link, falling back to fsutil where link() is unavailable.
    function Link($source, $destination)
    {
        if (is_file($source) === true)
        {
            if (function_exists('link') === true)
            {
                return link($source, $destination);
            }
            if (is_file($destination) === false)
            {
                // Quote both paths so special characters cannot break the command.
                exec('fsutil hardlink create ' . escapeshellarg($destination) . ' ' . escapeshellarg($source));
                if (is_file($destination) === true)
                {
                    return true;
                }
            }
        }
        return false;
    }

    // Normalizes a path to forward slashes; folders get a trailing slash.
    function Path($path)
    {
        if (file_exists($path) === true)
        {
            $path = str_replace('\\', '/', realpath($path));
            if ((is_dir($path) === true) && ($path[strlen($path) - 1] != '/'))
            {
                $path .= '/';
            }
            return $path;
        }
        return false;
    }

    // Turns "b A a" into "a+b": lowercased, de-duplicated, naturally sorted.
    function Tag($string)
    {
        /*
        TODO:
        Remove (on Windows): . \ / : * ? " < > |
        Remove (on *nix): . /
        Remove (on TagFS): + * { }
        Remove (on TagFS - Possibly!) -
        Max Chars (in Windows) 255
        Max Chars (in *nix) 255
        */
        // Lowercase first so that "Logo" and "logo" collapse into one tag.
        $result = array_filter(array_unique(explode(' ', strtolower($string))));
        if (empty($result) === false)
        {
            if (natcasesort($result) === true)
            {
                return implode('+', $result);
            }
        }
        return false;
    }
}
?>
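The TODO inside Tag() is still unimplemented. Since the plan is to port this to Python anyway, here is a minimal sketch of what the cleanup could look like there. The function name normalize_tags and the exact character set are assumptions taken from the TODO list, not part of the class above, and it uses a plain lexicographic sort rather than PHP's natcasesort. It is meant for tags being stored; queries that rely on * as a wildcard (as Get() does) would need a separate path.

```python
import re

# Characters listed in the TODO: reserved on Windows, path separators,
# and the ones TagFS itself uses as delimiters or glob syntax.
FORBIDDEN = re.compile(r'[.\\/:*?"<>|+{}]')


def normalize_tags(text):
    """Return a lowercased, de-duplicated, sorted '+'-joined tag string.

    Returns None when no usable tag is left (hypothetical helper).
    """
    tags = set()
    for raw in text.split():
        tag = FORBIDDEN.sub('', raw).lower()
        if tag:
            tags.add(tag)
    return '+'.join(sorted(tags)) if tags else None
```

Stripping the characters before de-duplicating means that inputs such as "geo:aki" and "geoaki" end up as the same tag, which may or may not be what you want.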
I believe this system works well for a couple of short tags, but my problem is what happens when the whole filename exceeds 255 chars. What approach should I take to get around the filename limit? I’m thinking of splitting the tags across several hard links to the same file, but the permutations may kill the system.
Are there any other ways to solve this problem?
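To make the splitting idea concrete, here is a rough Python sketch (Python because that is the intended port target) that packs the sorted tags greedily into as few link names as fit under the limit. The function name link_names and the choice to measure the limit in UTF-8 bytes are assumptions; some filesystems count characters or UTF-16 units instead, so treat the byte count as the conservative reading.

```python
MAX_NAME_BYTES = 255  # per-name limit mentioned in the question


def link_names(tags, digest, extension, limit=MAX_NAME_BYTES):
    """Greedily pack tags into '+a+b+DIGEST.ext' names that fit the limit.

    Hypothetical helper: every returned name would become one hard link
    to the same stored file.
    """
    suffix = '+' + digest + '.' + extension
    names = []
    current = ''
    for tag in sorted(set(tags)):
        piece = '+' + tag
        if len((piece + suffix).encode('utf-8')) > limit:
            raise ValueError('tag too long to fit in any name: ' + tag)
        if current and len((current + piece + suffix).encode('utf-8')) > limit:
            # The current name is full: close it and start a new one.
            names.append(current + suffix)
            current = ''
        current += piece
    if current:
        names.append(current + suffix)
    return names
```

With this layout each tag appears in exactly one name, so no permutations are generated. The price is that a query for two tags can no longer be answered by a single glob: you would look up each tag separately and intersect the results by MD5.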
EDIT – Some usage examples:
<?php
$images = new TagFS('S:');
$images->Add('P:/xampplite/htdocs/tag/geoaki.png', 'geoaki logo');
$images->Add('P:/xampplite/htdocs/tag/cloud.jpg', 'geoaki cloud tag');
$images->Add('P:/xampplite/htdocs/tag/cloud.jpg', 'nuvem azul branco');
$images->Add('P:/xampplite/htdocs/tag/xml-full.gif', 'geoaki auto vin api service xml');
$images->Add('P:/xampplite/htdocs/tag/dunp3d-1.jpg', 'dunp logo');
$images->Add('P:/xampplite/htdocs/tag/d-proposta-04c.jpg', 'dunp logo');
/*
[0] => S:/+api+auto+geoaki+service+vin+xml+29be189cbc98fcb36a44d77acad13e18.gif
[1] => S:/+azul+branco+nuvem+4151ae7900f33788d0bba5fc6c29bee3.jpg
[2] => S:/+cloud+geoaki+tag+4151ae7900f33788d0bba5fc6c29bee3.jpg
[3] => S:/+dunp+logo+0cedeb6f66cbfc3974c6b7ad86f4fbd3.jpg
[4] => S:/+dunp+logo+8b9fcb119246bb6dcac1906ef964d565.jpg
[5] => S:/+geoaki+logo+5f5174c498ffbfd9ae49975ddfa2f6eb.png
*/
echo '<pre>';
print_r($images->Get('*'));
echo '</pre>';
/*
[0] => S:/+azul+branco+nuvem+4151ae7900f33788d0bba5fc6c29bee3.jpg
*/
echo '<pre>';
print_r($images->Get('azul nuvem'));
echo '</pre>';
/*
[0] => S:/+dunp+logo+0cedeb6f66cbfc3974c6b7ad86f4fbd3.jpg
[1] => S:/+dunp+logo+8b9fcb119246bb6dcac1906ef964d565.jpg
[2] => S:/+geoaki+logo+5f5174c498ffbfd9ae49975ddfa2f6eb.png
*/
echo '<pre>';
print_r($images->Get('logo'));
echo '</pre>';
/*
[0] => S:/+dunp+logo+0cedeb6f66cbfc3974c6b7ad86f4fbd3.jpg
[1] => S:/+dunp+logo+8b9fcb119246bb6dcac1906ef964d565.jpg
*/
echo '<pre>';
print_r($images->Get('logo dunp'));
echo '</pre>';
/*
[0] => S:/+geoaki+logo+5f5174c498ffbfd9ae49975ddfa2f6eb.png
*/
echo '<pre>';
print_r($images->Get('geo* logo'));
echo '</pre>';
?>
EDIT: Given the several suggestions to use a serverless database or some other kind of lookup table (XML, flat file, key/value pairs, etc.), I want to clarify the following: although this code is written in PHP, the idea is to port it to Python and make a desktop application out of it, so apart from the example this has nothing to do with PHP. Furthermore, if I have to use some kind of lookup table I’ll definitely go with SQLite 3, but what I’m looking for is a solution that doesn’t involve any additional “technology” besides the filesystem (folders, files and hard links).
You may call me nuts, but I’m trying to accomplish two simple goals here: 1) keep the system “garbage” free (who likes Thumbs.db or .DS_Store, for example?) and 2) keep the files easily identifiable if for some reason the lookup table (in this case SQLite) gets busy, corrupted, lost or forgotten (in backups, for instance).
PS: This is supposed to run on Linux, Mac and Windows (under NTFS).
If you can use hard or soft links, you might look into giving each tag its own directory containing a link for each file that has that tag. When you are given multiple tags, you can then compare the directories and keep only the files found in all of them. The files themselves could be stored in a single folder, each under a unique name of course.
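A minimal Python sketch of that layout, assuming a root folder with a files/ subfolder for the content and a tags/ subfolder holding one directory per tag. The folder names and the add/get functions are made up for illustration, and the MD5-based naming is borrowed from the question.

```python
import hashlib
import os
import shutil


def add(root, path, tags):
    """Store the file once under root/files and hard-link it into each tag folder."""
    with open(path, 'rb') as handle:
        digest = hashlib.md5(handle.read()).hexdigest()
    name = digest + os.path.splitext(path)[1].lower()
    store = os.path.join(root, 'files')
    os.makedirs(store, exist_ok=True)
    target = os.path.join(store, name)
    if not os.path.exists(target):
        shutil.copyfile(path, target)
    for tag in tags:
        folder = os.path.join(root, 'tags', tag)
        os.makedirs(folder, exist_ok=True)
        link = os.path.join(folder, name)
        if not os.path.exists(link):
            os.link(target, link)  # hard link, so no extra copy of the data
    return name


def get(root, tags):
    """Return the names present in every one of the given tag folders."""
    result = None
    for tag in tags:
        folder = os.path.join(root, 'tags', tag)
        names = set(os.listdir(folder)) if os.path.isdir(folder) else set()
        result = names if result is None else result & names
    return sorted(result or [])
```

Because each name is only the hash plus the extension, the 255-char limit applies to a single tag rather than to the whole tag set. Wildcard queries such as geo* are not covered here; they would need a glob over the tag directory names first.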
I don’t know how this would differ from having a meta file named after the tag that lists all the files carrying that tag.