Rockbox General > Rockbox General Discussion

Need help installing wikipedia - cluster size problem


Lala:
Hi.
I have a txt version of Wikipedia and I wanted to use it under Rockbox on my Gigabeat. The problem is that it's made of 1.6 million very small files (most between 1 and 5 KB). If I copy them onto the FAT32 HD they fill it up incredibly quickly (100 MB of text eats up 1 GB!), I believe because the drive uses 32 KB clusters.
I can't format the drive with 4 KB clusters (or smaller): according to what I've read, 4 KB clusters only work on FAT32 volumes up to 8 GB, and a 40 GB HD needs at least 32 KB clusters. Is there any way around that? I need help - any suggestion welcome.
I also tried partitioning the disk, but Rockbox only recognizes the first partition, so that's a dead end too.
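For what it's worth, on a GNU system you can see the cluster slack directly by comparing the apparent size of the tree with what the filesystem actually allocates (the directory name here is just an example):

```shell
# Hypothetical directory of article files; `du` is GNU coreutils.
du -sh --apparent-size wikipedia/   # sum of the file contents
du -sh wikipedia/                   # space actually allocated on disk
```

On a 32 KB-cluster volume the second number will be many times the first for thousands of 1-5 KB files, since each file occupies at least one whole cluster.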

cool_walking_:
Hrm... unfortunately I think you'll just have to live with the wasted space.

Someone mentioned in a gzip patch on the tracker that they were also working on porting tar, which could have helped, but it seems not to have panned out.

You could try smushing the files together. Obviously not all into one giant file, as that would be completely unusable, but merging just enough articles to get files just under 32 KB, which only makes browsing slightly more awkward. If I haven't completely forgotten my bash, something like this (untested) script in a Unix environment might do it.


--- Code: ---#!/bin/bash
dest="smushed"
mkdir -p "$dest"
i=0
for infile in *; do
    [ -f "$infile" ] || continue  # skip the "$dest" directory we just created
    if [ -e "$outfile" ]; then
        # start a new output file once adding this article would push it past one 32 KB cluster
        if [ $(( $(stat -c "%s" "$outfile") + $(stat -c "%s" "$infile") )) -gt 32768 ]; then
            i=$((i + 1))
        fi
    fi
    outfile="$dest/wiki$(printf "%010d" $i).txt" # can't really think of a good naming scheme
    cat "$infile" >> "$outfile"
    printf '\n\n\n\n\n\n' >> "$outfile" # give us some vertical space between articles
done
--- End code ---

Multiplex:
An efficient way of dealing with this might be to write an application that puts all the articles into one great big file, with a second file that is an index into each article. Obviously you'd need a program to build the files and a plugin to search the article index and then display the appropriate article text.
I'm fairly sure this scheme has been used many times before (I once saw a fortunes application that used an index to make the cookie selection more fair), so you don't have to invent anything from scratch - just adapt existing stuff (including the text viewer plugin for displaying the articles).
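The build side of that proposal can be sketched in a few lines of shell. The file names `wiki.dat`/`wiki.idx` and the tab-separated "name, byte offset, length" index format are my own assumptions for illustration, not anything Rockbox defines:

```shell
#!/bin/bash
# Concatenate all articles into one big file and record, for each one,
# a "name<TAB>offset<TAB>length" line in a separate index file.
big="wiki.dat"     # assumed name for the concatenated articles
idx="wiki.idx"     # assumed name for the index
: > "$big"
: > "$idx"
offset=0
for f in *.txt; do
    [ -f "$f" ] || continue
    len=$(stat -c "%s" "$f")                      # article size in bytes
    printf '%s\t%d\t%d\n' "$f" "$offset" "$len" >> "$idx"
    cat "$f" >> "$big"
    offset=$((offset + len))
done
```

A plugin (or, for a quick test on the desktop, `dd if=wiki.dat bs=1 skip=OFFSET count=LENGTH`) can then seek straight to an article without reading anything else.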

cool_walking_:
I think that porting tar would be a better option in the long run, as it could be used for other things, whereas I can't see another use for this index solution.

Multiplex:
Well, I'm assuming the value of something like Wikipedia on your DAP is easy access to all that 'information'. I'm also guessing that searching by article name is the preferred option.

Mashing several files together would make that more difficult, but a really cute scheme where articles sharing their leading letters are combined until you reach an efficient file size (ab.wiki holding Abba, abbreviation, Abscond, etc.) might have some merit.
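That grouping can be sketched in a few lines of shell (the two-letter prefix and the `.wiki` extension are just assumptions for illustration):

```shell
#!/bin/bash
# Bundle article files by their first two letters, lowercased:
# Abba.txt, abbreviation.txt, Abscond.txt all end up in ab.wiki
for f in *.txt; do
    [ -f "$f" ] || continue
    prefix=$(printf '%s' "${f:0:2}" | tr '[:upper:]' '[:lower:]')
    cat "$f" >> "${prefix}.wiki"
    printf '\n\n' >> "${prefix}.wiki"   # separator between articles
done
```

With 26x26 possible bundles the files stay small enough to scroll through, while the file count drops by orders of magnitude.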

Now I'm on shaky ground with tar, but I think it just stores the files linearly - each member has its own header, but there's no central directory/index - so looking up a single article won't be very efficient (though I may be wrong on that). Still, tar might be a good way of bundling up the big file in my previous proposal... (must do some research on tar)
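For the record, desktop tar can already list members and stream a single one to stdout, though it still has to scan the archive header by header to find it (the archive and member names below are made up):

```shell
tar -tf wiki.tar                     # list members by walking the per-file headers
tar -xOf wiki.tar articles/Abba.txt  # -O streams one member to stdout without unpacking
```

So single-article access works, but it's a linear scan, which is exactly why a separate offset index would be faster on a DAP.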
