Rockbox Technical Forums


Author Topic: Need help installing wikipedia - cluster size problem  (Read 2309 times)

Offline Lala

  • Member
  • *
  • Posts: 5
Need help installing wikipedia - cluster size problem
« on: December 21, 2007, 10:13:57 AM »
Hi.
I have a txt version of Wikipedia that I want to read under Rockbox on my Gigabeat. The problem is that it consists of 1.6 million very small files (most between 1 and 5 KB). When I copy it to the FAT32 hard disk, it fills the disk incredibly quickly (100 MB of files takes up 1 GB!), I believe because the disk uses 32 KB clusters.
I can't format the drive with 4 KB clusters (or smaller): from what I've read on the web, 4 KB clusters can only be used on disks up to 8 GB, and a 40 GB disk needs at least 32 KB clusters. Is there any way around that? I need help. Any suggestion is welcome.
I tried partitioning the disk, but Rockbox only recognizes the first partition, so no luck there.
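
The arithmetic behind the wasted space can be sketched as follows. This is only an illustration, assuming every non-empty file occupies a whole number of clusters; the function name `allocated_size` and the 3 KB example size are made up for the example.

```bash
#!/bin/bash
# Bytes allocated on disk for a file of $1 bytes when clusters are $2 bytes.
# Assumption: a non-empty file always occupies a whole number of clusters.
allocated_size() {
    local size=$1 cluster=$2
    echo $(( (size + cluster - 1) / cluster * cluster ))
}

# A hypothetical 3 KB article on 32 KB clusters, then on 4 KB clusters.
allocated_size 3072 32768   # prints 32768
allocated_size 3072 4096    # prints 4096
```

With files of a few KB each, every file still takes a full 32 KB cluster, which is consistent with the roughly tenfold blow-up described above.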

Offline cool_walking_

  • Rockbox Expert
  • Member
  • *
  • Posts: 695
Re: Need help installing wikipedia - cluster size problem
« Reply #1 on: December 21, 2007, 11:57:07 AM »
Hrm... unfortunately I think you'll just have to live with the wasted space.

Someone mentioned on a gzip patch in the tracker that they were also working on porting tar, which could have helped, but it seems not to have panned out.

You could try smushing the files together. Obviously not all in one giant file, as it would be completely unusable, but just enough to end up with files just under 32K, while only making it mostly unusable.  If I haven't completely forgotten my bash, something like this (untested) script in a Unix environment might do that.

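Code:
#!/bin/bash
# Concatenate the small files in the current directory into chunks of at most
# 32 KB under $dest, so that each chunk fits in a single 32 KB cluster.
dest="smushed"
mkdir -p "$dest"
i=0
outfile=""
for infile in *; do
    [ -f "$infile" ] || continue # skip "$dest" itself and anything else that isn't a regular file
    # Start a new chunk if this file plus its 6-byte separator would not fit.
    # (stat -c is the GNU coreutils form.)
    if [ -e "$outfile" ] && [ $(( $(stat -c %s "$outfile") + $(stat -c %s "$infile") + 6 )) -gt 32768 ]; then
        i=$((i + 1))
    fi
    outfile="$dest/wiki$(printf '%010d' "$i").txt" # can't really think of a good naming scheme
    cat "$infile" >> "$outfile"
    printf '\n\n\n\n\n\n' >> "$outfile" # give us some vertical space between articles
done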

Offline Multiplex

  • Member
  • *
  • Posts: 440
Re: Need help installing wikipedia - cluster size problem
« Reply #2 on: December 22, 2007, 05:18:39 AM »
An efficient way of dealing with this might be to write an application that puts all the articles into one great big file, with a second file acting as an index into each article. Obviously you'd need a program to build the files and a plugin to search the article index and then display the appropriate article text.
I'm fairly sure that this scheme has been used many times before (I once saw a fortunes application that used an index to make the cookie selection more fair), so you don't have to invent anything from scratch. Just adapt existing stuff (including the text viewer plugin for displaying the articles).
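
A minimal PC-side sketch of the big-file-plus-index idea is shown below. It is not a Rockbox plugin; the function names `build_index` and `lookup`, the file names, and the tab-separated index layout (name, byte offset, length) are all assumptions made for illustration, and it assumes article names contain no tabs or newlines.

```bash
#!/bin/bash
# build_index SRC_DIR BIG_FILE INDEX_FILE
# Appends every regular file in SRC_DIR to BIG_FILE and records
# "name<TAB>offset<TAB>length" for it in INDEX_FILE.
build_index() {
    local src=$1 big=$2 index=$3 offset=0 length f
    : > "$big"
    : > "$index"
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        length=$(wc -c < "$f")
        length=$((length))   # drop the padding some wc implementations add
        printf '%s\t%s\t%s\n' "$(basename "$f")" "$offset" "$length" >> "$index"
        cat "$f" >> "$big"
        offset=$((offset + length))
    done
}

# lookup NAME BIG_FILE INDEX_FILE
# Prints the article called NAME by jumping to its recorded offset.
lookup() {
    local name=$1 big=$2 index=$3 line offset length
    line=$(awk -F'\t' -v n="$name" '$1 == n { print; exit }' "$index")
    [ -n "$line" ] || return 1
    IFS=$'\t' read -r _ offset length <<< "$line"
    [ "$length" -gt 0 ] || return 0
    # tail -c +N starts output at byte N (1-based).
    tail -c "+$((offset + 1))" "$big" | head -c "$length"
}

# Usage with two made-up articles in a temporary directory.
tmp=$(mktemp -d)
mkdir "$tmp/articles"
printf 'Swedish pop group.\n' > "$tmp/articles/Abba"
printf 'To leave hurriedly.\n' > "$tmp/articles/Abscond"
build_index "$tmp/articles" "$tmp/wiki.dat" "$tmp/wiki.idx"
lookup Abscond "$tmp/wiki.dat" "$tmp/wiki.idx"   # prints: To leave hurriedly.
rm -rf "$tmp"
```

Only two files end up on the player, so cluster slack stops mattering, and a lookup reads the small index plus one article rather than scanning the whole data file.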

Offline cool_walking_

  • Rockbox Expert
  • Member
  • *
  • Posts: 695
Re: Need help installing wikipedia - cluster size problem
« Reply #3 on: December 22, 2007, 06:22:32 AM »
I think that porting tar would be a better option in the long run, as it could be used for other things, whereas I can't see another use for this index solution.

Offline Multiplex

  • Member
  • *
  • Posts: 440
Re: Need help installing wikipedia - cluster size problem
« Reply #4 on: December 23, 2007, 05:52:01 AM »
Well I'm assuming the value of something like Wiki on your DAP is to have easy access to all that 'information'. I'm also guessing that searching by article name is the preferred option.

Mashing several files together would make that more difficult, but some really cute scheme where articles are combined by their leading letters until you reach an efficient file size (ab.wiki holding Abba, abbreviation, Abscond, etc.) might actually have some merit.
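
A rough sketch of that prefix grouping is below. It simplifies the idea by always using the first two characters rather than lengthening the prefix until a target file size is reached; the function name `group_by_prefix` and the `== name ==` separator line are assumptions for illustration only.

```bash
#!/bin/bash
# group_by_prefix SRC_DIR DEST_DIR
# Appends each article to DEST_DIR/<first two characters, lowercased>.wiki,
# preceded by a "== name ==" line so articles can still be told apart.
group_by_prefix() {
    local src=$1 dest=$2 f name prefix
    mkdir -p "$dest"
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        prefix=$(printf '%s' "${name:0:2}" | tr '[:upper:]' '[:lower:]')
        { printf '== %s ==\n' "$name"; cat "$f"; printf '\n'; } >> "$dest/$prefix.wiki"
    done
}

# Usage with made-up articles: Abba and Abscond both land in ab.wiki.
tmp=$(mktemp -d)
mkdir "$tmp/articles"
printf 'Swedish pop group.\n' > "$tmp/articles/Abba"
printf 'To leave hurriedly.\n' > "$tmp/articles/Abscond"
group_by_prefix "$tmp/articles" "$tmp/grouped"
ls "$tmp/grouped"
rm -rf "$tmp"
```

Lowercasing the prefix avoids producing names that differ only by case, which FAT32 would treat as the same file.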

Now I'm on shaky ground with tar, but I think that it just stores the files linearly, without a directory/index (each file presumably has a header), so accessing an individual file will not be very efficient (but I may be wrong on that). But tar might be a good way of compressing the big file in my previous proposal... (must do some research on tar)


Offline cool_walking_

  • Rockbox Expert
  • Member
  • *
  • Posts: 695
Re: Need help installing wikipedia - cluster size problem
« Reply #5 on: December 23, 2007, 06:16:50 AM »
Tar is sort of like .zip, .rar, etc. in the Windows world, except that it does no compression. It's very common in the Unix world, and is mostly combined with gzip or bzip2 (compression programs), so you end up with a file named blah.tar.gz or blah.tar.bz2. I was thinking of having tar implemented transparently in the file browser, like KDE's KIO slaves or Windows Explorer's ZIP file browser, so that you navigate into the .tar file as if it were a directory. I have no idea how feasible this is, but I thought it sounded good because it could also be used for images, etc.
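
On a PC, packing articles into an uncompressed tar and reading one back out can be sketched as below. This only illustrates the command-line tool, not anything Rockbox provides; the helper names `pack_articles` and `read_article` and the file names are made up for the example.

```bash
#!/bin/bash
# pack_articles SRC_DIR ARCHIVE: store everything under SRC_DIR in ARCHIVE.
pack_articles() {
    tar -cf "$2" -C "$1" .
}

# read_article ARCHIVE NAME: print one member to stdout without unpacking
# the others. tar finds it by walking the archive from header to header.
read_article() {
    tar -xOf "$1" "./$2"
}

# Usage with two made-up articles in a temporary directory.
tmp=$(mktemp -d)
mkdir "$tmp/articles"
printf 'Swedish pop group.\n' > "$tmp/articles/Abba"
printf 'To leave hurriedly.\n' > "$tmp/articles/Abscond"
pack_articles "$tmp/articles" "$tmp/wiki.tar"
tar -tf "$tmp/wiki.tar"                  # list the members
read_article "$tmp/wiki.tar" Abscond     # prints: To leave hurriedly.
rm -rf "$tmp"
```

Because tar keeps no central index, reading a member near the end of a large archive means passing over every header before it.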
« Last Edit: December 23, 2007, 06:20:25 AM by cool_walking_ »

Offline Multiplex

  • Member
  • *
  • Posts: 440
Re: Need help installing wikipedia - cluster size problem
« Reply #6 on: December 28, 2007, 05:09:48 AM »
Yeah, I ended up asking my brother (also a Rockbox user), and I now see the logic behind adding an archive reader to the file system.
My concern about efficiency remains though.
In the Rockbox environment a linear search through the file (even skipping from header to header) has a relatively high cost (ultimately in battery life). Note how long it takes Windows to open a ZIP file as a subdirectory... (not that I understand that: I thought ZIP held some kind of directory listing, so it ought to be possible to go straight to that).

@Lala - how do you get the files in your text format?
I may have a play with various file formats (on a PC) to assess efficiency of access.
I suppose someone should also ask: why? Are you allowed to take music players into exams ;-)

Offline Didgeridoohan

  • Member
  • *
  • Posts: 102
Re: Need help installing wikipedia - cluster size problem
« Reply #7 on: December 28, 2007, 07:29:59 AM »
Isn't there a patch (or two) for putting Wikipedia on your DAP? Try doing a search for wikipedia on Flyspray...
Remember, the MANUAL, WIKI and the SEARCH functions are your friends.

Offline cool_walking_

  • Rockbox Expert
  • Member
  • *
  • Posts: 695
Re: Need help installing wikipedia - cluster size problem
« Reply #8 on: June 03, 2009, 11:53:20 AM »
Maybe I'm a little (or a lot) late, but I just realised you could use HAVE_MULTIVOLUME to let Rockbox access multiple partitions.

http://rockbox.org/wiki/BigDisk

Offline froggyman

  • Member
  • *
  • Posts: 214
Re: Need help installing wikipedia - cluster size problem
« Reply #9 on: June 03, 2009, 04:16:48 PM »
OR you could just use FS#4755 and have the ability to search and follow links.
iPod Video 5.5G 30GB - Now Dead :(
Sansa Fuzev2 4GB

"To prevent this day from getting worse, I'll just read ERROR as GOOD THING"
