Zerochan.net is one of the well-known anime/game/CG imageboards, with a strong community and only modest crossposting with other imageboards.
It has its own tagging system, close to e-shuushuu-net but unlike the mainstream danbooru / safebooru / yande-re / konachan / sankaku.
That makes Zerochan a good, distinct source for studying non-photographic images and their metadata.
This release is devoted to the paleontological part of the board, **from the very start till 31.12.2014 (ID=1820227)**, right before the [2015-2016 release](https://nyaa.si/view/1313832).
The enormous volume of the initial images (~650 GB) was brought within reasonable limits by **selective sampling**, described below.
#### Release contains:
- **313.044 JPG images in 183 zipped folders**, one folder per 10.000 IDs
* filtered by initial size
~ least(image_height,image_width)>=1080 -- Full HD wallpapers at minimum
~ image_height*image_width>=1200000 -- 1100x1100 included
~ image_width/image_height between 0.32 and 3.2 -- not too disproportionate
* renamed to **"zerochan - id - up_to_3_sources ~ up_to_5_characters (up_to_2_artists).ext"**
~ tags concatenated via "+", spaces replaced with underscores
~ maximum file name length is 220 characters; character tags may be truncated if too long
* some gentle deduplication applied
- metadata for every image in **ZERO_POSTS_2014.TSV** in the root folder, **313.036 rows**
* from the imageboard (original file URL, upload date)
* image info (size, volume, md5 etc.) for both the original and the sample
- tag info for Copyright / Characters / Artists and more in **ZERO_TAGS_2014.TSV**, 1.815.947 rows
* as parsed from the site, with Unicode suppressed / replaced
* many of them are used in file naming, but there are a lot more
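
The size filter and the naming scheme listed above can be sketched in Python roughly as follows. This is only an illustration, not the script actually used for the release: the function names are made up, the sample tags in the usage note are illustrative, and the way over-long names are shortened (dropping whole character tags from the end) is an assumption, since the release only says that character tags "may be truncated".

```python
from typing import Sequence

MAX_NAME_LEN = 220  # maximum file name length stated in the release notes


def passes_size_filter(width: int, height: int) -> bool:
    """Apply the three size criteria from the list to the initial dimensions."""
    if min(width, height) < 1080:          # Full HD wallpapers at minimum
        return False
    if width * height < 1_200_000:         # 1100x1100 still passes
        return False
    return 0.32 <= width / height <= 3.2   # not too disproportionate


def join_tags(tags: Sequence[str], limit: int) -> str:
    """Keep at most `limit` tags, replace spaces with underscores, join via '+'."""
    return "+".join(tag.replace(" ", "_") for tag in tags[:limit])


def build_name(post_id: int, sources: Sequence[str], characters: Sequence[str],
               artists: Sequence[str], ext: str = "jpg") -> str:
    """Build 'zerochan - id - sources ~ characters (artists).ext'."""
    src = join_tags(sources, 3)
    art = join_tags(artists, 2)
    chars = list(characters[:5])
    while True:
        name = f"zerochan - {post_id} - {src} ~ {join_tags(chars, 5)} ({art}).{ext}"
        if len(name) <= MAX_NAME_LEN or not chars:
            return name
        # Assumption: shorten by dropping the last character tag entirely.
        chars.pop()
```

For example, `build_name(123, ["Vocaloid"], ["Hatsune Miku"], ["Some Artist"])` gives `zerochan - 123 - Vocaloid ~ Hatsune_Miku (Some_Artist).jpg`, and `passes_size_filter(1080, 1080)` is false because 1080x1080 falls just under the 1200000 pixel limit.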
#### About sampling:
The huge total size of the initial images would make for an impractical torrent release: too big, and not valuable enough to be kept supported.
I decided to selectively shrink big / bloated images to a practical size with good quality, which was chosen as
**1920 px longer side (Full HD both landscape and portrait) and JPEG quality 92%**
I used ImageMagick **mogrify -thumbnail 1920x1920^> -quality 92 -format jpg** and then
compared the initial and the sampled image, keeping the initial image when resampling had a negative or only minor effect.
There are **90.375 original images** left in the release, as indicated in the IMG_TYPE column of ZERO_POSTS.
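
The sampling decision can be illustrated with a small Python sketch. It follows the stated target (1920 px on the longer side, never enlarging) rather than the exact geometry semantics of the mogrify flags. The release does not say how the "effect" of resampling was measured, so the comparison by file size and the `min_saving` threshold below are purely assumptions, and both function names are hypothetical.

```python
from typing import Tuple

TARGET_LONG_SIDE = 1920  # target stated in the release notes


def sampled_size(width: int, height: int,
                 target: int = TARGET_LONG_SIDE) -> Tuple[int, int]:
    """Scale so the longer side equals `target`; smaller images stay unchanged."""
    long_side = max(width, height)
    if long_side <= target:
        return width, height
    scale = target / long_side
    return max(1, round(width * scale)), max(1, round(height * scale))


def keep_original(original_bytes: int, sampled_bytes: int,
                  min_saving: float = 0.10) -> bool:
    """Assumed rule: keep the original if the sample is not clearly smaller."""
    if sampled_bytes >= original_bytes:   # negative effect: the sample grew
        return True
    saving = 1 - sampled_bytes / original_bytes
    return saving < min_saving            # minor effect: too little gained
```

Under these assumptions a 3840x2160 image would be sampled to 1920x1080, while a 1600x1200 image would be left at its initial size.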
[There are](https://sukebei.nyaa.si/user/AlexPUA) some rips, also sampled, on the Sukebei tracker for Konachan and Yande-re. With nipples.