ml4ai/UA-hpc-containers:latest

$ singularity pull shub://ml4ai/UA-hpc-containers:latest
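After pulling, you run the image with the five positional arguments listed in the %help section of the recipe. The runscript does not validate them. The sketch below is a hypothetical host-side wrapper and is not part of this container. It assumes a POSIX shell and the standard "singularity run IMAGE ARGS..." form. It checks that exactly five arguments are given, that the input file is readable, and that NUM_THREADS, SIZE and MIN_OCC are positive integers. The function names are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper (not shipped in the container): validate the five
# positional arguments expected by the runscript before launching it.

# is_pos_int VALUE: succeeds only if VALUE is a positive integer
is_pos_int() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    *) [ "$1" -gt 0 ] ;;       # all digits; reject zero
  esac
}

# check_w2v_args IN OUT NUM_THREADS SIZE MIN_OCC
check_w2v_args() {
  if [ "$#" -ne 5 ]; then
    echo "usage: IN OUT NUM_THREADS SIZE MIN_OCC" >&2
    return 2
  fi
  if [ ! -r "$1" ]; then
    echo "error: cannot read input file '$1'" >&2
    return 1
  fi
  for v in "$3" "$4" "$5"; do
    if ! is_pos_int "$v"; then
      echo "error: '$v' is not a positive integer" >&2
      return 1
    fi
  done
  return 0
}

# run_w2v IMAGE IN OUT NUM_THREADS SIZE MIN_OCC
# Validates the arguments, then hands them unchanged to the container.
run_w2v() {
  image="$1"; shift
  check_w2v_args "$@" || return
  singularity run "$image" "$@"
}
```

For example, run_w2v w2v.sif corpus.txt vectors.txt 8 100 5. Here the image name w2v.sif and both file names are placeholders. Use whatever file name your pull produced.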

Singularity Recipe

Bootstrap: shub
From: singularityhub/ubuntu


%help
This container provides a multi-threaded implementation of the Word2Vec algorithm written in pure C. This implementation was created by members of the CLU Lab (http://clulab.cs.arizona.edu).

Run this container by passing values for the following positional arguments, in order:
  IN: Path to a text file containing the sentences used to train the embeddings
  OUT: Path of the text file to which the generated vectors are written
  NUM_THREADS: The number of threads to use during training
  SIZE: The dimensionality of the generated word vectors (passed to word2vec's -size flag; the runscript fixes the context window at 10)
  MIN_OCC: The minimum number of times a word must occur in the input file to be included in the embedding space (5 is common, but the right value varies widely by application)

%files
  trunk /opt

%labels
  Maintainer Paul Hein
  Version 1.0

%post
  apt-get -q update
  apt-get -y install cmake time

  mkdir /rsgrps
  mkdir /extra
  cd /opt/trunk
  make word2vec
  cd /

%runscript
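  # Positional arguments (see %help)
  IN="$1"
  OUT="$2"
  NUM_THREADS="$3"
  SIZE="$4"
  MIN_OCC="$5"

  W2VDIR=/opt/trunk
  # Skip-gram (-cbow 0) with hierarchical softmax (-hs 1) and no negative
  # sampling (-negative 0); vectors are written as text (-binary 0).
  time "$W2VDIR/word2vec" -train "$IN" -output "$OUT" -cbow 0 -size "$SIZE" -window 10 -negative 0 -hs 1 -sample 1e-3 -threads "$NUM_THREADS" -binary 0 -min-count "$MIN_OCC"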
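The runscript passes -binary 0, so the vectors are written as plain text. The sketch below assumes the CLU Lab build keeps the text layout of the original word2vec tool. Under that layout, the header line gives the vocabulary size and the vector dimensionality. Each following line gives a word and then its components, separated by spaces. The helpers read that layout with standard head and awk. Their names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a text-format (-binary 0) vector file.
# Assumed layout: "VOCAB_SIZE DIM" header, then "word v1 v2 ... vDIM" per line.

# w2v_dims FILE: print the header line ("VOCAB_SIZE DIM")
w2v_dims() {
  head -n 1 "$1"
}

# w2v_lookup FILE WORD: print WORD's components; fail if WORD is absent
w2v_lookup() {
  # ($1 "") forces a string comparison so numeric-looking words such as
  # "10" and "10.0" are not treated as equal.
  awk -v w="$2" '
    NR > 1 && ($1 "") == w { $1 = ""; sub(/^ +/, ""); print; found = 1; exit }
    END { exit !found }' "$1"
}

# w2v_check FILE: succeed if every line has DIM values and the line count
# matches VOCAB_SIZE
w2v_check() {
  awk '
    NR == 1 { n = $1; d = $2; next }
    NF != d + 1 { bad = 1; exit }
    END { exit (bad || NR - 1 != n) }' "$1"
}
```

For example, w2v_dims vectors.txt is a quick way to confirm that the SIZE argument took effect.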


Metrics

key                         value
id                          /containers/ml4ai-UA-hpc-containers-latest
collection name             ml4ai/UA-hpc-containers
branch                      master
tag                         latest
commit                      a05784c3a7527b26192f4e66cb2ac176cc4a0062
version (container hash)    d02eb4d1e46ca35f88abe52d9b4421c0
build date                  2019-11-18T19:09:37.867Z
size (MB)                   427
size (bytes)                188485663
SIF download URL            (please use pull with shub://)
We cannot guarantee that all containers will still exist on GitHub.