mcw-rcc/apache-spark:latest

$ singularity pull shub://mcw-rcc/apache-spark:latest
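
Once the image has been pulled, Spark's command-line tools can be run through `singularity exec`. The following is a minimal sketch, not a documented workflow for this container: the image file name `apache-spark_latest.sif` is an assumption (the name produced by `singularity pull` depends on the Singularity version), and `spark_cmd` is a hypothetical helper introduced only for illustration.

```shell
#!/bin/sh
# Assumed image file name; adjust to whatever `singularity pull` produced.
SPARK_IMAGE="apache-spark_latest.sif"

# Hypothetical helper: print the command line that runs a Spark tool
# inside the container. Usage: spark_cmd <tool> [args...]
spark_cmd() {
  printf 'singularity exec %s %s\n' "$SPARK_IMAGE" "$*"
}

# Run the check only when Singularity and the image are both present;
# otherwise just show the command that would be run.
if command -v singularity >/dev/null 2>&1 && [ -f "$SPARK_IMAGE" ]; then
  singularity exec "$SPARK_IMAGE" spark-submit --version
else
  echo "Would run: $(spark_cmd spark-submit --version)"
fi
```

Because the recipe's `%environment` section puts `/opt/spark/bin` on `PATH`, tools such as `spark-submit`, `spark-shell`, and `pyspark` should resolve inside the container without a full path.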

Singularity Recipe

Bootstrap: docker
From: ubuntu:18.04

%labels
  Maintainer Matthew Flister
  Spark 2.4.4
  Hadoop 2.7

%help
  This container runs Apache Spark 2.4.4 (prebuilt for Hadoop 2.7) on Ubuntu 18.04.

%environment
  export SPARK_HOME=/opt/spark
  export PATH=${SPARK_HOME}/bin:${PATH}

%post
  export SPARK_VERSION=2.4.4
  export HADOOP_VERSION=2.7

  mkdir -p /scratch/global /scratch/local /rcc/stor1/refdata /rcc/stor1/projects /rcc/stor1/depts

  apt-get update
  apt-get install -y --no-install-recommends \
    openjdk-8-jre \
    python \
    python3 \
    python-dev \
    python3-dev \
    python-setuptools \
    python3-setuptools \
    python-pip \
    python3-pip \
    wget
  
  # install python packages
  pip install --upgrade \
    wheel \
    numpy \
    scipy \
    jupyter \
    pandas \
    pyspark

  pip3 install --upgrade \
    wheel \
    numpy \
    scipy \
    jupyter \
    jupyterhub \
    pandas \
    pyspark

  # install spark (archive.apache.org keeps releases that mirrors no longer host)
  mkdir -p /opt/spark
  wget https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
  tar -xzf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz -C /opt/spark --strip-components=1
  rm spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz

  rm -rf /var/lib/apt/lists/*
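
The download and unpack step in `%post` relies on two details: the tarball name is composed from `SPARK_VERSION` and `HADOOP_VERSION`, and `--strip-components=1` removes the archive's top-level directory so that files land directly under `/opt/spark`. The sketch below illustrates both with a small stand-in archive built locally instead of the real download. The helper `spark_tarball`, the temporary directory layout, and the assumption that the archive's top-level directory matches the tarball name are illustrative, not taken from the recipe.

```shell
#!/bin/sh
# Values copied from the recipe's %post section.
SPARK_VERSION=2.4.4
HADOOP_VERSION=2.7

# Hypothetical helper: compose the tarball name the same way %post does.
spark_tarball() {
  printf 'spark-%s-bin-hadoop%s.tgz\n' "$1" "$2"
}

tarball=$(spark_tarball "$SPARK_VERSION" "$HADOOP_VERSION")
# Assumption: the archive's top-level directory is the tarball name minus ".tgz".
top_dir=${tarball%.tgz}

# Build a tiny stand-in archive with that layout (no download needed).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src/$top_dir/bin" "$demo_dir/opt/spark"
: > "$demo_dir/src/$top_dir/bin/spark-submit"
tar -czf "$demo_dir/$tarball" -C "$demo_dir/src" "$top_dir"

# Same extraction flags as the recipe: drop the leading directory component.
tar -xzf "$demo_dir/$tarball" -C "$demo_dir/opt/spark" --strip-components=1

# Record what landed under the stand-in for /opt/spark, then clean up.
unpacked=$(cd "$demo_dir/opt/spark" && find . -type f)
echo "$tarball unpacks to: $unpacked"
rm -rf "$demo_dir"
```

Without `--strip-components=1`, the same file would end up one level deeper, under a `spark-2.4.4-bin-hadoop2.7` directory, and `SPARK_HOME=/opt/spark` would no longer point at the directory that contains `bin`.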

Metrics

key                        value
id                         /containers/mcw-rcc-apache-spark-latest
collection name            mcw-rcc/apache-spark
branch                     master
tag                        latest
commit                     eda86a031fc463dd3b49ebe59bdb36416941ffa1
version (container hash)   026d5a2ca3641aa4940a322c0751e67f
build date                 2021-01-26T13:11:44.288Z
size (MB)                  2758.0
size (bytes)               1436528671
SIF Download URL           (please use pull with shub://)
We cannot guarantee that all containers will still exist on GitHub.