Why does the YARN application still use resources after the Spark job that I ran on Amazon EMR is completed?

I'm running a Jupyter or Zeppelin notebook on my Amazon EMR cluster. The YARN application continues to run even after the Apache Spark job that I submitted is completed.

Short description

When you run a Spark notebook in Zeppelin or Jupyter, Spark starts an interpreter. The interpreter creates a YARN application. This application is the Spark driver that shows up when you list applications. The driver doesn't terminate when you finish running a job from the notebook. By design, the Spark driver stays active so that it can request application containers for on-the-fly code runs. The downside is that the YARN application might be using resources that other jobs need. To resolve this issue, you can manually stop the YARN application. Alternatively, you can set a timeout value that automatically stops the application.

Resolution

Zeppelin

Option 1: Restart the Spark interpreter

Before you begin, be sure that you have permissions to restart the interpreter in Zeppelin.

1.    Open Zeppelin.

2.    From the dropdown list next to the user name, choose Interpreter.

3.    Find the Spark interpreter, and then choose restart. Zeppelin terminates the YARN job when the interpreter restarts.

Option 2: Stop the YARN job manually

Before you begin, be sure of the following:

  • You have SSH access to the Amazon EMR cluster.
  • You have the permission to run YARN commands.

Use the yarn application -kill command to terminate the application. In the following example, replace application_id with the ID of your YARN application.

yarn application -kill application_id
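If you don't know the application ID, one way to find it is to list the running applications first. The following is a minimal sketch, not an official procedure. The function names extract_app_ids and list_running_app_ids are hypothetical helpers introduced for illustration, and the sketch assumes that the yarn CLI is on the PATH of the node where you run it.

```shell
# Print the unique YARN application IDs found in the text on stdin.
# IDs follow the pattern application_<cluster timestamp>_<sequence number>.
# sort -u removes duplicates, because an ID can appear more than once per row
# (for example, inside a tracking URL).
extract_app_ids() {
  grep -oE 'application_[0-9]+_[0-9]+' | sort -u
}

# Hypothetical wrapper: print the IDs of applications in the RUNNING state.
# This calls the yarn CLI, so run it on the cluster.
list_running_app_ids() {
  yarn application -list -appStates RUNNING 2>/dev/null | extract_app_ids
}

# Typical use on the cluster:
#   yarn application -list -appStates RUNNING   # review names and IDs
#   yarn application -kill application_id       # stop the one you chose
```

Review the full listing before you stop anything, so that you don't terminate an application that belongs to another user.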

Option 3: Set an interpreter timeout value

Zeppelin versions 0.8.0 and later (available in Amazon EMR versions 5.18.0 and later) include a lifecycle manager for interpreters. Use the TimeoutLifecycleManager setting to terminate interpreters after a specified idle timeout period:

1.    Create an /etc/zeppelin/conf/zeppelin-site.xml file with the following content. In this example, the timeout period is set to 120,000 milliseconds (2 minutes), and the interpreter is checked for idleness every 60,000 milliseconds (1 minute). Choose values that are appropriate for your environment.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<property>
  <name>zeppelin.interpreter.lifecyclemanager.class</name>
  <value>org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager</value>
  <description>This is the LifecycleManager class for managing the lifecycle of interpreters. The interpreter terminates after the idle timeout period.</description>
</property>

<property>
  <name>zeppelin.interpreter.lifecyclemanager.timeout.checkinterval</name>
  <value>60000</value>
  <description>The interval for checking whether the interpreter has timed out, in milliseconds.</description>
</property>

<property>
  <name>zeppelin.interpreter.lifecyclemanager.timeout.threshold</name>
  <value>120000</value>
  <description>The idle timeout limit, in milliseconds.</description>
</property>
</configuration>

2.    Run the following commands to restart Zeppelin:

sudo stop zeppelin
sudo start zeppelin
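The timeout properties in step 1 take values in milliseconds, which are easy to mistype. As a small illustration, the following sketch converts whole minutes to milliseconds. The function name minutes_to_ms is a hypothetical helper and isn't part of Zeppelin or Amazon EMR.

```shell
# Convert whole minutes to milliseconds, the unit that the Zeppelin
# lifecycle manager timeout properties expect.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

minutes_to_ms 2   # prints 120000, the timeout threshold used in step 1
minutes_to_ms 1   # prints 60000, the check interval used in step 1
```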

Jupyter

Option 1: Manually shut down the notebook

After the job is completed, use one of the following methods to stop the kernel in the Jupyter user interface:

  • In the Jupyter notebook interface, open the File menu, and then choose Close and Halt.
  • On the Jupyter dashboard, open the Running tab. Choose Shutdown for the notebook that you want to stop.

Option 2: Manually shut down the kernel

From the Jupyter notebook interface, open the Kernel menu, and then choose Shutdown.

Option 3: Configure the timeout attribute

If you close the notebook tab or browser window before shutting down the kernel, the YARN job continues to run. To prevent this from happening, configure the NotebookApp.shutdown_no_activity_timeout attribute. This attribute terminates the YARN job after a specified idle timeout period, even if you close the tab or browser window.

Do the following to configure the NotebookApp.shutdown_no_activity_timeout attribute:

1.    Open the /etc/jupyter/jupyter_notebook_config.py file on the master node, and then add an entry similar to the following. In this example, the timeout attribute is set to 120 seconds. Choose a timeout value that's appropriate for your environment.

c.NotebookApp.shutdown_no_activity_timeout = 120

2.    Run the following commands to restart JupyterHub:

sudo docker stop jupyterhub
sudo docker start jupyterhub
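If you prefer to script step 1 instead of editing the file by hand, something like the following sketch might work. The function name set_jupyter_idle_timeout is a hypothetical helper introduced for illustration. The sketch assumes that each setting sits on its own line in the configuration file, and it drops any existing line that mentions the attribute (including commented-out ones) before it appends the new value.

```shell
# Set c.NotebookApp.shutdown_no_activity_timeout in a Jupyter config file.
# Usage: set_jupyter_idle_timeout <config file> <timeout in seconds>
set_jupyter_idle_timeout() {
  config_file="$1"
  seconds="$2"
  key="c.NotebookApp.shutdown_no_activity_timeout"
  tmp_file="$(mktemp)"

  # Keep every existing line except ones that mention the attribute.
  # grep exits non-zero when it selects no lines, hence the "|| true".
  if [ -f "$config_file" ]; then
    grep -vF "$key" "$config_file" > "$tmp_file" || true
  fi
  printf '%s = %s\n' "$key" "$seconds" >> "$tmp_file"

  # Write back in place so that an existing file keeps its owner and permissions.
  cat "$tmp_file" > "$config_file"
  rm -f "$tmp_file"
}

# Example (run as a user who can write to the file):
#   set_jupyter_idle_timeout /etc/jupyter/jupyter_notebook_config.py 120
```

Running the function more than once leaves a single entry for the attribute, so it's safe to reuse when you change the timeout value. Restart JupyterHub afterward, as described in step 2.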

Related information

Apache Zeppelin

Considerations when using Zeppelin on Amazon EMR

Jupyter Notebook on Amazon EMR

AWS OFFICIAL
Updated 3 years ago