Commit 82c53aa

SAMZA-2804: Concurrency issues identified in run-class.sh on samza-yarn (#1716)
* Add annotations for each line identified as having a potential issue.
* Resolve multiple concurrency issues

## Race condition in pathing jar manifest creation

A race condition exists when setting up the classpath during container launch.

During container launch using samza-yarn, run-class.sh creates a pathing jar file (which holds the classpath for the container launch). However, neither the temporary files created along the way nor the pathing jar itself are placed in a location unique to the container. As a result, multiple containers write to the same pathing jar location and temporary file location, which creates a race condition.

This race condition may show up in several ways, such as when Yarn removes jars from a finished container (other containers will then point to a classpath which no longer exists) or when multiple run-class.sh scripts attempt to write the manifest.txt or pathing jar at the same time.

Note that enabling host affinity makes this problem worse. The pathing.jar is written to the usercache, so when the container which created the pathing.jar finishes and is removed, any new container which launches on that host will point to jar files which no longer exist. With host affinity enabled, the container will not move to a new host and will just keep failing.

## Container logging directory fallback is not unique for each container

The fallback log directory is the same among all containers running on the same host. It should be unique per container.

## Container tmp dir is not unique per-container

The JAVA_TEMP_DIR directory is the same for all containers. We should make sure that it's safe to use the same directory for all containers.

* Simplify comments and print manifest file locations
1 parent 23f52e9 commit 82c53aa
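
The race described in the commit message can be reproduced in miniature. The sketch below assumes, as the commit message implies, that each container's `__package` entry resolves to a location shared by the containers on a host; the scratch layout and the names `shared_package`, `container_01`, and `container_02` are hypothetical stand-ins, not real YARN paths.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch layout; none of these names come from YARN itself.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/shared_package" "$root/container_01" "$root/container_02"

# Assumption: every container's __package is a symlink to one shared directory.
ln -s "$root/shared_package" "$root/container_01/__package"
ln -s "$root/shared_package" "$root/container_02/__package"

# Old layout: workspace under base_dir (the resolved __package directory).
old_1=$(cd "$root/container_01/__package" && pwd -P)/classpath_workspace
old_2=$(cd "$root/container_02/__package" && pwd -P)/classpath_workspace

# New layout: workspace under home_dir (the container's own directory).
new_1=$(cd "$root/container_01" && pwd -P)/classpath_workspace
new_2=$(cd "$root/container_02" && pwd -P)/classpath_workspace

if [ "$old_1" = "$old_2" ]; then
  echo "old layout: both containers share $old_1"
fi
if [ "$new_1" != "$new_2" ]; then
  echo "new layout: each container has its own workspace"
fi
```

Under that assumption, both containers in the old layout write `manifest.txt` and `pathing.jar` to the same physical directory, while the new layout gives each container its own.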

2 files changed: +45 -10 lines changed

samza-shell/src/main/bash/run-class.sh

Lines changed: 24 additions & 5 deletions
@@ -30,7 +30,12 @@ cd $home_dir
 
 echo "Current time: $(date '+%Y-%m-%d %H:%M:%S')"
 
+# For example, home_dir looks like:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027
 echo home_dir=$home_dir
+
+# For example, base_dir looks like:
+# /<hadoop path>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/__package
 echo "framework base (location of this script). base_dir=$base_dir"
 
 if [ ! -d "$base_dir/lib" ]; then
@@ -78,15 +83,23 @@ fi
 # permissions for the classpath-related files when they are in their own directory. An example of where
 # this is helpful is when using container images which might have predefined permissions for certain
 # directories.
-CLASSPATH_WORKSPACE_DIR=$base_dir/classpath_workspace
+
+# For example, CLASSPATH_WORKSPACE_DIR looks like:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/classpath_workspace
+CLASSPATH_WORKSPACE_DIR=$home_dir/classpath_workspace
 mkdir -p $CLASSPATH_WORKSPACE_DIR
+
 # file containing the classpath string; used to avoid passing long classpaths directly to the jar command
 PATHING_MANIFEST_FILE=$CLASSPATH_WORKSPACE_DIR/manifest.txt
+echo "Pathing manifest txt located at $PATHING_MANIFEST_FILE"
+
 # jar file to include on the classpath for running the main class
 PATHING_JAR_FILE=$CLASSPATH_WORKSPACE_DIR/pathing.jar
+echo "Pathing manifest jar located at $PATHING_JAR_FILE"
 
 # Newlines and spaces are intended to ensure proper parsing of manifest in pathing jar
 printf "Class-Path: \n $CLASSPATH \n" > $PATHING_MANIFEST_FILE
+
 # Creates a new archive and adds custom manifest information to pathing.jar
 eval "$JAR -cvmf $PATHING_MANIFEST_FILE $PATHING_JAR_FILE"
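
As a side note on the manifest step above: the `printf` call places `$CLASSPATH` inside the format string, so any `%` in a jar path would be read as a format directive. The following is a minimal sketch of a more defensive variant, not part of this commit; the classpath value and the scratch workspace are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical classpath; the second entry contains a '%' on purpose.
CLASSPATH="file:/opt/app/lib/a.jar file:/opt/app/lib/100%-b.jar"

# Scratch directory standing in for the per-container classpath workspace.
workspace=$(mktemp -d)
trap 'rm -rf "$workspace"' EXIT
manifest="$workspace/manifest.txt"

# Pass the classpath as an argument to %s so printf copies it verbatim,
# and quote the target path so spaces in the directory name are safe.
printf 'Class-Path: \n %s \n' "$CLASSPATH" > "$manifest"

echo "Pathing manifest txt located at $manifest"
```

The layout of the file is the same as in the script (a `Class-Path:` header followed by a continuation line that begins with a space); only the way the value reaches `printf` differs.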

@@ -97,12 +110,18 @@ else
 fi
 
 if [ -z "$SAMZA_LOG_DIR" ]; then
-  SAMZA_LOG_DIR="$base_dir"
+  # SAMZA_LOG_DIR will point to the symlink located at:
+  # /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/logs
+  #
+  # When the symlink is resolved, this path will point to:
+  # /<hadoop dir>/userlogs/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027
+  SAMZA_LOG_DIR="$home_dir/logs"
 fi
 
-# add usercache directory
-mkdir -p $base_dir/tmp
-JAVA_TEMP_DIR=$base_dir/tmp
+# JAVA_TEMP_DIR will point to a path similar to:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/tmp
+mkdir -p $home_dir/tmp
+JAVA_TEMP_DIR=$home_dir/tmp
 
 # Check whether the JVM supports GC Log rotation, and enable it if so.
 function check_and_enable_gc_log_rotation {
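
The log-directory fallback and temp-directory setup in the hunk above can be tried in isolation. This is a rough sketch under stated assumptions: `home_dir` here is a scratch directory standing in for the container working directory, and the `${VAR:-default}` expansion is an equivalent shorthand for the `if [ -z ... ]` test, not the form the script uses.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for the container working directory (hypothetical scratch path).
home_dir=$(mktemp -d)
trap 'rm -rf "$home_dir"' EXIT

# Simulate a launch where the caller did not set SAMZA_LOG_DIR.
unset SAMZA_LOG_DIR

# Fall back to a per-container location only when the variable is unset or empty.
SAMZA_LOG_DIR="${SAMZA_LOG_DIR:-$home_dir/logs}"

# Per-container temp directory; quoted so paths with spaces still work.
JAVA_TEMP_DIR="$home_dir/tmp"
mkdir -p "$JAVA_TEMP_DIR"

echo "SAMZA_LOG_DIR=$SAMZA_LOG_DIR"
echo "JAVA_TEMP_DIR=$JAVA_TEMP_DIR"
```

Because both values derive from `home_dir`, two containers with different working directories end up with different log and temp paths, which is the property the commit is after.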

‎samza-shell/src/main/bash/run-framework-class.sh

File mode changed: 100644 → 100755
Lines changed: 21 additions & 5 deletions
@@ -28,7 +28,12 @@ cd $base_dir
 base_dir=`pwd`
 cd $home_dir
 
+# Note: When using samza-yarn, home_dir looks like:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027
 echo home_dir=$home_dir
+
+# Note: When using samza-yarn, base_dir looks like:
+# /<hadoop path>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/__package
 echo "framework base (location of this script). base_dir=$base_dir"
 
 if [ ! -d "$base_dir/lib" ]; then
@@ -107,10 +112,15 @@ fi
 # permissions for the classpath-related files when they are in their own directory. An example of where
 # this is helpful is when using container images which might have predefined permissions for certain
 # directories.
-CLASSPATH_WORKSPACE_DIR=$base_dir/classpath_workspace
+
+# Note: When on samza-yarn, CLASSPATH_WORKSPACE_DIR looks like:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/classpath_workspace
+CLASSPATH_WORKSPACE_DIR=$home_dir/classpath_workspace
 mkdir -p $CLASSPATH_WORKSPACE_DIR
+
 # file containing the classpath string; used to avoid passing long classpaths directly to the jar command
 PATHING_MANIFEST_FILE=$CLASSPATH_WORKSPACE_DIR/manifest.txt
+
 # jar file to include on the classpath for running the main class
 PATHING_JAR_FILE=$CLASSPATH_WORKSPACE_DIR/pathing.jar

@@ -126,12 +136,18 @@ else
 fi
 
 if [ -z "$SAMZA_LOG_DIR" ]; then
-  SAMZA_LOG_DIR="$base_dir"
+  # When on samza-yarn, SAMZA_LOG_DIR will point to the symlink located at:
+  # /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/logs
+  #
+  # When the symlink is resolved, this path will point to:
+  # /<hadoop dir>/userlogs/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027
+  SAMZA_LOG_DIR="$home_dir"
 fi
 
-# add usercache directory
-mkdir -p $base_dir/tmp
-JAVA_TEMP_DIR=$base_dir/tmp
+# When on samza-yarn, JAVA_TEMP_DIR will point to a path similar to:
+# /<hadoop dir>/usercache/<linux account>/appcache/application_1745893616511_0059/container_e64_1745893616511_0059_01_002027/tmp
+mkdir -p $home_dir/tmp
+JAVA_TEMP_DIR=$home_dir/tmp
 
 # Check whether the JVM supports GC Log rotation, and enable it if so.
 function check_and_enable_gc_log_rotation {
