17/07/19 08:46:11 main INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
17/07/19 08:46:12 main INFO SecurityManager: Changing view acls to: yarn,ec2-user
17/07/19 08:46:12 main INFO SecurityManager: Changing modify acls to: yarn,ec2-user
17/07/19 08:46:12 main INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, ec2-user); users with modify permissions: Set(yarn, ec2-user)
17/07/19 08:46:13 main INFO SecurityManager: Changing view acls to: yarn,ec2-user
17/07/19 08:46:13 main INFO SecurityManager: Changing modify acls to: yarn,ec2-user
17/07/19 08:46:13 main INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, ec2-user); users with modify permissions: Set(yarn, ec2-user)
17/07/19 08:46:13 sparkExecutorActorSystem-akka.actor.default-dispatcher-2 INFO Slf4jLogger: Slf4jLogger started
17/07/19 08:46:13 sparkExecutorActorSystem-akka.actor.default-dispatcher-2 INFO Remoting: Starting remoting
17/07/19 08:46:13 sparkExecutorActorSystem-akka.actor.default-dispatcher-2 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutorActorSystem@ip-10-156-14-210.ec2.internal:36755]
17/07/19 08:46:13 main INFO Utils: Successfully started service 'sparkExecutorActorSystem' on port 36755.
17/07/19 08:46:13 main INFO DiskBlockManager: Created local directory at /media/ephemeral0/yarn/local/usercache/ec2-user/appcache/application_1500440834352_0004/blockmgr-ea54d43d-1ebc-49f7-9770-bbf6a913c5e7
17/07/19 08:46:13 main INFO MemoryStore: MemoryStore started with capacity 14.2 GB
17/07/19 08:46:14 dispatcher-event-loop-1 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.153.214.122:36433
17/07/19 08:46:14 dispatcher-event-loop-2 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
17/07/19 08:46:14 dispatcher-event-loop-2 INFO Executor: Starting executor ID 4 on host ip-10-156-14-210.ec2.internal
17/07/19 08:46:14 dispatcher-event-loop-2 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33355.
17/07/19 08:46:14 dispatcher-event-loop-2 INFO NettyBlockTransferService: Server created on 33355
17/07/19 08:46:14 dispatcher-event-loop-2 INFO BlockManager: external shuffle service port = 7337
17/07/19 08:46:14 dispatcher-event-loop-2 INFO BlockManagerMaster: Trying to register BlockManager
17/07/19 08:46:14 dispatcher-event-loop-2 INFO BlockManagerMaster: Registered BlockManager
17/07/19 08:46:14 dispatcher-event-loop-2 INFO BlockManager: Registering executor with local external shuffle service.
17/07/19 08:46:15 dispatcher-event-loop-1 INFO CoarseGrainedExecutorBackend: Got assigned task 1
17/07/19 08:46:15 Executor task launch worker-0 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
17/07/19 08:46:16 Executor task launch worker-0 INFO TorrentBroadcast: Started reading broadcast variable 2
17/07/19 08:46:16 Executor task launch worker-0 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 7.6 KB, free 7.6 KB)
17/07/19 08:46:16 Executor task launch worker-0 INFO TorrentBroadcast: Reading broadcast variable 2 took 103 ms
17/07/19 08:46:16 Executor task launch worker-0 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 11.5 KB, free 19.1 KB)
2017-07-19 08:46:16,562 INFO (MainThread-14608) connected to server at ('ip-10-153-214-122', 33927)
2017-07-19 08:46:16,564 INFO (MainThread-14608) TFSparkNode.reserve: {'authkey': '\xce\xb9 \xd2\xec\x97J>\x84\x15\x08\x1fY\xf9vE', 'worker_num': 1, 'host': 'ip-10-156-14-210', 'tb_port': 0, 'addr': '/tmp/pymp-8T8aOT/listener-8c7obu', 'ppid': 14602, 'task_index': 0, 'job_name': 'worker', 'tb_pid': 0, 'port': 35991}
2017-07-19 08:46:18,569 INFO (MainThread-14608) node: {'addr': ('ip-10-111-178-35', 41949), 'task_index': 0, 'job_name': 'ps', 'authkey': '\x12\xa5\x18$$\xb8O\x85\xa6nO\x97\x81+\xd4C', 'worker_num': 0, 'host': 'ip-10-111-178-35', 'ppid': 14643, 'port': 37951, 'tb_pid': 0, 'tb_port': 0}
2017-07-19 08:46:18,570 INFO (MainThread-14608) node: {'addr': '/tmp/pymp-8T8aOT/listener-8c7obu', 'task_index': 0, 'job_name': 'worker', 'authkey': '\xce\xb9 \xd2\xec\x97J>\x84\x15\x08\x1fY\xf9vE', 'worker_num': 1, 'host': 'ip-10-156-14-210', 'ppid': 14602, 'port': 35991, 'tb_pid': 0, 'tb_port': 0}
2017-07-19 08:46:18,570 INFO (MainThread-14608) node: {'addr': '/tmp/pymp-jI4zyL/listener-Ucfnew', 'task_index': 1, 'job_name': 'worker', 'authkey': "\xa9\xc83L\x17\xacN\xc5\xab'\x01\xef]\xb3\xd48", 'worker_num': 2, 'host': 'ip-10-167-218-220', 'ppid': 13976, 'port': 35779, 'tb_pid': 0, 'tb_port': 0}
2017-07-19 08:46:18,570 INFO (MainThread-14608) node: {'addr': '/tmp/pymp-4n8JKO/listener-RJ0DGB', 'task_index': 2, 'job_name': 'worker', 'authkey': '\x8f\x1a\xb4tA\xc2G\xfb\x94\xd1\xe2g\x03\xba\xf5\x84', 'worker_num': 3, 'host': 'ip-10-153-214-122', 'ppid': 14517, 'port': 39217, 'tb_pid': 0, 'tb_port': 0}
2017-07-19 08:46:18,732 INFO (MainThread-14608) Starting TensorFlow worker:0 on cluster node 1 on background process
17/07/19 08:46:18 Executor task launch worker-0 INFO PythonRunner: Times: total = 2458, boot = 239, init = 38, finish = 2181
17/07/19 08:46:18 Executor task launch worker-0 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1015 bytes result sent to driver
17/07/19 08:46:18 dispatcher-event-loop-3 INFO CoarseGrainedExecutorBackend: Got assigned task 6
17/07/19 08:46:18 Executor task launch worker-0 INFO Executor: Running task 2.0 in stage 1.0 (TID 6)
17/07/19 08:46:18 Executor task launch worker-0 INFO TorrentBroadcast: Started reading broadcast variable 3
17/07/19 08:46:18 Executor task launch worker-0 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 7.0 KB, free 26.1 KB)
17/07/19 08:46:18 Executor task launch worker-0 INFO TorrentBroadcast: Reading broadcast variable 3 took 28 ms
17/07/19 08:46:18 Executor task launch worker-0 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 14.2 KB, free 40.3 KB)
17/07/19 08:46:18 Executor task launch worker-0 INFO HadoopRDD: Input split: hdfs://ec2-54-87-28-131.compute-1.amazonaws.com:9000/user/ec2-user/mnist/csv/train/images/part-00002:0+11214784
17/07/19 08:46:18 Executor task launch worker-0 INFO TorrentBroadcast: Started reading broadcast variable 0
17/07/19 08:46:18 Executor task launch worker-0 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.8 KB, free 65.1 KB)
17/07/19 08:46:18 Executor task launch worker-0 INFO TorrentBroadcast: Reading broadcast variable 0 took 57 ms
17/07/19 08:46:19 Executor task launch worker-0 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 360.6 KB, free 425.7 KB)
17/07/19 08:46:19 Executor task launch worker-0 INFO deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
17/07/19 08:46:20 Executor task launch worker-0 INFO GPLNativeCodeLoader: Loaded native gpl library
17/07/19 08:46:20 Executor task launch worker-0 INFO LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 049362b7cf53ff5f739d6b1532457f2c6cd495e8]
17/07/19 08:46:20 Executor task launch worker-0 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
17/07/19 08:46:20 Executor task launch worker-0 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
17/07/19 08:46:20 Executor task launch worker-0 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
17/07/19 08:46:20 Executor task launch worker-0 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
17/07/19 08:46:20 Executor task launch worker-0 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
2017-07-19 08:46:20,173 INFO (MainThread-14646) 1: ======== worker:0 ========
2017-07-19 08:46:20,173 INFO (MainThread-14646) 1: Cluster spec: {'ps': ['ip-10-111-178-35:37951'], 'worker': ['ip-10-156-14-210:35991', 'ip-10-167-218-220:35779', 'ip-10-153-214-122:39217']}
2017-07-19 08:46:20,173 INFO (MainThread-14646) 1: Using CPU
2017-07-19 08:46:20.174249: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-19 08:46:20.174272: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-19 08:46:20.174282: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-19 08:46:20.184093: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> ip-10-111-178-35:37951}
2017-07-19 08:46:20.184143: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:35991, 1 -> ip-10-167-218-220:35779, 2 -> ip-10-153-214-122:39217}
2017-07-19 08:46:20.184663: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:35991
tensorflow model path: hdfs://ec2-54-87-28-131.compute-1.amazonaws.com:9000/user/ec2-user/mnistmodel
17/07/19 08:46:20 Executor task launch worker-0 INFO HadoopRDD: Input split: hdfs://ec2-54-87-28-131.compute-1.amazonaws.com:9000/user/ec2-user/mnist/csv/train/labels/part-00002:0+245760
17/07/19 08:46:20 Executor task launch worker-0 INFO TorrentBroadcast: Started reading broadcast variable 1
17/07/19 08:46:20 Executor task launch worker-0 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 24.8 KB, free 450.5 KB)
17/07/19 08:46:20 Executor task launch worker-0 INFO TorrentBroadcast: Reading broadcast variable 1 took 16 ms
17/07/19 08:46:20 Executor task launch worker-0 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 360.6 KB, free 811.1 KB)
17/07/19 08:46:20 Executor task launch worker-0 INFO deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
2017-07-19 08:46:21,037 INFO (MainThread-14673) Connected to TFSparkNode.mgr on ip-10-156-14-210, ppid=14602, state='running'
2017-07-19 08:46:21,073 INFO (MainThread-14673) mgr.state='running'
2017-07-19 08:46:21,073 INFO (MainThread-14673) Feeding partition stream at 0x7f1b38b31460> into input queue
17/07/19 08:46:24 stdout writer for Python/bin/python INFO PythonRunner: Times: total = 3805, boot = -1542, init = 1656, finish = 3691
17/07/19 08:46:24 stdout writer for Python/bin/python INFO PythonRunner: Times: total = 307, boot = 2, init = 81, finish = 224