王金锋 / mobvista-dmp
Commit 15972dc8 authored May 28, 2021 by wang-jinfeng
optimize dmp
parent fd0560d0
Showing 4 changed files with 37 additions and 19 deletions

azkaban/rtdmp/rtdmp.job  +0 -0
azkaban/rtdmp/rtdmp.sh  +37 -16
src/main/scala/mobvista/dmp/datasource/rtdmp/Logic.scala  +0 -3
src/main/scala/mobvista/dmp/datasource/rtdmp/RTDmpMain.scala  +0 -0
azkaban/rtdmp/rtdmp → azkaban/rtdmp/rtdmp.job
File moved
azkaban/rtdmp/rtdmp.sh
@@ -4,26 +4,48 @@ source ../dmp_env.sh
 today=${ScheduleTime}
-date_time=$(date +"%Y-%m-%d %H" -d "-1 hour $today")
+date_time=$(date +"%Y-%m-%d.%H" -d "-1 hour $today")
 date_path=$(date +%Y/%m/%d/%H -d "-1 hour $today")
+part_num=$(hadoop fs -ls s3://mob-emr-test/dataplatform/rtdmp_pre/${date_path}/ | wc -l)
+if [[ ${part_num} -le 50 ]]; then
+  echo "This Dir No Data !!!"
+  partition=10
+  coalesce=10
+  executor=2
+  memory=4
+  core=2
+  flag=0
+else
+  partition=2000
+  coalesce=200
+  executor=8
+  memory=10
+  core=4
+  flag=1
+fi
 INPUT="s3://mob-emr-test/dataplatform/rtdmp_pre/${date_path}"
-OUTPUT="s3://mob-emr-test/dataplatform/rtdmp_deal/${date_path}/0"
+OUTPUT="s3://mob-emr-test/dataplatform/rtdmp_deal/${date_path}"
-spark-submit --class mobvista.dmp.datasource.rtdmp.RTDmpMainDeal \
-  --name "RTDmpMainDeal.${date_time}" \
-  --conf spark.sql.shuffle.partitions=10000 \
-  --conf spark.default.parallelism=500 \
-  --conf spark.kryoserializer.buffer.max=256m \
-  --conf spark.speculation=true \
-  --conf spark.speculation.quantile=0.9 \
-  --conf spark.speculation.multiplier=1.3 \
-  --conf spark.executor.extraJavaOptions="-XX:+UseG1GC" \
-  --master yarn --deploy-mode cluster --executor-memory 4g --driver-memory 4g --executor-cores 4 --num-executors 50 \
-  ../${JAR} -time "${date_time}" -data_utime "${date_time}" -input ${INPUT} -output ${OUTPUT} -coalesce 200 -partition 10000
+before_date_path=$(date +%Y/%m/%d/%H -d "-2 hour $today")
+BEFORE_OUTPUT="s3://mob-emr-test/dataplatform/rtdmp/${before_date_path}"
+check_await "${BEFORE_OUTPUT}/_SUCCESS"
+spark-submit --class mobvista.dmp.datasource.rtdmp.RTDmpMain \
+  --name "RTDmpMain.${date_time}" \
+  --conf spark.sql.shuffle.partitions=${partition} \
+  --conf spark.default.parallelism=${partition} \
+  --conf spark.kryoserializer.buffer.max=512m \
+  --conf spark.kryoserializer.buffer=64m \
+  --master yarn --deploy-mode cluster \
+  --executor-memory ${memory}g --driver-memory 6g --executor-cores ${core} --num-executors ${executor} \
+  .././DMP.jar \
+  -flag ${flag} -time ${date_time} -input ${INPUT} -output ${OUTPUT} -coalesce ${coalesce}
 if [[ $? -ne 0 ]]; then
-  exit 255
-fi
\ No newline at end of file
+  exit 255
+fi
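
The sizing branch added to rtdmp.sh can be read in isolation as a mapping from the input listing's line count to a set of Spark resource settings. The sketch below restates that branch as a standalone function so the two presets are easy to compare. The function name `choose_resources` is hypothetical and not part of the commit; the threshold of 50 and both presets are copied from the diff. Note that `hadoop fs -ls` normally prints a "Found N items" header for a directory, so the count passed in is likely one higher than the number of entries; the sketch keeps the script's comparison unchanged.

```shell
# Hypothetical helper (not in the commit): sets the same variables that the
# new rtdmp.sh assigns inline, given the line count of the input listing.
choose_resources() {
  if [ "$1" -le 50 ]; then
    # Small or missing input: minimal resources, flag=0 as in the script.
    partition=10; coalesce=10; executor=2; memory=4; core=2; flag=0
  else
    # Normal hourly input: full-size preset, flag=1 as in the script.
    partition=2000; coalesce=200; executor=8; memory=10; core=4; flag=1
  fi
}

# Example: a listing with 12 lines falls into the small preset.
choose_resources 12
echo "partition=${partition} coalesce=${coalesce} executor=${executor} memory=${memory}g core=${core} flag=${flag}"
```

Keeping the presets in one place like this would make it possible to exercise the threshold without a cluster; the commit itself keeps the assignments inline.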
src/main/scala/mobvista/dmp/datasource/rtdmp/Logic.scala
@@ -31,7 +31,6 @@ import scala.collection.{immutable, mutable}
 object Logic {
-
   def getResultFeature(session: CqlSession, iterator: Iterator[Row]): Iterator[AudienceInfo] = {
     val sql =
       """
         |select audience_data from rtdmp.audience_info where devid = '@devid'
@@ -39,7 +38,6 @@ object Logic {
     val res = new ArrayBuffer[AudienceInfo]()
     iterator.foreach(row => {
-      // val session = connector.openSession()
       val devId = row.getAs[String](0)
       val audience_data = row.getAs[String](1)
       val query_sql = sql.replace("@devid", devId)
@@ -49,7 +47,6 @@ object Logic {
       } else {
         new JSONObject().toJSONString
       }
-      // session.close()
       res.add(AudienceInfo(devId, audience_data, old_audience_data))
     })
     res.iterator()
src/main/scala/mobvista/dmp/datasource/rtdmp/RTDmpMain.scala
This diff is collapsed.