BaiduXJTU_BigData_2019_Semi-Final

Urban Region Function Classification Top18 Solution

Competition Intro.

Team Brief Intro.

Team Name:

浑南摸鱼队

Key Members:

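王德君, second-year undergraduate in Computer Science at Northeastern University (东北大学); (Team Leader)
姚来刚, master's student in the Department of Computer Science at Northeastern University (东北大学), researching machine learning and data mining. (Key Teammate)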

Contest Rankings:

Preliminary: Rank 17
Semi-Final: Rank 18

Mission Descriptions

Build models that classify the function of each urban area from two inputs per area: a satellite image ({Area_ID}.jpg) and user behavior records ({Area_ID}.txt).

Table of urban area functions:

CategoryID	Functions of Areas
001	Residential area
002	School
003	Industrial park
004	Railway station
005	Airport
006	Park
007	Shopping area
008	Administrative district
009	Hospital
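
For use in scripts, the table above can be written as a lookup dictionary. This is only a minimal sketch: the names `CATEGORIES` and `category_name` are illustrative and are not taken from the repository.

```python
# Category IDs are kept as zero-padded strings, matching the table above.
CATEGORIES = {
    "001": "Residential area",
    "002": "School",
    "003": "Industrial park",
    "004": "Railway station",
    "005": "Airport",
    "006": "Park",
    "007": "Shopping area",
    "008": "Administrative district",
    "009": "Hospital",
}


def category_name(category_id):
    """Return the function name for an ID given as '001' or as the integer 1."""
    key = str(category_id).zfill(3)
    return CATEGORIES[key]
```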

For a more detailed task description, see 赛题详情 (Competition Details).

Environment Requirements

OS & GPU Configurations

Ubuntu 18.04.1 LTS
GTX 1080Ti x 1 + GTX 1060M x 1
Baidu AI Studio (Tesla V100 x 36, used to train the 36 networks of Model I)

Python Package Requirements

Anaconda 4.7.10
python 3.6
pytorch 1.1.0
keras 2.2.4
opencv3
sklearn
numpy
matplotlib

Submission Timeline

Model	Baseline Acc	Top Result
Model I	None	77.08%
Model II	77.08%	81.6200%
Model III	81.6200% & 81.2440%	82.1800%

Detailed Solution

Model I: Several Neural Network Stacking (Deep Learning)

# Model Descriptions:
8 Nets
5-fold Stacking
Trained Networks = 7*5+1 = 36 (Net6 was trained on fold 1 only because of its low local accuracy).
# Result:
After stacking the 36 networks, we reached a top online accuracy of 77.08%.
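
The description above does not say how the outputs of the 36 networks were combined. One common choice is to average the per-class probabilities of all fold models and take the arg-max. The sketch below assumes that rule; the function names and the toy inputs are illustrative only.

```python
import numpy as np


def merge_probabilities(prob_list):
    """Average a list of (n_samples, n_classes) probability arrays."""
    stacked = np.stack(prob_list, axis=0)  # (n_models, n_samples, n_classes)
    return stacked.mean(axis=0)


def to_category_ids(mean_probs):
    """Map the arg-max column index (0..8) to the IDs '001'..'009'."""
    return ["{:03d}".format(int(i) + 1) for i in mean_probs.argmax(axis=1)]


# Toy example (random data, not competition output): 3 models, 2 samples, 9 classes.
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(9), size=2) for _ in range(3)]
merged = merge_probabilities(probs)
labels = to_category_ids(merged)
```

Averaging probabilities rather than hard votes keeps each network's confidence in play, which tends to matter when most networks score about the same.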

Network Name	Baseline	Descriptions	Online Top-1 Acc on Test (5-Fold Merged Result)
Net1_raw	DPN26+Resnext50	Trained with raw NPYs (400k 182x24 NPYs)	76.21%
Net2_1	Net1	Introduced resampled fold split for training	About 76%
Net3_w	Net1	Class ratios factored into the loss calculation	About 76%
Net4_TTA	Net1	Introduced TTA (Test Time Augmentation)	About 76%
Net5_HR	Net1	Introduced HighResample (linear)	About 76%
Net6_Features	DenseNet	Introduced feature engineering (175 features)	61.19%
Net7_MS	Net1	Introduced MultiScale	About 76%
Net8_MS_cat	Net1	Introduced MultiScale & Concatenate	About 76%
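
Net3_w factors the class ratio into the loss, but the table does not give the weighting formula. The sketch below assumes inverse-frequency weights and a weighted cross-entropy normalised by the sum of the sample weights (the same normalisation `torch.nn.CrossEntropyLoss` applies when given a `weight` tensor with mean reduction). It uses NumPy so it runs without a GPU stack, and the class counts in the example are made up.

```python
import numpy as np


def inverse_frequency_weights(class_counts):
    """Weight each class by total / (n_classes * count); rare classes get larger weights."""
    counts = np.asarray(class_counts, dtype=np.float64)
    return counts.sum() / (len(counts) * counts)


def weighted_cross_entropy(probs, targets, weights):
    """Weighted mean of -log p[target], normalised by the sum of the sample weights."""
    probs = np.asarray(probs, dtype=np.float64)
    targets = np.asarray(targets)
    weights = np.asarray(weights, dtype=np.float64)
    picked = probs[np.arange(len(targets)), targets]  # probability of the true class
    sample_w = weights[targets]
    return float(np.sum(-np.log(picked) * sample_w) / np.sum(sample_w))


# Hypothetical two-class example: class 0 is three times rarer than class 1.
w = inverse_frequency_weights([10, 30])
loss = weighted_cross_entropy([[0.5, 0.5], [0.5, 0.5]], [0, 1], w)
```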

Model II: Txt Processing (Feature Engineering)

# Steps:
1) Identical-txt check (completed)
2) Multivoters based on the total times a user appeared in the same category (completed)
3) Multivoters based on the total hours a user appeared in the same category (only 3/2000 json files were processed)
# Result:
After the above 3 steps, we got a submission of 81.62% (81.6200%.txt).
# Notes:
[1] Step 3 was not completed because of limited time and computing resources; only 3/2000 of the data was processed.
[2] In this project we abbreviate the {Preliminary,Semi-Final}-{Train,Test} datasets as {P,S}{Tr,Te} ==> {PTr,PTe,STr,STe}.
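
Step 1 can be pictured as a content lookup: if a test txt is byte-for-byte identical to a training txt, reuse the training category. The sketch below is not the repository's SelfDuplicateCheck code. It assumes exact-match comparison via an MD5 digest, and the record strings and area IDs are opaque placeholders rather than real competition data.

```python
import hashlib


def content_key(text):
    """Digest of the txt content, used as a dictionary key."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_lookup(train_records):
    """train_records: iterable of (txt_content, category_id) pairs."""
    lookup = {}
    for text, category in train_records:
        # Keep the first category seen for a given content.
        lookup.setdefault(content_key(text), category)
    return lookup


def apply_duplicates(test_records, lookup, base_predictions):
    """Override base predictions for test areas whose txt matches a training txt."""
    merged = dict(base_predictions)
    for area_id, text in test_records.items():
        hit = lookup.get(content_key(text))
        if hit is not None:
            merged[area_id] = hit
    return merged


# Placeholder data for illustration only.
lookup = build_lookup([("record-A", "002"), ("record-B", "006")])
result = apply_duplicates(
    {"t1": "record-A", "t2": "record-C"},
    lookup,
    {"t1": "001", "t2": "001"},
)
```

Here `t1` is overridden with the training category while `t2`, which has no identical training txt, keeps the model's prediction.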

Steps	Content Descriptions	Original Score	Improved Score	Source Code
(1)	Use the categories of identical txts in PTr & STr to provide answers for STe	77.04%	78.74%	SelfDuplicateCheck
(2)	Multivoters based on the total times a user appeared in the same category	78.74%	81.62%	MergeVotes
(3)	Multivoters based on the total hours a user appeared in the same category	81.62%	-	AdvM2_train
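
Step (2) in the table above lets users vote for a category according to how often they appeared in training areas of that category. The exact voting rule lives in MergeVotes and is not reproduced here. The sketch below assumes a plain sum of per-user visit counts, with ties broken by the smallest category ID; all user IDs and counts are invented for illustration.

```python
from collections import Counter, defaultdict


def build_user_votes(train_areas):
    """train_areas: iterable of (category_id, {user_id: n_visits}) pairs."""
    votes = defaultdict(Counter)
    for category, users in train_areas:
        for user, n_visits in users.items():
            votes[user][category] += n_visits
    return votes


def vote_area(users, user_votes, fallback):
    """Sum the category counts of every known user seen in one test area."""
    tally = Counter()
    for user in users:
        tally.update(user_votes.get(user, {}))
    if not tally:
        return fallback  # no known user: keep the model's prediction
    # Iterate in sorted order so ties resolve to the smallest category ID.
    return max(sorted(tally), key=tally.get)


# Hypothetical training data: three areas, three users.
votes = build_user_votes([
    ("001", {"u1": 3, "u2": 1}),
    ("002", {"u2": 4}),
    ("001", {"u3": 2}),
])
```

With this toy data, `vote_area(["u2"], votes, "001")` returns `"002"`, and an area containing only unseen users falls back to the supplied prediction.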

Model III: Merge & Rebalance the Predicts in Submissions (Post-processing)

Directly Modify Submission.txt:
We compared the predictions in 81.2440%.txt and 81.6200%.txt and found that 001 was heavily over-predicted (about 4k more than the estimated true value), 003 and 005 were slightly over-predicted, and all other categories were under-predicted.

Category Distributions in Our Submissions (Take 81.6200%.txt for Example)

Category	Total Predicts in 81.6200%.txt	Estimated True Value	Difference (Pred-Estimated)	Desc
001	34542	30092	+4450	Far Too Many
002	22026	22763	-737	Much Less
003	13247	12753	+494	More
004	1510	1647	-137	Slightly Less
005	4314	4123	+191	More
006	12978	15671	-2693	Much Less
007	4986	5283	-297	Slightly Less
008	2247	3295	-1048	Much Less
009	4150	4370	-220	Slightly Less
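
The Difference column can be recomputed from the two count columns. The sketch below copies the numbers from the table above; the function and variable names are illustrative and not from the repository.

```python
# Counts copied from the table above.
PREDICTED = {"001": 34542, "002": 22026, "003": 13247, "004": 1510, "005": 4314,
             "006": 12978, "007": 4986, "008": 2247, "009": 4150}
ESTIMATED = {"001": 30092, "002": 22763, "003": 12753, "004": 1647, "005": 4123,
             "006": 15671, "007": 5283, "008": 3295, "009": 4370}


def distribution_gap(predicted, estimated):
    """Return {category: predicted count - estimated count}, in category order."""
    return {cat: predicted.get(cat, 0) - estimated[cat] for cat in sorted(estimated)}


def over_predicted(gap):
    """Categories predicted more often than the estimate."""
    return [cat for cat, diff in gap.items() if diff > 0]


gap = distribution_gap(PREDICTED, ESTIMATED)
print(over_predicted(gap))  # ['001', '003', '005']
```

The three over-predicted categories are the ones the merge rules treat specially.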

Therefore, we merged the predictions of our two previous top submissions (81.6200%.txt & 81.2440%.txt).
Strategy & Rules:
1) Compare the predictions in the two txt files and, where they disagree and one of them is '001', replace the '001' with the other file's less-predicted category.
2) When the two files give the same prediction, or both predictions are in the more-predicted categories ['001','003','005'], choose the answer in 81.6200%.txt because of its higher accuracy.
After this operation, we got our final best submission, 82.1800%.txt, which reached 82.18%.

Related Source Code (Submission_Check)

def MergeDict(Dict1, Dict2, ModCates):
    """Merge two prediction dicts; Dict1 is the higher-accuracy submission."""
    Merge_Dict = {}
    for key, val1 in Dict1.items():
        val2 = Dict2[key]
        if val1 == val2:
            # Both submissions agree.
            Merge_Dict[key] = val1
        elif '001' in [val1, val2] and '003' not in [val1, val2] and '005' not in [val1, val2]:
            # One side says '001' and the other names a less-predicted category: take the other.
            Merge_Dict[key] = val2 if val1 == '001' else val1
        elif val2 in ModCates:
            # Optionally prefer Dict2 for the categories listed in ModCates.
            Merge_Dict[key] = val2
        else:
            # Otherwise trust the higher-accuracy submission.
            Merge_Dict[key] = val1
    return Merge_Dict

# LoadDictFromTxt, MergeName, WriteDictToTxt and Statistics are helpers that are not shown in this excerpt.
txt1 = '../81.6200%.txt'
txt2 = '../81.2440%.txt'
Dict1 = LoadDictFromTxt(txt1)
Dict2 = LoadDictFromTxt(txt2)

priorlist = ['001',
#             '006',
#             '003',
#             '008'
            ]
submit_txt = MergeName(txt1, txt2, '{}_MOD'.format('_'.join(priorlist)))
Merge_Dict = MergeDict(Dict1, Dict2, [])
##Merge_Dict = MergeDict(Dict1,Dict2,['006'])
##Merge_Dict = MergeDict(Dict1,Dict2,['003','008'])
WriteDictToTxt(submit_txt, Merge_Dict)
Statistics(Merge_Dict)
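
To see how the merge rule behaves without the submission files, the following self-contained sketch restates the same branching logic and applies it to a handful of hypothetical areas. The function name `merge_predictions`, the area IDs, and the toy predictions are all invented for illustration.

```python
def merge_predictions(primary, secondary, preferred=()):
    """Same branching as the merge rule above; `primary` is the higher-accuracy file."""
    merged = {}
    for area_id, a in primary.items():
        b = secondary[area_id]
        pair = (a, b)
        if a == b:
            merged[area_id] = a
        elif "001" in pair and "003" not in pair and "005" not in pair:
            merged[area_id] = b if a == "001" else a  # drop the '001' vote
        elif b in preferred:
            merged[area_id] = b
        else:
            merged[area_id] = a
    return merged


# Hypothetical predictions for five areas.
primary = {"a1": "001", "a2": "001", "a3": "002", "a4": "001", "a5": "006"}
secondary = {"a1": "001", "a2": "006", "a3": "001", "a4": "003", "a5": "008"}

merged = merge_predictions(primary, secondary)
print(merged)
# {'a1': '001', 'a2': '006', 'a3': '002', 'a4': '001', 'a5': '006'}
```

Area `a2` shows the rebalancing step ('001' replaced by '006'), while `a4` keeps '001' because the alternative, '003', is itself over-predicted.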

SeaEagleI/BaiduXJTU_BigData_2019_Semi-Final
