Sunday, July 19, 2015

Specification of My Laptop & Address Dataset Used on My Posts

1.  Specification of my MacBook.

Processor Name Intel Core i7
Processor Speed 2.3 GHz
Number of Processors 1
Total Number of Cores 4
L2 Cache (per Core) 256 KB
L3 Cache 6 MB
Memory 16 GB


2.  Dataset: Address DB  
      (Updated on Aug. 6, 2015: The address files now use the new 5-digit postal code)
In my posts, I will use Korean address data published by the Korea Post Office.  The dataset contains 6,071,307 records in 17 files, so it is reasonably large for my Solr and/or Hadoop examples.

The files are in my public repository at https://bitbucket.org/jihwan11/openfiles 
The data describes addresses in Korea, and most of its content is written in Korean.  Nevertheless, some fields, such as city_en and state_en, are in English.  I think these fields will be enough for my examples.

Each line of each file describes one address, and the fields are separated by a | (vertical bar).
Each line has the following format.

area_code|state|state_en|city|city_en|sub_city|sub_city_en|street_code|street_name|street_name_en|is_basement|building_num|building_num_sub|building_mgm_num|bulk_delivery_place_name|building_name|legal_dong_code|legal_dong_name|ri_name|admin_dong_name|is_mountain|ground_num|dong_seq|ground_num_sub|postal_code|postal_code_seq

The area_code field is the new 5-digit postal code, which may have a leading 0.  The postal_code and postal_code_seq fields hold the old postal code.
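
Because the new postal code may start with 0, it is safer to keep it as text than to convert it to a number.  The short Python sketch below illustrates the pitfall.  The variable names are only for illustration and are not part of the dataset.

```python
# A 5-digit postal code with a leading zero, kept as text.
area_code = "06309"

# Converting to int silently drops the leading zero.
as_int = int(area_code)

# If a code was already stored as a number, pad it back to 5 digits.
restored = str(as_int).zfill(5)

print(as_int, restored)
```

The same caution probably applies when defining a schema for this field: a string type preserves the code exactly, while a numeric type does not.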

The data starts on the 3rd line of each file.  An example line is:
06309|서울특별시|Seoul|강남구|Gangnam-gu|||116804166060|개포로30길|Gaepo-ro 30-gil|0|15|0|1168010300102080012021276||LG전선|1168010300|개포동||개포4동|0|1208|01|12|135962|001
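
As a rough illustration of this format, the Python sketch below splits a line into named fields and reads a file starting from its 3rd line.  The field names come from the format line in this post.  The helper names parse_address and read_addresses are hypothetical, and the UTF-8 default is an assumption: the files may use a different encoding, in which case pass it explicitly.

```python
from itertools import islice

# Field names, in order, taken from the format line in this post.
FIELDS = (
    "area_code|state|state_en|city|city_en|sub_city|sub_city_en|"
    "street_code|street_name|street_name_en|is_basement|building_num|"
    "building_num_sub|building_mgm_num|bulk_delivery_place_name|"
    "building_name|legal_dong_code|legal_dong_name|ri_name|"
    "admin_dong_name|is_mountain|ground_num|dong_seq|ground_num_sub|"
    "postal_code|postal_code_seq"
).split("|")


def parse_address(line):
    """Split one '|'-delimited line into a dict keyed by field name."""
    values = line.rstrip("\r\n").split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    # All values stay strings, so codes with a leading 0 are preserved.
    return dict(zip(FIELDS, values))


def read_addresses(path, encoding="utf-8"):
    """Yield one dict per address, skipping the first two lines of the file.

    The encoding default is an assumption, not something the post states.
    """
    with open(path, encoding=encoding) as f:
        for line in islice(f, 2, None):
            if line.strip():  # ignore blank lines
                yield parse_address(line)


# The example line from this post.
sample = (
    "06309|서울특별시|Seoul|강남구|Gangnam-gu|||116804166060|개포로30길|"
    "Gaepo-ro 30-gil|0|15|0|1168010300102080012021276||LG전선|1168010300|"
    "개포동||개포4동|0|1208|01|12|135962|001"
)
record = parse_address(sample)
print(record["area_code"], record["city_en"], record["street_name_en"])
# prints: 06309 Gangnam-gu Gaepo-ro 30-gil
```

Empty fields, such as sub_city in the example line, come back as empty strings rather than being dropped, so every record has the same 26 keys.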
