1. Specifications of my MacBook
Processor Name | Intel Core i7
Processor Speed | 2.3 GHz
Number of Processors | 1
Total Number of Cores | 4
L2 Cache (per Core) | 256 KB
L3 Cache | 6 MB
Memory | 16 GB
2. Dataset: Address DB
(Updated on Aug. 6, 2015: Use new address files with a new 5 digit postal code)
In my posts, I will use Korean address data published by Korea Post. The data set contains 6,071,307 records in 17 files, so it is reasonably large for my Solr and/or Hadoop examples.
The files are available in my public repository at https://bitbucket.org/jihwan11/openfiles
The data describes addresses in Korea, and most of its contents are written in Korean. Nevertheless, some fields such as city_en, state_en, etc. are in English. I think these fields will be enough for my examples.
Each line of each file describes one address and each field is separated with a | (vertical bar).
This is the format of each line.
area_code|state|state_en|city|city_en|sub_city|sub_city_en|street_code|street_name|street_name_en|is_basement|building_num|building_num_sub|building_mgm_num|bulk_delivery_place_name|building_name|legal_dong_code|legal_dong_name|ri_name|admin_dong_name|is_mountain|ground_num|dong_seq|ground_num_sub|postal_code|postal_code_seq
area_code is the new 5-digit postal code and may have a leading 0, so it should be handled as text rather than as a number. postal_code and postal_code_seq hold the old postal code.
The data starts on the 3rd line of each file. An example line is:
06309|서울특별시|Seoul|강남구|Gangnam-gu|||116804166060|개포로30길|Gaepo-ro 30-gil|0|15|0|1168010300102080012021276||LG전선|1168010300|개포동||개포4동|0|1208|01|12|135962|001
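To make the format concrete, here is a minimal Python sketch that splits one line into named fields. The field names come from the format shown above. The helper names parse_line and read_addresses are my own for illustration, and the UTF-8 default encoding is an assumption: check the actual encoding of the files before relying on it.

```python
from typing import Dict, Iterator

# Field names in the order given by the line format above.
FIELDS = (
    "area_code|state|state_en|city|city_en|sub_city|sub_city_en|"
    "street_code|street_name|street_name_en|is_basement|building_num|"
    "building_num_sub|building_mgm_num|bulk_delivery_place_name|"
    "building_name|legal_dong_code|legal_dong_name|ri_name|"
    "admin_dong_name|is_mountain|ground_num|dong_seq|ground_num_sub|"
    "postal_code|postal_code_seq"
).split("|")


def parse_line(line: str) -> Dict[str, str]:
    """Split one address line into a dict keyed by field name.

    Every value stays a string, so a leading 0 in area_code is preserved.
    """
    values = line.rstrip("\r\n").split("|")
    if len(values) != len(FIELDS):
        raise ValueError(
            f"expected {len(FIELDS)} fields, got {len(values)}"
        )
    return dict(zip(FIELDS, values))


def read_addresses(path: str, encoding: str = "utf-8") -> Iterator[Dict[str, str]]:
    """Yield one record per data line, skipping the first two lines.

    The encoding default is an assumption, not something the data set states.
    """
    with open(path, encoding=encoding) as f:
        for line_no, line in enumerate(f, start=1):
            if line_no < 3 or not line.strip():
                continue  # data starts on the 3rd line; ignore blank lines
            yield parse_line(line)


# The example line from this post.
SAMPLE = (
    "06309|서울특별시|Seoul|강남구|Gangnam-gu|||116804166060|개포로30길|"
    "Gaepo-ro 30-gil|0|15|0|1168010300102080012021276||LG전선|1168010300|"
    "개포동||개포4동|0|1208|01|12|135962|001"
)

record = parse_line(SAMPLE)
print(record["area_code"], record["city_en"], record["street_name_en"])
```

Empty fields such as sub_city and ri_name in the example line come back as empty strings, which keeps every record at the same 26 keys.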