Using all the settings explained in the previous post, let's run a full import from a database and build the index on the Solr server using the Solr web UI.
1. Open the Solr web UI (e.g., http://localhost:8983/solr/#/) --> Select the "addressDB" core --> Click "Dataimport" --> Make sure the "full-import" command is selected.
2. Click the "Execute" button. You may also click the "Refresh Status" button whenever you want to see an updated status. Below is the result of the data import using all the configurations from the previous post.
All the data was imported, but it took 27 m 45 s for 6,071,307 rows. That is very slow, especially compared to the 410 seconds it took to import the same data with the SolrJ client in a previous post.
The numbers of 'Requests' and 'Fetched' are very large, and the number of fetched rows is double the number 'Processed'. In fact, DIH joins behave more like subqueries in a database query: the child entity's query is executed once for every row returned by the parent entity.
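To make the per-row behavior concrete, here is a hypothetical sketch of the hierarchical structure used in the previous post (the entity and column names are assumed from this post's schema, not copied from the original configuration):

```xml
<!-- Hypothetical sketch (names assumed): for each row of the outer
     "address" entity, DIH issues the inner "state" query separately,
     roughly one extra request per parent row. This is what drives the
     large 'Requests' and 'Fetched' numbers. -->
<document>
  <entity name="address" dataSource="addressDB"
          query="SELECT ... FROM address_db">
    <entity name="state" dataSource="addressDB"
            query="SELECT state, state_en FROM address_state
                   WHERE address_state_id = '${address.state_id}'">
      <field column="state" name="state" />
      <field column="state_en" name="state_en" />
    </entity>
  </entity>
</document>
```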
- A Different data-config.xml Structure and a Performance Comparison
Instead of using a hierarchical <entity> structure, let's use a single <entity> whose query attribute contains a join SQL statement.
Every other setting/configuration is the same as the one shown in the previous post, except for the following <entity> definition in the data-config.xml file.
<document>
<entity name="address" transformer="script:getFullName" dataSource="addressDB" query="SELECT address_db_id, area_code, state, state_en, city, city_en, sub_city, sub_city_en, street_name, street_name_en, building_num, building_num_sub, bulk_delivery_place_name, building_name, legal_dong_name, ri_name, admin_dong_name, ground_num, ground_num_sub, dong_seq, postal_code FROM address_db, address_state
WHERE state_id = address_state_id">
<field column="address_db_new_id" name="addressId" />
<field column="area_code" name="areaCode" />
<field column="postal_code" name="postalCode" />
<field column="city" name="city" />
<field column="city_en" name="city_en" />
<field column="sub_city" name="subCity" />
<field column="sub_city_en" name="subCity_en" />
<field column="street_name" name="streetName" />
<field column="street_name_en" name="streetName_en" />
<field column="building_number" name="buildingNumber" />
<field column="bulk_delivery_place_name" name="bulkDeliveryPlaceName"/>
<field column="building_name" name="buildingName"/>
<field column="legal_dong_name" name="dongName"/>
<field column="admin_dong_name" name="adminDongName"/>
<field column="ri_name" name="riName"/>
<field column="ground_number" name="groundNumber"/>
<field column="dong_seq" name="dongSeq"/>
<field column="state" name="state" />
<field column="state_en" name="state_en" />
</entity>
</document>
This is the result of the full import. 4 m 43 s (283 seconds) for the same data with only 1 request!
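To put the three timings side by side, here is a quick back-of-the-envelope throughput calculation (the timings are the ones reported in this and the previous post; the script itself is only illustrative):

```python
# Rough throughput comparison of the three import runs discussed so far.
docs = 6_071_307
runs = {
    "DIH, nested entities": 27 * 60 + 45,  # 27 m 45 s
    "SolrJ client":         410,           # from the previous post
    "DIH, single join":     4 * 60 + 43,   # 4 m 43 s
}
for name, seconds in runs.items():
    print(f"{name}: {docs / seconds:,.0f} docs/s")
```

The single joined query roughly sextuples the indexing rate of the nested-entity configuration.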
Thing to Consider: Running a single (joined) query in one <entity> element generally performs much better, but we should consider the total data size to be processed on the database and during the data transfer. When the data size is very large, the query can consume a lot of memory on the database server, and with a very large transfer it can take a long time for Solr to receive and process the first rows.
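One way to mitigate this (a sketch, not part of the configuration used in this post) is DIH's JdbcDataSource batchSize attribute, which is passed to the JDBC driver as the fetch size. If the database is MySQL, the conventional value "-1" makes the driver stream rows instead of buffering the whole result set:

```xml
<!-- Sketch: dataSource definition with a streaming fetch size.
     The URL, user, and password are placeholders; batchSize="-1"
     is the MySQL driver convention for row-by-row streaming. -->
<dataSource name="addressDB" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/addressDB"
            user="..." password="..." batchSize="-1" />
```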
Comment: During data updates, including imports, creating/recreating a Searcher and the warm-up process are important concepts for Solr search. Unless I mention otherwise, I used the default configuration throughout. For example, we briefly talked about the 'commit' method using SolrJ in a previous post. During a data import, the following configurations in solrconfig.xml play a role.
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>
This web page is a good reference on soft commits and hard commits.