Issue
This content is from Stack Overflow. Question asked by Vovin.
I use the BallTree structure from scikit-learn to find the nearest points within a radius. I have a geometry array consisting of 1203 points. It works well. For example, let's find the nearest points within 100 km of an isolate from Fujisawa:
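from sklearn.neighbors import BallTree

# The haversine metric expects [latitude, longitude] pairs in radians
ball_tree = BallTree(
    geometry_array,
    metric="haversine",
)
# Position of the isolate of interest in the DataFrame index
isolate_index = foreign_AVP1.index.get_loc("LC035020.1")
# query_radius returns one index array per query point; .item() unwraps the single result
# kmToRadians is my own helper (definition not shown here)
near_indices = ball_tree.query_radius(
    geometry_array[isolate_index].reshape(1, -1), r=kmToRadians(100)
).item()
foreign_AVP1.iloc[near_indices].locale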
The head of the output:
taxon | locale |
---|---|
LC037396.1 | Japan Yokohama |
LC035020.1 | Japan Fujisawa |
LC037395.1 | Japan Yokohama |
LC014787.1 | Japan Tokyo |
AB279734.1 | Japan Tokyo |
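The query above passes r=kmToRadians(100), but the helper itself is not shown. With metric="haversine", scikit-learn works with [latitude, longitude] in radians and measures distance as an angle in radians, so a radius in kilometres is typically divided by the Earth's radius. Below is a minimal sketch of that conversion, not the original helper: the name km_to_radians, the mean Earth radius of 6371.0088 km, and the approximate city coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def km_to_radians(km: float) -> float:
    """Convert a surface distance in km to an angle in radians (hypothetical helper)."""
    return km / EARTH_RADIUS_KM


def haversine_radians(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Central angle between two points whose coordinates are given in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * math.asin(math.sqrt(a))


# Approximate coordinates (assumed), converted from degrees to radians
fujisawa = (math.radians(35.34), math.radians(139.49))
yokohama = (math.radians(35.44), math.radians(139.64))

radius = km_to_radians(100)
angle = haversine_radians(*fujisawa, *yokohama)
within_radius = angle <= radius  # the same comparison a radius query makes per point
print(within_radius)
```

Keeping both the coordinates and the radius in radians avoids mixing units; multiplying an angle by EARTH_RADIUS_KM converts it back to kilometres.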
I was curious about the tree structure and called the get_tree_stats method to find out the number of leaves. Unexpectedly, I got a tuple of zeros: (0, 0, 0). This is strange, because with the default leaf size of 40 and 1203 samples the tree should have about 16 to 30 leaves. When I used the get_arrays method to get the raw data, I saw a normal picture (this is just a part of the output); there are 16 leaves:
array([( 0, 1203, 0, 3.0947303 ), ( 0, 601, 0, 2.41933654),
( 601, 1203, 0, 0.58490656), ( 0, 300, 0, 1.85505229),
( 300, 601, 0, 0.89693236), ( 601, 902, 0, 0.49983081),
( 902, 1203, 0, 0.40970688), ( 0, 150, 0, 1.48087714),
( 150, 300, 0, 0.32291627), ( 300, 450, 0, 0.20494666),
( 450, 601, 0, 0.81787611), ( 601, 751, 0, 0.55418562),
( 751, 902, 0, 0.33448889), ( 902, 1052, 0, 0.01141954),
(1052, 1203, 0, 0.39229974), ( 0, 75, 1, 1.00467692),
( 75, 150, 1, 1.07852748), ( 150, 225, 1, 0.25955598),
( 225, 300, 1, 0.24115038), ( 300, 375, 1, 0.12721095),
( 375, 450, 1, 0.1702189 ), ( 450, 525, 1, 0.78454107),
( 525, 601, 1, 0.09367431), ( 601, 676, 1, 0.0368483 ),
( 676, 751, 1, 0.4762352 ), ( 751, 826, 1, 0.2375712 ),
( 826, 902, 1, 0.17776855), ( 902, 977, 1, 0.01107241),
( 977, 1052, 1, 0.0118055 ), (1052, 1127, 1, 0.32023704),
(1127, 1203, 1, 0.09146244)],
dtype=[('idx_start', '<i8'), ('idx_end', '<i8'), ('is_leaf', '<i8'), ('radius', '<f8')]),
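The 16 leaves in the node array above are consistent with how the tree appears to be sized. The following is a minimal sketch, assuming scikit-learn derives the number of levels as int(log2(max(1, (n_samples - 1) / leaf_size)) + 1) and then builds a complete binary tree. That rule is recalled from the library's source rather than from its documented API, so it may differ between versions, and the function name expected_tree_shape is hypothetical.

```python
import math


def expected_tree_shape(n_samples: int, leaf_size: int = 40) -> tuple[int, int, int]:
    """Return (n_levels, n_nodes, n_leaves) under the assumed sizing rule."""
    # Assumed rule: number of levels of a complete binary tree
    n_levels = int(math.log2(max(1, (n_samples - 1) / leaf_size)) + 1)
    n_nodes = 2 ** n_levels - 1        # every level is completely filled
    n_leaves = 2 ** (n_levels - 1)     # the leaves form the last level
    return n_levels, n_nodes, n_leaves


levels, nodes, leaves = expected_tree_shape(1203, leaf_size=40)
print(levels, nodes, leaves)
```

For 1203 samples and a leaf size of 40 this gives 5 levels, 31 nodes, and 16 leaves, which matches the 31 rows shown above and the 16 rows with is_leaf equal to 1. Each leaf holds about 75 points, which fits the documented bound of leaf_size <= n_points <= 2 * leaf_size.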
Why does the get_tree_stats method behave this way?
Solution
This question has not yet been answered.
This question and answer were collected from Stack Overflow, tested by the JTuto community, and are licensed under the terms of CC BY-SA 2.5, CC BY-SA 3.0, CC BY-SA 4.0.