nanfunctions.py 58 KB

"""
Functions that ignore NaN.

Functions
---------

- `nanmin` -- minimum non-NaN value
- `nanmax` -- maximum non-NaN value
- `nanargmin` -- index of minimum non-NaN value
- `nanargmax` -- index of maximum non-NaN value
- `nansum` -- sum of non-NaN values
- `nanprod` -- product of non-NaN values
- `nancumsum` -- cumulative sum of non-NaN values
- `nancumprod` -- cumulative product of non-NaN values
- `nanmean` -- mean of non-NaN values
- `nanvar` -- variance of non-NaN values
- `nanstd` -- standard deviation of non-NaN values
- `nanmedian` -- median of non-NaN values
- `nanquantile` -- qth quantile of non-NaN values
- `nanpercentile` -- qth percentile of non-NaN values

"""
import functools
import warnings
import numpy as np
from numpy.lib import function_base
from numpy.core import overrides


array_function_dispatch = functools.partial(
    overrides.array_function_dispatch, module='numpy')


__all__ = [
    'nansum', 'nanmax', 'nanmin', 'nanargmax', 'nanargmin', 'nanmean',
    'nanmedian', 'nanpercentile', 'nanvar', 'nanstd', 'nanprod',
    'nancumsum', 'nancumprod', 'nanquantile'
    ]


def _nan_mask(a, out=None):
    """
    Parameters
    ----------
    a : array-like
        Input array with at least 1 dimension.
    out : ndarray, optional
        Alternate output array in which to place the result.  The default
        is ``None``; if provided, it must have the same shape as the
        expected output and will prevent the allocation of a new array.

    Returns
    -------
    y : bool ndarray or True
        A bool array where ``np.nan`` positions are marked with ``False``
        and other positions are marked with ``True``. If the type of ``a``
        is such that it can't possibly contain ``np.nan``, returns ``True``.
    """
    # we assume that a is an array for this private function

    if a.dtype.kind not in 'fc':
        return True

    y = np.isnan(a, out=out)
    y = np.invert(y, out=y)
    return y
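
# Illustrative sketch, not part of this module: one way the "True means
# keep" mask described above can be used, assuming a reduction that accepts
# a ``where=`` argument. The name `not_nan_mask` is hypothetical.

```python
import numpy as np


def not_nan_mask(a):
    """Return True where `a` is not NaN, or plain True for non-float dtypes."""
    a = np.asanyarray(a)
    # Only float ('f') and complex ('c') dtypes can hold NaN.
    if a.dtype.kind not in 'fc':
        return True
    return ~np.isnan(a)


x = np.array([1.0, np.nan, 3.0])
mask = not_nan_mask(x)
# The mask can be passed straight to a reduction as its `where` argument.
total = np.sum(x, where=mask)

# Integer input cannot contain NaN, so the mask collapses to True.
int_mask = not_nan_mask(np.array([1, 2, 3]))
```

# Returning a bare ``True`` avoids allocating a full-size boolean array when
# the dtype makes NaN impossible.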


def _replace_nan(a, val):
    """
    If `a` is of inexact type, make a copy of `a`, replace NaNs with
    the `val` value, and return the copy together with a boolean mask
    marking the locations where NaNs were present. If `a` is not of
    inexact type, do nothing and return `a` together with a mask of None.

    Note that scalars will end up as array scalars, which is important
    for using the result as the value of the out argument in some
    operations.

    Parameters
    ----------
    a : array-like
        Input array.
    val : float
        NaN values are set to val before doing the operation.

    Returns
    -------
    y : ndarray
        If `a` is of inexact type, return a copy of `a` with the NaNs
        replaced by the fill value, otherwise return `a`.
    mask: {bool, None}
        If `a` is of inexact type, return a boolean mask marking locations of
        NaNs, otherwise return None.

    """
    a = np.asanyarray(a)

    if a.dtype == np.object_:
        # object arrays do not support `isnan` (gh-9009), so make a guess
        mask = np.not_equal(a, a, dtype=bool)
    elif issubclass(a.dtype.type, np.inexact):
        mask = np.isnan(a)
    else:
        mask = None

    if mask is not None:
        a = np.array(a, subok=True, copy=True)
        np.copyto(a, val, where=mask)

    return a, mask
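
# Illustrative sketch, not part of this module: the copy-then-fill pattern
# described above, reduced to the float/complex case. The name `fill_nan` is
# hypothetical, and the object-array branch is deliberately left out.

```python
import numpy as np


def fill_nan(a, val):
    """Return (copy of `a` with NaNs set to `val`, NaN mask), or (a, None)."""
    a = np.asanyarray(a)
    if not issubclass(a.dtype.type, np.inexact):
        # Integer, bool, etc. cannot hold NaN: hand the input back untouched.
        return a, None
    mask = np.isnan(a)
    filled = np.array(a, subok=True, copy=True)
    np.copyto(filled, val, where=mask)
    return filled, mask


x = np.array([1.0, np.nan, 2.0])
filled, mask = fill_nan(x, 0)

ints = np.array([1, 2])
same, no_mask = fill_nan(ints, 0)
```

# The input `x` is never modified; only the copy receives the fill value.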


def _copyto(a, val, mask):
    """
    Replace values in `a` with `val` (typically NaN) where `mask` is True.
    This differs from copyto in that it will deal with the case where `a`
    is a numpy scalar.

    Parameters
    ----------
    a : ndarray or numpy scalar
        Array or numpy scalar some of whose values are to be replaced
        by val.
    val : numpy scalar
        Value used as a replacement.
    mask : ndarray, scalar
        Boolean array. Where True the corresponding element of `a` is
        replaced by `val`. Broadcasts.

    Returns
    -------
    res : ndarray, scalar
        Array with elements replaced or scalar `val`.

    """
    if isinstance(a, np.ndarray):
        np.copyto(a, val, where=mask, casting='unsafe')
    else:
        a = a.dtype.type(val)
    return a
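
# Illustrative sketch, not part of this module: why the scalar branch above
# is needed. `np.copyto` writes into an array, but a numpy scalar is
# immutable, so the replacement has to be a new scalar of the same type.
# The name `put_where` is hypothetical.

```python
import numpy as np


def put_where(a, val, mask):
    """Write `val` into `a` where `mask` is True; rebuild `a` if it is a scalar."""
    if isinstance(a, np.ndarray):
        np.copyto(a, val, where=mask, casting='unsafe')
        return a
    # Numpy scalar: build a new scalar of the same dtype holding `val`.
    return a.dtype.type(val)


arr = np.array([1.0, 2.0])
arr_out = put_where(arr, np.nan, np.array([False, True]))

scalar_out = put_where(np.float32(3.0), np.nan, True)
```

# The array case is modified in place, while the scalar case keeps the
# original scalar type (here ``np.float32``).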


def _remove_nan_1d(arr1d, overwrite_input=False):
    """
    Equivalent to arr1d[~np.isnan(arr1d)], but in a different order.

    Presumably faster as it incurs fewer copies.

    Parameters
    ----------
    arr1d : ndarray
        Array to remove nans from
    overwrite_input : bool
        True if `arr1d` can be modified in place

    Returns
    -------
    res : ndarray
        Array with nan elements removed
    overwrite_input : bool
        True if `res` can be modified in place, given the constraint on the
        input
    """

    c = np.isnan(arr1d)
    s = np.nonzero(c)[0]
    if s.size == arr1d.size:
        warnings.warn("All-NaN slice encountered", RuntimeWarning,
                      stacklevel=5)
        return arr1d[:0], True
    elif s.size == 0:
        return arr1d, overwrite_input
    else:
        if not overwrite_input:
            arr1d = arr1d.copy()
        # select non-nans at end of array
        enonan = arr1d[-s.size:][~c[-s.size:]]
        # fill nans in beginning of array with non-nans of end
        arr1d[s[:enonan.size]] = enonan

        return arr1d[:-s.size], True
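
# Illustrative sketch, not part of this module: the compaction trick above on
# a small input. With k NaNs in total, the non-NaN values found in the last k
# slots are moved into the NaN slots nearer the front, and the last k slots
# are then sliced off. The name `drop_nan_1d` is hypothetical; unlike the
# function above it always copies and does not warn.

```python
import numpy as np


def drop_nan_1d(arr1d):
    """Return the non-NaN values of a 1-d array, order not preserved."""
    c = np.isnan(arr1d)
    s = np.nonzero(c)[0]              # indices of the NaNs
    if s.size == arr1d.size:
        return arr1d[:0]              # everything is NaN
    if s.size == 0:
        return arr1d                  # nothing to remove
    arr1d = arr1d.copy()
    # Non-NaN values sitting in the last s.size slots.
    enonan = arr1d[-s.size:][~c[-s.size:]]
    # Move them into the earliest NaN slots.
    arr1d[s[:enonan.size]] = enonan
    return arr1d[:-s.size]


x = np.array([np.nan, 1.0, 2.0, np.nan, 3.0])
compact = drop_nan_1d(x)              # → array([3., 1., 2.])
all_nan = drop_nan_1d(np.array([np.nan, np.nan]))
```

# The result holds the same values as ``x[~np.isnan(x)]`` but in a different
# order, which is fine for order-independent statistics such as the median.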


def _divide_by_count(a, b, out=None):
    """
    Compute a/b ignoring invalid results. If `a` is an array the division
    is done in place. If `a` is a scalar, then its type is preserved in the
    output. If out is None, then `a` is used instead so that the
    division is in place. Note that this is only called with `a` an inexact
    type.

    Parameters
    ----------
    a : {ndarray, numpy scalar}
        Numerator. Expected to be of inexact type but not checked.
    b : {ndarray, numpy scalar}
        Denominator.
    out : ndarray, optional
        Alternate output array in which to place the result.  The default
        is ``None``; if provided, it must have the same shape as the
        expected output, but the type will be cast if necessary.

    Returns
    -------
    ret : {ndarray, numpy scalar}
        The return value is a/b. If `a` was an ndarray the division is done
        in place. If `a` is a numpy scalar, the division preserves its type.

    """
    with np.errstate(invalid='ignore', divide='ignore'):
        if isinstance(a, np.ndarray):
            if out is None:
                return np.divide(a, b, out=a, casting='unsafe')
            else:
                return np.divide(a, b, out=out, casting='unsafe')
        else:
            if out is None:
                return a.dtype.type(a / b)
            else:
                # This is questionable, but currently a numpy scalar can
                # be output to a zero dimensional array.
                return np.divide(a, b, out=out, casting='unsafe')
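
# Illustrative sketch, not part of this module: how a "divide by count" step
# like the one above can back a NaN-ignoring mean. This is an assumption
# about usage, not a copy of the module's own `nanmean`; the name
# `mean_ignoring_nan` is hypothetical.

```python
import numpy as np


def mean_ignoring_nan(a, axis=None):
    """Mean over `axis` that skips NaNs; all-NaN slices yield NaN."""
    a = np.asarray(a, dtype=float)
    mask = np.isnan(a)
    total = np.where(mask, 0.0, a).sum(axis=axis)   # NaNs contribute zero
    count = (~mask).sum(axis=axis)                  # number of valid entries
    # 0/0 for an all-NaN slice is expected, so silence the floating-point
    # warnings for the division only.
    with np.errstate(invalid='ignore', divide='ignore'):
        return total / count


x = np.array([[1.0, np.nan],
              [3.0, np.nan]])
col_means = mean_ignoring_nan(x, axis=0)
```

# The first column averages to 2.0; the second column has a count of zero,
# so its result is NaN rather than an exception.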


def _nanmin_dispatcher(a, axis=None, out=None, keepdims=None):
    return (a, out)


@array_function_dispatch(_nanmin_dispatcher)
def nanmin(a, axis=None, out=None, keepdims=np._NoValue):
    """
    Return minimum of an array or minimum along an axis, ignoring any NaNs.
    When all-NaN slices are encountered a ``RuntimeWarning`` is raised and
    NaN is returned for that slice.

    Parameters
    ----------
    a : array_like
        Array containing numbers whose minimum is desired. If `a` is not an
        array, a conversion is attempted.
    axis : {int, tuple of int, None}, optional
        Axis or axes along which the minimum is computed. The default is to compute
        the minimum of the flattened array.
    out : ndarray, optional
        Alternate output array in which to place the result.  The default
        is ``None``; if provided, it must have the same shape as the
        expected output, but the type will be cast if necessary. See
        :ref:`ufuncs-output-type` for more details.

        .. versionadded:: 1.8.0
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left
        in the result as dimensions with size one. With this option,
        the result will broadcast correctly against the original `a`.

        If the value is anything but the default, then
        `keepdims` will be passed through to the `min` method
        of sub-classes of `ndarray`.  If the sub-class' method
        does not implement `keepdims`, any exceptions will be raised.

        .. versionadded:: 1.8.0

    Returns
    -------
    nanmin : ndarray
        An array with the same shape as `a`, with the specified axis
        removed.  If `a` is a 0-d array, or if axis is None, an ndarray
        scalar is returned.  The same dtype as `a` is returned.

    See Also
    --------
    nanmax :
        The maximum value of an array along a given axis, ignoring any NaNs.
    amin :
        The minimum value of an array along a given axis, propagating any NaNs.
    fmin :
        Element-wise minimum of two arrays, ignoring any NaNs.
    minimum :
        Element-wise minimum of two arrays, propagating any NaNs.
    isnan :
        Shows which elements are Not a Number (NaN).
    isfinite :
        Shows which elements are neither NaN nor infinity.

    amax, fmax, maximum

    Notes
    -----
    NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic
    (IEEE 754). This means that Not a Number is not equivalent to infinity.
    Positive infinity is treated as a very large number and negative
    infinity is treated as a very small (i.e. negative) number.

    If the input has an integer type the function is equivalent to np.min.

    Examples
    --------
    >>> a = np.array([[1, 2], [3, np.nan]])
    >>> np.nanmin(a)
    1.0
    >>> np.nanmin(a, axis=0)
    array([1.,  2.])
    >>> np.nanmin(a, axis=1)
    array([1.,  3.])

    When positive infinity and negative infinity are present:

    >>> np.nanmin([1, 2, np.nan, np.inf])
    1.0
    >>> np.nanmin([1, 2, np.nan, np.NINF])
    -inf

    """
    kwargs = {}
    if keepdims is not np._NoValue:
        kwargs['keepdims'] = keepdims
    if type(a) is np.ndarray and a.dtype != np.object_:
        # Fast, but not safe for subclasses of ndarray, or object arrays,
        # which do not implement isnan (gh-9009), or fmin correctly (gh-8975)
        res = np.fmin.reduce(a, axis=axis, out=out, **kwargs)
        if np.isnan(res).any():
            warnings.warn("All-NaN slice encountered", RuntimeWarning,
                          stacklevel=3)
    else:
        # Slow, but safe for subclasses of ndarray
        a, mask = _replace_nan(a, +np.inf)
        res = np.amin(a, axis=axis, out=out, **kwargs)
        if mask is None:
            return res

        # Check for all-NaN axis
        mask = np.all(mask, axis=axis, **kwargs)
        if np.any(mask):
            res = _copyto(res, np.nan, mask)
            warnings.warn("All-NaN axis encountered", RuntimeWarning,
                          stacklevel=3)
    return res
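
# Illustrative usage sketch, not part of this module: the all-NaN-slice
# behaviour documented above, observed through the public `np.nanmin`, next
# to the `np.fmin.reduce` call that the fast path relies on.

```python
import warnings

import numpy as np

a = np.array([[1.0, np.nan],
              [np.nan, np.nan]])

# Record warnings instead of printing them so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    row_min = np.nanmin(a, axis=1)

# fmin ignores a NaN operand when the other operand is a number, so reducing
# with it skips NaNs; a slice made only of NaNs still reduces to NaN.
fast = np.fmin.reduce(a, axis=1)
```

# The second row contains only NaNs, so its minimum is NaN and a
# ``RuntimeWarning`` is recorded in `caught`.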


def _nanmax_dispatcher(a, axis=None, out=None, keepdims=None):
    return (a, out)


@array_function_dispatch(_nanmax_dispatcher)
def nanmax(a, axis=None, out=None, keepdims=np._NoValue):
    """
    Return the maximum of an array or maximum along an axis, ignoring any
    NaNs.  When all-NaN slices are encountered a ``RuntimeWarning`` is
    raised and NaN is returned for that slice.

    Parameters
    ----------
    a : array_like
        Array containing numbers whose maximum is desired. If `a` is not an
        array, a conversion is attempted.
    axis : {int, tuple of int, None}, optional
        Axis or axes along which the maximum is computed. The default is to compute
        the maximum of the flattened array.
    out : ndarray, optional
        Alternate output array in which to place the result.  The default
        is ``None``; if provided, it must have the same shape as the
        expected output, but the type will be cast if necessary. See
        :ref:`ufuncs-output-type` for more details.

        .. versionadded:: 1.8.0
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left
        in the result as dimensions with size one. With this option,
        the result will broadcast correctly against the original `a`.

        If the value is anything but the default, then
        `keepdims` will be passed through to the `max` method
        of sub-classes of `ndarray`.  If the sub-class' method
        does not implement `keepdims`, any exceptions will be raised.

        .. versionadded:: 1.8.0

    Returns
    -------
    nanmax : ndarray
        An array with the same shape as `a`, with the specified axis removed.
        If `a` is a 0-d array, or if axis is None, an ndarray scalar is
        returned.  The same dtype as `a` is returned.

    See Also
    --------
    nanmin :
        The minimum value of an array along a given axis, ignoring any NaNs.
    amax :
        The maximum value of an array along a given axis, propagating any NaNs.
    fmax :
        Element-wise maximum of two arrays, ignoring any NaNs.
    maximum :
        Element-wise maximum of two arrays, propagating any NaNs.
    isnan :
        Shows which elements are Not a Number (NaN).
    isfinite :
        Shows which elements are neither NaN nor infinity.

    amin, fmin, minimum

    Notes
    -----
    NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic
    (IEEE 754). This means that Not a Number is not equivalent to infinity.
    Positive infinity is treated as a very large number and negative
    infinity is treated as a very small (i.e. negative) number.

    If the input has an integer type the function is equivalent to np.max.

    Examples
    --------
    >>> a = np.array([[1, 2], [3, np.nan]])
    >>> np.nanmax(a)
    3.0
    >>> np.nanmax(a, axis=0)
    array([3.,  2.])
    >>> np.nanmax(a, axis=1)
    array([2.,  3.])

    When positive infinity and negative infinity are present:

    >>> np.nanmax([1, 2, np.nan, np.NINF])
    2.0
    >>> np.nanmax([1, 2, np.nan, np.inf])
    inf

    """
    kwargs = {}
    if keepdims is not np._NoValue:
        kwargs['keepdims'] = keepdims
    if type(a) is np.ndarray and a.dtype != np.object_:
        # Fast, but not safe for subclasses of ndarray, or object arrays,
        # which do not implement isnan (gh-9009), or fmax correctly (gh-8975)
        res = np.fmax.reduce(a, axis=axis, out=out, **kwargs)
        if np.isnan(res).any():
            warnings.warn("All-NaN slice encountered", RuntimeWarning,
                          stacklevel=3)
    else:
        # Slow, but safe for subclasses of ndarray
        a, mask = _replace_nan(a, -np.inf)
        res = np.amax(a, axis=axis, out=out, **kwargs)
        if mask is None:
            return res

        # Check for all-NaN axis
        mask = np.all(mask, axis=axis, **kwargs)
        if np.any(mask):
            res = _copyto(res, np.nan, mask)
            warnings.warn("All-NaN axis encountered", RuntimeWarning,
                          stacklevel=3)
    return res


def _nanargmin_dispatcher(a, axis=None):
    return (a,)


@array_function_dispatch(_nanargmin_dispatcher)
def nanargmin(a, axis=None):
    """
    Return the indices of the minimum values in the specified axis ignoring
    NaNs. For all-NaN slices ``ValueError`` is raised. Warning: the results
    cannot be trusted if a slice contains only NaNs and Infs.

    Parameters
    ----------
    a : array_like
        Input data.
    axis : int, optional
        Axis along which to operate.  By default flattened input is used.

    Returns
    -------
    index_array : ndarray
        An array of indices or a single index value.

    See Also
    --------
    argmin, nanargmax

    Examples
    --------
    >>> a = np.array([[np.nan, 4], [2, 3]])
    >>> np.argmin(a)
    0
    >>> np.nanargmin(a)
    2
    >>> np.nanargmin(a, axis=0)
    array([1, 1])
    >>> np.nanargmin(a, axis=1)
    array([1, 0])

    """
    a, mask = _replace_nan(a, np.inf)
    res = np.argmin(a, axis=axis)
    if mask is not None:
        mask = np.all(mask, axis=axis)
        if np.any(mask):
            raise ValueError("All-NaN slice encountered")
    return res
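
# Illustrative usage sketch, not part of this module: the two caveats in the
# docstring above. An all-NaN input raises ``ValueError``, and because NaNs
# are replaced by +inf before ``argmin`` runs, a slice holding only NaNs and
# +inf can report the position of a NaN.

```python
import numpy as np

a = np.array([[np.nan, 4.0],
              [2.0, 3.0]])

# With axis=None the index refers to the flattened array.
flat_index = np.nanargmin(a)
row, col = np.unravel_index(flat_index, a.shape)

# All-NaN input: there is no valid index to return.
try:
    np.nanargmin(np.array([np.nan, np.nan]))
    raised = False
except ValueError:
    raised = True

# Only NaN and +inf: after the fill, both entries compare equal, so the
# first position wins even though it held a NaN.
ambiguous = np.nanargmin(np.array([np.nan, np.inf]))
```

# ``np.unravel_index`` converts the flat index back to a (row, column) pair
# when the location in the original 2-d array is needed.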


def _nanargmax_dispatcher(a, axis=None):
    return (a,)


@array_function_dispatch(_nanargmax_dispatcher)
def nanargmax(a, axis=None):
    """
    Return the indices of the maximum values in the specified axis ignoring
    NaNs. For all-NaN slices ``ValueError`` is raised. Warning: the
    results cannot be trusted if a slice contains only NaNs and -Infs.

    Parameters
    ----------
    a : array_like
        Input data.
    axis : int, optional
        Axis along which to operate.  By default flattened input is used.

    Returns
    -------
    index_array : ndarray
        An array of indices or a single index value.

    See Also
    --------
    argmax, nanargmin

    Examples
    --------
    >>> a = np.array([[np.nan, 4], [2, 3]])
    >>> np.argmax(a)
    0
    >>> np.nanargmax(a)
    1
    >>> np.nanargmax(a, axis=0)
    array([1, 0])
    >>> np.nanargmax(a, axis=1)
    array([1, 1])

    """
    a, mask = _replace_nan(a, -np.inf)
    res = np.argmax(a, axis=axis)
    if mask is not None:
        mask = np.all(mask, axis=axis)
        if np.any(mask):
            raise ValueError("All-NaN slice encountered")
    return res


def _nansum_dispatcher(a, axis=None, dtype=None, out=None, keepdims=None):
    return (a, out)


@array_function_dispatch(_nansum_dispatcher)
def nansum(a, axis=None, dtype=None, out=None, keepdims=np._NoValue):
    """
    Return the sum of array elements over a given axis treating Not a
    Numbers (NaNs) as zero.

    In NumPy versions <= 1.9.0 NaN is returned for slices that are all-NaN or
    empty. In later versions zero is returned.

    Parameters
    ----------
    a : array_like
        Array containing numbers whose sum is desired. If `a` is not an
        array, a conversion is attempted.
    axis : {int, tuple of int, None}, optional
        Axis or axes along which the sum is computed. The default is to compute the
        sum of the flattened array.
    dtype : data-type, optional
        The type of the returned array and of the accumulator in which the
        elements are summed.  By default, the dtype of `a` is used.  An
        exception is when `a` has an integer type with less precision than
        the platform (u)intp. In that case, the default will be either
        (u)int32 or (u)int64 depending on whether the platform is 32 or 64
        bits. For inexact inputs, dtype must be inexact.

        .. versionadded:: 1.8.0
    out : ndarray, optional
        Alternate output array in which to place the result.  The default
        is ``None``. If provided, it must have the same shape as the
        expected output, but the type will be cast if necessary.  See
        :ref:`ufuncs-output-type` for more details. The casting of NaN to integer
        can yield unexpected results.

        .. versionadded:: 1.8.0
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left
        in the result as dimensions with size one. With this option,
        the result will broadcast correctly against the original `a`.

        If the value is anything but the default, then
        `keepdims` will be passed through to the `mean` or `sum` methods
        of sub-classes of `ndarray`.  If the sub-class' method
        does not implement `keepdims`, any exceptions will be raised.

        .. versionadded:: 1.8.0

    Returns
    -------
    nansum : ndarray.
        A new array holding the result is returned unless `out` is
        specified, in which case it is returned. The result has the shape
        of `a` with the reduced axes removed, unless `keepdims` is True;
        if `axis` is None, a scalar is returned.

    See Also
    --------
    numpy.sum : Sum across array propagating NaNs.
    isnan : Show which elements are NaN.
    isfinite : Show which elements are not NaN or +/-inf.

    Notes
    -----
    If both positive and negative infinity are present, the sum will be Not
    A Number (NaN).

    Examples
    --------
    >>> np.nansum(1)
    1
    >>> np.nansum([1])
    1
    >>> np.nansum([1, np.nan])
    1.0
    >>> a = np.array([[1, 1], [1, np.nan]])
    >>> np.nansum(a)
    3.0
    >>> np.nansum(a, axis=0)
    array([2.,  1.])
    >>> np.nansum([1, np.nan, np.inf])
    inf
    >>> np.nansum([1, np.nan, np.NINF])
    -inf
    >>> from numpy.testing import suppress_warnings
    >>> with suppress_warnings() as sup:
    ...     sup.filter(RuntimeWarning)
    ...     np.nansum([1, np.nan, np.inf, -np.inf]) # both +/- infinity present
    nan

    """
    a, mask = _replace_nan(a, 0)
    return np.sum(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
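
# Illustrative usage sketch, not part of this module: the "NaN counts as
# zero" rule above, checked against a manual fill-then-sum, including a
# slice that contains nothing but NaNs.

```python
import numpy as np

a = np.array([[1.0, np.nan],
              [np.nan, np.nan]])

by_row = np.nansum(a, axis=1)

# The same result by hand: replace NaNs with the additive identity, then sum.
manual = np.where(np.isnan(a), 0.0, a).sum(axis=1)

# Reducing over everything gives a scalar.
overall = np.nansum(a)
```

# The all-NaN second row sums to 0.0, matching the behaviour described for
# NumPy versions later than 1.9.0.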


def _nanprod_dispatcher(a, axis=None, dtype=None, out=None, keepdims=None):
    return (a, out)


@array_function_dispatch(_nanprod_dispatcher)
def nanprod(a, axis=None, dtype=None, out=None, keepdims=np._NoValue):
    """
    Return the product of array elements over a given axis treating Not a
    Numbers (NaNs) as ones.

    One is returned for slices that are all-NaN or empty.

    .. versionadded:: 1.10.0

    Parameters
    ----------
    a : array_like
        Array containing numbers whose product is desired. If `a` is not an
        array, a conversion is attempted.
    axis : {int, tuple of int, None}, optional
        Axis or axes along which the product is computed. The default is to compute
        the product of the flattened array.
    dtype : data-type, optional
        The type of the returned array and of the accumulator in which the
        elements are multiplied.  By default, the dtype of `a` is used.  An
        exception is when `a` has an integer type with less precision than
        the platform (u)intp. In that case, the default will be either
        (u)int32 or (u)int64 depending on whether the platform is 32 or 64
        bits. For inexact inputs, dtype must be inexact.
    out : ndarray, optional
        Alternate output array in which to place the result.  The default
        is ``None``. If provided, it must have the same shape as the
        expected output, but the type will be cast if necessary.  See
        :ref:`ufuncs-output-type` for more details. The casting of NaN to integer
        can yield unexpected results.
    keepdims : bool, optional
        If True, the axes which are reduced are left in the result as
        dimensions with size one. With this option, the result will
        broadcast correctly against the original `a`.

    Returns
    -------
    nanprod : ndarray
        A new array holding the result is returned unless `out` is
        specified, in which case it is returned.

    See Also
    --------
    numpy.prod : Product across array propagating NaNs.
    isnan : Show which elements are NaN.

    Examples
    --------
    >>> np.nanprod(1)
    1
    >>> np.nanprod([1])
    1
    >>> np.nanprod([1, np.nan])
    1.0
    >>> a = np.array([[1, 2], [3, np.nan]])
    >>> np.nanprod(a)
    6.0
    >>> np.nanprod(a, axis=0)
    array([3., 2.])

    """
    a, mask = _replace_nan(a, 1)
    return np.prod(a, axis=axis, dtype=dtype, out=out, keepdims=keepdims)


def _nancumsum_dispatcher(a, axis=None, dtype=None, out=None):
    return (a, out)


@array_function_dispatch(_nancumsum_dispatcher)
def nancumsum(a, axis=None, dtype=None, out=None):
    """
    Return the cumulative sum of array elements over a given axis treating Not a
    Numbers (NaNs) as zero.  The cumulative sum does not change when NaNs are
    encountered and leading NaNs are replaced by zeros.

    Zeros are returned for slices that are all-NaN or empty.

    .. versionadded:: 1.12.0

    Parameters
    ----------
    a : array_like
        Input array.
    axis : int, optional
        Axis along which the cumulative sum is computed. The default
        (None) is to compute the cumsum over the flattened array.
    dtype : dtype, optional
        Type of the returned array and of the accumulator in which the
        elements are summed.  If `dtype` is not specified, it defaults
        to the dtype of `a`, unless `a` has an integer dtype with a
        precision less than that of the default platform integer.  In
        that case, the default platform integer is used.
    out : ndarray, optional
        Alternative output array in which to place the result. It must
        have the same shape and buffer length as the expected output
        but the type will be cast if necessary. See :ref:`ufuncs-output-type` for
        more details.

    Returns
    -------
    nancumsum : ndarray.
        A new array holding the result is returned unless `out` is
        specified, in which case it is returned. The result has the same
        size as `a`, and the same shape as `a` if `axis` is not None
        or `a` is a 1-d array.

    See Also
    --------
    numpy.cumsum : Cumulative sum across array propagating NaNs.
    isnan : Show which elements are NaN.

    Examples
    --------
    >>> np.nancumsum(1)
    array([1])
    >>> np.nancumsum([1])
    array([1])
    >>> np.nancumsum([1, np.nan])
    array([1.,  1.])
    >>> a = np.array([[1, 2], [3, np.nan]])
    >>> np.nancumsum(a)
    array([1.,  3.,  6.,  6.])
    >>> np.nancumsum(a, axis=0)
    array([[1.,  2.],
           [4.,  2.]])
    >>> np.nancumsum(a, axis=1)
    array([[1.,  3.],
           [3.,  3.]])

    """
    a, mask = _replace_nan(a, 0)
    return np.cumsum(a, axis=axis, dtype=dtype, out=out)
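
# Illustrative usage sketch, not part of this module: the statement above
# that the running total "does not change when NaNs are encountered",
# contrasted with plain `np.cumsum`, which propagates the first NaN to every
# later position.

```python
import numpy as np

a = np.array([np.nan, 1.0, np.nan, 2.0])

# The leading NaN becomes 0; later NaNs leave the running total unchanged.
running = np.nancumsum(a)        # → array([0., 1., 1., 3.])

# Plain cumsum: once a NaN enters the accumulator, every result is NaN.
propagating = np.cumsum(a)
```

# The output keeps the length of the input, so it lines up element by
# element with the original data.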


def _nancumprod_dispatcher(a, axis=None, dtype=None, out=None):
    return (a, out)


@array_function_dispatch(_nancumprod_dispatcher)
def nancumprod(a, axis=None, dtype=None, out=None):
    """
    Return the cumulative product of array elements over a given axis treating Not a
    Numbers (NaNs) as one.  The cumulative product does not change when NaNs are
    encountered and leading NaNs are replaced by ones.

    Ones are returned for slices that are all-NaN or empty.

    .. versionadded:: 1.12.0

    Parameters
    ----------
    a : array_like
        Input array.
    axis : int, optional
        Axis along which the cumulative product is computed.  By default
        the input is flattened.
    dtype : dtype, optional
        Type of the returned array, as well as of the accumulator in which
        the elements are multiplied.  If *dtype* is not specified, it
        defaults to the dtype of `a`, unless `a` has an integer dtype with
        a precision less than that of the default platform integer.  In
        that case, the default platform integer is used instead.
    out : ndarray, optional
        Alternative output array in which to place the result. It must
        have the same shape and buffer length as the expected output
        but the type of the resulting values will be cast if necessary.

    Returns
    -------
    nancumprod : ndarray
        A new array holding the result is returned unless `out` is
        specified, in which case it is returned.

    See Also
    --------
    numpy.cumprod : Cumulative product across array propagating NaNs.
    isnan : Show which elements are NaN.

    Examples
    --------
    >>> np.nancumprod(1)
    array([1])
    >>> np.nancumprod([1])
    array([1])
    >>> np.nancumprod([1, np.nan])
    array([1.,  1.])
    >>> a = np.array([[1, 2], [3, np.nan]])
    >>> np.nancumprod(a)
    array([1.,  2.,  6.,  6.])
    >>> np.nancumprod(a, axis=0)
    array([[1.,  2.],
           [3.,  2.]])
    >>> np.nancumprod(a, axis=1)
    array([[1.,  2.],
           [3.,  3.]])

    """
  713. a, mask = _replace_nan(a, 1)
  714. return np.cumprod(a, axis=axis, dtype=dtype, out=out)
  715. def _nanmean_dispatcher(a, axis=None, dtype=None, out=None, keepdims=None):
  716. return (a, out)
  717. @array_function_dispatch(_nanmean_dispatcher)
  718. def nanmean(a, axis=None, dtype=None, out=None, keepdims=np._NoValue):
  719. """
  720. Compute the arithmetic mean along the specified axis, ignoring NaNs.
  721. Returns the average of the array elements. The average is taken over
  722. the flattened array by default, otherwise over the specified axis.
  723. `float64` intermediate and return values are used for integer inputs.
  724. For all-NaN slices, NaN is returned and a `RuntimeWarning` is raised.
  725. .. versionadded:: 1.8.0
  726. Parameters
  727. ----------
  728. a : array_like
  729. Array containing numbers whose mean is desired. If `a` is not an
  730. array, a conversion is attempted.
  731. axis : {int, tuple of int, None}, optional
  732. Axis or axes along which the means are computed. The default is to compute
  733. the mean of the flattened array.
  734. dtype : data-type, optional
  735. Type to use in computing the mean. For integer inputs, the default
  736. is `float64`; for inexact inputs, it is the same as the input
  737. dtype.
  738. out : ndarray, optional
  739. Alternate output array in which to place the result. The default
  740. is ``None``; if provided, it must have the same shape as the
  741. expected output, but the type will be cast if necessary. See
  742. :ref:`ufuncs-output-type` for more details.
  743. keepdims : bool, optional
  744. If this is set to True, the axes which are reduced are left
  745. in the result as dimensions with size one. With this option,
  746. the result will broadcast correctly against the original `a`.
  747. If the value is anything but the default, then
  748. `keepdims` will be passed through to the `mean` or `sum` methods
  749. of sub-classes of `ndarray`. If the sub-class' method
  750. does not implement `keepdims`, any exceptions will be raised.
  751. Returns
  752. -------
  753. m : ndarray, see dtype parameter above
  754. If `out=None`, returns a new array containing the mean values,
  755. otherwise a reference to the output array is returned. NaN is
  756. returned for slices that contain only NaNs.
  757. See Also
  758. --------
  759. average : Weighted average
  760. mean : Arithmetic mean taken while not ignoring NaNs
  761. var, nanvar
  762. Notes
  763. -----
  764. The arithmetic mean is the sum of the non-NaN elements along the axis
  765. divided by the number of non-NaN elements.
  766. Note that for floating-point input, the mean is computed using the same
  767. precision the input has. Depending on the input data, this can cause
  768. the results to be inaccurate, especially for `float32`. Specifying a
  769. higher-precision accumulator using the `dtype` keyword can alleviate
  770. this issue.
  771. Examples
  772. --------
  773. >>> a = np.array([[1, np.nan], [3, 4]])
  774. >>> np.nanmean(a)
  775. 2.6666666666666665
  776. >>> np.nanmean(a, axis=0)
  777. array([2., 4.])
  778. >>> np.nanmean(a, axis=1)
  779. array([1., 3.5]) # may vary
  780. """
  781. arr, mask = _replace_nan(a, 0)
  782. if mask is None:
  783. return np.mean(arr, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
  784. if dtype is not None:
  785. dtype = np.dtype(dtype)
  786. if dtype is not None and not issubclass(dtype.type, np.inexact):
  787. raise TypeError("If a is inexact, then dtype must be inexact")
  788. if out is not None and not issubclass(out.dtype.type, np.inexact):
  789. raise TypeError("If a is inexact, then out must be inexact")
  790. cnt = np.sum(~mask, axis=axis, dtype=np.intp, keepdims=keepdims)
  791. tot = np.sum(arr, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
  792. avg = _divide_by_count(tot, cnt, out=out)
  793. isbad = (cnt == 0)
  794. if isbad.any():
  795. warnings.warn("Mean of empty slice", RuntimeWarning, stacklevel=3)
  796. # NaN is the only possible bad value, so no further
  797. # action is needed to handle bad results.
  798. return avg
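The body above computes the mean as the sum of the non-NaN elements divided by their count. The sketch below restates that arithmetic in a simplified form. `nanmean_sketch` is a hypothetical name, it always works in `float64`, and it skips the `dtype`, `out`, `keepdims` handling and the `RuntimeWarning` of the real function.

```python
import numpy as np

def nanmean_sketch(a, axis=None):
    # Hypothetical helper, not NumPy's implementation.
    arr = np.asarray(a, dtype=float)
    mask = np.isnan(arr)
    # Sum with NaNs treated as zero, and count only the valid entries.
    total = np.where(mask, 0.0, arr).sum(axis=axis)
    count = (~mask).sum(axis=axis)
    # An all-NaN slice gives 0 / 0, which evaluates to NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count

a = np.array([[1, np.nan], [3, 4]])
overall = nanmean_sketch(a)
row_means = nanmean_sketch(a, axis=1)
```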
  799. def _nanmedian1d(arr1d, overwrite_input=False):
  800. """
  801. Private function for rank 1 arrays. Compute the median ignoring NaNs.
  802. See nanmedian for parameter usage
  803. """
  804. arr1d_parsed, overwrite_input = _remove_nan_1d(
  805. arr1d, overwrite_input=overwrite_input,
  806. )
  807. if arr1d_parsed.size == 0:
  808. # Ensure that a nan-esque scalar of the appropriate type (and unit)
  809. # is returned for `timedelta64` and `complexfloating`
  810. return arr1d[-1]
  811. return np.median(arr1d_parsed, overwrite_input=overwrite_input)
  812. def _nanmedian(a, axis=None, out=None, overwrite_input=False):
  813. """
  814. Private function that doesn't support extended axis or keepdims.
  815. These methods are extended to this function using _ureduce
  816. See nanmedian for parameter usage
  817. """
  818. if axis is None or a.ndim == 1:
  819. part = a.ravel()
  820. if out is None:
  821. return _nanmedian1d(part, overwrite_input)
  822. else:
  823. out[...] = _nanmedian1d(part, overwrite_input)
  824. return out
  825. else:
  826. # for small medians use sort + indexing which is still faster than
  827. # apply_along_axis
  828. # benchmarked with shuffled (50, 50, x) containing a few NaN
  829. if a.shape[axis] < 600:
  830. return _nanmedian_small(a, axis, out, overwrite_input)
  831. result = np.apply_along_axis(_nanmedian1d, axis, a, overwrite_input)
  832. if out is not None:
  833. out[...] = result
  834. return result
  835. def _nanmedian_small(a, axis=None, out=None, overwrite_input=False):
  836. """
  837. sort + indexing median, faster for small medians along multiple
  838. dimensions due to the high overhead of apply_along_axis
  839. see nanmedian for parameter usage
  840. """
  841. a = np.ma.masked_array(a, np.isnan(a))
  842. m = np.ma.median(a, axis=axis, overwrite_input=overwrite_input)
  843. for i in range(np.count_nonzero(m.mask.ravel())):
  844. warnings.warn("All-NaN slice encountered", RuntimeWarning,
  845. stacklevel=4)
  846. fill_value = np.timedelta64("NaT") if m.dtype.kind == "m" else np.nan
  847. if out is not None:
  848. out[...] = m.filled(fill_value)
  849. return out
  850. return m.filled(fill_value)
  851. def _nanmedian_dispatcher(
  852. a, axis=None, out=None, overwrite_input=None, keepdims=None):
  853. return (a, out)
  854. @array_function_dispatch(_nanmedian_dispatcher)
  855. def nanmedian(a, axis=None, out=None, overwrite_input=False, keepdims=np._NoValue):
  856. """
  857. Compute the median along the specified axis, while ignoring NaNs.
  858. Returns the median of the array elements.
  859. .. versionadded:: 1.9.0
  860. Parameters
  861. ----------
  862. a : array_like
  863. Input array or object that can be converted to an array.
  864. axis : {int, sequence of int, None}, optional
  865. Axis or axes along which the medians are computed. The default
  866. is to compute the median along a flattened version of the array.
  867. A sequence of axes is supported since version 1.9.0.
  868. out : ndarray, optional
  869. Alternative output array in which to place the result. It must
  870. have the same shape and buffer length as the expected output,
  871. but the type (of the output) will be cast if necessary.
  872. overwrite_input : bool, optional
  873. If True, then allow use of memory of input array `a` for
  874. calculations. The input array will be modified by the call to
  875. `median`. This will save memory when you do not need to preserve
  876. the contents of the input array. Treat the input as undefined,
  877. but it will probably be fully or partially sorted. Default is
  878. False. If `overwrite_input` is ``True`` and `a` is not already an
  879. `ndarray`, an error will be raised.
  880. keepdims : bool, optional
  881. If this is set to True, the axes which are reduced are left
  882. in the result as dimensions with size one. With this option,
  883. the result will broadcast correctly against the original `a`.
  884. If this is anything but the default value it will be passed
  885. through (in the special case of an empty array) to the
  886. `mean` function of the underlying array. If the array is
  887. a sub-class and `mean` does not have the kwarg `keepdims` this
  888. will raise a RuntimeError.
  889. Returns
  890. -------
  891. median : ndarray
  892. A new array holding the result. If the input contains integers
  893. or floats smaller than ``float64``, then the output data-type is
  894. ``np.float64``. Otherwise, the data-type of the output is the
  895. same as that of the input. If `out` is specified, that array is
  896. returned instead.
  897. See Also
  898. --------
  899. mean, median, percentile
  900. Notes
  901. -----
  902. Given a vector ``V`` of length ``N``, the median of ``V`` is the
  903. middle value of a sorted copy of ``V``, ``V_sorted`` - i.e.,
  904. ``V_sorted[(N-1)/2]``, when ``N`` is odd and the average of the two
  905. middle values of ``V_sorted`` when ``N`` is even.
  906. Examples
  907. --------
  908. >>> a = np.array([[10.0, 7, 4], [3, 2, 1]])
  909. >>> a[0, 1] = np.nan
  910. >>> a
  911. array([[10., nan, 4.],
  912. [ 3., 2., 1.]])
  913. >>> np.median(a)
  914. nan
  915. >>> np.nanmedian(a)
  916. 3.0
  917. >>> np.nanmedian(a, axis=0)
  918. array([6.5, 2. , 2.5])
  919. >>> np.median(a, axis=1)
  920. array([nan, 2.])
  921. >>> b = a.copy()
  922. >>> np.nanmedian(b, axis=1, overwrite_input=True)
  923. array([7., 2.])
  924. >>> assert not np.all(a==b)
  925. >>> b = a.copy()
  926. >>> np.nanmedian(b, axis=None, overwrite_input=True)
  927. 3.0
  928. >>> assert not np.all(a==b)
  929. """
  930. a = np.asanyarray(a)
  931. # apply_along_axis in _nanmedian doesn't handle empty arrays well,
  932. # so deal with them upfront
  933. if a.size == 0:
  934. return np.nanmean(a, axis, out=out, keepdims=keepdims)
  935. r, k = function_base._ureduce(a, func=_nanmedian, axis=axis, out=out,
  936. overwrite_input=overwrite_input)
  937. if keepdims and keepdims is not np._NoValue:
  938. return r.reshape(k)
  939. else:
  940. return r
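For a single 1-D slice, the median logic above reduces to "drop the NaNs, then take a plain median". A minimal sketch under that reading follows. `nanmedian_1d_sketch` is a hypothetical name, and the sketch returns a plain `np.nan` for an all-NaN slice rather than preserving `timedelta64` or complex types as the private helper does.

```python
import numpy as np

def nanmedian_1d_sketch(v):
    # Hypothetical helper, not NumPy's implementation.
    v = np.asarray(v, dtype=float)
    kept = v[~np.isnan(v)]        # discard NaN entries
    if kept.size == 0:
        return np.nan             # nothing left to take a median of
    return np.median(kept)

a = np.array([[10.0, np.nan, 4.0], [3.0, 2.0, 1.0]])
# Apply the 1-D routine to every row, as the generic code path does.
per_row = np.apply_along_axis(nanmedian_1d_sketch, 1, a)
```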
  941. def _nanpercentile_dispatcher(a, q, axis=None, out=None, overwrite_input=None,
  942. interpolation=None, keepdims=None):
  943. return (a, q, out)
  944. @array_function_dispatch(_nanpercentile_dispatcher)
  945. def nanpercentile(a, q, axis=None, out=None, overwrite_input=False,
  946. interpolation='linear', keepdims=np._NoValue):
  947. """
  948. Compute the qth percentile of the data along the specified axis,
  949. while ignoring nan values.
  950. Returns the qth percentile(s) of the array elements.
  951. .. versionadded:: 1.9.0
  952. Parameters
  953. ----------
  954. a : array_like
  955. Input array or object that can be converted to an array, containing
  956. nan values to be ignored.
  957. q : array_like of float
  958. Percentile or sequence of percentiles to compute, which must be between
  959. 0 and 100 inclusive.
  960. axis : {int, tuple of int, None}, optional
  961. Axis or axes along which the percentiles are computed. The
  962. default is to compute the percentile(s) along a flattened
  963. version of the array.
  964. out : ndarray, optional
  965. Alternative output array in which to place the result. It must
  966. have the same shape and buffer length as the expected output,
  967. but the type (of the output) will be cast if necessary.
  968. overwrite_input : bool, optional
  969. If True, then allow the input array `a` to be modified by intermediate
  970. calculations, to save memory. In this case, the contents of the input
  971. `a` after this function completes are undefined.
  972. interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
  973. This optional parameter specifies the interpolation method to
  974. use when the desired percentile lies between two data points
  975. ``i < j``:
  976. * 'linear': ``i + (j - i) * fraction``, where ``fraction``
  977. is the fractional part of the index surrounded by ``i``
  978. and ``j``.
  979. * 'lower': ``i``.
  980. * 'higher': ``j``.
  981. * 'nearest': ``i`` or ``j``, whichever is nearest.
  982. * 'midpoint': ``(i + j) / 2``.
  983. keepdims : bool, optional
  984. If this is set to True, the axes which are reduced are left in
  985. the result as dimensions with size one. With this option, the
  986. result will broadcast correctly against the original array `a`.
  987. If this is anything but the default value it will be passed
  988. through (in the special case of an empty array) to the
  989. `mean` function of the underlying array. If the array is
  990. a sub-class and `mean` does not have the kwarg `keepdims` this
  991. will raise a RuntimeError.
  992. Returns
  993. -------
  994. percentile : scalar or ndarray
  995. If `q` is a single percentile and `axis=None`, then the result
  996. is a scalar. If multiple percentiles are given, the first axis of
  997. the result corresponds to the percentiles. The other axes are
  998. the axes that remain after the reduction of `a`. If the input
  999. contains integers or floats smaller than ``float64``, the output
  1000. data-type is ``float64``. Otherwise, the output data-type is the
  1001. same as that of the input. If `out` is specified, that array is
  1002. returned instead.
  1003. See Also
  1004. --------
  1005. nanmean
  1006. nanmedian : equivalent to ``nanpercentile(..., 50)``
  1007. percentile, median, mean
  1008. nanquantile : equivalent to nanpercentile, but with q in the range [0, 1].
  1009. Notes
  1010. -----
  1011. Given a vector ``V`` of length ``N``, the ``q``-th percentile of
  1012. ``V`` is the value ``q/100`` of the way from the minimum to the
  1013. maximum in a sorted copy of ``V``. The values and distances of
  1014. the two nearest neighbors as well as the `interpolation` parameter
  1015. will determine the percentile if the normalized ranking does not
  1016. match the location of ``q`` exactly. This function is the same as
  1017. the median if ``q=50``, the same as the minimum if ``q=0`` and the
  1018. same as the maximum if ``q=100``.
  1019. Examples
  1020. --------
  1021. >>> a = np.array([[10., 7., 4.], [3., 2., 1.]])
  1022. >>> a[0][1] = np.nan
  1023. >>> a
  1024. array([[10., nan, 4.],
  1025. [ 3., 2., 1.]])
  1026. >>> np.percentile(a, 50)
  1027. nan
  1028. >>> np.nanpercentile(a, 50)
  1029. 3.0
  1030. >>> np.nanpercentile(a, 50, axis=0)
  1031. array([6.5, 2. , 2.5])
  1032. >>> np.nanpercentile(a, 50, axis=1, keepdims=True)
  1033. array([[7.],
  1034. [2.]])
  1035. >>> m = np.nanpercentile(a, 50, axis=0)
  1036. >>> out = np.zeros_like(m)
  1037. >>> np.nanpercentile(a, 50, axis=0, out=out)
  1038. array([6.5, 2. , 2.5])
  1039. >>> m
  1040. array([6.5, 2. , 2.5])
  1041. >>> b = a.copy()
  1042. >>> np.nanpercentile(b, 50, axis=1, overwrite_input=True)
  1043. array([7., 2.])
  1044. >>> assert not np.all(a==b)
  1045. """
  1046. a = np.asanyarray(a)
  1047. q = np.true_divide(q, 100.0) # handles the asarray for us too
  1048. if not function_base._quantile_is_valid(q):
  1049. raise ValueError("Percentiles must be in the range [0, 100]")
  1050. return _nanquantile_unchecked(
  1051. a, q, axis, out, overwrite_input, interpolation, keepdims)
  1052. def _nanquantile_dispatcher(a, q, axis=None, out=None, overwrite_input=None,
  1053. interpolation=None, keepdims=None):
  1054. return (a, q, out)
  1055. @array_function_dispatch(_nanquantile_dispatcher)
  1056. def nanquantile(a, q, axis=None, out=None, overwrite_input=False,
  1057. interpolation='linear', keepdims=np._NoValue):
  1058. """
  1059. Compute the qth quantile of the data along the specified axis,
  1060. while ignoring nan values.
  1061. Returns the qth quantile(s) of the array elements.
  1062. .. versionadded:: 1.15.0
  1063. Parameters
  1064. ----------
  1065. a : array_like
  1066. Input array or object that can be converted to an array, containing
  1067. nan values to be ignored.
  1068. q : array_like of float
  1069. Quantile or sequence of quantiles to compute, which must be between
  1070. 0 and 1 inclusive.
  1071. axis : {int, tuple of int, None}, optional
  1072. Axis or axes along which the quantiles are computed. The
  1073. default is to compute the quantile(s) along a flattened
  1074. version of the array.
  1075. out : ndarray, optional
  1076. Alternative output array in which to place the result. It must
  1077. have the same shape and buffer length as the expected output,
  1078. but the type (of the output) will be cast if necessary.
  1079. overwrite_input : bool, optional
  1080. If True, then allow the input array `a` to be modified by intermediate
  1081. calculations, to save memory. In this case, the contents of the input
  1082. `a` after this function completes are undefined.
  1083. interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
  1084. This optional parameter specifies the interpolation method to
  1085. use when the desired quantile lies between two data points
  1086. ``i < j``:
  1087. * linear: ``i + (j - i) * fraction``, where ``fraction``
  1088. is the fractional part of the index surrounded by ``i``
  1089. and ``j``.
  1090. * lower: ``i``.
  1091. * higher: ``j``.
  1092. * nearest: ``i`` or ``j``, whichever is nearest.
  1093. * midpoint: ``(i + j) / 2``.
  1094. keepdims : bool, optional
  1095. If this is set to True, the axes which are reduced are left in
  1096. the result as dimensions with size one. With this option, the
  1097. result will broadcast correctly against the original array `a`.
  1098. If this is anything but the default value it will be passed
  1099. through (in the special case of an empty array) to the
  1100. `mean` function of the underlying array. If the array is
  1101. a sub-class and `mean` does not have the kwarg `keepdims` this
  1102. will raise a RuntimeError.
  1103. Returns
  1104. -------
  1105. quantile : scalar or ndarray
  1106. If `q` is a single quantile and `axis=None`, then the result
  1107. is a scalar. If multiple quantiles are given, the first axis of
  1108. the result corresponds to the quantiles. The other axes are
  1109. the axes that remain after the reduction of `a`. If the input
  1110. contains integers or floats smaller than ``float64``, the output
  1111. data-type is ``float64``. Otherwise, the output data-type is the
  1112. same as that of the input. If `out` is specified, that array is
  1113. returned instead.
  1114. See Also
  1115. --------
  1116. quantile
  1117. nanmean, nanmedian
  1118. nanmedian : equivalent to ``nanquantile(..., 0.5)``
  1119. nanpercentile : same as nanquantile, but with q in the range [0, 100].
  1120. Examples
  1121. --------
  1122. >>> a = np.array([[10., 7., 4.], [3., 2., 1.]])
  1123. >>> a[0][1] = np.nan
  1124. >>> a
  1125. array([[10., nan, 4.],
  1126. [ 3., 2., 1.]])
  1127. >>> np.quantile(a, 0.5)
  1128. nan
  1129. >>> np.nanquantile(a, 0.5)
  1130. 3.0
  1131. >>> np.nanquantile(a, 0.5, axis=0)
  1132. array([6.5, 2. , 2.5])
  1133. >>> np.nanquantile(a, 0.5, axis=1, keepdims=True)
  1134. array([[7.],
  1135. [2.]])
  1136. >>> m = np.nanquantile(a, 0.5, axis=0)
  1137. >>> out = np.zeros_like(m)
  1138. >>> np.nanquantile(a, 0.5, axis=0, out=out)
  1139. array([6.5, 2. , 2.5])
  1140. >>> m
  1141. array([6.5, 2. , 2.5])
  1142. >>> b = a.copy()
  1143. >>> np.nanquantile(b, 0.5, axis=1, overwrite_input=True)
  1144. array([7., 2.])
  1145. >>> assert not np.all(a==b)
  1146. """
  1147. a = np.asanyarray(a)
  1148. q = np.asanyarray(q)
  1149. if not function_base._quantile_is_valid(q):
  1150. raise ValueError("Quantiles must be in the range [0, 1]")
  1151. return _nanquantile_unchecked(
  1152. a, q, axis, out, overwrite_input, interpolation, keepdims)
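Both public functions end in the same unchecked routine. `nanpercentile` only rescales `q` by 100 first. The short usage sketch below illustrates that equivalence and the "percentiles on the first axis" convention from the docstrings. The variable names are illustrative only.

```python
import numpy as np

a = np.array([[10.0, np.nan, 4.0], [3.0, 2.0, 1.0]])

# Three percentiles per row; the result has one leading axis entry per q.
p = np.nanpercentile(a, [25, 50, 75], axis=1)
# The same request expressed as quantiles in [0, 1].
q = np.nanquantile(a, [0.25, 0.5, 0.75], axis=1)
```

Here `p[1]` holds the per-row medians, which should agree with `np.nanmedian(a, axis=1)`.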
  1153. def _nanquantile_unchecked(a, q, axis=None, out=None, overwrite_input=False,
  1154. interpolation='linear', keepdims=np._NoValue):
  1155. """Assumes that q is in [0, 1], and is an ndarray"""
  1156. # apply_along_axis in _nanpercentile doesn't handle empty arrays well,
  1157. # so deal with them upfront
  1158. if a.size == 0:
  1159. return np.nanmean(a, axis, out=out, keepdims=keepdims)
  1160. r, k = function_base._ureduce(
  1161. a, func=_nanquantile_ureduce_func, q=q, axis=axis, out=out,
  1162. overwrite_input=overwrite_input, interpolation=interpolation
  1163. )
  1164. if keepdims and keepdims is not np._NoValue:
  1165. return r.reshape(q.shape + k)
  1166. else:
  1167. return r
  1168. def _nanquantile_ureduce_func(a, q, axis=None, out=None, overwrite_input=False,
  1169. interpolation='linear'):
  1170. """
  1171. Private function that doesn't support extended axis or keepdims.
  1172. These methods are extended to this function using _ureduce
  1173. See nanpercentile for parameter usage
  1174. """
  1175. if axis is None or a.ndim == 1:
  1176. part = a.ravel()
  1177. result = _nanquantile_1d(part, q, overwrite_input, interpolation)
  1178. else:
  1179. result = np.apply_along_axis(_nanquantile_1d, axis, a, q,
  1180. overwrite_input, interpolation)
  1181. # apply_along_axis fills in collapsed axis with results.
  1182. # Move that axis to the beginning to match percentile's
  1183. # convention.
  1184. if q.ndim != 0:
  1185. result = np.moveaxis(result, axis, 0)
  1186. if out is not None:
  1187. out[...] = result
  1188. return result
  1189. def _nanquantile_1d(arr1d, q, overwrite_input=False, interpolation='linear'):
  1190. """
  1191. Private function for rank 1 arrays. Compute quantile ignoring NaNs.
  1192. See nanpercentile for parameter usage
  1193. """
  1194. arr1d, overwrite_input = _remove_nan_1d(arr1d,
  1195. overwrite_input=overwrite_input)
  1196. if arr1d.size == 0:
  1197. return np.full(q.shape, np.nan)[()] # convert to scalar
  1198. return function_base._quantile_unchecked(
  1199. arr1d, q, overwrite_input=overwrite_input, interpolation=interpolation)
  1200. def _nanvar_dispatcher(
  1201. a, axis=None, dtype=None, out=None, ddof=None, keepdims=None):
  1202. return (a, out)
  1203. @array_function_dispatch(_nanvar_dispatcher)
  1204. def nanvar(a, axis=None, dtype=None, out=None, ddof=0, keepdims=np._NoValue):
  1205. """
  1206. Compute the variance along the specified axis, while ignoring NaNs.
  1207. Returns the variance of the array elements, a measure of the spread of
  1208. a distribution. The variance is computed for the flattened array by
  1209. default, otherwise over the specified axis.
  1210. For all-NaN slices or slices with zero degrees of freedom, NaN is
  1211. returned and a `RuntimeWarning` is raised.
  1212. .. versionadded:: 1.8.0
  1213. Parameters
  1214. ----------
  1215. a : array_like
  1216. Array containing numbers whose variance is desired. If `a` is not an
  1217. array, a conversion is attempted.
  1218. axis : {int, tuple of int, None}, optional
  1219. Axis or axes along which the variance is computed. The default is to compute
  1220. the variance of the flattened array.
  1221. dtype : data-type, optional
  1222. Type to use in computing the variance. For arrays of integer type
  1223. the default is `float64`; for arrays of float types it is the same as
  1224. the array type.
  1225. out : ndarray, optional
  1226. Alternate output array in which to place the result. It must have
  1227. the same shape as the expected output, but the type is cast if
  1228. necessary.
  1229. ddof : int, optional
  1230. "Delta Degrees of Freedom": the divisor used in the calculation is
  1231. ``N - ddof``, where ``N`` represents the number of non-NaN
  1232. elements. By default `ddof` is zero.
  1233. keepdims : bool, optional
  1234. If this is set to True, the axes which are reduced are left
  1235. in the result as dimensions with size one. With this option,
  1236. the result will broadcast correctly against the original `a`.
  1237. Returns
  1238. -------
  1239. variance : ndarray, see dtype parameter above
  1240. If `out` is None, return a new array containing the variance,
  1241. otherwise return a reference to the output array. If ddof is >= the
  1242. number of non-NaN elements in a slice or the slice contains only
  1243. NaNs, then the result for that slice is NaN.
  1244. See Also
  1245. --------
  1246. std : Standard deviation
  1247. mean : Average
  1248. var : Variance while not ignoring NaNs
  1249. nanstd, nanmean
  1250. :ref:`ufuncs-output-type`
  1251. Notes
  1252. -----
  1253. The variance is the average of the squared deviations from the mean,
  1254. i.e., ``var = mean(abs(x - x.mean())**2)``.
  1255. The mean is normally calculated as ``x.sum() / N``, where ``N = len(x)``.
  1256. If, however, `ddof` is specified, the divisor ``N - ddof`` is used
  1257. instead. In standard statistical practice, ``ddof=1`` provides an
  1258. unbiased estimator of the variance of a hypothetical infinite
  1259. population. ``ddof=0`` provides a maximum likelihood estimate of the
  1260. variance for normally distributed variables.
  1261. Note that for complex numbers, the absolute value is taken before
  1262. squaring, so that the result is always real and nonnegative.
  1263. For floating-point input, the variance is computed using the same
  1264. precision the input has. Depending on the input data, this can cause
  1265. the results to be inaccurate, especially for `float32` (see example
  1266. below). Specifying a higher-accuracy accumulator using the ``dtype``
  1267. keyword can alleviate this issue.
  1268. For this function to work on sub-classes of ndarray, they must define
  1269. `sum` with the kwarg `keepdims`.
  1270. Examples
  1271. --------
  1272. >>> a = np.array([[1, np.nan], [3, 4]])
  1273. >>> np.nanvar(a)
  1274. 1.5555555555555554
  1275. >>> np.nanvar(a, axis=0)
  1276. array([1., 0.])
  1277. >>> np.nanvar(a, axis=1)
  1278. array([0., 0.25]) # may vary
  1279. """
  1280. arr, mask = _replace_nan(a, 0)
  1281. if mask is None:
  1282. return np.var(arr, axis=axis, dtype=dtype, out=out, ddof=ddof,
  1283. keepdims=keepdims)
  1284. if dtype is not None:
  1285. dtype = np.dtype(dtype)
  1286. if dtype is not None and not issubclass(dtype.type, np.inexact):
  1287. raise TypeError("If a is inexact, then dtype must be inexact")
  1288. if out is not None and not issubclass(out.dtype.type, np.inexact):
  1289. raise TypeError("If a is inexact, then out must be inexact")
  1290. # Compute mean
  1291. if type(arr) is np.matrix:
  1292. _keepdims = np._NoValue
  1293. else:
  1294. _keepdims = True
  1295. # we need to special case matrix for backward compatibility
  1296. # in order for this to work, these sums need to be called with
  1297. # keepdims=True, however matrix now raises an error in this case, but
  1298. # the reason that it drops the keepdims kwarg is to force keepdims=True
  1299. # so this used to work by serendipity.
  1300. cnt = np.sum(~mask, axis=axis, dtype=np.intp, keepdims=_keepdims)
  1301. avg = np.sum(arr, axis=axis, dtype=dtype, keepdims=_keepdims)
  1302. avg = _divide_by_count(avg, cnt)
  1303. # Compute squared deviation from mean.
  1304. np.subtract(arr, avg, out=arr, casting='unsafe')
  1305. arr = _copyto(arr, 0, mask)
  1306. if issubclass(arr.dtype.type, np.complexfloating):
  1307. sqr = np.multiply(arr, arr.conj(), out=arr).real
  1308. else:
  1309. sqr = np.multiply(arr, arr, out=arr)
  1310. # Compute variance.
  1311. var = np.sum(sqr, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
  1312. if var.ndim < cnt.ndim:
  1313. # Subclasses of ndarray may ignore keepdims, so check here.
  1314. cnt = cnt.squeeze(axis)
  1315. dof = cnt - ddof
  1316. var = _divide_by_count(var, dof)
  1317. isbad = (dof <= 0)
  1318. if np.any(isbad):
  1319. warnings.warn("Degrees of freedom <= 0 for slice.", RuntimeWarning,
  1320. stacklevel=3)
  1321. # NaN, inf, or negative numbers are all possible bad
  1322. # values, so explicitly replace them with NaN.
  1323. var = _copyto(var, np.nan, isbad)
  1324. return var
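The variance computation above proceeds in three steps: a NaN-aware mean, squared deviations with the NaN positions zeroed, and division by `N - ddof` with non-positive degrees of freedom mapped to NaN. The sketch below follows those steps for real-valued input only. `nanvar_sketch` is a hypothetical name, and the sketch omits the `dtype`, `out`, complex-number and `np.matrix` handling as well as the `RuntimeWarning`.

```python
import numpy as np

def nanvar_sketch(a, axis=None, ddof=0):
    # Hypothetical helper, not NumPy's implementation; real input only.
    arr = np.asarray(a, dtype=float)
    mask = np.isnan(arr)
    # keepdims=True lets the mean broadcast against the original array.
    count = (~mask).sum(axis=axis, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = np.where(mask, 0.0, arr).sum(axis=axis, keepdims=True) / count
        # Deviations at NaN positions are forced to zero so they add nothing.
        dev = np.where(mask, 0.0, arr - mean)
        dof = count - ddof
        var = (dev ** 2).sum(axis=axis, keepdims=True) / dof
    # Slices with dof <= 0 have no meaningful variance.
    var = np.where(dof <= 0, np.nan, var)
    return np.squeeze(var, axis=axis)

a = np.array([[1, np.nan], [3, 4]])
overall = nanvar_sketch(a)
per_row = nanvar_sketch(a, axis=1)
per_row_ddof1 = nanvar_sketch(a, axis=1, ddof=1)
```

With `ddof=1` the first row has a single valid element and therefore zero degrees of freedom, so its entry is NaN.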
  1325. def _nanstd_dispatcher(
  1326. a, axis=None, dtype=None, out=None, ddof=None, keepdims=None):
  1327. return (a, out)
  1328. @array_function_dispatch(_nanstd_dispatcher)
  1329. def nanstd(a, axis=None, dtype=None, out=None, ddof=0, keepdims=np._NoValue):
  1330. """
  1331. Compute the standard deviation along the specified axis, while
  1332. ignoring NaNs.
  1333. Returns the standard deviation, a measure of the spread of a
  1334. distribution, of the non-NaN array elements. The standard deviation is
  1335. computed for the flattened array by default, otherwise over the
  1336. specified axis.
  1337. For all-NaN slices or slices with zero degrees of freedom, NaN is
  1338. returned and a `RuntimeWarning` is raised.
  1339. .. versionadded:: 1.8.0

    Parameters
    ----------
    a : array_like
        Calculate the standard deviation of the non-NaN values.
    axis : {int, tuple of int, None}, optional
        Axis or axes along which the standard deviation is computed. The
        default is to compute the standard deviation of the flattened array.
    dtype : dtype, optional
        Type to use in computing the standard deviation. For arrays of
        integer type the default is float64, for arrays of float types it
        is the same as the array type.
    out : ndarray, optional
        Alternative output array in which to place the result. It must have
        the same shape as the expected output but the type (of the
        calculated values) will be cast if necessary.
    ddof : int, optional
        Means Delta Degrees of Freedom. The divisor used in calculations
        is ``N - ddof``, where ``N`` represents the number of non-NaN
        elements. By default `ddof` is zero.
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left
        in the result as dimensions with size one. With this option,
        the result will broadcast correctly against the original `a`.

        If this value is anything but the default it is passed through
        as-is to the relevant functions of the sub-classes. If these
        functions do not have a `keepdims` kwarg, a RuntimeError will
        be raised.

    Returns
    -------
    standard_deviation : ndarray, see dtype parameter above.
        If `out` is None, return a new array containing the standard
        deviation, otherwise return a reference to the output array. If
        ddof is >= the number of non-NaN elements in a slice or the slice
        contains only NaNs, then the result for that slice is NaN.

    See Also
    --------
    var, mean, std
    nanvar, nanmean
    :ref:`ufuncs-output-type`

    Notes
    -----
    The standard deviation is the square root of the average of the squared
    deviations from the mean: ``std = sqrt(mean(abs(x - x.mean())**2))``.

    The average squared deviation is normally calculated as
    ``x.sum() / N``, where ``N = len(x)``. If, however, `ddof` is
    specified, the divisor ``N - ddof`` is used instead. In standard
    statistical practice, ``ddof=1`` provides an unbiased estimator of the
    variance of the infinite population. ``ddof=0`` provides a maximum
    likelihood estimate of the variance for normally distributed variables.
    The standard deviation computed in this function is the square root of
    the estimated variance, so even with ``ddof=1``, it will not be an
    unbiased estimate of the standard deviation per se.

    Note that, for complex numbers, `std` takes the absolute value before
    squaring, so that the result is always real and nonnegative.

    For floating-point input, the *std* is computed using the same
    precision the input has. Depending on the input data, this can cause
    the results to be inaccurate, especially for float32. Specifying a
    higher-accuracy accumulator using the `dtype` keyword can alleviate
    this issue.

    Examples
    --------
    >>> a = np.array([[1, np.nan], [3, 4]])
    >>> np.nanstd(a)
    1.247219128924647
    >>> np.nanstd(a, axis=0)
    array([1., 0.])
    >>> np.nanstd(a, axis=1)
    array([0., 0.5]) # may vary

    """
    var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
                 keepdims=keepdims)
    if isinstance(var, np.ndarray):
        std = np.sqrt(var, out=var)
    else:
        std = var.dtype.type(np.sqrt(var))
    return std
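

# Illustrative sketch only, not part of NumPy. It spells out the formula from
# the ``nanstd`` Notes, std = sqrt(sum(abs(x - x.mean())**2) / (N - ddof)),
# for the flattened-array case. The helper name ``_nanstd_flat_sketch`` is
# hypothetical. It assumes real-valued input, ignores ``axis``, ``dtype``,
# ``out`` and ``keepdims``, and returns NaN silently where ``nanstd`` would
# also issue a RuntimeWarning.
import numpy as np  # repeated here so the sketch is self-contained


def _nanstd_flat_sketch(a, ddof=0):
    # Flatten to a 1-D float array and drop the NaN entries.
    x = np.asarray(a, dtype=float).ravel()
    x = x[~np.isnan(x)]
    # N - ddof, where N counts only the non-NaN elements.
    dof = x.size - ddof
    if dof <= 0:
        # All-NaN input or too few elements: the result is undefined.
        return np.nan
    # Absolute deviations from the mean of the non-NaN elements.
    dev = np.abs(x - x.mean())
    return np.sqrt((dev ** 2).sum() / dof)


# Possible usage, which should agree with ``np.nanstd`` on the docstring
# example:
#     _nanstd_flat_sketch(np.array([[1, np.nan], [3, 4]]))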