Hi there, here you can find some notes about data science and (un)related things. Mostly for myself.

Setting up NextCloud on Raspberry Pi 4 using k3s

Introduction. I was setting up a NextCloud instance on my Raspberry Pi 4 using k3s and found that, although there are quite a few step-by-step guides on how to do that, none of them fully addressed all the issues I had. So I decided to write yet another guide. Mostly for myself, but maybe it will be useful for someone else. In particular, I faced the following issues: ...

November 20, 2023

OneDrive on Linux

OneDrive sync on Linux. There is no official OneDrive client for Linux, but there are some open-source alternatives, e.g. OneDrive Client for Linux and Rclone. OneDrive Client for Linux. TL;DR: it does what you expect, but supports only OneDrive. It syncs a local folder with a remote OneDrive: it monitors changes both locally and remotely and synchronizes them. In short, it does pretty much what you expect from a file hosting and syncing service, similar to the native clients. ...

February 11, 2021

Listing files and folders sorted by size

Sort files and directories by size on disk. The following command sorts files and directories in descending order by their disk usage: `du -hs * | sort -rh`. Explanation: `du` summarizes the disk usage of files, and for directories it summarizes them recursively. The `-s` option tells `du` to display “only a total for each argument”. Without it, `du` also recursively displays the size of each nested directory. So for the following file structure ...
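To spell out what `du -hs * | sort -rh` computes, here is a minimal Python sketch under a few stated assumptions: it sums apparent file sizes (`st_size`) rather than allocated disk blocks, so its numbers can differ from `du`; it skips dotfiles, like the shell's `*` glob; it does not follow symlinks; it formats sizes only roughly like `-h`; and the helper names `entry_size`, `human_readable` and `sizes_sorted` are hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path


def entry_size(path: Path) -> int:
    """Apparent size in bytes of a file, or of a whole directory tree (recursively).

    Approximation of `du -s`: counts st_size, not allocated blocks.
    """
    if path.is_symlink() or not path.is_dir():
        # Do not follow symlinks; count the link or file itself.
        return path.lstat().st_size
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for root, dirs, files in os.walk(path):
        for name in files + dirs:
            p = Path(root) / name
            # Real subdirectories are walked, not counted themselves.
            if p.is_symlink() or not p.is_dir():
                total += p.lstat().st_size
    return total


def human_readable(n: int) -> str:
    """Format a byte count in powers of 1024, roughly like `du -h`."""
    size = float(n)
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            return f"{size:.0f}{unit}" if unit == "B" else f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}T"


def sizes_sorted(directory: str = ".") -> list[tuple[int, str]]:
    """(size, name) for each non-hidden entry, largest first, like `du -s * | sort -r`."""
    entries = [p for p in Path(directory).iterdir() if not p.name.startswith(".")]
    return sorted(((entry_size(p), p.name) for p in entries), reverse=True)


if __name__ == "__main__":
    for size, name in sizes_sorted("."):
        print(f"{human_readable(size)}\t{name}")
```

Sorting the raw byte counts and formatting only at print time sidesteps the problem that `sort -h` solves in the shell, namely comparing strings like `2.0K` and `1.5M`.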

November 15, 2020

pathos.multiprocessing

Multiprocessing in Python. Although Python is not very well suited for parallel programming, sometimes parallelism can be useful. If it’s a computation, then we are probably better off using something like Dask, Numba, etc. But if it’s not a computation, then there is a built-in solution in Python: multiprocessing. We’ll stick with computations for the examples, though, since they are simpler. Comparison of parallel and non-parallel execution: a quick illustration of why parallelization is great when the problem is embarrassingly parallel. ...
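A serial-versus-parallel comparison of this kind can be sketched with the standard library's `multiprocessing.Pool` (not `pathos`, which the post title refers to). The workload `slow_task` and the input sizes are hypothetical, chosen only to be CPU-bound and independent of each other; the actual speedup depends on the number of cores, so no timings are claimed here.

```python
import math
import time
from multiprocessing import Pool


def slow_task(n: int) -> float:
    """A deliberately CPU-bound toy task (hypothetical workload)."""
    return sum(math.sqrt(i) for i in range(n))


def run_serial(inputs: list) -> list:
    # One task after another in the current process.
    return [slow_task(n) for n in inputs]


def run_parallel(inputs: list, processes: int = 4) -> list:
    # Tasks are independent (embarrassingly parallel), so Pool.map can hand
    # them to worker processes and return the results in input order.
    with Pool(processes=processes) as pool:
        return pool.map(slow_task, inputs)


if __name__ == "__main__":
    # The __main__ guard matters: with the "spawn" start method (Windows, macOS),
    # worker processes re-import this module.
    inputs = [1_000_000] * 8
    t0 = time.perf_counter()
    serial = run_serial(inputs)
    t1 = time.perf_counter()
    parallel = run_parallel(inputs)
    t2 = time.perf_counter()
    assert serial == parallel  # same results, computed in different processes
    print(f"serial:   {t1 - t0:.2f} s")
    print(f"parallel: {t2 - t1:.2f} s")
```

Worker processes (rather than threads) are used because CPU-bound pure-Python code does not run in parallel across threads under the GIL.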

August 9, 2020

Acceptance-rejection method for generating random variables

The acceptance-rejection method generates samples from a distribution whose probability density function is known but whose inverse cumulative distribution function is not, so the inverse CDF method cannot be used. Although there is quite a lot of information on the topic available, I will try to explain the method the way I (a.k.a. a 5-year-old) understand it. Idea. Majorizing distribution. Let us say that we want to draw numbers from some distribution¹ $ f(x) $ (the target distribution), but we can only sample from a distribution $ g(x) $ such that, multiplied by some constant $ c $, it is never smaller than $ f(x) $, i.e. $ c\,g(x) \ge f(x) $ for all $ x $, ...
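The idea can be made concrete with a short Python sketch of the standard acceptance-rejection loop: draw a candidate $ x $ from $ g $, draw $ u $ uniformly on $ [0, 1) $, and keep $ x $ if $ u \le f(x) / (c\,g(x)) $. The target Beta(2, 2) density $ 6x(1-x) $, the Uniform(0, 1) proposal and $ c = 1.5 $ are illustrative choices, not taken from the post, and `rejection_sample` is a hypothetical helper name.

```python
import random
from typing import Callable, List


def rejection_sample(
    f: Callable[[float], float],
    g_pdf: Callable[[float], float],
    g_sample: Callable[[random.Random], float],
    c: float,
    n: int,
    rng: random.Random,
) -> List[float]:
    """Draw n samples from density f using a proposal density g with c*g(x) >= f(x)."""
    samples: List[float] = []
    while len(samples) < n:
        x = g_sample(rng)  # candidate from the majorizing distribution
        u = rng.random()   # uniform on [0, 1)
        # Accept with probability f(x) / (c g(x)); written without division.
        if u * c * g_pdf(x) <= f(x):
            samples.append(x)
    return samples


def beta22_pdf(x: float) -> float:
    """Illustrative target: Beta(2, 2) density 6x(1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


if __name__ == "__main__":
    rng = random.Random(42)
    xs = rejection_sample(
        f=beta22_pdf,
        g_pdf=lambda x: 1.0,            # Uniform(0, 1) proposal density
        g_sample=lambda r: r.random(),  # draws from Uniform(0, 1)
        c=1.5,                          # max of 6x(1 - x) is 1.5, at x = 0.5
        n=10_000,
        rng=rng,
    )
    print(sum(xs) / len(xs))  # sample mean; the Beta(2, 2) mean is 0.5
```

Since both densities integrate to 1, on average a fraction $ 1/c $ of candidates is accepted (about two thirds here), which is why a majorizing $ c\,g(x) $ that hugs $ f(x) $ closely is preferable.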

February 10, 2017

LaTeX fonts in R Markdown plots

Let us be honest: one of the reasons we use R Markdown to compile documents into PDF is the aesthetic pleasure provided by LaTeX. However, all that effort can be ruined by plot fonts that do not match the rest of the document. For example, consider this plot. Well, these are some ugly fonts (not by themselves, but in combination with the rest of the document). Wouldn't it be much better to have something like this? ...

February 3, 2017

Table and figure captions in R Markdown

R Markdown is an extremely useful tool for producing reports with R. The problem is that decent-quality reports require captions for figures and tables, and it is not obvious how to add them. The good news is that it is still quite easy. Pandoc’s Markdown: numbered captions. The key to adding captions is that knitr actually converts your .Rmd file into an .md file first, and then uses Pandoc to convert it to HTML, PDF or another format. Therefore, everything that works in Pandoc also works in R Markdown. It is worth noting, though, that Pandoc uses its own extended version of Markdown, called Pandoc’s Markdown. Among other things, it allows captioning figures and tables. For figures, this is done in the following way ...

November 10, 2016