Sometimes it is not possible, for practical or legal reasons, to move corpus data to a local machine. This vignette explains how to use CWB corpora that are hosted on an OpenCPU server.
The GermaParl corpus is hosted on an OpenCPU server with the IP address 132.252.238.66 (subject to change). To use the corpus, call the corpus() method as you would for a local corpus. The only difference is that you need to supply the IP address via the argument server.
The resulting gparl object is of class remote_corpus.
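A minimal sketch of the steps just described. It assumes that the corpus ID on the server is "GERMAPARL" and that the server at 132.252.238.66 is still reachable; neither is guaranteed. The call is wrapped in tryCatch() so that gparl is simply NULL if polmineR is not installed or the server does not respond.

```r
# Assumption: the corpus ID on the server is "GERMAPARL";
# the server address is subject to change
gparl <- tryCatch(
  polmineR::corpus("GERMAPARL", server = "132.252.238.66"),
  error = function(e) {
    message("Remote corpus not available: ", conditionMessage(e))
    NULL
  }
)

# If the connection succeeded, inspect the class of the returned object
if (!is.null(gparl)) print(class(gparl))
```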
At this stage, polmineR exposes only a limited set of its functionality for remote corpora, but simple investigations of a remote corpus are possible.
Subsetting a remote corpus returns an object of class remote_subcorpus.
The count() method works for remote_subcorpus objects, too.
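The following sketch combines subsetting and counting on the remote corpus. The corpus ID "GERMAPARL", the structural attribute speaker, and the example values "Angela Merkel" and "Integration" are illustrative assumptions, as is the use of subset() with an expression and count() with a query argument for remote objects. polmineR is attached with library() so that its subset() method is used; any failure (package missing, server unreachable) yields NULL.

```r
result <- tryCatch({
  library(polmineR)
  # Assumption: corpus ID and server address as stated in the text above
  gparl <- corpus("GERMAPARL", server = "132.252.238.66")
  # Assumption: "speaker" is a structural attribute of the remote corpus
  merkel <- subset(gparl, speaker == "Angela Merkel")
  list(
    class = class(merkel),                       # expected: "remote_subcorpus"
    counts = count(merkel, query = "Integration") # illustrative query
  )
}, error = function(e) {
  message("Remote query failed: ", conditionMessage(e))
  NULL
})

if (!is.null(result)) print(result)
```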
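For remote corpora that require credentials, first create a directory for registry-style files that hold the credentials. Then create a file with the credentials for your corpus in this directory. Note that the filename must be the corpus ID in lowercase. For the sample corpus GERMAPARLSAMPLE, the file looks as follows: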
##
## registry entry for corpus GERMAPARLSAMPLE
##
# long descriptive name for the corpus
NAME "GermaParlSample"
# corpus ID (must be lowercase in registry!)
ID germaparlsample
# path to binary data files
HOME http://localhost:8005
# optional info file (displayed by ",info;" command in CQP)
INFO https://zenodo.org/record/3823245#.XsrU-8ZCT_Q
# corpus properties provide additional information about the corpus:
##:: user = "YOUR_USER_NAME"
##:: password = "YOUR_PASSWORD"
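A minimal base R sketch of creating the directory and the registry file shown above. The directory location below tempdir() and the name opencpu_registry are hypothetical choices; any writable directory will do. Only the non-comment lines of the registry entry are written, and the placeholders YOUR_USER_NAME and YOUR_PASSWORD must be replaced with real credentials.

```r
# Hypothetical location for the credentials directory
registry_dir <- file.path(tempdir(), "opencpu_registry")
dir.create(registry_dir, showWarnings = FALSE, recursive = TRUE)

# The filename must be the corpus ID in lowercase
registry_file <- file.path(registry_dir, tolower("GERMAPARLSAMPLE"))

# Write the registry entry; replace the placeholders with your credentials
writeLines(
  c(
    'NAME "GermaParlSample"',
    "ID germaparlsample",
    "HOME http://localhost:8005",
    "INFO https://zenodo.org/record/3823245#.XsrU-8ZCT_Q",
    '##:: user = "YOUR_USER_NAME"',
    '##:: password = "YOUR_PASSWORD"'
  ),
  con = registry_file
)
```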
Set the environment variable OPENCPU_REGISTRY in your .Renviron file to the directory just created.
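A sketch of this step, assuming the hypothetical directory location used for the credentials file. Sys.setenv() only affects the current R session; to make the setting persistent, the printed line has to be added to ~/.Renviron. The code prints the line rather than writing to .Renviron, so that it does not modify the file without being asked.

```r
# Hypothetical credentials directory (see the previous step)
registry_dir <- file.path(tempdir(), "opencpu_registry")
dir.create(registry_dir, showWarnings = FALSE, recursive = TRUE)

# Takes effect for the current session only
Sys.setenv(OPENCPU_REGISTRY = registry_dir)

# Line to add to ~/.Renviron so the setting survives restarts
renviron_line <- sprintf("OPENCPU_REGISTRY=%s", registry_dir)
writeLines(renviron_line)
```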
Then get the server address (the HOME entry) from the registry file.
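One way to read the server address and the credentials is to parse the registry file with plain base R. The helpers registry_field() and registry_property() are hypothetical and not part of polmineR. The sketch writes a small sample file to tempdir() so that it runs on its own. The resulting server value is what you would pass as the server argument of corpus().

```r
# Hypothetical helper: value of a top-level field such as HOME
registry_field <- function(file, field) {
  lines <- readLines(file, warn = FALSE)
  pattern <- sprintf("^%s\\s+", field)
  hit <- grep(pattern, lines, value = TRUE)
  if (length(hit) == 0L) return(NA_character_)
  sub(pattern, "", hit[[1]])
}

# Hypothetical helper: value of a corpus property such as ##:: user = "..."
registry_property <- function(file, key) {
  lines <- readLines(file, warn = FALSE)
  pattern <- sprintf('^##::\\s*%s\\s*=\\s*"(.*)"\\s*$', key)
  hit <- grep(pattern, lines, value = TRUE)
  if (length(hit) == 0L) return(NA_character_)
  sub(pattern, "\\1", hit[[1]])
}

# Sample registry file so the block is self-contained
registry_file <- file.path(tempdir(), "germaparlsample")
writeLines(c(
  "ID germaparlsample",
  "HOME http://localhost:8005",
  '##:: user = "YOUR_USER_NAME"',
  '##:: password = "YOUR_PASSWORD"'
), registry_file)

server <- registry_field(registry_file, "HOME")
user <- registry_property(registry_file, "user")
```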
Upcoming versions of polmineR will expose further functionality. This is a simple proof-of-concept!