Glossary based on NFU-D4LS
Send additions to info [at] lcrdm.nl and mention: Term, Description, Source.
Quick jump to a specific letter?
Search on page (ctrl-f) for: -[letter]- (For example: -b-)
Access control - The regulation of who or what is privileged to enter/use an IT service. Permission to access an IT service is called authorization.
Accountability - The obligation to explain and justify conduct.
ACRIS - An Advanced Clinical Research Information System is a complex constellation of capabilities that can assist in the management of patients during clinical trials and rapidly assemble data assets for research questions. It also provides data mining and research process support to meet the needs of clinical and translational research, and related biostatistics and biocomputation. It may include open-source components. See also: Current Research Information System (CRIS)
Active Directory - Initially, it was a directory service for Windows domain networks included in Windows Server operating systems. Eventually, it became an umbrella term for directory-based identity-related services, which are used to authenticate and authorize users and computers in a Windows domain network.
Analytics to Data - Within the workspace, analytics is created or incorporated from an analytics repository (which might reside outside the workspace). The analytics is sent to one or more federated data stores that reside outside the workspace (possibly in a different workspace) and returns (aggregated and/or pseudonymised/anonymised) outcomes to the workspace. There is no (direct) human access to the targeted data. Source: NFU Workspaces Architecture: Use Cases Descriptions
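As a rough illustration of the Analytics to Data pattern, the sketch below ships an analysis function to data stores and receives only aggregated outcomes in the workspace. All names (`FederatedStore`, `mean_age`, `run_analytics`, the `min_count` threshold) are hypothetical and are not taken from the NFU Workspaces Architecture.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]
Aggregate = Dict[str, float]


@dataclass
class FederatedStore:
    """Hypothetical data store outside the workspace; records never leave it."""
    name: str
    _records: List[Record]

    def run(self, analytics: Callable[[List[Record]], Aggregate],
            min_count: int = 3) -> Aggregate:
        # Assumption: refuse very small groups to limit disclosure risk.
        if len(self._records) < min_count:
            raise ValueError(f"{self.name}: too few records to return an aggregate")
        return analytics(self._records)


def mean_age(records: List[Record]) -> Aggregate:
    """Analytics sent to the data: returns only a count and a sum."""
    return {"n": float(len(records)), "sum_age": sum(r["age"] for r in records)}


def run_analytics(stores: List[FederatedStore]) -> float:
    """Combine the per-store aggregates inside the workspace."""
    parts = [store.run(mean_age) for store in stores]
    return sum(p["sum_age"] for p in parts) / sum(p["n"] for p in parts)


stores = [
    FederatedStore("site_a", [{"age": 40.0}, {"age": 50.0}, {"age": 60.0}]),
    FederatedStore("site_b", [{"age": 30.0}, {"age": 30.0}, {"age": 30.0}]),
]
overall_mean = run_analytics(stores)
```

The workspace only ever sees the count and the sum per store, never the individual records.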
Anonymisation - The irreversible delinking of identifying information from associated data. See also: Pseudonymisation or Coding
Application Layer - The Application Layer depicts application services that support the business, and the application components that realize them (logical level).
ArchiMate - ArchiMate is an open and independent enterprise architecture modeling language to support the description, analysis and visualization of architecture within and across business domains in an unambiguous way. ArchiMate is a technical standard from The Open Group and is based on the concepts of the IEEE 1471 standard. The ArchiMate core language defines a structure of generic elements and their relationships, which can be specialized in different layers. Three layers are defined within the ArchiMate core language: Business Layer, Application Layer, Technology Layer. See: nl.wikipedia.org/wiki/ArchiMate Cf.: Open Group, IEEE 1471
Architecture - The fundamental organization of a system embodied in its components, their relationships to each other, and to the environment, and the principles guiding its design and evolution. See also: RDM Reference Architecture
Architecture Building Block (ABB) - A constituent of the architecture model that describes a single aspect of the overall model.
ABBs capture (RDM) architecture requirements, e.g. business, data, application, and technology requirements. They direct and guide the development of Solution Building Blocks (SBBs). ABB specifications include the following as a minimum: Fundamental (RDM) functionality and attributes: semantic, unambiguous, including security capability and manageability · Interfaces: chosen set, supplied · Interoperability and relationship with other building blocks · Dependent building blocks with required functionality and named user interfaces · Map to business/organizational entities and policies. Source: The Open Group.
See also: Building Block
Assessment of vendor solutions - The process of determining whether the solutions (/products) can be incorporated in the service catalogue and how they are positioned. (The (vendor) solutions building blocks can be assessed by deriving the other viewpoints in the Reference Architecture Framework from the solutions viewpoint. Especially important is here the RDM policy viewpoint of the Reference architecture; to which RDM policy requirements does the solution adhere.) See: Self Assessment of Vendor solutions
Audit - A systematic review to evaluate adherence to applicable laws and policies.
Authentication (AuthN) - Verifying the identity of a user by validating the credentials (e.g. username, password) provided by the user.
(Attribute-based) Authorization (AuthZ) - Determining whether an entity (e.g. user) has permission to access (parts of) IT services. This can be done on the basis of credentials (e.g. username, password) or on the basis of entity attributes (e.g. roles, group memberships).
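A minimal sketch of an attribute-based authorization check, assuming the entity has already been authenticated. The classes `Entity` and `Policy`, the function `is_authorized`, and the example role and group names are hypothetical illustrations, not part of any particular IT service.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class Entity:
    """Hypothetical authenticated entity described by its attributes."""
    username: str
    roles: FrozenSet[str] = field(default_factory=frozenset)
    groups: FrozenSet[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class Policy:
    """Assumed rule: access requires one specific role and one group membership."""
    required_role: str
    required_group: str


def is_authorized(entity: Entity, policy: Policy) -> bool:
    # The decision depends on attributes (role, group), not on the credentials.
    return (policy.required_role in entity.roles
            and policy.required_group in entity.groups)


policy = Policy(required_role="researcher", required_group="study-42")
alice = Entity("alice", frozenset({"researcher"}), frozenset({"study-42"}))
bob = Entity("bob", frozenset({"researcher"}), frozenset({"study-7"}))
```

Here `alice` is granted access and `bob` is not, although both hold the same role, because only `alice` is a member of the required group.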
Base Line - A specification that has been formally reviewed and agreed upon, that thereafter serves as the basis for further development or change and that can be changed only through formal change control procedures or a type of procedure such as configuration management.
Baseline-Target comparisons - Gaining insight into the gap between the baseline (IST) and the target (SOLL) architecture.
Big Data - Large and complex datasets typically combining multiple sources of information.
Big Data Analysis - The analysis of Big Data. See also: Big Data
Big Data to Knowledge (BD2K) - A trans-National Institutes of Health initiative established to enable biomedical research as a digital research enterprise, to facilitate discovery and support new knowledge, and to maximise community engagement.
Biobank - An organized collection of (human) biological material and associated data which is stored, processed and searchable.
Building Block - Represents a (potentially re-usable) component of business, IT, or architectural capability that can be combined with other building blocks to deliver architectures and solutions. Building blocks can be defined at various levels of detail, depending on what stage of architecture development has been reached. For instance, at an early stage, a building block can simply consist of a name or an outline description. Later on, a building block may be decomposed into multiple supporting building blocks and may be accompanied by a full specification. Building blocks can relate to "architectures" or "solutions". See: Open Group.
See also: Architecture Building Block (ABB) and Solution Building Block (SBB)
Business Intelligence (BI) - An organization's collection of data within its trading activity. It can be described as the process of converting data into information, which should then lead to knowledge and encourage appropriate action.
Business Layer - The Business Layer depicts business services offered to customers (in this case, researchers), which are realized in the organization by business processes performed by business actors.
CIA classification - Also: BIV classificatie. Classification of information, systems, or applications regarding confidentiality (vertrouwelijkheid), integrity (integriteit), and availability (beschikbaarheid). Not all types of information and systems have to meet the same standards for security. This classification system helps to determine the desired level of security.
Clinical Trial - A study involving human participants to investigate the efficacy and/or safety of one or more medicines or other health-related interventions.
Clinical Trial Monitors - Sponsored clinical trials have monitors who make sure that the primary data are collected and recorded properly. They meet periodically with research coordinators and review their study records. They ensure that the reporting of adverse events is complete. This very useful auditing function serves to promote Good Clinical Practices and to enhance the compulsive collection of data. It is required by the FDA, which does not like to review incomplete studies. These monitors do not relate to the subjects. Within the framework of Workspace, the Data Cleansing Zone provides the support for the Monitors. See also: Data and Safety Monitoring Boards (DSMBs)
Cloud Computing - A type of Internet-based computing that provides shared computer processing resources and data to computers and other devices on demand. It is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources (e.g., computer networks, servers, storage, applications and services), which can be rapidly provisioned and released with minimal management effort. Cloud computing and storage solutions provide users and enterprises with various capabilities to store and process their data in third-party data centers that may be located far from the user, ranging in distance from across a city to across the world. Cloud computing relies on sharing of resources to achieve coherence and economy of scale.
Coding or Pseudonymisation - See: Pseudonymisation or Coding
Cohort - A group of individuals identified by common characteristic(s) (e.g., demographic, exposure, illness) or studied over time using a common protocol.
COmanage - This is a collaboration management platform (funded by NSF and Internet2) that allows collaboration members to share IT services in a secure environment. By using federated identity management services, the authentication and authorization of members can be handled by a single predefined process. For instance, it can automatically set access control for (scientific) IT services available for the collaboration.
Common denominators extraction - Comparisons between SOLL architectures to extract common denominators, working towards (an update of) the RDM Reference Architecture. The Reference Framework facilitates common denominator extraction.
Community Cloud - The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises.
Confidentiality - The ethical and legal obligation of an individual or organization to safeguard data or information by controlling access as authorized by law or by the data donor.
Conflict of Interest - One or more connections or interests (personal, social, financial or professional) that influence, or could be perceived to influence, professional integrity and independence.
Consent - Voluntary and informed expression of the will of a person, or if incompetent, his/her legal representative.
Controlled or Restricted Access - Access to data that is subject to conditions and an approval process.
Current Research Information System (CRIS) - A database or other information system to store and manage data about research conducted at an institution. A standard for current research information systems is the CERIF (Common European Research Information Format) standard, proposed by the EU and developed and maintained by euroCRIS. See also: ACRIS, euroCRIS
Data - A set of values of qualitative or quantitative variables that can be measured, collected and reported, and analyzed.
Data Access Committee - A committee that reviews and authorizes applications for data access and use.
Data Analyst - OECD: This is someone who knows statistics. They may know programming, or they may be expert in Excel. Either way, they can build models based on low-level data. Most importantly, they know which questions to ask of the data. See: Data Roles
Data Asset - A Data Asset, or Digital Asset, is an entity that is comprised of data. For example, a database is a data asset that is comprised of data records. A data asset may be an output file from a system / application / computer script / simulation, a database, document, or Web page.
Data Assets are typically defined in a Data Management Plan (DMP), based on the different stages of a data workflow. The concept of Data Assets comes from Digital Asset Management.
A Data Asset can be labelled with different attributes, such as Data Format or File Format, expected size, whether it contains Personal Data or Identifiable Data, the storage facility of the research Workspace or Data Lab, whether it is suitable for sharing, the suitable facility for archiving the asset, the data collector and data owner, the conditions for re-use, access control, etc.
Defining the labels for each Data Asset helps in managing the data throughout the data lifecycle, from creation to archiving and deletion. Data assets can be selected for deletion or collected in a Data Set for archiving.
A collection of Data Assets with a description is called a Data Set.
Data Sets are often archived in a Data Archive.
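The labelling idea in the Data Asset entry can be sketched as a small data structure. This is only an illustration: the field names, the `archive` flag and the helper functions below are assumptions, not a schema prescribed by any Data Management Plan template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataAsset:
    """Hypothetical set of labels for one data asset."""
    name: str
    file_format: str
    expected_size_mb: int
    contains_personal_data: bool
    suitable_for_sharing: bool
    archive: bool  # assumed label: True = archive at project end, False = delete


@dataclass
class DataSet:
    """A collection of Data Assets with a description."""
    description: str
    assets: List[DataAsset]


def build_archive_set(assets: List[DataAsset], description: str) -> DataSet:
    """Collect the assets labelled for archiving into one Data Set."""
    return DataSet(description, [a for a in assets if a.archive])


def select_for_deletion(assets: List[DataAsset]) -> List[DataAsset]:
    """Return the assets that are not labelled for archiving."""
    return [a for a in assets if not a.archive]


assets = [
    DataAsset("survey_raw", "csv", 120, True, False, True),
    DataAsset("scratch_output", "tmp", 900, False, False, False),
    DataAsset("analysis_script", "py", 1, False, True, True),
]
archive_set = build_archive_set(assets, "Final data and code for the study")
to_delete = select_for_deletion(assets)
```

With labels in place, selecting assets for archiving or deletion becomes a simple filter rather than a manual review at the end of the project.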
Data Archive - A data archive is a facility which moves data to an environment for long-term retention. A data archive is indexed and has search facilities, enabling data to be retrieved.
Data Audit - Checking institutional practices in collecting, preserving and disseminating datasets, resulting in suggestions for improving current procedures.
Data Availability Statement - A statement accompanying a research paper that clarifies how and where the accompanying research data is available. For some papers the data could be inside the paper itself; other data availability statements contain references to data repositories. Some journals demand a data availability statement with every paper submission.
Data Backup - A copy of data.
Data Breach - The unauthorized collection, access, use, disclosure or release of data.
Data Cleansing Zone - The area for controlled detecting and correcting (or removing) of corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting this dirty or coarse data. The controlled ability to track modifications of data(sets) is necessary for data trails and audits and for meeting the conditions of certain grants, and is key for monitors. The licence agreement can require that appropriate corrections are proposed/made to correct/improve the source data. See also: Digital Research Environment (DRE)
Data Conversion - The transformation of data from one format to another.
Data Curation (research data curation, digital curation) - Data curation is the activity of managing the use of data from its point of creation to ensure it is available for discovery and reuse in the future. Or, the process of selecting, annotating, maintaining, archiving and tracking data.
Data Curation Lifecycle - See: Digital Curation Lifecycle.
Data Curation Profile - A document with organized information about the content and context of the creation and use of a dataset.
Data Custodian - A skilled person responsible for the safe custody, transport, storage of the data and implementation of business rules; responsible for the technical environment and database structure. See also: Data Steward
Data Destruction - All necessary steps to ensure that data is no longer stored or able to be used.
Data Diode - A unidirectional network (also referred to as a unidirectional security gateway or data diode) is a network appliance or device allowing data to travel only in one direction, used in guaranteeing information security. Data diodes are most commonly found in high security environments where they serve as connections between two or more networks of differing security classifications. Within a workspace, a data diode is a means to directly ingest data pushed from an external source into a file/database in the Data Landing Zone in a safe and easy way.
Data Discovery - The ability to discover data in known data files / sets. The discovery can return data usage policies/governance, metadata and samples, and is tied into a formal data request procedure if applicable. See also: Big Data
Data Donor - The individual whose data have been collected, held, used and shared.
Data Embargo - Data submitted to public repositories but not available for download until a certain time.
Data Engineer - OECD: Operating at a low level close to the data, these are people who write the code that handles data and moves it around. They may have some machine learning skills. See: Data Roles
Data Format or File Format - The way in which data/information is coded and stored. A file format gives information on how to process the data.
Data Interview - The conversation a data consultant has with a researcher to obtain necessary information about a dataset.
Data Lab - A data lab is a virtual research environment which enables researchers to organize and share their research data and related output during their research project.
Data Landing Zone - A place where the data owner places data and where the data cannot be changed. The landing zone contains two parts: a log of who provided which data and when, and the data plus licence agreement. The licence agreement conditions are inherited from the source and are a direct derivative of the governance provided by the research question. See also: Data Cleansing Zone, Data Sharing Zone, Data Zones
Data Linkage - The process by which records representing the same entity or individual are linked across multiple data sources.
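As a simple illustration of data linkage, the sketch below links records from two sources that share a normalised key. It assumes deterministic (exact-match) linkage on surname and birth date; the field names and the helpers `link_key` and `link_records` are hypothetical, and real linkage projects often rely on probabilistic matching or pseudonymised keys instead.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def link_key(record: Record) -> Tuple[str, str]:
    """Normalise the fields used for matching (assumed: surname + birth date)."""
    return (record["surname"].strip().lower(), record["birth_date"])


def link_records(source_a: List[Record],
                 source_b: List[Record]) -> List[Tuple[Record, Record]]:
    """Pair every record in source_a with the records in source_b sharing its key."""
    index: Dict[Tuple[str, str], List[Record]] = {}
    for rec in source_b:
        index.setdefault(link_key(rec), []).append(rec)
    return [(a, b) for a in source_a for b in index.get(link_key(a), [])]


clinic = [
    {"id": "c1", "surname": "Jansen ", "birth_date": "1980-01-02"},
    {"id": "c2", "surname": "de Vries", "birth_date": "1975-05-05"},
]
registry = [
    {"id": "r9", "surname": "jansen", "birth_date": "1980-01-02"},
]
links = link_records(clinic, registry)
```

In this example only the clinic record `c1` is linked to registry record `r9`; the differences in case and trailing whitespace are removed by the normalisation step.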
Data Management - 'People often assume that data management finishes and data stewardship starts when the project ends. Regardless of the terminology, creating FAIR data requires attention from the planning phase of a scientific experiment to the life-long maintenance of the data.' (https://www.dtls.nl/fair-data/research-data-management/research-data-management). See: Research Data Management
Data Management Plan (DMP) - A written agreement stating which data will be saved, how they will be saved (file format, version control, metadata), whether data will be submitted to a repository and under which terms.
Data Manager - OECD: A person responsible for the management of data objects including metadata. These people think about managing and preserving data. They are information specialists, archivists, librarians and compliance officers. See: Data Roles
Data Migration - The process of transferring data between storage types, formats, or computer systems. It is a key consideration for any system implementation, upgrade, or consolidation.
Data Portal - A gateway to data.
Data Protections - The set of laws, policies and procedures that aim to minimize intrusion into people’s privacy, uphold confidentiality, and penalize undue intrusions and/or breaches.
Data Repository - A logical (and sometimes physical) partitioning of data where multiple databases which apply to specific applications or sets of applications reside.
Data Roles - OECD distinguishes: Data analyst, Data engineer, Data manager, Data scientist, Research software engineer (RSE), Research support professionals
Data analyst - This is someone who knows statistics. They may know programming, or they may be expert in Excel. Either way, they can build models based on low-level data. Most importantly, they know which questions to ask of the data.
Data engineer - Operating at a low level close to the data, these are people who write the code that handles data and moves it around. They may have some machine learning skills.
Data manager - A person responsible for the management of data objects including metadata. These people think about managing and preserving data. They are information specialists, archivists, librarians and compliance officers.
Data scientist - A practitioner of data science. It is a generic term that encompasses many fields of specialised expertise. In the current report, data analysts, data engineers, data stewards and research software engineers are considered as sub-groups of data scientists. In certain contexts, data scientist is also sometimes used in more limited ways that make it equivalent to either the data analyst or software engineer roles.
Research software engineer (RSE) - A growing number of people in academia combine expertise in programming with an intricate understanding of research. These RSEs may start off as researchers who spend time developing software to progress their research or they may come from a more conventional software-development background and are drawn to research by the challenge of using software to further research.
Research support professionals - In the context of digitalisation, these are the people who support scientific researchers conducting data-intensive science. They are not necessarily part of a research team and might be considered as service providers. This is a broad category that can include data stewards, research software engineers, data managers, data engineers, librarians and archivists.
Data and Safety Monitoring Boards (DSMBs) - A system for the appropriate oversight and monitoring of the conduct of clinical trials to ensure the safety of participants and the validity and integrity of the data for (multi-site) clinical trials. The data and safety monitoring functions and oversight of such activities are distinct from the requirement for study review and approval by an Institutional Review Board (IRB). See also: Clinical Trial Monitors
Data Scientist - OECD: A practitioner of data science. It is a generic term that encompasses many fields of specialised expertise. In the current [OECD] report, data analysts, data engineers, data stewards and research software engineers are considered as sub-groups of data scientists. In certain contexts, data scientist is also sometimes used in more limited ways that make it equivalent to either the data analyst or software engineer roles. See: Data Roles
Data Seal of Approval - A certification for repositories that are committed to archiving and providing access to scholarly research data in a sustainable way.
Data Security - The protection of the confidentiality, availability and integrity of data.
Data Sharing - Extending access to data for the purpose of research or analyses.
Data Sharing Policy - An institutional policy concerning the sharing of research data. It is often written as a letter of intent declaring that research data will be submitted to dedicated repositories as soon as possible, complying to international data and exchange formats.
Data Sharing Zone - A place where a workspace user can put data to be shared; this includes archiving of the study. The data in the Data Sharing Zone cannot be modified; it can only be deleted by a workspace user. The Data Sharing Zone contains three parts: Meta, Log, and Data & Conditions. The difference from a Data Landing Zone is who is in control of the data: Data Landing Zone = supplier/owner of the data; Data Sharing Zone = workspace user. See also: Data Cleansing Zone, Data Landing Zone, Data Zones
Data Steward - A person responsible for keeping the quality, integrity, and access arrangements of data and metadata in a manner that is consistent with applicable law, institutional policy, and individual permissions. Data stewardship implies professional and careful treatment of data throughout all stages of a research process. A data steward aims at guaranteeing that data is appropriately treated at all stages of the research cycle (i.e. design, collection, processing, analysis, preservation, data sharing and reuse).
Data stewardship is the responsible planning and execution of all actions on digital data before, during and after a research project, with the aim of optimizing the usability, reusability and reproducibility of the resulting data.
Data (or Material) Transfer Agreement - A binding legal agreement between the provider and the recipient of data (or materials) that sets forth conditions of transfer, use and disclosure.
Data Zones - A workspace has different data zones to ensure audit trail of the data and ease of use. See also: Data Cleansing Zone, Data Landing Zone, Data Sharing Zone
Database - Data and information that are managed and stored in a systematic way to enable data analyses.
Dataset - A collection of data which may be a subset in a database.
Dataset Merger - User-friendly tooling to merge different datasets, even when the datasets originate from different sources.
De-identification - The removal or alteration of any data that identifies an individual or could, foreseeably, identify an individual in the future.
Digital curation - See: Data curation
Digital Competence Centre (DCC) - In Digital Competence Centers (DCCs) researchers find support and technological tools in the field of research data, research software, and open and FAIR data. Local DCCs within knowledge institutions are the first point of contact for researchers. Mutual exchange of knowledge and technology can be stimulated through a secure and federated network and the creation of thematic DCCs. NWO wants to stimulate the development of DCCs. Source: NWO report: Integrated approach to digitization in science, November 2019.
Digital Curation Lifecycle - Digital curation and data preservation are ongoing processes, requiring considerable thought and the investment of adequate time and resources. You must be aware of, and undertake, actions to promote curation and preservation throughout the data lifecycle.
(Digital) Identity - This is information on an entity (object) used by computer systems to represent it. This object may be a person, organisation, application, or device. Commonly, it is defined as a set of attributes related to an object.
Digital object - An entity in which one or more content files and their corresponding metadata are united.
Digital Object Identifier (DOI) - A type of persistent identifier used to uniquely identify objects. The DOI system is particularly used for electronic documents such as journal articles. The DOI system began in 2000 and is managed by the International DOI Foundation. See also: Persistent Identifier (PID)
Digital Research Environment (DRE) - An environment where a researcher has access to, and can work with, all relevant data, analytics and tooling. This environment is secure and self-service, is capable of real-time collaboration, provides data and process audit trails, and is auditably compliant with all rules and regulations. Where the DRE centers around the data user, the RDP centers around the data supplier/owner. See also: Analytics to Data, Data Cleansing Zone, Data Diode, Data Landing Zone, Data Sharing Zone, Dataset Merger, Research Data Architecture, Research Data Platform (RDP), Workspace, Virtual Research Environment (VRE)
Directory service - This service is a shared information infrastructure for administering network resources (objects) like volumes, folders, files, users, groups, devices, etc. Information about an object is stored as attributes. For instance, directory services enable the sharing of information about users and IT services throughout the network, which can be used in access control.
Disclosure - The revelation of confidential information about an individual.
Disclosure Risk - The possibility of confidential information being revealed about an individual. See also: Risk
DRE Configurator - A tool to create and maintain workspaces from a template that specifies the OS and the type and amount of storage, memory and compute. It is also used to maintain who has access to the workspace and in what role.
DRE Showcase - A project that clearly demonstrates the functionality in a way that is meaningful for the target group. • Supporting the science community using a federal approach • Integral approach of research and care/application • Integral data management in a single cohesive, compliant digital research environment • 'Research Environment as a Service' • Supporting standards and best practices • Self-service and unburdening (compliance in the background) • Flexible and scalable (also small studies with small budgets). See also: Showcase Approach, Showcase
Dutch shared RDM architectural principles - RDM architectural principles and guidelines which are agreed upon at a national level by Dutch academic (research) institutions.
Unity of RDM language (Eenheid van RDM taal) - Note: this term does not concern the structure of the data itself (Nictiz, registration at the source, FAIR, ...). Unity of language here refers only to communication agreements between RDM stakeholders (including managers and researchers) and IT architects.
Communication between decision-making (the commissioning party) and realization (the IT department) is not optimal from the perspective of an information manager (the bridge between business and IT). It is particularly difficult for a commissioning party (or a mandated VSNU working group) to take decisions on the basis of (TOGAF / ArchiMate) architecture designs. Standard visualizations that result from the information management process can help, but they do not solve the problem. The underlying principles (including institutional policy) are not easily connected to the IT architecture design.
How do we get closer to a solution?
Make communication agreements about visualizations at the national level: because harmonization is needed for the realization of regional and national networks, a common visual way of working is needed. Preferably this uses an independent open method that fits well with the methodologies already in use, is complementary to them, and emphasizes visualizations that are usable for decision-makers (the business) in particular.
One means to this end can be the national adoption of an open methodology that visually supports the business in the design of, and decision-making about, the National Open Science Research Infrastructure.
In addition, working agreements between architects (institutes) are needed, with the aim of preventing a Babylonian confusion of tongues with regional and national architecture working groups.
Concrete improvement proposals for the short term (phase 1):
- Application landscapes are created on the basis of the HORA reference architecture, after translation into a domain architecture that transcends individual institutes (e.g. Leiden University RDM).
- The application components mentioned (with their functions) can be found unambiguously in the HORA and/or LCRDM catalogue.
- IT architects make working agreements aimed at making local architectures comparable with, or connectable to, each other. These agreements can be found by everyone in the LCRDM glossary under the term Enterprise Architectuur Nationaal Research Netwerk (Enterprise Architecture National Research Network).
- LCRDM appoints a moderator, since innovation is by definition subject to discussion, which can be conducted on the wiki page of the term concerned.
To be elaborated further by the national coordination point (LCRDM, phase 2)
Standard visualizations of IT architecture designs must be able to provide insight into the questions that commissioning parties have about them from the perspective of institutional policy (architecture principles and standards). Adoption of the open standard (OSI recognized) Dragon1 (open EA method) appears suitable for this. UL/ISCC / LUMC are making a start with this.
Work in progress (update 28-11-2018)
- Proposal for an LCRDM pitch: visualizations to support strategic management decisions about the national Research Data Management network. Anyone who wants to participate is asked to register with Ingeborg Verheul (email address)
- An infrastructure working group (Leiden University / LUMC) has expressed the wish that short-term ground rules for unity of RDM language be available at the start of the 2018-2019 RDM UL programme. More information is available from the architect Joachim Rijsdam (UL/ISCC) and information manager Erik Flikkenschild (LUMC).
Dragon1 is an open Method, Framework and Modeling Language for Visual Enterprise Architecture and Visual Project Management:
eduGAIN - International federation that interconnects participating national federations, such as SURFconext, across the world. It allows users to access IT services using their one trusted identity.
Encryption - A mechanism of safeguarding stored data or information by making that data or information unreadable without access to the correct decryption method and key.
Enhanced Publication - An enhanced publication is a structured representation of research output in which research assets (dataset, article, underlying material) are gathered and linked.
Entity type - Concerning reference architecture: Structure of top level concepts within the RDM reference architecture Framework.
Enumerations - Concerning reference architecture: Lists of instances for each entity type (whether policies, processes, functions, solutions) and their definitions.
Ethical Guidelines - A framework to guide decision-making based on accepted ethical principles and practice.
Ethics review committee (IRB, REC, REB) - An independent committee for the ethical review of research activities.
euroCRIS - A European organization responsible for publicising work on current research information systems (CRIS). It maintains the CERIF standard for CRIS systems. See also: ACRIS, Current Research Information System / CRIS, euroCRIS: CERIF.
FAIR - Findable, Accessible, Interoperable, Reusable: these principles make it easier to find data sets and to integrate them (automatically where possible), under clear licences. See the LCRDM guidance documents for a 'FAIR checklist'.
Feasibility analysis - Assessment of the feasibility of (new) proposed RDM policy.
Federated - On the basis of trust between institutions, accommodated by federation/community policy.
Federated Search - The simultaneous search of multiple searchable resources. A user makes a single query request which is distributed to the search engines, databases or other query engines participating in the federation. The federated search then aggregates the results that are received from the search engines for presentation to the user.
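Example (illustrative sketch): the flow described under Federated Search can be pictured as one query that is distributed to several sources, after which the answers are merged. The source names, catalogue contents and function names below are hypothetical; real participants would be remote search engines or databases rather than in-memory lists.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory catalogues standing in for remote search engines.
CATALOGUES = {
    "repository_a": ["Climate dataset 2018", "Genome atlas"],
    "repository_b": ["Climate model output", "Survey data"],
}


def search_one(source, query):
    """Search a single source; return (source, title) pairs that match."""
    needle = query.lower()
    return [(source, title) for title in CATALOGUES[source] if needle in title.lower()]


def federated_search(query):
    """Distribute one query to all sources and aggregate the answers."""
    sources = sorted(CATALOGUES)
    with ThreadPoolExecutor() as pool:
        # map() keeps the order of `sources`, so the merged list is stable.
        per_source = list(pool.map(lambda s: search_one(s, query), sources))
    return [hit for hits in per_source for hit in hits]


print(federated_search("climate"))
```

The user issues a single request; the fan-out to the individual sources and the merging of results stay hidden behind `federated_search`.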
FORCE11 - A community of scholars, librarians, archivists, publishers and research funders that has arisen organically to help facilitate the change toward improved knowledge creation and sharing. Its members work individually and collectively to bring about a change in modern scholarly communications through the effective use of information technology.
Gap analysis - Gaining insight into the situations in the data life cycle where RDM policy or shared values are not met or are insufficiently facilitated by IT. Also a ‘gap analysis’ in the form of an audit trail to detect intolerable violations of RDM policy.
Governance - The process of policy making and management that guides and oversees research in a consistent and structured manner.
Group Provider - The source of group information. For example within an institute this could be an identity store (e.g. Active Directory) and across institutes this could be provided by SURFconext Teams or COmanage which specifies the memberships of a group. A service provider can make use of this group information for authorization.
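Example (illustrative sketch): a service provider can base authorization on the memberships supplied by a group provider. The group names, users and policy table below are hypothetical and do not reflect the interface of Active Directory, SURFconext Teams or COmanage.

```python
# Hypothetical group provider: maps a group name to its members.
GROUP_MEMBERSHIPS = {
    "project-x-researchers": {"alice", "bob"},
    "project-x-admins": {"alice"},
}

# Hypothetical policy of the service provider: which group may do what.
REQUIRED_GROUP = {
    "read": "project-x-researchers",
    "delete": "project-x-admins",
}


def groups_of(user):
    """Ask the group provider which groups a user belongs to."""
    return {group for group, members in GROUP_MEMBERSHIPS.items() if user in members}


def is_authorized(user, action):
    """Authorize an action based on group membership only."""
    required = REQUIRED_GROUP.get(action)
    return required is not None and required in groups_of(user)


print(is_authorized("bob", "read"), is_authorized("bob", "delete"))
```

Authentication (who the user is) is assumed to have happened already; the sketch covers only the authorization step.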
Harmonization - The process of unifying certain policies, methodologies and approaches in order to achieve interoperability.
High Level Principles - Architecture principles are guiding statements about what is desirable. They can be seen as a policy statement that specifically concerns the design of the organization, processes and information provision. They express what is important in this design and are relatively stable.
HORA 2.0 - HORA 2.0 is the new release of HORA as of 1 November 2018. This version includes, among other things, new architecture concepts (such as the application function) and more detail.
The HORA consists of three parts:
- Part 1 – Architecture vision (Architectuurvisie) offers a perspective on the future by translating relevant developments and ambitions described in the i-Strategie. It makes more concrete what their impact is on the design of institutions' information provision, using (parts of) the reference models. It describes a number of guiding principles and pays attention to a number of specific change themes.
- Part 2 – Reference models (Referentiemodellen) offers a collection of generic and relatively stable models that describe, mainly from a business and information perspective, what a higher education institution does and has. It creates a common language that can improve communication, both within the sector and within an institution. The possible applications are very broad.
- Part 3 – Implementation aids (Implementatiehulpmiddelen) supports the implementation of the reference architecture. Among other things, it describes how the architecture function can be set up and how the models in the HORA can be used for data management and application integration.
HORA 2.0 bedrijfsfunctiemodel - A business function model (bedrijfsfunctiemodel) is a model of the business functions of an organization. It describes what an organization does, independent of how it is carried out. It views an organization as a collection of activities that are performed and clusters them into logical units that require similar knowledge and competences.
Hybrid cloud - The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).
Infrastructure as a Service (IaaS) - The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications, and possibly limited control of select networking components (e.g., host firewalls).
Identifiable Data or Personal Data - The data that alone or in combination with other data may reasonably be expected to identify an individual.
Identity and Access Management - A set of processes and supporting technologies that enable the creation, maintenance, use, and revocation of digital identity.
Identity Management (IdM) - This is also known as identity and access management (IAM). It addresses access to the 'right IT services' at the 'right times' and for 'the right reasons' by the 'right entities' across variable technology environments and meeting compliance requirements.
Identity Provider (IdP) - An institute that is able to obtain an identity from a store (e.g. Active Directory), authenticates this identity, and then passes the trusted identity to the service provider in an agreed-upon way, typically through SAML or OpenID Connect.
Impact analysis - Impact analysis on (new) proposed RDM policy.
Incidental Findings - A finding discovered in the course of clinical care or research concerning an individual that is beyond the aims of the clinical care or research.
Individual Research Results - A finding discovered in the course of research concerning an individual that relates to the aims of the research.
Individually Identifiable - OHRP considers private information or specimens to be individually identifiable as defined at 45 CFR 46.102(f) when they can be linked to specific individuals by the investigator(s) either directly or indirectly through coding systems. Conversely, OHRP considers private information or specimens not to be individually identifiable when they cannot be linked to specific individuals by the investigator(s) either directly or indirectly through coding systems. See also: Private Information
Ingestions - Loading and mapping external data and metadata into the Data Landing Zone of the workspace.
Interoperability - The ability of data or tools from non-cooperating resources to integrate or work together with minimal effort.
Investigator - Anyone involved in conducting a proposed research project.
iRODS - Open Source Data Management Software for data virtualization, data discovery (adding metadata), workflow automation, and secure collaboration.
IT service - Infrastructure, application or data service.
JDDCP - Joint Declaration of Data Citation Principles, a set of guiding principles for citation of data within scholarly literature, another dataset, or any other research object.
LCRDM RDM architecture toolkit - The LCRDM working group Facilities and Data Infrastructure (Faciliteiten en Datainfrastructuur) has identified a number of principles of scientific research that determine and/or form the basis for the structure of the reference architecture for scientific research:
Legacy Data or Biospecimens - Data (or biospecimens) previously collected for research or for clinical care, where new proposed uses may not be covered.
Level of Assurance (LoA) - Classification of identity assurance from little or no confidence in the asserted identity to very high confidence in the asserted identity.
Lightweight Directory Access Protocol (LDAP) - An open, standard protocol for accessing and maintaining distributed directory services over an Internet Protocol (IP) network. In other words this protocol allows IT services to look up information from a server.
Linked Data - A term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information and knowledge on the Semantic Web using RDF. See also: Resource Description Framework (RDF).
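Example (illustrative sketch): Linked Data expresses statements as subject, predicate, object triples whose parts are identified by URIs. The sketch below uses plain Python tuples rather than an RDF library; the example.org URIs are placeholders, and the predicate names are taken from the Dublin Core terms namespace.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
# The example.org URIs are placeholders invented for this sketch.
DCT = "http://purl.org/dc/terms/"
TRIPLES = [
    ("http://example.org/dataset/1", DCT + "title", "Survey results"),
    ("http://example.org/dataset/1", DCT + "creator", "http://example.org/person/a"),
    ("http://example.org/article/9", DCT + "references", "http://example.org/dataset/1"),
]


def match(subject=None, predicate=None, obj=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Which resources point at dataset 1?
for s, p, o in match(obj="http://example.org/dataset/1"):
    print(s, p)
```

Because the dataset URI appears both as a subject and as an object, the article and the dataset become connected without any shared database.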
Medical Privacy - The practice of keeping information about a patient confidential. This involves both conversational discretion on the part of health care providers, and the security of medical records. The terms can also refer to the physical privacy of patients from other patients and providers while in a medical facility. See also: Privacy
Medical Record or Health Record - A paper or electronic record created in the health care system which contains medical and health-related information about an individual and is used to record and support health care for that individual.
Non-web-based (native) IT service - Any IT service (e.g. program) that needs to be installed on the user's device (e.g. computer), has its own user interface, and does its processing within the memory of this device.
OAuth - A protocol that obtains and uses tokens to access web-based and non-web-based IT services.
Ontology - Formal naming and definition of the (RDM) entity types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse.
Open Data Access - Making data available without restriction.
OpenID Connect - A protocol to authenticate users, built on top of the OAuth protocol for authorization.
Opt-in - A consent mechanism where an active choice must be made every time to participate.
Opt-out - A consent mechanism where consent is implied unless an active choice is made not to participate.
Platform as a Service (PaaS) - The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment.
Persistent Identifier (PID) - A long-lasting reference to a document, file, web page, or other object (real or abstract). See also: Digital Object Identifier (DOI)
Personal Data or Identifiable Data - Data that alone or in combination with other data may reasonably be expected to identify an individual.
Policy Transitional View - Transitions in RDM policy along the data life cycle.
Pluggable Authentication Module (PAM) - A service for handling authentication. PAM services can be integrated into non-web based (native) IT services.
Privacy - The ability of an individual or group to seclude themselves, or information about themselves, and thereby express themselves selectively. See also: Medical Privacy
Private Cloud - The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises.
Private Information - This includes information about behavior that occurs in a context in which an individual can reasonably expect that no observation or recording is taking place, and information that has been provided for specific purposes by an individual and that the individual can reasonably expect will not be made public (for example, a medical record). Private information must be individually identifiable (i.e., the identity of the subject is or may readily be ascertained by the investigator or associated with the information) in order for obtaining the information to constitute research involving human subjects. See also: Individually identifiable
Proxy - A server that acts as a broker, which is located between the internet and local area network(s). If a user wants to access a service provider, the user is redirected by the proxy to the identity provider for authentication, and after authentication the identity together with attributes will be sent to the service provider for authorization.
Pseudonymisation or Coding - The act of replacing an identifier with a code for the purpose of avoiding direct identification of the participant. See also: Anonymisation
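Example (illustrative sketch): one way to picture pseudonymisation is a key table that maps each identifier to a random code. The class and method names below are hypothetical, and the glossary does not prescribe this technique. Whoever holds the key table can reverse the coding, which is the difference with anonymisation.

```python
import secrets


class Pseudonymiser:
    """Replaces identifiers with random codes and keeps the key table."""

    def __init__(self):
        self._code_by_id = {}
        self._id_by_code = {}

    def code_for(self, identifier):
        """Return the code for an identifier, creating one on first use."""
        if identifier not in self._code_by_id:
            code = "P-" + secrets.token_hex(8)
            self._code_by_id[identifier] = code
            self._id_by_code[code] = identifier
        return self._code_by_id[identifier]

    def reidentify(self, code):
        """Reverse lookup; only possible for whoever holds the key table."""
        return self._id_by_code[code]


coder = Pseudonymiser()
code = coder.code_for("patient-001")  # hypothetical identifier
print(code, coder.code_for("patient-001") == code)
```

In practice the key table could be held by a separate party, such as a Trusted Third Party, so that researchers only ever see the codes.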
Public Cloud - The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider and is both a way of providing public cloud services and a Cloud Service Provider's business model.
Public Domain - The body of knowledge and innovation in relation to which no person or other legal entity can establish or maintain proprietary interests.
Public Engagement - An inclusive act ranging from the active involvement of a population or subpopulation in the development, management or governance of a project, to the provision of information and raising awareness of a project.
Quality - Conformity of data, biospecimens or processes with pre-established specifications appropriate to the purpose to which the data, biospecimens or processes will be put.
Reference Architecture - Concerns a generic inter-institutional architecture for RDM within academic research. The RDM Reference Architecture consists of a set of common denominators (viewpoints and relations between them, enumerations) derived from SOLL architectures which are developed for certain (groups of) disciplines or for institutions. The architecture can be applied to various research collaborations across institutes.
Reference Architecture Framework - The Framework consists of a set of viewpoints and enumeration types for each architectural layer relevant for Research Data Management (RDM). The Framework conforms to the ArchiMate structure and language.
Re-Identification - The process by which anonymized personal data is matched with its true owner. In order to protect the privacy interests of consumers, personal identifiers, such as name and social security number, are often removed from databases containing sensitive information.
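Example (illustrative sketch): removing names alone may not prevent re-identification when the remaining fields can be linked to another source. The records, field names and the choice of quasi-identifiers below are invented purely for illustration.

```python
# Toy data, invented for illustration only.
DEIDENTIFIED = [
    {"postcode": "2311", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "9712", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]
PUBLIC_REGISTER = [
    {"name": "A. Jansen", "postcode": "2311", "birth_year": 1980, "sex": "F"},
    {"name": "B. de Vries", "postcode": "9712", "birth_year": 1975, "sex": "M"},
    {"name": "C. Bakker", "postcode": "9712", "birth_year": 1975, "sex": "M"},
]
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Combine the quasi-identifiers of a record into one lookup key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(deidentified, register):
    """Link records whose quasi-identifiers match exactly one named person."""
    names_by_key = {}
    for person in register:
        names_by_key.setdefault(key(person), []).append(person["name"])
    matches = []
    for record in deidentified:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:
            matches.append((candidates[0], record["diagnosis"]))
    return matches


print(reidentify(DEIDENTIFIED, PUBLIC_REGISTER))
```

Only the first record is linked to a single person; the second matches two people in the register and therefore stays ambiguous in this sketch.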
Registered Access - A system of authentication and self-declaration prior to providing access to data.
Research - Doing research is an activity in which systematic and structured statements - based on gathering information and making observations - can be made about reality.
Research data - The recorded factual material commonly accepted in the scientific community as necessary to validate research findings.
Research Data Architecture - The architecture that spans the lifecycle of data for Research. See also: Digital Research Environment / DRE, Research Data Platform / RDP
Research Data management (RDM) - Research data management is an integral part of the research process, which concerns the way you collect, analyze, store, share, archive and publish research data, to satisfy the needs of current and future data users. Source: LCRDM
Research Data Platform (RDP) - A platform that centrally pulls in data from different sources and provides data access to data scientists / researchers via a portal. This platform is secure, auditable, and compliant with rules and regulations. Where the DRE centers around the data user, the RDP centers around the data supplier/owner. See also: Digital Research Environment / DRE, Research Data Architecture
Research Data Zone - A Research Data Zone is a collection of interconnected and networked facilities, tools and services, allowing researchers to conduct scientific work on and with large (and small) volumes of data. One of the most basic parts of an RDZ is a network that facilitates high-bandwidth traffic and research data to both campus and various national and international partners and collaborators, without being encumbered by bottlenecks or impediments such as undersized or ill performing firewalls or badly engineered network links.
Research Playground - The area where the analytical and data crunching part of research takes place.
Research Software Engineer (RSE) - OECD: A growing number of people in academia combine expertise in programming with an intricate understanding of research. These RSEs may start off as researchers who spend time developing software to progress their research or they may come from a more conventional software-development background and are drawn to research by the challenge of using software to further research. See: Data Roles
Research Support Professionals - In the context of digitalisation, these are the people who support scientific researchers conducting data-intensive science. They are not necessarily part of a research team and might be considered as service providers. This is a broad category that can include data stewards, research software engineers, data managers, data engineers, librarians and archivists. See: Data Roles
Resolver - A resolver is a system which brings about the link between a Persistent Identifier (PID) and the location where the object is currently situated.
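Example (illustrative sketch): a resolver can be pictured as a lookup table from a PID to the current location of the object. The class, the PID and the URLs below are placeholders and do not describe any particular resolver service.

```python
class Resolver:
    """Keeps the link between a PID and the object's current location."""

    def __init__(self):
        self._location_by_pid = {}

    def register(self, pid, location):
        """Create or update the location for a PID."""
        self._location_by_pid[pid] = location

    def resolve(self, pid):
        """Return the current location, or None for an unknown PID."""
        return self._location_by_pid.get(pid)


resolver = Resolver()
# Placeholder PID and URLs, not real identifiers.
resolver.register("pid:example/123", "https://old.example.org/data/123")
# The object moves: only the resolver entry changes, the PID stays the same.
resolver.register("pid:example/123", "https://new.example.org/archive/123")
print(resolver.resolve("pid:example/123"))
```

Citations keep using the same PID while the resolver absorbs the change of location.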
Resource Description Framework (RDF) - A globally-accepted framework for data and knowledge representation that is intended to be read and interpreted by machines. See also: Linked Data
Restricted or Controlled Access - See Controlled or Restricted Access
Return of Results - Communication of research results to an individual or a designated health care provider or family member.
Re-use - Data reuse saves time and accelerates the pace of scientific discovery. By making your data open and available to others, you make it possible for future researchers to answer questions that haven’t yet been asked. Thinking about data reuse in advance and documenting it saves you time by helping you plan research processes and workflow early in the research project. Finally, this documentation makes it easier for you to defend your research: remember back to second grade, when your teacher told you to “show your work”. See: https://mozillascience.github.io/working-open-workshop/data_reuse/
Risk - The effect of uncertainty on objectives.
Risk profile - Determinants, likelihood and impact of the risk that RDM policy is violated.
Rule based storage management - The management of storage resources based on (system- or user-defined) rules.
Rules - Rules are definitions of actions that are to be performed by an actor (e.g. a server). These actions are defined in terms of microservices and other actions.
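Example (illustrative sketch): a rule can be modelled as a condition plus a list of small actions that are run when the condition holds. All names below are hypothetical; the functions merely stand in for microservices and do not represent the rule language of any particular storage system.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    name: str
    size_bytes: int
    metadata: dict = field(default_factory=dict)
    replicas: list = field(default_factory=lambda: ["primary"])


# Small actions out of which rules are composed (stand-ins for microservices).
def replicate_to_archive(obj):
    if "archive" not in obj.replicas:
        obj.replicas.append("archive")


def tag_as_large(obj):
    obj.metadata["large"] = True


# Each rule is a condition plus the actions to perform when it holds.
RULES = [
    (lambda o: o.metadata.get("status") == "published", [replicate_to_archive]),
    (lambda o: o.size_bytes > 1_000_000_000, [tag_as_large, replicate_to_archive]),
]


def apply_rules(obj):
    """Run every action of every rule whose condition matches the object."""
    for condition, actions in RULES:
        if condition(obj):
            for action in actions:
                action(obj)
    return obj


published = apply_rules(StoredObject("results.csv", 10, {"status": "published"}))
print(published.replicas)
```

New storage policy is then expressed by adding a rule to the list rather than by changing the code that stores the data.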
Software as a Service (SaaS) - The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (e.g., web-based email) or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.
Safe Haven - A repository in which data are stored and accessed in ways that maintain their integrity and quality whilst meeting relevant ethical and legal controls on their use and dissemination.
Science DMZ - A Science DMZ (demilitarized zone) is a framework, model, and best practice to set up a high-performing, scalable and secure portion of a network to facilitate data-intensive science applications. It does not include support for general-purpose networking (such as web surfing, email, etc.); this traffic belongs behind the enterprise firewall and on the general-purpose network. By separating the high-performance science network from the general-purpose network, each network can be optimized without interfering with the other.
While the core mission of a Science DMZ is the support of high-performance science applications, this cannot occur in isolation. The Science DMZ can easily incorporate wide area science support services, including virtual circuits and software defined networking at very high speeds without compromising security policies.
Secondary Uses - Use of data or biospecimens in a way that differs from the original purpose for which they were generated or collected.
Secure Shell (SSH) - A network protocol for operating non-web based IT services securely over the internet. SSH uses public keys to authenticate the remote computer and, if necessary, the user. SSH verifies whether the user offering the public key associated with the user’s identity also owns the matching private key.
Security Assertion Markup Language (SAML) - An XML-based protocol for exchanging authentication and authorization data between identity providers and service providers in order to access web-based IT services.
Self Assessment of Vendor solutions - Assessment of vendor solutions by the vendors themselves to determine the value of their products for research. (One can even imagine this self-assessment becoming part of the application process a vendor has to go through in order to have their product included in the service catalogue.)
Service Catalogue - A catalogue for data management facilities for researchers. It aims to help researchers to make a reasoned choice when planning for the management and the storage of their data. Additionally, the information that is accumulated to fill the catalogue should help to identify potential gaps or other shortcomings within the facilities which have been described.
Service Provider (SP) - The owner of an IT service.
Showcase - A project that clearly demonstrates the functionality in a way that is meaningful for the target group. See also: Showcase Approach, DRE Showcase
Showcase Approach - An approach where users are an integral part of the team working towards delivering the functionality of the showcase. It is not necessary that the problem is fully understood or defined at the start; the focus is on maximizing the team’s ability to deliver quickly, to respond to emerging requirements and to adapt to evolving technologies and changes in the community. See also: DRE Showcase, Showcase
Social ID authentication - Using a social media account such as Facebook or LinkedIn for authentication.
SOLL architecture - From the German 'soll' versus 'ist': 'as should be' versus 'as is'. The architecture aimed for (versus the current architecture, IST architecture).
Solution Building Block (SBB) - A candidate solution which conforms to the specification of an Architecture Building Block (ABB). See also: Building Block
Storage management - The term storage management encompasses the technologies and processes organizations use to maximize or improve the performance of their data storage resources. It is a broad category that includes virtualization, replication, mirroring, security, compression, traffic analysis, process automation, storage provisioning and related techniques. See also: Rule based storage management
Student Information System (SIS) - An information system to manage student/teacher data within educational/scientific institutes. It is also a main source of group information.
Supervisory Authority - The public authority (or authorities) in a given jurisdiction responsible for monitoring the application of law and administrative measures adopted pursuant to data privacy, data protection and data security.
SURFconext - National federation of identity providers (institutions) and a web-based IT service integrator.
SURFconext Teams - A group management tool to create and manage groups. Services connected to SURFconext can use this group information, e.g. to decide who is allowed access to the IT service and to decide certain rights within an IT service.
Surveillance - The systematic collection, monitoring and dissemination of health data to assist in planning, implementation and evaluation of an action or intervention such as research or public health.
Target Architecture - The description of a future state of the architecture being developed for an organization. There may be several future states developed as a roadmap to show the evolution of the architecture to a target state.
Technology layer - The Technology Layer depicts:
- technology (/ infrastructure) services such as processing, storage, and communication services needed to run the applications, and the technology components that realize them
- the SBBs, in this case computer and communication hardware and system software that realize those services.
Physical elements are added to this layer for modeling physical equipment, materials, and distribution networks.
Traceability - The ability to verify the history, location, or application of an item, by means of documented recorded identification. For a biospecimen this pertains to any step of its handling, including donation, collection, processing, testing, storage, and disposition.
Trusted Third Party (TTP) - An entity which facilitates interactions between two parties who both trust the third party.
Two-factor authentication - In addition to the login with a user account, users complete a second authentication procedure by means of an SMS text message, USB-based key or app.
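Example (illustrative sketch): the glossary does not name a specific second-factor method. One widely used option for the "app" case is a time-based one-time password (TOTP, RFC 6238), which is assumed here. The function name is hypothetical, and the secret is the published RFC test value, not a real credential.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute a time-based one-time password (HMAC-SHA1 variant)."""
    # Server and app derive the same counter from the current time.
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte window.
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# RFC 6238 test vector (SHA-1, 8 digits, time = 59 s): prints 94287082
print(totp(b"12345678901234567890", 59, digits=8))
```

The server and the app share the secret once during enrolment; afterwards the user proves possession of the device by typing the current code next to the normal login.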
Type of Service - As used in the Infosheet Services Catalogue. A particular group of services that share similar characteristics and form a smaller division of the larger set of services.
User story - Description of what an actor (and consequently also a user of RDM facilities) wants when executing RDM throughout the research data life cycle.
Virtual Research Environment - Concept, synonym for Digital Research Environment (DRE) and Research Workspace. DRE is also a product name, referring to the implementation in Nijmegen (Radboud). VRE is used in a generic way but is also the name of the implementation at the University of Groningen. The term VRE is used within HORA 2.0 and is therefore the preferred term.
View - A view is a representation of a whole system from the perspective of a related set of concerns. This representation consists of one or more structural aspects of an architecture that illustrate how the architecture addresses one or more concerns held by one or more of its stakeholders. Views are interrelated and together they describe the whole system. Views are a means to make the architecture manageable and comprehensible by a range of business and technical stakeholders.
Virtual Organization Orthogonal Technology (VOOT) - A protocol that obtains group related information from group providers to pass on to IT services when needed for authorization. Identity details should already be obtained by for instance SAML or OpenID Connect.
Web-based IT service - Any IT service (e.g. program) that is accessed over a network connection using HTTP. They often run inside a web browser. However, they may also be client-based: the user interface is installed on the user’s device (e.g. computer) and the processing is done over the internet on an external server.
Workspace - A self-service, secure, collaborative environment to receive, process, store and provide access to data and analytics, compliant with the organization's and/or consortium's policies and relevant rules and regulations. See also: Digital Research Environment / DRE.