AI Create Spreadsheet

AI Create Spreadsheet — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Anomaly detection

    In data analysis, anomaly detection (also referred to as outlier detection and sometimes as novelty detection) is generally understood to be the identification of rare items, events or observations which deviate significantly from the majority of the data and do not conform to a well-defined notion of normal behavior. Such observations may arouse suspicion that they were generated by a different mechanism, or may appear inconsistent with the remainder of the data set. Anomaly detection finds application in many domains, including cybersecurity, medicine, machine vision, statistics, neuroscience, law enforcement and financial fraud, to name only a few. Anomalies were initially sought out so that they could be rejected or omitted from the data to aid statistical analysis, for example when computing the mean or standard deviation. They were also removed to improve predictions from models such as linear regression, and more recently their removal has been used to improve the performance of machine learning algorithms. However, in many applications the anomalies themselves are of interest: they are the most sought-after observations in the entire data set and need to be identified and separated from noise or irrelevant outliers.

Three broad categories of anomaly detection techniques exist. Supervised anomaly detection techniques require a data set that has been labelled as "normal" and "abnormal" and involve training a classifier. However, this approach is rarely used in anomaly detection because labelled data are generally unavailable and the classes are inherently unbalanced. Semi-supervised anomaly detection techniques assume that some portion of the data is labelled. This may be any combination of normal and anomalous data, but more often than not the techniques construct a model representing normal behavior from a given normal training data set and then test the likelihood that a test instance was generated by that model.
Unsupervised anomaly detection techniques assume the data is unlabelled and are by far the most commonly used, owing to their wider applicability.

== Definition ==
Many attempts have been made in the statistical and computer science communities to define an anomaly. The most prevalent definitions can be categorised into three groups: those that are ambiguous, those that are specific to a method with pre-defined thresholds usually chosen empirically, and those that are formally defined.

=== Ill defined ===
An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism. Anomalies are instances or collections of data that occur very rarely in the data set and whose features differ significantly from most of the data. An outlier is an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data. An anomaly is a point or collection of points that is relatively distant from other points in a multi-dimensional feature space. Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour.

=== Specific ===
Let T be observations from a univariate Gaussian distribution and O a point from T. Then the z-score for O (the distance of O from the mean of T, measured in standard deviations) is greater than a pre-selected threshold if and only if O is an outlier.

== History ==

=== Intrusion detection ===
The concept of intrusion detection, a critical component of anomaly detection, has evolved significantly over time. Initially, it was a manual process in which system administrators would watch for unusual activities, such as a vacationing user's account being accessed or unexpected printer activity. This approach did not scale and was soon superseded by the analysis of audit logs and system logs for signs of malicious behavior.
By the late 1970s and early 1980s, these logs were primarily analyzed retrospectively to investigate incidents, as the volume of data made real-time monitoring impractical. The falling cost of digital storage eventually led to audit logs being analyzed online, with specialized programs developed to sift through the data. These programs, however, were typically run during off-peak hours because of their computational intensity. The 1990s brought real-time intrusion detection systems capable of analyzing audit data as it was generated, allowing attacks to be detected and responded to immediately. This marked a significant shift towards proactive intrusion detection. As the field has continued to develop, the focus has shifted to solutions that can be implemented efficiently across large and complex network environments, adapting to the ever-growing variety of security threats and the dynamic nature of modern computing infrastructure.

== Applications ==
Anomaly detection is applicable in a very large number and variety of domains, and is an important subarea of unsupervised machine learning. As such it has applications in cybersecurity, intrusion detection, fraud detection, fault detection, system health monitoring, event detection in sensor networks, detecting ecosystem disturbances, defect detection in images using machine vision, medical diagnosis and law enforcement.

=== Intrusion detection ===
Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986. Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing and inductive learning. Types of features proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means, variances, covariances, and standard deviations. The counterpart of anomaly detection in intrusion detection is misuse detection.
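The threshold-and-statistics approach to IDS anomaly detection described above, together with the z-score rule in the Specific definition, can be illustrated with a minimal Python sketch. It assumes a hypothetical per-user feature (logins per day), builds a profile from the mean and standard deviation of past values, and flags a new value whose absolute z-score exceeds a threshold. The threshold of 3.0 is an illustrative choice, not a value taken from Denning's work.

```python
from statistics import mean, stdev

def build_profile(history):
    """Summarise past values of one feature as (mean, standard deviation)."""
    return mean(history), stdev(history)

def is_anomalous(value, profile, threshold=3.0):
    """Flag a value whose absolute z-score exceeds the (assumed) threshold."""
    mu, sigma = profile
    if sigma == 0:
        # No variation in the history: anything different from it is unusual.
        return value != mu
    z = (value - mu) / sigma
    return abs(z) > threshold

# Hypothetical history of one user's logins per day.
history = [4, 5, 6, 5, 4, 6, 5, 5]
profile = build_profile(history)
print(is_anomalous(5, profile))   # False
print(is_anomalous(40, profile))  # True
```

A real IDS would keep many such profiles (per user, host or program, as listed above) and update them over time; this sketch shows only the single-feature threshold test.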

=== Fintech fraud detection ===
Anomaly detection is vital in fintech for fraud prevention.

=== Preprocessing ===
Preprocessing data to remove anomalies can be an important step in data analysis, and is done for a number of reasons. Statistics such as the mean and standard deviation are more accurate after the removal of anomalies, and the visualisation of data can also be improved. In supervised learning, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy.

=== Video surveillance ===
Anomaly detection has become increasingly vital in video surveillance for enhancing security and safety. With the advent of deep learning, methods using Convolutional Neural Networks (CNNs) and Simple Recurrent Units (SRUs) have shown significant promise in identifying unusual activities or behaviors in video data. These models can process and analyze extensive video feeds in real time, recognizing patterns that deviate from the norm and may indicate potential security threats or safety violations. An important aspect of video surveillance is the development of scalable real-time frameworks: such pipelines are needed to process multiple video streams with limited computational resources.

=== IT infrastructure ===
In IT infrastructure management, anomaly detection is crucial for ensuring the smooth operation and reliability of services. IT infrastructures are complex systems, composed of many interacting elements and large quantities of data, and require methods that process and reduce this data into a human- and machine-interpretable format. Frameworks such as the IT Infrastructure Library (ITIL), together with monitoring tools, are used to track and manage system performance and user experience. Detected anomalies can help identify and pre-empt potential performance degradation or system failures, thus maintaining productivity and business process effectiveness.
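To make the preprocessing point concrete, the Python sketch below removes outliers with Tukey's interquartile-range (IQR) fences, one common rule chosen here for illustration (the text does not prescribe a method), and compares the mean and standard deviation before and after filtering. The readings and the fence factor k = 1.5 are assumptions.

```python
from statistics import mean, quantiles, stdev

def remove_outliers_iqr(data, k=1.5):
    """Keep only points inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(data, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if low <= x <= high]

# Hypothetical sensor readings; 95 stands in for a glitch.
raw = [10, 12, 11, 13, 12, 11, 10, 95]
clean = remove_outliers_iqr(raw)
print(f"raw:   mean={mean(raw):.2f} sd={stdev(raw):.2f}")
print(f"clean: mean={mean(clean):.2f} sd={stdev(clean):.2f}")
```

With these readings the single extreme value dominates the raw mean and standard deviation; after filtering, both describe the bulk of the data.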

=== IoT systems ===
Anomaly detection is critical for the security and efficiency of Internet of Things (IoT) systems. It helps identify system failures and security breaches in complex networks of IoT devices. The methods must handle real-time data and diverse device types, and must scale effectively. Garg et al. have introduced a multi-stage anomaly detection framework that improves upon traditional methods by incorporating spatial clustering, density-based clustering, and locality-sensitive hashing. This tailored approach is designed to better handle the vast and varied nature of IoT data, thereby enhancing security and operational reliability in smart infrastructure and industrial IoT systems.

=== Petroleum industry ===
Anomaly detection is crucial in the petroleum industry for monitoring critical machinery. A 2015 paper proposed a novel segmentation algorithm using support vector machines to analyze sensor data for real-time anomaly detection.

=== Oil and gas pipeline monitoring ===
In the oil and gas sector, anomaly detection is crucial not only for maintenance and safety but also for environmental protection. Aljameel et al. propose an advanced machine learning-based model for detecting minor leaks in oil and gas pipelines, which traditional methods may miss.
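The semi-supervised approach described at the start of this entry, building a model of normal behavior from normal training data and then testing how likely a new instance is under that model, can be sketched in Python. The sketch assumes independent Gaussian features and uses the lowest log-likelihood seen on the training rows as the decision threshold; both are simplifying assumptions, and the features and values are hypothetical.

```python
import math
from statistics import mean, stdev

def fit_normal_model(train):
    """Fit one Gaussian per feature to rows drawn from normal data only."""
    return [(mean(col), stdev(col)) for col in zip(*train)]

def log_likelihood(x, model):
    """Log density of x under independent per-feature Gaussians."""
    total = 0.0
    for value, (mu, sigma) in zip(x, model):
        total += -0.5 * math.log(2 * math.pi * sigma ** 2)
        total -= (value - mu) ** 2 / (2 * sigma ** 2)
    return total

# Hypothetical normal-only training rows: (CPU load %, requests per second).
normal = [(30, 100), (35, 110), (32, 95), (28, 105), (33, 98), (31, 102)]
model = fit_normal_model(normal)

# One simple threshold choice: the lowest log-likelihood among the training rows.
threshold = min(log_likelihood(row, model) for row in normal)

def is_novel(x):
    """Flag x if it is less likely under the normal model than any training row."""
    return log_likelihood(x, model) < threshold

print(is_novel((31, 101)))  # False
print(is_novel((95, 400)))  # True
```

Treating features as independent ignores correlations between them; a multivariate Gaussian or another density estimator would relax that assumption.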

  • Species distribution modelling

    Species distribution modelling (SDM), also known as environmental (or ecological) niche modelling (ENM), habitat suitability modelling, predictive habitat distribution modelling, and range mapping, uses ecological models to predict the distribution of a species across geographic space and time using environmental data. The environmental data are most often climate data (e.g. temperature, precipitation), but can include other variables such as soil type, water depth, and land cover. SDMs are used in several research areas in conservation biology, ecology and evolution. These models can be used to understand how environmental conditions influence the occurrence or abundance of a species, and for predictive purposes (ecological forecasting). Predictions from an SDM may concern a species' future distribution under climate change, a species' past distribution (to assess evolutionary relationships), or the potential future distribution of an invasive species. Predictions of current and/or future habitat suitability can be useful for management applications (e.g. reintroduction or translocation of vulnerable species, or reserve placement in anticipation of climate change).

There are two main types of SDMs. Correlative SDMs, also known as climate envelope models, bioclimatic models, or resource selection function models, model the observed distribution of a species as a function of environmental conditions. Mechanistic SDMs, also known as process-based models or biophysical models, use independently derived information about a species' physiology to develop a model of the environmental conditions under which the species can exist.
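A correlative "climate envelope" model, in its simplest form, records the range of each environmental variable at sites where the species has been observed and predicts suitability wherever every variable falls inside those ranges. The Python sketch below uses plain minimum and maximum values; envelope methods often use percentiles instead, to reduce sensitivity to extreme records. The variables and presence records are hypothetical.

```python
def fit_envelope(presences):
    """Per-variable (min, max) range across sites where the species was recorded."""
    return [(min(col), max(col)) for col in zip(*presences)]

def inside_envelope(site, envelope):
    """Predict a site as suitable only if every variable lies within its range."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(site, envelope))

# Hypothetical presence records: (mean annual temperature in °C, annual precipitation in mm).
presences = [(12.0, 800), (14.5, 950), (13.2, 700), (15.1, 1020), (11.8, 760)]
envelope = fit_envelope(presences)
print(envelope)                                # [(11.8, 15.1), (700, 1020)]
print(inside_envelope((13.0, 850), envelope))  # True
print(inside_envelope((22.0, 300), envelope))  # False
```

Because it is fitted only to observed presences, an envelope like this describes where the species has been found rather than every environment it could tolerate.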
The extent to which such modelled data reflect real-world species distributions depends on a number of factors: the nature, complexity, and accuracy of the models used and the quality of the available environmental data layers; the availability of sufficient and reliable species distribution data as model input; and the influence of factors such as barriers to dispersal, geologic history, or biotic interactions, which increase the difference between the realized niche and the fundamental niche. Environmental niche modelling may be considered part of the discipline of biodiversity informatics.

== History ==
A. F. W. Schimper used geographical and environmental factors to explain plant distributions in his 1898 Pflanzengeographie auf physiologischer Grundlage (Plant Geography Upon a Physiological Basis) and his 1908 work of the same name. Andrew Murray used the environment to explain the distribution of mammals in his 1866 The Geographical Distribution of Mammals. Robert Whittaker's work with plants and Robert MacArthur's work with birds strongly established the role the environment plays in species distributions. Elgene O. Box constructed environmental envelope models to predict the range of tree species; his computer simulations were among the earliest uses of species distribution modelling. The adoption of generalised linear models (GLMs) made it possible to create more sophisticated and realistic species distribution models. The expansion of remote sensing and the development of GIS-based environmental modelling increased the amount of environmental information available for model-building and made it easier to use.

== Correlative vs mechanistic models ==

=== Correlative SDMs ===
SDMs originated as correlative models. Correlative SDMs model the observed distribution of a species as a function of geographically referenced climatic predictor variables using multiple regression approaches.
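A widely used correlative SDM is a binomial generalised linear model (logistic regression) relating occurrence to environmental predictors. The Python sketch below fits one with a linear and a quadratic temperature term, which allows a hump-shaped response, to hypothetical presence sites and background ("pseudo-absence") sites, using gradient ascent on the mean log-likelihood. The data, the scaling, the learning rate and the small ridge penalty are illustrative assumptions; in practice a statistics package would normally be used to fit the GLM.

```python
import math

def features(temp):
    """Design row: intercept, scaled temperature and its square (hump-shaped response)."""
    x = (temp - 14.0) / 5.0  # hypothetical centring and scaling
    return [1.0, x, x * x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_glm(temps, labels, lr=0.1, steps=5000, l2=0.01):
    """Fit a binomial GLM with logit link by gradient ascent on the mean log-likelihood."""
    rows = [features(t) for t in temps]
    n = len(rows)
    beta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for row, y in zip(rows, labels):
            p = sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j, v in enumerate(row):
                grad[j] += (y - p) * v / n
        for j in range(len(beta)):
            # Small ridge penalty on the slopes only; keeps estimates finite if classes separate.
            penalty = l2 * beta[j] if j > 0 else 0.0
            beta[j] += lr * (grad[j] - penalty)
    return beta

def predict(beta, temp):
    """Predicted probability of presence at a given temperature."""
    return sigmoid(sum(b * v for b, v in zip(beta, features(temp))))

# Hypothetical temperatures (°C) at presence sites and at background sites.
presence_temps = [12, 13, 14, 14, 15, 16]
background_temps = [2, 5, 8, 20, 23, 26]
temps = presence_temps + background_temps
labels = [1] * len(presence_temps) + [0] * len(background_temps)

beta = fit_logistic_glm(temps, labels)
for t in (3, 14, 25):
    print(t, round(predict(beta, t), 3))
```

The fitted probability is highest near the temperatures where presences were recorded and falls towards the background temperatures on either side, which is the unimodal response the quadratic term permits.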
Given a set of georeferenced presence records for a species and a set of climate maps, a model defines the most likely environmental ranges within which the species lives. Correlative SDMs assume that species are at equilibrium with their environment and that the relevant environmental variables have been adequately sampled. The models allow for interpolation between a limited number of species occurrences. For these models to be effective, observations are needed not only of species presences but also of absences, that is, of places where the species does not live. Records of species absences are typically less common than records of presences, so "random background" or "pseudo-absence" data are often used to fit these models. If records of species occurrences are incomplete, pseudo-absences can introduce bias. Since correlative SDMs are models of a species' observed distribution, they are models of the realized niche (the environments where a species is found), as opposed to the fundamental niche (the environments where a species can be found, or where the abiotic environment is appropriate for its survival). For a given species, the realized and fundamental niches might be the same, but if a species is geographically confined due to dispersal limitation or species interactions, the realized niche will be smaller than the fundamental niche. Correlative SDMs are easier and faster to implement than mechanistic SDMs, and can make ready use of available data. Because they are correlative, however, they provide little information about causal mechanisms and are poorly suited to extrapolation. They will also be inaccurate if the observed species range is not at equilibrium (e.g. if a species has been recently introduced and is actively expanding its range). In standard SDMs, the distribution of a single species is often modeled, with unique parameters describing how environmental (abiotic) factors influence its occurrence probability.
This allows for differentiated responses to environmental drivers among species, but can be problematic for data-deficient species. In contrast, similarities in environmental responses can be accounted for in multi-species SDMs, which model several species jointly using shared or hierarchically related parameters. However, neither approach explicitly accounts for community-level biotic interactions, which can be important in explaining species diversity patterns. Joint species distribution models (joint SDMs or J-SDMs) address this by modeling species co-occurrence patterns directly. The occurrence probability of a given species is thus influenced not only by abiotic drivers but also by inferred biotic associations with other species. This can improve accuracy for rarer taxa and provide insights into community ecology. Both standard SDMs and J-SDMs can be used to generate community-level metrics, such as species richness, by aggregating outputs across multiple species. These can be important for decision-making such as conservation planning.

=== Mechanistic SDMs ===
Mechanistic SDMs were developed more recently. In contrast to correlative models, mechanistic SDMs use physiological information about a species (taken from controlled field or laboratory studies) to determine the range of environmental conditions within which the species can persist. These models aim to characterize the fundamental niche directly and to project it onto the landscape. A simple model may just identify threshold values outside which a species cannot survive. A more complex model may consist of several sub-models, e.g. micro-climate conditions given macro-climate conditions, body temperature given micro-climate conditions, fitness or other biological rates (e.g. survival, fecundity) given body temperature (thermal performance curves), resource or energy requirements, and population dynamics. Geographically referenced environmental data are used as model inputs.
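The simplest mechanistic model mentioned above, a set of physiological thresholds outside which the species cannot survive, can be sketched in Python. The limits and grid cells are hypothetical; in a real model the limits would come from field or laboratory studies and the cell values from georeferenced climate layers. No occurrence records are used, so the prediction does not depend on the species' known range.

```python
# Hypothetical physiological limits, e.g. from laboratory trials.
CT_MIN = 5.0        # critical thermal minimum, °C
CT_MAX = 32.0       # critical thermal maximum, °C
MIN_PRECIP = 400.0  # minimum annual precipitation, mm

def can_persist(temp_min, temp_max, precip):
    """True if a cell's climate stays within every physiological threshold."""
    return temp_min >= CT_MIN and temp_max <= CT_MAX and precip >= MIN_PRECIP

# Hypothetical grid cells: name -> (coldest-month min °C, warmest-month max °C, precipitation mm).
cells = {
    "A": (6.5, 28.0, 650.0),
    "B": (-2.0, 24.0, 900.0),
    "C": (8.0, 35.5, 500.0),
    "D": (7.0, 30.0, 250.0),
}
suitable = [name for name, climate in cells.items() if can_persist(*climate)]
print(suitable)  # ['A']
```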
Because the species distribution predictions are independent of the species' known range, these models are especially useful for species whose range is actively shifting and not at equilibrium, such as invasive species. Mechanistic SDMs incorporate causal mechanisms and are better suited to extrapolation and non-equilibrium situations. However, they are more labor-intensive to create than correlative models and require the collection and validation of a lot of physiological data, which may not be readily available. The models require many assumptions and parameter estimates, and they can become very complicated. Dispersal, biotic interactions, and evolutionary processes present challenges, as they are not usually incorporated into either correlative or mechanistic models. Correlative and mechanistic models can be used in combination to gain additional insights. For example, a mechanistic model could be used to identify areas that are clearly outside the species' fundamental niche, and these areas can be marked as absences or excluded from analysis. See for a comparison between mechanistic and correlative models.

== Niche models (correlative) ==
There are a variety of mathematical methods that can be used for fitting, selecting, and evaluating correlative SDMs. Models include "profile" methods, which are simple statistical techniques that use e.g. environmental distance to known sites of occurrence such as

  • Small data

    Small data is data that is 'small' enough for human comprehension: data in a volume and format that makes it accessible, informative and actionable. Put simply, "big data" is about machines and "small data" is about people, so eyewitness observations or five pieces of related data could count as small data. Small data is what we used to think of as data. The only way to comprehend big data is to reduce it to small, visually appealing objects that represent various aspects of large data sets (such as histograms, charts, and scatter plots). Big data is all about finding correlations, whereas small data is all about finding causation, the reason why. A formal definition of small data has been proposed by Allen Bonde, former vice-president of innovation at Actuate (now part of OpenText): "Small data connects people with timely, meaningful insights (derived from big data and/or “local” sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks." Another definition describes small data as the small set of specific attributes produced by the Internet of Things, typically a handful of sensor readings such as temperature, wind speed, vibration and status. Martin Lindstrom estimated in 2016 that “If one takes the top 100 biggest innovations of our time, perhaps around 60% to 65% percent are really based on Small Data.” Small data includes everything from Snapchat to simple objects such as the Post-it note. Lindstrom believes we have become so focused on big data that we tend to forget more basic concepts and creativity. He defines Small Data "as seemingly insignificant observations you identify in consumers’ homes, is everything from how you place your shoes on how you hang your paintings". He therefore holds that one should thoroughly master the basics (small data) in order to mine data and find correlations.
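The idea of reducing big data to small visual summaries such as histograms can be sketched in Python. Here 100,000 synthetic readings, generated with a fixed seed as a hypothetical stand-in for a large data set, are reduced to ten bin counts and printed as a text histogram. The bin edges, the clamping of out-of-range values into the end bins, and the one-mark-per-1,000-readings scale are illustrative choices.

```python
import random

def histogram(values, bins, low, high):
    """Count values into equal-width bins over [low, high); out-of-range values go to the end bins."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        i = int((v - low) // width)
        counts[min(max(i, 0), bins - 1)] += 1
    return counts

rng = random.Random(42)
readings = [rng.gauss(20.0, 3.0) for _ in range(100_000)]  # hypothetical "big" data
counts = histogram(readings, bins=10, low=10.0, high=30.0)
for k, c in enumerate(counts):
    lo = 10.0 + 2.0 * k
    print(f"{lo:5.1f}-{lo + 2.0:5.1f} {'#' * (c // 1000)}")
```

Ten counts replace a hundred thousand numbers, which is the kind of reduction the paragraph above describes.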

== Academic Recognition and Methodology ==
The growing significance of "small data" as a distinct field of inquiry was highlighted by the 2024 Thematic Einstein Semester (TES) on Small Data Analysis, hosted by the Berlin Mathematics Research Center MATH+. A central focus of this semester was the transition from theoretical analysis to practical decision-making. Because small data sets are primarily used to drive specific actions, the presentation of results becomes an essential methodological step. The semester’s findings emphasized that while small data may lack volume, it often contains a high density of "many possible interpretations." Consequently, the final conference of the TES was structured around the pillars of interpretation, explanation, and knowledge gain. Participants sought to develop new mathematical and methodological representations that could accurately depict this wealth of interpretative possibilities. This work underscores that analyzing small data is not purely a computational task; it requires a robust interface between mathematics and diverse disciplines to ensure that insights are both contextually grounded and scientifically rigorous.

== Uses in business ==

=== Marketing ===
Bonde has written about the topic for Forbes, Direct Marketing News, CMO.com and other publications. According to Martin Lindstrom, in his book Small Data: "[In customer research, small data is] seemingly insignificant behavioural observations containing very specific attributes pointing towards an unmet customer need. Small data is the foundation for breakthrough ideas or completely new ways to turnaround brands." His approach combines the observation of small samples with intuition. Marketers can obtain market insights from gathering small data by engaging with and observing people in their own environments.
In comparison to big data, small data has the power to trigger emotions and to provide insights into the reasons behind customers' behaviours. It may uncover detailed information on a person's extroversion or introversion, self-confidence, or whether they are having problems in their relationship. According to Lindstrom, relationships among people and customer segments are organized around four criteria:

Climate: It reveals, for example, how a person's environment affects their diet.
Rulership: The power or government in charge.
Religion: The prevalence of religion in a country, depending on its influence, indicates whether a person's decision-making process is affected by their belief system.
Tradition: Cultural norms influence people's behaviors and interactions.

Many companies underestimate the power of small data, using samples of millions of consumers instead of recognizing the value of closely observing small samples in their market research. In his book, Lindstrom defines "7Cs", which companies should consider when trying to derive meaningful customer insights and market trends from small data about their customers:

Collecting: Understanding the manner in which observations are translated inside a home.
Clues: Uncovering other distinctive emotional reflections that can be observed.
Connecting: Identifying the consequences of emotional behaviour.
Causation: Understanding what emotions are being evoked.
Correlation: Identifying the initial date of appearance of the behaviour or emotion.
Compensation: Identifying the unmet or unfulfilled desire.
Concept: Defining the “big idea” compensation for the identified consumer need.

Some of Lindstrom's clients, such as Lowes Foods, looked at data in a different way and chose to live with the customer. “As you enter their store, they have now created an amazing community where every staff member acts in a character mood, based on Small Data”.
The supermarket did everything it could to make customers feel at home. All employee behaviours are inspired by customer feedback gathered in interviews conducted in customers' homes.

=== Healthcare ===
Researchers at Cornell University started developing applications, based on small data, to monitor health problems in patients. This is an initiative of Cornell's Small Data Lab, led by Deborah Estrin, in close cooperation with Weill Cornell Medicine. The Small Data Lab developed a series of apps that focus not only on gathering data on patients' pain but also on tracking habits in areas such as grocery shopping. For patients with rheumatoid arthritis, for example, whose flares and remissions do not follow a particular cycle, the app gathers information passively, making it possible to forecast when a flare might be coming based on small changes in behaviour. Other apps monitor online grocery shopping, using each user's purchases to help adapt their groceries to nutritionists' recommendations, or monitor email language to identify patterns that might indicate "fluctuations in cognitive performance, fatigue, side effects of medication or poor sleep, and other conditions and treatments that are typically self-reported and self-medicated".

=== Postal Service ===
The United States Postal Service (USPS) used optical character recognition (OCR) to automatically read and process 98% of all hand-addressed mail and 99.5% of machine-printed mail. By combining this technology with its small data sample of US zip codes, the USPS can now process more than 36,000 pieces of mail per hour.

=== Aerospace ===
In 2015, Boeing established an analytics lab for aerospace data in cooperation with Carnegie Mellon University to leverage the university's leadership in machine learning, language technologies and data analytics.
One of the initiative's projects aims to standardize maintenance logs using AI in order to dramatically reduce costs. Currently, there is no standardized procedure for documenting maintenance logs, which leads to small but highly unstructured data sets. As a result, it is very difficult for maintenance workers to interpret the variations in maintenance logs within a short period of time. However, with AI and a narrow data set of common aircraft maintenance terminology, it becomes possible to translate these logs dynamically in real time. According to the Harvard Business Review, by using AI to enhance the speed and accuracy of the airline maintenance workflow, airlines stand to save billions.

  • Principles for a Data Economy

    The Principles for a Data Economy – Data Rights and Transactions is a transatlantic legal project carried out jointly by the American Law Institute (ALI) and the European Law Institute (ELI). The Principles address a range of legal questions that arise in the data economy. Because data differs from other tradeable items, the Principles set out legal rules for data transactions and data rights that take into account the interests of the different stakeholders involved in the data economy. The Principles are designed to facilitate contractual relations as well as the drafting of model agreements, and can guide courts and legislators worldwide. The project proposes a set of principles that can be implemented in any legal system and is designed to work in conjunction with any kind of data privacy/data protection law, intellectual property law or trade secret law. The Principles do not address or seek to change any of the substantive rules of these bodies of law. The Project Team consists of Neil B. Cohen and Christiane Wendehorst (as Project Reporters) and Lord John Thomas and Steven O. Weise (as Project Chairs).

== Characteristics of data ==
The law governing trade in commerce has historically focused on tangible items, such as goods, or on intangible assets, such as shares or licenses. However, data does not fit into any of these traditional categories, nor does it qualify as a service. It is often unclear how traditional legal rules and doctrines apply to data, as data differs from other assets in many ways. For example, data can be replicated at essentially no cost and can be used in parallel for a variety of purposes by many different people at the same time (data is a “non-rivalrous” resource). Uncertainty about which rules govern the data economy may inhibit innovation and growth and create difficulties for stakeholders such as data-driven industries, start-ups, and consumers.

== Stakeholders in the data economy ==
The Principles take the basic types of players and relationships found in data ecosystems as a starting point for providing guidance in different situations. The central actors in the data economy are data controllers (also called “data holders”). They are in a position to access the data and to decide the purposes and means of its processing. A controller may exercise control on its own or share it with co-controllers, such as under a data pooling arrangement. Data processors process data on a controller’s behalf as a service. Another important group of stakeholders includes those that contribute to the generation of data (e.g. data subjects). Other players in the data economy include data assemblers or data intermediaries (e.g. data trusts).

== History of the project and timeline ==
Before the official adoption of the project by ALI and ELI bodies in 2018, the project team carried out a Feasibility Study from October 2016 to February 2018. In the following years, the project team produced a number of drafts (e.g. “Preliminary Drafts” No. 1 to 4, “Tentative Draft No. 1”), and progress on the project was regularly discussed with advisory bodies and members of both the ALI and the ELI. The project reporters also incorporated feedback and insights from industry stakeholders and experts gathered at several meetings and workshops hosted, inter alia, by UNCITRAL, UNIDROIT and several national governmental institutions. Tentative Draft No. 2 was presented at the ALI Annual Meeting in May 2021 and approved by ALI membership. The latest draft ("Final Council Draft") was also approved by the ELI Council and ELI Membership.
The Principles for a Data Economy were presented in October 2021 at an international conference with representatives from institutions such as the Uniform Law Commission (ULC), the European Commission, UNIDROIT, the OECD, the International Chamber of Commerce (ICC) and the World Economic Forum (WEF).

== Project structure ==
The current draft (“Tentative Draft No. 2”) of the Principles consists of five Parts, each of which governs different aspects of the data economy: General Provisions, Data Contracts, Data Rights, Third Party Aspects of Data Activities, and Multi-State Issues.

=== General Provisions ===
Part I includes general provisions that apply to all other Parts of the Principles for a Data Economy. This Part sets out the purpose of the Principles: to make existing law in the field of the data economy more coherent and to support the development of the law in this field by courts and legislators worldwide. Part I also clarifies that the Principles have a wide scope of application and can be used in a variety of ways by stakeholders in the data economy. The Principles may, for example, serve private parties as a basis for contract formation, guide the deliberations of arbitral tribunals or inspire national legislation. Part I then defines several key terms, such as ‘digital data’ and ‘data right’. The scope of the Principles is limited to matters where information is recorded as an asset, resource or tradeable commodity and where large amounts of data, rather than single pieces of information, are concerned. This Part also clarifies that remedies with respect to data contracts and data rights are left to the applicable national law.

=== Data Contracts ===
Part II lists different types of contracts that often occur in the data economy and establishes two broad categories, namely contracts for the supply and sharing of data and contracts for services with regard to data. Contracts for the supply and sharing of data include, e.g.
data transfer contracts or data pooling arrangements, while contracts for services with regard to data cover contracts for the processing of data or data intermediary contracts. The Principles provide default terms for each contract type, on issues such as the manner in which data should be supplied or which characteristics the supplied data should meet. These default terms 'automatically' become part of the contract unless the parties agree otherwise. === Data Rights === Part III governs legally protected interests of players in the data economy that stem from the characteristics of data as a resource (e.g. its non-rivalrous nature) or from public interest considerations. Such data rights may include the right to data access, the right to require the controller to desist from data activities or to correct incorrect/incomplete data, or even the right to receive an economic share in profits derived from the use of data. For example, the Principles deal with data rights of stakeholders that had a share in the co-generation of data and identify different factors to be considered in determining whether to afford a party a data right. The underlying idea that parties who have contributed to the generation of data should have some rights in the utilization of the data is also recognized by governmental institutions such as the Japanese Ministry of Economy, Trade and Industry (METI), and the term “co-generated data”, which was coined by the Principles for a Data Economy, has been adopted, inter alia, by the European Commission, the German Data Ethics Commission and the Global Partnership on Artificial Intelligence (GPAI). This Part also deals with data rights for the public interest, such as data sharing rights in the field of innovation. === Third Party Aspects === Part IV governs different situations in which data transactions interfere with the rights of third parties. Such rights include intellectual property rights or rights derived from data privacy or data protection law. 
This Part sets out under which circumstances data activities should be considered wrongful vis-à-vis another party. For example, a data activity (like data processing or the onward supply of data) could be considered wrongful if a controller interferes with the rights of data subjects that are protected by data-protection law. A data activity could also be wrongful if the controller fails to comply with contractual limitations on data activities that are enforceable by the protected party (e.g. a controller may only process data for a certain purpose). If someone obtained access to data by unauthorized means (i.e. data “theft”), this could also be considered wrongful. The Part on Third-Party Aspects also takes a detailed look at the effects that the onward supply of data can have on third parties, balancing the protection of third parties on the one hand with the interests of data recipients and the desire to encourage data sharing on the other. === Multi-State Issues === As transactions in the data economy are international by nature and rarely occur within a single legal system, Part V of the Principles also briefly touches upon the applicability of the rules and doctrines of private international law to such transactions. == Links == Website of the “Principles for a Data Economy – Data Rights and Transaction

  • AI browser

    AI browser

    An AI browser is a web browser with integrated artificial intelligence capabilities, such as automatically summarizing web page content or answering questions about it. A more specialized type is an agentic browser, based on the concept of agentic AI, which can take actions – such as navigating webpages or filling out forms – on behalf of the user. Several agentic browsers emerged in 2025, including ChatGPT Atlas (macOS only), Comet, and Dia. As of 2025, this is a recent development in the browser market, including new entrants from OpenAI, Opera and Perplexity. The designation of 'AI browser' also includes established browsers that later added non-agentic AI features, such as Microsoft Edge with the Copilot chatbot, Google Chrome with the Gemini chatbot (for Windows desktop users in the US with their language set to English), and Firefox with multiple chatbot providers (such as ChatGPT, Claude, Copilot, Gemini, and Le Chat). AI browsers have been noted to be susceptible to prompt injection attacks. == Browser extensions and integrations == Rather than creating entirely new browsers, some AI browsing solutions integrate with existing browsers through extensions or companion applications. These tools add agentic capabilities to established browsers without requiring users to switch platforms. Examples include Composite, which functions as a cross-browser agent that works with Chrome, Edge, and other browsers to automate web-based tasks for workers. == Cloud-based implementations == Cloud-based implementations of AI browsers allow users to run automated browsing agents without local installation. These systems operate on remote servers using frameworks such as Puppeteer or Playwright. Examples include Browserbase, Browser-use and AI Browser. The AI typically parses the Document Object Model (DOM) to locate and interact with page elements, and may also analyze browser screenshots to interpret layout and structure. 
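The DOM-parsing step described above can be illustrated with Python's standard-library `html.parser`. This is a sketch under assumptions, not how any of the named products work: `ActionableElementExtractor` and `list_actions` are hypothetical names, and real agents typically inspect a live page through Puppeteer or Playwright rather than parsing static HTML. The sketch collects the elements an agent could act on (links, buttons and form fields) along with the attributes it would need to target them.

```python
from html.parser import HTMLParser


class ActionableElementExtractor(HTMLParser):
    """Hypothetical helper: collect links, buttons and form fields from HTML."""

    ACTIONABLE = {"a", "button", "input", "select", "textarea"}
    TEXT_TAGS = {"a", "button"}  # elements whose inner text labels the action

    def __init__(self):
        super().__init__()
        self.elements = []
        self._capturing = None  # index of the element whose text is being read

    def handle_starttag(self, tag, attrs):
        if tag not in self.ACTIONABLE:
            return
        attributes = dict(attrs)
        self.elements.append({
            "tag": tag,
            "id": attributes.get("id"),
            "name": attributes.get("name"),
            "type": attributes.get("type"),
            "href": attributes.get("href"),
            "text": "",
        })
        if tag in self.TEXT_TAGS:
            self._capturing = len(self.elements) - 1

    def handle_endtag(self, tag):
        if tag in self.TEXT_TAGS:
            self._capturing = None

    def handle_data(self, data):
        # only text inside an open link or button is recorded as its label
        if self._capturing is not None and data.strip():
            element = self.elements[self._capturing]
            element["text"] = (element["text"] + " " + data.strip()).strip()


def list_actions(html):
    """Return the actionable elements of a page, in document order."""
    parser = ActionableElementExtractor()
    parser.feed(html)
    parser.close()
    return parser.elements
```

Any text that reaches an agent this way, including text planted by a page author, ends up in the model's context, which is the mechanism behind the prompt-injection concerns discussed in this article.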
== Criticisms and dangers == AI browsers have been noted to be vulnerable to prompt injection attacks, in which the content of websites can be used to hijack control of the browser. Multiple organisations have argued against using AI browsers because of this vulnerability. The United Kingdom's National Cyber Security Centre and Gartner consider them too risky for adoption by most organisations. A study by the CISPA Helmholtz Center and Saarland University concluded that this vulnerability makes them easy targets for malware, fraud, automated defamation, disinformation and biased outputs.

  • DONE

    DONE

    The Data-based Online Nonlinear Extremumseeker (DONE) algorithm is a black-box optimization algorithm. DONE models the unknown cost function and attempts to find an optimum of the underlying function. The DONE algorithm is suitable for optimizing costly and noisy functions and does not require derivatives. An advantage of DONE over similar algorithms, such as Bayesian optimization, is that the computational cost per iteration is independent of the number of function evaluations. == Methods == The DONE algorithm was first proposed by Hans Verstraete and Sander Wahls in 2015. The algorithm fits a surrogate model based on random Fourier features and then uses the well-known L-BFGS algorithm to find an optimum of the surrogate model. == Applications == DONE was first demonstrated for maximizing the signal in optical coherence tomography measurements, but has since been applied to various other problems. For example, it was used to help extend the field of view in light sheet fluorescence microscopy.
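The fit-a-surrogate-then-optimize loop described above can be sketched in plain Python for a one-dimensional cost function. This is a simplified illustration under several assumptions, not the authors' implementation: the surrogate is a sum of random cosine features whose coefficients are updated by recursive least squares (assumed here as one way to keep the per-iteration cost independent of the number of evaluations), the surrogate is minimized by projected gradient descent as a stand-in for L-BFGS, and the names `RFFSurrogate`, `minimize_surrogate` and `done_sketch`, as well as every default parameter, are hypothetical.

```python
import math
import random


class RFFSurrogate:
    """1-D surrogate g(x) = sum_k c_k * cos(w_k * x + b_k) (random Fourier features).

    Coefficients are fitted online by recursive least squares (RLS), so each
    update costs O(D^2) for D features, however many samples have been seen.
    """

    def __init__(self, n_features=30, scale=3.0, reg=0.1, seed=0):
        rng = random.Random(seed)
        # random frequencies and phases are drawn once and then kept fixed
        self.w = [rng.gauss(0.0, scale) for _ in range(n_features)]
        self.b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
        self.c = [0.0] * n_features
        # P tracks the inverse of the regularized Gram matrix (starts as I / reg)
        self.P = [[1.0 / reg if r == s else 0.0 for s in range(n_features)]
                  for r in range(n_features)]

    def features(self, x):
        return [math.cos(w * x + b) for w, b in zip(self.w, self.b)]

    def predict(self, x):
        return sum(c * f for c, f in zip(self.c, self.features(x)))

    def grad(self, x):
        # derivative of the surrogate with respect to x
        return sum(-c * w * math.sin(w * x + b)
                   for c, w, b in zip(self.c, self.w, self.b))

    def update(self, x, y):
        """One RLS step with the new measurement (x, y)."""
        phi = self.features(x)
        n = len(phi)
        p_phi = [sum(self.P[r][s] * phi[s] for s in range(n)) for r in range(n)]
        denom = 1.0 + sum(a * b for a, b in zip(phi, p_phi))
        gain = [v / denom for v in p_phi]
        err = y - self.predict(x)
        self.c = [c + g * err for c, g in zip(self.c, gain)]
        # P <- P - gain * (P phi)^T, valid because P stays symmetric
        self.P = [[self.P[r][s] - gain[r] * p_phi[s] for s in range(n)]
                  for r in range(n)]


def minimize_surrogate(model, x0, lo, hi, steps=200, lr=0.01):
    """Projected gradient descent on the surrogate (a stand-in for L-BFGS)."""
    x = x0
    for _ in range(steps):
        x = min(hi, max(lo, x - lr * model.grad(x)))
    return x


def done_sketch(f, x0, lo=-1.0, hi=1.0, iters=40, explore=0.1, seed=1):
    """Minimize f on [lo, hi]; returns the best point measured and its cost."""
    rng = random.Random(seed)
    model = RFFSurrogate(seed=seed)
    best_x, best_y = x0, f(x0)
    model.update(x0, best_y)
    x = x0
    for _ in range(iters):
        # measure near the current surrogate optimum, with random exploration
        x_try = min(hi, max(lo, x + rng.gauss(0.0, explore)))
        y = f(x_try)
        model.update(x_try, y)
        if y < best_y:
            best_x, best_y = x_try, y
        x = minimize_surrogate(model, x_try, lo, hi)
    return best_x, best_y
```

For example, `done_sketch(lambda x: (x - 0.3) ** 2, x0=-0.8)` returns the best point it measured on `[-1, 1]`. Because each iteration performs one RLS update and a fixed number of gradient steps, its cost does not grow with the number of past evaluations, which is the property the text highlights.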

  • Data (word)

    Data (word)

    The word data is most often used as a singular collective mass noun in educated everyday usage. However, due to the history and etymology of the word, considerable controversy has existed over whether it should be considered a mass noun used with verbs conjugated in the singular, or should be treated as the plural of the now-rarely-used datum. == Usage in English == In one sense, data is the plural form of datum. Datum can also be a count noun with the plural datums (see usage in datum article) that can be used with cardinal numbers (e.g., "80 datums"); data (originally a Latin plural) is not used like a normal count noun with cardinal numbers, and can either be plural with plural determiners such as these and many, or be used as a mass noun with a verb in the singular form. Even when a very small quantity of data is referenced (one number, for example), the phrase piece of data is often used, as opposed to datum. The debate over appropriate usage continues, but "data" as a singular form is far more common. In English, the word datum is still used in the general sense of "an item given". In cartography, geography, nuclear magnetic resonance and technical drawing, it is often used to refer to a single specific reference datum from which distances to all other data are measured. Any measurement or result is a datum, though data point is now far more common. Data is most often used as a singular mass noun in educated everyday usage. Some major newspapers, such as The New York Times, use it in either the singular or the plural. In The New York Times, the phrases "the survey data are still being analyzed" and "the first year for which data is available" have appeared within one day of each other. The Wall Street Journal explicitly allows this usage in its style guide. 
The Associated Press style guide classifies data as a collective noun that takes the singular when treated as a unit but the plural when referring to individual items (e.g., "The data is sound" and "The data have been carefully collected"). In scientific writing, data is often treated as a plural, as in "These data do not support the conclusions", but the word is also used as a singular mass entity like information (e.g., in computing and related disciplines). British usage now widely accepts treating data as singular in standard English, including everyday newspaper usage, at least in non-scientific contexts. UK scientific publishing still prefers treating it as a plural. Some UK university style guides recommend using data for both singular and plural use, and others recommend treating it only as a singular in connection with computers. The IEEE Computer Society allows data to be used as either a mass noun or a plural, based on author preference, while the IEEE editorial style manual specifies always using the plural form. Some professional organizations and style guides require that authors treat data as a plural noun. For example, the Air Force Flight Test Center once stated that the word data is always plural, never singular.

  • Ontology alignment

    Ontology alignment

    Ontology alignment, or ontology matching, is the process of determining correspondences between concepts in ontologies. A set of correspondences is also called an alignment. The phrase takes on a slightly different meaning in computer science, cognitive science or philosophy. == Computer science == For computer scientists, concepts are expressed as labels for data. Historically, the need for ontology alignment arose out of the need to integrate heterogeneous databases, developed independently and thus each having its own data vocabulary. In the Semantic Web context, where many actors provide their own ontologies, ontology matching has come to play a critical role in helping heterogeneous resources interoperate. Ontology alignment tools find classes of data that are semantically equivalent, for example, "truck" and "lorry". The classes are not necessarily logically identical. According to Euzenat and Shvaiko (2007), there are three major dimensions for similarity: syntactic, external, and semantic. Coincidentally, they roughly correspond to the dimensions identified by cognitive scientists below. A number of tools and frameworks have been developed for aligning ontologies, some with inspiration from cognitive science and some independently. Ontology alignment tools have generally been developed to operate on database schemas, XML schemas, taxonomies, formal languages, entity-relationship models, dictionaries, and other label frameworks. These structures are usually converted to a graph representation before being matched. Since the emergence of the Semantic Web, such graphs can be represented in the Resource Description Framework line of languages by triples of the form <subject, predicate, object>, as illustrated in the Notation 3 syntax. In this context, aligning ontologies is sometimes referred to as "ontology matching". The problem of ontology alignment has recently been tackled by trying to compute the matching first and then the mapping (based on the matching) automatically. 
Systems such as DSSim, X-SOM and COMA++ have achieved very high precision and recall. The Ontology Alignment Evaluation Initiative aims to evaluate, compare and improve the different approaches. === Formal definition === Given two ontologies $i = \langle C_i, R_i, I_i, T_i, V_i \rangle$ and $j = \langle C_j, R_j, I_j, T_j, V_j \rangle$, where $C$ is the set of classes, $R$ is the set of relations, $I$ is the set of individuals, $T$ is the set of data types, and $V$ is the set of values, we can define different types of (inter-ontology) relationships. Such relationships will be called, all together, alignments and can be categorized along different dimensions: similarity vs logic: this is the difference between matchings (predicating about the similarity of ontology terms) and mappings (logical axioms, typically expressing logical equivalence or inclusion among ontology terms); atomic vs complex: whether the alignments we consider are one-to-one, or can involve more terms in a query-like formulation (e.g., LAV/GAV mapping); homogeneous vs heterogeneous: whether the alignments predicate on terms of the same type (e.g., classes are related only to classes, individuals to individuals, etc.) or heterogeneity is allowed in the relationship; type of alignment: the semantics associated with an alignment, which can be subsumption, equivalence, disjointness, part-of or any user-specified relationship. Subsumption, atomic, homogeneous alignments are the building blocks for obtaining richer alignments, and have well-defined semantics in every description logic. We now introduce ontology matching and mapping more formally. 
An atomic homogeneous matching is an alignment that carries a similarity degree $s \in [0,1]$, describing the similarity of two terms of the input ontologies $i$ and $j$. A matching can either be computed, by means of heuristic algorithms, or inferred from other matchings. Formally, a matching is a quadruple $m = \langle id, t_i, t_j, s \rangle$, where $t_i$ and $t_j$ are homogeneous ontology terms and $s$ is the similarity degree of $m$. A (subsumption, homogeneous, atomic) mapping is defined as a pair $\mu = \langle t_i, t_j \rangle$, where $t_i$ and $t_j$ are homogeneous ontology terms. == Cognitive science == For cognitive scientists interested in ontology alignment, the "concepts" are nodes in a semantic network that reside in brains as "conceptual systems." The focal question is: if everyone has unique experiences and thus different semantic networks, then how can we ever understand each other? This question has been addressed by a model called ABSURDIST (Aligning Between Systems Using Relations Derived Inside Systems for Translation). Three major dimensions of similarity have been identified, expressed as equations for "internal similarity, external similarity, and mutual inhibition." == Ontology alignment methods == Two research subfields have emerged in ontology mapping, namely monolingual ontology mapping and cross-lingual ontology mapping. The former refers to the mapping of ontologies in the same natural language, whereas the latter refers to "the process of establishing relationships among ontological resources from two or more independent ontologies where each ontology is labelled in a different natural language". 
Existing matching methods in monolingual ontology mapping are discussed in Euzenat and Shvaiko (2007). Approaches to cross-lingual ontology mapping are presented in Fu et al. (2011).
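To make the matching quadruple $m = \langle id, t_i, t_j, s \rangle$ defined above concrete, here is a minimal sketch of an atomic homogeneous matcher over two lists of class labels. It is an illustration under assumptions, not any of the named systems: it combines a syntactic signal (string similarity from Python's `difflib.SequenceMatcher`) with an external one (a tiny hypothetical synonym table standing in for a thesaurus), and the names `SYNONYMS`, `similarity`, `match` and the threshold 0.8 are illustrative choices.

```python
from difflib import SequenceMatcher

# Hypothetical external resource: a tiny synonym table standing in for a
# thesaurus or upper ontology.
SYNONYMS = {frozenset({"truck", "lorry"}), frozenset({"car", "automobile"})}


def similarity(a, b):
    """Similarity degree s in [0, 1] between two class labels."""
    a, b = a.lower(), b.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0  # identical labels or listed synonyms
    return SequenceMatcher(None, a, b).ratio()  # syntactic string similarity


def match(classes_i, classes_j, threshold=0.8):
    """Return atomic homogeneous matchings (id, t_i, t_j, s) with s >= threshold."""
    matchings = []
    for t_i in classes_i:
        for t_j in classes_j:
            s = similarity(t_i, t_j)
            if s >= threshold:
                matchings.append((len(matchings) + 1, t_i, t_j, s))
    return matchings
```

With classes `["Truck", "Person", "Address"]` and `["Lorry", "Persons", "Company"]`, the sketch pairs "Truck" with "Lorry" through the synonym table and "Person" with "Persons" through string similarity; a semantic matcher would additionally compare the graph structure around each class.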

  • Latent semantic analysis

    Latent semantic analysis

    Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text (the distributional hypothesis). A matrix containing word counts per document (rows represent unique words and columns represent each document) is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by cosine similarity between any two columns. Values close to 1 represent very similar documents while values close to 0 represent very dissimilar documents. An information retrieval technique using latent semantic structure was patented in 1988 by Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum and Lynn Streeter. In the context of its application to information retrieval, it is sometimes called latent semantic indexing (LSI). == Overview == === Occurrence matrix === LSA can use a document-term matrix which describes the occurrences of terms in documents; it is a sparse matrix whose rows correspond to terms and whose columns correspond to documents. A typical example of the weighting of the elements of the matrix is tf-idf (term frequency–inverse document frequency): the weight of an element of the matrix is proportional to the number of times the terms appear in each document, where rare terms are upweighted to reflect their relative importance. This matrix is also common to standard semantic models, though it is not necessarily explicitly expressed as a matrix, since the mathematical properties of matrices are not always used. 
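A minimal sketch of the occurrence-matrix step, assuming whitespace tokenization, raw counts as term frequency and idf = log(N / df), which is one of several common tf-idf variants. As in the text, rows of `X` are terms and columns are documents, and two documents are compared by the cosine of their columns; the function names are illustrative.

```python
import math
from collections import Counter


def tfidf_term_document_matrix(docs):
    """Build X with X[i][j] = tf-idf weight of term i in document j.

    tf is the raw count and idf is log(N / df), so a term occurring in
    every document gets weight 0 under this variant.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    terms = sorted(set().union(*counts))
    n_docs = len(docs)
    df = {t: sum(1 for c in counts if t in c) for t in terms}
    X = [[c[t] * math.log(n_docs / df[t]) for c in counts] for t in terms]
    return terms, X


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def document_similarity(X, j, q):
    """Cosine similarity between document columns j and q of X."""
    return cosine([row[j] for row in X], [row[q] for row in X])
```

Documents that share no terms get similarity 0 here, which is exactly the limitation the rank-lowering step is meant to soften for synonyms.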
=== Rank lowering === After the construction of the occurrence matrix, LSA finds a low-rank approximation to the term-document matrix. There could be various reasons for these approximations: The original term-document matrix is presumed too large for the computing resources; in this case, the approximated low rank matrix is interpreted as an approximation (a "least and necessary evil"). The original term-document matrix is presumed noisy: for example, anecdotal instances of terms are to be eliminated. From this point of view, the approximated matrix is interpreted as a de-noisified matrix (a better matrix than the original). The original term-document matrix is presumed overly sparse relative to the "true" term-document matrix. That is, the original matrix lists only the words actually in each document, whereas we might be interested in all words related to each document—generally a much larger set due to synonymy. The consequence of the rank lowering is that some dimensions are combined and depend on more than one term: {(car), (truck), (flower)} → {(1.3452 car + 0.2828 truck), (flower)} This mitigates the problem of identifying synonymy, as the rank lowering is expected to merge the dimensions associated with terms that have similar meanings. It also partially mitigates the problem with polysemy, since components of polysemous words that point in the "right" direction are added to the components of words that share a similar meaning. Conversely, components that point in other directions tend to either simply cancel out, or, at worst, to be smaller than components in the directions corresponding to the intended sense. === Derivation === Let $X$ be a matrix where element $(i,j)$ describes the occurrence of term $i$ in document $j$ (this can be, for example, the frequency). 
$X$ will look like this, where row $i$ is the term vector $\mathbf{t}_i^T$ and column $j$ is the document vector $\mathbf{d}_j$:
$$X = \begin{bmatrix} x_{1,1} & \dots & x_{1,j} & \dots & x_{1,n} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{i,1} & \dots & x_{i,j} & \dots & x_{i,n} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{m,1} & \dots & x_{m,j} & \dots & x_{m,n} \end{bmatrix}$$
Now a row in this matrix will be a vector corresponding to a term, giving its relation to each document:
$$\mathbf{t}_i^T = \begin{bmatrix} x_{i,1} & \dots & x_{i,j} & \dots & x_{i,n} \end{bmatrix}$$
Likewise, a column in this matrix will be a vector corresponding to a document, giving its relation to each term:
$$\mathbf{d}_j = \begin{bmatrix} x_{1,j} \\ \vdots \\ x_{i,j} \\ \vdots \\ x_{m,j} \end{bmatrix}$$
Now the dot product $\mathbf{t}_i^T \mathbf{t}_p$ between two term vectors gives the correlation between the terms over the set of documents. The matrix product $XX^T$ contains all these dot products. Element $(i,p)$ (which is equal to element $(p,i)$) contains the dot product $\mathbf{t}_i^T \mathbf{t}_p$ ($= \mathbf{t}_p^T \mathbf{t}_i$). Likewise, the matrix $X^T X$ contains the dot products between all the document vectors, giving their correlation over the terms: $\mathbf{d}_j^T \mathbf{d}_q = \mathbf{d}_q^T \mathbf{d}_j$. 
Now, from the theory of linear algebra, there exists a decomposition of $X$ such that $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix. This is called a singular value decomposition (SVD):
$$X = U \Sigma V^T$$
The matrix products giving us the term and document correlations then become (using $V^T V = I$ and $U^T U = I$)
$$\begin{aligned} XX^T &= (U\Sigma V^T)(U\Sigma V^T)^T = (U\Sigma V^T)\big((V^T)^T \Sigma^T U^T\big) = U\Sigma V^T V \Sigma^T U^T = U\Sigma\Sigma^T U^T \\ X^T X &= (U\Sigma V^T)^T (U\Sigma V^T) = \big((V^T)^T \Sigma^T U^T\big)(U\Sigma V^T) = V\Sigma^T U^T U \Sigma V^T = V\Sigma^T\Sigma V^T \end{aligned}$$
Since $\Sigma\Sigma^T$ and $\Sigma^T\Sigma$ are diagonal, we see that $U$ must contain the eigenvectors of $XX^T$, while $V$ must contain the eigenvectors of $X^T X$. Both products have the same non-zero eigenvalues, given by the non-zero entries of $\Sigma\Sigma^T$, or equally, by the non-zero entries of $\Sigma^T\Sigma$. 
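The derivation shows that the columns of $V$ are eigenvectors of $X^T X$ with eigenvalues $\sigma^2$, so the leading singular values and right singular vectors can be approximated by power iteration on $X^T X$ followed by deflation. The sketch below assumes a small dense matrix given as a list of rows; practical LSA code would normally call a sparse truncated-SVD routine instead. Singular vectors are only determined up to sign, `top_singular` and `reduced_documents` are hypothetical names, and scaling document coordinates by the singular values is one common convention rather than the only one.

```python
import math
import random


def gram(X):
    """Return X^T X (all document-document dot products) for X given as rows."""
    m, n = len(X), len(X[0])
    return [[sum(X[k][a] * X[k][b] for k in range(m)) for b in range(n)]
            for a in range(n)]


def top_singular(X, k, iters=200, seed=0):
    """Approximate the k largest singular values of X and the matching right
    singular vectors (columns of V) by power iteration on X^T X with deflation."""
    rng = random.Random(seed)
    A = gram(X)
    n = len(A)
    sigmas, vs = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(A[r][c] * v[c] for c in range(n)) for r in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining spectrum is (numerically) zero
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T A v approximates the eigenvalue sigma^2
        lam = sum(v[r] * A[r][c] * v[c] for r in range(n) for c in range(n))
        sigmas.append(math.sqrt(max(lam, 0.0)))
        vs.append(v)
        # deflate: remove the eigen-component just found before the next pass
        A = [[A[r][c] - lam * v[r] * v[c] for c in range(n)] for r in range(n)]
    return sigmas, vs


def reduced_documents(sigmas, vs):
    """Document j in the k-dimensional space: (sigma_1 v_1[j], ..., sigma_k v_k[j])."""
    n = len(vs[0])
    return [[s * v[j] for s, v in zip(sigmas, vs)] for j in range(n)]
```

Keeping only the first $k$ singular triplets is the rank lowering discussed earlier; documents can then be compared by the cosine of their rows in `reduced_documents`.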
Now the decomposition looks like this, where, as before, row $i$ of $X$ is $\mathbf{t}_i^T$ and column $j$ is $\mathbf{d}_j$:
$$\underbrace{\begin{bmatrix} x_{1,1} & \dots & x_{1,j} & \dots & x_{1,n} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{i,1} & \dots & x_{i,j} & \dots & x_{i,n} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{m,1} & \dots & x_{m,j} & \dots & x_{m,n} \end{bmatrix}}_{X} = \underbrace{\begin{bmatrix} \mathbf{u}_1 & \dots & \mathbf{u}_l \end{bmatrix}}_{U} \cdot \underbrace{\begin{bmatrix} \sigma_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \sigma_l \end{bmatrix}}_{\Sigma} \cdot \underbrace{\begin{bmatrix} \mathbf{v}_1 \\ \vdots \\ \mathbf{v}_l \end{bmatrix}}_{V^T}$$
The values $\sigma_1, \dots, \sigma_l$ are called the singular values, and $\mathbf{u}_1, \dots, \mathbf{u}_l$ and $\mathbf{v}_1, \dots, \mathbf{v}_l$ the left and right singular vectors. Notice the only part of $U$ that contributes to $\mathbf{t}_i$ is the $i$'th row. Let this row vector be called $\hat{\mathbf{t}}_i^T$. Likewise, the only part of $V^T$ that contributes to $\mathbf{d}_j$ is the $j$'th column, $\hat{\mathbf{d}}_j$. These are not the eigenvectors, but depend on all the eigenvectors. I

  • Learning augmented algorithm

    Learning augmented algorithm

    A learning augmented algorithm (also called an algorithm with predictions) is an algorithm that can make use of a prediction to improve its performance. Whereas a regular algorithm takes only the problem instance as input, a learning augmented algorithm accepts an extra parameter. This extra parameter is often a prediction of some property of the solution, which the algorithm uses to improve its running time or the quality of its output. The most common application is to online algorithms, where a prediction about the not-yet-revealed part of the instance is provided. == Description == A learning augmented algorithm typically takes an input $(\mathcal{I}, \mathcal{A})$. Here $\mathcal{I}$ is a problem instance and $\mathcal{A}$ is the prediction. A prediction can be any object. Common types are the following: Prediction of an optimal solution: the prediction gives a solution to the problem or characterizes an optimal solution. Prediction of the input: this is mainly used for online problems. Prediction of algorithmic actions: a prediction tailored to a specific algorithm that suggests a specific algorithm execution. Learning augmented algorithms usually satisfy the following three properties: Consistency: a learning augmented algorithm is said to be consistent if it can be proven to perform well when it is provided with an accurate prediction. Smoothness: a learning augmented algorithm is called smooth if its performance can be bounded by a function of the quality of the prediction. Here, the quality can be measured in a problem-specific way; this is also called the prediction error. Robustness: a learning augmented algorithm is called robust if its worst-case performance can be bounded even if the given prediction is inaccurate. Learning augmented algorithms generally do not prescribe how the prediction should be made; machine learning can be used for this purpose. 
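To make consistency and robustness concrete, here is a minimal sketch for the ski rental problem, which appears among the applications: renting skis costs 1 per day, buying costs b, and the number of ski days is unknown in advance. The rule below is a simple deterministic scheme of the kind studied for this problem; it trusts a predicted number of ski days to a degree set by a parameter `lam` in (0, 1], and the function names are illustrative. Under this rule the cost is at most (1 + lam) times the optimum when the prediction is correct (consistency) and at most (1 + 1/lam) times the optimum whatever the prediction (robustness), so a smaller `lam` trusts the prediction more.

```python
import math


def buy_day(b, predicted_days, lam):
    """Day on which to buy skis, given buy price b (renting costs 1 per day),
    a predicted number of ski days and a trust parameter 0 < lam <= 1."""
    if predicted_days >= b:
        return math.ceil(lam * b)  # prediction says "long season": buy early
    return math.ceil(b / lam)      # prediction says "short season": buy late


def cost(b, actual_days, day):
    """Rent until the buying day; buy at the start of that day if still skiing."""
    if actual_days < day:
        return actual_days
    return (day - 1) + b


def competitive_ratio(b, actual_days, predicted_days, lam):
    """Algorithm cost divided by the optimal offline cost min(actual_days, b)."""
    day = buy_day(b, predicted_days, lam)
    return cost(b, actual_days, day) / min(actual_days, b)
```

For instance, with `b = 10` and `lam = 0.5`, a correct prediction of 20 days leads to buying on day 5 at total cost 14 against an optimum of 10 (ratio 1.4 ≤ 1.5), while a wrong prediction of 3 days for a 100-day season leads to buying on day 20 at cost 29 (ratio 2.9 ≤ 3).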
== Applications == A few examples of problems where learning augmented algorithms have been applied are the following. === Online algorithms === The ski rental problem; the weighted paging problem; the set cover problem; nonclairvoyant scheduling; the online bipartite matching problem. === Warm starting === ==== Data structures ==== The binary search algorithm is an algorithm for finding elements of a sorted list $x_1, \ldots, x_n$. It needs $O(\log n)$ steps to find an element with some known value $y$ in a list of length $n$. With a prediction $i$ for the position of $y$, the following learning augmented algorithm can be used. First, look at position $i$ in the list. If $x_i = y$, the element has been found. If $x_i < y$, probe the positions $i+1, i+2, i+4, \ldots$ until an element greater than or equal to $y$ (or the end of the list) is reached, and then run a binary search between the last two probed positions. If $x_i > y$, do the same as in the previous case, but instead consider $i-1, i-2, i-4, \ldots$. The error is defined to be $\eta = |i - i^*|$, where $i^*$ is the real index of $y$. In the learning augmented algorithm, probing the positions $i+1, i+2, i+4, \ldots$ takes about $\log_2(\eta)$ steps. Then a binary search is performed on a list of size at most $2\eta$, which takes about $\log_2(\eta)$ further steps. This makes the total running time of the algorithm about $2\log_2(\eta)$, i.e. $O(\log \eta)$. So, when the error is small, the algorithm is faster than a normal binary search. This shows that the algorithm is consistent. Even in the worst case, the error will be at most $n$. Then the algorithm takes at most $O(\log n)$ steps, so the algorithm is robust. 
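The prediction-guided search described above can be sketched as follows. The function name `predicted_search` is hypothetical, the prediction is clamped into the valid index range (a detail the text does not cover), and the function also counts how many list positions it inspects so that the benefit of a good prediction is visible.

```python
def predicted_search(xs, y, i):
    """Search sorted list xs for y, starting from a predicted index i.

    Returns (index of y or -1, number of list positions inspected).
    """
    n = len(xs)
    if n == 0:
        return -1, 0
    i = min(max(i, 0), n - 1)  # clamp the prediction into the list
    probes = 1
    if xs[i] == y:
        return i, probes
    step = 1
    if xs[i] < y:
        # gallop right through i+1, i+2, i+4, ... (invariant: xs[lo] < y)
        lo, hi = i, i + step
        while hi < n:
            probes += 1
            if xs[hi] >= y:
                break
            lo, step = hi, step * 2
            hi = i + step
        a, b = lo + 1, min(hi, n - 1)  # y, if present, lies in xs[a..b]
    else:
        # gallop left through i-1, i-2, i-4, ... (invariant: xs[hi] > y)
        hi, lo = i, i - step
        while lo >= 0:
            probes += 1
            if xs[lo] <= y:
                break
            hi, step = lo, step * 2
            lo = i - step
        a, b = max(lo, 0), hi - 1  # y, if present, lies in xs[a..b]
    # ordinary binary search inside the bracket found above
    while a <= b:
        mid = (a + b) // 2
        probes += 1
        if xs[mid] == y:
            return mid, probes
        if xs[mid] < y:
            a = mid + 1
        else:
            b = mid - 1
    return -1, probes
```

On `xs = list(range(0, 200, 2))`, searching for 100 with the exact prediction 50 inspects one position, and with the prediction 47 (error 3) it inspects five. Any prediction still finds the element, in at most about twice the steps of a plain binary search.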
==== More examples ==== The maximum weight matching problem === Approximation algorithms === The maximum cut problem; the vertex cover problem === Mechanism Design === The facility location problem

  • Driver scheduling problem

    Driver scheduling problem

    The driver scheduling problem (DSP) is a type of problem in operations research and theoretical computer science. The DSP consists of selecting a set of duties (assignments) for the drivers or pilots of vehicles (e.g., buses, trains, boats, or planes) involved in the transportation of passengers or goods, within the constraints of various legislative and logistical criteria. == Criteria and modelling == This very complex problem involves several constraints related to labour and company rules, as well as different evaluation criteria and objectives. Being able to solve this problem efficiently can have a great impact on costs and quality of service for public transportation companies. There is a large number of different rules that a feasible duty might be required to satisfy, such as: minimum and maximum stretch duration; minimum and maximum break duration; minimum and maximum work duration; minimum and maximum total duration; maximum extra work duration; maximum number of vehicle changes; and minimum driving duration of a particular vehicle. Operations research has provided optimization models and algorithms that lead to efficient solutions for this problem. Among the most common models proposed to solve the DSP are the Set Covering and Set Partitioning Models (SCP/SPP). In the SPP model, each work piece (task) is covered by only one duty. In the SCP model, it is possible to have more than one duty covering a given work piece. In both models, the set of work pieces that needs to be covered is laid out in rows, and the set of previously defined feasible duties available for covering specific work pieces is arranged in columns. The DSP resolution, based on either of these models, is the selection of the set of feasible duties that guarantees that there is one (SPP) or more (SCP) duties covering each work piece while minimizing the total cost of the final schedule.
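The two models can be illustrated on a toy instance by brute force: work pieces play the role of rows, pre-generated feasible duties the role of columns, and the partitioning flag switches between "covered at least once" (SCP) and "covered exactly once" (SPP). This is a sketch for illustration only; exhaustive enumeration is exponential in the number of duties, and practical DSP solvers rely on integer programming and related techniques instead. The function name and the instance in the example are hypothetical.

```python
from itertools import combinations


def select_duties(pieces, duties, partition=False):
    """Brute-force the set covering (SCP) or set partitioning (SPP) model.

    pieces: the work pieces (the rows of the model).
    duties: {name: (set_of_pieces_covered, cost)}, the feasible duties (columns).
    partition=False: every piece covered at least once (SCP).
    partition=True:  every piece covered exactly once (SPP).
    Returns (minimum cost, tuple of chosen duty names), or (inf, None).
    """
    names = sorted(duties)
    best_cost, best_choice = float("inf"), None
    for r in range(1, len(names) + 1):
        for choice in combinations(names, r):
            coverage = {p: 0 for p in pieces}
            for name in choice:
                for p in duties[name][0]:
                    coverage[p] += 1
            if partition:
                feasible = all(c == 1 for c in coverage.values())
            else:
                feasible = all(c >= 1 for c in coverage.values())
            cost = sum(duties[name][1] for name in choice)
            if feasible and cost < best_cost:
                best_cost, best_choice = cost, choice
    return best_cost, best_choice
```

With pieces `[1, 2, 3, 4]` and duties `A = ({1, 2}, 2)`, `B = ({2, 3, 4}, 2)`, `C = ({3, 4}, 3)`, `D = ({1}, 3)`, the SCP optimum is A + B at cost 4 (piece 2 is covered twice), while the SPP forbids that overlap and its optimum costs 5.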

  • Algorithms and Combinatorics

    Algorithms and Combinatorics

    Algorithms and Combinatorics (ISSN 0937-5511) is a book series in mathematics, and particularly in combinatorics and the design and analysis of algorithms. It is published by Springer Science+Business Media, and was founded in 1987. == Books == The books published in this series include: The Simplex Method: A Probabilistic Analysis (Karl Heinz Borgwardt, 1987, vol. 1) Geometric Algorithms and Combinatorial Optimization (Martin Grötschel, László Lovász, and Alexander Schrijver, 1988, vol. 2; 2nd ed., 1993) Systems Analysis by Graphs and Matroids (Kazuo Murota, 1987, vol. 3) Greedoids (Bernhard Korte, László Lovász, and Rainer Schrader, 1991, vol. 4) Mathematics of Ramsey Theory (Jaroslav Nešetřil and Vojtěch Rödl, eds., 1990, vol. 5) Matroid Theory and its Applications in Electric Network Theory and in Statics (András Recski, 1989, vol. 6) Irregularities of Partitions: Papers from the meeting held in Fertőd, July 7–11, 1986 (Gábor Halász and Vera T. Sós, eds., 1989, vol. 8) Paths, Flows, and VLSI-Layout: Papers from the meeting held at the University of Bonn, Bonn, June 20–July 1, 1988 (Bernhard Korte, László Lovász, Hans Jürgen Prömel, and Alexander Schrijver, eds., 1990, vol. 9) New Trends in Discrete and Computational Geometry (János Pach, ed., 1993, vol. 10) Discrete Images, Objects, and Functions in $\mathbb{Z}^n$ (Klaus Voss, 1993, vol. 11) Linear Optimization and Extensions (Manfred Padberg, 1999, vol. 12) The Mathematics of Paul Erdős I (Ronald Graham and Jaroslav Nešetřil, eds., 1997, vol. 13) The Mathematics of Paul Erdős II (Ronald Graham and Jaroslav Nešetřil, eds., 1997, vol. 14) Geometry of Cuts and Metrics (Michel Deza and Monique Laurent, 1997, vol. 15) Probabilistic Methods for Algorithmic Discrete Mathematics (M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, 1998, vol. 16) Modern Cryptography, Probabilistic Proofs and Pseudorandomness (Oded Goldreich, 1999, vol. 17) Geometric Discrepancy: An Illustrated Guide (Jiří Matoušek, 1999, vol. 18) Applied Finite Group Actions (Adalbert Kerber, 1999, vol. 19) Matrices and Matroids for Systems Analysis (Kazuo Murota, 2000, vol. 20; corrected ed., 2010) Combinatorial Optimization (Bernhard Korte and Jens Vygen, 2000, vol. 21; 5th ed., 2012) The Strange Logic of Random Graphs (Joel Spencer, 2001, vol. 22) Graph Colouring and the Probabilistic Method (Michael Molloy and Bruce Reed, 2002, vol. 23) Combinatorial Optimization: Polyhedra and Efficiency (Alexander Schrijver, 2003, vol. 24. In three volumes: A. Paths, flows, matchings; B. Matroids, trees, stable sets; C. Disjoint paths, hypergraphs) Discrete and Computational Geometry: The Goodman-Pollack Festschrift (B. Aronov, S. Basu, J. Pach, and M. Sharir, eds., 2003, vol. 25) Topics in Discrete Mathematics: Dedicated to Jarik Nešetřil on the Occasion of his 60th birthday (M. Klazar, J. Kratochvíl, M. Loebl, J. Matoušek, R. Thomas, and P. Valtr, eds., 2006, vol. 26) Boolean Function Complexity: Advances and Frontiers (Stasys Jukna, 2012, vol. 27) Sparsity: Graphs, Structures, and Algorithms (Jaroslav Nešetřil and Patrice Ossona de Mendez, 2012, vol. 28) Optimal Interconnection Trees in the Plane (Marcus Brazil and Martin Zachariasen, 2015, vol. 29) Combinatorics and Complexity of Partition Functions (Alexander Barvinok, 2016, vol. 30)

  • Cinema 4D

    Cinema 4D is a 3D software suite developed by the German company Maxon.

    == Overview ==

    As of R21, only a single version of Cinema 4D is available. It replaces all previous variants, including BodyPaint 3D, and includes all features of the former 'Studio' variant. With R21, all binaries were unified: there is no technical difference between the commercial, educational, and demo versions, which now differ only in licensing.

    2014 saw the release of Cinema 4D Lite, which came packaged with Adobe After Effects Creative Cloud 2014. "Lite" acts as an introductory version, with many features withheld. It is part of a partnership between the two companies in which a Maxon-produced plug-in called Cineware lets any variant work seamlessly with After Effects. The "Lite" variant depends on After Effects CC: it needs that application to be running in order to launch, and it is sold only as a component bundled with After Effects CC through Adobe.

    Cinema 4D was initially developed for Amiga computers in the early 1990s, and the first three versions of the program were available exclusively for that platform. With v4, however, Maxon began to develop the application for Windows and Macintosh computers as well, citing the wish to reach a wider audience and the growing instability of the Amiga market following Commodore's bankruptcy. It was also released for BeOS. On Linux, Cinema 4D is available as a command-line rendering version.

    == Modules and older variants ==

    From R12 to R20, Cinema 4D was available in four variants:

      • Prime (the core application)
      • Broadcast (adds motion-graphics features, including MoGraph2)
      • Visualize (adds functions for architectural design, including Virtual Walkthrough, Advanced Render, Sky, Sketch and Toon, data exchange and camera matching)
      • Studio (the complete package, including all modules)

    From Release 8 until Release 11.5, Cinema 4D took a modular approach, allowing the core application to be extended with various modules. This ended with Release 12, though the functionality of these modules remained available in the different variants (Prime, Broadcast, Visualize, Studio). The old modules were:

      • Advanced Render (global illumination/HDRI, caustics, ambient occlusion and sky simulation)
      • BodyPaint 3D (direct painting on UVW meshes; now included in the core. In essence, Cinema 4D Core/Prime and the BodyPaint 3D products are identical; they differ only in the splash screen shown at startup and the default user interface.)
      • Dynamics (simulation of soft-body and rigid-body dynamics)
      • Hair (simulates hair, fur, grass, etc.)
      • MOCCA (character animation and cloth simulation)
      • MoGraph (procedural motion-graphics modeling and animation toolset)
      • NET Render (renders animations over a TCP/IP network in render farms)
      • PyroCluster (simulation of smoke and fire effects)

    == Use in industry ==

    A number of films and related works have been modeled and rendered in Cinema 4D.

    == Cinebench ==

    Cinebench is a cross-platform test suite that tests a computer's hardware capabilities. It can be used to benchmark Cinema 4D's 3D modeling, animation, motion-graphics and rendering performance on multiple CPU cores. The program "target[s] a certain niche and [is] better suited for high-end desktop and workstation platforms". Cinebench is commonly used to demonstrate CPU performance, for example at tech shows and by tech YouTubers and review sites.

  • OpenWSN

    OpenWSN aims to build an open-source, standards-based implementation of a complete constrained-network protocol stack for wireless sensor networks and the Internet of Things. The project was created at the University of California, Berkeley and extended at INRIA and the Open University of Catalonia (UOC). At the base of OpenWSN is a deterministic MAC layer implementing IEEE 802.15.4e Time Slotted Channel Hopping (TSCH). Above the MAC layer, the Low-Power and Lossy Network (LLN) stack is based on IETF standards, including the IETF 6TiSCH management and adaptation layer (a minimal configuration profile, the 6top protocol and several scheduling functions). The stack is complemented by implementations of 6LoWPAN, RPL in non-storing mode, UDP and CoAP, so that devices running the stack can be reached from native IPv6 networks through open standards. Related projects include RIOT and OpenMote. OpenWSN is available for Linux, Windows and OS X. The current release of OpenWSN is 1.14.0.
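
    Time Slotted Channel Hopping combines a repeating schedule of timeslots (a slotframe) with a change of radio frequency from slot to slot. In IEEE 802.15.4e, the physical channel of a scheduled link is computed from the absolute slot number (ASN) and the link's channel offset as hopping_sequence[(ASN + channelOffset) mod len(hopping_sequence)]. The minimal Python sketch that follows illustrates only that rule and is not OpenWSN's code. The hopping sequence (channels 11 to 26 in plain order), the slotframe length of 101 and the channel offset of 3 are hypothetical values chosen for illustration. Real networks configure their own sequence and schedule.

```python
# Minimal sketch of the IEEE 802.15.4e TSCH channel-hopping rule.
# Assumption (hypothetical): the hopping sequence is simply 2.4 GHz channels
# 11..26 in ascending order; real deployments configure their own sequence.
HOPPING_SEQUENCE: tuple = tuple(range(11, 27))


def tsch_channel(asn: int, channel_offset: int,
                 hopping_sequence: tuple = HOPPING_SEQUENCE) -> int:
    """Physical channel used by a link with `channel_offset` in timeslot `asn`.

    ASN keeps increasing across slotframe repetitions, so the same scheduled
    link lands on a different frequency each time the slotframe repeats.
    """
    return hopping_sequence[(asn + channel_offset) % len(hopping_sequence)]


def slot_offset(asn: int, slotframe_length: int) -> int:
    """Position of timeslot `asn` within a repeating slotframe."""
    return asn % slotframe_length


if __name__ == "__main__":
    SLOTFRAME_LENGTH = 101  # hypothetical slotframe size
    CHANNEL_OFFSET = 3      # hypothetical channelOffset of one scheduled link
    # Follow the link scheduled at slotOffset 0 over three consecutive slotframes.
    for asn in (0, 101, 202):
        print(asn, slot_offset(asn, SLOTFRAME_LENGTH), tsch_channel(asn, CHANNEL_OFFSET))
```

    Run as a script, it prints one line per slotframe. The link keeps slot offset 0 but moves to channels 14, 19 and 24 in successive slotframes, spreading its transmissions across frequencies. That spreading is the purpose of channel hopping.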

  • Timeline of algorithms

    The following timeline of algorithms outlines the development of algorithms (mainly "mathematical recipes") since their inception.

    == Antiquity ==

      • Before – writing about "recipes" (on cooking, rituals, agriculture and other themes)
      • c. 2000–1700 BC – Egyptians develop the earliest known algorithms for multiplying two numbers
      • c. 1600 BC – Babylonians develop the earliest known algorithms for factorization and finding square roots
      • c. 300 BC – Euclid's algorithm
      • c. 200 BC – the Sieve of Eratosthenes
      • 263 AD – Gaussian elimination described by Liu Hui

    == Medieval Period ==

      • 628 – Chakravala method described by Brahmagupta
      • c. 820 – Al-Khawarizmi described algorithms for solving linear and quadratic equations in his Algebra; the word algorithm comes from his name
      • 825 – Al-Khawarizmi described the algorism, algorithms for using the Hindu–Arabic numeral system, in his treatise On the Calculation with Hindu Numerals, which was translated into Latin as Algoritmi de numero Indorum; "Algoritmi", the translator's rendition of the author's name, gave rise to the word algorithm (Latin algorithmus), meaning "calculation method"
      • c. 850 – cryptanalysis and frequency analysis algorithms developed by Al-Kindi (Alkindus) in A Manuscript on Deciphering Cryptographic Messages, which contains algorithms for breaking encryptions and ciphers
      • c. 1025 – Ibn al-Haytham (Alhazen) becomes the first mathematician to derive the formula for the sum of the fourth powers, and in turn develops an algorithm for determining the general formula for the sum of any integral powers
      • c. 1400 – Ahmad al-Qalqashandi gives a list of ciphers in his Subh al-a'sha that includes both substitution and transposition and, for the first time, a cipher with multiple substitutions for each plaintext letter; he also gives an exposition on, and a worked example of, cryptanalysis, including the use of tables of letter frequencies and sets of letters that cannot occur together in one word

    == Before 1940 ==

      • 1540 – Lodovico Ferrari discovered a method to find the roots of a quartic polynomial
      • 1545 – Gerolamo Cardano published Cardano's method for finding the roots of a cubic polynomial
      • 1614 – John Napier develops a method for performing calculations using logarithms
      • 1671 – Newton–Raphson method developed by Isaac Newton
      • 1690 – Newton–Raphson method independently developed by Joseph Raphson
      • 1706 – John Machin develops a quickly converging inverse-tangent series for π and computes π to 100 decimal places
      • 1768 – Leonhard Euler publishes his method for numerical integration of ordinary differential equations in problem 85 of Institutiones calculi integralis
      • 1789 – Jurij Vega improves Machin's formula and computes π to 140 decimal places
      • 1805 – FFT-like algorithm known by Carl Friedrich Gauss
      • 1842 – Ada Lovelace writes the first algorithm for a computing engine
      • 1903 – A fast Fourier transform algorithm presented by Carle David Tolmé Runge
      • 1918 – Soundex
      • 1926 – Borůvka's algorithm
      • 1926 – Primary decomposition algorithm presented by Grete Hermann
      • 1927 – Hartree–Fock method developed for simulating a quantum many-body system in a stationary state
      • 1934 – Delaunay triangulation developed by Boris Delaunay
      • 1936 – Turing machine, an abstract machine developed by Alan Turing, who with others developed the modern notion of algorithm

    == 1940s ==

      • 1942 – A fast Fourier transform algorithm developed by G. C. Danielson and Cornelius Lanczos
      • 1945 – Merge sort developed by John von Neumann
      • 1947 – Simplex algorithm developed by George Dantzig

    == 1950s ==

      • 1950 – Hamming codes developed by Richard Hamming
      • 1952 – Huffman coding developed by David A. Huffman
      • 1953 – Simulated annealing introduced by Nicholas Metropolis
      • 1954 – Radix sort computer algorithm developed by Harold H. Seward
      • 1956 – Kruskal's algorithm developed by Joseph Kruskal
      • 1956 – Ford–Fulkerson algorithm developed and published by L. R. Ford Jr. and D. R. Fulkerson
      • 1957 – Prim's algorithm developed by Robert Prim
      • 1957 – Bellman–Ford algorithm developed by Richard E. Bellman and L. R. Ford Jr.
      • 1958 – Box–Muller transform for fast generation of normally distributed numbers published by George Edward Pelham Box and Mervin Edgar Muller; independently pre-discovered by Raymond E. A. C. Paley and Norbert Wiener in 1934
      • 1959 – Dijkstra's algorithm developed by Edsger Dijkstra
      • 1959 – Shell sort developed by Donald L. Shell
      • 1959 – De Casteljau's algorithm developed by Paul de Casteljau
      • 1959 – QR factorization algorithm developed independently by John G. F. Francis and Vera Kublanovskaya
      • 1959 – Rabin–Scott powerset construction for converting an NFA into a DFA published by Michael O. Rabin and Dana Scott

    == 1960s ==

      • 1960 – Karatsuba multiplication
      • 1961 – CRC (cyclic redundancy check) invented by W. Wesley Peterson
      • 1962 – AVL trees
      • 1962 – Quicksort developed by C. A. R. Hoare
      • 1962 – Bresenham's line algorithm developed by Jack E. Bresenham
      • 1962 – Gale–Shapley 'stable-marriage' algorithm developed by David Gale and Lloyd Shapley
      • 1964 – Heapsort developed by J. W. J. Williams
      • 1964 – Multigrid methods first proposed by R. P. Fedorenko
      • 1965 – Cooley–Tukey algorithm rediscovered by James Cooley and John Tukey
      • 1965 – Levenshtein distance developed by Vladimir Levenshtein
      • 1965 – Cocke–Younger–Kasami (CYK) algorithm independently developed by Tadao Kasami
      • 1965 – Buchberger's algorithm for computing Gröbner bases developed by Bruno Buchberger
      • 1965 – LR parsers invented by Donald Knuth
      • 1966 – Dantzig algorithm for shortest paths in a graph with negative edges
      • 1967 – Viterbi algorithm proposed by Andrew Viterbi
      • 1967 – Cocke–Younger–Kasami (CYK) algorithm independently developed by Daniel H. Younger
      • 1968 – A* graph search algorithm described by Peter Hart, Nils Nilsson, and Bertram Raphael
      • 1968 – Risch algorithm for indefinite integration developed by Robert Henry Risch
      • 1969 – Strassen algorithm for matrix multiplication developed by Volker Strassen

    == 1970s ==

      • 1970 – Dinic's algorithm for computing maximum flow in a flow network by Yefim (Chaim) A. Dinitz
      • 1970 – Knuth–Bendix completion algorithm developed by Donald Knuth and Peter B. Bendix
      • 1970 – BFGS method of the quasi-Newton class
      • 1970 – Needleman–Wunsch algorithm published by Saul B. Needleman and Christian D. Wunsch
      • 1972 – Edmonds–Karp algorithm published by Jack Edmonds and Richard Karp, essentially identical to Dinic's algorithm from 1970
      • 1972 – Graham scan developed by Ronald Graham
      • 1972 – Red–black trees and B-trees discovered
      • 1973 – RSA encryption algorithm discovered by Clifford Cocks
      • 1973 – Jarvis march algorithm developed by R. A. Jarvis
      • 1973 – Hopcroft–Karp algorithm developed by John Hopcroft and Richard Karp
      • 1974 – Pollard's p − 1 algorithm developed by John Pollard
      • 1974 – Quadtree developed by Raphael Finkel and J. L. Bentley
      • 1975 – Genetic algorithms popularized by John Holland
      • 1975 – Pollard's rho algorithm developed by John Pollard
      • 1975 – Aho–Corasick string matching algorithm developed by Alfred V. Aho and Margaret J. Corasick
      • 1975 – Cylindrical algebraic decomposition developed by George E. Collins
      • 1976 – Salamin–Brent algorithm independently discovered by Eugene Salamin and Richard Brent
      • 1976 – Knuth–Morris–Pratt algorithm developed by Donald Knuth and Vaughan Pratt and independently by J. H. Morris
      • 1977 – Boyer–Moore string-search algorithm for finding occurrences of one string within another
      • 1977 – RSA encryption algorithm rediscovered by Ron Rivest, Adi Shamir, and Len Adleman
      • 1977 – LZ77 algorithm developed by Abraham Lempel and Jacob Ziv
      • 1977 – Multigrid methods developed independently by Achi Brandt and Wolfgang Hackbusch
      • 1978 – LZ78 algorithm developed from LZ77 by Abraham Lempel and Jacob Ziv
      • 1978 – Bruun's algorithm proposed for powers of two by Georg Bruun
      • 1979 – Khachiyan's ellipsoid method developed by Leonid Khachiyan
      • 1979 – ID3 decision tree algorithm developed by Ross Quinlan

    == 1980s ==

      • 1980 – Brent's algorithm for cycle detection developed by Richard P. Brent
      • 1981 – Quadratic sieve developed by Carl Pomerance
      • 1981 – Smith–Waterman algorithm developed by Temple F. Smith and Michael S. Waterman
      • 1983 – Simulated annealing developed by S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi
      • 1983 – Classification and regression tree (CART) algorithm developed by Leo Breiman et al.
      • 1984 – LZW algorithm developed from LZ78 by Terry Welch
      • 1984 – Karmarkar's interior-point algorithm developed by Narendra Karmarkar
      • 1984 – ACORN PRNG discovered by Roy Wikramaratna and used privately
      • 1985 – Simulated annealing independently developed by V. Černý
      • 1985 – Car–Parrinello molecular dynamics developed by Roberto Car and Michele Parrinello
      • 1985 – Splay trees discovered by Sleator and Tarjan
      • 1986 – Blum Blum Shub proposed by L. Blum, M. Blum, and M. Shub
      • 1986 – Push–relabel maximum flow algorithm by Andrew Goldberg and Robert Tarjan
      • 1986 – Barnes–Hut tree method developed by Josh Barnes and Piet Hut for fast approximate simulation of n-body problems
      • 1987 – Fast multipole method developed by Leslie Greengard and Vladimir
