About algorithms: Golang: Understanding Dynamic Programming in Introduction to Algorithms (Part 1)

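This post records some notes from reading the dynamic programming chapter of Introduction to Algorithms. I recommend reading the original book; it is really good.
Dynamic programming resembles divide-and-conquer: both obtain the solution to the original problem by combining the solutions to its subproblems. The difference is that divide-and-conquer partitions the problem into disjoint subproblems, solves them recursively, and then combines their solutions. Dynamic programming applies when the subproblems overlap, that is, when subproblems share common sub-subproblems.
When subproblems overlap, divide-and-conquer solves the shared subproblems over and over. Dynamic programming avoids this repeated work by storing the solution to each subproblem. That is the basic idea of dynamic programming.
Dynamic programming is typically applied to optimization problems, and solving such a problem takes four steps:
1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution (usually bottom-up).
4. Construct an optimal solution from the computed information.
A summary in plain words like this is hard to grasp without examples. Introduction to Algorithms gives several, and we borrow them here.
Example 1 (rod cutting): a steel rod of length n can be cut into pieces, and pieces of different lengths sell for different prices according to a price table p. What is the maximum revenue obtainable? (This is an optimization problem.)
Let $r_n$ be the maximum revenue obtainable from a rod of length n, let the first piece cut off have length i, and let the remainder have length n - i, where i ranges over [1, n]. Then $r_n = \max_{1 \le i \le n} (p_i + r_{n-i})$ with $r_0 = 0$, i.e. $r_n$ is the largest of $p_1 + r_{n-1}, p_2 + r_{n-2}, \ldots, p_n + r_0$.
This is clearly a recursion, and the code is easy to write (Go):

    // requires import "math"; max is a builtin since Go 1.21
    func cutRod(p []int, n int) int {
        if n == 0 {
            return 0
        }
        q := math.MinInt
        for i := 1; i <= n; i++ {
            q = max(q, p[i]+cutRod(p, n-i))
        }
        return q
    }

In practice this turns out to be extremely slow. Why? The computation is full of redundancy. For n = 3 it calls cutRod(p, 0), cutRod(p, 1), and cutRod(p, 2); computing cutRod(p, 2) calls cutRod(p, 1) and cutRod(p, 0) again; and computing cutRod(p, 1) calls cutRod(p, 0) yet again. Many identical calls are evaluated repeatedly, and one can prove that the running time of this algorithm is exponential in n. ...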
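The redundancy disappears if each $r_j$ is computed once and stored. Below is a minimal bottom-up sketch, assuming the same price-table convention as cutRod (p[i] is the price of a piece of length i, p[0] unused). The name bottomUpCutRod is ours, and the sample prices are the table from the CLRS rod-cutting section.

```go
package main

import (
	"fmt"
	"math"
)

// bottomUpCutRod solves subproblems in increasing order of rod length, so
// every r[length-i] it reads has already been computed exactly once.
// p[i] is the price of a piece of length i; p[0] is an unused placeholder.
func bottomUpCutRod(p []int, n int) int {
	r := make([]int, n+1) // r[0] = 0: a rod of length 0 earns nothing
	for length := 1; length <= n; length++ {
		best := math.MinInt
		for i := 1; i <= length; i++ { // i is the length of the first piece
			if v := p[i] + r[length-i]; v > best {
				best = v
			}
		}
		r[length] = best
	}
	return r[n]
}

func main() {
	// Sample price table for lengths 1..10.
	p := []int{0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30}
	fmt.Println(bottomUpCutRod(p, 4))  // 10
	fmt.Println(bottomUpCutRod(p, 10)) // 30
}
```

The two nested loops do Θ(n²) work in total, compared with the exponential number of calls made by the plain recursion.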

April 21, 2022 · 2 min · jiezi

About algorithms: An illustrated guide to recommendation algorithm architecture: pre-ranking

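Introduction | Pre-ranking (coarse ranking) is a module between recall (candidate retrieval) and fine ranking, and is a classic product of the trade-off between accuracy and performance. To understand the technical details of pre-ranking, always keep both accuracy and performance in mind. This article takes a close look at the pre-ranking module.
1. Overall architecture
Pre-ranking sits between recall and fine ranking. It receives tens of thousands of candidate items from recall and passes a few hundred to a few thousand items on to fine ranking, which makes it a classic product of the accuracy/performance trade-off. When the recommendation pool is small, pre-ranking is optional. The overall architecture of pre-ranking is as follows:
2. Basic framework of pre-ranking: samples, features, model
Pre-ranking is now generally model-based, and its basic framework likewise has three parts: data samples, feature engineering, and a deep model.
(1) Data samples
Like fine ranking, pre-ranking uses exposed-and-clicked items as positive samples and exposed-but-not-clicked items as negative samples. But pre-ranking faces tens of thousands of candidates, while fine ranking sees only hundreds to thousands, so its solution space is much larger. Training only on exposed samples while predicting on both exposed and unexposed items causes severe sample selection bias (the SSB problem), so training and inference are inconsistent. The SSB problem is clearly more severe in pre-ranking than in fine ranking.
(2) Feature engineering
Pre-ranking features can resemble those of fine ranking, but because the latency budget is tight (only 10 ms to 20 ms), they fall roughly into two categories:
Ordinary features: as in fine ranking, divided into user, context, and item features. For which features exist and how they are processed, see the feature engineering part of the fine-ranking article.
Cross features: features that cross user and item, which help model accuracy considerably. However, there are too many possible crosses to precompute and store offline, and at serving time they cannot be computed just once per request the way user features can, so they add latency. Cross features should therefore be used with caution.
(3) Deep model
Pre-ranking is now largely model-based, and its development has gone through four generations:
Generation 1: hand-crafted rules built from posterior statistics, for example blending core factors such as an item's historical CTR, CVR, category price tier, and sales volume. Rules have low accuracy, no personalization, and cannot be updated in real time.
Generation 2: LR (logistic regression) linear models, which offer some personalization and real-time updating, but are too simple and have weak expressive power.
Generation 3: DSSM two-tower inner-product deep models. They decouple user and item and build each in an independent tower, so item vectors can be stored offline and online prediction latency drops. There are two main variants:
Both item and user vectors stored offline. Online, only the inner product of the user and item vectors is needed, so latency is low. Because the user side is computed offline, a complex model can be used there for more expressive power. But the user side is not fresh and cannot capture user behavior in real time.
Items offline, user in real time. Items need less freshness than users. Since one scoring request concerns a single user, the user tower only needs to be computed once in real time, which is fast. This variant is the more widely used today.
Generation 4: separating item and user leaves the model no feature-crossing ability, so its expressive power is weak. Fourth-generation models were therefore proposed, represented by COLD: lightweight MLP pre-ranking models. COLD uses SE blocks for feature pruning and combines them with network pruning and engineering optimization to balance accuracy against performance.
3. Pre-ranking optimization
The main problems of pre-ranking are:
Accuracy and feature crossing: the classic DSSM model has many advantages and is widely used for pre-ranking, but its core weakness is the lack of feature crossing. As the saying goes, the same thing that makes it also breaks it: separating user and item is exactly what makes DSSM fast, and the same separation, with no interaction between the two, limits its expressive power and lowers accuracy. It is a typical accuracy/performance trade-off.
Low latency: pre-ranking has a strict latency budget, usually only 10 ms to 20 ms, far tighter than fine ranking.
SSB: the pre-ranking solution space is much larger than that of fine ranking, yet like fine ranking it trains only on exposed samples, which causes severe sample selection bias.
(1) Improving accuracy
The main approaches are distillation from fine ranking and feature crossing, with the focus on solving the feature-crossing problem.
Fine-ranking distillation: the fine-ranking model acts as the teacher, and the pre-ranking model learns from it through distillation, which improves pre-ranking quality. This has become the basic training paradigm for pre-ranking.
Feature crossing: crossing can be done at the feature level or at the model level. At the feature level, hand-crafted cross features are fed in as bottom-layer inputs and can still sit in an independent tower. At the model level, FM or an MLP performs the crossing automatically. The main methods are:
Feature distillation: teacher and student share the same network structure; the teacher uses both ordinary and cross features, the student only ordinary features, and the student learns high-order cross-feature information from the teacher.
Adding cross features: hand-crafted cross features are built at the feature level and used in an independent tower. Because cross features are hard to store offline and expensive to compute in real time, this tower cannot be too complex. The first model that comes to mind is wide&deep: the deep part is still the DSSM two-tower model, and the wide part takes the cross features.
Lightweight MLP: crossing is done at the model level without separate towers. COLD, for example, lowers latency through feature pruning, network pruning, and engineering optimization rather than through tower separation.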
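To make the "items offline, user in real time" two-tower variant concrete, here is a toy Go sketch. The item vectors stand in for item-tower outputs assumed to be precomputed offline, the user vector stands in for one user-tower call per request, and scoring is a plain inner product. The names dot and topN and the 3-dimensional vectors are illustrative only; a production system would more likely use an approximate nearest-neighbor index or a partial sort than a full sort of all candidates.

```go
package main

import (
	"fmt"
	"sort"
)

// dot is the inner product that a DSSM-style two-tower model uses as its score.
func dot(a, b []float32) float32 {
	var s float32
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// topN scores every candidate item against one user vector and returns
// the indices of the n highest-scoring items.
func topN(userVec []float32, itemVecs [][]float32, n int) []int {
	scores := make([]float32, len(itemVecs))
	idx := make([]int, len(itemVecs))
	for i, v := range itemVecs {
		scores[i] = dot(userVec, v) // item vectors were computed offline
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return scores[idx[a]] > scores[idx[b]] })
	if n > len(idx) {
		n = len(idx)
	}
	return idx[:n]
}

func main() {
	user := []float32{1, 0, 2}                              // computed once per request
	items := [][]float32{{1, 1, 1}, {0, 1, 0}, {2, 0, 2}} // loaded from offline storage
	fmt.Println(topN(user, items, 2))                       // [2 0]
}
```

Because the user vector is computed once and reused for every item, the per-item cost is a single dot product. This is also why no user-item cross features can enter the score.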
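(2) Reducing latency
Accuracy and performance have always been a trade-off, and many approaches look for a balance between the two. Pre-ranking has stricter performance requirements: its latency must stay within 10 ms to 20 ms. There are many common performance optimizations, mainly the following:
Feature pruning: as in COLD, unimportant features are filtered out first, which naturally lowers overall latency. This step can be built into the model, so it can be personalized and updated in real time.
Quantization and fixed-point arithmetic: for example, going from 32-bit to 8-bit values, which improves both compute and storage efficiency.
Network pruning: including synapse pruning, neuron pruning, and weight-matrix pruning; not covered in detail here.
Model distillation: already mentioned above; not covered again here.
Neural architecture search (NAS): use lighter-weight, better-performing models, which can be found with NAS.
(3) The SSB problem
The pre-ranking solution space is much larger than that of fine ranking, yet like fine ranking it trains only on exposed samples, which causes severe sample selection bias. The fine-ranking scores of unexposed samples can be put to use to mitigate the SSB problem.
About the author: Xie Yangyi, applied algorithm researcher at Tencent.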
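As an illustration of the 32-bit to 8-bit idea, the sketch below applies symmetric per-tensor quantization with scale = max|x| / 127. This is one common scheme chosen for brevity, not necessarily what any particular pre-ranking system uses. The function names quantize and dequantize are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// quantize maps float32 values to int8 with a single symmetric scale:
// q = round(x / scale), where scale = max|x| / 127.
func quantize(xs []float32) ([]int8, float32) {
	var maxAbs float32
	for _, x := range xs {
		if a := float32(math.Abs(float64(x))); a > maxAbs {
			maxAbs = a
		}
	}
	if maxAbs == 0 {
		return make([]int8, len(xs)), 1 // all zeros: any scale works
	}
	scale := maxAbs / 127
	q := make([]int8, len(xs))
	for i, x := range xs {
		q[i] = int8(math.Round(float64(x / scale))) // |x/scale| <= 127, so no overflow
	}
	return q, scale
}

// dequantize recovers approximate values; each is off by at most about scale/2.
func dequantize(q []int8, scale float32) []float32 {
	out := make([]float32, len(q))
	for i, v := range q {
		out[i] = float32(v) * scale
	}
	return out
}

func main() {
	q, scale := quantize([]float32{-1, 0, 0.25, 1})
	fmt.Println(q) // [-127 0 32 127]
	fmt.Println(dequantize(q, scale))
}
```

Each stored value shrinks from 4 bytes to 1 byte. The cost is a rounding error bounded by about half the scale.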

April 20, 2022 · 1 min · jiezi

About algorithms: LeetCode (Golang) 56. Merge Intervals

Problem: the array intervals represents a set of intervals, where a single interval is intervals[i] = [start_i, end_i]. Merge all overlapping intervals and return an array of non-overlapping intervals that exactly covers all the intervals in the input.
Example:
Input: intervals = [[1,3],[2,6],[8,10],[15,18]]
Output: [[1,6],[8,10],[15,18]]
Explanation: intervals [1,3] and [2,6] overlap, so they are merged into [1,6].
Constraints: 1 <= intervals.length <= 10^4; intervals[i].length == 2; 0 <= start_i <= end_i <= 10^4.
Solution: the problem becomes easy once the intervals are sorted. Code first (Go):

    import "sort"

    type Intervals [][]int

    func (k Intervals) Len() int           { return len(k) }
    func (k Intervals) Less(i, j int) bool { return k[i][0] < k[j][0] }
    func (k Intervals) Swap(i, j int)      { k[i], k[j] = k[j], k[i] }

    func merge(intervals [][]int) [][]int {
        var result [][]int
        var k Intervals = intervals
        sort.Sort(k) // sort by left endpoint
        left, right := k[0][0], k[0][1]
        for _, interval := range k {
            if interval[0] <= right {
                // overlaps the current merged interval: extend it
                right = max(right, interval[1])
            } else {
                result = append(result, []int{left, right})
                left, right = interval[0], interval[1]
            }
        }
        result = append(result, []int{left, right})
        return result
    }

    func max(a int, b int) int {
        if a > b {
            return a
        }
        return b
    }

First, treat each element of intervals as a unit with two main attributes: its left endpoint and its right endpoint. ...
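For comparison, the same sort-then-sweep algorithm can be written without a named sort.Interface type by using sort.Slice. This is a sketch of an alternative formulation, not the author's code. It extends the last merged interval in place instead of tracking left/right separately, and it returns nil for empty input.

```go
package main

import (
	"fmt"
	"sort"
)

// merge sorts by left endpoint, then sweeps once, extending the last merged
// interval whenever the next interval starts before (or where) it ends.
func merge(intervals [][]int) [][]int {
	if len(intervals) == 0 {
		return nil
	}
	sort.Slice(intervals, func(i, j int) bool { return intervals[i][0] < intervals[j][0] })
	result := [][]int{{intervals[0][0], intervals[0][1]}}
	for _, iv := range intervals[1:] {
		last := result[len(result)-1] // shares storage with result's last element
		if iv[0] <= last[1] {
			if iv[1] > last[1] {
				last[1] = iv[1]
			}
		} else {
			result = append(result, []int{iv[0], iv[1]})
		}
	}
	return result
}

func main() {
	fmt.Println(merge([][]int{{1, 3}, {2, 6}, {8, 10}, {15, 18}})) // [[1 6] [8 10] [15 18]]
}
```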

April 20, 2022 · 1 min · jiezi

About algorithms: Ding-dong, you've got mail: hands-on experience from front-line data scientists and big data engineers, IDP Meetup No.02 recap

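On April 16 we held IDP Meetup No.02 and invited front-line data scientists and big data engineers from leading internet companies, Zhao Xisheng of Tencent and Li Feng of a leading fintech company, to share how they use AI development and production platforms in their daily work, from the perspectives of personalized recommendation and big data processing respectively. In addition, Liu Zhe, co-founder and technical lead of Baihai Technology, discussed programming-assistance features and how they are implemented in IDP.
Here is a recap of the highlights! (There is a bonus at the end.)
Recap 1: Machine learning platforms in personalized recommendation
Zhao Xisheng, recommendation middle-platform architect at Tencent, used personalized recommendation as an example to show how an AI development and production platform makes training complex models and applying them to the business more efficient. Main topics:
The organizing logic of the recommendation middle platform
Data, recommendation feature processing: a detailed explanation of the feature data flow and the feature engine architecture
Training, high-dimensional sparse CTR model training: typical CTR networks, distributed gradient descent, and the principles and implementation of distributed training
Inference, low-latency and highly available batch ranking services: an introduction to the inference engine and concrete model optimization methods
Business, scalable recommendation operations: recommendation scenarios, how general-purpose recommendation is implemented, and debugging the recommendation pipeline
Hands-on experience with IDP: a real case showing the end-to-end workflow of building a recommendation model with IDP
Replay: https://www.bilibili.com/vide...
Recap 2: The evolution of big data processing technology and tool selection
Li Feng, development engineer at a leading fintech company, described the big data technology stack and its evolution from the perspective of the history of big data. Topics:
The big data processing stack
Data processing architecture
A detailed analysis of the technology and architecture of each module in the stack, including data source ingestion, data storage, cluster scheduling, general-purpose computing, task scheduling, data governance, and data security
The ease of use of IDP for big data processing
Replay: https://www.bilibili.com/vide...
Recap 3: Implementing programming assistance in IDP
Liu Zhe, co-founder and technical lead of Baihai Technology, explained the techniques and implementation behind programming-assistance features that greatly improve developer productivity. Topics:
The basic components and core features of an IDE
What static analysis is and why it matters
A detailed look at code completion, the most important programming-assistance feature
The history of code completion in classic editors such as vim/emacs
How code completion is implemented in modern IDEs such as IDP Studio, VS Code, and IntelliJ
A comparison of the various code completion components
The principles, technical challenges, and latest progress of AI-assisted "intelligent completion"
A demo of how code completion is implemented in IDP and what it is like to use
Replay: https://www.bilibili.com/vide...
[Bonus] To get the slides from this session, follow the official account Baihai IDP and reply IDP02.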

April 20, 2022 · 1 min · jiezi

About algorithms: LeetCode (Golang) 209. Minimum Size Subarray Sum

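Problem: given an array of n positive integers and a positive integer target, find the minimal length of a contiguous subarray [nums_l, nums_{l+1}, ..., nums_{r-1}, nums_r] whose sum is greater than or equal to target, and return that length. If there is no such subarray, return 0.
Constraints: 1 <= target <= 10^9; 1 <= nums.length <= 10^5; 1 <= nums[i] <= 10^5.
Solution 1: a sliding window solves this nicely. Code first (Go):

    import "math"

    func minSubArrayLen(target int, nums []int) int {
        l, r, n, minValue := 0, 0, len(nums), math.MaxInt32 // window is [l, r]
        sum := 0
        for r < n && l <= r {
            sum += nums[r]
            for sum >= target {
                // window is valid: record it, then shrink from the left
                minValue = min(minValue, r-l+1)
                sum -= nums[l]
                l++
            }
            r++
        }
        if minValue == math.MaxInt32 {
            return 0
        }
        return minValue
    }

    func min(a int, b int) int {
        if a < b {
            return a
        }
        return b
    }

The core idea is: ...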
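The excerpt labels the sliding window as "Solution 1". One common alternative, not necessarily the author's second solution, is prefix sums plus binary search in O(n log n) time. Because every nums[i] is positive, the prefix-sum array is strictly increasing, so the end of the shortest valid window starting at index i can be found with sort.SearchInts. The function name minSubArrayLenBS is ours.

```go
package main

import (
	"fmt"
	"sort"
)

// minSubArrayLenBS uses prefix sums pre[i] = nums[0] + ... + nums[i-1].
// For each start i, the smallest j with pre[j] >= pre[i]+target gives the
// shortest valid window nums[i..j-1], of length j-i.
func minSubArrayLenBS(target int, nums []int) int {
	n := len(nums)
	pre := make([]int, n+1)
	for i, v := range nums {
		pre[i+1] = pre[i] + v
	}
	best := n + 1 // sentinel: longer than any real window
	for i := 0; i < n; i++ {
		j := sort.SearchInts(pre, pre[i]+target) // returns n+1 if no such j
		if j <= n && j-i < best {
			best = j - i
		}
	}
	if best == n+1 {
		return 0
	}
	return best
}

func main() {
	fmt.Println(minSubArrayLenBS(7, []int{2, 3, 1, 2, 4, 3})) // 2
}
```

The sliding window remains the better choice here at O(n), but the prefix-sum version illustrates how monotonicity (all elements positive) enables binary search.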

April 19, 2022 · 2 min · jiezi

About algorithms: LeetCode (Golang) 167. Two Sum II - Input Array Is Sorted

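Problem: given a 1-indexed integer array numbers that is already sorted in non-decreasing order, find two numbers that add up to the target number target. Let these two numbers be numbers[index1] and numbers[index2], where 1 <= index1 < index2 <= numbers.length.
Return the indices index1 and index2 as an integer array [index1, index2] of length 2.
You may assume that each input has exactly one solution, and you may not use the same element twice.
Your solution must use only constant extra space.
Constraints: 2 <= numbers.length <= 3 * 10^4; -1000 <= numbers[i] <= 1000; numbers is sorted in non-decreasing order; -1000 <= target <= 1000; exactly one valid answer exists.
Solution: use two pointers that move toward each other. Code first (Go):

    func twoSum(numbers []int, target int) []int {
        l, r := 0, len(numbers)-1
        for l < r {
            sum := numbers[l] + numbers[r]
            if sum == target {
                return []int{l + 1, r + 1} // convert to 1-based indices
            } else if sum > target {
                r-- // sum too large: drop the largest candidate
            } else {
                l++ // sum too small: drop the smallest candidate
            }
        }
        return nil
    }

Since the input array is sorted in ascending order, l and r start at the head and the tail of the array respectively and move toward the middle. ...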

April 18, 2022 · 1 min · jiezi

About algorithms: LeetCode (Golang) 88. Merge Sorted Array

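Problem: you are given two integer arrays nums1 and nums2, sorted in non-decreasing order, and two integers m and n, the numbers of elements in nums1 and nums2 respectively.
Merge nums2 into nums1 so that the merged array is also sorted in non-decreasing order.
Note: the merged array should not be returned by the function but stored inside nums1. To allow for this, nums1 has length m + n, where the first m elements are the ones to merge and the last n elements are 0 and should be ignored. nums2 has length n.
Constraints: nums1.length == m + n; nums2.length == n; 0 <= m, n <= 200; 1 <= m + n <= 200; -10^9 <= nums1[i], nums2[j] <= 10^9.
Solution: code first (Go):

    func merge(nums1 []int, m int, nums2 []int, n int) {
        // fill nums1 from the back, so unmerged nums1 elements are never overwritten
        for p1, p2, tail := m-1, n-1, m+n-1; p1 >= 0 || p2 >= 0; tail-- {
            var cur int
            if p1 == -1 {
                cur = nums2[p2]
                p2--
            } else if p2 == -1 {
                cur = nums1[p1]
                p1--
            } else if nums1[p1] > nums2[p2] {
                cur = nums1[p1]
                p1--
            } else {
                cur = nums2[p2]
                p2--
            }
            nums1[tail] = cur
        }
    }

The top-voted LeetCode comment on this problem: ...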

April 18, 2022 · 1 min · jiezi

About algorithms: CSC1001 algorithm analysis

CSC1001: Introduction to Computer Science
Programming Methodology
Assignment 3
Assignment description:
This assignment will be worth 9% of the final grade.
You should write your code for each question in a .py file (please name it using the question name, e.g. q1.py). Please pack all your .py files into a single .zip file, name it using your student ID (e.g. if your student ID is 123456, then the file should be named 123456.zip), and then submit the .zip file via BlackBoard.
Please also write a text file which provides the details. (Note that the report should be submitted as PDF.) The report should be included in the .zip file as well.
Please note that the teaching assistant may ask you to explain the meaning of your program, to ensure that the code is indeed written by yourself. Plagiarism will not be tolerated. We may check your code using Blackboard.
This assignment is due on 5:00PM, 23 Apr (Friday). For each day of late submission, you will lose 10% of your mark in this assignment. If you submit more than three days later than the deadline, you will receive zero in this assignment.
Question 1 (20% of this assignment):
Write a Python class, Flower, that has three instance variables of type str, int, and float, that respectively represent the name of the flower, its number of petals, and its price. (This will be the input order.) Your class must include an initializer that initializes each variable to an appropriate value, and your class should include methods for setting the value of each type, and retrieving the value of each type. Your program should be robust enough to handle possible inappropriate inputs.
Question 2 (40% of this assignment):
Write a Python class that inputs a polynomial in standard algebraic notation and outputs the first derivative of that polynomial. Both the inputted polynomial and its derivative should be represented as strings.
For example, when the inputted polynomial is , the output of your program should be .
Note: (1) The inputted polynomial will contain only one variable, and the variable is not necessarily 'x'; (2) In the inputted polynomial, the terms are not necessarily arranged in descending or ascending order.
Question 3 (40% of this assignment):
Write a Python class to simulate an ecosystem containing two types of creatures, bears and fish. The ecosystem consists of a river, which is modeled as a relatively large list. Each element of the list should be a Bear object, a Fish object, or None. In each time step, based on a random process, each animal either attempts to move into an adjacent list location or stays where it is. If two animals of the same type are about to collide in the same cell, then they stay where they are, but they create a new instance of that type of animal, which is placed in a random empty (i.e., previously None) location in the list. If a bear and a fish collide, however, then the fish dies (i.e., it disappears).
Write an initializer for the ecosystem class; the initializer should allow the user to assign the initial values of the river length, the number of fishes and the number of bears. Before the simulation, fishes and bears should be allocated randomly into the river. The ecosystem class should also contain a simulation() method, which will simulate the next N steps of the random moving process. N should be inputted by the user. In each step of your simulation, all animals in the river should try to take some random moves. In each step of your simulation, the animals should take actions one by one. The animals on the left will take actions first. (The input order: river length, fishes, bears and steps.)
For example, assume that before the simulation, the initial state of the river is:
In which, 'F', 'B' and 'N' denote fish, bear and empty location respectively. Assume that in the first step of simulation, the first fish will move to the left, the first bear will move to the right, and the second bear will remain still.
Then after the first step, the state of the river is:
To generate random numbers in Python, you should import the random() function by using the following statement:
By assigning the return of the random() function to a variable, you will get a random floating point number in the range [0, 1). The following code is an example of using the random() function: ...

April 18, 2022 · 4 min · jiezi

About algorithms: LeetCode (Golang) 215. Kth Largest Element in an Array

Problem: given an integer array nums and an integer k, return the kth largest element in the array.
Note that you are looking for the kth largest element in sorted order, not the kth distinct element.
Constraints: 1 <= k <= nums.length <= 10^4; -10^4 <= nums[i] <= 10^4.
There are two good approaches to this problem: a min-heap and quickselect.
Solution 1: first, the min-heap approach. Code first (Go):

    import "container/heap"

    type IntHeap []int

    func (h IntHeap) Len() int           { return len(h) }
    func (h IntHeap) Less(i, j int) bool { return h[i] < h[j] }
    func (h IntHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
    func (h IntHeap) Top() int           { return h[0] }

    func (h *IntHeap) Push(x interface{}) {
        // Push and Pop use pointer receivers because they modify the slice's length,
        // not just its contents.
        *h = append(*h, x.(int))
    }

    func (h *IntHeap) Pop() interface{} {
        old := *h
        n := len(old)
        x := old[n-1]
        *h = old[0 : n-1]
        return x
    }

    func findKthLargest(nums []int, k int) int {
        h := &IntHeap{}
        heap.Init(h)
        for _, num := range nums {
            if h.Len() < k {
                heap.Push(h, num)
            } else if (*h)[0] < num {
                // num beats the smallest of the current top k: replace it
                heap.Pop(h)
                heap.Push(h, num)
            }
        }
        return heap.Pop(h).(int)
    }

First we fix the loop invariant: h always holds at most k array elements, they are the k largest elements seen so far, and the smallest of them is (*h)[0]. ...
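The second approach mentioned above, quickselect, is not shown in the excerpt. Here is a sketch assuming a Lomuto partition with the last element as the pivot (the same scheme as in the quicksort post). After each partition only the side containing index len(nums)-k is searched further, giving expected O(n) time, though the worst case is O(n²). findKthLargestQS is a hypothetical name, and it reorders nums in place.

```go
package main

import "fmt"

// partition (Lomuto): elements < pivot end up left of the returned index.
func partition(nums []int, l, r int) int {
	pivot := nums[r]
	i := l
	for j := l; j < r; j++ {
		if nums[j] < pivot {
			nums[i], nums[j] = nums[j], nums[i]
			i++
		}
	}
	nums[i], nums[r] = nums[r], nums[i]
	return i
}

// findKthLargestQS: in ascending order, the kth largest sits at index len(nums)-k.
// Invariant: that target index always lies within [l, r].
func findKthLargestQS(nums []int, k int) int {
	target := len(nums) - k
	l, r := 0, len(nums)-1
	for l < r {
		m := partition(nums, l, r)
		switch {
		case m == target:
			return nums[m]
		case m < target:
			l = m + 1
		default:
			r = m - 1
		}
	}
	return nums[l]
}

func main() {
	fmt.Println(findKthLargestQS([]int{3, 2, 1, 5, 6, 4}, 2)) // 5
}
```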

April 18, 2022 · 2 min · jiezi

About algorithms: Implementing QuickSort in Go

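QuickSort is one of the most popular sorting algorithms, and its excellent performance has led many programming languages to use it (or a variant of it) as the default sort in their standard libraries.
QuickSort's design is a good example of divide-and-conquer, and understanding how it works will help us design our own problem-solving algorithms in practice. Most directly, many algorithm problems call for similar ideas.
Code first (Go):

    func quickSort(nums []int, l, r int) { // sorts nums[l..r], inclusive
        if l < r {
            m := partition(nums, l, r)
            quickSort(nums, l, m-1)
            quickSort(nums, m+1, r)
        }
    }

    func partition(nums []int, l int, r int) int {
        key := nums[r]
        // invariant: every element in [l, i) is < key
        //            every element in [i, j) is >= key
        i := l
        j := l
        for j < r {
            if nums[j] < key {
                nums[i], nums[j] = nums[j], nums[i]
                i++
            }
            j++
        }
        nums[i], nums[r] = nums[r], nums[i] // put the pivot between the two parts
        return i
    }

First, a brief introduction to divide-and-conquer. Many useful algorithms are recursive in structure: to solve a given problem, they call themselves recursively one or more times to solve closely related subproblems. These algorithms typically follow the divide-and-conquer approach: they break the original problem into several subproblems that are similar to it but smaller, solve the subproblems recursively, and then combine their answers to obtain the answer to the original problem. ...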
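The partition above always uses nums[r] as the pivot, so an already sorted input produces maximally unbalanced splits and Θ(n²) time. A common remedy, sketched here as an optional variant rather than the author's code, is to swap a uniformly random element into position r before partitioning. randomizedPartition is our name. Note that this does not help when many keys are equal, because the Lomuto scheme still puts all equal keys on one side.

```go
package main

import (
	"fmt"
	"math/rand"
)

// quickSort sorts nums[l..r] (inclusive) using a randomized pivot.
func quickSort(nums []int, l, r int) {
	if l < r {
		m := randomizedPartition(nums, l, r)
		quickSort(nums, l, m-1)
		quickSort(nums, m+1, r)
	}
}

// randomizedPartition moves a uniformly random element to nums[r], then runs
// the same Lomuto partition as in the text.
func randomizedPartition(nums []int, l, r int) int {
	p := l + rand.Intn(r-l+1)
	nums[p], nums[r] = nums[r], nums[p]
	key := nums[r]
	i := l // invariant: nums[l:i] < key and nums[i:j] >= key
	for j := l; j < r; j++ {
		if nums[j] < key {
			nums[i], nums[j] = nums[j], nums[i]
			i++
		}
	}
	nums[i], nums[r] = nums[r], nums[i]
	return i
}

func main() {
	nums := []int{5, 2, 9, 1, 5, 6}
	quickSort(nums, 0, len(nums)-1)
	fmt.Println(nums) // [1 2 5 5 6 9]
}
```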

April 18, 2022 · 1 min · jiezi

About algorithms: Understanding the loop invariant

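In computer science, a loop invariant is a property that remains true at every iteration of a loop; it is commonly used to prove the correctness of a program or algorithm.
Understanding loop invariants helps a great deal both in following how an algorithm works and in solving algorithm problems. Below, following Introduction to Algorithms, we explain the concept in detail.
We use loop invariants to help us understand why an algorithm is correct. For a given loop invariant, we must show three things:
Initialization: it is true prior to the first iteration of the loop.
Maintenance: if it is true before an iteration of the loop, it remains true before the next iteration.
Termination: when the loop terminates, the invariant gives us a useful property that helps show that the algorithm is correct.
When the first two properties hold, the loop invariant is true prior to every iteration of the loop. Note the similarity to mathematical induction: to prove that a property holds, you prove a base case and an inductive step. Here, showing that the invariant holds before the first iteration corresponds to the base case, and showing that it holds from iteration to iteration corresponds to the inductive step.
Because we use loop invariants to show correctness, the third property is perhaps the most important. Usually, we use the loop invariant together with the condition that caused the loop to terminate.
This differs from mathematical induction, where the inductive step is applied indefinitely; here, the "induction" stops when the loop terminates.
Next, we use insertion sort to understand loop invariants better. Code first (Go):

    func insertionSort(nums []int) {
        j, n := 0, len(nums)
        for j < n {
            i := j - 1
            key := nums[j]
            // shift elements greater than key one slot to the right
            for i >= 0 && nums[i] > key {
                nums[i+1] = nums[i]
                i--
            }
            nums[i+1] = key
            j++
        }
    }

Loop invariant: the elements before index j (not including nums[j]) are already sorted in ascending order.
Initialization: j is 0, so the sorted subarray has no elements, and the invariant holds.
Maintenance: if the invariant held before the iteration for j, the subarray [0, j) is in ascending order. In the current iteration, key = nums[j] is compared with the elements of that subarray from right to left (largest to smallest), and every element greater than key is shifted one position to the right. When the first element that is not greater than key is found (or the start of the array is reached), key is written right after it. Hence, after this iteration, the subarray [0, j+1) is in ascending order, and the invariant holds.
Termination: j increases by one in every iteration; when j == n, every element of the array has been processed, and by the invariant [0, n) is in ascending order.
Together, these three properties show that the whole input array nums ends up in ascending order, which is the goal of the algorithm, and so they verify its correctness.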
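The three properties can also be checked mechanically while testing. This sketch (the name insertionSortChecked is ours) runs the same insertion sort and asserts the invariant "nums[0:j] is sorted" before every outer iteration, then once more when j == n. Each check costs O(j), so this is meant for debugging only.

```go
package main

import (
	"fmt"
	"sort"
)

// insertionSortChecked is the insertion sort from the text with its loop
// invariant asserted at initialization/maintenance and at termination.
func insertionSortChecked(nums []int) {
	n := len(nums)
	for j := 0; j < n; j++ {
		if !sort.IntsAreSorted(nums[:j]) { // invariant before iteration j
			panic("loop invariant violated")
		}
		key := nums[j]
		i := j - 1
		for i >= 0 && nums[i] > key {
			nums[i+1] = nums[i]
			i--
		}
		nums[i+1] = key
	}
	if !sort.IntsAreSorted(nums[:n]) { // termination: j == n
		panic("postcondition violated")
	}
}

func main() {
	nums := []int{5, 2, 4, 6, 1, 3}
	insertionSortChecked(nums)
	fmt.Println(nums) // [1 2 3 4 5 6]
}
```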

April 17, 2022 · 1 min · jiezi

About algorithms: 159201 algorithms discussion

159.201 Algorithms & Data Structures
S1 2021
Assignment 4
Write a C++ program to implement the addition operation for big numbers. A big number is a positive, whole number that can be of any arbitrary size, and therefore will not necessarily fit into an existing C++ type. You must implement each big number as a List of single decimal digits and you must also set up the template class List based on a linked list (use the sample codes). You will need to define and implement the appropriate methods for big numbers.
Your program must be laid out in the following way:
Section A: the template class List (with extra methods if you wish to do so)
Section B: the class BigNumber (which includes a List of digits)
Section C: program (main plus functions)
The program must read in two big numbers from a txt file, add them and save the result as another big number instance. Then all three big numbers must be displayed on the screen using the format:

    32456789112341234000 +
    65123443122134123445
    =
    97580232234475357445

Notes:
• You must get the two big numbers from a 2-line text file, one big number in each line (get the sample code).
• The "digits" in the List can be stored in any type you want: int, short int, char etc. You probably realise that any type is wasting a bit of memory, as you would only need 10 digits (even char can represent 256 things).
• Remember that when a sum of two nodes goes over the base size minus 1 (in base 10, any number that is bigger than 10-1), then you need to carry part of the result to the next node (grade school arithmetic).
• You need to read the big number as a string, but do not store the big number as a string, integer or float at any other stage of the program. You need to use your own function that converts a string representing a decimal number into a List.
Every big number is represented by several base-10 digits, one digit per node.
• Once the addition operation is implemented, make sure to carry out the following tests (shown here in one line due to limited space; for the program follow the format described above):

    000000000000 + 000000000001 = 000000000001                             (a list with zeros)
    99 + 1 = 100                                                           (check that the carry works)
    12345678901234567890 + 9876543210987654321 = 22222222112222222211      (check mix of digits)
    99999999999999999999 + 99999999999999999999 = 199999999999999999998    (check that the carry goes all the way)
    99999999999999999999 + 1 = 100000000000000000000                       (same thing)
    100000000000000000000 + 1 = 100000000000000000001                      (make sure a single digit is added)

Use our virtual machine to test your submissions (host name vm000296). The input/output requirements are essential; please follow them carefully to avoid losing marks. Spaces matter and text is case sensitive.
After you are satisfied with the performance of your code as tested in the virtual machine, submit one source code file on Stream by Friday 9 of April 2021. Your name and ID number must appear at the top of the file as comments. If you are working in a group, then all names and IDs must appear at the top of the file as comments, but you still need to submit individually in both the virtual machine and Stream. ...
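The carry rule in the notes is ordinary grade-school addition. Below is a sketch of that logic in Go. The assignment itself requires C++ and its own template List, so this only illustrates the algorithm. Digits are kept in a hand-rolled singly linked list with the least-significant digit first, so addition walks both lists from the head and propagates the carry. The type and function names (digitNode, fromString, add, toString) are hypothetical.

```go
package main

import "fmt"

// digitNode holds one base-10 digit; lists are least-significant digit first.
type digitNode struct {
	digit int
	next  *digitNode
}

// fromString converts a decimal string (assumed to contain only '0'-'9')
// into a digit list; pushing each digit onto the front reverses the order.
func fromString(s string) *digitNode {
	var head *digitNode
	for i := 0; i < len(s); i++ {
		head = &digitNode{digit: int(s[i] - '0'), next: head}
	}
	return head
}

// add walks both lists from the least-significant digit, carrying into the next node.
func add(a, b *digitNode) *digitNode {
	dummy := &digitNode{}
	tail := dummy
	carry := 0
	for a != nil || b != nil || carry != 0 {
		sum := carry
		if a != nil {
			sum += a.digit
			a = a.next
		}
		if b != nil {
			sum += b.digit
			b = b.next
		}
		tail.next = &digitNode{digit: sum % 10}
		tail = tail.next
		carry = sum / 10 // 1 whenever sum exceeds 10-1
	}
	return dummy.next
}

// toString renders the list most-significant digit first (for display only).
func toString(n *digitNode) string {
	var ds []byte
	for ; n != nil; n = n.next {
		ds = append(ds, byte('0'+n.digit))
	}
	for i, j := 0, len(ds)-1; i < j; i, j = i+1, j-1 {
		ds[i], ds[j] = ds[j], ds[i]
	}
	return string(ds)
}

func main() {
	a := fromString("32456789112341234000")
	b := fromString("65123443122134123445")
	fmt.Println(toString(add(a, b))) // 97580232234475357445
}
```

Storing the least-significant digit first means no reversal is needed during the addition itself, and leading zeros in the inputs are preserved in the result, matching the first test case.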

April 17, 2022 · 1 min · jiezi

关于算法:COMP-8042-算法求解

Put your Name & Id on the report
Assignment 2
Due April 11th, 2021 11:45pm
COMP 8042
All work should be done individually.
(25 points) Modify the CuckooHashTable class in the CuckooHashing.h file to implement a cuckoo hash, as described in class and chapter 5 of the textbook.
(20 points) Complete the implementation of the GetMinDegIndex() and TopologicalSort() functions in the Graph.h file to implement a topological sort on a graph, as covered in class and chapter 9 of the textbook.
(55 points) Sudoku is a logic-based, combinatorial number-placement puzzle. In classic sudoku, the objective is to fill a 9 × 9 grid with digits so that each column, each row, and each of the nine 3 × 3 subgrids that compose the grid (also called "boxes", "blocks", or "regions") contain all of the digits from 1 to 9. The puzzle setter provides a partially completed grid, which for a well-posed puzzle has a single solution. [Wikipedia]
The goal of this exercise is to develop a solver for a 9 × 9 sudoku puzzle using what you have learned in chapter 10 of the textbook (algorithm design techniques). For that matter, you are allowed to use any algorithm design technique that fits this problem. You will need to explain, in the source code itself, what technique you have chosen and how it helped you solve the problem.
Make sure to comment each change in your code clearly so it is easy to see what you have changed.
Also, please do not change the return types of the functions and do not add any cout calls in the 3 files you submit (you can add them while working on the solution but please remove them before submission).
Submit the completed Graph.h, CuckooHashing.h and Sudoku.h files (along with any other necessary file that you had to implement and use) in a single ZIP file called A00######.zip to D2L, where A00###### is your A00 number. You do not need to prepare a report file. You also will not need to submit COMP8042A2Test.cpp, since I will use another hidden test file which is different from this one.
Make sure you test all the corner cases for each problem.
Important grading point: if your final submission takes more than 2 minutes to run, your submission will time out and you will not receive a grade.

April 14, 2022 · 2 min · jiezi

About algorithms: COMP9315 Signature Indexes

COMP9315 21T1 - Assignment 2
Signature Indexes
DBMS Implementation
Last updated: Friday 2nd April 8:40pm
A changelog is at the end of the file. Hopefully, this changelog will be very short.
Aims
This assignment aims to give you an understanding of:
- how database files are structured and accessed
- how superimposed codeword (SIMC) signatures are implemented
- how concatenated codeword (CATC) signatures are implemented
- how partial-match retrieval searching is implemented using signatures
- the performance differences between different types of signatures
The goal is to build a simple implementation of a signature-indexed file, including applications to create such files, insert tuples into them, and search for tuples based on partial-match retrieval queries.
Summary
Deadline: 11:00am on Monday 19 April
Late Penalty: 0.125 marks off the ceiling mark for each hour late (i.e. 3 marks/day)
Marks: contributes 20 marks toward your total mark for this course.
Groups: you must complete this assignment individually
Submission: login to Course Web Site > Assignments > Assignment 2 > Submission > upload ass2.tar, or login to any CSE server > give cs9315 ass2 ass2.tar
Workspace: any machine with a C compiler (preferably gcc); you do not need to use Grieg
The ass2.tar file must contain the Makefile plus all of the .c and .h files that are needed to compile the create, insert and select executables.
You are not allowed to change the following files: create.c, insert.c, select.c, stats.c, dump.c, hash.h, hash.c, x1.c, x2.c, x3.c. We supply them when we/you test your files, so any changes you make will be overwritten. Do not include them in the ass2.tar file. Details on how to build the ass2.tar file are given below.
Note that the code in create.c, insert.c, select.c, stats.c, dump.c assumes that you honour the interfaces to the ADTs defined in the *.[ch] file pairs.
If you change the interfaces to data types like bits.h and page.h, then your program will be treated as incorrect.
Make sure that you read this assignment specification carefully and completely before starting work on the assignment. Questions which indicate that you haven't done this will simply get the response "Please read the spec".
Note: this assignment does not require you to do anything with PostgreSQL.
Introduction
Signatures are a style of indexing where (in its simplest form) each tuple is associated with a compact representation of its values (i.e. its signature). Signatures are used in the context of partial-match retrieval queries, and are particularly effective for large tuples. Selection is performed by first forming a query signature, based on the values of the known attributes, and then scanning the stored signatures, matching them against the query signature, to identify potentially matching tuples. Only these tuples are read from the data pages and compared against the query to check whether they are true matching tuples. Signature matching can result in "false matches", where the query and tuple signatures match, but the tuple is not a valid result for the query. Note that signature matching can be quite efficient if the signatures are small, and efficient bit-wise operations are used to check for signature matches.
The kind of signature matching described above uses one signature for each tuple (as in the diagram below). Other kinds of signatures exist, and one goal is to implement them and compare their performance to that of tuple signatures.
2021/4/4 COMP9315 21T1 - Assignment 2 https://cgi.cse.unsw.edu.au/~...
In files such as the above, queries are evaluated as follows:

    Input: pmr query, Output: set of tuples satisfying the query

    qrySig = makeSignature(query)
    Pages = {}   // set of pages containing possibly matching tuples
    foreach tupSig in SignatureFile {
        if (tupSig matches qrySig) {
            // potential match
            PID = page of tuple associated with tupSig
            add PID to Pages
        }
    }
    Results = {}   // set of tuples satisfying query
    foreach PID in Pages {
        buf = fetch data page PID
        foreach tuple in buf {
            // check for real match
            if (tuple satisfies query) add tuple to Results
        }
    }

Note that the above algorithm is an abstract view of what you must implement in your code. The function makeSignature() does not literally exist, but you need to build analogues to it in your code.
Signatures
We will consider two methods for building signatures: superimposed codewords (SIMC), and concatenated codewords (CATC). Each codeword is formed using the value from one attribute.
In SIMC signatures, all codewords and signatures are m bits wide, and each codeword has k bits set to 1. In CATC signatures, signatures are m bits wide, but codewords occupy approximately equal numbers of bits of the signature. Since there are m bits in the signature and n attributes, each codeword is u = ⌊m/n⌋ bits long, except for the lower-order codeword (the one for the first attribute). This codeword is u + (m mod n) bits long, so that the total number of codeword bits is equal to m. The following diagram shows the parts of a concatenated codeword signature:
In this example, the signature is m = 42 bits wide. Each codeword, except the lower-order one, is u = 10 bits wide. The lower-order codeword has two extra bits to make up to 42. Each codeword has half of its bits set to 1; in CATC codewords, k = u/2. This is different to SIMC codewords, where we need to determine k to ensure that roughly half of the bits in the signature are set to 1.
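Below is a small in-memory Go sketch of the query algorithm above using SIMC tuple signatures. It makes several simplifying assumptions that differ from the assignment: m is fixed at 64 so a signature fits in one uint64, k = 8 is an arbitrary choice, codeword bits are chosen by a PRNG seeded from an FNV hash of the value (the assignment supplies its own hash.c), and tuples live in a slice with signatures computed on the fly instead of being read from signature and data pages. All function names are hypothetical. The key step is the match test: a tuple signature matches when every bit set in the query signature is also set in it, and false matches are then removed by checking the tuple itself.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
	"strings"
)

const (
	m = 64 // signature width in bits (assumption: fits one uint64)
	k = 8  // bits set per SIMC codeword (arbitrary illustrative value)
)

// codeword sets k distinct bits chosen by a PRNG seeded from the value's
// hash, so the same value always yields the same codeword.
func codeword(val string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(val))
	r := rand.New(rand.NewSource(int64(h.Sum64())))
	var cw uint64
	for nbits := 0; nbits < k; {
		bit := uint64(1) << uint(r.Intn(m))
		if cw&bit == 0 {
			cw |= bit
			nbits++
		}
	}
	return cw
}

// simcSig superimposes (ORs) the codewords of all known values; "?" adds nothing.
func simcSig(tuple string) uint64 {
	var sig uint64
	for _, v := range strings.Split(tuple, ",") {
		if v != "?" {
			sig |= codeword(v)
		}
	}
	return sig
}

// matches reports whether every bit set in qrySig is also set in tupSig.
func matches(tupSig, qrySig uint64) bool {
	return tupSig&qrySig == qrySig
}

// satisfies is the exact check that filters out false matches.
func satisfies(tuple, query string) bool {
	tv, qv := strings.Split(tuple, ","), strings.Split(query, ",")
	if len(tv) != len(qv) {
		return false
	}
	for i := range qv {
		if qv[i] != "?" && qv[i] != tv[i] {
			return false
		}
	}
	return true
}

// selectTuples mirrors the pseudocode: signature filter first, then the real check.
func selectTuples(tuples []string, query string) []string {
	qrySig := simcSig(query)
	var results []string
	for _, t := range tuples {
		if matches(simcSig(t), qrySig) && satisfies(t, query) {
			results = append(results, t)
		}
	}
	return results
}

func main() {
	tuples := []string{"1,abc,a3-001", "2,xyz,a3-001", "3,abc,a3-002"}
	fmt.Println(selectTuples(tuples, "?,abc,?")) // [1,abc,a3-001 3,abc,a3-002]
}
```

A true match can never fail the signature test, because the tuple's signature contains every codeword in the query's. The signature test can only let extra (false) candidates through, which satisfies then rejects.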
The way we build CATC signatures is conceptually straightforward: form n codewords, each of which is m/n bits wide, and concatenate them. In practice, we build n codewords, each of which is m bits wide, with the lower-order u bits set as the codeword, and then shifted into the position that it would occupy in a concatenated codeword signature. The diagram below illustrates this:
Note: the fact that individual codewords are 8 bits long is not intended to suggest that codewords will always be whole bytes. With n = 4 attributes, individual codewords would be 6 bits if m = 24, or 9 bits if m = 36. And, as noted above, if m = 42, the codeword for attribute 1 would be 12 bits and all other attributes would have 10-bit codewords.
In subsequent discussions, we denote the length of tuple signatures as m, the length of page signatures as mp, and the length of CATC codewords as u (remembering that all SIMC codewords have the same length as the signatures they produce).
Relations
In our system, a relation R is represented by five physical files:
R.info containing global information such as:
- the number of attributes and size of each tuple
- the number of data pages and number of tuples
- the base type of signatures (simc or catc)
- the sizes of the various kinds of signatures
- the number of signatures and signature pages
- etc. etc. etc.
The R.info file contains a copy of the RelnParams structure given in the reln.h file (see below).
R.data containing data pages, where each data page contains:
- a count of the number of tuples in the page
- the tuples (as comma-separated character sequences)
Each data page has a capacity of c tuples. If there are n tuples then there will be b = ⌈n/c⌉ pages in the data file. All pages except the last are full. Tuples are never deleted.
R.tsig containing tuple signatures, where each page contains:
- a count of the number of signatures in the page
- the signatures themselves (as bit strings)
Each tuple signature is formed by incorporating the codewords from each attribute in the tuple.
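The CATC layout rules above (u = ⌊m/n⌋, with the m mod n spare bits given to the lower-order codeword, and k = u/2 bits set per codeword) can be sketched as follows. Assumptions: m ≤ 64, so a signature fits in a uint64, and codeword bits are chosen with a hash-seeded PRNG as a stand-in for the assignment's own hash functions. catcLayout, catcCodeword, and catcSig are hypothetical names, and the sample tuple values are made up.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// catcLayout returns each attribute's codeword width and bit offset.
// Attribute 0 (the lower-order codeword) gets u + m%n bits; the others get u.
func catcLayout(m, n int) (widths, offsets []int) {
	u := m / n
	widths, offsets = make([]int, n), make([]int, n)
	pos := 0
	for i := 0; i < n; i++ {
		w := u
		if i == 0 {
			w += m % n
		}
		widths[i], offsets[i] = w, pos
		pos += w
	}
	return widths, offsets
}

// catcCodeword sets width/2 distinct bits inside a width-bit field, then
// shifts the field to its position in the m-bit signature.
func catcCodeword(val string, width, offset int) uint64 {
	h := fnv.New64a()
	h.Write([]byte(val))
	r := rand.New(rand.NewSource(int64(h.Sum64())))
	var cw uint64
	for set := 0; set < width/2; {
		bit := uint64(1) << uint(r.Intn(width))
		if cw&bit == 0 {
			cw |= bit
			set++
		}
	}
	return cw << uint(offset)
}

// catcSig ORs the shifted codewords; because the fields do not overlap,
// this is the same as concatenating them. Assumes m <= 64.
func catcSig(vals []string, m int) uint64 {
	widths, offsets := catcLayout(m, len(vals))
	var sig uint64
	for i, v := range vals {
		sig |= catcCodeword(v, widths[i], offsets[i])
	}
	return sig
}

func main() {
	w, o := catcLayout(42, 4)
	fmt.Println(w, o) // [12 10 10 10] [0 12 22 32]
	fmt.Printf("%042b\n", catcSig([]string{"1234567", "abc", "a3-013", "a4-001"}, 42))
}
```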
How this is done differs between SIMC and CATC, but the overall result is a single m-bit long signature. If there are n tuples in the relation, there will be n tuple signatures, in bt pages. All tuple signature pages except the last are full.
R.psig containing page signatures, where each page contains:
- a count of the number of signatures in the page
- the signatures themselves (as bit strings)
Page signatures are much larger than tuple signatures, and are formed by incorporating the codewords of all attribute values in all tuples in the page. How this is done differs between SIMC and CATC, but the result is a single mp-bit long signature. There is one page signature for each page in the data file.
R.bsig containing bit-sliced signatures, where each page contains:
- a count of the number of bit-slices in the page
- the bit-slices themselves (as bit strings)
Bit-slices give an alternate 90°-rotated view of page signatures. If there are b data pages, then each bit-slice is b bits long. If page signatures are mp bits long, then there are mp bit-slices.
The following diagram gives a very simple example of the correspondence between page signatures and bit-slices:
Pages
The different types of pages (tuple, signature, slice) were described above. Internally, all pages have a similar structure: a counter holding the number of items in the page, and the items themselves (tuples or signatures or slices). All of the items in a page are the same size. The following diagram shows the structure of pages in the files of a signature-indexed relation:
We have developed some infrastructure for you to use in implementing these signature-indexed files. The code we give you is not complete; you can find the bits that need to be completed by searching for TODO in the code.
How you implement the missing parts of the code is up to you, but your implementation must conform to the conventions used in our code.
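Here is a sketch of the 90°-rotated view: bit-slice i collects bit i of every page signature, so b page signatures of mp bits become mp slices of b bits. It also shows how bit-slices are typically queried: AND together the slices for the bits set in the query signature, and the pages whose bit survives are the candidates. Signatures are modeled as []bool for clarity rather than as the assignment's packed Bits type, and the function names are hypothetical.

```go
package main

import "fmt"

// toBitSlices rotates b page signatures of mp bits each into mp bit-slices
// of b bits each: slices[i][p] == pageSigs[p][i].
func toBitSlices(pageSigs [][]bool) [][]bool {
	if len(pageSigs) == 0 {
		return nil
	}
	b, mp := len(pageSigs), len(pageSigs[0])
	slices := make([][]bool, mp)
	for i := range slices {
		slices[i] = make([]bool, b)
		for p := 0; p < b; p++ {
			slices[i][p] = pageSigs[p][i]
		}
	}
	return slices
}

// candidatePages ANDs the slices selected by the 1-bits of the query
// signature; a page survives only if its signature has all those bits.
func candidatePages(slices [][]bool, qrySig []bool) []int {
	if len(slices) == 0 {
		return nil
	}
	alive := make([]bool, len(slices[0]))
	for p := range alive {
		alive[p] = true
	}
	for i, set := range qrySig {
		if !set {
			continue // 0-bits in the query do not constrain anything
		}
		for p := range alive {
			alive[p] = alive[p] && slices[i][p]
		}
	}
	var pages []int
	for p, ok := range alive {
		if ok {
			pages = append(pages, p)
		}
	}
	return pages
}

func main() {
	pageSigs := [][]bool{
		{true, false, true, false}, // page 0
		{true, true, false, false}, // page 1
		{false, true, true, true},  // page 2
	}
	slices := toBitSlices(pageSigs)
	fmt.Println(candidatePages(slices, []bool{true, false, true, false})) // [0]
}
```

The appeal of bit-slices is that a query only reads the slices for its 1-bits instead of every page signature.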
In particular, you should preserve the interfaces to the supplied modules (e.g. Bits, Reln, Query, Tuple) and ensure that your submitted modules work with the supplied code in the create, insert and select commands.
Commands
In our context, signature-indexed relations are a collection of files that represent one relational table. These relations can be manipulated by a number of supplied commands:
create RelName SigType #tuples #attrs 1/pF
Creates an empty relation called RelName with all tuples having #attrs attributes. SigType specifies how signatures should be formed, and can have one of two values: simc or catc. The #tuples parameter gives the expected number of tuples that are likely to be inserted into a relation; this, in turn, determines parameters like the number of data pages and length of bit-sliced superimposed codewords. The 1/pF parameter gives the inverse of the false match probability; for example, a value of 1000 corresponds to a false match probability of 1/1000 (0.001).
These parameters are combined using the formulas given in lectures to determine how large tuple- and page-signatures are. Each bit-slice has a number of bits equal to the number of data pages, which is determined from #attrs, #tuples and the page size.
This gives you storage for one relation/table, and is analogous to making an SQL data definition like:

    create table R ( a1 integer, a2 text, ... an text );

Note that internally, attributes are indexed 0..n-1 rather than 1..n.
The following example of using create makes a relation called abc where each tuple has 4 attributes and the indexing has a false match probability of 1/100. The relation can hold up to 10000 tuples (it can actually hold more, but only the first 10000 will be indexed via the bit-sliced signatures).

    $ ./create abc simc 10000 4 100

insert RelName
Reads tuples, one per line, from standard input and inserts them into the relation specified on the command line.
Tuples all take the form val1,val2,...,valn. The values can be any sequence of alpha-numeric characters and '-'. The characters ',' (field separator) and '?' (query wildcard) are treated specially.
Since all tuples need to be the same length, it is simplest to use gendata to generate them, and pipe the generated tuples into the insert command.
select RelName QueryString IndexType
Takes a "query tuple" on the command line, and finds all tuples in the data pages of the relation RelName that match the query. IndexType has a value of either t, p or b, indicating whether it should use the tuple, page, or bit-sliced signatures. Queries take the form val1,val2,...,valn, where some of the vali can be '?' (without the quotes). Some examples, and their interpretation, are given below. You can find more examples in the lecture slides and course notes.
?,?,? # matches any tuple in the relation
10,?,? # matches any tuple with 10 as the value of attribute 1
?,abc,? # matches any tuple with abc as the value of attribute 2
10,abc,? # matches any tuple with 10 and abc as the values of attributes 1 and 2
There are also a number of auxiliary commands to assist with building and examining relations:
gendata #tuples #attributes [startID] [seed]
Generates a specified number of n-attribute tuples in the appropriate format to insert into a created relation. All tuples are the same format and look like
UniqID,RandomString,a3-Num,a4-Num,...,an-Num
For example, the following 4-attribute tuples could be generated by a call like gendata 1000 4
7654321,aTwentyCharLongStrng,a3-013,a4-001
3456789,aTwentyChrLongString,a3-042,a4-128
Of course, the above call to gendata will generate 1000 tuples like these.
A tuple is represented by a sequence of comma-separated fields. The first field is a unique 7-digit number; the second field is a random 20-char string (most likely unique in a given database); the remaining fields have a field identifier followed by a non-unique 3-digit number.
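The query semantics above reduce to field-by-field equality with '?' as a wildcard, and this is also the final check needed on candidate pages, since signatures can produce false matches. A minimal Go sketch working directly on the comma-separated strings; the function name is hypothetical and the assignment's own Tuple and Query types are not modeled:

```go
package main

import "strings"

// matches reports whether a tuple such as "10,abc,a3-001" satisfies a
// query such as "10,?,?": every query field must be "?" or equal the
// tuple field in the same position, and the arities must agree.
func matches(tuple, query string) bool {
	tv := strings.Split(tuple, ",")
	qv := strings.Split(query, ",")
	if len(tv) != len(qv) {
		return false
	}
	for i, q := range qv {
		if q != "?" && q != tv[i] {
			return false
		}
	}
	return true
}
```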
The size of each tuple is
7+1 + 20+1 + (n-2)(6+1)-1 = 28 + 7(n-2) bytes
The -1 is because the last attribute doesn't have a trailing comma, and (n-2)*(6+1) assumes that it does.
Note that tuples are limited to at most 9 attributes, which means that the maximum tuple size is a modest 77 bytes. (If you wish, you can work with larger tuples by tweaking the gendata and create commands and the newRelation() function, but this is not required for the assignment.)
stats RelName
Prints information about the sizes of various aspects of the relation. Note that some aspects are static (e.g. the size of tuples) and some aspects are dynamic (e.g. the number of tuples). An example of using the stats command is given below.
You can use it to help with debugging, by making sure that the files have been correctly built after the create command, and that the files have been correctly updated after some tuples have been inserted.
dump RelName
Writes all tuples from the relation RelName, one per line, to standard output. This is like an inverse of the insert command. Tuples are dumped in a form that could be used by insert to rebuild a database.
You can use it to help with debugging, by making sure that the tuples are inserted correctly into the data file.
Setting Up
You should make a working directory for this assignment and put the supplied code there, and start reading to make sure that you understand all of the data types and operations used in the system.
$ mkdir your/ass2/directory
$ cd your/ass2/directory
$ unzip /web/cs9315/21T1/assignments/ass2/ass2.zip
You should see the following files in the directory:
$ ls
Makefile  dump.c     psig.c   stats.c  x1.c
bits.c    gendata.c  psig.h   tsig.c   x2.c
bits.h    hash.c     query.c  tsig.h   x3.c
bsig.c    hash.h     query.h  tuple.c
bsig.h    insert.c   reln.c   tuple.h
create.c  page.c     reln.h   util.c
defs.h    page.h     select.c util.h
The .h files define data types and function interfaces for the various types used in the system.
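The tuple-size formula above can be checked mechanically. This small Go sketch (the function name is hypothetical) spells out each term and confirms the closed form and the 77-byte maximum for 9 attributes:

```go
package main

// tupleSize returns the byte length of a gendata tuple with n attributes:
// a 7-digit ID plus comma, a 20-char string plus comma, then n-2 fields
// of 6 chars each counted with a comma, minus the trailing comma that
// the last field does not have.
func tupleSize(n int) int {
	return (7 + 1) + (20 + 1) + (n-2)*(6+1) - 1
}
```

For example, tupleSize(4) is 8 + 21 + 14 - 1 = 42 bytes, and tupleSize(9) is 77 bytes.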
The corresponding .c files contain the implementation of the functions on the data type. The remaining .c files either provide the commands described above, or are test harnesses for individual types (x1.c, x2.c, x3.c). You can add additional testing files, but there is no need to submit them.
The above files give you a partial implementation of signature-based indexing. You need to complete the code so that it provides the functionality described above.
You should be able to build the supplied partial implementation via the following:
$ make
gcc -std=gnu99 -Wall -Werror -g -c -o query.o query.c
gcc -std=gnu99 -Wall -Werror -g -c -o page.o page.c
gcc -std=gnu99 -Wall -Werror -g -c -o reln.o reln.c
gcc -std=gnu99 -Wall -Werror -g -c -o tuple.o tuple.c
gcc -std=gnu99 -Wall -Werror -g -c -o util.o util.c
gcc -std=gnu99 -Wall -Werror -g -c -o tsig.o tsig.c
gcc -std=gnu99 -Wall -Werror -g -c -o psig.o psig.c
gcc -std=gnu99 -Wall -Werror -g -c -o bsig.o bsig.c
gcc -std=gnu99 -Wall -Werror -g -c -o hash.o hash.c
gcc -std=gnu99 -Wall -Werror -g -c -o bits.o bits.c
gcc -std=gnu99 -Wall -Werror -g -c -o create.o create.c
gcc -o create create.o query.o page.o reln.o tuple.o util.o tsig.o psig.o bsig.o hash.o bits.o -lm
gcc -std=gnu99 -Wall -Werror -g -c -o insert.o insert.c
gcc insert.o query.o page.o reln.o tuple.o util.o tsig.o psig.o bsig.o hash.o bits.o -o insert
gcc -std=gnu99 -Wall -Werror -g -c -o select.o select.c
gcc select.o query.o page.o reln.o tuple.o util.o tsig.o psig.o bsig.o hash.o bits.o -o select
gcc -std=gnu99 -Wall -Werror -g -c -o stats.o stats.c
gcc stats.o query.o page.o reln.o tuple.o util.o tsig.o psig.o bsig.o hash.o bits.o -o stats
gcc -std=gnu99 -Wall -Werror -g -c -o gendata.o gendata.c
gcc -o gendata gendata.o util.o -lm
gcc -std=gnu99 -Wall -Werror -g -c -o dump.o dump.c
gcc dump.o query.o page.o reln.o tuple.o util.o tsig.o psig.o bsig.o hash.o bits.o -o dump
gcc -std=gnu99 -Wall -Werror -g -c -o x1.o x1.c
gcc -o x1 x1.o query.o page.o reln.o tuple.o util.o tsig.o psig.o bsig.o hash.o bits.o
gcc -std=gnu99 -Wall -Werror -g -c -o x2.o x2.c
gcc -o x2 x2.o query.o page.o reln.o tuple.o util.o tsig.o psig.o bsig.o hash.o bits.o
gcc -std=gnu99 -Wall -Werror -g -c -o x3.o x3.c
gcc -o x3 x3.o query.o page.o reln.o tuple.o util.o tsig.o psig.o bsig.o hash.o bits.o
This should not produce any errors on the CSE servers; let me know ASAP if this is not the case.
The gendata command should work completely without change. For example, the following command generates 5 tuples, each of which has 4 attributes. Values in the first attribute are unique; values in the second attribute are highly likely to be unique. Note that the third and fourth attributes cycle through values at different rates, so they won't always have the same number.
$ ./gendata 5 4
1000000,lrfkQyuQFjKXyQVNRTyS,a3-000,a4-000 -> 0
1000001,FrzrmzlYGFvEulQfpDBH,a3-001,a4-001 -> 0
1000002,lqDqrrCRwDnXeuOQqekl,a3-002,a4-002 -> 0
1000003,AITGDPHCSPIjtHbsFyfv,a3-003,a4-003 -> 0
1000004,lADzPBfudkKlrwqAOzMi,a3-004,a4-004 -> 0
The create command itself is complete, but some of the functions it calls are not complete. It will allow you to make an empty relation, although without a complete bit-slice file (you add this as one of the assignment tasks). The stats command is complete and can display information about a relation. Using these commands, you could do the following: use the create command to create an empty relation which can hold 4-attribute tuples and is able to index up to 5000 tuples (using bit-slices), with a false match probability of 1/1000. The stats command then displays the generated parameter values.
$ ./create R simc 5000 4 1000
$ ./stats R
Global Info:
Dynamic:
...

April 14, 2022 · 29 min · jiezi

On Algorithms: LeetCode (Golang) 75. Sort Colors

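Problem: Given an array nums with n objects colored red, white, or blue, sort them in place so that objects of the same color are adjacent, with the colors in the order red, white, and blue. We use the integers 0, 1, and 2 to represent red, white, and blue respectively. You must solve this problem without using the library's sort function.
Example: Input: nums = [2,0,2,1,1,0] Output: [0,0,1,1,2,2]
Constraints: n == nums.length; 1 <= n <= 300; nums[i] is 0, 1, or 2.
Follow-up: Could you solve this problem without using the library's sort function? Could you come up with a one-pass algorithm using only constant extra space?
The code first (Go):

```go
func sortColors(nums []int) {
	left, i, right := 0, 0, len(nums)-1
	// Invariant: everything before left (not including nums[left]) is 0,
	// and everything after right (not including nums[right]) is 2:
	//   all in [0, left)    == 0
	//   all in [left, i)    == 1
	//   all in (right, len) == 2
	for i <= right {
		if nums[i] == 0 {
			nums[left], nums[i] = nums[i], nums[left]
			left++
			i++
		} else if nums[i] == 2 {
			// Do not advance i: the value swapped in from the right is still unexamined.
			nums[right], nums[i] = nums[i], nums[right]
			right--
		} else {
			i++
		}
	}
}
```

We can borrow the partitioning idea from quicksort to analyze the problem. First, set up the loop invariant as follows: ...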

April 13, 2022 · 1 min · jiezi

On Algorithms: Cold-Start Algorithm Series: A First Look at Song Cold Start at Cloud Music

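Image source: https://revolutionmotors.ca/b... Authors: Yuandian, Zhengchao
I. Introduction to the Cold-Start Problem
1 What is cold start
A recommender system serves thousands upon thousands of users and thousands upon thousands of items, and its essential task is to recommend to each user the items they are interested in. Both users and items are constantly changing: how to recommend items of interest to new users, and how to recommend new items to the users who would be interested in them, is the cold-start problem of a recommender system.
The cold-start problem therefore falls into two main categories: user cold start and item cold start.
2 Why cold start matters
User churn and uncertainty are objective facts, and so are the listing, updating, and delisting of items. In today's era of information overload, user uncertainty is even more pronounced, and recommending good items to these uncertain users is one of the main functions of a recommender system. Since new users and new items keep appearing, which is the norm on the internet, the cold-start problem accompanies a product throughout its whole life cycle.
Every internet product watches MAU and DAU. In an era where traffic is king, users are crucial to whether a product survives and whether it thrives. Whether new users are satisfied and whether they stay directly affects a product's user growth and revenue growth. In business the customer is king, and on the internet this saying holds even more fully.
In addition, whether a product keeps its items fresh is key to attracting users; in a sense, the quality of the items directly determines the quality of the product.
Handling new users and new items well, that is, solving the cold-start problem, is therefore very important for a recommender system.
3 Cold-start methods
Different cold-start methods are used depending on the characteristics of users and items, as described below.
3.1 User cold start
Non-personalized recommendation: Recommending popular items is a good approach. Although it is not personalized, many people follow the crowd, and by the 80/20 rule, recommending popular items to new users can satisfy the needs of about 80% of users. Examples include popular movies, popular songs, and popular short videos.
Recommendation from registration information: Many apps require users to register before use, so personalized recommendations can be made from the registration information; for example, a dating site can recommend women to men and men to women. Registration details such as age, region, occupation, education, and income can also be used to build user profiles, which then drive personalized recommendation.
Recommendation from declared interests: Some apps ask users to choose their interests before use so that the recommender can do a good job; for example, news apps ask users to choose tags of interest, game apps ask users to choose game genres, and music apps ask users to choose music styles.
Recommendation from a small amount of behavior: Some users are not very active and have little behavior, but personalized recommendations can still be made from what little there is; for example, if a user has watched a short video, recommendations can be based on that video.
Exploratory recommendation: Exploration and exploitation is one of the common methods in recommender systems. First recommend a few items to the user at random, then infer the user's interests from their feedback. This mainly suits apps where each item takes little of the user's time, so interests can be located quickly, such as news and short-video apps.
Recommendation by interest transfer: Some companies have mature apps, or a single app hosts several kinds of recommendations, so interests from one domain can be transferred to another. For example, an app that recommends both music and short videos can recommend short videos based on the user's music interests.
3.2 Item cold start
Recommendation from side information: Items naturally have attributes, such as a product's merchant, category, and price, or a song's language, genre, style, and instruments. The recommender can use this basic information to recommend items to users interested in them.
Recommendation from a small amount of behavior: Some items have a small amount of behavior data that can be used for personalized recommendation; for example, if a user watched a short video to the end, that video can be recommended to similar users.
Exploratory recommendation: Exploration and exploitation also applies to item cold start: first distribute a cold-start item randomly to a batch of users, then recommend it to interested users based on their feedback.
4 Evaluating cold-start methods
A cold-start method is evaluated mainly on three criteria:
Coverage: The first criterion is coverage, which directly determines the online effect. If coverage is too low, then however good the effect within the covered range, the overall effect suffers badly. Of the methods above, side-information item cold start and non-personalized user cold start both have very high coverage, close to 100%. Behavior-based and interest-transfer user cold start have stricter requirements, so their coverage is lower.
Accuracy: The second criterion is accuracy. For example, interest-transfer user cold start has relatively high accuracy because it has more user information, whereas side-information methods have high coverage but lower accuracy.
Interpretability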
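Interpretability of recommendations matters a great deal to both users and the recommender system, and many recommender systems increasingly emphasize it. Likewise, for the cold-start problem, good interpretability also helps improve recommendation accuracy. For example, the interest-based user cold-start method can explain recommended items to the user very well.
Judging by these criteria, no single method has every advantage, so practical recommender systems combine several cold-start methods whose strengths complement each other.
So far we have briefly covered the definition of the cold-start problem, general methods for the two kinds of cold start, and criteria for evaluating cold-start methods. Next we describe our practical solution for song cold start in a music recommender system.
II. Song Cold Start at Cloud Music in Practice
1 Business background
More than 400,000 independent musicians have joined NetEase Cloud Music, and they release many excellent new works every day. Distributing these new works quickly and accurately to the playlists of target listeners, so that the songs complete their cold start and enter the song growth system, is an important problem for NetEase Cloud Music's recommender system.
Because of their special nature, cold-start songs cannot directly reuse the recommendation models built for non-cold-start songs, so an effective recommendation model must be built specifically for them.
2 Problems facing song cold start
2.1 Missing song features
The fundamental difficulty for cold-start songs is the lack of historical user interaction data, which leaves both features and samples missing.
Missing song statistics: these include counts and conversion rates for behaviors such as plays, downloads, favorites, and shares. Such features are usually an important part of song recall and ranking models, and since cold-start songs lack them, existing models cannot be used directly.
No samples to train embeddings for cold-start songs: song embeddings in recall and ranking models are usually trained end to end, but cold-start songs are not in the vocabulary, so their embeddings cannot be obtained directly.
2.2 Business interpretability
The ultimate goal of a song cold-start system is to serve the business. Besides successfully distributing cold-start songs, we want the cold-start process to be as interpretable as possible. An interpretable cold-start system better helps the business answer questions such as which kinds of songs are more likely to cold-start successfully, providing lessons for future cold-start songs.
3 Solution
The core idea for solving song cold start is still to make the most of the data that is available. The most widely used approach is side-information-based cold start: it is usually simple to implement, places low demands on data and features, and offers good business interpretability.
Below we present our content-tag-based cold-start solution from two angles: recall of cold-start songs and ranking of cold-start songs.
3.1 Cold-start recall
Because no user interaction records can be collected for cold-start songs, conventional i2i or vector recall usually cannot be applied to them, but as a fallback their content tags can be used for recall. The recall process is shown in the figure below.
Step 1: the recall component preprocesses and normalizes the content tags of cold-start songs, including keeping only the main genre, normalizing minor languages, and unifying album artists with performing artists.
Step 2: cold-start songs are grouped by content tag, and within each group a recall candidate score is assigned based on a time-decayed conversion rate, calculated as: ...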
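Steps 1 and 2 of the tag-based recall amount to normalizing tags into a bucket key and building a tag-to-songs inverted index. A minimal Go sketch of that shape, with loudly stated assumptions: the Song type, the "main genre is the first genre" rule, and the set of major languages are hypothetical, artist unification is not modeled, and because the article's time-decayed scoring formula is not reproduced here, the score is left as a caller-supplied function.

```go
package main

import "sort"

// Song is a hypothetical cold-start song record; the fields are
// illustrative and not NetEase Cloud Music's actual schema.
type Song struct {
	ID       string
	Genres   []string // Genres[0] is assumed to be the main genre
	Language string
}

// tagKey normalizes a song's content tags into one recall bucket key:
// keep only the main genre, and fold every language outside majorLangs
// into a single "minor" bucket.
func tagKey(s Song, majorLangs map[string]bool) string {
	genre := "unknown"
	if len(s.Genres) > 0 {
		genre = s.Genres[0]
	}
	lang := s.Language
	if !majorLangs[lang] {
		lang = "minor"
	}
	return genre + "|" + lang
}

// groupByTag builds a tag -> song IDs index and orders each bucket by a
// caller-supplied candidate score, highest first.
func groupByTag(songs []Song, majorLangs map[string]bool, score func(Song) float64) map[string][]string {
	buckets := map[string][]Song{}
	for _, s := range songs {
		k := tagKey(s, majorLangs)
		buckets[k] = append(buckets[k], s)
	}
	index := make(map[string][]string, len(buckets))
	for k, bucket := range buckets {
		sort.SliceStable(bucket, func(i, j int) bool { return score(bucket[i]) > score(bucket[j]) })
		ids := make([]string, len(bucket))
		for i, s := range bucket {
			ids[i] = s.ID
		}
		index[k] = ids
	}
	return index
}
```

At serving time, a user's preferred tags would be mapped to bucket keys and the top songs of each bucket taken as recall candidates.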

April 13, 2022 · 1 min · jiezi

On Algorithms: CMDA Parallel Computing

CMDA 3634 SP2021 Parallelizing the Wave Equation with OpenMP Project 03
Project 03: Parallelizing the Wave Equation with OpenMP
Version: Current as of: 2021-03-22 14:30:01
Due:
– Preparation: 2021-03-30 23:59:00
– Coding & Analysis: 2021-04-09 23:59:00 (24 hour grace period applies to this due date.)
Points: 100
Deliverables:
– Preparation work as a PDF, typeset with LaTeX, through Canvas.
– All code through code.vt.edu, including all LaTeX source.
– Project report and analysis as a PDF, typeset with LaTeX, through Canvas.
Collaboration:
– This assignment is to be completed by yourself.
– For conceptual aspects of this assignment, you may seek assistance from your classmates. In your submission you must indicate from whom you received assistance.
– You may not assist or seek assistance from your classmates on matters of programming, code design, or derivations.
– If you are unsure, ask course staff.
Honor Code: By submitting this assignment, you acknowledge that you have adhered to the Virginia Tech Honor Code and attest to the following:
I have neither given nor received unauthorized assistance on this assignment. The work I am presenting is ultimately my own.
References
The Ising model:
– Section 2.6 of “Parallel Algorithms in Computational Science” by Heermann and Burkitt
✯ Available for free on VT Library site https://link-springer-com.ezp...book/10.1007%2F978-3-642-76265-9
OpenMP:
– General tutorial https://computing.llnl.gov/tu...
– Reference Book https://mitpress.mit.edu/book...
– Timing https://gcc.gnu.org/onlinedoc...
– General discussion https://pages.tacc.utexas.edu... (14-26)
Code Environment
All necessary code must be put in your class Git repository.
Use the standard directory structure from previous projects.
All experiments will be run on TinkerCliffs.
You may develop in your VMs or directly on TinkerCliffs.
Be an ARC good citizen. Use the resources you request.
Use your scratch space for large data.
A code solution to Project 2 has been provided in the materials repository.
Any scripts which are necessary to recreate your data must be in the data directory.
DO NOT INCLUDE VIDEO FILES OR LARGE NUMBERS OF IMAGE FILES IN YOUR GIT REPOSITORY!
Your tex source for project preparation and final report go in the report directory.
Remember: commit early, commit often, commit after minor changes, commit after major changes. Push your code to code.vt.edu frequently.
You must separate headers, source, and applications.
You must include a makefile.
You must include a README explaining how to build and run your programs.
Requirements
Preparation (20%) ...

April 13, 2022 · 9 min · jiezi

On Algorithms: A Detailed Look at Recommendation Algorithm Architecture: Recall

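Introduction | The recall module faces a recommendation pool of millions to tens of millions of items, so the candidate set is huge. Because a ranking module follows it, recall does not need to be very precise, but it must not miss relevant items and must keep latency low. Today this is mainly achieved with multi-channel recall: the channels can be computed in parallel, and each one makes up for the others' weaknesses. Recall channels fall into two broad categories: non-personalized and personalized.
I. Overall Architecture of Recommendation Algorithms
(1) Why recommendation algorithms matter
Over the past decade of rapid internet growth, both user scale and content scale have exploded. Over 100 million daily active users is nothing new, and with the spread of UGC, platforms with content libraries of billions of items are not unusual. Helping huge numbers of users find what they like among huge amounts of content, and getting that content consumed precisely by the right users, has always been a core problem for every company. Search and recommender systems emerged against this background. Search mainly helps users look for content they are interested in, which is more active consumption; recommendation mainly pushes content to suitable users, which is more passive consumption. One pulls on users, the other on content, and both are intermediaries that match users with content. Recommender systems hold a core position in every company, mainly because:
On the user side, they promptly and accurately push personalized content that users are interested in, and keep discovering and cultivating latent interests, meeting consumption needs and improving user experience, which raises activity and retention.
On the content side, as a traffic distribution platform, they give producers (such as UGC creators and e-commerce sellers) positive feedback, and by supporting promising small and medium producers they help the whole content ecosystem flourish.
On the platform side, the recommender system is crucial to the traffic and efficiency of content distribution. Better user experience improves retention and thus DAU. Better user conversion and traffic efficiency raise core metrics such as order volume on e-commerce platforms and average time spent per user on content platforms. Deeper consumption increases overall platform traffic, laying the foundation for commercial goals (such as advertising) and raising core metrics such as ARPU (average revenue per user). Recommender systems are closely tied to many of a company's core metrics and strongly drive them.
(2) Basic modules of a recommendation algorithm
Given current compute and storage constraints, fully end-to-end recommendation is not yet feasible. A recommender system is generally split into the following main modules:
Recommendation pool: usually built by applying rules to select items from the full item library (which may hold billions or even tens of billions of items), then refreshed periodically by replacement rules. For example, an e-commerce platform can build the pool from the last 30 days' sales and each product's price tier within its category; a short-video platform can build it from publish time and the last 7 days' play count. The pool is usually just rebuilt offline on a schedule.
Recall: selects thousands to tens of thousands of items from the pool and passes them to the ranking modules. Because the candidate set is huge and output is generally needed online, recall must be lightweight, fast, and low latency. Since ranking follows, recall need not be very precise, but it must not miss things (especially the recall module in search). Multi-channel recall is the standard paradigm today, split into non-personalized and personalized recall; personalized recall is further divided into content-based, behavior-based, feature-based, and other approaches.
Pre-ranking: takes the recall results and selects about a thousand items for the ranking module. It can be seen as a filtering round before ranking that relieves pressure on the ranking module. Sitting between recall and ranking, it must balance precision and low latency, so its model generally cannot be too complex.
Ranking: takes the pre-ranking results and scores and sorts the candidates. Ranking must keep scores precise within the maximum allowed latency; it is the most critical module in the system, and also the most complex and most studied. Building it generally involves three parts: samples, features, and models.
Re-ranking: takes the ranked results and fine-tunes them based on operational strategy, diversity, context, and so on. Examples include boosting beauty products on Women's Day (March 8) and spreading out items of the same category, same image, or same seller to protect user experience. Re-ranking involves many rules, but there are now also quite a few model-based approaches to improve it.
Blending: when several business lines want exposure in the feed, their results must be blended, for example inserting ads into the recommendation feed, or image-text posts and banners into a video feed. This can be done with rule-based strategies (such as fixed ad slots) or with reinforcement learning.
A recommender system contains many modules and papers keep appearing, so it is fairly complex. The most important thing in mastering recommendation algorithms is to get a clear picture of the overall architecture: how each module works, what its limitations and open problems are, and how it can be optimized. The architecture picture ties the modules together so that you understand them as a whole instead of getting stuck on one detail. When reading a paper, first understand what problem it solves and what solutions came before, then how it solves the problem and what it improves over the alternatives. This article mainly explains the big picture of recommendation algorithm architecture, to help readers see the whole and serve as an outline.
II. Recall
(1) Multi-channel recall
The recall module faces a recommendation pool of millions to tens of millions of items, so the candidate set is huge. Because a ranking module follows it, recall does not need to be very precise, but it must not miss relevant items and must keep latency low. Today this is mainly achieved with multi-channel recall: the channels can be computed in parallel, and each one makes up for the others' weaknesses. Recall channels fall into two broad categories: non-personalized and personalized.
Non-personalized recall
Non-personalized recall does not depend on the user and can be built offline. The main types are: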
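Popular recall: for example, short videos with high play counts (VV) over the last 7 days; this can be smoothed with CTR and time decay, filtering out suspected clickbait items with low average watch time. Items with many likes or good reviews can also be chosen. This part is mostly rule-based. Because popular items easily cause a Matthew effect, if popular recall takes too large a share of all channels, consider suppressing it somewhat.
High-efficiency recall: for example, short videos with high CTR, high completion rate, and high average watch time. These items are efficient, but may have been listed only recently, so their historical play counts are low and good reviews take time to accumulate, and they may not appear in popular recall.
Operational recall: for example, category charts and curated lists built by operations staff, newly listed items, and so on.
Personalized recall
Personalized recall depends on the user and differs for every user. By how it is built, the main types are:
content-based: based on content. It can use user tags, such as favorite directors, actors, and categories filled in at registration, or use the user's historical behavior as a trigger to select items with similar content. Main types: tag recall, e.g., actor, director, item tag, category; knowledge graphs; multimodal recall, e.g., items with semantically similar titles, similar cover images, or similar video understanding results. An inverted index is usually built offline; online, user tags or historically engaged items serve as triggers to fetch the corresponding candidates. Building the inverted index from content does not require items to have rich behavior, so it is friendly to cold-start items.
behavior-based: based on behavior, mainly userCF and itemCF, both of which find similarity through behavior and require users or items to have rich behavior. userCF first finds users whose behavior is similar to the target user's and takes items from their behavior sequences as candidates. itemCF finds, for each item, the other items with similar interaction patterns and builds an inverted index. How do the two differ in practice? In my view, mainly:
userCF needs users with rich behavior, while itemCF needs items with rich interactions. So in scenarios with strong freshness demands on items, such as news, where there are many cold-start items, userCF is worth considering.
There are generally far more users than items in the pool, that is, far more user vectors than item vectors, so userCF's storage and vector-retrieval load is heavier than itemCF's. This also makes user vectors much sparser than item vectors, so similarity is computed less accurately than with itemCF.
What are the drawbacks of collaborative filtering?
Most users interact with only a small fraction of items, so the user-item behavior matrix is very sparse, and some users have no behavior at all, which hurts the accuracy of vector similarity.
Both users and items are numerous, so storing the behavior matrix is costly.
Sparsity causes another problem: popular head items tend to look similar to most items, producing the Harry Potter problem and an extremely severe Matthew effect.
How can these problems be solved? Matrix factorization (MF) emerged for this. It factorizes the user-item behavior matrix into a user matrix and an item matrix: the M×N matrix becomes an M×K matrix and a K×N matrix. Each row of the user matrix is a K-dimensional user vector, and each column of the item matrix is a K-dimensional item vector. Unlike CF vectors, which come directly from behavior and have fairly clear meanings, MF vectors are called latent user and item vectors. MF solves the sparsity of CF vectors, and because K is much smaller than M and N, it turns high-dimensional sparse vectors into low-dimensional dense ones, greatly reducing storage.
How is MF computed? It can be done via SVD or gradient descent. Because SVD comes with certain constraints, gradient descent is more common, which is why MF is also called model-based CF.
MF is still essentially built on user behavior and does not make full use of the features of users and items, such as user gender and age, so a lot of information is lost. LR and FM emerged to address this.
feature-based: based on features, such as the user's age, gender, device model, location, and behavior sequence, and the item's listing time, video length, and historical statistics. Feature-based recall uses information fully, generally works well, and is friendly to cold start; it has been a research focus in recent years. It mainly splits into linear models, such as FM and FFM, which are not expanded here; and deep models, such as the DNN-based two-tower DSSM (EBR, MOBIUS) and YouTubeDNN (also called deepMatch); the user-sequence-based MIND, SDM, CMDM, and BERT4Rec; and the GNN-based Node2Vec, EGES, PinSage, and so on. This is a big topic and will not be covered one by one. Online, there are two ways to use these models:
Vector retrieval: take the generated user embedding and use nearest-neighbor search to find similar item embeddings, and from them the specific items. Retrieval methods include hash bucketing, HNSW, and others.
i2i inverted index: using item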
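embeddings, find the other items similar to each item and build an i2i index offline. Online, items in the user's history serve as triggers to fetch candidates from the inverted index.
social-network: use friends' likes, follow relationships, contact lists, and so on to find other people on the user's social graph, then recall items through them. The principle is that items liked by friends will probably be liked by the user too: birds of a feather flock together.
(2) Recall optimization
Those are the main channels in multi-channel recall. What are the main problems in recall? In my view:
Negative sample construction: "recall is the art of samples, ranking is the art of features" is quite right. Recall positives can be exposed-and-clicked samples, but how should negatives be chosen? Exposed but unclicked samples? Certainly not.
Exposed-but-unclicked items made it through the existing recall, pre-ranking, and ranking modules, which shows their quality and relevance are decent, so they are unsuitable as recall negatives.
SSB problem: recall serves the whole pool, but exposed items are only a tiny subset of it, so building negatives this way causes severe sample selection bias (SSB), making the model deviate badly from reality.
For this reason, items can be sampled at random from the pool as negatives. But then another problem appears: randomly sampled items are usually easy to tell apart from positives, so hard negative samples are needed to push recall quality up. Constructing hard negatives is currently a major research direction in recall, mainly:
Using a model: for example, Facebook EBR picks items that the previous recall version scored in the middle, around ranks 101 to 500. They are not near the top, so they can be treated as negatives, but they are not at the bottom either; they have some relevance to positives and are somewhat hard to distinguish. EBR, Mobius, PinSage, and others use similar ideas. With this method it is hard to define exactly which items are "somewhat similar but not too similar", and it may take many attempts.
Business rules: for example, pick items in the same category or the same price tier; see the Airbnb paper.
In-batch: positives of other users in the same batch serve as negatives for this user.
Active learning: recall results are reviewed manually, and bad cases become negatives.
Hard negatives and easy negatives are usually mixed in some ratio, such as 1:100, as recall negatives; easy negatives generally make up the vast majority.
SSB problem: recall serves the whole pool with a huge number of items, so negative sampling is needed, which brings considerable sample selection bias. The sampled negatives should represent the whole pool as well as possible to improve generalization. The main issue is again negative sampling, especially hard negatives. Alibaba's ESAM tries transfer learning, using a loss regularizer so that a model trained on exposed samples can be applied to unexposed items, mitigating SSB. Most other methods start from negative sampling, combining easy and hard negatives; EBR, Airbnb Embedding, Mobius, PinSage, and others all include hard-negative optimizations.
Objective mismatch: current recall objectives are still about finding similarity, whether content-, behavior-, or feature-based. But ranking and the final business metrics look at conversion, and similarity does not guarantee good conversion. In the extreme case, recalling only short videos similar to what the user just played clearly yields low overall conversion. Baidu's Mobius introduces CPM into recall, using the business metric to truncate results after vector retrieval, which handles items that are highly relevant but convert poorly. Alibaba's TDM rebuilds ANN retrieval around a max-heap tree, greatly reducing retrieval computation so that complex models such as DIN can be used; this aligns recall and ranking structurally (though their samples still differ greatly) and partly addresses the problem.
Competition: recall channels are eventually merged and deduplicated. Channels that overlap too much add nothing; in particular, a new channel should add incremental value over the existing ones, so channels overlap and compete to some degree. At the same time, a channel's candidates may not survive ranking, especially items rarely recalled before: with few exposure samples they score low in ranking and may never surface. This love-hate relationship between recall and ranking needs full-pipeline optimization to ease it.
About the author: Xie Yangyi, applied algorithm researcher at Tencent.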

April 13, 2022 · 1 min · jiezi

On Algorithms: An Analysis of CSCI-1200 Algorithms

CSCI-1200 Data Structures, Spring 2021
Homework 8: Simplified B+ Trees
In this assignment we will be implementing a partial and modified version of B+ trees. As a result, online resources may not use the same terminology or may describe implementation details that are not relevant to our HW8 implementation. You should read the entire assignment before beginning your work. You should also make sure you understand the basic concepts discussed at the end of Lecture 17. It is highly recommended that before you begin coding, you practice constructing a couple of trees (using b = 3, b = 4) by hand and then checking your work with this online visualization tool: https://www.cs.usfca.edu/~gal...
The bulk of the assignment will focus on proper insertion in a B+ tree, which is described on the next page.
Implementation Details
In this assignment we will assume that the keys we insert are unique, i.e. for a particular B+ tree, we will never call insert(3); insert(3);. We will also assume that b > 2 in all tests. You will find it beneficial to borrow code from our partial implementation of the ds_set class. We will not implement iterators, so find() should instead return a pointer to a BPlusTreeNode. If the tree is empty, this will be a NULL pointer; otherwise this will be the leaf node where the key is/would be. The print functions only need to work with types/classes that already work with operator<<, and PrintSideways makes its split at b/2 children. You must implement all of the functions required to make hw8_test.cpp compile and run correctly.
Hints
Unless the tree is empty, find() will always return a pointer to a node in the tree. You do not need to store NULL pointers. In the middle of an insertion, it is okay to let your nodes hold too many keys or children as long as you fix it before the insertion and splits are finished.
Since this is a tree, some things will be more “natural” to do with recursion.
Submission
While you are encouraged to write your own test functions, we will not be looking at them. You only need to submit a README.txt and BPlusTree.h file. Dr. Memory will be used on this assignment and you will be penalized for leaks and memory errors in your solution. If you get at least 6 points between test cases 4, 5, and 6 by the end of Wednesday, you can submit on Friday without being charged a late day. Please remember that all submissions are still due by the end of Saturday.
Extra Credit
With the way print_BFS() is currently expected to output, it is not possible to tell which nodes are children of a particular node. Assuming that each key is short (i.e. no more than 2 characters wide), implement a function, print_BFS_pretty(), that still uses a BFS ordering and a vertical layout like print_BFS(), but that has appropriate spacing so the structure of the tree is apparent. There are several possible ways to handle this, so you may choose whatever design you think is reasonable. Make sure to leave a note in your README if you implement the extra credit.
Starting from an empty tree, with b = 3 in this example we do the following:
(1) insert(b); creates a root node with “b” in it.
(2) insert(a); adds “a” to the root node.
(3) insert(c); adds “c” to the root node, which makes it too full.
(4) The root node splits into two nodes, one with “a” and one with “b” and “c”. A new parent is created, with the first value from the new right-hand node (“b”) placed in it. A node split should create two new nodes and put half of the original node’s keys into each of the new nodes. Whenever there are an odd number of keys in a node that needs to be split, the “extra” key will go to the right-hand node.
(1) This example starts with one possible tree with b = 3 and the keys “a”-“f”.
(2) insert(ant) causes a leaf to become too full.
“ant” comes after “a” according to string’s operator<.
(3) The leaf containing “a”, “ant”, “b” is split into two nodes, but this makes the parent too full.
(4) We split the overfull node, creating a new parent which is now the root. Since this split a non-leaf node, we do not copy the middle key of “a,c,e” into the new left/right nodes: “c” only appears in the newly created root. A split at a leaf always has the potential to cause more splits all the way up to splitting the root. ...
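The two split rules in the example differ in one detail: a leaf split copies the first right-hand key up to the parent, while an internal split moves the middle key up and keeps it in neither half. A minimal Go sketch of just the key and child bookkeeping, not the homework's C++ BPlusTreeNode; the function names are hypothetical, and applying the "extra key goes right" rule to internal nodes is an assumption.

```go
package main

// splitLeaf splits the sorted keys of an overfull leaf into two new
// leaves; with an odd count the extra key goes to the right. The first
// key of the right leaf is copied up to the parent as the separator.
func splitLeaf(keys []string) (left, right []string, sep string) {
	mid := len(keys) / 2
	left = append([]string(nil), keys[:mid]...)
	right = append([]string(nil), keys[mid:]...)
	return left, right, right[0]
}

// splitInternal splits an overfull internal node that has len(keys)+1
// children. The middle key moves up to the parent and is kept in neither
// half (unlike a leaf split); any extra key goes to the right half.
func splitInternal[C any](keys []string, children []C) (lk, rk []string, lc, rc []C, up string) {
	mid := (len(keys) - 1) / 2
	lk = append([]string(nil), keys[:mid]...)
	rk = append([]string(nil), keys[mid+1:]...)
	lc = append([]C(nil), children[:mid+1]...)
	rc = append([]C(nil), children[mid+1:]...)
	return lk, rk, lc, rc, keys[mid]
}
```

With the example's keys, splitLeaf on “a”, “ant”, “b” yields [a] and [ant b] with separator “ant”, and splitInternal on “a,c,e” sends “c” up while [a] and [e] each keep two children. Generics require Go 1.18 or later.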

April 11, 2022 · 4 min · jiezi

On Algorithms: A Summary of Written-Test Algorithm Problems

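HashMap
How a hashmap works won't be covered again here; readers who aren't familiar with it can read this article: Hash and HashMap.
Introducing a hashmap can cut an O(n) time cost down to O(1), at the price of O(n) space. At first glance that looks like a wash. But if the original time complexity is O(n^2), and a hashmap brings it down to O(n) while space only grows to O(n), the trade is well worth it.
LeetCode problem 1, Two Sum. With a plain double loop:

```java
class Solution {
    // Brute force: try every pair, O(n^2) time, O(1) extra space.
    public int[] twoSum(int[] nums, int target) {
        int n = nums.length;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (nums[i] + nums[j] == target) {
                    return new int[]{i, j};
                }
            }
        }
        return new int[0];
    }
}
```

The first method is slow because for every i in the outer loop we have to scan the array again to find target - nums[i]. If we store each element's value and index in a hashmap, we can look up the index of target - nums[i] directly, with no second scan.

```java
import java.util.HashMap;
import java.util.Map;

class Solution {
    // One pass with a map from value to index: O(n) time, O(n) space.
    public int[] twoSum(int[] nums, int target) {
        Map<Integer, Integer> hashtable = new HashMap<Integer, Integer>();
        for (int i = 0; i < nums.length; i++) {
            if (hashtable.containsKey(target - nums[i])) {
                return new int[]{hashtable.get(target - nums[i]), i};
            }
            hashtable.put(nums[i], i);
        }
        return new int[0];
    }
}
```

The problem I actually ran into was Three Sum: given an array and a target value, find three numbers in the array that add up to the target, returning true if they exist and false otherwise. Three Sum can use a hashmap to reduce three nested loops to two; otherwise it works like Two Sum. ...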
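The Three Sum reduction described above can be sketched as follows: fix the first element with the outer loop, then run the Two Sum hash-set trick over the rest of the array. This is a minimal Go version (the function name is hypothetical), returning a bool as the problem asks.

```go
package main

// threeSum reports whether three distinct positions in nums hold values
// summing to target. For each fixed i, the inner loop is the two-sum
// hash trick over nums[i+1:], giving O(n^2) time and O(n) extra space
// instead of the O(n^3) triple loop.
func threeSum(nums []int, target int) bool {
	for i := 0; i < len(nums); i++ {
		seen := make(map[int]bool) // values at positions i+1 .. j-1
		for j := i + 1; j < len(nums); j++ {
			if seen[target-nums[i]-nums[j]] {
				return true
			}
			seen[nums[j]] = true
		}
	}
	return false
}
```

Because the set only holds values from positions strictly between i and j, the three positions are always distinct, so [3, 3] with target 9 correctly returns false.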

April 11, 2022 · 1 min · jiezi

On Algorithms: Hellobike's Precision Marketing Framework and Algorithm Practice

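Introduction: This talk shares the algorithms and practice behind Hellobike's precision marketing scenarios, in the following parts: the background and value of precision marketing; the precision marketing framework; precision marketing algorithm capabilities; and future directions.
Background and Value of Precision Marketing
First, the background and value of precision marketing.
Business background
Hellobike is gradually moving from mobility into service e-commerce; besides two-wheelers, it now includes local services, hotels, electric bikes, and other businesses. Precision marketing is needed to drive user growth for each new business. Our business goal is to raise the north-star metric of user growth through precision marketing and fine-grained operations across the whole user life cycle.
Scenarios and process
By user life cycle, precision marketing scenarios fall into three areas:
Acquisition: mainly fully tapping potential users.
Activation: mainly retaining and activating existing users.
Win-back: mainly bringing back churned users through precision marketing, ultimately raising the DAU of each new business.
The process has three parts: first who, i.e., the target group; then what, i.e., what content to deliver; then how, i.e., through which channel to deliver it. Precision marketing is then carried out.
Business pain points
Precision marketing had four main pain points:
Finding precise audiences is inefficient: operations staff have to run a lot of manual tests.
Low ROI: marketing costs are high but actual returns are low.
Low algorithm coverage and slow integration: only some scenarios for some audiences are covered, and heavy customization is needed.
No systematic approach: there is no post-campaign analysis and optimization, so precision marketing lacks a closed loop.
Project value
The project's value lies in two areas:
Efficiency: first, precision marketing becomes more efficient, since operations can market directly to the algorithm's audience packages without extensive upfront testing; second, conversion improves, mainly through precision-marketing audience models that raise click-through rate, with an expected CTR lift of 20%.
Revenue: precision marketing can raise business order volume, by an expected 20%.
Precision Marketing Framework
Before building the framework, we needed to understand the business in depth, identify the characteristics of Hellobike's precision marketing scenarios, and find matching solutions.
Scenario characteristics and solutions
Early data analysis and research showed three characteristics of Hellobike's precision marketing scenarios, each with a targeted solution:
Many scenarios and repeated custom development: the algorithms evolved from modules to components and finally to a platform.
High-quality audiences need further expansion: we adopted PU-Learning, a leading semi-supervised framework in the industry.
Too few seed users for algorithmic modeling: we used unsupervised learning for intelligent audience expansion.
Business framework
The business framework consists of three modules:
Feature processing: offline and real-time. Offline features are mainly offline tables computed from event-tracking data and stored on local machines in advance. Real-time features are computed with Flink and stored in Redis.
Precision marketing: comprises the algorithm, the user analysis platform, and the delivery platform. First, the algorithm, which has two parts. One is industry packages, i.e., LookAlike modeling under the PU-Learning framework. The other is intelligent expansion: user embeddings are learned with unsupervised Graph Embedding, a vector engine computes user-to-user similarity, and each user's top-n most similar users are found. Second, the user analysis platform: operations first create a seed audience group made of atomic tags, then choose whether to apply intelligent expansion; if they do, the algorithm returns the expanded target group. Third, the delivery platform: to launch a campaign, operations create a task, choose a task plan (the target group returned by the user analysis platform), then dispatch the task and collect A/B results.
Algorithm scenarios: mainly business acquisition, activation, and churn. Campaigns mainly include resource-slot placements, banners, in-app messages, and push notifications.
Technical framework
Next, the same framework from a technical perspective.
When operations create a marketing task, they first choose a task plan, behind which is a target group made of two parts:
Groups built from industry packages: a model is trained offline on offline samples and features and deployed on DataMan as an offline prediction job. The job writes its output to a Hive table, which is loaded into ES as tags that make up the target group.
Target groups expanded by the intelligent expansion service: the business front end collects behavior-tracking data into Kafka, Flink computes real-time features and stores them in Redis, and the expansion service fetches data directly from the feature platform when needed.
Precision Marketing Algorithm Capabilities
LookAlike modeling under the PU-Learning framework
What is LookAlike? It is not a specific algorithm but an idea: finding similar, expanded audiences based on seed users.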
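How is LookAlike done? There are two main approaches: implicit audience expansion with machine learning models, and similar-audience expansion using the social graph structure. Machine learning models are mainly supervised, semi-supervised, or unsupervised. In supervised learning, all training data are labeled; in semi-supervised learning, part of the training data is labeled and the rest is not, and the unlabeled data usually far outnumber the labeled data; in unsupervised learning, there are no labels.
Challenges we met with LookAlike, and the solutions:
Sparse user features for new businesses: Hellobike currently channels two-wheeler traffic to new businesses, and the two-wheeler user base is large, but new businesses start with few users, so user behavior features are sparse. Solution: use two-wheeler features.
Few usable features: solution: analyze what the businesses have in common and derive cross-business features.
High-quality audiences need further expansion: solution: adopt PU-Learning, a leading framework in the industry.
As multiple businesses developed through several stages, the algorithm iterated in two stages:
Stage 1: a supervised GBM model. Positives were seed users who actually converted on the new business; negatives were randomly sampled from unconverted users. Because businesses differ, business results improved by anywhere from 30% to 130%.
Stage 2: a semi-supervised TSA (two-step approach) model, in two steps: first identify reliable negatives among the unlabeled samples; then do supervised learning on the positives and the reliable negatives from step one.
The traditional TSA modeling process:
Step 1: mix positives into the unlabeled samples as spies, treat everything unlabeled as negative, and train a first model. Then select reliable negatives using the score range of the positives (as shown in the figure above). Step 2: do supervised learning on the positives and the reliable negatives from step 1.
The optimized TSA modeling process:
For step 1 of traditional TSA, an EM model is used, with a lower threshold θ1 set to the minimum of the spy-sample score distribution and an upper threshold θ2 set to the probability at which the algorithm's offline recall is very high. Positives are augmented: samples scoring in [θ2, 1] are also treated as true positives and samples in [0, θ1] as true negatives, and these are fed into a DeepFM model for training.
Business results of the optimized TSA: the audience was expanded 3 to 10 times without lowering ROI.
Industrial application of Graph Embedding in precision marketing
Graph Embedding finds similar audiences through user relationship chains, in two steps: first obtain user embeddings, then compute the similarity between them. The embeddings are obtained with an unsupervised machine learning method.
Challenges and solutions for Graph Embedding:
Few seed users, so how to expand: compute embedding similarity without supervision.
How to build the graph: build it from spatio-temporal information.
How to strengthen the notion of sequence order: use app click sequences.
The spatio-temporal graph consists of nodes and edges. Nodes are users. An edge connects two users who performed a behavior in the same grid cell during the same time window; the behaviors are mainly scanning to unlock a bike and locking it. Edges are undirected and equally weighted: whenever two users act in the same cell and time window, they are joined by an edge of the same weight.
We then use DeepWalk to get user embeddings. DeepWalk first generates random walk paths on the graph, then feeds the path sequences into Skip-Gram for training, producing user vectors.
This approach has a shortcoming, though: it considers only the relationships between users and ignores each user's own characteristics. So the second iteration used EGES, which differs in two ways:
First: users' side information is added to the model.
Second: different types of side information get different weights.
The first two iterations mainly used two-wheeler riding behavior, user-to-user relationships, and user attributes. Because all Hellobike app users should be covered, the third iteration added app behavior sequences.
For industrial-scale vector similarity computation we use the vector engine Milvus, which has two main advantages:
First: near-real-time queries.
Second: it integrates several vector index libraries and can return results to the business within a bounded time.
The business results:
Coverage: fully platformized; intelligently expanded audience packages are supported at zero cost and cover 60% of scenarios.
Lift: ROI improved by more than 20%.
Future Directions
Finally, our plans for precision marketing.
First, graph construction: data sets the ceiling for any model, so the first priority in Graph Embedding is to build the graph well. We have two follow-up plans: users' public-domain click behavior and users' private-domain click behavior. Second, expansion thresholds: today operations staff mainly pick the threshold, for example deciding by gut feeling whether to expand 10x or 1000x. We hope to build a threshold recommendation mechanism that uses an algorithm to suggest the expansion multiple with the highest ROI. (Author: Yu Liping)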
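The first step of the spy technique described above reduces to a threshold rule once the first-round classifier has scored everything. A minimal Go sketch of that selection only, assuming the scores are already available (no model is trained here) and using a quantile parameter of my own as a stand-in for how the threshold is picked; q = 0 gives the "minimum spy score" rule. The function name is hypothetical.

```go
package main

import "sort"

// reliableNegatives returns the indices of unlabeled samples whose
// first-round classifier score falls below the spy threshold. Spies are
// known positives that were hidden in the unlabeled set during training;
// the threshold is the q-quantile of their scores (q = 0 means the
// minimum spy score). spyScores must be non-empty and 0 <= q <= 1.
func reliableNegatives(spyScores, unlabeledScores []float64, q float64) []int {
	s := append([]float64(nil), spyScores...)
	sort.Float64s(s)
	threshold := s[int(q*float64(len(s)-1))]
	var out []int
	for i, v := range unlabeledScores {
		if v < threshold {
			out = append(out, i)
		}
	}
	return out
}
```

Raising q admits more negatives at the cost of more hidden positives slipping in; the selected samples then serve as negatives for the second, supervised step.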

April 11, 2022 · 1 min · jiezi

On Algorithms: AliPLC, an Intelligent Packet Loss Concealment Algorithm That Improves Call Quality on Weak Networks

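Online video and voice calls have become part of everyday life, but complex and changing network conditions can keep some audio packets from reaching the receiver, causing short interruptions or stutters in the speech that seriously hurt the call experience. To address this, the audio technology team of Alibaba Cloud's video cloud, weighing quality, performance cost, real-time requirements, and other factors, developed AliPLC (Ali Packet Loss Concealment), a real-time, causal, intelligent packet loss concealment algorithm that uses a low-complexity end-to-end generative adversarial network to handle packet loss during speech transmission.
What can be done when the signal is poor in real-time communication?
With the rapid development of internet technology, new forms of interaction such as live streaming, online education, audio/video conferencing, social entertainment, and interactive games are changing how people live. Notably, their rise depends on the development of Real Time Communication (RTC) technology. Figure 1 shows a simplified RTC audio pipeline, mainly consisting of capture, pre-processing (3A), encoding, transmission, decoding, packet loss concealment, mixing, and playback.
Figure 1. Diagram of the audio pipeline in RTC
Speech is compressed by encoding and sent over the network in frames. Network conditions can keep some audio packets from reaching the receiver, causing short interruptions or stutters that degrade sound quality and intelligibility over a long call. Packet Loss Concealment (PLC) algorithms were developed to solve this. A PLC algorithm uses all the information available to compensate appropriately for lost audio packets so that the loss is hard to notice, preserving the clarity and smoothness of the received audio and giving users a better call experience.
State of research on audio loss concealment
Packet loss happens often when data travels over networks, and it is one of the main causes of degraded speech quality in VoIP (Voice over Internet Protocol) calls. Traditional PLC solutions are mainly based on signal analysis [1-2] and fall roughly into sender-side and receiver-side approaches. Sender-side approaches recover lost content from redundant encoded information; however, they need extra bandwidth and suffer from codec incompatibility. Receiver-side approaches reconstruct the lost speech from the decoding parameters received before the loss. The biggest advantage of traditional PLC is its simple computation, which allows concealment online; its drawback is limited capability: it only handles losses of around 40 ms effectively. For long bursts of loss, traditional algorithms produce robotic sounds or rapidly decaying waveforms and fail to conceal the loss. The capability of these traditional PLC methods therefore falls short of what production services need.
In recent years hardware and algorithms have advanced significantly, and deep learning is increasingly applied to speech signal processing, PLC included. Existing deep PLC methods all use a deep model at the receiver to generate the lost audio packets, and fall roughly into two general frameworks:
The first is a real-time causal framework that post-processes using only past frames that were received. By iteration scheme, real-time methods divide roughly into autoregressive methods based on recurrent neural networks [3-4] and parallel methods based on generative adversarial networks [5-6], but both often involve large parameter counts and heavy computation.
The second is an offline non-causal framework that, besides past received frames, may use wider context including future frames [7-8]. Offline methods usually focus on filling gaps in the speech signal and generally ignore computational complexity, which makes them hard to deploy in real applications.
Intelligent packet loss concealment: AliPLC
1. Algorithm principle
Considering usage scenarios, concealment quality, performance cost, real-time requirements, and other factors, the team developed the real-time causal intelligent PLC algorithm AliPLC (Ali Packet Loss Concealment), which uses a low-complexity end-to-end generative adversarial network to handle packet loss in speech transmission. The algorithm has these advantages:
• No algorithmic delay;
• Real-time streaming processing;
• High-quality generated speech;
• Smooth, continuous audio before and after a loss without a separate smoothing step.
2. Algorithm performance
AliPLC has 590k parameters. On a quad-core 2 GHz Intel Core i5, concealing one 20 ms frame of audio takes 1.5 ms, and inference adds no delay. ...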
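For contrast with the neural approach, the kind of traditional receiver-side concealment the article criticizes can be as simple as repeating the last good frame with a decaying gain, which is exactly why long bursts turn into fading, robotic-sounding audio. A minimal Go sketch of that classic baseline; it is explicitly not AliPLC, the function name and decay parameter are hypothetical, and real codecs add pitch-based repetition and cross-fading that are omitted here.

```go
package main

// concealFrames fills lost frames (nil entries) by repeating the last
// good frame scaled by decay^k for the k-th consecutive loss, a simple
// classic waveform-substitution baseline. Losses before any good frame
// become silence. All frames are assumed to be frameLen samples long.
func concealFrames(frames [][]float32, frameLen int, decay float32) [][]float32 {
	out := make([][]float32, len(frames))
	var last []float32
	gain := float32(1)
	for i, f := range frames {
		if f != nil {
			out[i] = f
			last = f
			gain = 1 // reset attenuation after a good frame
			continue
		}
		gain *= decay
		fill := make([]float32, frameLen)
		if last != nil {
			for n := range fill {
				fill[n] = last[n] * gain
			}
		}
		out[i] = fill
	}
	return out
}
```

With decay 0.5, two consecutive losses after a frame of ones are filled with 0.5 and then 0.25, so a 40 ms gap survives but longer bursts fade toward silence.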

April 11, 2022 · 2 min · jiezi

On Algorithms: MicroNet: Low-Rank Factorized Convolution and a Powerful Activation Function That Crush MobileNet (Analysis of a 2020 Paper)

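The paper proposes MicroNet, a lightweight network for extremely low computational budgets. It rests on two core ideas, Micro-Factorized convolution and Dynamic Shift-Max.

- Micro-Factorized convolution uses low-rank approximation to factorize the original convolution into several small convolutions. This preserves the connectivity between input and output while reducing the number of connections.
- Dynamic Shift-Max dynamically fuses features across groups. This adds connections between nodes and increases non-linearity, compensating for the accuracy lost to reduced network depth.

Experimentally, MicroNet performs very strongly.

Source: Xiaofei's Algorithm Engineering Notes (WeChat official account)
Paper: MicroNet: Towards Image Recognition with Extremely Low FLOPs
Paper URL: https://arxiv.org/abs/2011.12289

Introduction

The paper targets a very resource-constrained setting: 1000-class image classification at 224x224 resolution under a 6M FLOPs budget. MobileNetV3 in its original form costs 112M MAdds; shrinking it to 12M MAdds drops top-1 accuracy from 71.7% to 49.8%. A 6M MAdds budget is therefore extremely demanding and calls for careful network design. The usual approach, simply reducing network width and depth, causes a severe drop in accuracy.

MicroNet therefore follows two design principles:
1) Avoid reducing network width by lowering the connectivity between feature nodes.
2) Compensate for reduced network depth by strengthening non-linearity.

MicroNet proposes one technique for each principle. Micro-Factorized Convolution uses low-rank approximation to reduce the number of input-output connections without breaking connectivity. Dynamic Shift-Max is a more powerful activation function. Experimentally, MicroNet reaches 53.0% accuracy with only 6M MAdds, higher than MobileNetV3 at 12M MAdds.

Micro-Factorized Convolution

Micro-Factorized Convolution makes MobileNet's depthwise separable convolution even lighter. It applies low-rank approximation to both the pointwise convolution and the depthwise convolution.

Micro-Factorized Pointwise Convolution

The paper factorizes the pointwise convolution into several sparse convolutions. As shown in the paper's figure, the input channels are first compressed, then shuffled, then expanded. In my view this part is essentially the same as ShuffleNet. The result keeps every output connected to the inputs while greatly reducing the number of input-output connections.

Assume the kernel $W$ has the same number of input and output channels, $C$. Micro-Factorized Convolution can then be written as:

$$W = P\Phi Q^T \quad (1)$$

The factors are:
- $W$ is a $C\times C$ matrix.
- $Q$ is a $C\times \frac{C}{R}$ matrix that compresses the input.
- $P$ is a $C\times \frac{C}{R}$ matrix that expands the output.
- Both $Q$ and $P$ are block-diagonal with $G$ blocks.
- $\Phi$ is a $\frac{C}{R}\times \frac{C}{R}$ permutation matrix, playing the same role as ShuffleNet's channel shuffle.

The factorized complexity is $\mathcal{O}=\frac{2C^2}{RG}$; the figure uses $C=18$, $R=2$, $G=3$. The number of groups $G$ is determined by the channel count $C$ and the reduction ratio $R$:

$$G = \sqrt{C/R} \quad (2)$$

Equation 2 follows from the relation between $C$ and $E$, the number of connections from each output channel to the input channels:
- Each output channel connects to $\frac{C}{RG}$ hidden channels.
- Each hidden channel connects to $\frac{C}{G}$ input channels.
- Hence $E=\frac{C^2}{RG^2}$.

Fixing the complexity $\mathcal{O}=\frac{2C^2}{RG}$ and the reduction ratio $R$ gives:

$$E = \frac{\mathcal{O}}{2G} = \frac{\mathcal{O}^2 R}{4C^2} \quad (3)$$

Figure 3 visualizes Equation 3: $E$ decreases as $G$ and $C$ increase. The two curves intersect at $G=\sqrt{C/R}$, where $E = C$: each output channel is connected to each input channel exactly once. The shuffle $\Phi$ is essential for this.

Mathematically, $W$ can be split into $G\times G$ rank-1 blocks. The factorization diagram at the start of this subsection shows that the $(i,j)$ block of $W$ is the product of the $j$-th column of $P$ and the $j$-th row of $Q^T$ (ignoring the blank, i.e. zero, entries).

Micro-Factorized Depthwise Convolution

The paper factorizes a $k\times k$ depthwise convolution into a $k\times 1$ convolution and a $1\times k$ convolution. The computation is analogous to Equation 1, with $\Phi$ equal to the scalar 1. As shown in the figure, this reduces the complexity from $\mathcal{O}(k^2C)$ to $\mathcal{O}(kC)$. ...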
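The pure-Python sketch below makes the connectivity argument concrete. It is our own illustration, not code from the paper, and the function names are hypothetical. It assumes the channel counts divide evenly by $G$ and $R$. The sketch does four things:
- builds 0/1 masks for the block-diagonal $P$ and $Q$;
- builds a ShuffleNet-style permutation $\Phi$;
- multiplies them as $W = P\Phi Q^T$;
- counts the paths between each input and output channel.

With the figure's setting $C=18$, $R=2$, $G=3$ (which satisfies $G=\sqrt{C/R}$), every entry of $W$ should be 1. Turning the shuffle off leaves some input-output pairs unconnected.

```python
from typing import List

Matrix = List[List[int]]


def block_diag_mask(rows: int, cols: int, groups: int) -> Matrix:
    """0/1 mask of a block-diagonal matrix with `groups` equal blocks."""
    rb, cb = rows // groups, cols // groups
    return [[1 if r // rb == c // cb else 0 for c in range(cols)] for r in range(rows)]


def shuffle_matrix(n: int, groups: int) -> Matrix:
    """Channel-shuffle permutation: view n channels as (groups, n/groups),
    transpose, flatten. Row i has a 1 in the column of the channel it takes.
    With groups=1 this is the identity (no shuffle)."""
    per = n // groups
    src = [(i % groups) * per + i // groups for i in range(n)]
    return [[1 if j == src[i] else 0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def micro_factorized_paths(C: int, R: int, G: int, shuffle: bool = True) -> Matrix:
    """W = P Phi Q^T on 0/1 masks; W[o][i] counts paths from input i to output o."""
    hidden = C // R
    P = block_diag_mask(C, hidden, G)               # expand: C x C/R
    Q = block_diag_mask(C, hidden, G)               # squeeze (applied as Q^T): C x C/R
    Phi = shuffle_matrix(hidden, G if shuffle else 1)
    return matmul(matmul(P, Phi), transpose(Q))
```

The nonzero count of the two masks is $2 \cdot 18 \cdot 3 = 108$, which matches $\mathcal{O}=\frac{2C^2}{RG}$ for these parameters.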

April 11, 2022 · 1 min · jiezi

About Algorithms: CS 4365 Artificial Intelligence Explained

CS 4365 Artificial Intelligence
Spring 2021
Assignment 3: Knowledge Representation & Reasoning
Part I: Due electronically by Wednesday, April 7, 11:59 p.m.
Part II: Due electronically by Wednesday, April 14, 11:59 p.m.

Part I: Programming (100 points)
In this problem you will be implementing a theorem prover for a clause logic using the resolution principle. Well-formed sentences in this logic are clauses. As mentioned in class, instead of using the implicative form, we will be using the disjunctive form, since this form is more suitable for automatic manipulation. The syntax of sentences in the clause logic is thus:

Clause → Literal ∨ . . . ∨ Literal
Literal → ¬Atom | Atom
Atom → True | False | P | Q | . . .

We will regard two clauses as identical if they have the same literals. For example, q ∨ ¬p ∨ q, q ∨ ¬p, and ¬p ∨ q are equivalent for our purposes. For this reason, we adopt a standardized representation of clauses, with duplicated literals always eliminated.

When modeling real domains, clauses are often written in the form:

Literal ∧ . . . ∧ Literal ⇒ Literal

In this case, we need to transform the clauses such that they conform to the syntax of the clause logic. This can always be done using the following simple rules: ...
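Here is a minimal Python sketch of the clause representation described above. It assumes literals are written as strings with `~` standing for ¬; the True/False atoms are not handled. Using `frozenset` gives the standardized form for free: duplicate literals disappear and literal order does not matter. `entails` is a plain saturation-based refutation loop. It adds the negated goal, then resolves every pair of clauses until either the empty clause appears or nothing new is derived. It is an illustration, not necessarily the search strategy the assignment expects, and all names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

Clause = FrozenSet[str]


def negate(lit: str) -> str:
    return lit[1:] if lit.startswith("~") else "~" + lit


def clause(text: str) -> Clause:
    """Parse space-separated disjuncts, e.g. 'q ~p q', into a duplicate-free clause."""
    return frozenset(text.split())


def is_tautology(c: Clause) -> bool:
    """A clause containing both p and ~p is always true and can be dropped."""
    return any(negate(lit) in c for lit in c)


def resolve(c1: Clause, c2: Clause) -> List[Clause]:
    """All non-tautological resolvents of c1 and c2."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            r = (c1 - {lit}) | (c2 - {negate(lit)})
            if not is_tautology(r):
                out.append(frozenset(r))
    return out


def entails(kb: Iterable[Clause], goal_lit: str) -> bool:
    """Refutation: KB plus the negated goal is unsatisfiable iff we derive the empty clause."""
    clauses: Set[Clause] = set(kb) | {frozenset({negate(goal_lit)})}
    while True:
        new: Set[Clause] = set()
        for a, b in combinations(clauses, 2):
            for r in resolve(a, b):
                if not r:  # empty clause: contradiction found
                    return True
                new.add(r)
        if new <= clauses:  # saturated without contradiction
            return False
        clauses |= new
```

For example, `entails([clause("~p q"), clause("p")], "q")` returns `True`. Saturation always terminates because there are only finitely many clauses over a fixed set of atoms.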

April 11, 2022 · 11 min · jiezi

About Algorithms: A Brief Look at Genetic Algorithms

Genetic Algorithm - Immutable
Time Estimate: 20 hours
Video

Introduction
You will design and implement a genetic algorithm without using any mutable variables or state.

The following are banned in your submissions:
● Variables (var)
○ The value at any memory address on the stack or heap cannot change throughout the execution of your program
○ You can use values (val) to store values
● Any way of directly simulating mutability that is against the spirit of this assignment (Ex. Importing a class that has a mutable state variable)

If your submission, including testing, violates this immutability restriction it will not be graded.

Description
Genetic algorithms provide probabilistic solutions to optimization problems. These algorithms can be thought of as an advanced “guess and check” technique that eventually arrives at an output that is close to the actual solution without having to know how to compute the solution directly.

Resources:
● https://en.wikipedia.org/wiki...
● https://www.tutorialspoint.co...

You will write a generic genetic algorithm of your own design that can find approximate solutions to optimization problems. This algorithm will be written in such a way that it can be reused for any application suitable for a genetic algorithm. For each problem, you will need to provide your genetic algorithm with a cost function to determine how well a potential solution performs and an incubator function that creates a potential solution from a list of doubles (referred to as genes).

Project Structure ...
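The assignment targets a language with `var`/`val` (Scala-style). The Python sketch below only imitates the constraint, by never rebinding a name or mutating a container:
- populations are tuples;
- loops are written as recursion;
- randomness comes from a pure linear congruential step that returns the next seed instead of hiding state.

The following choices are assumptions for illustration, not requirements from the spec: selection keeps the best half, crossover blends the two parents, mutation is uniform noise that shrinks each generation, and genes start in [-10, 10]. `cost` and `incubator` play the roles described above; all function names are hypothetical.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")
Genes = Tuple[float, ...]
Population = Tuple[Genes, ...]


def lcg(seed: int) -> Tuple[float, int]:
    """Pure pseudo-random step: returns (u in [0, 1), next seed); no hidden state."""
    nxt = (1103515245 * seed + 12345) % 2**31
    return nxt / 2**31, nxt


def random_genes(n: int, seed: int, lo: float, hi: float) -> Tuple[Genes, int]:
    if n == 0:
        return (), seed
    u, s1 = lcg(seed)
    rest, s2 = random_genes(n - 1, s1, lo, hi)
    return (lo + u * (hi - lo),) + rest, s2


def mutate(genes: Genes, seed: int, scale: float) -> Tuple[Genes, int]:
    """Add uniform noise in [-scale/2, scale/2) to every gene."""
    if not genes:
        return (), seed
    u, s1 = lcg(seed)
    rest, s2 = mutate(genes[1:], s1, scale)
    return (genes[0] + (u - 0.5) * scale,) + rest, s2


def crossover(a: Genes, b: Genes) -> Genes:
    """Blend crossover: each child gene is the mean of the parents' genes."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


def breed(parents: Population, k: int, seed: int, scale: float) -> Tuple[Population, int]:
    """Produce k children from randomly chosen parent pairs."""
    if k == 0:
        return (), seed
    u, s1 = lcg(seed)
    v, s2 = lcg(s1)
    pair = crossover(parents[int(u * len(parents))], parents[int(v * len(parents))])
    child, s3 = mutate(pair, s2, scale)
    rest, s4 = breed(parents, k - 1, s3, scale)
    return (child,) + rest, s4


def genetic_algorithm(cost: Callable[[T], float], incubator: Callable[[Genes], T],
                      n_genes: int, pop_size: int = 40, generations: int = 100,
                      seed: int = 42) -> T:
    def make_pop(k: int, s: int) -> Tuple[Population, int]:
        if k == 0:
            return (), s
        g, s1 = random_genes(n_genes, s, -10.0, 10.0)
        rest, s2 = make_pop(k - 1, s1)
        return (g,) + rest, s2

    def evolve(pop: Population, gen: int, s: int, scale: float) -> T:
        ranked = tuple(sorted(pop, key=lambda g: cost(incubator(g))))
        if gen == 0:
            return incubator(ranked[0])
        parents = ranked[: pop_size // 2]  # elitism: the best half survives unchanged
        children, s1 = breed(parents, pop_size - len(parents), s, scale)
        return evolve(parents + children, gen - 1, s1, scale * 0.95)

    pop, s = make_pop(pop_size, seed)
    return evolve(pop, generations, s, 1.0)
```

For example, `genetic_algorithm(lambda x: (x - 3.0) ** 2, lambda g: g[0], n_genes=1)` should return a value close to 3. Because no state is hidden, the same seed always gives the same answer.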

April 9, 2022 · 8 min · jiezi

About Algorithms: INT104 Artificial Intelligence Solutions

INT104: Artificial Intelligence, Spring 2021
Lab 4: Linear Algebra and Probability

Disclaimer: 1. Lab report deadlines are strict. University late submission policy will be applied. Collusion and plagiarism are absolutely forbidden (University policy will be applied). Report is due 14 days from the date of running this lab.

4.1 Objectives
• Solve general problems using linear algebra and probability knowledge.

4.2 Problem Statement
Given a two-dimensional array, where each row represents an instance (or object). For each row, the first 5 columns are the attributes of the instance and the final column is the label of the instance, such as
a0, a1, a2, a3, a4, l
As you've seen, all attributes can take two values, 0 or 1.
Now you're required to compute the following estimated probabilities: p(l = 0), p(l = 1); p(ai = 0|l = 0) and p(ai = 1|l = 0), i = 0, 1, 2, 3, 4; p(ai = 0|l = 1) and p(ai = 1|l = 1), i = 0, 1, 2, 3, 4.

4.3 Lab Report
• Write a short report which should contain a concise explanation of your implementation, results and observations (see the coursework template).
• Please insert the clipped running image into your report for each step with the mark.
• Submit the report and the python source code electronically into ICE.
• The report in pdf format and python source code of your implementation should be zipped into a single file. The naming of report is as follows:
e.g. StudentID LastName FirstName LabNumber.zip (123456789 Einstein Albert 1.zip)

Hints: 1) use the fraction of the given events in all instances to estimate the probabilities (N is the total number of the instances and # is the size of the set).
p(l = 0) = #{l = 0} / N (4.1)
p(ai = 0|l = 0) = #{ai = 0, l = 0} ...
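Following Hint 1, here is a short Python sketch that estimates the probabilities by counting. It assumes `rows` is a list of lists whose last element is the label. The conditional estimates use the usual frequency form p(ai = x | l = v) = #{ai = x, l = v} / #{l = v}. That denominator is our reading of conditional probability, since the hint above is cut off. The function name is hypothetical.

```python
from typing import Dict, Sequence, Tuple


def estimate_probabilities(rows: Sequence[Sequence[int]]
                           ) -> Tuple[Dict[int, float], Dict[Tuple[int, int, int], float]]:
    """rows: [a0, a1, a2, a3, a4, l] with 0/1 values.
    Returns p(l = v) keyed by v, and p(a_i = x | l = v) keyed by (i, x, v)."""
    n = len(rows)
    n_attrs = len(rows[0]) - 1
    prior = {v: sum(1 for r in rows if r[-1] == v) / n for v in (0, 1)}
    cond: Dict[Tuple[int, int, int], float] = {}
    for v in (0, 1):
        subset = [r for r in rows if r[-1] == v]  # instances with label v
        for i in range(n_attrs):
            for x in (0, 1):
                hits = sum(1 for r in subset if r[i] == x)
                cond[(i, x, v)] = hits / len(subset) if subset else float("nan")
    return prior, cond
```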

April 8, 2022 · 3 min · jiezi

About Algorithms: Data Processing (COMP20008)

COMP20008 Elements of Data Processing
Assignment 1
March 3, 2021

Due date
The assignment is worth 20 marks (20% of subject grade) and is due 8:00am Thursday 1st April 2021, Australia/Melbourne time.

Background
Learning outcomes
The learning objectives of this assignment are to:
• Gain practical experience in written communication skills for documenting data science projects.
• Practice a selection of processing and exploratory analysis techniques through visualisation.
• Practice text processing techniques using Python.
• Practice widely used Python libraries and gain experience in consultation of additional documentation from Web resources.

Your tasks
There are three parts in this assignment, Part A, Part B, and Part C. Part A and Part B are worth 9 marks each and Part C is worth 2 marks.

Getting started
Before starting the assignment you must do the following:
• Create a github account at https://www.github.com if you don't already have one.
• Visit https://classroom.github.com/... and accept the assignment. This will create your personal assignment repository on github.
• Clone your assignment repository to your local machine. The repository contains important files that you will need in order to complete the assignment.

Part A (Total 9 marks)
For Part A, download the complete “Our World in Data COVID-19 dataset” (“owid-covid-data”) from https://covid.ourworldindata....

Part A Task 1 Data pre-processing (3 marks)
Program in python to produce a dataframe by ...

April 8, 2022 · 9 min · jiezi

About Algorithms: ECE 510 Computing Solutions

ECE 510: Foundations of Computer Engineering
Project 3
MIPS Simulator

This assignment will give you experience in programming in C++ and the operation of a MIPS pipelined processor. Further, you will gain insight into how multiple events that occur in parallel can be simulated using a sequential machine.

Problem Statement
This assignment requires a simple 5-stage pipelined machine to be simulated. The simulator should be capable of implementing the MIPS architecture on a cycle-by-cycle basis. The simulator must be cycle accurate with respect to contents of the registers, but need not be faithful to other hardware details such as control signals. The output of the simulator, in addition to the register contents and latch values, should include the utilization factor of each functional unit and the total time in cycles to execute a set of instructions. Implement the simulator according to the specifications described below in C++. Submit the code, simulation results and a project description write-up.

1.1 Instructions to be implemented
The simulator should implement the following instructions: add, sub, addi, mul, lw, sw, beq, lui, and, andi, or, ori, sll, srl, slti, and sltiu. Note that these instructions operate on integers only. The MIPS instruction format can be used for all instructions except mul.

Assume the syntax for mul is mul $a,$b,$c, meaning that we multiply the contents of $b and $c. The least significant 32 bits of the result are placed in register $a, and the most significant 32 bits of the result are stored in register $(a+1). For example, mul $t0, $t8, $t9 will store the lower 32 bits of the product $t8 * $t9 in register $t0 and the upper 32 bits of the product in register $t1 (Hint: see the MIPS green sheet instruction summary for register numbering). This is different from the mult instruction in MIPS.
Assume the opcode and function code for mul are the same as those of mult.

1.2 Inputs to the simulator
1) MIPS machine code as a text file. Convert the assembly-level instructions to machine level by using https://www.eg.bucknell.edu/~... or http://www.kurtm.net/mipsasm/
2) A query to the user to select between instruction or cycle mode
• Instruction mode: to observe execution of the program instruction by instruction
• Cycle mode: to observe execution of the program cycle by cycle
3) A query to the user to select the number of instructions or cycles (depending on the choice made in the previous query) to be executed.
4) After executing the number of instructions or cycles entered initially by the user, a third query to the user to choose to continue execution or not.
• If yes, repeat from step 3
• If no, exit the execution and display the results

1.3 Memory, Registers and PC
The memory is one word wide and 2K bytes in size. There are physically separate instruction and data memories. Data memory is initialized to 0 at the beginning of each simulation run. There is no cache in this machine.
There are 32 registers; register 0 is hardwired to 0. In addition, there is a Program Counter (PC). The PC should start execution by fetching the instruction stored in the location to which it is initialized.

1.4 CPU
The pipelined MIPS processor has 5 stages: IF, ID, EX, MEM, WB. There are pipeline registers between the stages: IF/ID, ID/EX, EX/MEM, MEM/WB. Assume the pipeline registers contain the following latches:
• IF/ID: IR, NPC
• ID/EX: IR, NPC, A, B, Imm
• EX/MEM: IR, B, ALUOutput, cond
• MEM/WB: IR, ALUOutput, LMD

1.5 Output of the simulator
In addition to displaying the register contents and latch values after the execution of each cycle/instruction, it should output the following statistics:
• Utilization of each stage. Utilization is the fraction of cycles for which the stage is doing useful work. Just waiting for a structural, control, or data hazard to clear in front of it does not constitute useful work.
• Total time (in CPU cycles) taken to execute the MIPS program on the simulated machine. (This is NOT the time taken to execute the simulation; it is the time taken by the machine being simulated.)

1.6 Dealing with branches
The processor does not implement branch prediction. When the ID stage detects a branch, it asks the IF stage to stop fetching and flushes the IF/ID latch (inserts a NOP). When the EX stage resolves the branch, IF is allowed to resume instruction fetch depending on the branch outcome.

1.7 Other remarks
• No interrupts.
• Does not support out-of-order execution.
• Does not support data forwarding.
• Assume register writes are completed in the first half of the clock cycle and register reads are carried out in the second half.
• All data, structural and control hazards must be taken into account.
• Branches are resolved in the EX stage.

Way to approach (Suggestion)
Start by figuring out how to implement the pipeline registers (use of a class is recommended), the stages of the pipeline, instruction and data memory, the 32 registers and the PC. Think of how the execution of the 5 stages of the pipeline (which is a parallel operation) could be done in C++ (where instructions are executed sequentially). Figure out a way to account for data, structural and control hazards. Finally, think of how the utilization and total time to execute the program can be measured.
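Two small pieces of the specification can be checked in isolation before writing the C++ simulator. The Python sketch below is an illustration under stated assumptions, not a simulator. It shows:
1) the custom `mul` semantics, assuming the product is signed like `mult`;
2) how the "write in the first half, read in the second half" rule and the lack of forwarding determine the cycle count for straight-line code.

It assumes every stage takes one cycle and ignores branches and structural stalls. `Instr`, `mul_split` and `total_cycles` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x: int) -> int:
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul_split(b: int, c: int) -> Tuple[int, int]:
    """(low 32 bits, high 32 bits) of the signed 64-bit product, i.e. the values
    that `mul $a,$b,$c` writes to $a and $(a+1) (signedness assumed, as in mult)."""
    product = (to_signed32(b) * to_signed32(c)) & MASK64
    return product & MASK32, product >> 32


class Instr(NamedTuple):
    dests: Tuple[int, ...]  # registers written (mul writes a and a+1)
    srcs: Tuple[int, ...]   # registers read


def total_cycles(program: List[Instr]) -> int:
    """Cycles for straight-line code on IF-ID-EX-MEM-WB without forwarding.
    Writes happen in the first half of WB and reads in the second half of ID,
    so a dependent instruction may be in ID during its producer's WB cycle."""
    wb_cycle_of: Dict[int, int] = {}
    prev_id, last_wb = 1, 0  # first instruction: IF in cycle 1, ID in cycle 2
    for ins in program:
        id_cycle = prev_id + 1  # in-order: at most one instruction enters ID per cycle
        for r in ins.srcs:
            if r != 0 and r in wb_cycle_of:
                id_cycle = max(id_cycle, wb_cycle_of[r])  # RAW stall
        wb = id_cycle + 3  # EX, MEM, WB follow ID
        for r in ins.dests:
            if r != 0:  # $0 is hardwired to zero, so writes to it create no dependency
                wb_cycle_of[r] = wb
        prev_id, last_wb = id_cycle, wb
    return last_wb
```

For example, two back-to-back dependent instructions finish in 8 cycles instead of 6, because the consumer waits in ID until the producer's WB cycle.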

April 8, 2022 · 4 min · jiezi

About Algorithms: [Hengyuan Cloud Gpushare] Still Don't Know How to Shut Down? Tips Roundup 6

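Source | Hengyuan Cloud Community. Original article | [Tips: Shutdown]

1. How do I shut down automatically after training?
Running the shutdown command in the instance terminal shuts the instance down. Call this command at the end of your training code to shut down once training finishes:

import os
os.system('shutdown')

2. Shutdown fails with a message that the disk is full. What should I do?
You can check the disk usage of the instance's root directory with the commands below.
If the disk is full, free some space by deleting files, or move files to /hy-nas (shared-storage instance types only) or /hy-tmp (cleared once a pay-as-you-go instance has been shut down for 24 hours).
An instance cannot start normally while its disk is full, so you must free some space before shutting down.
In the instance terminal, the following commands help you find the files that take up space:

# Check disk usage of the instance's root directory
df -h | grep "/$" | awk '{print $5" "$3"/"$2}'
# Show the size of each directory under /root and /home
du -h --max-depth=1 /root /home
# Show the size of each directory under the current directory
du -h --max-depth=1 .
# Show the size of each file in the current directory (ls -lh works even where the ll alias is undefined)
ls -lh | grep ^- | awk '{print $5"\t"$9}'

Alternatively, open the instance list in the console and click the Manage button under System Disk for the instance. The panel that opens lets you delete files inside the instance.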
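The Python sketch below combines the two tips. It runs a training function and only then calls `shutdown`, refusing if the root disk is nearly full. `shutil.disk_usage` is from the standard library. The 95% threshold, the `dry_run` switch and the function names are assumptions for illustration, not part of the platform's tooling.

```python
import os
import shutil
from typing import Callable


def root_usage_percent(path: str = "/") -> float:
    """Used space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def shutdown_after_training(train: Callable[[], None], max_usage: float = 95.0,
                            dry_run: bool = True) -> bool:
    """Run training, then shut down unless the root disk is too full.
    Returns True if shutdown was (or, with dry_run, would have been) issued."""
    train()
    pct = root_usage_percent("/")
    if pct >= max_usage:
        print(f"Root disk is {pct:.1f}% full; free some space before shutting down.")
        return False
    if not dry_run:
        os.system("shutdown")  # same command as in the tip above
    return True
```

Set `dry_run=False` only on the cloud instance itself.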

April 7, 2022 · 1 min · jiezi

About Algorithms: The Guzhidui Digital Collectibles Model | DApp Development

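As a non-fungible token, an asset that can circulate online, the NFT is the exact counterpart of Bitcoin within the blockchain world. Bitcoin is interchangeable; NFTs are more like cultural relics: each one is different, and only one original exists in the world. Perhaps precisely because of this not-quite-exact analogy, NFTs and Bitcoin set out from their initial use cases of artworks and tokens respectively. In essence, though, an NFT acts as a notary office or a certificate of appraisal. It can confirm the composition of jade and gold objects or the authenticity of calligraphy, paintings and antiques, and it may even represent a dynamic personal status. Digital collectibles free art collecting from physical boundaries and extend it into the digital world, letting more people freely appreciate and own genuine digital works.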

April 6, 2022 · 1 min · jiezi

About Algorithms: The Jingtan Digital Collectibles Platform Model | DApp Development

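In recent years, with the continued development of new economic models, new business formats and Internet technology, Internet e-commerce platforms have grown rapidly. Spurred by the NFT boom, all kinds of digital collectibles platforms have sprung up in China. Mainstream platforms such as Huanhe and Jingtan mainly use fixed-price, limited-edition sales. Compared with OpenSea, the world's largest NFT platform, their business model is not distinctive. Alibaba Auction has entered the market as an intermediary trading platform, opening up a new business model for digital collectibles. Leveraging its brand and traffic advantages, it may build a large digital collectibles business and inject vitality into the market.

Software features:
1. Verified brands plus creator-run brands: verified creators receive a verification badge that vouches for the authenticity and quality of their works, and the power to verify will gradually be handed over to the community.
2. Multiple issuance methods: fixed-price issuance, blind-box issuance, several auction formats and other mechanisms let every kind of digital collectible find a reasonable price.
3. A complete ecosystem: partnering with high-quality IP ensures high-quality works, while exchange channels for collectors ultimately build a complete three-way ecosystem of creators, consumers and brokers that ensures liquidity.
4. Ecosystem empowerment: expanding the use cases of digital collectibles and continually adding value to them, ultimately forming a multi-domain ecosystem spanning music, sports, anime, art, virtual land, domain names and more.

Digital collectibles free art collecting from physical boundaries and extend it into the digital world, letting more people freely appreciate and own genuine digital works. As a digital collector, you can not only view your collectibles and enjoy collecting, but also share your insights and joy with friends. Each digital collectible maps to a unique serial number on a specific blockchain. It cannot be tampered with, divided or substituted, and it records tamper-proof on-chain rights.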

April 6, 2022 · 1 min · jiezi

About Algorithms: The Huanhe Digital Collectibles Platform Model | DApp Development

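Spurred by the NFT boom, all kinds of digital collectibles platforms have sprung up in China. Mainstream platforms such as Huanhe and Jingtan mainly use fixed-price, limited-edition sales. Compared with OpenSea, the world's largest NFT platform, their business model is not distinctive. Alibaba Auction has entered the market as an intermediary trading platform, opening up a new business model for digital collectibles. Leveraging its brand and traffic advantages, it may build a large digital collectibles business and inject vitality into the market.

Features of the digital collectibles app:
1. Each digital collectible maps to a unique serial number on a specific blockchain. It cannot be tampered with, divided or substituted, and it records tamper-proof on-chain rights.
2. Digital collectibles free art collecting from physical boundaries and extend it into the digital world, letting more people freely appreciate and own genuine digital works.
3. As a digital collector, you can not only view your collectibles and enjoy collecting, but also share your insights and joy with friends.

Features of the digital collectibles trading platform:
1. You can enlarge a digital collection to 1:1 size to take photos, and trade freely.
2. Many digital collectibles can be traded freely, and many series of digital collectibles are on display.
3. Buying and selling: on the marketplace you can buy from and sell to other collectors.

Why are digital collectibles so popular? According to experts, they have advantages that physical objects lack. Online collections are not limited by space, and there is no need to worry about damage or loss over time. At the same time, blockchain technology anchors the uniqueness of such products, giving them a novel and distinctive way of carrying value.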

April 6, 2022 · 1 min · jiezi

About Algorithms: NFT Blockchain Game Model Development | DApp Setup

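After Facebook renamed itself "Meta", digital collectibles rode the metaverse trend and became popular among the big tech groups. Fashion digital collectibles are currently one of the most widely used verticals in the metaverse, while the categories best suited to carrying the metaverse today are social media and games.

Blockchain games (blockchain plus games, also called "GameFi") present decentralized finance products in the form of games. In-game characters and items are turned into NFTs, which are indivisible and non-fungible. Put simply, blockchain technology is applied to games, and its highly decentralized nature and rules let players own their in-game assets privately, securely and transparently. All items, resources and even characters a player owns can be monetized and freely traded between players for profit.

What advantages do blockchain games have over traditional games? Traditional games are mainly controlled by third-party developers, with low data transparency; they are purely entertainment and bring no real benefit. Blockchain games use blockchain technology, so player game data is stored on-chain and not controlled by a third-party platform. The data cannot be arbitrarily altered, and transactions are public and transparent. In addition, these games can bring real returns to players and investors: players can earn token rewards and sell them on digital currency markets, and items can also be sold on the market.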

April 6, 2022 · 1 min · jiezi

About Algorithms: Big-Tech Jiuzhang Algorithm Course, 2021 Edition

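Big-Tech Jiuzhang Algorithm Course, 2021 edition: ultra-HD original, complete and unencrypted, including all videos, courseware and source code.

Getting Started with React Hooks and Several Common Hook Functions

Preface
React Hooks is a new mechanism that the React team introduced two years ago in version 16.8. React is the most mainstream front-end framework and has a very stable API, so this release shook many front-end experts who dread new wheels; after all, every update means another costly round of learning. Is it actually any good?

The answer is yes; for React developers it is simply one more option. The traditional approach is based on class components, while Hooks are based on function components. The two approaches can therefore coexist, and new code can use Hooks where it makes sense. This article introduces the advantages of Hooks and several commonly used hook functions.

Advantages of Hooks
1. Shortcomings of class components
More code: class components require somewhat more code than function components; this is the most immediate impression.
The this binding: in class components you always have to think about what this points to, whereas function components let you ignore it.
Increasingly hard to maintain: newer versions of React added more lifecycle methods. Because these methods are decoupled from one another, related logic easily ends up scattered, key logic gets missed or redundant logic gets added, and debugging later becomes hard. Hooks, in contrast, keep related logic together, so it feels less fragmented and is easier to debug.
State logic is hard to reuse: reusing stateful logic between components is difficult and may require render props or higher-order components (HOCs). Either way, the original component gets wrapped in an extra parent container (usually a div element), producing redundant nesting.

Benefits of Hooks
Logic reuse
Reusing stateful logic between components used to require complex design patterns such as higher-order components. These produce redundant component nodes and make debugging harder. The demo below compares the two approaches.

Class
In the class-component version, we define a higher-order component that listens for window size changes and passes the new value as a prop to the wrapped component.

const useWindowSize = Component => {
  // Create a higher-order component (HOC) containing only the window-size listening logic
  class HOC extends React.PureComponent {
    constructor(props) {
      super(props);
      this.state = { size: this.getSize() };
    }
    componentDidMount() {
      window.addEventListener("resize", this.handleResize);
    }
    componentWillUnmount() {
      window.removeEventListener("resize", this.handleResize);
    }
    getSize() {
      return window.innerWidth > 1000 ? "large" : "small";
    }
    handleResize = () => {
      const currentSize = this.getSize();
      this.setState({ size: currentSize });
    };
    render() {
      // Pass the window size to the component that holds the real business logic
      return <Component size={this.state.size} />;
    }
  }
  return HOC;
};

A custom component can then call a function such as useWindowSize to produce a new component that automatically receives a size prop, for example: ...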

April 6, 2022 · 3 min · jiezi

About Algorithms: NFT Blockchain Game Model | DApp Development

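What is a blockchain game? Blockchain games (blockchain plus games, also called "GameFi") present decentralized finance products in the form of games. In-game characters and items are turned into NFTs, which are indivisible and non-fungible. Put simply, blockchain technology is applied to games, and its highly decentralized nature and rules let players own their in-game assets privately, securely and transparently. All items, resources and even characters a player owns can be monetized and freely traded between players for profit.

What advantages do blockchain games have over traditional games? Traditional games are mainly controlled by third-party developers, with low data transparency; they are purely entertainment and bring no real benefit. Blockchain games use blockchain technology, so player game data is stored on-chain and not controlled by a third-party platform. The data cannot be arbitrarily altered, and transactions are public and transparent. In addition, these games can bring real returns to players and investors: players can earn token rewards and sell them on digital currency markets, and items can also be sold on the market.

Boosted by the metaverse and NFT concepts, blockchain games keep heating up and growing in popularity. Because they are built on a blockchain, they inherit its characteristics: they can not only expand the market but also bring real returns.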

April 2, 2022 · 1 min · jiezi

About Algorithms: Metric Algorithm Scenarios in AIOps Intelligent Operations (Video and Slides Included)

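This article is transcribed from "The AIOps Metric Algorithm System", a talk given by Yan Chuan, a postdoctoral researcher at Peking University, at a meetup hosted by the CloudWise AIOps community.

Overview of intelligent operations algorithm scenarios

Traditional operations vs. intelligent operations
Traditional operations
Characteristics: slow handling and high staffing needs; with massive monitoring data, traditional operations are inefficient.
• Long time to detect failures
• Long time to locate failures
• Long time to repair failures
Intelligent operations
Characteristics: fast handling and low staffing needs; with massive monitoring data, intelligent operations are efficient.
• Fast failure detection
• Short time to locate failures
• Short time to repair failures

Systematic analysis of operations scenarios

Systematic analysis of intelligent operations scenarios
Intelligent operations = operations scenarios + intelligent technology. It is built around AI enablement of four elements (metrics, logs, traces and alerts) and the transformations between them.

Failure detection vs. metric algorithm scenarios

Metric anomaly detection scenarios
Why metric anomaly detection matters in intelligent operations
In operations, metric anomaly detection is the foundation for building other intelligent operations scenarios. Its results are an important input to later scenarios such as alert compression, failure localization and self-healing.
• Most existing monitoring and alerting systems rely on manually set rules or thresholds.
• Medium and large business systems face more KPIs, more complex relationships between KPIs, and more diverse KPI patterns.
• In operations, manually setting rules or thresholds is time-consuming and prone to false positives and false negatives.

Application scenarios of metric anomaly detection
Business metrics
• API call volume
• Number of users
• Response time
Infrastructure monitoring metrics
• Host system metrics: CPU utilization, memory utilization, IO utilization, temperature, voltage
• Database metrics: number of slow SQL queries, connection response time, buffer hit ratio, tablespace usage
• Middleware metrics: number of sockets, server response time, thread pool usage
• Storage device metrics: disk usage, controller information, fan information
Operational metrics
• Bank batch-processing jobs
• Transit ride-code (QR) scanning services

Difficulties of deploying time-series anomaly detection in operations
Massive numbers of monitoring metrics
Enterprises have many devices and systems, and the operations system must monitor a huge number of metrics to keep services stable.
KPI diversity
Supervised anomaly detection is too costly for massive numbers of metrics, so for now detection can only be unsupervised, which makes accurate detection harder.
(Figure: weather-related data)
Many anomaly types
Anomaly types include point anomalies, contextual anomalies, collective anomalies, missing values and more.
(Figure: business data from a bank system)
Inconsistent anomaly criteria
Different metrics, resource configurations and operations staff may judge the same pattern differently and have different ground truths, so algorithms must adapt to different sensitivity requirements.
Holiday and campaign management
During holidays and marketing campaigns, monitored metrics often show patterns that differ from normal, which makes anomaly detection harder.

Challenges of metric anomaly detection
• A single algorithm can hardly fit many data types
• Data anomalies vs. business anomalies vs. parameter tuning

Common algorithms for single-metric anomaly detection
• Simple statistical methods
• Time-series decomposition methods

Combining metric classification with metric anomaly detection addresses the problem that a single algorithm cannot fit many data types.

Metric classification: common data types

The importance of periodicity testing in metric classification
Periodic data makes up a small share of all data (25%), but is usually of high value. Detecting the following is crucial for simplifying time-series anomaly detection:
1. whether the data is periodic
2. how many periods it has
3. what each periodic component looks like

Single-metric anomaly detection on real data
(Figures: memory utilization data; transaction volume data)

Metric forecasting scenarios
Why time-series forecasting matters in operations
Operations: forecasting is the foundation for other intelligent operations scenarios (capacity planning, anomaly detection, alert compression, failure localization, self-healing and so on).
Business operations and network security: focus on growth and demand ...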
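The Python sketch below is a rough illustration of two items listed above: the periodicity check, and the "time-series decomposition" family of single-metric detectors. It proceeds in three steps:
1. pick the lag with the highest autocorrelation as the period;
2. subtract a per-phase mean profile;
3. flag points whose residual is more than k standard deviations from the mean.

The 0.5 autocorrelation threshold and k = 3 are assumed defaults. This is a toy, not the algorithm presented in the talk, and the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def autocorr(xs: Sequence[float], lag: int) -> float:
    """Biased sample autocorrelation at the given lag (decays with lag, so the
    fundamental period beats its multiples)."""
    m = mean(xs)
    den = sum((x - m) ** 2 for x in xs)
    if den == 0:
        return 0.0
    num = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(len(xs) - lag))
    return num / den


def detect_period(xs: Sequence[float], min_lag: int = 2,
                  threshold: float = 0.5) -> Optional[int]:
    """Lag with the highest autocorrelation, or None if the series looks aperiodic."""
    lags = range(min_lag, len(xs) // 2 + 1)
    best = max(lags, key=lambda lag: autocorr(xs, lag))
    return best if autocorr(xs, best) >= threshold else None


def seasonal_anomalies(xs: Sequence[float], period: int, k: float = 3.0) -> List[int]:
    """Subtract the per-phase mean (a crude seasonal profile) and flag k-sigma residuals."""
    profile = [mean(xs[p::period]) for p in range(period)]
    resid = [x - profile[i % period] for i, x in enumerate(xs)]
    mu, sd = mean(resid), pstdev(resid)
    return [i for i, r in enumerate(resid) if sd > 0 and abs(r - mu) > k * sd]
```

On a sine wave with a 24-sample period and one injected spike, the sketch should recover the period and flag only the spike.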

April 2, 2022 · 1 min · jiezi

About Algorithms: CSSE2310

The University of Queensland
School of Information Technology and Electrical Engineering
CSSE2310/CSSE7231, Semester 1, 2021
Assignment 2 (v1.0)
Marks: 50
Weighting: 10%
Due: 3:59pm 1 April, 2021

Introduction
The goal of this assignment is to ensure you have gained familiarity and skills with both the C programming language and using a debugger (such as gdb(1)) to examine various characteristics of running programs. These will be essential skills in later assignments for this course. For this assignment you will be given an executable program (the “bomb”) which you have to “defuse” by entering the correct defusing phrases into the program for each of the 10 bomb phases.

Student conduct
This is an individual assignment. You should work on defusing your own bomb by yourself. You should feel free to discuss aspects of C programming and the use of debuggers with your fellow students, but you shouldn't actively help (or seek help from) anyone with the defusing of particular phases. Do not share your approaches to solving the bomb phases - even after the assignment deadline (as extensions may be given to other students). You should note that each student will receive a different bomb and the strings that defuse your bomb will be different to the strings that defuse another student's bomb.
In short - don't risk it! If you're having trouble, seek help early from a member of the teaching staff. Don't be tempted to cheat. You should read and understand the statements on student misconduct in the course profile and on the school website: https://www.itee.uq.edu.au/it...

Obtaining the “Bomb”
Whilst logged in to moss.labs.eait.uq.edu.au, you should type the following command:
getbomb
This will create a subdirectory within your current directory named csse2310a2 and place the bomb files into that directory. The files will include an executable called bomb and a number of source files (.h and .c files). Your bomb (executable and source) will be different to the bombs for all other students.
You will not receive all of the source files – just some of them. There is enough information contained within the bomb executable and the supplied source files in order for you to successfully defuse all phases (although some of them are more difficult than others). You should note that some of the bomb's modules have been compiled with debugging support (-g flag to gcc) and some haven't.

Running the “Bomb”
The bomb program will only run on moss.labs.eait.uq.edu.au and you are the only user who can run your bomb program. Any attempt to run the program on another host or to run another user's bomb will cause the bomb to exit immediately. Whilst in your csse2310a2 directory, you can execute the bomb by typing
./bomb
You may not want to do this until you are ready to try defusing the bomb. When you start the bomb program, it will print out details of any phases you have already defused and it will print your current mark (out of 50) and the maximum mark you can obtain based on your attempts to date.
The bomb will then prompt you to enter the number of the phase to defuse next, followed by the string that you believe defuses that phase (or a test string). You will be prompted for confirmation before that string is tested. If you confirm your attempt and the string is incorrect then the bomb will “explode” and exit. If the string is correct, then that phase is defused and you will not be able to solve it again. You will lose marks for every time the bomb “explodes”.
You should note that the bomb is booby trapped. You are warned against modifying the internal data structures of the bomb – you never know what might happen and any loss of marks you incur will not be reversed.

Hints
There are two demo phases that do not count for marks. You may attempt these as many times as you like by entering either “demo1” or “demo2” when prompted for a phase to defuse.
There is no mark penalty if either of these demo phases “explodes.”
You should carefully read the supplied source code and be familiar with the use of gdb before attempting to run the bomb. It is suggested you run the bomb from within a debugger rather than standalone. Note that you may get a warning message about “Missing separate debuginfos...” – you can safely ignore this message.
All phases have associated code and some debugging information and you will need to use a debugger to set breakpoints, examine various variables etc in order to determine the defusing strings. You may need to learn about and use a number of features of gdb including watchpoints, automatic display, conditional breakpoints, and/or breakpoint command lists to solve the phases more efficiently.
You should note that the code that determines each defusing string is not executed until AFTER the defusing text is read from the user so you may need to enter some arbitrary text, debug the code to determine the defusing string, quit the program and then run it again to enter the defusing string for that phase.
The bomb is deterministic – the same sequence of inputs will result in the same operation each time, so the defusing string for each phase will not vary over time. However, many of the functions within the bomb are not deterministic – they may return something different each time they are called.

Submission
Every time you run the bomb, a record is kept of your interactions with it and your success/failure at defusing each phase. Your submission time for the assignment will be considered to be the time of your last attempt to defuse any phase of the bomb. You must make at least one attempt in order to be considered to have made a submission. An attempt means either that the bomb explodes or a phase is defused.
Late penalties will apply as described in the CSSE2310/CSSE7231 course profile.
Any attempt to defuse the bomb after the deadline will result in a late penalty being applied to your whole assignment mark.

Marks
There are 10 phases, each worth 5 marks. The mark you achieve for each phase is determined by the number of attempts taken before you successfully defuse that phase. If you do not defuse a phase you will receive zero marks for that phase. If you defuse a phase on the first attempt, you will receive 5 marks for that phase. If it takes you longer than one attempt, your mark for that phase will be
5 × 0.8^(number of attempts − 1)
i.e. if it takes you 2 attempts, your mark for that phase will be 4 out of 5, 3 attempts gives you 3.2 out of 5, 4 attempts gives you 2.56 out of 5, etc. There is no limit on the number of attempts you can make at any phase before succeeding. You should note that although each phase is worth the same number of marks, they are not of equal difficulty. All marks are subject to an audit of our logs to ensure that you have correctly entered the defusing strings and haven't tampered with the bomb to defuse it in some other way. Tampering with the bomb to make it appear as though you have defused a phase when you have not correctly defused it will result in zero marks for that phase. ...
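Here is a quick Python check of the marking formula above. `None` stands for a phase that was never defused, and the function names are hypothetical.

```python
from typing import Iterable, Optional


def phase_mark(attempts: Optional[int], full: float = 5.0, decay: float = 0.8) -> float:
    """5 x 0.8^(attempts - 1) for a defused phase; 0 if it was never defused (None)."""
    if attempts is None:
        return 0.0
    if attempts < 1:
        raise ValueError("a defused phase needs at least one attempt")
    return full * decay ** (attempts - 1)


def total_mark(per_phase_attempts: Iterable[Optional[int]]) -> float:
    """Sum over all phases (up to 10 phases, 50 marks)."""
    return sum(phase_mark(a) for a in per_phase_attempts)
```

For example, `total_mark([1, 2, None])` gives 5 + 4 + 0 = 9.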

April 2, 2022 · 6 min · jiezi

About Algorithms: EC6001 Time Series

SMU MSE EC6001 Time Series Econometrics
Term 2 AY21-22
Individual Project

Instructions
Select a time series of interest to you (your “forecast series”), and any number of potential predictors. The only restriction to your choice is that you have at least 40 observations in each of your series. Justify your choice of predictors on theoretical grounds, either citing and briefly describing a formal theory, or using informal arguments. Provide all relevant background and ...

April 2, 2022 · 3 min · jiezi

About Algorithms: ADS2 Algorithms

ADS2 2021 Assessed Exercise 2
Algorithms and Data Structures (ADS2)
Assessed Exercise 2
This exercise is for submission using Moodle and counts for 10% of the total assessment mark for this course.
This exercise is worth a total of 30 points.
The deadline for submission is Friday 26 March 2021 at 4:30pm.

Exercise
This exercise has two parts. The first involves implementing in Java the Dynamic Set abstract data type using two different data structures. The second involves running an empirical study to compare the performance of each implementation.

Submission
Submit the Java sources of your implementations and a short (max 3 pages) report describing what you have done in each part of the exercise. Your report should include a heading stating your full name and matriculation number and clear instructions on how to run your code. Please make sure the report is in pdf format and your sources are not password protected.

Part 1
The Dynamic Set is an abstract data type (ADT) that can store distinct elements, without any particular order. There are five main operations in the ADT:
• ADD(S,x): add element x to S, if it is not present already
• REMOVE(S,x): remove element x from S, if it is present
• IS-ELEMENT(S,x): check whether element x is in set S
• SET-EMPTY(S): check whether set S has no elements
• SET-SIZE(S): return the number of elements of set S
Additionally, the Dynamic Set ADT defines the following set-theoretical operations:
• UNION(S,T): return the union of sets S and T
• INTERSECTION(S,T): return the intersection of sets S and T
• DIFFERENCE(S,T): returns the difference of sets S and T
• SUBSET(S,T): check whether set S is a subset of set T
Implement in Java the Dynamic Set ADT defined above using
a) a doubly linked list and [9]
b) a binary search tree. [9]
Observe that the ADT implementation should use Java Generics (see Lab 3) and operations should be in the form s.add(x), s.remove(x), etc.
Explain your implementation in the report, noting the running time (using big-Oh notation) of each operation in both implementations. Note you can use a self-balancing binary tree but no extra marks will be awarded. Also, you are not allowed to rely on Java library classes in your implementation.
c) Suppose your implementation based on a doubly linked list maintains the list sorted. Explain in the report what the implications of such an implementation choice are for the complexity of operations ADD and IS-ELEMENT. [2]
d) A naive implementation of operation UNION(S,T) in the implementation based on BST consists in taking all elements of BST S one by one, and inserting them into BST T. Describe in the report an implementation with a better running time. Use big-Oh notation to indicate running times. [5]

Part 2
a) Compare the two implementations of the Dynamic Set ADT by carrying out the following empirical study. First, populate (an initially empty) set S with all the elements from dataset int20k.txt provided on Moodle under Lab/Files. Then, generate 100 random numbers in the interval [0, 49999]. Finally, for each random number x record the time taken to execute IS-ELEMENT(S,x). What is the average running time of IS-ELEMENT over 100 calls in the two implementations of the ADT? Comment and explain your findings. [3]
b) What is the output of SET-SIZE(S)? [1]
c) What is the height of the BST implementing set S? [1] ...
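The assignment requires Java. The Python sketch below only illustrates the BST-based Dynamic Set and one common way to beat the naive UNION from part d):
1. take each tree's in-order traversal, which is already sorted;
2. merge the two sorted lists in O(m + n);
3. rebuild a balanced BST from the merged list in O(m + n).

REMOVE, INTERSECTION, DIFFERENCE, SUBSET and SET-EMPTY are omitted for brevity. `height` counts edges on the longest path (-1 for an empty tree); that convention is an assumption. Class and method names are hypothetical.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class _Node(Generic[T]):
    def __init__(self, key: T, left: "Optional[_Node[T]]" = None,
                 right: "Optional[_Node[T]]" = None) -> None:
        self.key = key
        self.left = left
        self.right = right


class BSTSet(Generic[T]):
    """Dynamic Set on an unbalanced BST: add / is_element cost O(h)."""

    def __init__(self) -> None:
        self.root: Optional[_Node[T]] = None
        self.size = 0

    def add(self, x: T) -> None:
        parent, node = None, self.root
        while node is not None:
            if x == node.key:
                return  # already present: a set holds distinct elements
            parent, node = node, (node.left if x < node.key else node.right)
        new = _Node(x)
        if parent is None:
            self.root = new
        elif x < parent.key:
            parent.left = new
        else:
            parent.right = new
        self.size += 1

    def is_element(self, x: T) -> bool:
        node = self.root
        while node is not None and node.key != x:
            node = node.left if x < node.key else node.right
        return node is not None

    def set_size(self) -> int:
        return self.size

    def in_order(self) -> List[T]:
        """Iterative in-order traversal (no recursion limit on degenerate trees)."""
        out: List[T] = []
        stack: List[_Node[T]] = []
        node = self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            out.append(node.key)
            node = node.right
        return out

    def height(self) -> int:
        level = [self.root] if self.root is not None else []
        h = -1
        while level:
            h += 1
            level = [c for n in level for c in (n.left, n.right) if c is not None]
        return h

    def union(self, other: "BSTSet[T]") -> "BSTSet[T]":
        """O(m + n): merge two sorted in-order lists, then build a balanced BST."""
        a, b = self.in_order(), other.in_order()
        merged: List[T] = []
        i = j = 0
        while i < len(a) and j < len(b):
            if a[i] < b[j]:
                merged.append(a[i])
                i += 1
            elif b[j] < a[i]:
                merged.append(b[j])
                j += 1
            else:  # common element: keep one copy
                merged.append(a[i])
                i += 1
                j += 1
        merged.extend(a[i:])
        merged.extend(b[j:])

        def build(lo: int, hi: int) -> Optional[_Node[T]]:
            if lo > hi:
                return None
            mid = (lo + hi) // 2
            return _Node(merged[mid], build(lo, mid - 1), build(mid + 1, hi))

        result: "BSTSet[T]" = BSTSet()
        result.root = build(0, len(merged) - 1)
        result.size = len(merged)
        return result
```

A side effect of this design is that the union tree is balanced, with height about log2(m + n), regardless of how skewed S and T were.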

April 1, 2022 · 3 min · jiezi

About Algorithms: NFT Digital Collectibles | App Development and Source Code Setup

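What is an NFT? NFT is short for "Non-Fungible Token", meaning a token that cannot be exchanged one-for-one. An NFT's unit is always 1, and its defining characteristics are uniqueness and scarcity.

Cryptocurrencies, central bank digital currencies and stablecoins can be spent and given change flexibly regardless of denomination. By contrast, each NFT is unique, indivisible and not freely interchangeable, which is why it has become a new medium for carrying and trading the value of digital art.

The main initial use case of NFTs was virtual collectibles. Encoding artworks as NFTs matches the scarcity of art, which makes NFTs valuable to collectors, artists and creators. Thanks to their uniqueness and the emergence of more use cases, NFTs have this year become a way for some people to display their wealth and identity.

Because NFTs are indivisible, non-fungible and unique, they better protect creators' rights and help build a healthier ecosystem, and their use cases are expanding and deepening across many fields. NFTs are also one of the key pieces of metaverse infrastructure. They can solve identity authentication and ownership confirmation, enable value transfer between metaverses, and serve as a bridge between the real world and the metaverse, accelerating its arrival and maturity.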

April 1, 2022 · 1 min · jiezi

About Algorithms: CSc 361 Networks

CSc 361: Computer Communications and Networks
(Spring 2021)
Assignment 3: Analysis of IP Protocol
Spec Out: March 5, 2021
Due: 11:55 pm April 2, 2021

1 Goal
The purpose of this assignment is to learn about the IP protocol. You are required to write a python program to analyze a trace of IP datagrams.

2 Introduction
In this assignment, we will investigate the IP protocol, focusing on the IP datagram. We'll do so by analyzing a trace of IP datagrams sent and received by an execution of the traceroute program. We will investigate the various fields in the IP datagram, and study IP fragmentation in detail.
A background of the traceroute program is summarized as follows. The traceroute program operates by first sending one or more datagrams with the time-to-live (TTL) field in the IP header set to 1; it then sends a series of one or more datagrams towards the same destination with a TTL value of 2; it then sends a series of datagrams towards the same destination with a TTL value of 3; and so on. Recall that a router must decrement the TTL in each received datagram by 1 (actually, RFC 791 says that the router must decrement the TTL by at least one). If the TTL reaches 0, the router returns an ICMP message (type 11 TTL-exceeded) to the sending host. As a result of this behavior, a datagram with a TTL of 1 (sent by the host executing traceroute) will cause the router one hop away from the sender to send an ICMP TTL-exceeded message back to the sender; the datagram sent with a TTL of 2 will cause the router two hops away to send an ICMP message back to the sender; the datagram sent with a TTL of 3 will cause the router three hops away to send an ICMP message back to the sender; and so on. In this manner, the host executing traceroute can learn the identities of the routers between itself and a chosen destination by looking at the source IP addresses in the datagrams containing the ICMP TTL-exceeded messages.
You will be provided with a trace file created by traceroute.
Of course, you can create a trace file by yourself. Note that when you create the trace file, you need to use different datagram sizes (e.g., 2500 bytes) so that the captured trace file includes information on fragmentation.

3 Requirements
There are two requirements for this assignment:

3.1 Requirement 1 (R1)
You are required to write a python program to analyze the trace of IP datagrams created by traceroute. To make terminologies consistent, in this assignment we call the computer that executes traceroute the source node. The ultimate destination node refers to the host that is the ultimate destination defined when running traceroute. For example, the ultimate destination node is “mit.edu” when you run
%traceroute mit.edu 2000
In addition, an intermediate destination node refers to a router that is not the ultimate destination node but sends back an ICMP message to the source node.
As another example, you can set the “don't fragment” bit and set the number of probes per “ttl” to 5 queries using the following command:
%traceroute -F -q 5 mit.edu 200
Your program needs to output the following information:
• List the IP address of the source node, the IP address of the ultimate destination node, and the IP address(es) of the intermediate destination node(s). If multiple intermediate destination nodes exist, they should be ordered by their hop count to the source node in increasing order.
• Check the IP header of all datagrams in the trace file, and list the set of values in the protocol field of the IP headers. Note that only different values should be listed in a set.
• How many fragments were created from the original datagram? Note that 0 means no fragmentation. Print out the offset (in terms of bytes) of the last fragment of the fragmented IP datagram.
Note that if the datagram is not fragmented, the offset is 0.
• Calculate the average and standard deviation of the round trip time (RTT) between the source node and the intermediate destination node(s), and between the source node and the ultimate destination node. The average and standard deviation are calculated over all fragments sent/received between the source node and the (intermediate/ultimate) destination node.

The output format is as follows (note that the values do not correspond to any trace file):

The IP address of the source node: 192.168.1.12
The IP address of ultimate destination node: 12.216.216.2
The IP addresses of the intermediate destination nodes:
router 1: 24.218.01.102,
router 2: 24.221.10.103,
router 3: 12.216.118.1.

The values in the protocol field of IP headers:
1: ICMP
17: UDP


The number of fragments created from the original datagram is: 3
The offset of the last fragment is: 3680

The avg RTT between 192.168.1.12 and 24.218.01.102 is: 50 ms, the s.d. is: 5 ms
The avg RTT between 192.168.1.12 and 24.221.10.103 is: 100 ms, the s.d. is: 6 ms
The avg RTT between 192.168.1.12 and 12.216.118.1 is: 150 ms, the s.d. is: 5 ms
The avg RTT between 192.168.1.12 and 12.216.216.2 is: 200 ms, the s.d. is: 15 ms

3.2 Requirement 2 (R2)
Note: You can finish this part either with a Python program or by manually collecting/analyzing data.
In other words, coding is optional for the tasks listed in this section.
From a given set of five traceroute trace files, all with the same destination address,
• determine the number of probes per “ttl” used in each trace file,
• determine whether or not the sequence of intermediate routers is the same in different trace files,
• if the sequence of intermediate routers is different in the five trace files, list the differences and explain why,
• if the sequence of intermediate routers is the same in the five trace files, draw a table as shown below (warning: the values in the table do not correspond to any trace files) to compare the RTTs of different traceroute attempts. From the result, which hop is likely to incur the maximum delay? Explain your conclusion.

TTL   Average RTT   Average RTT   Average RTT   Average RTT   Average RTT
      in trace 1    in trace 2    in trace 3    in trace 4    in trace 5
1     0.5           0.7           0.8           0.7           0.9
2     0.9           1             1.2           1.2           1.3
3     1.5           1.5           1.5           1.5           2.5
4     2.5           2             2             2.5           3
5     3             2.5           3             3.5           3.5
6     5             4             5             4.5           4

4 Miscellaneous
Important! Please read!
• As in Assignment 2, you are not allowed in this assignment to use Python packages that can automatically extract each packet from the pcap files. You may, however, reuse your code from Assignment 2 to extract packets.
• Some intermediate routers may send back only one “ICMP TTL exceeded” message for multiple fragments of the same datagram. In this case, please use this ICMP message to calculate the RTT for all fragments. For example, assume that the source sends Frag 1, Frag 2, and Frag 3 (of the same datagram, ID: 3000), with timestamps t1, t2, and t3, respectively. Later, the source receives one “ICMP TTL exceeded” message (ID: 3000). The timestamp is T.
Then the RTTs are calculated as: T − t1, T − t2, T − t3.
• More explanation about the output format
The number of fragments created from the original datagram is:

The offset of the last fragment is:
If there are multiple fragmented datagrams, you need to output the above information for each datagram. For example, assume that the source sends two datagrams, D1 and D2, where D1 and D2 are the identifications of the two datagrams. Assume that D1 has three fragments and D2 has two fragments. Then the output should be:
The number of fragments created from the original datagram D1 is: 3

The offset of the last fragment is: xxx.


The number of fragments created from the original datagram D2 is: 2

The offset of the last fragment is: yyy.

where xxx and yyy denote the actual numbers calculated by your program.
• If the trace file is captured in Linux, the source port number included in the original UDP datagram can be used to match against the ICMP error message. This is due to the traceroute implementation in Linux, which uses UDP and ICMP. If the trace file is captured in Windows, we should use the sequence number in the returned ICMP error message to match the sequence number in the ICMP echo (ping) message from the source node. Note that this ICMP error message (type 11) includes the content of the ICMP echo message (type 8) from the source. This is due to the traceroute implementation in Windows, which uses ICMP only (mainly message types 8 and 11). Traceroute may also be implemented in other ways; for instance, we have found that some traceroute implementations allow users to select the protocol among ICMP, TCP, UDP, and GRE.
To avoid unnecessary complexity in your program, you only need to handle two scenarios when matching the original datagram with the returned ICMP error message: either (1) use the source port number in the original UDP datagram, or (2) use the sequence number in the original ICMP echo message. Your code should automatically detect which case applies to the trace file. We will not test your code with a trace file that does not fall into one of these cases.

5 Deliverables and Marking Scheme
For your final submission, you are required to submit your source code to Brightspace. You should include a readme file that tells the TA how to compile and run your code. In addition, you are required to submit a PDF file with your solution to R2. Use the %tar -czvf command on linux.csc.uvic.ca to generate a compressed tar archive (.tar.gz) and submit that archive. Make sure you use the %tar -xzvf command to double-check that you have included all the files before submitting it. Note that your code will be tested on linux.csc.uvic.ca.
The marking scheme is as follows:

Component                                                          Weight
The IP address of the source node (R1)                             5
The IP address of ultimate destination node (R1)                   5
The IP addresses of the intermediate destination nodes (R1)        10
The correct order of the intermediate destination nodes (R1)       5
The values in the protocol field of IP headers (R1)                5
The number of fragments created from the original datagram (R1)    15
The offset of the last fragment (R1)                               10
The avg RTTs (R1)                                                  10
The standard deviations (R1)                                       5
The number of probes per ttl (R2)                                  10
Right answer to the second question (R2)                           5
Right answer to the third or fourth question (R2)                  10
Readme.txt                                                         5
Total weight                                                       100

6 Plagiarism
This assignment is to be done individually. You are encouraged to discuss the design of your solution with your classmates, but each person must implement their own assignment.
The End ...
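The fragment and RTT rules in the CSc 361 spec above (offsets reported in bytes, 0 meaning no fragmentation, and one ICMP reply timing every fragment of a datagram) can be sketched in Python as follows. This is a hedged illustration, not a reference solution: the function names and the dictionary layout are assumptions, it reads only the fixed 20-byte IPv4 header, it assumes you pass only datagrams sent by the source node to `fragment_summary`, and it uses the population standard deviation because the spec does not say which variant to report.

```python
import statistics
import struct
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_ipv4_header(data: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_ihl, _tos, total_len, ident, flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "ihl": (ver_ihl & 0x0F) * 4,          # header length in bytes
        "total_length": total_len,
        "id": ident,
        "mf": bool(flags_frag & 0x2000),       # "more fragments" flag
        "offset": (flags_frag & 0x1FFF) * 8,   # offset field counts 8-byte units
        "ttl": ttl,
        "protocol": proto,                     # 1 = ICMP, 17 = UDP
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }


def fragment_summary(headers: List[dict]) -> Dict[int, Tuple[int, int]]:
    """Map datagram ID -> (fragment count, offset of the last fragment in bytes).

    An unsplit datagram (single piece, MF clear, offset 0) is reported as
    (0, 0), following the spec's convention that 0 means no fragmentation.
    """
    groups: Dict[int, List[dict]] = defaultdict(list)
    for h in headers:
        groups[h["id"]].append(h)
    summary = {}
    for ident, frags in groups.items():
        if len(frags) == 1 and not frags[0]["mf"] and frags[0]["offset"] == 0:
            summary[ident] = (0, 0)
        else:
            summary[ident] = (len(frags), max(f["offset"] for f in frags))
    return summary


def rtt_stats(sent: Dict[int, List[float]], replies: Dict[int, float]) -> Tuple[float, float]:
    """Mean and population standard deviation of RTTs, in milliseconds.

    `sent` maps a match key (UDP source port on Linux traces, ICMP echo sequence
    number on Windows traces) to the capture timestamps, in seconds, of every
    fragment of that probe; `replies` maps the same key to the timestamp T of
    the single ICMP reply, so each fragment contributes T - t_i.
    """
    rtts = [
        (replies[key] - t) * 1000.0
        for key, times in sent.items()
        if key in replies
        for t in times
    ]
    if not rtts:
        raise ValueError("no matched probe/reply pairs")
    return statistics.mean(rtts), statistics.pstdev(rtts)
```

Grouping by the IP identification field alone is a simplification; a trace with several senders would need to group by (source IP, ID) instead.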

April 1, 2022 · 9 min · jiezi

About algorithms: CSSE2002/7023

School of ITEE
CSSE2002/7023, Semester 1, 2021
Assignment 1
Due: 1 April 2021 16:00 AEST
Revision: 1.0.0

Abstract
The goal of this assignment is to implement and test a set of classes and interfaces[1] to be used in the second assignment.
Language requirements: Java version 11, JUnit 4.
Please carefully read the Appendix A document. It outlines critical mistakes which you must avoid in order to avoid losing marks. This is being heavily emphasised here because these are critical mistakes which must be avoided.
If at any point you are even slightly unsure, please check as soon as possible with course staff!

Preamble
All work on this assignment is to be your own individual work. As detailed in Lecture 1, code supplied by course staff (from this semester) is acceptable, but there are no other exceptions. You are expected to be familiar with “What not to do” from Lecture 1 and https://www.itee.uq.edu.au/itee-student-misconduct-including-plagiarism. If you have questions about what is acceptable, please ask course staff.
All times are given in Australian Eastern Standard Time. It is your responsibility to ensure that you adhere to this timezone for all assignment-related matters. Please bear this in mind, especially if you are enrolled in the External offering and may be located in a different time zone.

Introduction
In this assignment, and continuing into the second assignment, you will build a simple simulation of an air traffic control (ATC) system. The first assignment will focus on implementing the classes that provide the core model for the system.
An ATC system manages the movement of aircraft around an airport, including assigning aircraft to terminals and gates, and managing situations such as emergencies.
An ATC system also maintains various queues, including a landing queue and a takeoff queue (more about this in assignment 2).
Terminals each contain a number of Gates, and can be designed to cater only to specific types of aircraft, such as Airplanes or Helicopters.
An aircraft can be either an Airplane or a Helicopter. An Aircraft can have one of two primary purposes: it can be either a Passenger aircraft or a Freight aircraft.
Each aircraft has various characteristics which are known to the ATC system, including the aircraft's empty weight, fuel capacity, passenger capacity, and freight capacity. These characteristics can determine how long it takes to complete various actions, such as boarding passengers or freight, or refueling the aircraft. The total weight of an aircraft can change depending on its occupancy level (passengers or freight); for example, more passengers means the aircraft will be heavier.
[1] From now on, classes and interfaces will be shortened to simply “classes”.
Each aircraft also has a list of tasks that it completes. Tasks represent the actions or states that an aircraft can take, including: being away from the airport (for assignment simplicity we only care that the aircraft is in an away state, not what it might be doing during this time), waiting to land, waiting to take off, loading (i.e. passengers or freight), and waiting idle at the gate. Tasks can take different amounts of time depending on characteristics of the aircraft and the requirements of each recorded task (for example, loading takes longer when there are more passengers or freight).
The list of tasks which an aircraft has must follow a strict set of requirements. The tasks can only occur in a specific order (for example, it would not make logical sense to be waiting in the air in a landing queue and then take off; the aircraft must land first). Aircraft also complete a task list in a circular manner.
Once the aircraft has completed the last task in the list, it will then commence the first task in the list again. This behaviour will go on forever (this simplification of aircraft behaviour is for assignment simplicity; a real aircraft would have much more complex schedules, but this is well beyond the scope of this course).

Supplied Material
• This task sheet.
• Code specification document (Javadoc).
• Gradescope, an online website where you will submit your assignment.
• An empty assignment solution (i.e. skeleton code) is provided in your Subversion repository. These files provide a minimal framework for you to work from and build upon. They have been provided so that you can avoid (some of) the critical mistakes described in the Appendix. Each of these files:
- is in the correct directory (do not change this!)
- has the correct package declaration at the top of the file (do not change this!)
- has the correct public class or public interface declaration. Note that you may still need to make classes abstract, extend classes, implement interfaces, etc. as detailed in the specifications.
As the first step in the assignment (after reading through the specifications), you should check out the ass1 repository from Subversion. Once you have created a new project from the repository you have checked out, you should start implementing the specifications.

Javadoc
Code specifications are an important tool for developing code in collaboration with other people. Although assignments in this course are individual, they still aim to prepare you for writing code to a strict specification by providing a specification document (in Java, this is called Javadoc). You will need to implement the specification precisely as it is described in the specification document.
The Javadoc can be viewed in either of the two following ways: ...
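The circular task list described in the CSSE2002 introduction can be modelled with an index that wraps around using modulo arithmetic. The sketch below is in Python purely for illustration (the assignment itself must be written in Java against the supplied Javadoc); the `TaskList` class, its method names, and the task labels are hypothetical and do not follow the real specification. It also leaves out validation of the strict task ordering, because those rules live in the Javadoc rather than in this excerpt.

```python
from typing import List


class TaskList:
    """A non-empty list of tasks that an aircraft cycles through forever."""

    def __init__(self, tasks: List[str]):
        if not tasks:
            raise ValueError("a task list needs at least one task")
        self._tasks = list(tasks)
        self._index = 0  # task currently being performed

    def current(self) -> str:
        return self._tasks[self._index]

    def move_on(self) -> None:
        # After the last task, wrap around to the first task again.
        self._index = (self._index + 1) % len(self._tasks)


if __name__ == "__main__":
    tasks = TaskList(["away", "waiting to land", "waiting idle at gate",
                      "loading", "waiting to take off"])
    for _ in range(6):
        print(tasks.current())
        tasks.move_on()
    # prints the five tasks in order, then "away" again
```

Storing only an index, rather than rotating the list, keeps each step constant-time and leaves the original task order intact for display.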

April 1, 2022 · 15 min · jiezi

About algorithms: INFO1113 Flight Scheduler

INFO1113 Assignment 1
Due: 23 April 2021, 11:59PM AEST
This assignment is worth 12% of your final grade.

Task Description – Flight Scheduler
In this assignment, you will create a Flight Scheduler application in the Java programming language. The program will be a tool for airlines to schedule flights between different locations, produce timetable plans, and easily check routing between cities on multiple flights. You must create at least three classes: FlightScheduler, Flight and Location, for which a scaffold and description have been provided to you. The FlightScheduler class will contain the main entry point of the application (static main function).
You are encouraged to ask questions on Ed under the assignments category if you are unsure of the specification, but staff members will not be able to do any coding or debugging in this assignment for you. As with any assignment, make sure that your work is your own, and do not share your code or solutions with other students.

Working on your assignment
You can work on this assignment on your own computer or the lab machines. It is important that you continually back up your assignment files onto your own machine, external drives, and in the cloud.
You are encouraged to submit your assignment on Ed while you are in the process of completing it. By submitting, you will obtain some feedback on your progress on the sample test cases provided.

Implementation details
Write a program in Java to implement the Flight Scheduler application that accepts input from the user via standard input. The terminal interface allows the user to interact with the program, to give it input and receive output.
The available commands are described below in the section ‘Commands’.
There are three main classes you must implement, but you may also create more if you wish.

FlightScheduler class
This class will contain the main entry point of your program (static main function) and store links to all the data relevant to the application. It will be a container for the flight schedule, which is made up of a list of Flights. It should also contain a list of Locations.
The flight schedule covers only a single week, Monday to Sunday, which repeats. Assume all times are in UTC, so you do not have to account for time zone differences at different locations.

Flight class
The Flight type should contain all data relevant to a particular flight, and methods that perform operations on a Flight or multiple Flights. Attributes will be the flight ID, departure time, source and destination locations, capacity, ticket price, number of passengers booked, and anything else you think is relevant.
Flight duration is determined by the distance between the start and end locations, calculated using the Haversine formula and assuming the average speed of an aircraft is 720 km/h. The initial ticket price is calculated using an average cost of $30, plus 4× the demand coefficient differential between locations, per 100 km of distance. For example, if the starting location has a demand coefficient of -1 and the end has -1, it remains $30 per 100 km. If the starting location has -1 and the end has 1, then it is $38 per 100 km. If the starting location has 1 and the end has -1, it would be $22 per 100 km.
The ticket price changes when the flight starts to fill up. For the first 50% of seats, the price decreases linearly to 80% of its original value by the time the flight is half full. For the next 20% of seats, the price increases linearly back to 100% of its original value.
For the last 30% of seats, the ticket price increases along an inverse tangent curve to 110% of its original value, except for the last 10 seats, each of which will increase the price by 1% each time it is booked. Seat proportions are to be rounded down.

$$
y = \begin{cases}
-0.4x + 1, & 0 < x \le 0.5 \\
x + 0.3, & 0.5 < x \le 0.7 \\
\dfrac{0.2}{\pi}\tan^{-1}(20x - 14) + 1, & 0.7 < x \le 1
\end{cases}
\qquad
T = y \times \frac{d}{100} \times \bigl(30 + 4\,(D_{to} - D_{from})\bigr)
$$

where
$T$ = ticket price
$y$ = multiplier applied to the ticket price to give its current value
$x$ = proportion of seats filled (booked/capacity)
$d$ = flight distance in kilometres (haversine formula result)
$D_{to}$ = demand coefficient for the destination location
$D_{from}$ = demand coefficient for the starting location

Location class
The Location type should contain all data relevant to a particular location, and methods that perform operations on a Location or multiple Locations. Attributes will be the location name, latitude and longitude coordinates, lists of arriving and departing flights, and a demand coefficient. Location names must be unique (case-insensitive). Latitude must be within [-85, 85] and longitude must be within [-180, 180], both in degrees. The demand coefficient is a number between -1 and 1 (inclusive) which represents whether there is a net inflow or outflow of passengers from this location (negative means passengers want to leave, positive means they want to come). It factors into the calculation that determines the ticket price for a particular flight.
Assume each location has only one runway; that is, no flights can be scheduled to arrive or depart within an hour of another at a particular location. Multi-runway airports can be represented by multiple locations in such a system (e.g. Heathrow-1, Heathrow-2, etc.).
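The duration and pricing rules in the Flight class description can be checked numerically with a short Python sketch (the assignment itself is in Java). Assumptions to note: the Earth radius of 6371 km is a common mean value that this excerpt does not specify; the multiplier uses the 0.2/π scaling, which is what makes the inverse-tangent segment level off at 110%; and the extra 1% per booking for the last 10 seats and the rounding of seat proportions are deliberately not modelled. Function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius; the spec excerpt gives no value
AVERAGE_SPEED_KMH = 720.0  # average aircraft speed from the spec


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def flight_duration_hours(distance_km: float) -> float:
    return distance_km / AVERAGE_SPEED_KMH


def price_multiplier(x: float) -> float:
    """Piecewise multiplier y for the seat fill ratio x = booked / capacity."""
    if x <= 0.5:
        return -0.4 * x + 1  # linear drop to 0.8 at half full
    if x <= 0.7:
        return x + 0.3       # linear rise back to 1.0 at 70 % full
    return 0.2 / math.pi * math.atan(20 * x - 14) + 1  # inverse tangent toward 1.1


def ticket_price(distance_km: float, d_from: float, d_to: float,
                 booked: int, capacity: int) -> float:
    """T = y * d/100 * (30 + 4 (D_to - D_from)).

    Not modelled: the extra 1 % per booking for the last 10 seats and the
    rounding down of seat proportions.
    """
    y = price_multiplier(booked / capacity)
    return y * distance_km / 100 * (30 + 4 * (d_to - d_from))
```

For an empty 100 km flight from a location with coefficient -1 to one with coefficient 1, `ticket_price(100, -1, 1, 0, 100)` reproduces the $38 worked example in the text.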
Commands
FLIGHTS - list all available flights ordered by departure time, then departure location name
FLIGHT ADD <departure time> <from> <to> <capacity> - add a flight
FLIGHT IMPORT/EXPORT <filename> - import/export flights to csv file
FLIGHT <id> - view information about a flight (from->to, departure and arrival times, current ticket price, capacity, passengers booked)
FLIGHT <id> BOOK <num> - book a certain number of passengers for the flight at the current ticket price, and then adjust the ticket price to reflect the reduced capacity remaining. If no number is given, book 1 passenger. If the given number of bookings is more than the remaining capacity, only accept bookings until the capacity is full.
FLIGHT <id> REMOVE - remove a flight from the schedule
FLIGHT <id> RESET - reset the number of passengers booked to 0, and the ticket price to its original state.
LOCATIONS - list all available locations in alphabetical order
LOCATION ADD <name> <lat> <long> <demand_coefficient> - add a location
LOCATION <name> - view details about a location (its name, coordinates, demand coefficient)
LOCATION IMPORT/EXPORT <filename> - import/export locations to csv file
SCHEDULE <location_name> - list all departing and arriving flights, in order of the time they arrive/depart
DEPARTURES <location_name> - list all departing flights, in order of departure time
ARRIVALS <location_name> - list all arriving flights, in order of arrival time
TRAVEL <from> <to> [sort] [n] - list the nth possible flight route between a starting location and destination, with a maximum of 3 stopovers. Default ordering is for shortest overall duration. If n is not provided, display the first one in the order.
If n is larger than the number of routes available, display the last one in the ordering.
The [sort] argument selects one of the other orderings:
TRAVEL <from> <to> cost - minimum current cost
TRAVEL <from> <to> duration - minimum total duration
TRAVEL <from> <to> stopovers - minimum stopovers
TRAVEL <from> <to> layover - minimum layover time
TRAVEL <from> <to> flight_time - minimum flight time
HELP - outputs this help string.
EXIT - end the program.
Note: All commands are case-insensitive. However, location names stored in the Location class should be displayed as initially given.

Travel command
Since the schedule is weekly and wraps around, you need to consider the possibility of a flight arriving on Sunday evening connecting with a flight that departs on Monday morning. You may ignore available seat capacity when selecting a flight in a potential route, since it is assumed that the current bookings are only for the current week, and this flight route may be used to show results for travellers in subsequent weeks who are looking to make a booking later on. However, the ticket prices and overall route cost should depend on the current booking numbers of each flight, since we are assuming that the current booking demand is a good indicator of future demand, so ticket prices will be similar in the future to what they are now.
The TRAVEL command has 5 potential orderings, detailed below. If the primary sorting property is equal between two flight paths, it will fall back to the following secondary and tertiary sorting properties. It is assumed that current cost will never be equal for two flight paths (this may not be true in practice, but then any secondary ordering is fine).
If total duration is equal, sort then by minimum current cost. Total duration is the time taken from the initial departure of the first flight to the final arrival at the destination. ...
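The weekly wraparound and the duration-then-cost fallback described in the Travel command section can be sketched in Python (the assignment itself is Java). Assumptions: times are stored as minutes since Monday 00:00 UTC, `Leg`, `total_duration` and `sort_routes` are hypothetical names, and only the `duration` and `cost` orderings are shown, since the tie-breaks for the other three are cut off in this excerpt. The key idea is that a layover is measured modulo one week, so a Sunday-evening arrival connects forward to a Monday-morning departure instead of producing a negative gap.

```python
from dataclasses import dataclass
from typing import List

MINUTES_PER_WEEK = 7 * 24 * 60


def minutes_until(start: int, end: int) -> int:
    """Minutes from `start` to the next occurrence of `end` in a repeating week."""
    return (end - start) % MINUTES_PER_WEEK


@dataclass
class Leg:
    depart: int    # minutes since Monday 00:00 UTC (assumed representation)
    duration: int  # flight time in minutes
    price: float   # current ticket price


def total_duration(route: List[Leg]) -> int:
    """First departure to final arrival, including layovers across the week boundary."""
    elapsed = 0
    clock = route[0].depart
    for i, leg in enumerate(route):
        if i > 0:
            elapsed += minutes_until(clock, leg.depart)  # layover before this leg
            clock = leg.depart
        elapsed += leg.duration
        clock = (clock + leg.duration) % MINUTES_PER_WEEK  # arrival time of this leg
    return elapsed


def total_cost(route: List[Leg]) -> float:
    return sum(leg.price for leg in route)


def sort_routes(routes: List[List[Leg]], order: str = "duration") -> List[List[Leg]]:
    """Sort candidate routes using the two orderings spelled out in this excerpt."""
    if order == "duration":
        # Primary key: total duration; tie-break: minimum current cost.
        return sorted(routes, key=lambda r: (total_duration(r), total_cost(r)))
    if order == "cost":
        return sorted(routes, key=total_cost)  # costs are assumed never to tie
    raise ValueError(f"ordering {order!r} is not covered by this sketch")
```

Encoding the fallback as a tuple key lets Python's sort compare the secondary property only when the primary one is equal, which mirrors the spec's primary/secondary wording.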

April 1, 2022 · 19 min · jiezi

About algorithms: EN3085 Assessed

EN3085 Assessed Coursework 1
The purpose of this coursework is to design a class that will allow programmers to manipulate univariate quadratic functions. These functions can be represented by three parameters in three different forms:
- A Standard form: $f(x) = ax^2 + bx + c$, where a, b, and c are real numbers.
- A Factored form: $f(x) = a(x - R_1)(x - R_2)$, where a is a real number and $R_1$ and $R_2$, the roots of the quadratic function (solutions of $f(x) = 0$), can be either real or complex numbers.
- A Vertex form: $f(x) = a(x - h)^2 + k$, where the vertex coordinates h and k are real numbers.

1. Create a class QuadraticF to represent univariate quadratic functions. [5]
An object of that class should:
a) Store a univariate quadratic function in an efficient way (optimum use of memory storage).
b) Have a constructor that takes 4 arguments: a character indicating in which form the three other arguments should be interpreted (‘S’ for Standard, ‘F’ for Factorised or ‘V’ for Vertex), and the three arguments fully defining the quadratic function in the specified form ((a, b, c), (a, R1, R2) or (a, h, k)).
c) Be able to return each individual element of the stored quadratic function in the three possible forms (Standard: a, b, c; Factorised: a, R1, R2; Vertex: a, h, k).
d) Allow the modification of individual “standard form” elements of the quadratic function (a, b, c).
Using the template provided (main() function), briefly demonstrate the use of this class and of all the created member functions by creating and manipulating a single object of that class.

2. Add to the class functions that display the stored quadratic function on the screen. [3]
Create three member functions to display each form separately: ax²+bx+c, a(x-h)²+k and a(x-R1)(x-R2).
Create one member function to display all forms together: f(x)=ax²+bx+c=a(x-h)²+k=a(x-R1)(x-R2).
Using the template provided (main() function), demonstrate the use of these member functions.
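One way to meet the "efficient storage" requirement in question 1 is to keep only the three standard-form coefficients and derive the factored and vertex parameters on demand, using the discriminant and vertex formulas from the coursework notes. The Python sketch below illustrates that design choice under stated assumptions: the coursework itself must be C++ in QuadraticF.h/QuadraticF.cpp, the method names here are hypothetical, and the display functions of question 2 are omitted. Factored-form input with complex roots is accepted as long as the roots are conjugates, so that b and c come out real.

```python
import cmath
from typing import Tuple, Union

Number = Union[float, complex]


class QuadraticF:
    """f(x) = a x^2 + b x + c, stored as the three standard-form coefficients only."""

    def __init__(self, form: str, p1: Number, p2: Number, p3: Number):
        form = form.upper()
        if form == "S":    # (a, b, c)
            a, b, c = p1, p2, p3
        elif form == "F":  # a (x - R1)(x - R2)
            a, r1, r2 = p1, complex(p2), complex(p3)
            b = (-a * (r1 + r2)).real  # real when R1, R2 are real or conjugates
            c = (a * r1 * r2).real
        elif form == "V":  # a (x - h)^2 + k
            a, h, k = p1, p2, p3
            b, c = -2 * a * h, a * h * h + k
        else:
            raise ValueError("form must be 'S', 'F' or 'V'")
        if a == 0:
            raise ValueError("a must be non-zero for a quadratic")
        # Plain attributes, so a, b and c can be modified individually.
        self.a, self.b, self.c = float(a), float(b), float(c)

    def standard(self) -> Tuple[float, float, float]:
        return self.a, self.b, self.c

    def vertex(self) -> Tuple[float, float, float]:
        h = -self.b / (2 * self.a)
        k = self.c - self.b ** 2 / (4 * self.a)
        return self.a, h, k

    def factored(self) -> Tuple[float, Number, Number]:
        d = self.b ** 2 - 4 * self.a * self.c       # discriminant D
        root = d ** 0.5 if d >= 0 else cmath.sqrt(d)  # complex roots when D < 0
        return self.a, (-self.b + root) / (2 * self.a), (-self.b - root) / (2 * self.a)
```

Because the derived forms are recomputed from (a, b, c) on every call, changing one coefficient can never leave a stale root or vertex behind; the trade-off is a little extra arithmetic on each query.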
3. Create a function GetFunctionsFromFile(filename): [3]
The function should:
a) Read all quadratic functions recorded in a text file. Each data line in the text file always consists of one of the following, to define a quadratic function:
- a letter ‘S’ followed by the three values defining the standard form (a, b, c)
- a letter ‘F’ followed by the three values defining the factorised form (a, R1, R2)
- a letter ‘V’ followed by the three values defining the vertex form (a, h, k)
b) Create an object of type QuadraticF for each of them and store the created objects in a vector to obtain a list of all quadratic functions defined in the file.
Using the template (main() function) and the text file (FunctionsList.txt) provided, briefly demonstrate the use of this function by displaying on the screen all quadratic functions extracted from the text file and stored in the vector. For each quadratic function, all forms should be displayed.

4. Create two overloaded operators: [3]
Create an overloaded operator "+", which adds one QuadraticF object to another and returns the resulting new QuadraticF object, enabling operations such as: Funct3 = Funct1 + Funct2.
Create an overloaded operator "*", which multiplies one QuadraticF object by an integer and returns the resulting new QuadraticF object, enabling operations such as: Funct4 = 4 * Funct3.
Using the template provided (main() function), demonstrate the use of these functions by calculating the sum of all functions extracted from the text file (FunctionsList.txt) and by multiplying the resulting QuadraticF object by 4. Display on the screen the QuadraticF functions (using the three forms) resulting from both the sum and the multiplication.

5. Explain briefly the structure of your class (maximum 500 words) in terms of memory usage and “user friendliness”, how you tested it, and highlight any identified issues.
[1]
Notes on univariate quadratic functions.
1) Finding the factored form, $f(x) = a(x - R_1)(x - R_2)$, from the standard form $f(x) = ax^2 + bx + c$ (if $a \neq 0$):
Calculate the discriminant $D = b^2 - 4ac$. Then:
- If $D \ge 0$, $R_1$ and $R_2$ are real numbers: $R_1 = \frac{-b + \sqrt{D}}{2a}$ and $R_2 = \frac{-b - \sqrt{D}}{2a}$.
- If $D < 0$, $R_1$ and $R_2$ are complex numbers: $R_1 = \frac{-b}{2a} + i\,\frac{\sqrt{-D}}{2a}$ and $R_2 = \frac{-b}{2a} - i\,\frac{\sqrt{-D}}{2a}$.
2) Finding the vertex form, $f(x) = a(x - h)^2 + k$, from the standard form $f(x) = ax^2 + bx + c$ (if $a \neq 0$): $h = -\frac{b}{2a}$ and $k = c - \frac{b^2}{4a}$.

Submission Procedure
You should solve the problems independently from other students and submit only your own work. Submit your solution on “learning central” using the Coursework Answer Sheet provided by 18:00, Friday 26/03/2021 (week 8).
Include your answer to Question 5 in section one of the Coursework Answer Sheet. Then, directly from Microsoft Visual C++, copy and paste all your code into the relevant section of the Coursework Answer Sheet. Add a screenshot of the window showing what is printed on the screen when running your program (for example, using Microsoft Snipping Tool).
All developed code should be included in the 3 files provided (QuadraticF.h, QuadraticF.cpp and Main.cpp) and uploaded on “learning central” together with the Coursework Answer Sheet to facilitate our testing of your program. Any incomplete submission will be considered late.

Marking scheme
Question 1 - 5 points; Question 2 - 3 points; Question 3 - 3 points; Question 4 - 3 points; Question 5 - 1 point
Marks will be awarded for:
• A correctly functioning program. The program should operate according to the specification.
• An efficient program and elegant algorithms. Try to develop algorithms which are efficient in terms of the amount of data that needs to be stored (e.g. minimum number of variables used) and the speed at which they operate.
• A user-friendly program. When your program runs, the messages on the screen should be easy to understand and succinct.
• A well commented program.
The judicious use of commenting is essential if somebody else is to easily understand your program.With that in mind, each coding question will be marked using the following scheme ...
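Questions 3 and 4 of the EN3085 coursework (reading S/F/V lines and overloading "+" and "*") can be sketched in Python as below; the coursework itself requires C++, where `4 * Funct3` needs a non-member operator. This is a hedged illustration: `Quad` is a hypothetical stand-in for QuadraticF that keeps only standard-form coefficients, the whitespace-separated line layout (for example `S 1 -3 2`) is an assumption about FunctionsList.txt, and factored-form lines are assumed to contain real roots. F and V lines are converted by expanding $a(x - R_1)(x - R_2)$ and $a(x - h)^2 + k$.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Quad:
    """Hypothetical stand-in for QuadraticF: a x^2 + b x + c."""
    a: float
    b: float
    c: float

    def __add__(self, other: "Quad") -> "Quad":
        # Adding two quadratics adds their coefficients term by term.
        return Quad(self.a + other.a, self.b + other.b, self.c + other.c)

    def __rmul__(self, n: int) -> "Quad":
        # Supports `4 * f`: scaling multiplies every coefficient.
        return Quad(n * self.a, n * self.b, n * self.c)


def from_line(line: str) -> Quad:
    """Parse one data line such as 'S 1 -3 2', 'F 1 1 2' or 'V 2 1 3'."""
    tag, *rest = line.split()
    p, q, r = (float(v) for v in rest)
    tag = tag.upper()
    if tag == "S":
        return Quad(p, q, r)
    if tag == "F":  # a (x - R1)(x - R2) = a x^2 - a (R1 + R2) x + a R1 R2
        return Quad(p, -p * (q + r), p * q * r)
    if tag == "V":  # a (x - h)^2 + k = a x^2 - 2 a h x + a h^2 + k
        return Quad(p, -2 * p * q, p * q * q + r)
    raise ValueError(f"unknown form {tag!r}")


def get_functions_from_file(filename: str) -> List[Quad]:
    """Read every non-blank line of the file into a list of quadratics."""
    with open(filename) as fh:
        return [from_line(line) for line in fh if line.strip()]
```

Summing a list with Python's built-in `sum` needs an explicit start value such as `Quad(0, 0, 0)`; also note that a sum of quadratics can end up with a = 0, which is no longer quadratic and is worth flagging in the answer to question 5.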

March 31, 2022 · 6 min · jiezi

About algorithms: Solving COP 5536

COP 5536 Spring 2021
Programming Project
Due Date: Apr 6th, 2021, 11:59 pm EST

General
Problem description
The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context, in particular file systems. In this project, you're asked to develop and test a small-degree B+ tree used for internal-memory dictionaries (i.e. the entire tree resides in main memory). The data is given in the form (key, value) with no duplicates, and you are required to implement an m-way B+ tree to store the data pairs. Note that in a B+ tree only leaf nodes contain the actual values, and the leaves should be linked into a doubly linked list. Your implementation should support the following operations:
Initialize(m): create a new m-way B+ tree
Insert(key, value)
Delete(key)
Search(key): returns the value associated with the key
Search(key1, key2): returns the values of all keys in the range key1 <= key <= key2

Programming Environment
You may use either Java or C++ for this project. Your program will be tested using the Java or g++/gcc compiler on the thunder.cise.ufl.edu server, so you should verify that it compiles and runs as expected on this server, which may be accessed via the Internet.
Your submission must include a makefile that creates an executable file named bplustree.

Input and Output Requirements
Your program should execute using the following:
For C/C++:
$ ./bplustree file_name
For Java:
$ java bplustree file_name
where file_name is the name of the file that has the input test data.

Input Format
The first line in the input file, Initialize(m), means creating a B+ tree of order m (note: m may be different depending on the input file). Each of the remaining lines specifies a B+ tree operation.
The following is an example of an input file:
Initialize(3)
Insert(21, 0.3534)
Insert(108, 31.907)
Insert(56089, 3.26)
Insert(234, 121.56)
Insert(4325, -109.23)
Delete (108)
Search(234)
Insert(102, 39.56)
Insert(65, -3.95)
Delete (102)
Delete (21)
Insert(106, -3.91)
Insert(23, 3.55)
Search(23, 99)
Insert(32, 0.02)
Insert(220, 3.55)
Search(33)
Delete (234)
Search(65)
You can use integer as the type of the key and float/double as the type of the value.

Output Format
For Initialize, Insert and Delete queries you should not produce any output.
For a Search query you should output the results on a single line, using commas to separate values. The output for each search query should be on a new line. All output should go to a file named “output_file.txt”. If a search query does not return anything, you should output “Null”.
The following is the output file for the above input file:
121.56
3.55,-3.95
Null
-3.95

Submission
Do not use nested directories. All your files must be in the first directory that appears after unzipping.
You must submit the following:
Makefile: You must design your makefile such that the ‘make’ command compiles the source code and produces an executable file. (For Java, class files that can be run with the java command.)
Source Program: Provide comments.
REPORT:
· The report should be in PDF format.
· The report should contain your basic info: Name, UFID and UF Email account.
· Present function prototypes showing the structure of your programs. Include the structure of your program.
To submit, please compress all your files together using a zip utility and submit to the Canvas system. You should look for Assignment Project for the submission.
Your submission should be named LastName_FirstName.zip.
Please make sure the name you provide is the same as the one that appears on the Canvas system. Please do not submit directly to a TA. All email submissions will be ignored without further notification. Please note that the due date is a hard deadline. No late submission will be allowed.
Any submission after the deadline will not be accepted.

Grading Policy
Grading will be based on the correctness and efficiency of algorithms. Below are some details of the grading policy.
Correct implementation and execution: 60%
Note: Your program will be graded based on the produced output. You must make sure to produce the correct output to get points. Besides the example input/output file in this project description, there are two extra test cases for TAs to test your code. Each of the test cases contributes 20% to the final grade. Your program will not be graded if it cannot be compiled or executed.
You will get 0 points in this part if your implementation is not a B+ tree.
Comments and readability: 15%
Report: 25%
You will have 10% of the points deducted if you do not follow the input/output or submission requirements above. In addition, we may ask you to demonstrate your projects.

Miscellaneous
• Do not use complex data structures provided by programming languages. You have to implement B+ tree data structures on your own using primitive data structures such as pointers. You must not use any B tree / B+ tree related libraries.
• Your implementation should be your own. You have to work by yourself for this assignment (discussion is allowed). Your submission will be checked for plagiarism.
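The COP 5536 description above (values only in leaves, leaves chained in a doubly linked list, range search over key1 <= key <= key2) can be illustrated with the Python sketch below. It is a study aid under explicit assumptions, not a submittable solution: the project requires Java or C++, the sketch uses Python lists inside nodes (which the "primitive data structures" rule may not allow), it treats order m as at most m children and m - 1 keys per node, and it implements only Initialize, Insert, and both Search forms. Delete, with its borrow and merge cases, is omitted. All class and method names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional


class Leaf:
    def __init__(self):
        self.keys: List[int] = []
        self.values: List[float] = []
        self.prev: Optional["Leaf"] = None  # leaves form a doubly linked list
        self.next: Optional["Leaf"] = None


class Internal:
    def __init__(self):
        self.keys: List[int] = []  # separator keys
        self.children: list = []   # always len(keys) + 1 children


class BPlusTree:
    """m-way B+ tree: at most m children and m - 1 keys per node (insert/search only)."""

    def __init__(self, m: int):
        if m < 3:
            raise ValueError("order m must be at least 3")
        self.m = m
        self.root = Leaf()

    def _find_leaf(self, key: int) -> Leaf:
        node = self.root
        while isinstance(node, Internal):
            node = node.children[bisect_right(node.keys, key)]  # equal keys go right
        return node

    def search(self, key: int) -> Optional[float]:
        leaf = self._find_leaf(key)
        i = bisect_left(leaf.keys, key)
        if i < len(leaf.keys) and leaf.keys[i] == key:
            return leaf.values[i]
        return None  # printed as "Null" in the required output

    def search_range(self, lo: int, hi: int) -> List[float]:
        out: List[float] = []
        leaf: Optional[Leaf] = self._find_leaf(lo)
        while leaf is not None:
            for k, v in zip(leaf.keys, leaf.values):
                if k > hi:
                    return out
                if k >= lo:
                    out.append(v)
            leaf = leaf.next  # walk the leaf-level linked list
        return out

    def insert(self, key: int, value: float) -> None:
        split = self._insert(self.root, key, value)
        if split is not None:  # the root split, so the tree grows one level
            new_root = Internal()
            new_root.keys = [split[0]]
            new_root.children = [self.root, split[1]]
            self.root = new_root

    def _insert(self, node, key: int, value: float):
        """Insert below node; return (separator, new right sibling) if node split."""
        if isinstance(node, Leaf):
            i = bisect_left(node.keys, key)
            node.keys.insert(i, key)
            node.values.insert(i, value)
            if len(node.keys) < self.m:
                return None
            mid = len(node.keys) // 2
            right = Leaf()
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.values, node.values = node.values[mid:], node.values[:mid]
            right.prev, right.next = node, node.next
            if node.next is not None:
                node.next.prev = right
            node.next = right
            return right.keys[0], right  # leaf split copies its first right key up
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key, value)
        if split is None:
            return None
        node.keys.insert(i, split[0])
        node.children.insert(i + 1, split[1])
        if len(node.keys) < self.m:
            return None
        mid = len(node.keys) // 2
        up = node.keys[mid]  # internal split moves the middle key up
        right = Internal()
        right.keys, node.keys = node.keys[mid + 1:], node.keys[:mid]
        right.children, node.children = node.children[mid + 1:], node.children[:mid + 1]
        return up, right
```

The asymmetry between the two split cases is the defining B+ tree detail: a leaf split copies its first key upward (the key stays in the leaf with its value), while an internal split moves the middle separator up and removes it from both halves.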

March 31, 2022 · 4 min · jiezi

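About algorithms: An Alibaba algorithm engineer with a Zhejiang University PhD shows you how to write up project experience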

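Summary: An Alibaba algorithm engineer with a PhD from Zhejiang University shows you how to write up your project experience!

Resume sections
A new graduate's resume for campus recruiting mainly includes: basic information, education, work/internship experience, research/project experience, honors/awards, organization/club experience, and other items such as a personal statement. The first four are mandatory and are what recruiters care about most. Honors/awards and organization/club experience can serve as extra evidence of an applicant's overall ability. A personal statement usually gets little attention; include it only if it helps your application.
Basic information: make sure the contact details you give can reach you promptly; otherwise you may miss important messages.
Education: if your grades were good, or your graduate advisor or lab is well known, you can highlight that on your resume.
Work/internship experience: a strong internship is a big plus; describe concretely your main responsibilities and results during the internship.
Research/project experience: as university research capabilities have grown, it has become increasingly common for graduating students to have research or project experience. For positions with high technical requirements (such as algorithm roles), research and project experience is also very important. Papers, patents, or industry-collaboration projects your lab took part in can all demonstrate your expertise in a particular field; if you have such experience, describe it in detail.

Project experience
1. The STAR method
Describing projects is the most important, and most troublesome, part of a resume. A good project description lets the reader see the applicant's contributions and abilities at a glance; a poor one leaves the impression that the applicant took part in something but does not quite know what they did.
The core goal of a project description is to answer three questions clearly:
(1) Why was this work done?
(2) What actions did you take to do it?
(3) What results did you ultimately achieve?
A well-known method for this is the STAR method. STAR is made up of the initial letters of Situation, Task, Action, and Result:
Situation: What was the background of the project?
Task: What was the project's task, and what goal did it need to reach?
Action: What solution did you adopt to accomplish the task?
Result: What results did you ultimately achieve? (Describe them quantitatively.)

2. A concrete example
Let's look at a real example (used with the author's permission). The author took part in a project researching quantization-aware training (QAT) for a pre-trained BERT network.
[Figure: the example project description, annotated with colored boxes]
Situation: The red box explains the project background: the model has a very large number of parameters, which makes it hard to deploy.
Task: The blue box states the project goal: quantize the model to speed up inference.
Action: The green box lists the concrete actions taken to reach the goal. These usually start with a verb, and the keywords that appear here will be the focus of discussion during the interview.
Result: The dark-blue box shows the final result: a 4x speedup with no loss of accuracy. When describing results, quantify them wherever possible; objective data is far more convincing than subjective description.
This example shows that the description uses few words, yet not a single sentence is superfluous. The logic is also very clear: the four STAR elements are covered in three bullet points. The counter-example is spending a long paragraph on a project and still failing to make clear what you actually did.

3. Three questions
When describing a project, check it against the STAR method to see whether every key point is clearly expressed. Then ask yourself the following three questions:
Is this an opinion or an action?
Is this my own action or someone else's?
Is this a general summary or a concrete event?
If every answer is the option highlighted in red in the original (an action, your own action, a concrete event), the project description is acceptable.

A resume presents a person's past experience, and the effort you put in day to day will pay off. We believe every younger schoolmate can show their full, 100-point ability on their resume and land the offer they want this campus-recruiting season.

Why we run this program
Alibaba promotes "three hours of public service per person" to inspire everyone's goodwill and to turn that goodwill into action through each person's professional skills. We formed a grassroots volunteer group called the "Burn, Youth!" Happiness Group (燃烧吧少年幸福团), hoping to find more "senior schoolmates" who share this goodwill and, drawing on their industry experience, professional practice, and life experience, can offer whatever information and help they can to university students and individual developers who care about job hunting and personal growth. I have also always felt that this kind of sharing and dialogue benefits both sides equally. Having the chance, amid our busy schedules, to meet young minds and to see so many talented people striving forward is also a kind of beauty and happiness.
- Wang Kexin, Alibaba, program volunteer

Besides resume coaching, the program also offers Alibaba internal referral opportunities (sign-up link): https://developer.aliyun.com/trainingcamp/6ab6d5ffa8964625ab41a941f97814fa?utm_content=g_1000328625

Copyright notice: This content was contributed voluntarily by a real-name registered Alibaba Cloud user, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and assumes no related legal liability. For specific rules, see the Alibaba Cloud Developer Community User Service Agreement and the Alibaba Cloud Developer Community Intellectual Property Protection Guidelines. If you find suspected plagiarism in this community, report it by filling in the infringement complaint form; once verified, the community will immediately remove the infringing content.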

March 30, 2022 · 1 min · jiezi

About algorithms: Over 27,000 products issued: the present and the years ahead remain a golden age for quantitative trading

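Recently, the Financial Stability and Development Committee of the State Council held a special meeting that comprehensively and proactively responded to the capital market's worries and concerns. Since then, market confidence has strengthened and sentiment has gradually stabilized.

Since 2017, domestic quantitative private funds have grown rapidly in the A-share market. As of March 28, there were 30 domestic quantitative private fund managers with more than 10 billion yuan under management, over 25% of the 115 private fund managers at that scale. Meanwhile, over the past five years, quantitative private funds issued a total of 27,317 products, 27.99% of them from managers with more than 10 billion yuan under management.

An improving trend
Several leading quantitative private fund managers recently said that, by controlling their size, iterating on strategies, and expanding their R&D teams, the excess returns of their quantitative equity products have recently picked up and model performance is trending better. Some leading managers have also invested in their own products to signal confidence in the market.

The market's strong recognition of domestic quantitative private funds shows in two ways: first, they have delivered excess returns; second, their assets under management have grown substantially.

Data show that domestic quantitative private funds issued 3,236 products in 2017, 2,620 in 2018, and 3,914 in 2019. Issuance picked up again in 2020 with 6,261 products, and in 2021 it passed the 10,000 mark for the first time with 10,430 products. So far this year, 856 products have been issued.

While quantitative private funds have delivered strong performance, their assets under management have grown in step. They now manage nearly 1.5 trillion yuan, a large increase over earlier levels.

Quantitative trading, represented by quantitative private funds, trades using statistical models built by mining historical data in order to earn excess returns. In recent years quantitative trading has been able to earn relatively high returns; the underlying reason is that the domestic A-share market is still at a stage of relatively large inefficient fluctuations and, compared with mature overseas capital markets, still offers more excess return. The present and the near future will therefore remain a golden age for quantitative investing.

The rapid growth of quantitative trading depends on several factors. The first is demand: after the 2018 new asset-management rules took effect, a large number of non-standard financial products matured and were converted, and high-net-worth clients actively sought high-quality investment vehicles, creating strong demand. The second is talent: many quantitative professionals returned from overseas, bringing advanced experience and technology to the domestic market. The third is technology: the rapid development of big data, artificial intelligence, and other IT technologies provided the technical foundation for quantitative trading. The fourth is capital: over the past two years quantitative private fund products have generally outperformed long-only equity funds, and a large amount of money has flowed into the quantitative sector.

Measures taken
Over the past two years, as quantitative private funds kept growing, some managers with more than 10 billion yuan under management closed their products to new money and lowered trading frequency to protect returns.

One domestic quantitative firm said that lowering frequency is meant to adapt to the current market style. Rapid short-term style shifts and faster sector rotation have caused many quantitative factors to stop working, and even whole models to fail. Against this background, high turnover that keeps switching investment direction may keep making mistakes, so moderately reducing frequency is a way of adapting to the market. High-frequency quantitative strategies in the market are currently highly homogeneous and competition between firms is fierce, so a moderate shift toward medium- and low-frequency strategies helps quantitative firms protect returns.

Compared with traditional trading, quantitative trading can be called "a trading machine without emotions": it is unaffected by sentiment, processes large amounts of information quickly, and places orders decisively and fast. Every model and every order is based on large amounts of data and computation.

Some say that quantitative traders are no longer suit-wearing "finance types" but are more like engineers. Each person watches several screens, but the screens usually show not stock quotes and candlestick charts but screens full of code. Most of them studied mathematics, physics, financial engineering, or other science and engineering fields, and they are highly educated and very smart.

Speed is a basic requirement of quantitative trading: strategies must execute fast, orders must be placed fast, and data must be collected and processed fast. Requirements on network and computing speed are also extremely high, and it is not uncommon for quantitative private funds to run supercomputers.

In addition, quantitative firms set a very high bar for talent, "searching the whole world for smart brains". They mainly want top engineers: a science or engineering degree from a "985" university is the baseline, while top departments at top universities, exceptional intelligence, and gold or silver medals in mathematics, physics, or computing competitions are what earn extra points.

A strong backstop
The epidemic across the country is at a critical stage, and prevention and control remain challenging. 非凸科技 (Non-Convex Technology) has taken multiple measures to safeguard its technical services during the epidemic and support the stable operation of institutions' systems. Whenever you need help, reach out to Non-Convex Technology.

Non-Convex Technology entered quantitative trading through algorithmic trade execution. With a complete algorithm R&D system and strong in-house software development capabilities, it supports brokerages, quantitative private funds, and many other large financial institutions with high-quality algorithmic trading solutions.

Non-Convex Technology values technical research and has built a high-efficiency, low-latency, highly reliable, fully in-memory high-frequency trading platform on the Rust ecosystem. It applies high-concurrency, low-latency design to every part of the trading system: from training models on massive data to backtesting, from market-data collection to real-time signal prediction to order execution, all implemented with industry-leading technology.

Non-Convex Technology values talent and firmly believes that people are the foundation of technology and the source of innovation. Its core team comes from top universities in China and abroad and has worked at well-known quantitative fund companies at home and abroad. They are computer scientists, mathematicians, quantitative experts, and more.

Non-Convex Technology hires on merit and gives you room to work freely. It offers competitive industry pay and incentives so that every effort is rewarded; helps you build rigorous, efficient, and meticulous work habits to pave the way for your future development; offers a comprehensive, in-depth career development program to help you become a pragmatic, ambitious engineer in tune with the times; and provides thoughtful care so you can enjoy your work every day.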
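Non-Convex Technology is a young fintech company with a flat structure and a candid, open, and harmonious atmosphere. It is also a free-spirited company with a diverse and inclusive working environment that encourages people from different backgrounds to exchange ideas and spark more innovation. Non-Convex Technology sincerely invites you to join us and build the future of quantitative trading together!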

March 30, 2022 · 1 min · jiezi

About algorithms: Exploring and implementing drop-down suggestions in Shopee Chatbot

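This article was first published on the WeChat official account "Shopee技术团队" (Shopee Tech Team).

Abstract
In mainstream applications such as search engines, shopping apps, and chatbots, drop-down suggestions effectively help users quickly find what they need, and they have become an essential, standard feature. This article describes how the Shopee Chatbot team built the drop-down suggestion feature for its Chatbot from scratch, and shares lessons from iterating on and optimizing the models.

In particular, to address the challenge of the many languages in Southeast Asian markets, we explored multilingual, multi-task pre-trained language models and applied them to vector recall for drop-down suggestions to improve recall quality. In addition, so that suggestions help users as much as possible and actually resolve their problems, we modeled two objectives jointly, user clicks and issue resolution, and explored multi-objective optimization.

1. Business background
1.1 Shopee Chatbot
As Shopee's business expands, consumer demand for customer-service inquiries keeps rising. The Shopee Chatbot team uses AI technology to combine the Chatbot with human customer-service agents: the Chatbot handles users' everyday inquiries, giving users a better experience, relieving pressure on human agents, and saving the company substantial staffing costs. We have launched the Chatbot in multiple markets. As shown in the figure above, users can try the Chatbot through the Mepage in the Shopee App.

We keep polishing Shopee Chatbot and enhancing its features to give users a better experience and help them solve problems they run into while shopping. Among the Chatbot's many features, drop-down suggestions are an important one.

1.2 Drop-down suggestions
Drop-down suggestions, also known as input suggestions, search suggestions, autocomplete, or question recommendation, have become an essential, standard feature in mainstream search engines, shopping apps, chatbots, and many other products. Roughly, while the user is typing a query, the feature displays suggestions semantically related to the input query for the user to choose from. This helps users express what they want to search for more quickly, and in turn find the content they need faster.

In Shopee Chatbot, we also wanted the Chatbot to offer drop-down suggestions, so that it can resolve users' problems faster and better and improve their shopping experience.

2. Overall approach
To give the current Chatbot drop-down suggestions, we borrowed from search and recommendation systems and used a recall + ranking pipeline, as shown in the figure below. For the user's current input, we find the most similar and most relevant suggestions and present them as recommendations. To do this, we needed to build a candidate suggestion pool, multi-channel recall, and a ranking module.

2.1 Candidate suggestion pool
2.1.1 Construction process
...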
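As a rough illustration of the recall + ranking flow described above, here is a toy Go sketch. It is not Shopee's implementation: it assumes a small in-memory candidate pool, uses case-insensitive substring matching as its only recall channel (the article describes multi-channel recall, including multilingual vector recall), and ranks with a hypothetical rule (prefix matches first, then by a made-up historical click count). All names are hypothetical.

```go
package main

import (
	"sort"
	"strings"
)

// suggestion is one entry of a hypothetical candidate pool.
type suggestion struct {
	text   string
	clicks int // hypothetical popularity signal used only for ranking
}

// recall returns the candidates whose text contains the user's input,
// ignoring case. A production system would merge several recall channels.
func recall(pool []suggestion, query string) []suggestion {
	q := strings.ToLower(strings.TrimSpace(query))
	var out []suggestion
	if q == "" {
		return out
	}
	for _, s := range pool {
		if strings.Contains(strings.ToLower(s.text), q) {
			out = append(out, s)
		}
	}
	return out
}

// rank sorts the recalled candidates in place (prefix matches first, then
// by clicks, highest first) and returns the texts of the top k.
func rank(cands []suggestion, query string, k int) []string {
	q := strings.ToLower(strings.TrimSpace(query))
	sort.SliceStable(cands, func(i, j int) bool {
		pi := strings.HasPrefix(strings.ToLower(cands[i].text), q)
		pj := strings.HasPrefix(strings.ToLower(cands[j].text), q)
		if pi != pj {
			return pi
		}
		return cands[i].clicks > cands[j].clicks
	})
	if len(cands) > k {
		cands = cands[:k]
	}
	out := make([]string, len(cands))
	for i, c := range cands {
		out[i] = c.text
	}
	return out
}
```

With a pool containing "Where is my order", "Cancel my order", "Order not received", and "Change address", the input "order" recalls the first three; "Order not received" ranks first as the only prefix match, followed by the remaining matches in click order. Splitting recall from ranking keeps recall cheap and broad while the (normally learned) ranker decides the final order.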

March 29, 2022 · 4 min · jiezi

About algorithms: CSCI120 Crossword Blackout

CSCI-1200 Data Structures, Spring 2021
Homework 6: Crossword Blackout

In this homework we will work with concepts inspired by crosswords; however, we will not be following all the rules that would be required by a proper “American-style” or “British-style” crossword. As such, you should read the entire handout carefully. Crosswords are quite popular, so there is lots of material available online. You may not search for, study, or use any outside code related to crossword puzzles. You are welcome to find additional puzzles to experiment with and try to solve by hand.

The basic goal is to consider a two-dimensional grid in which each square either contains a letter or is a black square, and to find one or all boards that meet some requirements (described below) and have all other squares blacked out. Any sequence of two or more adjacent letter squares (across or down) is considered a word. For a HW6 solution to be valid, it must contain no words shorter than 3 letters, and all words must be in a provided dictionary (more on this later). Words only run in two directions: across (left-to-right) and down (top-to-bottom). A letter in a square can belong to an across word, a down word, or both an across and a down word. A letter in a square is never part of two or more across words at the same time, or two or more down words at the same time. Finally, in a real crossword grid all the words must be “connected”, and black squares must be placed to ensure symmetry, but neither of those is a requirement for the baseline HW6.

A full-size example puzzle (left) and solution (right) might look like:
[Figure: full-size example puzzle and its solution]
Source: https://www.sporcle.com/games...

Crossword Blackout Arguments
Your program will accept four (baseline) or five (extra credit) command line arguments.
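The word rules above can be checked mechanically on a finished board. The Go sketch below (the course itself uses C++; the names `extractWords`, `inDict`, and `validBoard` are hypothetical) collects every maximal across and down run of two or more letters, then applies the baseline checks: no word shorter than 3 letters, every word in the dictionary, no word repeated, and word lengths matching the "+N" constraints one-to-one, as the "+4" example in the handout implies. It only validates a board; the recursive search that proposes boards, which the assignment requires, is not shown. It also uses a Go map for brevity, which corresponds to the STL containers the handout forbids.

```go
package main

// extractWords returns every maximal run of two or more letters, reading
// across (left to right) and then down (top to bottom). '#' marks a
// blacked-out square; rows are assumed to have equal length.
func extractWords(grid []string) []string {
	var words []string
	if len(grid) == 0 {
		return words
	}
	h, w := len(grid), len(grid[0])
	flush := func(run []byte) {
		if len(run) >= 2 { // a lone letter is not a word
			words = append(words, string(run))
		}
	}
	for r := 0; r < h; r++ { // across words
		var run []byte
		for c := 0; c < w; c++ {
			if grid[r][c] == '#' {
				flush(run)
				run = nil
			} else {
				run = append(run, grid[r][c])
			}
		}
		flush(run)
	}
	for c := 0; c < w; c++ { // down words
		var run []byte
		for r := 0; r < h; r++ {
			if grid[r][c] == '#' {
				flush(run)
				run = nil
			} else {
				run = append(run, grid[r][c])
			}
		}
		flush(run)
	}
	return words
}

// inDict reports whether word appears in the dictionary (linear scan).
func inDict(dict []string, word string) bool {
	for _, d := range dict {
		if d == word {
			return true
		}
	}
	return false
}

// validBoard applies the baseline rules to a finished board.
func validBoard(grid []string, dict []string, constraints []int) bool {
	words := extractWords(grid)
	if len(words) != len(constraints) {
		return false // word lengths must match the constraints one-to-one
	}
	remaining := map[int]int{} // unmatched constraints per word length
	for _, n := range constraints {
		remaining[n]++
	}
	for i, wd := range words {
		if len(wd) < 3 || !inDict(dict, wd) {
			return false
		}
		for _, prev := range words[:i] {
			if prev == wd {
				return false // a word may appear at most once
			}
		}
		remaining[len(wd)]--
		if remaining[len(wd)] < 0 {
			return false
		}
	}
	return true
}
```

For the board `CAT#` / `A###` / `R###`, the extracted words are CAT (across) and CAR (down); the isolated letters A and T are single squares and do not count as words.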
The extra credit will be explained near the end of the handout; for now we focus on the baseline HW6 requirements. Execution looks like:

    ./a.out [dictionary file] [initial grid file] [solution mode] [output mode] [gc]

Dictionary File
The dictionary file consists of words that only use upper-case letters, with one word per line. For a solution to be legal (allowed), all words in the solution must be in the dictionary file. Words in the dictionary are unique, and each word can appear at most once per solution.

Initial Grid File
The initial grid file describes the puzzle to work with. There are three types of lines that appear in the input. There are no spaces in any of the lines. You should not make any assumptions about the order in which the different types of lines will appear in the input. Black squares, represented by “#”, may appear in the input and output.
• Comments: Comment lines start with “!” and should be ignored by your program.
• Length constraints: Constraint lines start with “+” followed by a positive integer.
• Grid lines: Lines that are neither comments nor constraint lines are one row of the input puzzle’s grid. These are presented in order.
Every puzzle should have one or more constraints, each representing a required word length in the solution. Any legal solution must have one matching word per constraint, and every constraint must have a matching word.
For example, if “+4” appears twice in the input file, then all legal solutions must have exactly two 4-letter words.

Solution Mode
The solution mode will either be one solution, meaning you should only print up to one solution, or all solutions, meaning you should print all solutions that satisfy the inputs.

Output Mode
The output mode will either be count only, in which case you will only print the number of solutions you found (just the first line of output from the example in the section below), or print boards, in which case you should print the count and all solutions.

Output Formatting
An example output for a puzzle with 3 solutions is:

    Number of solution(s): 3

To ensure full credit on the homework server, please format your solution exactly as shown above. Solutions may appear in any order, but the first line must start with Number of solution(s): followed by a space and the number of solutions. Each solution should start with a line that says Board, followed by one row of the board per line, starting from the top row.

Additional Requirements: Recursion, Order Notation, & Extra Puzzles
You must use recursion in a non-trivial way in your solution to this homework. As always, we recommend you work on this program in logical steps. Partial credit will be awarded for each component of the assignment. Your program should do some error checking when reading in the input to make sure you understand the file format. IMPORTANT NOTE: This problem is computationally expensive, even for medium-sized puzzles with too much freedom! Be sure to create your own simple test cases as you debug your program.
Once you have finished your implementation, analyze the performance of your algorithm using order notation. What important variables control the complexity of a particular problem? The dimensions of the board (w and h)? The number of words in the dictionary (d)? The total number of spaces with a letter (l)? The total number of blacked out spaces (b)? The number of constraints (c)? Etc.
In your README.txt file, write a concise paragraph (< 200 words) justifying your answer. Also include a simple table summarizing the running time and number of solutions found by your program on each of the provided examples.
You should include 1-3 new puzzles, and at least 1 dictionary for each of your new puzzles, that either helped you test corner cases or experiment with the running time of your program. Make sure to describe these puzzles in your README.
You can use any technique we have covered in Lectures 1-14, Homework 1-5, and Lab 1-7. This means you cannot use STL pair, map, set, etc. on this homework assignment.
You must do this assignment on your own, as described in the “Collaboration Policy & Academic Integrity” handout. If you discussed this assignment, problem-solving techniques, error messages, etc. with anyone, please list their names in your README.txt file.
NOTE: If you earn 7 points on the homework submission server for tests 3 through 10 by 11:59pm on Wednesday, March 24, you may submit your assignment on Friday, March 26 by 11:59pm without being charged a late day.

Extra Credit
For extra credit, your program should function the same as in the baseline case when given four arguments. However, if a fifth argument, gc, is given, then you should only print boards that are a “giant component”. This means that, starting from any letter, you should be able to use a series of up, down, left, and right moves to reach all other letters in the board without having to go through a blacked out (“#”) square.
[Figure: a board that is legal for baseline HW6 but not valid for extra credit if gc is given as the 5th argument; there is no way to get from “LABRAT” to “RED” without touching a blacked out cell or using a diagonal movement.]
[Figure: a board that is legal for both baseline HW6 and extra credit; all letters can be reached by only going up, left, down, or right without having to use a blacked out cell.] ...
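The giant-component condition is a connectivity check over the letter squares. One way to sketch it in Go: run a breadth-first search from any letter, moving only up, down, left, or right and never entering a "#" square, then confirm that the search reached every letter. The function name is hypothetical, and treating a board with no letters as connected is an assumption of this sketch.

```go
package main

// isGiantComponent reports whether all letters on the board are connected
// through up/down/left/right moves that never enter a '#' square. It runs one
// breadth-first search from the first letter and checks that it reached every
// letter. A board with no letters is treated as connected (an assumption).
func isGiantComponent(grid []string) bool {
	h := len(grid)
	if h == 0 {
		return true
	}
	w := len(grid[0])
	letters, startR, startC := 0, -1, -1
	for r := 0; r < h; r++ {
		for c := 0; c < w; c++ {
			if grid[r][c] != '#' {
				letters++
				if startR < 0 {
					startR, startC = r, c
				}
			}
		}
	}
	if letters == 0 {
		return true
	}
	visited := make([][]bool, h)
	for r := range visited {
		visited[r] = make([]bool, w)
	}
	visited[startR][startC] = true
	queue := [][2]int{{startR, startC}}
	reached := 1
	moves := [4][2]int{{-1, 0}, {1, 0}, {0, -1}, {0, 1}} // no diagonals
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, d := range moves {
			r, c := cur[0]+d[0], cur[1]+d[1]
			if r < 0 || r >= h || c < 0 || c >= w || visited[r][c] || grid[r][c] == '#' {
				continue
			}
			visited[r][c] = true
			reached++
			queue = append(queue, [2]int{r, c})
		}
	}
	return reached == letters
}
```

A depth-first search would work equally well; BFS is used here only because it avoids deep recursion on large boards.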

March 29, 2022 · 6 min · jiezi

About algorithms: Exploring feed-ad algorithms from the advertiser's perspective

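Introduction: Unlike an ad platform, an advertiser cannot obtain much user-level exposure data, and on the advertiser side the features of new users from outside the app are unavailable. This talk focuses on how Hello (哈啰出行), as an advertiser, explored algorithmic modeling for feed ads under these challenges.

The talk covers four parts:
Current state of feed-ad delivery
Hello's business background
Algorithm optimization on the advertiser side
Future directions

Current state of feed-ad delivery
History
When scrolling through WeChat Moments, Douyin, or Toutiao, we have all seen feed-style ads. They are an ad format embedded in a media content stream, and their content includes images, image-and-text posts, videos, and so on. Their main characteristics are valuable content and nativeness. For the audience, feed ads can offer more substantive, valuable content rather than pure advertising, so the way they are embedded and presented does not disrupt the page's harmony, and the user experience is relatively good.

The left chart in the figure above shows the history of feed ads: they first appeared on Facebook in 2006, on Twitter in 2011, on Weibo in 2012, on Toutiao in 2014, and on WeChat Moments in 2015. In 2016 they entered a period of explosive growth, with Baidu, Kuaishou, UC, and others launching feed ads. The right chart shows the market shares of search ads, e-commerce ads, and feed ads: the share of feed ads grew every year from 2015 to 2021 and is expected to reach 40.8% in 2022. Most advertisers now use feed ads as a user-growth channel for acquiring new users and reactivating existing ones, so Hello also chose feed ads as its main channel for acquiring new users outside its own app.

Delivery workflow
The figure above shows the ad delivery workflow from two perspectives: the platform's and the advertiser's.

From the platform's perspective, when a user browses on the publisher's side, the publisher sends an ad request to the ADX (ad exchange, a real-time bidding marketplace). On receiving the request, the ad exchange in turn sends a request to the DSP (demand-side platform). On receiving this bid request, the DSP internally runs a series of steps, from traffic filtering to ad recall, then ranking, bidding, and so on. What people usually call ad algorithms today are mostly the recall, ranking, and similar algorithms embedded in the DSP, and these are relatively mature in the industry.

The advertiser's perspective refers to what an advertiser can do about online delivery. First comes a bidding mechanism: when an ad request arrives, we decide whether to bid for the user it carries. This mechanism considers four aspects: conversion status, user value, exposure status, and other intervention strategies. After the bidding mechanism comes the delivery mechanism, which concerns actual online delivery, including account setup, anomaly detection, data monitoring, and automated delivery.

Hello's business background
Next, some background on Hello's external ad delivery business.

Stages of Hello's external ad delivery
An advertiser's delivery capability typically develops through four stages:
The first is the exploration stage. The advertiser's business is just starting and needs ads to explore the market; all the advertiser needs to do is open an account directly on the platform.
After early exploration has validated the effectiveness of ads, the advertiser enters the second stage, initial delivery. The goal here is to capture the market quickly, so it places large volumes of ads and improves delivery efficiency. At this stage the advertiser needs technical support for conversion attribution, monitoring, data tracking, and so on.
After efficiency improves comes the development stage, whose goal is cost reduction. After large-scale delivery, customer acquisition costs keep rising, so traffic must be managed in a fine-grained way, which requires more technical capability to meet cost targets. Relatively mature tools here include DMPs and platform-provided audience-management APIs, including the marketing API mentioned later.
After these three stages comes the mature stage, whose goal is intelligence: end-to-end algorithms and automation with no manual involvement.
Hello is currently in the development stage, with relatively mature technical and data capabilities.

External delivery system architecture
The figure above shows Hello's external delivery system architecture. The three largest channels it integrates with are Ocean Engine (巨量引擎), Guangdiantong (广点通), and Kuaishou.

Because several platform APIs are involved, the server side has a unified API gateway, followed by a storage layer built on common industry components, including Redis, MySQL, HBase, and Elasticsearch. After the data layer comes the application layer, which has three main parts: the decision mechanism, automated operations, and the attribution mechanism. The decision mechanism is the focus of the rest of this talk, because that is mainly where algorithms operate.

Algorithm optimization on the advertiser side
The third part covers the algorithm optimization in the decision mechanism mentioned above, from three angles: the ad-plan level, the creative level, and the pre-bid assessment mechanism.

Ad-plan level
First, the ad-plan level.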
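The upper half of the figure above shows the whole path from a user's ad exposure to a completed order. Taking Hello's new-driver acquisition as an example, a new user must go through exposure, click (or 3-second view), registration as a Hello user, and submission of verification to become a driver before they can complete an order. For the advertiser, a completed order is the action that ultimately creates value. However, the large channels we work with all use the cost per verification submission as the acquisition cost. A current business pain point is that the rate from verification submission to completed order is low, roughly 20 to 30 percent. This is bad for the advertiser: it pays to acquire users who then create no value in the app, which is wasted budget.

The lower half of the figure re-explains this pain point using the platforms' ad-account structure. The common structure is that one account contains multiple ad groups, and each ad group contains multiple ad plans.

To illustrate the low rate from verification submission to completed order, look at the two ad plans in the red and blue boxes. The upper plan is low quality and the lower one is high quality. Suppose both have four users converting at the verification-submission step. In the upper plan only one user completes an order, a completion rate of only 25%, while the lower plan's completion rate reaches 75%. Clearly the lower plan is of higher quality than the upper one. We explored algorithmic approaches to this problem.

The first challenge is that the number of plans that gain volume online is small. Algorithmic modeling depends on data, so a small amount of usable data directly hurts the accuracy of subsequent models. The second challenge is that we cannot obtain detailed data from the ad platforms, such as exposures, clicks, and bids. To address both, we reframed the problem: from identifying plan quality, to identifying low-quality traffic, and then to predicting users' order-completion rates.

Because our users convert under ad plans, the original goal was to identify plan quality. Because of data volume and other reasons, we reframed it as identifying low-quality traffic. For an advertiser, low-quality traffic can be defined as traffic that creates no value, so the problem becomes predicting whether a user will complete an order after submitting verification. This makes the problem much simpler: although the user may be new from outside the app, once they submit verification we can obtain their in-app profile features, so there is enough data to solve the problem.

The figure above shows the modeling approach for the order-completion model, which consists of four parts: data analysis, sample construction, feature selection, and model training.

In data analysis, we found that for most users the interval from verification submission to a completed order is within seven days. If a user has not completed an order after seven days, they most likely never will, and become the low-quality traffic mentioned above. So in sample construction, positive and negative samples are built by whether the user completes an order within seven days after submitting verification. In the figure, submit_pt denotes the time the user submitted verification.

However, whether a driver completes an order is constrained by many external factors, and the sample size is also small. To better fit the business, we applied data augmentation: sampling was expanded from the user level to the order level. Specifically, after a user submits verification, each date on which the user visited or clicked on the order-posting page is used as an anchor, and the model predicts the probability of completing an order within the following seven days.

Next is feature selection. We used user features, context (environment) features, ad features, time features, and others. As a time feature, we used the interval between the user's conversion and their browsing.

For the model, following Occam's razor, we chose the simple and efficient LightGBM.

Creative level
In practice, because ad optimizers are unsure how ads will perform, they pile up many similar creatives under different accounts or plans to test their effect. As a result, there are many ineffective assets online that never gain volume but still incur small costs, wasting budget.

Moreover, because different accounts have different historical performance, the ad platforms' algorithms may score similar creatives differently. To address this, we built a model that predicts whether a new creative will gain volume, to guide the optimizers' subsequent adjustments. The factor that determines whether a creative gains volume is its quality score. Different channels emphasize different aspects of quality: as the table above shows, Ocean Engine may focus more on performance feedback, Guangdiantong on eCPM, and Baidu on targeting. Advertisers cannot influence targeting method or performance feedback, so they mainly influence eCPM. From the eCPM formula listed above, predicting whether a creative will gain volume depends mainly on CTR. So we listed three factors: targeting, the creative, and the account's "track record" (户口), meaning the account's historical performance, such as how many days it has been delivering online and its users' conversions and completed orders.

The figure above shows the challenges in building this model. The first, mentioned at the start, is limited data. The custom columns in the left figure show the data available to advertisers: plan budgets and the other data shown are mostly at the ad-plan level. Numeric information such as impression and conversion data is also relatively coarse-grained. As the right figure shows, we can only obtain a plan's spend, impressions, click-through rate, and so on. User-level details, such as exposures and auction participation, are unavailable to advertisers.

The second challenge is that a new creative only has the configuration it was just set up with and lacks subsequent delivery data.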
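We addressed these two problems mainly through sample construction and feature engineering. For the lack of delivery data for new creatives, we included both new and old creatives when building samples: new creatives let the model learn the importance of configuration features, while old creatives let it learn delivery features like those in the right figure. This way the model learns both creative- or plan-level configuration data and numeric delivery features.

Feature engineering mainly uses feature crosses to obtain more data, as shown below:

The figure above shows the feature engineering; feature crosses address the shortage of numeric data. On the left of the figure is the creative ID, and the boxed middle part is the core of the feature engineering, which has three parts:
The first part turns ID features into ID sequences using word2vec. A creative belongs to one plan, and a plan belongs to one ad group, so from account ID to ad-group ID to plan ID the relationships are one-to-many. A creative is made up of different assets, which include different videos, covers, titles, and so on. For these ID features we built ID feature sequences, unrolled them into text sequences, and converted them into vectors with word2vec.
The second part handles numeric delivery features. For the numeric delivery features and configuration-parameter features, we mainly crossed features across dimensions, for example crossing a creative ID with a plan ID to obtain the numeric features under that plan ID. After the various crosses, we obtain data features for different videos, covers, titles, and plan IDs.
The third part handles ad configuration-parameter features, which are the targeting parameters configured for delivery, such as delivery time, user targeting, and delivery city. The processing is similar to the second part: crosses yield the plan's and the creative's configuration features.
After all this feature processing, the model is trained. We ultimately chose a multi-class classification model. We initially tried regression, but its predictions were poor and its MSE was very high, so we reframed the task as multi-class classification, which performed much better and was much more accurate.

The figure above shows the overall model architecture, from input at the bottom to output at the top. The lower half summarizes the feature engineering described above. At the bottom are the feature inputs: the numeric, categorical, and ID features just mentioned. Numeric features are normalized and discretized, then embedded. Categorical features are also embedded. ID features are first unrolled into text sequences and then turned into vectors with word2vec. The embedding vectors and the word2vec vectors are fed into the model together, passed through a concat layer, and finally a softmax outputs the probability of each class.

The figure above compares manually operated accounts with algorithm-operated accounts: blue is manual, orange is algorithm-operated. As the figure shows, for both conversion cost and first-order cost, the algorithm-operated accounts improve considerably, lowering costs by roughly 10 to 20%, a clear effect.

Pre-bid assessment mechanism
This mechanism is more of an upstream strategy: when a user arrives, we decide whether to show them an ad at all, or otherwise adjust the user's quality score.

As the left figure above shows, the mainstream industry approach for reactivating existing users is RTB, used mostly in e-commerce. For new users, RTA is mainly used, because RTA is oriented more toward traffic blocking. Applicable to both existing and new users is the overlap in the middle, enhanced RTA, which mainstream media such as Tencent and Toutiao support through APIs. For this overlap, we built a reactivation model using a causal-inference uplift model.

When building samples we took user intent into account: users who converted under feed ads were chosen as positive samples, and users who converted organically as negative samples. The uplift score reflects a user's intent: whether they need an external ad incentive to convert, or are already inclined to convert on their own. In the formula, T indicates whether there was an ad intervention. Users are then bucketed into tiers 0 to 5 by uplift score: 0 is users who have already converted, whom we block directly and do not bid on; 1 is organic conversion; 2 to 4 are users with low, medium, and high marketing sensitivity; 5 is new users, for whom we have no data, so we return the highest user quality score. This mechanism enables tiered bidding by user value when buying traffic and has clearly reduced costs online.

Future directions
Below are our future plans in two areas: upstream strategy and online delivery.

For upstream strategy in new-user acquisition, the next goal is a more precise blocking model. Currently we only block users who have already converted in the app; once we have access to exposure data, we can mine it to craft strategies, such as a maximum number of exposures per user, for precise blocking. For reactivation, user delivery will focus more on RTB, because Hello's user base is large and there is enough data to support RTB.

The second direction is fully automated online delivery with closed-loop management: using algorithms to choose the best setup when building creatives and plans, reducing manual configuration; allocating budget across creatives and plans; and setting user targeting with the goal of maximizing ad-plan ROI.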
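The talk does not reproduce its uplift formula, but the standard definition, for user features $x$ and treatment indicator $T$ (ad shown or not), is the difference in conversion probability $\tau(x) = P(Y=1 \mid x, T=1) - P(Y=1 \mid x, T=0)$. The Go sketch below pairs that definition with the 0-5 tiering described above. The cutoffs separating tiers 1-4 are hypothetical parameters (the talk does not publish them), and mapping a near-zero uplift to tier 1 ("would convert organically") is an assumption; all names are made up.

```go
package main

// Tier values follow the 0-5 scheme described in the talk.
const (
	tierConverted = 0 // already converted in the app: blocked, no bid
	tierOrganic   = 1 // converts without the ad
	tierLow       = 2 // low marketing sensitivity
	tierMid       = 3 // medium marketing sensitivity
	tierHigh      = 4 // high marketing sensitivity
	tierNewUser   = 5 // no in-app data: gets the highest quality score
)

// uplift is the estimated effect of showing the ad:
// P(convert | x, T=1) - P(convert | x, T=0).
func uplift(pTreated, pControl float64) float64 {
	return pTreated - pControl
}

// userTier maps a user to a tier. lowCut < midCut < highCut are hypothetical
// thresholds on the uplift score u.
func userTier(isNew, converted bool, u, lowCut, midCut, highCut float64) int {
	switch {
	case isNew:
		return tierNewUser
	case converted:
		return tierConverted
	case u < lowCut:
		return tierOrganic
	case u < midCut:
		return tierLow
	case u < highCut:
		return tierMid
	default:
		return tierHigh
	}
}

// shouldBid reports whether to join the auction for a user in this tier.
func shouldBid(tier int) bool {
	return tier != tierConverted
}
```

In a tiered-bidding setup, the returned tier would then select a bid multiplier or quality score, which is how the talk describes buying traffic in steps by user value.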
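The large figure on the right can be seen as an overview of the whole plan. Its lower-left panel shows the algorithm capabilities to be built, including bid management, cross-channel management, RTB budget allocation, DPA, and so on. Its right panel lists the techniques supporting those capabilities, including uplift modeling, reinforcement learning, bringing mature industry CTR algorithms into the business scenario, and using computer-vision algorithms to innovate on ad assets and show different assets to different users.

Q&A
Q: If several business lines all run reactivation campaigns, how do you avoid bidding against each other and driving prices up?
A: Different businesses most likely target different audiences. For example, acquiring four-wheel (car) drivers targets people who own a car, while two-wheel business acquisition leans toward people without a car. Since different business lines target different users, the overlap should not be very large.

Q: Why do similar assets perform differently under different accounts?
A: The platform decides whether to give a plan or an asset more volume based on many factors. For example, an account that already performs well online has many user conversions; compared with an account that has just started gaining volume, the platform's emphasis will certainly differ. So the same asset is more likely to gain volume under a plan or account that has been running stably than under one that has just started.

Q: On the future-plans slide, is the prospect model for reactivation meant to be implemented through RTB rather than RTA?
A: Conceptually, RTB and RTA are not interchangeable: RTB is a real-time bidding framework, while RTA is just an interface. The slide mainly means integrating RTB's core capabilities into RTA, so that real-time bidding is achieved through the RTA interface.

(Author: Zhou Bingqian)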

March 28, 2022 · 1 min · jiezi

About algorithms: EE425X signal processing

Homework 2
EE425X - Machine Learning: A Signal Processing Perspective
Logistic Regression and Gaussian Discriminant Analysis

In this homework we are going to apply Logistic Regression (LR) and Gaussian Discriminant Analysis (GDA) to solve a two-class classification problem. The goal is to implement both correctly and figure out which one is better. To do this, you will first “learn” the parameters for each case using the training data (as discussed in class and available in the handouts). Then you will apply them to test data and evaluate the performance as explained below. The only change from the handout is that, for GDA, you need to assume that the covariance matrix is diagonal.

1 Synthetic Data Generation
Generate your own training data first. To do this, we use the GDA model, because it is the only one of the two that provides a generative model.
• Generating training data: Since we want to implement a two-class classification problem, let the class labels $y^{(i)}$ take two possible values, 0 or 1 (for $i = 1, \dots, m$, i.e., we have $m$ training samples). These are generated independently according to a Bernoulli model with probability $\phi$. Next, conditioned on $y^{(i)}$, the features $x^{(i)} \in \mathbb{R}^{n \times 1}$ are generated independently from a Gaussian distribution with mean $\mu_{y^{(i)}}$ and covariance matrix $\Sigma$. In other words, while generating $x^{(i)}$, use the same covariance matrix $\Sigma$ for both classes, but pick two different $\mu$’s: $\mu_0$ as the n-dimensional mean vector for data from class 0 and $\mu_1$ as the n-dimensional mean vector for data from class 1. Do this for all $i = 1, 2, \dots, m$.
• Generating test data: Do the same as above, but now generate $m_{test} = m/5$ samples.

2 Learning parameters using training data, and then testing the method on test data
• Write code to estimate the parameters for Logistic Regression and for GDA. For how to do it, please refer to the class handouts.
GDA was covered recently in the Generative Learning Algorithms handout. LR is covered in the first handout (Supervised Learning).
• For LR, you need to write gradient descent code to estimate $\theta$.
• For GDA, proceed as follows. The ONLY CHANGE from the handout is that we assume that $\Sigma$ is DIAGONAL and thus use the following formulas:
$$\phi = \frac{1}{m}\sum_{i=1}^{m} 1(y^{(i)} = 1), \qquad \mu_c = \frac{\sum_{i=1}^{m} 1(y^{(i)} = c)\, x^{(i)}}{\sum_{i=1}^{m} 1(y^{(i)} = c)}, \quad c \in \{0, 1\},$$
$$\Sigma_{jj} = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)}_j - \mu_{y^{(i)}, j}\right)^2, \quad j = 1, \dots, n,$$
while setting all non-diagonal entries of $\Sigma$ to zero. Here, $1(w = c)$ is the indicator function that evaluates to 1 when $w = c$ and 0 otherwise.
• Write code that uses the estimated parameters for each method and then classifies the test data as explained in the handout and in class. For GDA, we use Bayes' rule for classification: for each input query $x$, compute the output as
$$y(x) = \arg\max_{y \in \{0, 1\}} p(x \mid y)\, p(y).$$
• Evaluate accuracy: let us denote the test data as $D_{test}$. Report the accuracy of each method as
$$\text{Accuracy} = \frac{1}{|D_{test}|} \sum_{(x, y) \in D_{test}} 1\big(y(x) = y\big),$$
where $y(x)$ is the output of the classifier for input $x$, and $|D_{test}| = m_{test}$ is the number of test samples.
• Use $n = 100$ and $m = 20$. This means that for estimating each entry of $\mu$ or $\Sigma$ you have 20 samples. Generally speaking, we need on the order of $n^2$ samples to estimate all entries of $\Sigma$. However, since in this homework we assume that $\Sigma$ is a diagonal matrix, on the order of $n$ samples suffices.

3 Real Data
Next, use the MNIST dataset to evaluate both approaches on real data. MNIST is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting. The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. The entire dataset can be downloaded from here, but in this problem we only use samples corresponding to the two digits 0 and 9.
Use the code written in the previous part to classify the two digits 0 and 9 in MNIST using the Logistic Regression and Gaussian Discriminant methods.
You should have written code for part 2, so you do not need to rewrite anything except what you provide as training and test data. This is what we want to learn in this course: use simulated (synthetic) data to write and test code, make sure everything works as expected, and then use the same code on real data.
Please report the final classification accuracy and discuss how the accuracy obtained on the real data differs from that on the synthetic data.

4 What to turn in?
Submit a short report that discusses all of the above questions. Also submit your code with clear documentation. Grading will be based on the quality of the report and the accuracy of the implemented code. ...
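The logistic-regression half of part 2, together with the accuracy measure used to compare the methods on both the synthetic and the MNIST data, can be sketched in Go as follows. This is a minimal batch gradient-descent sketch: the step size `alpha`, the iteration count, the zero initialization, and the function names are hypothetical choices, not values from the handout. For MNIST, scaling pixels to [0, 1] first (also an assumption) keeps a fixed step size reasonable.

```go
package main

import "math"

// dot returns theta[0] + sum_j theta[j+1]*x[j]; theta[0] is the intercept.
func dot(theta, x []float64) float64 {
	z := theta[0]
	for j, v := range x {
		z += theta[j+1] * v
	}
	return z
}

// fitLR runs batch gradient descent on the average negative log-likelihood
// of logistic regression, starting from theta = 0.
func fitLR(X [][]float64, y []int, alpha float64, iters int) []float64 {
	theta := make([]float64, len(X[0])+1)
	for it := 0; it < iters; it++ {
		grad := make([]float64, len(theta)) // holds sum of (y - h) * x
		for i, x := range X {
			e := float64(y[i]) - 1/(1+math.Exp(-dot(theta, x))) // y - h_theta(x)
			grad[0] += e
			for j, v := range x {
				grad[j+1] += e * v
			}
		}
		for j := range theta {
			// (y - h) * x is the negative gradient of the NLL, so adding it
			// is a descent step.
			theta[j] += alpha * grad[j] / float64(len(X))
		}
	}
	return theta
}

// lrPredictor classifies x as 1 when h_theta(x) >= 0.5, i.e. theta.x >= 0.
func lrPredictor(theta []float64) func([]float64) int {
	return func(x []float64) int {
		if dot(theta, x) >= 0 {
			return 1
		}
		return 0
	}
}

// accuracy = (1/|Dtest|) * sum over the test set of 1(pred(x) == y).
func accuracy(pred func([]float64) int, X [][]float64, y []int) float64 {
	correct := 0
	for i, x := range X {
		if pred(x) == y[i] {
			correct++
		}
	}
	return float64(correct) / float64(len(X))
}
```

Because `accuracy` takes any `func([]float64) int`, the same function can score any classifier with that shape, including a GDA predictor, so both methods are evaluated identically.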

March 28, 2022 · 4 min · jiezi

About algorithms: Data Visualization (incarceration)

Assignment 3: Data Visualization (incarceration)
Due Feb 19 by 10pm · Points 100 · Submitting a text entry box

One of the sharpest manifestations of racism in the United States is its prison system. A complex set of social and political structures, including the over-policing (https://www.pewresearch.org/f...) of individuals of color and the war on drugs (https://www.drugpolicy.org/is...), have led to the disproportionate incarceration of people of color. These issues are very well summarized in the documentary 13th, which you can watch for free (https://www.youtube.com/watch...).

In this assignment, you will use your data analysis and visualization skills to expose patterns of inequality using incarceration data collected by the Vera Institute (https://www.vera.org/). You will practice producing visualizations of complex data as well as generating reports that include those visualizations.

Objectives
By completing this assignment you will practice and master the following skills:
• Rendering R Markdown files using knitr
• Producing data visualizations using the ggplot2 library
• Wrangling and shaping real-world data
• Generating visualizations of map-based data
• Exploring and drawing conclusions from visual data

Setup
Follow the link below to create your private code repo for this assignment. You will need to accept this assignment to create your code repo. Do not fork this repository!
https://classroom.github.com/... (https://classroom.github.com/...)
Note that you do not have to accept an invitation to the organization (you won't get one); once you've accepted the assignment you're done and can access it.
This repo will have the name info201b-wi21/a3-incarceration-yourusername, and you can view it online at https://github.com/info201b-w... (replacing yourusername with your GitHub user name).
2021/2/25 Assignment 3: Data Visualization (incarceration) https://canvas.uw.edu/courses...
After you've accepted the assignment, clone the repo to your local machine so you can edit the files. Make sure you don't clone it inside another repo!
Unlike previous assignments, the repo contains no starter code beyond a .gitignore file; you will need to generate the script files yourself (see below for details).
Start by creating a file analysis.R that will be your "main" program script; most of your code will go in here (and you will draw on that file in your R Markdown).
Remember that it is always a good idea to add and commit (and even push) your changes whenever you finish a section of an assignment or project!
The instructions for this assignment are written on this Canvas page, rather than as comments in a .R file. That means you'll need to consider on your own how to approach each question/step, how to name variables, and how to organize your code effectively. Note that we will be considering some aspects of coding style; see below for details.
Note that this write-up is structured around the data analysis, with details on how to present (report on) that analysis at the end. You can work on this assignment in any order you wish, but I encourage you to focus on the R code first, separate from the R Markdown report.

The Data
You can access the data for this assignment from this GitHub repository (https://github.com/verainstit...) (specifically, the CSV file incarceration_trends.csv, for county-level data). You can either load the data set directly from GitHub (preferred), or download the file into your assignment repo (into a data/ folder) to read locally.
This data file is big and messy. A few tips on working with this data:
• It may take a minute or two to load the data file because it is so large, so don't worry if it takes a while when you start the assignment. Note that your read.csv() call will need to go in your analysis.R file! Save this data in a variable called, e.g., incarceration_df.
• Read the documentation! You can find the details of this dataset in the "Codebook" (https://github.com/vera-insti...). This will help you know what variables are available, where they come from, and what they mean. You can also cross-check your work with the Vera Institute's own visual tool (http://trends.vera.org/incarc...).
• Watch out for missing values. You don't need to impute these (using na.rm = TRUE is fine), but they should inform your work and analysis.

Part 1: Data Description
In this first part you will do some basic wrangling of the data, as well as generate some summary statistics to include in your report. Again, remember that all your data wrangling should go in the analysis.R file!
Start by reading in the data set using read.csv(). Once you do so, you'll want to inspect the data to get a sense of what you have. Use functions such as colnames() and View() to see what it looks like. Note that because the data is so large, filtering for just a subset of the data to View (such as a particular year) can help a lot.
Once you have a sense of what the data will be, you need to calculate some descriptive statistical values for your report. These values must include: ...

March 28, 2022 · 22 min · jiezi

About algorithms: 131A, solutions to the difficult parts

COMPUTER SCIENCE 131A
OPERATING SYSTEMS
PROGRAMMING ASSIGNMENT 3
CARS AND TUNNELS PRIORITY SCHEDULER

Introduction
You will implement a priority scheduler that manages a collection of vehicles attempting to enter a set of tunnels. The purpose of this project is to expose you to the operating-system task of managing and allocating CPU processes. In this PA, you can think of Vehicles as CPU processes and Tunnels as CPU cores. The mechanism by which the CPU scheduler allows threads to run on cores is represented by the PriorityScheduler class admitting a Vehicle into a Tunnel.

Description
Most of the logic surrounding Vehicles and Tunnels is already implemented. However, two classes have been left unimplemented: BasicTunnel.java and PriorityScheduler.java. When solving this PA, you are only allowed to modify the submission package.

Vehicles
There are two types of Vehicles in this PA: Cars and Sleds. Each Vehicle has a priority and a direction. The PriorityScheduler and Tunnels use this information to allocate tunnel spaces to Vehicles. Both of these classes are already implemented, but more detailed information about their behavior can be found in Vehicle.java.

BasicTunnel
The BasicTunnel class contains the logic governing the Vehicle admittance policy. Although both types of Vehicles are functionally the same, the BasicTunnel policy distinguishes between them.
At all times, the BasicTunnel’s state must satisfy the following policy: ...

March 26, 2022 · 6 min · jiezi

About algorithms: CPSC 319 graph data structures

Assignment 4: Graphs
You are given an input adjacency matrix representing a directed graph in the following format:

    0 0 0 1 0
    0 0 1 0 1
    1 0 0 0 0
    0 1 0 0 1
    0 0 0 0 0

Nodes are to be indexed from 0 to n - 1, where n is the total number of nodes.
You are also given a query file that contains pairs of graph nodes. Your task is to determine if there is a path between these nodes in the graph and to record this path in the output file. You must also indicate if the path does not exist.
The format of the query file is start_node, end_node. For example:

    0 1
    4 2

The format of the output file is start_node, path_node, ..., path_node, end_node if the path is found, and start_node, -1, end_node if the path is not found. For the above example, the output file will be:

    0, 3, 1
    4, -1, 2

Create a Java program that performs the following steps: ...
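One way to answer the path queries is a breadth-first search from start_node that records each node's predecessor, then walks the predecessors back from end_node. The sketch is in Go for consistency with the other examples on this page, whereas the assignment asks for Java; BFS is a choice made here (the required steps are elided above), and it returns a path with the fewest edges. The function names are hypothetical. How to report a query whose start and end are the same node is not specified in the excerpt; this sketch prints just that node.

```go
package main

import (
	"strconv"
	"strings"
)

// findPath runs a breadth-first search over the adjacency matrix adj
// (adj[i][j] == 1 means an edge i -> j) and returns the node sequence from
// start to end, or nil when end is unreachable.
func findPath(adj [][]int, start, end int) []int {
	n := len(adj)
	prev := make([]int, n) // prev[v] is the node from which v was reached
	for i := range prev {
		prev[i] = -1
	}
	visited := make([]bool, n)
	visited[start] = true
	queue := []int{start}
	for len(queue) > 0 {
		u := queue[0]
		queue = queue[1:]
		if u == end {
			// Walk predecessor links back to start, then reverse.
			var path []int
			for v := end; v != -1; v = prev[v] {
				path = append(path, v)
			}
			for i, j := 0, len(path)-1; i < j; i, j = i+1, j-1 {
				path[i], path[j] = path[j], path[i]
			}
			return path
		}
		for v := 0; v < n; v++ {
			if adj[u][v] == 1 && !visited[v] {
				visited[v] = true
				prev[v] = u
				queue = append(queue, v)
			}
		}
	}
	return nil
}

// formatResult renders one output line, e.g. "0, 3, 1" or "4, -1, 2".
func formatResult(adj [][]int, start, end int) string {
	path := findPath(adj, start, end)
	if path == nil {
		path = []int{start, -1, end}
	}
	parts := make([]string, len(path))
	for i, v := range path {
		parts[i] = strconv.Itoa(v)
	}
	return strings.Join(parts, ", ")
}
```

On the sample matrix, the query 0 1 reaches node 3 first and then node 1, producing "0, 3, 1"; node 4 has no outgoing edges, so the query 4 2 produces "4, -1, 2", matching the expected output file.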

March 26, 2022 · 4 min · jiezi

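About algorithms: Baidu PaddlePaddle's PaddleHelix empowers biomedicine, advancing the exploration and application of AI in drug discovery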

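Amid the wave of digitalization, AI is becoming an important driver of high-quality development in the biomedical industry. On March 23, He Jingzhou, technical director of the Natural Language Processing Department at Baidu's Shenzhen R&D Center and head of the PaddleHelix (螺旋桨) biocomputing platform, gave a keynote titled "PaddlePaddle PaddleHelix Empowering Biomedicine: Exploring and Applying AI in Drug Discovery" at the AI for Science forum of the Synced (机器之心) AI Technology Annual Conference. He presented PaddleHelix's strategy and technical progress in biomedicine, as well as its results in real-world applications.

Bottlenecks in traditional drug discovery remain; pre-training may be a new direction for the industry
As the global biomedical market keeps growing, the input-output ratio of new-drug R&D keeps falling, and the long cycles, high investment, and high risk of drug development have become prominent problems. After trying biological experiments, traditional machine learning, and other methods, and faced with large amounts of unlabeled data, demanding generalization requirements, and the particular characteristics of biocomputing, drug R&D has finally turned to pre-trained models that combine self-supervised and multi-task learning while accounting for the characteristics of the objects studied in biology.

Turning to biocomputing, whose main research objects are compounds, DNA and RNA, and proteins, He Jingzhou noted that pre-trained models had already shown general AI capabilities, excellent image classification, and strong generative abilities in NLP, CV, cross-modal learning, and other AI fields. Building molecular, protein, and omics representation models on pre-training, and making them the foundation of biocomputing, will help solve the problems of applying traditional machine learning in biology.

Pre-trained models deliver: unlocking technical advantages across several dimensions
Based on pre-training, PaddleHelix has made major progress in compound representation and protein representation.

For compounds, the PaddleHelix team proposed a compound modeling method based on three-dimensional spatial structure, Geometry Enhanced Molecular Representation Learning (the GEM model). It was the first work worldwide to bring compounds' geometric structure into self-supervised learning and molecular representation models, and it achieved SOTA on more than ten downstream property prediction tasks, becoming another major published result from Baidu in AI-driven drug discovery.

Previous pre-training methods did not consider compounds' three-dimensional spatial structure, which is critical to their properties. Thanks to GEM's innovations in spatial-structure-based graph neural networks and self-supervised tasks at multiple geometric levels, the work was published in February this year in Nature Machine Intelligence, a Nature sub-journal.

For proteins, modeling techniques that represent proteins effectively are critical to predicting protein structure and protein-protein interactions (PPI). The PaddleHelix team shared related progress using its protein PPI representation model S2F as an example. Protein-protein interactions are closely tied to protein structure and function, which are hard to describe from protein sequences alone. PaddleHelix proposed building multimodal protein pre-training and applying it to PPI tasks. The model achieved SOTA results on cross-species PPI, antibody-antigen affinity prediction, SARS-CoV-2 antibody neutralization prediction, and prediction of mutation-driven changes in protein binding affinity, improving on other protein representation models by 5% to 10%.

At the conference, the PaddleHelix team also revealed that, based on the PaddlePaddle framework and working with several domestic supercomputing centers, PaddleHelix fully adapted and ran AlphaFold2's training and inference code on domestic software and hardware, cutting the initial-training time on a protein dataset of tens of millions of samples from AlphaFold2's 7 days to 2.6 days. The code will also be open-sourced on the PaddleHelix platform at the end of March, giving domestic researchers more options. ...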

March 25, 2022 · 1 min · jiezi

On algorithms: SOTA results with one-click prediction: PaddleNLP takes you through 11 kinds of NLP tasks

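In October 2021, PaddleNLP brought together many of Baidu's in-house NLP algorithms and community open-source models. Building on upgrades to the PaddlePaddle core framework, it opened up a ready-to-use, highly optimized, high-performance one-click prediction feature that developers welcomed. Over the following year the team kept refining it, releasing models, scenarios, and inference acceleration and deployment capabilities suited to industrial use. It has drawn steady attention from developers on GitHub, Papers With Code and other platforms.

Recently, PaddleNLP's one-click prediction feature, which averages 19,000 model downloads per month, received a major upgrade with richer features, better results and easier use. Let's take a look.

Richer features

Full-scenario support: it covers eleven classic tasks across NLU (Natural Language Understanding) and NLG (Natural Language Generation): Chinese word segmentation, part-of-speech tagging, named entity recognition, syntactic parsing, Chinese knowledge tagging, text error correction, text similarity, sentiment analysis, generative question answering, intelligent poetry writing, and open-domain dialogue.

Document-level input: it is the first ready-to-use NLP tool to accept document-level input. This works around pre-trained models' limits on input length and greatly reduces the code users must write for long texts.

Customized training: besides direct prediction, you can run customized training on your own dataset. After passing in a custom model path, you can still use one-click prediction.

Industrial-grade results

PaddleNLP draws on two sources. The first is Baidu's years of business experience and leading open-source results in language and knowledge: the lexical analysis tool LAC, the dependency parser DDParser, the sentiment analysis system Senta, the ERNIE (文心) model family, the open-domain dialogue pre-trained model PLATO, and the text-knowledge linking framework 解语. The second is strong Chinese pre-trained models from the open-source community, such as CPM. Experiments show that PaddleNLP outperforms comparable open-source products across the board.

Word segmentation: PaddleNLP integrates the jieba and LAC segmenters. It also introduces a segmentation mode based on 解语, the first knowledge base covering all Chinese word classes (an encyclopedic knowledge tree plus a knowledge-tagging framework). Its entity-granularity segmentation is more accurate and keeps semantic fragments intact, a clear advantage in applications such as knowledge-graph construction.

In the example sentence above, PaddleNLP accurately segments entity words such as "北京冬奥会" (Beijing Winter Olympics) and discovers new domain words such as "自由式滑雪" (freestyle skiing). On open-source datasets, its segmentation is clearly better than comparable tools.

Note: the metrics in the table were obtained after fine-tuning each tool on each dataset, because there is no unified standard for segmentation results. For example, the WEIBO dataset treats "总冠军" (overall champion) as one word, while the MSR dataset splits it into "总 冠军". Fine-tuning lets all tools be compared under the same segmentation standard.

Named entity recognition offers two modes:
1. Fast mode, based on Baidu's lexical analysis tool LAC. Its training corpus contains nearly 22 million sentences covering many domains.
2. Accurate mode, based on Baidu's 解语. It has the most complete set of Chinese entity labels of any NER tool and suits both general domains and vertical domains such as biomedicine and education. It uses 66 part-of-speech and proper-name category labels; comparable products have about 15.

In accurate mode, the entity labels are rich and some categories are divided more finely. This helps precise information extraction, knowledge-graph construction and enterprise search. In the example figure above, "北京冬奥会" is recognized as "文化类_奖项赛事活动" (culture: awards, competitions and events) rather than "nz" (other proper noun), which separates it from other "culture" entities. "自由式滑雪" is also recognized in full as an "event" entity.

We compared PaddleNLP's proper-name recognition with other tools on open-source datasets in general and vertical domains. Both the fast and accurate modes far outperform comparable tools, as shown in the left figure below:

Note: for vertical domains, 100 samples were randomly drawn from finance, law and economics, and accurate mode was evaluated manually. As the right figure above shows, PaddleNLP's entity extraction is clearly better than comparable tools (Good: PaddleNLP is better).

Dependency parsing: the parser was developed on the largest known Chinese dependency treebank, with nearly 1 million sentences. It uses 14 annotation relations, including SBV (subject-verb) and VOB (verb-object).

Sentiment analysis: PaddleNLP integrates SKEP, Baidu's in-house sentiment-knowledge-enhanced pre-trained model. SKEP builds its pre-training objectives from sentiment knowledge and is pre-trained on massive Chinese data, giving a unified and powerful sentiment representation for all kinds of sentiment analysis tasks.

Text similarity: 22 million pairs of similar sentences were collected from Baidu Zhidao and used to train a text-similarity model with SimBERT [1]. The model achieves leading results on multiple datasets.

Text error correction: ERNIE-CSC is an end-to-end Chinese spelling correction model that adds pinyin features on top of the ERNIE pre-trained model. It achieves SOTA results on the SIGHAN dataset.

Open-domain dialogue: this is the first Chinese multi-turn open-domain dialogue prediction API, and it also supports fun applications such as generative QA and poetry writing. The PLATO-MINI model behind it was pre-trained on billion-scale Chinese dialogue data and performs well in chit-chat.

Generative QA and poetry writing: these are based on CPM [2], a strong Chinese pre-trained model from the open-source community, with 2.6 billion parameters and 100 GB of Chinese pre-training data.

Simple and easy to use

Call PaddleNLP's Taskflow API and pass in a task name. The best preset model is selected automatically and inference runs in a highly optimized way.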

March 25, 2022 · 1 min · jiezi

关于算法:CS-325-问题求解

Homework 6

Problem 1 (6 points)

Shortest paths can be cast as an LP that uses the distances d_v from the source s to each vertex v as variables. We can compute the shortest path from s to t in a weighted directed graph by solving a linear program over these variables.

Use linear programming to answer the questions below. State the objective function and constraints for each problem and include a copy of the LP code and output.

a) Find the distance of the shortest path from vertex 0 to vertex 7 in the graph below.
b) Find the distances of the shortest paths from vertex 0 to all other vertices.

Acme Industries produces four types of men's ties using three types of material. Your job is to determine how many of each type of tie to make each month. The goal is to maximize profit, where profit per tie = selling price − labor cost − material cost. Labor cost is $0.75 per tie for all four types of ties. The material requirements and costs are given below.

Material     Cost per yard   Yards available per month
Silk         $20             1,000
Polyester    $6              2,000
Cotton       $9              1,250
...
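The problem statement does not show the LP itself, so here is the standard textbook formulation (as in CLRS, Section 29.2) for reference. It uses distance variables d_v and edge weights w(u, v). Each constraint caps d_v at the length of a path through u, so every feasible d_v is at most the true distance. Maximizing therefore pushes the variables up to the shortest-path distances. Program (a) targets a single vertex t. Program (b) is one common way to get all distances at once, assuming every vertex is reachable from s and there are no negative-weight cycles (otherwise the LP is unbounded or infeasible).

```latex
\begin{aligned}
&\text{(a)}\quad \max\; d_t
  &&\text{s.t. } d_v \le d_u + w(u,v)\ \ \forall (u,v) \in E, \qquad d_s = 0 \\
&\text{(b)}\quad \max\; \textstyle\sum_{v \in V} d_v
  &&\text{s.t. } d_v \le d_u + w(u,v)\ \ \forall (u,v) \in E, \qquad d_s = 0
\end{aligned}
```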

March 25, 2022 · 5 min · jiezi

关于算法:CSC148H1-算法比较分析

2021/2/21 Assignment 1: CSC148H1 S (All Sections) 20211: Introduction to Computer Science

Assignment 1: An Experiment to Compare Algorithms for Parcel Delivery

Due date: Tuesday, March 2, 2021 before 1:00 pm sharp, Toronto time.

You may complete this assignment individually or with one partner.

Learning Goals

By the end of this assignment you will be able to:
- read complex code you didn't write and understand its design and implementation, including:
  - reading the class and method docstrings carefully (including attributes, representation invariants, preconditions, etc.)
  - determining relationships between classes, by applying your knowledge of composition and inheritance
- complete a partial implementation of a class, including:
  - reading the representation invariants to enforce important facts about implementation decisions
  - reading the preconditions to factor in assumptions that they permit
  - writing the required methods according to their docstrings
- use inheritance to define a subclass of an abstract parent class
- design a class, given its name and purpose, including:
  - designing an appropriate interface
  - weighing some options and choosing an appropriate data structure to implement a class
  - recording important facts about an implementation decision using representation invariants
- make reasonable decisions about which classes should be responsible for what
- use an ADT to solve a problem, without having to think about how it is implemented
- perform unit testing on a program with many interacting classes

Please read this handout carefully and ask questions if there are any steps you do not understand.

The assignment involves code that is larger and more complex than Assignment 0. If you find it difficult to understand at first, that is normal – we are stretching you to do more challenging things. Expect that you will need to read carefully, and that you will need to go back over things multiple times.

We will guide you through a sequence of tasks, in order to gradually build the pieces of your implementation.
Note that the pieces are laid out in logical order, not in order of difficulty.

Coding Guidelines

These guidelines are designed to help you write well-designed code that maintains the interfaces we have defined (and thus will be able to pass our test cases).

For class DistanceMap, all attributes must be private.

For all other classes except Parcel and Truck, you must NOT:
- change the interface (parameters, parameter type annotations, or return types) to any of the methods you have been given in the starter code. Note: in the scheduler subclasses, you are permitted to change the interface for the inherited initializer, if needed.
- change the type annotations of any public or private attributes you have been given in the starter code.
- create any new public attributes.
- create any new public methods except to override the schedule method that your classes Greedy and Random inherit from their abstract parent.
- add any more import statements to your code.

You will be designing classes Parcel and Truck, so we make these exceptions:
- You will be designing the attributes for classes Parcel and Truck, and are allowed to make some or all of them public. Use your judgment.
- You may add public methods to Parcel and Truck.

You may:
- remove unused imports from the typing module. (We have included those that we think you might want to use in DistanceMap, Parcel and Truck.)
- create new private helper methods for the classes you have been given.
  - If you do create new private methods, you must provide type annotations for every parameter and return value. You must also write a full docstring for such methods, as described in the Function Design Recipe.
- create new private attributes for the classes you have been given.
  - If you do create new private attributes, you must give them a type annotation and include a description of them in the class's docstring as described in the Class Design Recipe.

Exception: In class PriorityQueue we have defined all the attributes needed.
Do not define any new attributes, even private.

You may assume that all arguments passed to a method or function will satisfy its preconditions.

All code that you write should follow the Function Design Recipe and the Class Design Recipe.

Introduction

Consider what happens when a delivery company like FedEx or Purolator receives a plane load of parcels at Pearson Airport to be delivered by truck to cities all over southern Ontario. They have to schedule the parcels for delivery by assigning parcels to trucks and determining the route each truck will take to make its deliveries. Depending on how well these decisions are made, trucks may be well-packed and have short, efficient routes, or trucks may not be fully filled and may have to travel unnecessary distances.

For this assignment, you will write code to try out different algorithms to perform parcel scheduling and compare their performance.

Problem description

Be sure to read through this section carefully. Your Python code must accurately model all of the details described here.

The parcel delivery domain

Each parcel has a source and a destination, which are the name of the city the parcel came from and the name of the city where it must be delivered to, respectively. Each parcel also has a volume, which is a positive integer, measured in units of cubic centimetres (cc).

Each truck can store multiple parcels, but has a volume capacity, which is also a positive integer and is in units of cc. The sum of the volumes of the parcels on a truck cannot exceed its volume capacity. Each truck also has a route, which is an ordered list of city names that it is scheduled to travel through.

Each parcel has a unique ID, that is, no two parcels can have the same ID.
Each truck also has a unique ID, that is, no two trucks can have the same ID.

Depot

There is a special city that all parcels and trucks start from, and all trucks return to at the end of their route. We'll refer to this city as the depot. (You can imagine all parcels have been shipped from their source city to the depot.) Our algorithms will schedule delivery of parcels from the depot to their destinations.

You may assume that no parcels have the depot as their destination.

There is only one depot.

Truck routes

All trucks are initially empty and only have the depot on their route (since it is their initial location). A truck's route is determined as follows: when a parcel is scheduled to be delivered by a truck, that parcel's destination is added to the end of the truck's route, unless that city is already the last destination on the truck's route.

Example: Consider a truck at the start of the simulation, and suppose that the depot is Toronto. The truck's route at that point is just Toronto. Now suppose parcels are packed onto that truck in this order:
- a parcel going to Windsor. The truck's route is now Toronto, Windsor.
- another parcel going to Windsor. The truck's route is unchanged.
- a parcel going to London. The truck's route is now Toronto, Windsor, London.
- a parcel going to Windsor. The truck's route becomes Toronto, Windsor, London, Windsor. (Yes, this is a silly route, so the order in which we pack parcels onto trucks is going to matter.)

Whatever its route, at the end, a truck must return directly to the depot.

Scheduling parcels

You will implement two different algorithms for choosing which parcels go onto which trucks, and in what order. As we saw above, this will determine the routes of all the trucks.

(1) Random algorithm

The random algorithm will go through the parcels in random order. For each parcel, it will schedule it onto a randomly chosen truck (from among those trucks that have capacity to add that parcel).
Because of this randomness, each time you run your random algorithm on a given problem, it may generate a different solution.

(2) Greedy algorithm

The greedy algorithm tries to be more strategic. Like the random algorithm, it processes parcels one at a time, picking a truck for each, but it tries to pick the "best" truck it can for each parcel. Our greedy algorithm is quite short-sighted: it makes each choice without looking ahead to possible consequences of the choice (that's why we call it "greedy").

The greedy algorithm has two configurable features: the order in which parcels are considered, and how a truck is chosen for each parcel. These are described below.

Parcel order

There are four possible orders that the algorithm could use to process the parcels:
- In order by parcel volume, either smallest to largest (non-decreasing) or largest to smallest (non-increasing).
- In order by parcel destination, either smallest to largest (non-decreasing) or largest to smallest (non-increasing). Since destinations are strings, larger and smaller is determined by comparing strings (city names) alphabetically.

Ties are broken using the order in which the parcels are read from our data file (see below).

Truck choice

When the greedy algorithm processes a parcel, it must choose which truck to assign it to. The algorithm first does the following to compute the eligible trucks: ...
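Two rules from this handout can be sketched in a few lines: the route-update rule for trucks, and the four greedy parcel orders with file-order tie-breaking. The sketch below is not the starter code. The class and attribute names (`pid`, `tid`, `pack`, `order_parcels`) are hypothetical, and the truck-choice step is omitted because the excerpt is cut off before it is specified. Tie-breaking comes for free because Python's `sorted()` is stable, and stays stable with `reverse=True`, so parcels with equal keys keep their input order (assumed to be the data-file order).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parcel:
    pid: int           # hypothetical attribute names, not the starter code's
    destination: str
    volume: int        # in cc


@dataclass
class Truck:
    tid: int
    capacity: int      # in cc
    depot: str
    parcels: List[Parcel] = field(default_factory=list)
    route: List[str] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # Every truck starts empty, with only the depot on its route.
        self.route = [self.depot]

    def fits(self, parcel: Parcel) -> bool:
        used = sum(p.volume for p in self.parcels)
        return used + parcel.volume <= self.capacity

    def pack(self, parcel: Parcel) -> bool:
        """Load parcel if it fits; return False and change nothing otherwise."""
        if not self.fits(parcel):
            return False
        self.parcels.append(parcel)
        # Append the destination unless it is already the last stop.
        if self.route[-1] != parcel.destination:
            self.route.append(parcel.destination)
        return True


def order_parcels(parcels: List[Parcel], key: str,
                  descending: bool = False) -> List[Parcel]:
    """Sort by 'volume' or 'destination'; equal keys keep input (file) order."""
    return sorted(parcels, key=lambda p: getattr(p, key), reverse=descending)


truck = Truck(1, 100, "Toronto")
for i, city in enumerate(["Windsor", "Windsor", "London", "Windsor"]):
    truck.pack(Parcel(i, city, 10))
print(truck.route)  # ['Toronto', 'Windsor', 'London', 'Windsor']
```

The usage lines replay the handout's Toronto example and produce the same "silly" route.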

March 25, 2022 · 20 min · jiezi

On algorithms: Putting causal inference into practice at Hellobike (哈啰出行)

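Introduction: causal inference is widely used in intelligent marketing. In Hellobike's hotel marketing business, an in-house tree-based causal inference model modifies the splitting criterion so that the model's objective matches the business objective much more closely. It has performed well in the intelligent subsidy module. This article covers:
- background: a brief introduction to intelligent subsidies and some of their problems;
- applications of causal inference, mainly uplift modeling, i.e., incremental estimation;
- how causal inference is applied in Hellobike's intelligent subsidy scenario, including some innovations;
- reflections and future plans.

Background

The "red envelope from the sky" scenario

Hellobike's core business is two-wheeled shared mobility, including bicycles and e-bikes. It also offers services and products built around user needs, such as a hotel business.

Take hotel promotions as an example: the goal is to maximize total utility. To raise it, we usually subsidize users with coupons to encourage conversion.

Intelligent subsidies

Drilling down to the user level, the figure on the left uses the classic four-quadrant marketing segmentation. It splits users into four groups along two dimensions, "coupon or not" and "purchase or not":
- Persuadables are price-sensitive: they buy only with a discount.
- Sure things buy whether or not there is a discount.
- Lost causes do not buy whether or not there is a discount.
- Sleeping dogs buy when no promotion reaches them but stop buying when one does, because they dislike marketing.

Subsidies use coupons to convert users who would not otherwise buy, thereby raising total utility, so our job is to make subsidies more efficient. Clearly, our target group is the persuadables.

Users accept prices differently, and with a huge user base it is impossible to give everyone a coupon.

At first, operations staff set distribution strategies from experience and user needs. Later, algorithms took over, using machine learning to estimate users' purchase probability. The first model was a CTR prediction model, i.e., a correlational response model. The model strategy improved on the operations strategy, but it still had problems.

Consider a concrete case. The left table assumes two users. The response model gives each user's purchase probability with a coupon, and we also know their purchase probability without one. A coupon raises both users' purchase probabilities, and user 2's (1.5%) ends up higher than user 1's (1.3%). On this basis, should we give the coupon to user 2?

Our goal is to improve subsidy utility, so we must also look at efficiency. In the figure on the right, suppose each user type has 10,000 users, the product's original price is 10 yuan, and the coupon is worth 2 yuan. Which coupon policy maximizes efficiency? Expected utility is the number of users × purchase probability × price paid, which gives:
- no coupons: total utility 2,200;
- coupons for both: 2,240;
- no coupon for user 1, coupon for user 2: 2,000;
- coupon for user 1, no coupon for user 2: 2,440.

Following the response model (no coupon for user 1, coupon for user 2) yields only 2,000, less than issuing no coupons at all. This extreme example is chosen to make the counterintuitive result easy to see, but it shows that the response model's choice does not maximize subsidy utility.

Now look at how much each user's purchase probability changes with a coupon, shown in the new Uplift column. The coupon raises user 1's purchase probability by 0.5% and user 2's by only 0.1%, so its effect differs by user. The response model predicts purchase probability, but it cannot tell us whether a user buys because of the coupon, so it cannot identify persuadables.

The table shows how the response model relates to causal inference. Issuing a coupon and purchasing are causally related, which calls for causal inference.

Correlational models are built from observed outcomes (what was observed and what the purchase probability was). They are used mainly in search, advertising and recommendation. In the subsidy scenario, however, issuing a coupon is an intervention on the user, and the outcome under the alternative choice cannot be observed. We call this the counterfactual: with the intervention in place, what would have happened without it? For example, what would the outcome have been had we not issued the coupon? This is the question causal inference studies.

In marketing, causal inference is used mainly for incremental estimation, i.e., uplift modeling, which identifies persuadables by the size of the increment.

Causal inference: uplift modeling

Next we look at uplift modeling: common modeling methods, then offline evaluation methods.

We have seen that correlation differs from causation; here is another example. Data-statistics websites show many amusing correlations. One compares the number of films Nicolas Cage appeared in each year with the number of people who drowned in swimming pools that year. The two curves overlap almost perfectly. Can we conclude that Nicolas Cage's films cause drownings? Obviously not. Many such examples exist, and they all show that correlation is not causation.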
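The utility arithmetic in the two-user coupon example can be checked with a short sketch. The no-coupon purchase probabilities (0.8% for user 1, 1.4% for user 2) are assumptions: they are inferred by subtracting the stated uplifts (0.5% and 0.1%) from the with-coupon probabilities (1.3% and 1.5%). With these values the sketch reproduces the stated totals of 2,200, 2,240 and 2,000. Probabilities are kept in basis points so the arithmetic stays exact in integers.

```python
from itertools import product

USERS_PER_GROUP = 10_000
PRICE = 10   # yuan
COUPON = 2   # yuan
# (without coupon, with coupon) purchase probability in basis points (1 bp = 0.01%).
# The without-coupon values are inferred from the uplift column (assumption).
PROB_BP = {"user1": (80, 130), "user2": (140, 150)}


def expected_utility(policy: dict) -> int:
    """policy maps a user group to True if that group receives a coupon."""
    total = 0
    for group, gets_coupon in policy.items():
        bp = PROB_BP[group][1 if gets_coupon else 0]
        buyers = USERS_PER_GROUP * bp // 10_000
        total += buyers * (PRICE - COUPON if gets_coupon else PRICE)
    return total


# Keys are (coupon for user 1?, coupon for user 2?).
results = {
    (c1, c2): expected_utility({"user1": c1, "user2": c2})
    for c1, c2 in product([False, True], repeat=2)
}
uplift_bp = {g: with_c - without for g, (without, with_c) in PROB_BP.items()}
best_policy = max(results, key=results.get)
print(results)      # {(False, False): 2200, (False, True): 2000, (True, False): 2440, (True, True): 2240}
print(best_policy)  # (True, False)
```

Ranking users by uplift (user 1: 50 bp, user 2: 10 bp) instead of by with-coupon probability leads directly to the best policy: coupon for user 1 only.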
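Causal inference

Back to the main topic. Two formulas capture the main difference between the response model and the uplift model. The response model estimates the user's purchase probability, P(Y = 1 | X). The uplift model estimates the change in purchase probability caused by an intervention, P(Y = 1 | X, T = 1) − P(Y = 1 | X, T = 0). In marketing, causal inference is applied mainly through uplift modeling, which predicts the gain brought by a marketing intervention.

Causal effect

How does an uplift model estimate increments? We start from the concept of a causal effect. Suppose there are n users. Let Y_i1 be the outcome for user i with the intervention and Y_i0 the outcome without it. The causal effect for user i is then Y_i1 − Y_i0. In our scenario, Y1 and Y0 correspond to issuing and not issuing a coupon. Note the counterfactual problem, however: for the same user we can never observe both outcomes. We discuss current approaches to this later. Uplift modeling targets the causal effect, which in our application is the incremental gain from issuing coupons.

Uplift modeling methods

Next come the common uplift modeling methods. The modeling data contain both treated (intervention) and untreated samples.

T-Learner ("T" for two, i.e., two models) models the treated and untreated data separately. At prediction time, each sample goes through both models, and the difference between the two predictions is the estimated increment. It is simple, intuitive and quick to implement. Its drawbacks: neither model is necessarily very accurate, so their errors add up, and the increment is obtained only indirectly through subtraction.

S-Learner ("S" for single, i.e., one model) feeds the intervention into the model as a feature. At prediction time, the increment is again the difference between the predictions with and without the intervention. It resembles a response model with an extra "treated or not" feature. It accumulates less error than the T-Learner but still computes the increment indirectly.

X-Learner works in stages:
- Model the treated and untreated data separately.
- Use the two models to cross-predict, obtaining counterfactual outcomes for the treated and the untreated data.
- Subtract each predicted counterfactual from the true label (these are training data, so true labels exist) to get an increment.
- Use that increment as a new label and model it directly.
- Separately, model treatment versus no treatment to obtain a propensity score, and use it to weight the increment predictions.
Its advantages are that prior knowledge can be added when modeling the increment, improving accuracy, and that the propensity weights reduce prediction error. Its drawbacks are that errors from several models may still accumulate and that the increment is still obtained indirectly in the end.

All of the above obtain the increment indirectly. Other methods model the increment directly, such as the decision-tree-based Tree-based Model, and, in recent years, deep learning methods such as DragonNet. For lack of time, they are not covered in detail here.

Evaluation methods

How should an uplift model be evaluated? Because of the counterfactual problem, there are no true uplift labels. Traditional metrics such as AUC, accuracy and RMSE rely on true labels, so they cannot be used here. The offline metric for uplift models is AUUC, the area under the uplift curve; in the upper-right figure, it is the integral of the blue uplift curve. How is the uplift curve itself computed? The steps below walk through the formula.

Step 1: feed the test set into the model and output uplift scores.
Step 2: sort all test samples by uplift score in descending order.
Step 3: divide them into buckets, numbering each bucket t.
Step 4: compute the cumulative gain for each t over the samples in the first t buckets:
f(t) = (N_T + N_C) × (Y_T / N_T − Y_C / N_C)
Here Y is the number of positive samples in a group, T denotes the treated group and C the untreated (control) group. So Y_T is the number of positives in the treatment group and N_T the total number of treatment samples. If the label is "converted or not", Y_T / N_T is the treatment group's conversion rate and Y_C / N_C the control group's. The right-hand factor is the increase in conversion rate of treatment over control, and the left-hand factor is the total number of samples. Together they measure how many extra conversions the treatment brings over the control.
Step 5: integrate to compute the area under the curve.

The more accurate the uplift scores, the more accurate the ordering in step 2, and the larger the gap between treatment and control in the top buckets. This shows up as an arched curve, so a higher AUUC means a better model.

Applications of causal inference at Hellobike

This section covers how causal inference is applied at Hellobike, mainly through the tree-based model: how that model works, our innovations on it, and its offline and online results.

Red-envelope subsidies

This work targets the red-envelope subsidy module in hotel marketing, one link in the overall algorithm pipeline. As noted, our goal is to maximize subsidy utility. The main metric is per-capita utility: total utility divided by the number of people in the group.

Our model improves on the tree-based uplift model.

Tree-based Model

To make it easier to understand, we compare the decision-tree-based uplift model with an ordinary classification decision tree.

They differ mainly in splitting criterion and objective. An ordinary decision tree splits on information gain. This minimizes the entropy of the leaf nodes, i.e., the uncertainty of the class, which is what classification needs. A tree-based uplift model instead splits on distribution divergence, commonly KL divergence or chi-square divergence. This maximizes the difference between the treatment and control distributions in each leaf, which increases the gain.

The diagram on the right shows how a tree-based uplift model separates persuadables. The icon on each leaf marks the group that makes up the vast majority of that leaf, so persuadables are easy to pick out.

Why a tree-based model? Two main reasons:
1. Tree models are highly interpretable, which helps in business applications.
2. Modeling the increment directly is more accurate, and improving business results is what we care about most.

Next, we describe how we improved the tree-based uplift model into TreeLift, a model whose objective is to maximize incremental gain. As noted, what matters most in a tree model is that the splitting criterion matches the objective. By modifying the splitting criterion, we bring the model's objective closer to the business objective.

Mainstream industry usage still targets user conversion: sample labels are 0 or 1, so KL divergence works as the node splitting criterion. Our objective, however, is per-capita utility. KL divergence measures probability distributions, so it is a poor fit, and we modified the splitting criterion to match the business metric.

TreeCausal: incremental utility as the objective

We use utility as the sample label. The node splitting criterion is the square of the difference in per-capita utility between the treatment and control groups, and the goal is to maximize it.

Algorithm flow: ...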
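The five-step AUUC procedure described above can be sketched as follows. Several details are assumptions not given in the article: buckets are formed as equal-count prefixes of the score-sorted samples, the curve starts at f(0) = 0, a bucket prefix missing either group gets gain 0 (the rate difference is undefined there), and the integral is taken with the trapezoidal rule on an x-axis scaled to [0, 1]. Libraries differ on normalization, so only compare AUUC values computed the same way.

```python
from typing import List, Sequence


def uplift_curve(scores: Sequence[float], treated: Sequence[int],
                 outcomes: Sequence[int], n_buckets: int = 10) -> List[float]:
    """Cumulative gain f(t) for t = 0..n_buckets, with f(0) = 0."""
    # Steps 1-2: rank samples by predicted uplift, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    gains = [0.0]
    for t in range(1, n_buckets + 1):
        # Step 3: the first t buckets form a prefix of the ranked samples.
        top = order[: t * n // n_buckets]
        n_t = sum(1 for i in top if treated[i])
        n_c = len(top) - n_t
        y_t = sum(outcomes[i] for i in top if treated[i])
        y_c = sum(outcomes[i] for i in top if not treated[i])
        if n_t == 0 or n_c == 0:
            gains.append(0.0)  # rate difference undefined; this sketch uses 0
        else:
            # Step 4: (N_T + N_C) * (Y_T / N_T - Y_C / N_C)
            gains.append((n_t + n_c) * (y_t / n_t - y_c / n_c))
    return gains


def auuc(gains: List[float]) -> float:
    """Step 5: trapezoidal area under the curve, x-axis scaled to [0, 1]."""
    step = 1.0 / (len(gains) - 1)
    return sum((a + b) / 2 * step for a, b in zip(gains, gains[1:]))


# Toy data: the only conversion is a treated sample with a high score.
treated = [1, 0, 1, 0]
outcomes = [1, 0, 0, 0]
good = uplift_curve([0.9, 0.8, 0.2, 0.1], treated, outcomes, n_buckets=2)
bad = uplift_curve([0.1, 0.2, 0.8, 0.9], treated, outcomes, n_buckets=2)
print(good, auuc(good))  # [0.0, 2.0, 2.0] 1.5
print(bad, auuc(bad))    # [0.0, 0.0, 2.0] 0.5
```

Both rankings end at the same total gain, but the ranking that puts the responsive treated sample first gets the larger area. That is the "arched curve" effect described in the text.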

March 25, 2022 · 1 min · jiezi

On algorithms: Product upgrades | January–February combined issue: several major products arrive

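In January 2022, Baidu AI Cloud's Tiangong IoT core suite and other products received major feature upgrades, and 度目 (Dumu) released a new coal-mine electronic seal solution. In February the Winter Olympics opened, and Baidu AI Cloud's "3D+AI" technology and AI sign-language anchor shone both on and off the field. The good news kept coming: the Baidu Cloud Computing (Yangquan) Center was selected for the list of "National New Data Center Typical Cases", and Baidu AI Cloud ranked first by market share in China's edge cloud infrastructure services. Many products also received major updates and upgrades. Let's unlock more new features.

# Highlights #

The AI sign-language anchor officially goes on duty

An AI sign-language anchor built on the "Baidu AI Cloud Xiling (曦灵)" digital human platform reports Winter Olympics events in real time. It provides a 24-hour sign-language service so that hearing-impaired users can get event news quickly.

The anchor pairs a high-precision digital human appearance with an AI "brain" that handles speech recognition, sign-language translation and sign-language expression. Using Baidu's in-house machine translation technology, Baidu AI Cloud built an accurate sign-language translation engine. Its intelligibility exceeds 85%, comparable to mainstream Chinese–English and Chinese–Japanese machine translation and at an industry-leading level. Combined with Baidu's in-house speech recognition, the engine quickly and accurately converts the text, audio and video content of winter sports events into sign language. A natural motion engine optimized specifically for sign language then drives the anchor, rendering the content in real time as the digital human's movements, facial expressions and lip movements. The result is sign language that is highly intelligible and presented accurately and coherently.

Dumu releases a new coal-mine electronic seal solution: AI helps production safety and advances intelligent coal-mine construction

Baidu AI Cloud's Dumu team built the coal-mine electronic seal solution on intelligent video recognition. In real time, it monitors changes in personnel and headcount at mine entrances and exits, automatically counts mine cars, detects dispatch-room staff leaving their posts, and tracks the transport status of underground equipment. It helps supervisors spot abnormal activity promptly. Alarms are pushed automatically and reported to the National Mine Safety Administration's "electronic seal" intelligent supervision platform, closing the business loop. The result is round-the-clock remote monitoring, fewer underground accidents and safer production.

New intelligent training product: human–machine practice quickly improves new employees' response skills and business capabilities

The intelligent training product uses AI technologies such as speech recognition and synthesis, natural language understanding and digital humans. Through human–machine interaction it recreates real business scenarios and offers trainees realistic practice and intelligent simulated exams. The system evaluates trainees and gives guidance in real time. This shortens the time before new employees can start work and quickly improves their spoken response skills and business capabilities.

# New releases #

[New] The PaddlePaddle BML full-featured AI development platform launches experiment management for tracking and comparing machine learning training experiments in one place

Experiment management automatically tracks the inputs, outputs, parameters and related configuration of every training job and compares them side by side. This surfaces the parameters that really matter and guides the next round of experiments. Because it records the full configuration, past experiments can be reproduced, making it easier to investigate failures of historical models or audit their compliance.

[New] The intelligent dialogue platform UNIT launches a knowledge-graph QA engine, supporting intelligent QA over different forms of enterprise data

In customer service, the graph QA engine handles inquiries in complex business scenarios. It helps customers build fine-grained, structured knowledge systems and provides knowledge consultation within intelligent customer service dialogues. Its features:
- A complete knowledge-construction workflow that structures knowledge. It runs on Baidu's in-house high-performance commercial graph database, supports enterprise-scale entity graph data, and offers a visual interface for browsing.
- Question answering and reasoning based on the knowledge graph. It supports contextual semantic understanding and is deeply integrated with the intelligent customer service multi-turn engine, so it can output precise answers directly.
- Intelligent learning: it learns from user feedback through machine learning and gradually improves QA generalization through template configuration.
- Integration with the collaborative management features of the intelligent customer service platform, supporting full review, publishing, logging, statistics and annotation for enterprise-grade management.

[New] Customer service knowledge base: efficient end-to-end knowledge management built around agents' usage scenarios

It improves efficiency across the collection, review, operation and search stages of customer service staff's knowledge management. Combined with UNIT Enterprise Edition, it forms an integrated customer service knowledge management and application solution. Its features:
- The knowledge management module adds version management for document knowledge and FAQs, giving finer-grained control.
- The knowledge review module adds intelligent document comparison to locate content changes quickly. It supports customized workflows for multi-region, multi-department, multi-role review, and allows edits directly during review to shorten the review cycle.
- The knowledge portal adds bookmarking and notes for agents' personal knowledge management.

[New on sale] Dumu 7-inch smart access control terminal CM-X: an outdoor specialist that is waterproof, dustproof and explosion-proof
- 7-inch high-brightness touch display; protection rating IP65/IK07 (dustproof, waterproof and explosion-proof);
- dual-core 1.5 GHz CPU, 2-megapixel dual cameras, face recognition at 0.3 m–1.5 m;
- binocular liveness detection that effectively defends against non-live attacks such as photos, electronic screens and videos;
- 10,000-face library, face comparison in under 200 ms per person, recognition accuracy ≥97% with masks;
- three authentication modes: face; face or QR code or card; face and card.

[New] Game industry security solution

Building on Baidu Security's accumulated expertise, it helps game companies solve industry security problems efficiently. It covers anti-DDoS, mobile app hardening, compliance checking, anti-cheat, countering black-market operations, secure data circulation and business risk control, protecting game businesses on all fronts.

[New] Industrial security solution for national critical infrastructure

Built on the concept of "active defense", it adopts an "AI-native" defense-in-depth system and security-threat immunity technology. It provides more proactive, adaptive and intelligent all-round protection for industrial business scenarios.

[New] Big data exchange center solution

It addresses the pain points of turning data into a market-traded factor of production. Privacy computing and AI are used to aggregate and fuse data, and to standardize its circulation, allocate it rationally, trade it on the market and grow a data ecosystem.

[New] Intelligent content production solution for the media industry

Using AI, big data and audio/video technologies, it builds a content production platform for intelligent editing, clip splitting, dubbing and creation. Content generation becomes simpler and smarter and editing more efficient and convenient, supporting the media industry's intelligent development.

[New] Intelligent fault diagnosis solution for telecom network operations

It provides knowledge-graph-based intelligent fault diagnosis, helping telecom network operations upgrade their management, services and O&M models. ...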

March 24, 2022 · 1 min · jiezi

On algorithms: Precise governance: AI strengthens fine-grained urban management

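Urban governance aims to solve urban public problems effectively, provide urban public services and advance the public interest. Because cities are highly complex, governing them effectively is extremely difficult.

Against "urban diseases" such as traffic congestion, environmental deterioration and worrying public safety, manpower alone is increasingly inadequate. Combining AI, big data, cloud computing and other modern information technologies to raise the level of urban management is an effective strategy for making cities smarter and achieving sustainable development.

AI in particular can free up urban resources and productivity. It analyzes key entities such as people, vehicles, events and objects intelligently, in real time and precisely, and can promptly identify anomalies or emergencies and raise alarms. This greatly improves management efficiency and precision, lowers the labor cost of urban governance, and makes our work and life safer and more convenient.

[Figure: panorama of urban governance]

PaddlePaddle's open-source deep learning capabilities now cover more than 20 typical urban governance scenarios in four categories: city appearance and environment, street order, municipal facility supervision, and disaster early warning. Deployment can target data centers or edge servers, security cameras, phones, tablets and other terminals. For each, there are models that balance accuracy, size and speed, making full use of the hardware to deliver business results.

This series of open-source capabilities has been widely recognized by well-known companies such as China Re Property & Casualty (中再财险) and 天覆科技, which are putting intelligent industrial upgrading into practice.

Two concrete scenarios below show how PaddlePaddle deep learning is applied in urban governance.

Scenario 1: detecting floating debris in rivers

Rivers, lakes and reservoirs carry all kinds of floating industrial and household waste, which seriously damages aquatic ecosystems. Removing floating debris is essential to keeping these waters clean. The traditional approach relies mainly on manual patrols and cleanup, which has two problems. The monitored area is wide, so patrols consume a great deal of labor. And natural river basins change constantly, so manual work often cannot detect and handle problems in time, letting waste pile up or aquatic plants overgrow.

Several leading security vendors already combine PaddlePaddle's deep learning algorithms with video surveillance points to patrol rivers, detect floating debris and raise warnings in real time. This is expected to cut labor costs by 90% and deliver truly "smart water management".

[Figure: river floating-debris detection results]

Scenario 2: fire and smoke detection

Statistics show that 252,000 fires were reported nationwide in 2020, with direct property losses of 4.009 billion yuan. Fire has become a frequent disaster that endangers people's lives and property.

Fire monitoring is hard, especially for traditional AI: flames have no uniform shape or color, so hand-engineering features for fire is very difficult. Deep learning, with its stronger learning ability and generalization, handles this kind of complex problem well.

In fire-prone settings such as residences, gas stations, highways and forests, PaddlePaddle's object detection technology can automatically detect smoke and fire in monitored areas. This helps staff respond in time and minimizes casualties and property losses.

China Re Property & Casualty (China Property & Casualty Reinsurance Co., Ltd.) has already tried real-time monitoring of charging safety for new-energy vehicles. It uses PP-YOLOv2, PaddlePaddle's in-house high-accuracy object detection algorithm, with good results, adding further protection for lives and property.

[Figure: fire and smoke detection results]

With AI, urban governance systems can quickly and precisely perceive, judge, predict and handle all kinds of urban problems. This pushes urban governance toward co-governance by multiple parties, meets residents' needs faster and better, and makes cities and our lives better.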

March 24, 2022 · 1 min · jiezi

On algorithms: AI-based intelligent remote-sensing interpretation drives innovation in smart city planning

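In new urban construction and planning, geographic information technologies, led by satellite remote-sensing image processing, play a very important role. They sense people, events, infrastructure and the environment across a city, identify changes dynamically in real time, and extract targets quickly, providing valuable information for smart city construction. AI-based remote-sensing image processing is now widely applied in urban planning, supervision of illegal construction, engineering environment monitoring, waste management, traffic management, urban security and other scenarios.

As a deep learning platform rooted in industrial practice, PaddlePaddle aims to give developers in every industry complete solutions for building industrial applications. For remote sensing, it provides rich data preprocessing solutions covering visual tasks such as ground-object detection, land-parcel segmentation, change detection and ground-object classification, to help developers build remote-sensing applications.

Figure 1: PaddlePaddle remote-sensing application development solution

Data labeling is a widely recognized difficulty in remote sensing. To address it, the PaddlePaddle team worked with China Siwei (中国四维) to extend the existing interactive intelligent labeling software EISeg with an interactive model dedicated to remote sensing. It adds multi-channel extraction (hyperspectral and multispectral data), tiling of large images into grids, and automatic stitching, making remote-sensing data easier to process.

Figure 2: Demonstration of EISeg's intelligent labeling for remote sensing

Many industrial AI developers are now using PaddlePaddle's remote-sensing development solution to solve practical problems. The following cases explain how.

Extracting residential area data

Residential area data is one of the core elements of basic geographic information. Detecting and delineating changes in residential areas promptly and accurately by remote sensing matters for disaster assessment, urban expansion, environmental change, spatial data updates and more. PIESAT (航天宏图信息技术股份有限公司) used the Segformer algorithms in PaddleSeg, PaddlePaddle's image segmentation suite, to monitor second-level residential classes such as ordinary blocks, high-rise buildings, detached houses and stadiums. This greatly improved the efficiency of producing basic survey base maps.

The project tuned the Segformer algorithms to the distinct remote-sensing image characteristics of the five second-level residential classes. After tuning, at comparable accuracy, the PaddlePaddle model is one third the size of Segformer models implemented in other frameworks. The model runs inference on 2 m resolution remote-sensing images, and post-processing tools such as raster vectorization and regularization then turn the quickly segmented residential areas into survey-grade maps. Compared with traditional manual map vectorization, work efficiency rose 85-fold and detection accuracy reached 90.2%, meeting the requirements for product launch.

Figure 3: Residential land segmentation

Dynamic interpretation of land-use categories

Land use is an important factor in soil erosion. National dynamic monitoring of soil erosion combines remote-sensing surveys, fixed-point observation and model calculation, and interprets regional land use once a year. Traditional manual visual interpretation consumes large amounts of manpower and materials: each person can interpret only 300–400 square kilometers per day, which can hardly keep up with regional soil-erosion monitoring. Using PaddlePaddle, 北科博研 implemented AI remote-sensing recognition of land-use types in Ningxia with extraction accuracy above 90%, a big improvement over traditional manual interpretation projects. With only two GPU workstations, the whole province can be interpreted quickly. This greatly speeds up land-use recognition and keeps local soil-erosion monitoring on schedule.

Figure 4: 北科博研 AI interpretation platform

Golf course detection

Because supervision was historically lax, golf courses have been built in many places, encroaching on urban construction space. This has drawn close attention from the National Development and Reform Commission and other departments. The Aerospace Information Research Institute of the Chinese Academy of Sciences used the PaddlePaddle deep learning framework to monitor golf courses by remote sensing. It made a series of optimizations for this target, greatly improving the efficiency of remote-sensing image interpretation and providing a semi-automated means of detecting golf courses.

The project uses the classic object detection algorithm Faster R-CNN and tunes the aspect ratio of input images to the characteristics of golf courses. Since launch, it has been far more efficient than traditional methods, making periodic, automated remote-sensing detection of golf courses possible. On GF-6 WFV imagery of the Beijing–Tianjin–Hebei region, the area detection rate is 86% and the count detection rate 95%. Detection on a single GF-6 WFV scene takes 10 minutes.

Figure 5: Golf course recognition results

PaddlePaddle has also prepared a live course with technical experts from PIESAT, a leading domestic provider of remote-sensing capabilities. Starting from core technical theory, they examine every aspect of how remote-sensing images are used in smart cities. PaddlePaddle will keep strengthening its remote-sensing capabilities, guided by its founding aim of the lowest barrier to entry and the highest performance, to better support smart city construction.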

March 24, 2022 · 1 min · jiezi

On algorithms: How can enterprises mine their knowledge gold mine? This white paper explains it thoroughly

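Knowledge is an important foundation for intelligent upgrading. A new information and knowledge processing platform that refines continuously accumulated data lets enterprises deploy intelligent applications faster and accelerate business innovation. The key challenge is extracting knowledge efficiently from massive data, so that machines learn the expert experience and business logic embedded in it and can analyze and decide like human experts.

Baidu AI Cloud recently released the "'Cloud-Intelligence Integration' Technology and Application White Paper Series: Knowledge Intelligence". It systematically introduces how to build a knowledge intelligence system and explains in detail how enterprises can build competitive advantages centered on knowledge, giving their intelligent upgrading lasting momentum.

Knowledge fused with AI helps the power industry become intelligent

The deep fusion of knowledge and AI is indispensable to industrial intelligent upgrading. Take the energy industry. Under the "dual carbon" goals, power generation groups have launched smart construction projects. These aim to accelerate the integration of new-generation information technology with power generation and raise the intelligence of production, management and services across the board.

Along the way, however, companies in the industry commonly face several pain points:
- information silos are pervasive across management systems;
- group-wide unified systems fall short of the personalized needs of grassroots units;
- historical information is hard to use effectively and turn into data assets.

Using "Guoneng (Suizhong) Power Generation" as an example, the white paper shows how strongly knowledge intelligence can drive the transformation of power companies. Through its knowledge intelligence strategy, the company realized three kinds of value:
- Baidu's knowledge graph technology broke down the barriers between capital flow, material flow, event flow and other processes and linked data across systems. Production and operations can now be analyzed in every aspect, process and scenario, supporting financial comparison analyses such as activity-based costing.
- Dormant historical information was mined to keep refining job responsibilities and the knowledge attached to them. The result is a job knowledge management system tailored to the unit's management needs, which in turn supports HR tasks such as onboarding training.
- An intelligent office network system was built, with intelligent knowledge recommendation and instant data search, greatly improving office efficiency.

Knowledge intelligence drives industrial intelligent upgrading

This case is just one snapshot of what knowledge contributes to industrial intelligent upgrading. According to the white paper, Baidu AI Cloud has built a knowledge intelligence solution, the Knowledge Middle Platform, on leading AI technologies such as knowledge graphs and natural language understanding. It covers efficient knowledge production, flexible organization and intelligent application, as a one-stop intelligent solution for the full lifecycle of enterprise knowledge production, management and application.

Downstream, the Knowledge Middle Platform connects to massive enterprise data from many sources and in many forms. Upstream, it supports the enterprise's business scenarios, providing a knowledge foundation for intelligent upgrading. Two kinds of products are built on this foundation:
- general-purpose products, represented by enterprise search and intelligent knowledge bases;
- scenario-specific products for energy, finance, telecom operators, law, healthcare and other industries.

The Knowledge Middle Platform covers the full knowledge lifecycle

In general, the knowledge lifecycle includes data governance, knowledge production, knowledge organization, knowledge application and knowledge operation. The white paper describes the Knowledge Middle Platform as the hub of enterprise knowledge intelligence, with modules for data access, knowledge production, knowledge organization, intelligent applications and operation management:
- Data access: it ingests and preprocesses multi-source heterogeneous data, with a daily throughput of billions of records, simultaneous access to millions of data sources and minute-level updates.
- Knowledge production: it offers rich multimodal knowledge production in seven modes: knowledge graph, QA, full-text, tag, event, multimodal and causal knowledge production.
- Knowledge application: producing and organizing knowledge is not the end point; intelligent applications are where knowledge pays off. The knowledge produced in the seven modes is linked through the graph into one ordered, uniformly organized system. That system serves general applications such as enterprise search, intelligent knowledge bases, intelligent recommendation, intelligent customer service, intelligent document analysis and Ruliu (如流), covering enterprises' knowledge application needs.
- Knowledge operation: once the Knowledge Middle Platform is built, day-to-day operation keeps knowledge in productive use. Content operation keeps knowledge iterated and up to date. User operation motivates employees to use and contribute knowledge actively. Feature improvement evolves the platform with use and keeps enriching it with newly ingested knowledge.

Multiple AI technologies power a "hardcore" Knowledge Middle Platform

The white paper explains that the Knowledge Middle Platform fuses leading AI technologies with large-scale knowledge and goes deep into industry scenarios. It is delivered as application components, standardized products, customized services and integrated solutions, helping enterprises produce knowledge efficiently, organize it flexibly and apply it intelligently. The result is higher operating efficiency and smarter decision-making. The main technologies are:
- Knowledge graphs. After nearly 10 years of development, Baidu's knowledge graph work has a complete technical system spanning general and industry knowledge graphs. Baidu has built the industry's largest multi-source heterogeneous knowledge graph, with 550 billion facts.
- Natural language processing. The ERNIE (文心) series of knowledge-enhanced large models, built on the PaddlePaddle deep learning platform, has made many breakthroughs in language understanding and generation, cross-lingual understanding, and cross-modal understanding and generation. Among them, Baidu and Peng Cheng Laboratory released PCL-BAIDU Wenxin (ERNIE 3.0 Titan), the world's first knowledge-enhanced model at the hundred-billion-parameter scale. With 260 billion parameters, it is currently the world's largest single Chinese model.
- Knowledge-enhanced cross-modal content understanding. Baidu AI Cloud's knowledge-enhanced cross-modal deep semantic understanding method uses knowledge to link information across modalities. This solves the problem of fusing the semantic spaces of different modalities and breaks through the bottleneck of cross-modal semantic understanding. Like humans, machines can then form a unified understanding of the real world through language, hearing and vision, and understand complex scenes.
- Complex knowledge mining for multimodal data. For industry input data in many forms, Prompt learning is used to model entity, relation and event extraction tasks in a unified way and train them jointly as multiple tasks.

Through this fusion of leading AI technologies and large-scale knowledge, the Knowledge Middle Platform raises enterprises' operating efficiency and the intelligence of their decision-making. It already has many projects in energy, telecom operators, finance, healthcare, media, government and other industries, with a wealth of accumulated experience. We believe Baidu AI Cloud will work with partners to bring the Knowledge Middle Platform to more customers and help more industries achieve intelligent upgrading!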

March 24, 2022 · 1 min · jiezi

关于算法:COMPSCI-111求解分析

QUESTION/ANSWER BOOKLET COMPSCI 111/111G
THE UNIVERSITY OF AUCKLAND
SECOND SEMESTER, 2020
Campus: City

COMPUTER SCIENCE
An Introduction to Practical Computing
(Time Allowed: TWO hours)

NOTE:
Calculators are NOT permitted.
You must answer all questions in this question/answer booklet.
There is space at the back for answers that overflow the allotted space.

Surname: Model
Forename(s): Answers
Preferred Name:
Student ID:
Login (UPI):

Question                        Mark   Out Of
1 - 10 Short Answers                   30
11 Programming using Python            15
12 Spreadsheets                        15
13 HTML5/CSS                           20
14 LaTeX                               20
TOTAL                                  100

SECTION A – SHORT ANSWERS

Answer all questions in this section in the space provided. If you run out of space, please use the Overflow Sheet and indicate in the allotted space that you have used the Overflow Sheet. ...

March 23, 2022 · 9 min · jiezi

On algorithms: Daily algorithm: arrays (part 2)

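Problem

Given an array nums sorted in ascending order, remove the duplicates in place so that each element appears only once, and return the new length of the array. The relative order of the elements must stay the same.

Some languages cannot change an array's length, so the result must be placed in the first part of nums. More formally, if k elements remain after removing duplicates, the first k elements of nums must hold the final result.

Return k after placing the final result in the first k positions of nums.

Do not allocate extra space: you must modify the input array in place using O(1) extra memory.

Examples

Input: nums = [1,1,2]
Output: 2, nums = [1,2,_]
Explanation: the function should return the new length 2, with the first two elements of nums changed to 1 and 2. Elements beyond the new length do not matter.

Input: nums = [0,0,1,1,1,2,2,3,3,4]
Output: 5, nums = [0,1,2,3,4]
Explanation: the function should return the new length 5, with the first five elements of nums changed to 0, 1, 2, 3 and 4. Elements beyond the new length do not matter.

Analysis

This is another easy array problem. It uses a common technique, fast and slow pointers, which suits problems where each element must be compared with an earlier one; here adjacent elements are compared to detect duplicates.

First, the edge cases. If the array is empty, the new length is 0. Otherwise the array holds at least one element, and at least one remains after removing duplicates. So nums[0] stays where it is, and removal starts at index 1.

Define two pointers, fast and slow. fast is the index currently being scanned; slow is the index where the next distinct element will be written. Both start at index 1. Let n be the length of nums.

Move fast through every position from 1 to n − 1. If nums[fast] != nums[fast − 1], then nums[fast] differs from every earlier element (the array is sorted). Copy it to nums[slow] and increase slow by 1.

When the scan ends, nums[0] through nums[slow − 1] are all distinct and include every distinct value of the original array. The new length is therefore slow; return it. ...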
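The fast/slow pointer procedure described above translates directly into a short function. This Python sketch (the function name `remove_duplicates` is chosen here for illustration) follows the same steps: handle the empty case, keep nums[0], then let `fast` scan while `slow` marks the next write position.

```python
from typing import List


def remove_duplicates(nums: List[int]) -> int:
    """Deduplicate a sorted list in place and return the new length k."""
    n = len(nums)
    if n == 0:
        return 0
    slow = 1                      # next index to write a new distinct value
    for fast in range(1, n):      # fast scans every remaining index
        if nums[fast] != nums[fast - 1]:
            nums[slow] = nums[fast]
            slow += 1
    return slow


nums = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]
k = remove_duplicates(nums)
print(k, nums[:k])  # 5 [0, 1, 2, 3, 4]
```

Each write goes to an index no greater than `fast`, so `nums[fast - 1]` still holds its original value when it is compared. The loop uses O(1) extra space and runs in O(n) time.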

March 23, 2022 · 1 min · jiezi

关于算法:Programming-Puzzle-Bamboo-Trimming

2020/3/27 Programming Puzzle: Bamboo Trimming https://www.wild-inter.net/te... 1/8

COMP 526 Applied Algorithmics

Programming Puzzle: Bamboo Trimming

This continuous-assessment exercise consists of a small applied project with algorithmic and programming components, including a real-time leaderboard of the competition.

Will you be able to beat your classmates, or even your demonstrator?

You will be working on a real, challenging research problem, so the emphasis is as much on the process of producing solutions to algorithmic problems as on the actual deliverable.

The Bamboo Trimming Problem

To offset the long hours of sitting in classes, you are a passionate gardener, and your pride and joy is your little forest of exotic bamboos. However, bamboo is one of the fastest-growing plants on earth, so the plot requires constant attention. To keep the effort manageable, you decide to cut down exactly one of your bamboo plants each day, and you cut it right back to the roots.

Since your bamboos have vastly different growth rates, some of them need more frequent cutting than others. You set out to find a periodic schedule of which bamboo to cut each day, so as to minimize the maximal height of your garden.

Formalization

You decide to model the task mathematically as follows. Given n bamboos with daily growth rates h_1, …, h_n, we assume that after growing for t days (without cutting), bamboo i will have height t·h_i.
Right after you cut a bamboo, its height is 0, and so is the initial height of all bamboos at the beginning.

Writing H_i(t) for the height of bamboo i after t days, and c_t for the bamboo that you cut on day t, we obtain: H_i(0) = 0, and H_i(t) = 0 if c_t = i, otherwise H_i(t) = H_i(t−1) + h_i.

The task is to find an infinite schedule of cuts that keeps the maximal height as low as possible.

To simplify your planning, you decide to restrict your attention to periodic schedules, i.e., a fixed, finite list of cuts that you follow; when you are done, you simply start from the beginning again.

Inputs

Your garden contains five named bamboo plots with the growth rates of the bamboos given below:
- Unequal Pair: [1, 199]
- Fibonacci: [1, 1, 2, 3, 5, 8, 13, 21]
- Odds: [3, 3, 3, 5, 5, 7, 7, 9]
- Powers3: [3, 6, 12, 24, 48, 96, 192, 386]
- Precision: [2000, 3999, 4001]

Design as good a periodic schedule as you can find for each of them! Can you argue that your solutions are best possible?

Code template

We prepared a Java implementation of the bamboo-trimming problem that you will use to evaluate your trimming schedules: Java sources

There is one main class BambooX for each value of X in the list above. These classes simulate the growth of the bamboo under your periodic schedule and report the maximum height ever reached, divided by the sum of all growth rates. They automatically store your results in a csv file.

Obey the comments! Once you have downloaded the code, please do the following in each of the 5 BambooX classes:
- add your vital username in the appropriate variable,
- add your periodic schedule.

To compile the simulation, extract the zip archive to a folder and run javac *.java there.
To run a simulation, use, e.g., java BambooUnequalPair.

Deliverables

Submissions are due 23 March on SAM.

This is an individual project; each student has to submit his or her own solution comprising the following:
- the 5 bamboo plot classes (BambooX.java for each X in {UnequalPair, Fibonacci, Odds, Powers3, Precision}),
- the generated file with your results (results.csv). Make sure you have filled in your vital username!
- a document describing how you arrived at the solution (not more than two pages). Report also on dead ends you tried (what did not work), as well as on arguments why a solution better than a certain height is not possible.

The overall mark will be a weighted average:
- 40% for the description.
- 60% for the quality of the achieved solutions. The baseline is the set of solutions that George has found; in principle you could get more than 100% for this subtask if you manage to beat his solutions!

Collaboration

This programming puzzle is mainly an individual project, and you have to submit your own solution. In particular, the description of your solution must be a single-author document.

Collaboration in small groups (not more than five students) on the conceptual level (discussing ideas, not sharing entire solutions) is accepted, but it must be declared in the description document, including proper mention of others' contributions.

Leaderboard

We run a (voluntary, anonymous) leaderboard of the current best solution. Whenever you have tried a periodic schedule in the simulator, use the form below to share your achievements with the rest of the class!

Bamboo trimming leaderboard

For each of the five bamboo plots listed below, you can enter the best ratio (as output by the simulation) you were able to achieve. ...
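The official evaluation is the Java template. The Python sketch below is only a hypothetical stand-in for quickly trying schedules before entering them there. It assumes that each day every bamboo first grows by its rate, the tallest height is recorded, and then the scheduled bamboo is cut to 0; the template may measure at a slightly different moment. The returned value is the same kind of ratio the template reports: maximum height divided by the sum of growth rates. The function name and the `periods` parameter are illustrative.

```python
from typing import Sequence


def max_height_ratio(rates: Sequence[int], schedule: Sequence[int],
                     periods: int = 50) -> float:
    """Simulate a periodic cutting schedule; return max height / sum(rates).

    Each simulated day: all bamboos grow, the tallest height is recorded,
    then the bamboo named by schedule[day % len(schedule)] is cut to 0.
    """
    heights = [0] * len(rates)
    tallest = 0
    for day in range(periods * len(schedule)):
        for i, rate in enumerate(rates):
            heights[i] += rate
        tallest = max(tallest, max(heights))
        heights[schedule[day % len(schedule)]] = 0
    return tallest / sum(rates)


print(max_height_ratio([1, 1], [0, 1]))        # 1.0
print(max_height_ratio([1, 1], [0, 0, 1, 1]))  # 1.5
```

A bamboo that never appears in the schedule grows without bound, so its reported height depends on `periods`. Only schedules that cut every bamboo at least once per period give a meaningful ratio.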

March 23, 2022 · 4 min · jiezi

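On algorithms: January–February highlights: Dumu releases a coal-mine electronic seal solution; AI helps production safety and advances intelligent coal-mine construction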

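In January and February, Dumu released a coal-mine electronic seal solution. It monitors in real time changes in personnel and headcount at mine entrances and exits, the number of mine cars, the transport status of underground equipment and more, catching abnormal activity promptly, improving management efficiency and reducing underground accidents. An intelligent training product also made its debut. Using AI technologies such as intelligent dialogue and digital humans, it recreates real business scenarios and quickly improves new employees' response skills and business capabilities!

Let's unlock more new AI product features.

Highlights first

Dumu releases a new coal-mine electronic seal solution: AI helps production safety and advances intelligent coal-mine construction

Baidu AI Cloud's Dumu team built the coal-mine electronic seal solution on intelligent video recognition. In real time, it monitors changes in personnel and headcount at mine entrances and exits, automatically counts mine cars, detects dispatch-room staff leaving their posts, and tracks the transport status of underground equipment. It helps supervisors spot abnormal activity promptly. Alarms are pushed automatically and reported to the National Mine Safety Administration's "electronic seal" intelligent supervision platform, closing the business loop. The result is round-the-clock remote monitoring, fewer underground accidents and safer production.

New intelligent training product: human–machine practice recreates real business scenarios and quickly improves new employees' response skills and business capabilities

The intelligent training product uses AI technologies such as speech recognition and synthesis, natural language understanding and digital humans. Through human–machine interaction it recreates real business scenarios and offers trainees realistic practice and intelligent simulated exams. The system evaluates trainees and gives guidance in real time, shortening the time before new employees can start work and quickly improving their spoken response skills and business capabilities.

Face and body recognition

New features for the H5 face real-name authentication solution [Capability upgrade]

Industry-specific liveness models for finance, transport/logistics and the general internet: more effective and more secure

These models target the attack types common in liveness detection for finance, transport/logistics and general internet services, such as screen attacks, photos, paper, masks and head models. Preset solutions are recommended for each industry based on overall pass-rate metrics.
- "Finance liveness model": specially optimized against high-cost attacks such as 2D/3D masks, head models, high-definition phone screens and ID card photos, for higher security.
- "Transport/logistics liveness model": specially optimized for real-person pass rates under complex lighting, with helmets, and outdoors.
- "General internet liveness model": rejects over 99% of non-live attacks while reducing false rejections of real people, meeting the requirements of most scenarios.

New H5 real-time liveness detection: minimal user effort, better experience

Users no longer need to record and upload a video. During liveness detection, color-flash, action or silent liveness checks run in real time through the camera to detect attacks and re-captured images. Frames are sampled along the way and a photo is output.

Text recognition

Snap a shopping receipt to capture your purchase record instantly [New product, invitation-only test]

It performs structured recognition of receipts from malls, supermarkets, pharmacies and other stores, outputting the store name, receipt number, total amount and itemized details. It is specially optimized for slight wrinkles, misaligned lines and overlapping characters, and recognizes them reliably. Use cases include product sales statistics, loyalty-point redemption at shopping centers and corporate expense reimbursement. It greatly reduces manual data entry and improves the accuracy of information collection.

Three OCR card and license recognition capabilities gain risk warnings [Capability upgrade]

Vehicle license, driver's license and business license recognition now include risk warnings. They determine whether the uploaded image is a photocopy, a screen re-capture or has been Photoshopped, and report the corresponding risk type for efficient forgery detection. Price per call is as low as 0.009 yuan.

PaddlePaddle EasyDL zero-barrier AI development platform

EasyDL image launches "batch prediction" to validate model performance in bulk and greatly reduce calling costs! [Capability upgrade]

Advantages: EasyDL image (image classification, object detection and image segmentation) now offers "batch prediction". Users who choose public-cloud deployment can, once training finishes, use it to quickly validate model performance on images already imported into Baidu AI Cloud BOS. Batch prediction can also generate labeled training data automatically, quickly expanding the dataset and lowering training cost.

Typical scenarios:
1. visually validating model performance on batches of data;
2. scenarios that need large amounts of training data, e.g., an e-commerce site classifying user-generated (UGC) uploads, or a video site quickly tagging its historical videos.

EasyDL desktop edition upgraded: it checks the installation environment automatically and matches the best training setup [Capability upgrade]

Advantages: EasyDL desktop edition adds a "training environment check". Before starting training, users can check their environment (CUDA, cuDNN, GPU model and so on) and see the requirements and preparation steps for each item, avoiding training failures caused by a mismatched environment.

PaddlePaddle BML full-featured AI development platform ...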

March 22, 2022 · 1 min · jiezi

About algorithms: AI+Science Series (1) | PaddlePaddle-accelerated CFD (computational fluid dynamics): principles and practice

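Preface: The AI+Science column is produced by Baidu PaddlePaddle's scientific computing team and brings a series of technical shares on AI + scientific computing. You are welcome to follow along and join the discussion, and we hope like-minded people will join the PaddlePaddle community to learn from each other and explore the unknown frontier together. As the first installment, this article covers the industry background and pain points, frontier algorithms in AI + scientific computing, PaddlePaddle-based AI + scientific computing product solutions, the key PaddlePaddle framework technologies involved, and a case study that uses the PINNs method to solve lid-driven cavity flow in computational fluid dynamics.

Industry background and pain points

AI is already widely used in CV, NLP, and other fields, replacing traditional methods in tasks such as defect detection, face detection, object segmentation, reading comprehension, and text generation, and it has been deployed at scale in industry. Looking further afield to industrial design and manufacturing, however, many scientific and engineering problems remain unsolved. For high-rise structures, long-span bridges, offshore oil platforms, and aircraft, for example, complex fluid-structure interaction produces dynamic loads that cause flow-induced vibrations such as buffeting, vortex-induced vibration, galloping, and flutter, affecting structural safety and service life. Numerical simulation is an effective way to study flow-induced vibration of engineering structures, but traditional numerical methods require large amounts of computing resources and are severely limited in speed.

Frontier algorithms and typical applications in AI + scientific computing

These problems point toward AI + scientific computing: using deep learning to overcome the challenges of high dimensionality, long time horizons, and multiple scales, changing the research paradigm, and helping traditional industries transform. When AI methods are mentioned, people intuitively think of big data and of building and training neural networks. That is indeed the case in CV and NLP, where AI methods are data-driven: a network is trained to model the complex logic implicit in practical problems such as image classification or speech recognition, and the whole thing is a "black box". For scientific computing, the AI methods change somewhat: besides purely data-driven approaches, physics-informed constraints sometimes have to be added, which requires more domain knowledge.

Specifically, scientific computing often has to simulate physical problems in concrete settings such as ocean and weather, energy materials, aerospace, and biopharmaceuticals. Because most physical laws can be expressed as partial differential equations (PDEs), solving systems of PDEs is the key to scientific computing problems. Neural networks have a "universal approximation" property: given enough neurons, a network can approximate any continuous function on a bounded domain to arbitrary accuracy. One way to apply AI to scientific computing is therefore to train a neural network to approximate the solution function of a system of PDEs. Compared with traditional methods, this has several potential advantages:

(1) High-dimensional problems. Traditional methods, typically based on finite differences, finite elements, or finite volumes, compute approximate solutions of PDE systems. They suffer from the "curse of dimensionality": computational cost grows rapidly with the number of dimensions. In the neural networks used by AI methods, the extra cost from additional dimensions grows linearly.

(2) Hardware acceleration. Traditional methods often contain serial computation and are hard to accelerate on GPUs and similar hardware, whereas training and inference in AI methods readily exploit such hardware.

(3) Generalization. Solving a problem with AI consists of training and inference: train once, infer many times. Thanks to the generalization ability of neural networks, a network trained under certain physical parameters can also give good simulation results under other physical parameters.

The best-known method in AI + scientific computing is PINNs (physics-informed neural networks), which introduces a new composite loss function made of three parts: the PDE residual, the boundary conditions, and the initial conditions.

Lu, L., Meng, X., Mao, Z., & Karniadakis, G. E. (2021). DeepXDE: A deep learning library for solving differential equations. SIAM Review, 63(1), 208–228. https://doi.org/10.1137/19m1274067

Because physics constraints are built in, the method can train a network that fits the solution of the target PDE without any input data, given only the boundary and initial conditions. Some researchers have improved on the original PINNs by adding data, forming a four-part loss (PDE residual, boundary conditions, initial conditions, and data) that further improves accuracy, with good results on 3D incompressible flow problems. As shown in the figure below, PINNs were used to reconstruct 3D flow fields from observed two-dimensional, two-component velocities for three different cases, and the relative L2-norm errors of the velocity components and of pressure were computed for each case. The results show that PINNs accurately capture the instability of vortex shedding.

Cai, S., Mao, Z., Wang, Z., Yin, M., & Karniadakis, G. E. (2022). Physics-informed neural networks (PINNs) for fluid mechanics: a review. Acta Mechanica Sinica. https://doi.org/10.1007/s10409-021-01148-1 ...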
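The composite PINNs loss described above can be written out as follows. This is a generic formulation with our own notation, assuming a PDE written as $\mathcal{N}[u](x,t)=0$ on a domain $\Omega$, a boundary condition $\mathcal{B}[u]=g$ on $\partial\Omega$, an initial condition $u(x,0)=u_0(x)$, and a network $u_\theta$ evaluated at sampled collocation points; practical implementations often also put a weight in front of each term.

$$
\mathcal{L}(\theta)=
\frac{1}{N_r}\sum_{i=1}^{N_r}\left|\mathcal{N}[u_\theta](x_r^i,t_r^i)\right|^2
+\frac{1}{N_b}\sum_{i=1}^{N_b}\left|\mathcal{B}[u_\theta](x_b^i,t_b^i)-g(x_b^i,t_b^i)\right|^2
+\frac{1}{N_0}\sum_{i=1}^{N_0}\left|u_\theta(x_0^i,0)-u_0(x_0^i)\right|^2
\left[+\frac{1}{N_d}\sum_{i=1}^{N_d}\left|u_\theta(x_d^i,t_d^i)-u_d^i\right|^2\right]
$$

The first three terms are the original three-part loss; the bracketed fourth term is the optional data term used by the improved variants, where $u_d^i$ are observed values at measurement points.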

March 22, 2022 · 1 min · jiezi

About algorithms: Array sorting with heap sort, implemented in C++

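Array sorting with heap sort, implemented in C++

Problem: use heap sort to sort the array {47, 35, 26, 20, 18, 7, 13, 10} (the code below uses a 10-element version, {47, 35, 26, 20, 18, 7, 13, 10, 8, 6}).

(Figure: the heap corresponding to the array; each root is greater than its left and right children.)

Analysis:
1. Build the heap: the children of node i are at indices 2i + 1 and 2i + 2.
2. Which nodes have children? If the last element of the array has index n and its parent is i, then n = 2i + 1, so i = (n - 1) / 2. Nodes 0 through i have children; nodes i + 1 through n are leaves.
3. Initial build: process the tree so that every root is greater than or equal to its left and right children.
4. After the initial build the data form a max-heap, but read top to bottom and left to right they are not yet sorted.
5. Although the heap is not sorted, the top element is certainly the maximum.
6. So swap the top element with the last element of the array, exclude that last element, and rebuild the heap.
7. Now every element except the first still satisfies the max-heap property, so the heap only needs to be rebuilt starting from index 0 (unlike the initial build, which heapifies each small subtree and assembles them bottom-up).
8. The new top element is the maximum of the remaining elements; repeat steps 6 and 7 until all elements have been processed.

Implementation

```cpp
#include <iostream>
using namespace std;

class Heap {
private:
    int arr[10] = {47, 35, 26, 20, 18, 7, 13, 10, 8, 6};

public:
    void show();
    void sort(int n);
    void sortHeap(int k, int n);  // sift down the subtree rooted at k
};

void Heap::show() {
    for (int i = 0; i < 10; i++) {
        cout << this->arr[i] << " ";
    }
    cout << endl;
}

// n is the number of elements currently in the heap; k is the index of the subtree root
void Heap::sortHeap(int k, int n) {
    int i = k;
    int j = 2 * i + 1;  // left child of i
    while (j < n) {
        // within bounds: pick the larger of the two children
        if (j < n - 1 && this->arr[j] < this->arr[j + 1]) j++;
        if (this->arr[i] >= this->arr[j]) {
            break;  // the root is already >= both children
        }
        // swap the root with its larger child
        int temp = this->arr[i];
        this->arr[i] = this->arr[j];
        this->arr[j] = temp;
        this->show();
        // after the swap, the child's subtree may violate the heap property,
        // so keep sifting down from the child until the condition holds
        i = j;
        j = 2 * i + 1;
    }
}

void Heap::sort(int n) {
    // build the max-heap by walking the nodes that have children from back to front;
    // starting at (n - 1) / 2 may include one extra leaf, which is harmless
    for (int i = (n - 1) / 2; i >= 0; i--) {
        this->sortHeap(i, n);
        this->show();
    }
    cout << "---" << endl;
    // swap the heap top with the last unsorted position n - i,
    // which then holds the largest value not yet placed
    for (int i = 1; i <= n - 1; i++) {
        cout << "heap top: " << this->arr[0] << endl;  // current maximum
        int temp = this->arr[0];
        this->arr[0] = this->arr[n - i];
        this->arr[n - i] = temp;
        // rebuild the heap; indices from n - i onward are already sorted
        this->sortHeap(0, n - i);
    }
    this->show();
}

int main() {
    Heap heap;
    heap.sort(10);
    heap.show();
    return 0;
}
```

The output is as follows ...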
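For a quick way to check the steps, here is a compact Python sketch of the same idea: build a max-heap bottom-up, then repeatedly swap the top to the end and sift down from index 0. It is a re-implementation for illustration, not the author's code; `sift_down` and `heap_sort` are our own names, and the build loop starts at `n // 2 - 1`, the last node that actually has a child.

```python
def sift_down(a, k, n):
    """Restore the max-heap property for the subtree rooted at index k, looking only at a[:n]."""
    i, j = k, 2 * k + 1          # j starts at the left child of i
    while j < n:
        if j + 1 < n and a[j] < a[j + 1]:
            j += 1               # use the larger of the two children
        if a[i] >= a[j]:
            break                # root already >= both children
        a[i], a[j] = a[j], a[i]
        i, j = j, 2 * j + 1      # keep sifting down from the child we swapped into


def heap_sort(a):
    """Sort the list a in place (ascending) and return it."""
    n = len(a)
    # Build phase: heapify every node that has a child, from the last one up to the root.
    for k in range(n // 2 - 1, -1, -1):
        sift_down(a, k, n)
    # Extraction phase: move the current maximum to the end, shrink the heap, re-sift from 0.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a


print(heap_sort([47, 35, 26, 20, 18, 7, 13, 10, 8, 6]))
# [6, 7, 8, 10, 13, 18, 20, 26, 35, 47]
```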

March 22, 2022 · 1 min · jiezi

About algorithms: Algorithm practice 131: Palindrome Partitioning

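Problem: LeetCode 131. Palindrome Partitioning.

Method: brute-force backtracking. Cut the string s; if the substring just cut off is a palindrome, keep cutting from the position where that substring ends until the end of the string is reached; if it is not, that branch is abandoned.

```javascript
// Check whether str[start..end] (inclusive) is a palindrome
function isPali(str, start, end) {
  while (start < end) {
    if (str[start] !== str[end]) return false;
    start++;
    end--;
  }
  return true;
}

function partition(s) {
  const res = [];
  // temp holds the palindromic pieces chosen so far; start is where the next piece begins
  function dfs(temp, start) {
    if (start === s.length) {
      res.push(temp.slice()); // copy, because temp keeps changing while backtracking
      console.log("res: ", res);
      return;
    }
    for (let i = start; i < s.length; i++) {
      if (isPali(s, start, i)) {
        temp.push(s.substring(start, i + 1));
        console.log("start:", start, "i:", i, ";temp: ", temp);
        dfs(temp, i + 1);
        temp.pop(); // undo the choice (backtrack)
        console.log("dfs: ", start, i, temp);
      }
    }
  }
  dfs([], 0);
  return res;
}

console.log("result ===>", partition("aab"));
```

Step-by-step walkthrough. Step 1: start = 0, i = 0. In this round of dfs recursion every single character is taken, and a single character is always a palindrome; ...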
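The same backtracking can be sketched in Python as a port of the JavaScript above, without the debug logging. The names mirror the original, but this version is ours; `path` plays the role of `temp`.

```python
def is_pal(s, lo, hi):
    """Return True if s[lo..hi] (inclusive) reads the same forwards and backwards."""
    while lo < hi:
        if s[lo] != s[hi]:
            return False
        lo += 1
        hi -= 1
    return True


def partition(s):
    """Return every way to cut s into palindromic substrings, in DFS order."""
    res, path = [], []

    def dfs(start):
        if start == len(s):
            res.append(path.copy())  # copy: path is mutated while backtracking
            return
        for i in range(start, len(s)):
            if is_pal(s, start, i):
                path.append(s[start:i + 1])
                dfs(i + 1)
                path.pop()           # undo the choice

    dfs(0)
    return res


print(partition("aab"))  # [['a', 'a', 'b'], ['aa', 'b']]
```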

March 22, 2022 · 1 min · jiezi

About algorithms: Daily algorithms: arrays (1)

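Background: interviews taught me how important algorithms are. In experienced hiring and campus recruiting alike, all the big tech companies put great weight on algorithmic ability, so algorithms are practically an essential skill for promotion and raises. Setting such practical motives aside, algorithms genuinely train logical thinking and abstraction, which are basic qualities of a programmer. So I want to start a dedicated column recording my problem-solving journey, analyzing each problem as deeply as possible and understanding different approaches. I have also always aimed for technical breadth rather than confining myself to mobile development, so I will try to use different languages and learn their characteristics in practice. Finally, I want to find one thing and stick with it: over the years I have found that perseverance is the rarest and most valuable quality there is. Our time is limited, and one good habit can help us drop a bad one.

The problems will be categorized and summarized both by data structure (arrays, linked lists, trees, etc.) and by algorithm (divide and conquer, dynamic programming, etc.); they come mainly from LeetCode.

Problem

Given an integer array nums and an integer target, find the two integers in the array whose sum equals target and return their indices. You may assume that each input has exactly one solution, and you may not use the same element twice.

Examples

Input: nums = [2,7,11,15], target = 9
Output: [0,1]
Explanation: because nums[0] + nums[1] == 9, return [0, 1].

Input: nums = [3,2,4], target = 6
Output: [1,2]

Input: nums = [3,3], target = 6
Output: [0,1]

Approach

Brute-force enumeration. This is an easy array problem with no complex algorithm; it mainly tests basic array operations. The first idea is to enumerate twice to find a suitable pair, and the second enumeration can start at the element right after the one chosen by the first.

Java
```java
class Solution {
    public int[] twoSum(int[] nums, int target) {
        for (int i = 0; i < nums.length; i++) {
            // start j after i so the same element is never used twice
            for (int j = i + 1; j < nums.length; j++) {
                if (nums[i] + nums[j] == target) {
                    return new int[]{i, j};
                }
            }
        }
        return new int[0];  // no solution found
    }
}
```

Python3
```python
from typing import List

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        for i in range(len(nums)):
            # start j at i + 1 so an element is never paired with itself
            for j in range(i + 1, len(nums)):
                if nums[i] + nums[j] == target:
                    return [i, j]
        return []
```

C++
```cpp
#include <vector>
using namespace std;

class Solution {
public:
    vector<int> twoSum(vector<int>& nums, int target) {
        int size = nums.size();
        for (int i = 0; i < size; ++i) {
            for (int j = i + 1; j < size; ++j) {
                if (nums[i] + nums[j] == target) {
                    return {i, j};
                }
            }
        }
        return {};
    }
};
```

Hash table

Brute-force enumeration works, but the nested traversal makes the time complexity high (O(n^2)). A hash table trades some extra space for lower time complexity. While traversing, first check whether target - nums[i] is already in the hash table: if it is, return the result; if not, store the array value as the key and its index as the value. Using array values as keys can collide when values repeat, but this does not affect the final result. For example, for the array [2, 2, 1] and target 3 the result is [1, 2]: a repeated value only updates the index stored under its key during traversal, and the lookup is done by key, so the answer is unaffected. ...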
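The hash-table approach described above can be sketched in Python as follows (a minimal version, not the author's code; `two_sum` and `seen` are illustrative names). It looks up the complement before inserting the current value, so an element is never paired with itself, and a repeated value simply overwrites the stored index, exactly as in the [2, 2, 1] example.

```python
from typing import Dict, List


def two_sum(nums: List[int], target: int) -> List[int]:
    """One-pass hash table: O(n) time, O(n) extra space."""
    seen: Dict[int, int] = {}  # value -> most recent index where it appeared
    for i, x in enumerate(nums):
        j = seen.get(target - x)
        if j is not None:      # compare with None: index 0 is a valid hit
            return [j, i]
        seen[x] = i            # a duplicate value overwrites the older index
    return []


print(two_sum([2, 2, 1], 3))  # [1, 2]
```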

March 22, 2022 · 2 min · jiezi

About algorithms: Lagou algorithm crash training camp (no password)

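download: Lagou algorithm crash training camp https://www.sisuoit.com/2587....

Cocos "Merge Watermelon" game

Four months ago I spent a week or two dabbling in game development, making a "Merge Watermelon" game. The scripting language was TypeScript, and I found that Cocos's script structure is almost the same as Unity's, for example the lifecycle of game objects.

In a Cocos script:

```typescript
export default class Game extends cc.Component {
  start() {}
  update() {}
}
```

In a Unity script:

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class Game : MonoBehaviour
{
    void Start() {}
    void Update() {}
}
```

Of course there are many other similarities; game engines are probably all designed this way.

What is a game script? What does a script actually do for a game engine?

First, one thing should be clear: scripts are an indispensable part of any game engine. Their main uses are responding to player input, scheduling events that happen during gameplay, instantiating graphical effects, controlling the physical behavior of game objects, defining custom AI systems for characters, and so on.

Script concepts in Unity

Creating a script in Unity: use the Create menu at the top left of the Project panel, or choose Assets > Create > C# Script.

Anatomy of a Unity script file:

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class Wall : MonoBehaviour
{
    // Start is called before the first frame update
    void Start()
    {
    }

    // Update is called once per frame
    void Update()
    {
    }
}
```

MonoBehaviour: the built-in base class that scripts derive from, used to create new component types that can be attached to game objects. Update(): handles per-frame updates of the game object. Start(): where the script is initialized.

Prefabs in Unity

A prefab is generally used when you want to instantiate complex game objects, or collections of game objects, at runtime. It is very convenient and, compared with creating game objects from scratch in code, has the following advantages: ...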
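The start/update lifecycle described above can be mimicked with a toy frame loop. This is a hypothetical Python sketch, not how Cocos or Unity are implemented: `Component`, `run`, and `Counter` are invented names, and the only behavior modeled is "start() runs once before a component's first update(), then update() runs every frame" (the `dt` argument mirrors Cocos's `update(dt)`; Unity's `Update()` takes none).

```python
class Component:
    """Hypothetical base class mimicking the start()/update(dt) hooks."""

    def start(self) -> None:
        pass

    def update(self, dt: float) -> None:
        pass


def run(components, frames, dt=1 / 60):
    """Toy frame loop: call start() once before a component's first update(), then update() every frame."""
    started = set()
    for _ in range(frames):
        for comp in components:
            if id(comp) not in started:
                comp.start()
                started.add(id(comp))
            comp.update(dt)


class Counter(Component):
    """Example component that records the order in which the hooks are called."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def update(self, dt):
        self.events.append("update")


c = Counter()
run([c], frames=3)
print(c.events)  # ['start', 'update', 'update', 'update']
```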

March 21, 2022 · 1 min · jiezi

About algorithms: Jiuzhang Algorithm Class, 2021 edition (no password)

download: Jiuzhang Algorithm Class 2021 edition, backup: https://www.sisuoit.com/2399.... ...

March 21, 2022 · 1 min · jiezi

About algorithms: At spring-equinox planting time, AI shows up in the fields

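"The grass of Yan is like jade-green silk; the mulberries of Qin bow their green branches." After a harsh winter, people long for spring more than ever. The spring equinox, one of the 24 solar terms, has arrived: the weather turns milder, rain is plentiful, plants revive and grow, and it is a prime season for planting crops.

Today, traditional farming is taking on a new look as technology is woven in. With AI, scenarios such as field cultivation, plant growth, and pest and disease detection now have more precise and efficient tools, helping improve both the quality and the efficiency of crop production.

Baidu PaddlePaddle, an open-source deep learning platform, is also playing a part on the road toward intelligent agriculture.

Modern intelligent plant factory: an AI assistant that multiplies agronomists' productivity

In a modern hydroponic plant factory in Changziying Town, Daxing District, Beijing, built jointly by Yunong, BOE Houji, and Baidu PaddlePaddle, an AIPaaS application system produces several times more fresh vegetables than before with very little manpower.

Agricultural experts at the plant factory used to walk 20,000 to 30,000 steps a day inspecting vegetable growth and pests, without a break even during the Spring Festival. After PaddlePaddle Enterprise Edition's zero-threshold AI development platform EasyDL was introduced and combined with the BOE Houji AIPaaS system, two major problems were solved:

Recognizing vegetable growth status: a weight-recognition model built with Baidu's vision technology and deep learning algorithms estimates vegetable weight from images and thus judges whether the vegetables are growing healthily and are ready to harvest, with automatic recognition accuracy above 95%.

Automatic pest warnings: an object-detection model from the PaddlePaddle EasyDL platform automatically identifies four common insects on yellow and blue sticky boards (diamondback moth, whitefly, leaf miner, and fly) with 90% accuracy, so pests are detected at the earliest moment and losses are reduced.

An agricultural expert who used to look after 20 mu of land (1 mu is about 667 m²) can now look after 60-100 mu alone with PaddlePaddle AI, a 3-5x gain in efficiency. In this modern intelligent plant factory, the vision of "new farmers" and intelligent agriculture is becoming reality.

Intelligent field robots: AI means rice planting no longer requires bending over

Because rice must be worked in mud, the efficiency of manual planting has long been limited. Intelligent machinery solved the efficiency problem, but the row-by-row planting of rice places new demands on the machines' automatic navigation.

Facing the complex paddy environment, engineers at Suzhou Botian Automation Technology Co., Ltd. analyzed the characteristics of paddy images and built an automatic paddy navigation-line detection system on PaddlePaddle.

The ICNet model from PaddleSeg, PaddlePaddle's image segmentation toolkit, segments the seedlings from the background row by row; on that basis the center line of each seedling row is extracted precisely, with accuracy above 95%. Each frame takes only about 300 ms (including ICNet segmentation inference and the subsequent navigation-line extraction), which meets the speed requirements of farm machinery in operation.

With the detection system paired with GPS, Suzhou Botian's agricultural robots now operate fully unmanned, with automatic navigation from leaving the depot to returning, greatly reducing labor and material inputs and safeguarding farmers' efficiency and health.

Intelligent pest forecasting: AI spots insects and wrests grain from pests

Pests and diseases are a major cause of crop yield loss. According to expert consultation organized by the National Agro-Tech Extension and Service Center, major pests and diseases of key grain crops such as wheat, rice, corn, and potatoes were expected to be severe in 2022, threatening more than 70% of grain-producing areas.

In the past, pest monitoring relied mostly on professional technicians and consumed a great deal of manpower and resources. Using pesticides scientifically and safely, and matching treatment precisely to the pest species, is also critical.

Against this background, Ningbo Weineng IoT Technology Co., Ltd. used PaddlePaddle's AI development capabilities to build its own Weineng Cloud intelligent pest forecasting system, which not only monitors pests but also adjusts pesticide and fertilizer ratios and application accordingly.

First, pests are lured to a lamp, killed, and photographed. The monitoring system automatically saves the images and uploads them to a cloud server, then calls the API of a pest counting and species recognition model built on the PaddlePaddle EasyDL object-detection model to classify and count six common rice pests. Forecasting staff can easily obtain this data on the control platform to guide pesticide and fertilizer ratios and related operations in the rice fields.

The system is already in use in rice fields in Ningbo, helping growers collect pest information remotely and automatically and forecast outbreaks accurately, while providing a data basis for scientific pesticide use, which reduces pesticide use and improves crop quality.

Grain production is the top priority of agricultural development, and technological innovation and greater intelligence in agriculture provide ample momentum and support for raising grain output. The 2022 No. 1 Central Document explicitly calls for accelerating facility agriculture and promoting the R&D and application of facility equipment technologies such as integrated water-fertilizer management, automated feeding, and intelligent environmental control. As China's first independently developed, feature-rich, open-source industrial-grade deep learning platform, Baidu PaddlePaddle will keep working in smart agriculture, empowering every link of agricultural production with AI and expanding the application scenarios of intelligent agricultural equipment.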

March 21, 2022 · 1 min · jiezi

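About algorithms: On the establishment of the Xiong'an New Area: how Tianyi Cloud supports the city of the future with a digital foundation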

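On April 1, 2017, the decision to establish the Xiong'an New Area in Hebei was announced. A plan for the millennium and a major national undertaking: over the past few years Xiong'an has consistently explored the future through technology. As the builder of the Xiong'an New Area government cloud, China Telecom's Tianyi Cloud has worked alongside Xiong'an to build a green, smart "Digital Xiong'an" city, jointly carrying the baton of building "Digital China".

A new picture of Xiong'an, with spring tides surging across the land. As Zhang Qiang, Chief Information Officer of the Xiong'an New Area and Secretary-General of the Xiong'an New Area Smart City Innovation Alliance, put it: "From the planning stage to the design and construction stage of the digital city, Xiong'an's digital city should present a bustling, vibrant scene." He added: "A large number of enterprises are actively taking part in planning and building the new area's smart city, helping it build its digital city from a high starting point and to high standards." Among them, China Telecom has spent four years continuously building high-speed networks for the new area, laying out "new infrastructure" projects, exploring 5G and cloud applications, and supporting the area's smart city construction with high-quality services.

In cloud computing, Tianyi Cloud has leveraged China Telecom's cloud-network integration strengths to build a dedicated Tianyi Cloud resource pool in Xiong'an, giving the new area a secure, trusted, and reliable government cloud platform. On top of it, an application system with full-area coverage and interconnection of everything is being built, raising the level of intelligence in modern office work, urban governance, project construction, public transport, services for residents, and the enterprise integrity and social credit systems. Tianyi Cloud also uses its strong cloud security capabilities to keep the government cloud platform running securely and efficiently, empowering the new area's smart city construction across the board. By the end of 2020, a total of 4,305 government services and 360 convenience applications had gone online in the new area and its three counties; they are connected with the information systems of 9 provincial vertically managed departments, with data returned. The "Xiong'an Smart Social Security" website and app have been launched, and 38 items, such as benefit eligibility certification, insurance subsidy inquiries, and grain ration subsidy inquiries, can be handled online.

Tianyi Cloud reportedly focuses on national new-type smart city construction and has released a series of smart applications for government services, the ecological environment, and more. With strong government cloud construction capabilities and excellent market development, it is placed in the "leader" quadrant of China's government cloud market. It has so far built 11 provincial-level government cloud platforms nationwide, covering more than 100 prefecture-level cities, and delivered more than 1,000 smart city projects.

Today the new area is full of tower cranes and shuttling trucks, a scene of bustling activity. Young Xiong'an, true to its strategic positioning, is accelerating the construction of new infrastructure such as 5G, cloud computing, and big data, striding toward a "Smart Xiong'an".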

March 21, 2022 · 1 min · jiezi

About algorithms: RepLKNet: it's not that large convolutions are bad, they're just not large enough; a look at 31x31 convolutions | CVPR 2022

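The paper proposes introducing a few very large convolution kernel layers to effectively enlarge the effective receptive field, narrowing the gap between CNNs and ViTs, especially in downstream tasks. The paper explains everything in great detail and also optimizes real-world running speed; it is well worth reading and trying.

Source: Xiaofei's Algorithm Engineering Notes (WeChat official account)

Paper: Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
Paper URL: https://arxiv.org/abs/2203.06717
Code: https://github.com/megvii-research/RepLKNet

Introduction

On pretext tasks such as image classification and feature learning, and on downstream tasks such as object detection and semantic segmentation, convolutional networks keep being overtaken by ViTs (vision transformers). It is widely believed that ViTs owe their performance mainly to MHSA (multi-head self-attention), and many studies have compared MHSA and convolution from different angles.

Explaining the performance gap between ViTs and CNNs is not this paper's goal. Rather than comparing MHSA with convolution, it focuses on how ViTs and CNNs differ in the way they build long-range spatial connections. In ViTs, MHSA usually covers a large receptive field ($\ge 7\times 7$), so each output can draw on information from a large region. In CNNs, the common practice is to enlarge the receptive field by stacking small ($3\times 3$) convolutions, so each output covers a smaller region.

Building on this receptive-field difference, the paper tries to close the gap between ViTs and CNNs by introducing a few large-kernel convolution layers, and proposes RepLKNet, which builds spatial relationships with re-parameterized large convolutions. RepLKNet modifies the Swin Transformer backbone, replacing MHSA with large depth-wise convolutions, and outperforms ViT networks. In addition, the visualization in Figure 1 shows that large kernels, compared with stacked small convolutions, significantly enlarge the effective receptive field (ERF) and can even attend to shape features as ViTs do.

Guidelines of Applying Large Convolutions

Using large convolutions naively causes large drops in accuracy and speed. From experiments, the paper distills five guidelines for using large kernels efficiently, each with an accompanying remark.

Guideline 1: large depth-wise convolutions can be efficient in practice.

Large convolutions are expensive: parameter count and FLOPs grow quadratically with kernel size, and depth-wise convolution offsets exactly this drawback. Changing the kernels of the four stages from $[3,3,3,3]$ standard convolutions to $[31,29,27,13]$ depth-wise convolutions increases FLOPs by only 18.6% and parameters by only 10.4%.

However, because their ratio of computation to memory access is low, $3\times 3$ depth-wise convolutions run inefficiently on parallel devices. As the kernel grows, each feature value is reused more times, so the computational density of depth-wise convolution rises. According to the Roofline model, computational density increases with kernel size, so latency should grow less than FLOPs do.

Remark 1

As Table 1 shows, current deep learning frameworks implement depth-wise convolution rather inefficiently. The authors therefore tried several ways of optimizing the CUDA kernels, settled on the block-wise (inverse) implicit GEMM algorithm, and integrated it into the MegEngine framework. Compared with PyTorch, the share of latency taken by depth-wise convolutions dropped from 49.5% to 12.3%, roughly proportional to their share of FLOPs.

For the detailed analysis and implementation, see the article "Why can a 31x31 kernel take about as long as a 9x9 convolution?" (https://zhuanlan.zhihu.com/p/...).

Guideline 2: identity shortcut is vital especially for networks with very large kernels. ...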
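To see why depth-wise convolution offsets the quadratic growth described under Guideline 1, a rough weight and multiply-accumulate (MAC) count helps. The sketch below uses the standard formulas (bias ignored, stride 1, "same" padding); the channel count of 128 in the example is an arbitrary assumption, not a RepLKNet setting, and the helper names are ours.

```python
def conv_weights(k: int, c_in: int, c_out: int, depthwise: bool = False) -> int:
    """Number of weights in a k x k convolution layer (bias ignored)."""
    if depthwise:
        # one k x k filter per channel; depth-wise conv keeps the channel count
        if c_in != c_out:
            raise ValueError("depth-wise convolution requires c_in == c_out")
        return k * k * c_in
    return k * k * c_in * c_out


def conv_macs(k: int, c_in: int, c_out: int, h: int, w: int, depthwise: bool = False) -> int:
    """Multiply-accumulates for a stride-1, 'same'-padded layer producing an h x w output."""
    return conv_weights(k, c_in, c_out, depthwise) * h * w


for name, k, dw in [("3x3 dense", 3, False), ("31x31 depth-wise", 31, True), ("31x31 dense", 31, False)]:
    print(f"{name:>17}: {conv_weights(k, 128, 128, dw):>10,} weights")
```

With 128 channels, a 31x31 depth-wise layer (123,008 weights) is actually smaller than a 3x3 dense layer (147,456 weights), while a 31x31 dense layer would need about 15.7 million. These counts say nothing about memory traffic, which is why the Roofline argument about latency is a separate point.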

March 21, 2022 · 2 min · jiezi

About algorithms: Back-to-school season | PaddlePaddle AI Studio courses: even beginners can become excellent algorithm engineers

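How do you become an algorithm engineer? Which courses do you need to work in AI? What is the most efficient learning path? Many students interested in AI ask these questions. Whether out of personal interest or drawn by high salaries, algorithm engineer has become one of the most sought-after roles in recruiting. Algorithm engineers need basic development skills and are mainly responsible for solving concrete problems and applying and deploying cutting-edge technology. To become an excellent algorithm engineer, besides educational background, logical thinking, and communication skills, you need to master mathematics, programming, machine learning, and deep learning. However, what many students learn at school does not fully prepare them for the actual job.

If you no longer want to just gaze at the generous prize money of major competitions,
if you want to keep up with the projects in your dream advisor's lab,
if you want to add highlights to your résumé and sharpen your skills before job hunting,
then don't miss this course program!

PaddlePaddle AI Studio now offers 100+ high-quality courses, with 400,000 enrollments in its AI courses. This program lays out a complete advancement path for algorithm engineers in four stages: AI fundamentals, advanced AI specialization, frameworks/deployment/industrial applications, and academic frontiers. Engineers share everything they know, from beginner to expert, to help you become a professional algorithm engineer.

There is also a rich choice of topical courses: a world champion walking you through CVPR competitions, in-depth lectures on large-scale pre-trained language models, the technical evolution of GAN image generation networks, Paddle2ONNX development details, high-performance local deployment on Linux servers, and the self-supervised ViT algorithms BEiT and MAE.

Taking notes while you study improves learning outcomes, and this event offers more than courses: there are prizes too. During the event you can share your study notes; we will judge them, and excellent notes can win extra prizes such as 1,000-yuan JD gift cards, Xiaodu smart earphones, and Xiaodu bears! Extra AI Studio points are also up for grabs!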

March 19, 2022 · 1 min · jiezi

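About algorithms: Another major national competition is here, and it counts toward postgraduate recommendation | China Software Cup PaddlePaddle remote sensing track officially launched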

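The "China Software Cup" College Student Software Design Competition is a non-profit competition for students at Chinese institutions and is on the 2021 national list of college student competitions.

It is jointly organized by the Ministry of Industry and Information Technology, the Ministry of Education, and the People's Government of Jiangsu Province, and aims to encourage students to take part in software research, strengthen their innovation and hands-on abilities, and cultivate more high-end talent for China's software and IT services industry. In 2022, Baidu PaddlePaddle is hosting the Group A and Group B tracks; Group A has now been officially released.

Problem description

China's remote sensing field has entered the fast lane of high-resolution imagery, and demand for analysis and application services for remote sensing data grows by the day. Traditional approaches characterize features in high-resolution satellite images poorly and depend on manual expertise, with a huge workload. With the rise of AI, and especially the rapid progress of deep-learning-based image recognition, these techniques are transforming remote sensing. Compared with traditional visual interpretation that relies on sheer manpower, deep-learning-based remote sensing image recognition can automatically analyze land-cover types in images and shows great potential in accuracy and efficiency.

The problem was designed jointly by Baidu PaddlePaddle and Beihang University's LEVIR team. Participants must train on the Baidu AI Studio platform, develop with the domestic AI framework Baidu PaddlePaddle, and design and build a web system that automatically interprets remote sensing images with deep learning. The system must implement four analysis functions, and the organizers provide a training dataset for each:
Target extraction (use image segmentation to segment specified objects in satellite images)
Change detection (use image segmentation to analyze changes between satellite images of the same area taken at two different times)
Object detection (use object detection to detect specified objects in satellite images)
Land-cover classification (use image segmentation to classify every pixel of a satellite image)

(Figure: sample from the "change detection" training dataset)

So that participants can focus on designing and developing the software system, only "change detection" is assessed as an algorithm component: participants train an AI model on the official dataset and upload their results to AI Studio for a score that counts toward the total.

Eligibility

Undergraduate, graduate, and vocational college students can register for Group A problems; Group B problems are open only to vocational college students. Group A problems have now been released.

Group A registration: https://aistudio.baidu.com/aist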

March 19, 2022 · 1 min · jiezi

About algorithms: CSE 11 COVID Genomic

CSE 11 Winter 2021 PA2 - COVID Genomic Sequence

Due date: Tuesday, Jan 19 @ 11:59PM PST (Wednesday, Jan 20 @ 11:59PM PST w/ slip day. If you submit your assignment late, the autograder will automatically use your slip day if you have any remaining. Note that the README portion of this assignment cannot be submitted late.)

Provided Files: None.

Files to Submit: CovidGenomeAnalysis.java, CovidMutation.java

Goal: Programming Assignment 2 is an introduction to loops and Strings in Java. You will use loops, String methods, and other programming techniques to complete the assignment. Please read the entire write-up before getting started.

Some General Notes

Most of these notes are necessary for autograding purposes. If any of these does not make sense, you probably aren't doing anything that it applies to. However, you should make sure that when you do learn about it later on in this quarter you follow these notes. We cannot be lenient regarding these things because everything is autograded to ensure fairness.
Make sure to read the autograder output after you submit to Gradescope (wait until the autograder is finished running).
Match the specifications that we provide exactly; otherwise we cannot ensure that the autograder will function correctly. This includes file names, class names, method signatures, extending, throwing, etc.
"Constants" should be defined as private static final variables.
Do not use any static variables (other than for constants) or instance variables that are not specified in this writeup. We cannot ensure that these do not get clobbered during grading. Any extra variables used should be local only.
Unless otherwise specified, do not add any extra imports or use any packages not from java.lang other than those implied or explicitly named by this writeup.
Do not specify a package for your files.
This will cause them to fail to compile with the autograder.
Do not add any extra classes to your files and do not write code in files that are not specified.
Do not call helper methods except from the class where they are implemented, as we will be using our own version of classes during grading (which will only have the instance variables and methods specified in this writeup). You shouldn't be able to call them anyway if you have declared them as private.
If a behavior (say, on some specific input) is not specified, you may handle that case however you want. However, if you implement a specific behavior for some special case of a required method, do not rely on that specific behavior when using that method as a helper method (only assume that the method works as specified in the writeup).
For the surveys and AI form, make sure you are filling them out and specifying your email address as the one linked with your Gradescope account. If you fill them out after submitting, you can either resubmit to update your score immediately or wait for us to rerun the autograder for everyone after the deadline.
Any late submission will trigger a slip day usage for this assignment. There will be no exceptions for "accidents," since we cannot determine if it is an actual accident. If you need special accommodations, please email the Professor directly.

Part 1: CovidGenomeAnalysis.java

You have just been recruited to join a lab working on the genetic sequence of COVID-19 because of the skills you demonstrated in the first PA of CSE 11. You will now use your knowledge of computer science to help scientists develop a more effective and cheaper vaccine for COVID-19.

COVID-19 is an RNA virus, meaning that its genetic sequence is made of adenine (A), cytosine (C), guanine (G), and uracil (U). This is in contrast with DNA, which has thymine (T) instead of uracil. However, because of the instability of RNA, it must be converted into DNA to be sequenced (you can read more about this here).
For this reason, we will use the complementary DNA for this assignment. In other words, the genome we are using will contain A, T, C, and G.

A DNA molecule consists of two strands wound around each other, with each strand held together by bonds between the nitrogenous nucleobases. Adenine (A) pairs with thymine (T) and cytosine (C) pairs with guanine (G).
Reference: https://www.genome.gov/geneti...

Task

We want to first run some simple analyses. Given what we know about DNA, if we know the bases on one strand of the DNA, we can know which bases are on the other strand. We also want to count the number of times a specific base appears on one strand. We will implement this in the main() method of the CovidGenomeAnalysis class in the CovidGenomeAnalysis.java file.

```java
public class CovidGenomeAnalysis {
    ...
    public static void main(String[] args);
    ...
}
```

Input

Just like in PA1, we will read user input from System.in using a Scanner (remember to import it). After a user runs java CovidGenomeAnalysis, they will then type the input to our program. For this assignment, you can safely assume that this input will come as a single line of characters and that each character before the newline character is a capital letter A, T, C, or G.

Output

After reading in the string given as input, we will print some output to System.out. This printout will always be two items, separated by a single space, followed by a single newline character. The System.out.println() method will print whatever String is passed in followed by a newline, so if you use it, you do not need to explicitly add a newline character. If you use System.out.print(), you will need to add the newline character yourself.
There is no invalid input case (as we are assuming the input will always follow the format above), so the first item should be the number of thymine (T) nucleotides that appear in the strand opposite the input sequence, and the second item should be the strand opposite the input sequence. The opposite strand's bases should be in the order determined by the input strand and should only contain the capital characters A, T, C, and G.

Implementation

Valid Inputs: Your program should work for any input that does not have characters other than A, T, C, and G.

Testing

It would be easier to use input from a file rather than typing the input each time we want to test our code. We haven't learned file IO yet, but we can still do it with the Scanner from System.in and pipe input to System.in from a file.

There are many ways to do this, but the following way is pretty system agnostic. First, we will need to make some file whose raw data has the genome sequence we want to use as input (as the first line). For this purpose, we should use the .txt extension for the file name to avoid confusion. To have the .txt file be read from System.in, we will use a combination of the cat command and the | (pipe) symbol.

Cat is short for concatenate. If we were to just do cat file.txt, it would display the content of file.txt in the terminal. We instead use | to redirect the output of one command to the input of another. In our case, we will be redirecting the output of the cat command into the input of our Java program.

(For Windows users, cat is available in PowerShell.
If you are using Command Prompt, the equivalent command is type.)

Sample Test Case

Case 1
Input: ACGTAAGCA
Output: 4 TGCATTCGT (there are 4 letter Ts in the output sequence)

You can run this case by having a file sample1.txt with the contents:

ACGTAAGCA

and running the command: cat sample1.txt | java CovidGenomeAnalysis

Part 2: CovidMutation.java

You are recruited to another research lab at UCSD because of your exceptional work in helping create a new vaccine. This lab is on the forefront of a COVID-19 cure using an antibody treatment. This antibody treatment works by mutilating the genome sequence of the COVID-19 virus such that the nucleotides formed by this mutilated sequence turn the virus into a clump of molecules that is easily broken down by white blood cells. The research lab has a genome sequence simulator that takes in a string of nucleobases (adenine, guanine, thymine, cytosine, uracil) and simulates the DNA/RNA chain of that potential organism and its characteristics. The lab has found that they can control the antibodies to specifically reverse every k (some integer) nucleobases of the virus. In order to produce this treatment with the greatest efficacy possible, the research lab must find some k that has the highest rate of weakening the COVID-19 virus.

Task

Given some genome sequence, we want to find the output if we did this k-reversing. Since this technique might be used in the future, we will allow it to work for any string and hope the researchers pass in a valid genome sequence for now. Our application should be able to take any string and any integer k and reverse every k-sized chunk of the string. We will implement this in the main() method of the CovidMutation class in the CovidMutation.java file.

Input

We will read user input from System.in using a Scanner (remember to import it). After a user runs java CovidMutation, they will then type the input to our program.
For this assignment, you can safely assume that this input will come as a single line of characters followed by an integer on the next line. The first line will be the string to k-reverse (all you need to know is that it is a string of characters), and the integer k on the next line will represent the "chunk size."

Output

After checking the inputs, we will print some output to System.out. This printout will always be a single string followed by a single newline character. In the case of invalid inputs (see below), print out the original input string (the string from the first line of input). In the case of valid inputs, print out the k-reversed version of the input string. If the length of the string is not divisible by k, then you should reverse all characters in the remainder of the string after the last full chunk. If the given k is greater than the length of the string, you should completely reverse the entire string (this is a special case of the case described in the previous sentence). Note that this means that the output should always have the same length as the input string.

```java
public class CovidMutation {
    ...
    public static void main(String[] args);
    ...
}
```

Implementation

Valid Inputs: The first line will be any string (i.e., possibly composed of characters other than A, C, T, and G). The integer in the second line must be at least 1 to be valid.

Testing

You can use the same procedure as for Part 1. The only difference will be that the second line of your input file will contain the integer representing the chunk size.

Sample Test Cases

Case 1
Input: sequence = ACGTAAGCA; k = 3
Output: GCAAATACG (the chunk size k is 3, which means we will reverse every 3 nucleotides. This gives ACG|TAA|GCA; reversing each chunk gives GCA|AAT|ACG.)

Case 2
Input: sequence = ACGTAAGCA; k = 7
Output: GAATGCAAC (the chunk size k is 7, which means we will reverse every 7 nucleotides.
This gives ACGTAAG|CA; reversing the first full chunk gives GAATGCA. If a remainder smaller than the given k is left, it should be fully reversed. In this case 2 nucleobases (CA) are left, and reversing them results in GAATGCA|AC.)

Style

Coding style is an important part of ensuring the readability and maintainability of your code. We will grade your code style in all submitted code files according to the style guidelines. Namely, there are a few things you must have in each file/class/method: ...
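The core logic of both parts, stripped of the Scanner I/O, can be sketched in Python as below. This only illustrates the algorithms: the assignment itself must be written in Java with the exact class and method signatures given above, and the helper names here are hypothetical. Note that the number of Ts in the opposite strand equals the number of As in the input.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def opposite_strand(seq: str) -> str:
    """Part 1: pair each base with its partner (A<->T, C<->G), keeping the order."""
    return "".join(COMPLEMENT[base] for base in seq)


def analyze(seq: str) -> str:
    """Part 1 output line: '<number of Ts in the opposite strand> <opposite strand>'."""
    opp = opposite_strand(seq)
    return f"{opp.count('T')} {opp}"


def k_reverse(s: str, k: int) -> str:
    """Part 2: reverse every k-sized chunk; the final short chunk is reversed too.
    Invalid k (< 1) returns the input unchanged, as the spec requires."""
    if k < 1:
        return s
    return "".join(s[i:i + k][::-1] for i in range(0, len(s), k))


print(analyze("ACGTAAGCA"))       # 4 TGCATTCGT
print(k_reverse("ACGTAAGCA", 3))  # GCAAATACG
print(k_reverse("ACGTAAGCA", 7))  # GAATGCAAC
```

In Java, the same chunk loop maps to stepping i by k and reversing s.substring(i, Math.min(i + k, s.length())), for example with a StringBuilder, which is part of java.lang.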

March 18, 2022 · 11 min · jiezi

About algorithms: STAC51 Data Analysis

Department of Computer and Mathematical Sciences
STAC51: Categorical Data Analysis
Winter 2021

Instructor: Sohee Kang
E-mail: sohee.kang@utoronto.ca
Office: IC 483
Online Office Hours: Monday 5-6 pm and Wednesday 5-6 pm
(416) 208-4749

TA: Bo Chen, E-mail: bojacob.chen@mail.utoronto.ca
TA: Lehang Zhong, E-mail: lehang.zhong@mail.utoronto.ca

Course Description: In this course we discuss statistical models for categorical data: contingency tables, generalized linear models, logistic regression, multinomial responses, logit models for nominal responses, log-linear models for two-way tables, three-way tables and higher dimensions, models for matched pairs, repeated categorical response data, correlated and clustered responses, and statistical analyses using R. Students will be expected to interpret R code and outputs on tests and the exam.

Prerequisite(s): STAB27H3 or STAB57H3 or MGEB12H3 or PSYC08H3
Credit Hours: 3

Required Text: An Introduction to Categorical Data Analysis, 3rd Edition. Author(s): Alan Agresti. Web link for 2nd edition: https://search.library.utoron...
Sub-text 1: Categorical Data with R, 3rd edition. Author: Alan Agresti
Sub-text 2: Analysis of Categorical Data with R (2014). Authors: Bilder C. and Loughin T.

Course Objectives: At the completion of this course, students will be able to: ...

March 18, 2022 · 5 min · jiezi

About algorithms: Meetup preview | Sharing the AIOps metric-related algorithm system

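In 2016, Gartner proposed the concept of AIOps (AI for IT operations): a new generation of IT operations that applies AI, aiming to use algorithms to solve the operational problems enterprises face. Since then the concept has been widely adopted and developed.

AIOps is an interdisciplinary field spanning AI, systems, and engineering knowledge, with its core technologies in data algorithms and machine learning. AI algorithm experts need skills such as hierarchical clustering, random forests, and time-series decomposition.

As digital transformation accelerates across industries, the AIOps market has reached a new level. CloudWise has long worked in AIOps, continuously exploring and practicing on the algorithm side.

The CloudWise Intelligence Research Institute is dedicated to research on cutting-edge AIOps technology, promotes the deep integration and deployment of AI algorithms in industrial scenarios, and is responsible for the research and engineering of CloudWise's core intelligent algorithms.

In this online Meetup, Kappa Yan (严川), Algorithm Director at CloudWise, will share the metric-related algorithm system in AIOps, its application scenarios, and the challenges it faces.

If you are eager to learn about AIOps, be sure to join this livestream!

Livestream preview
Topic: The AIOps metric-related algorithm system
Time: March 24 (Thursday), 18:00-19:00

Speaker: Kappa Yan (严川), Algorithm Director at CloudWise. Postdoctoral researcher at Peking University, with more than 5 years of experience in AI and more than 3 years in AIOps, focusing on research and deployment of time-series scenarios in intelligent operations.

Content:
1. Overview of the AIOps algorithm system: the current state of AIOps algorithms; common AIOps algorithm scenarios
2. In-depth analysis of AIOps anomaly detection: the challenges of metric anomaly detection; deployment scenarios for the anomaly detection algorithm system
3. AIOps forecasting scenarios: the challenges of metric forecasting; deployment scenarios for the forecasting algorithm system

What you will gain: an understanding of frontier AIOps algorithm scenarios and algorithm systems, and of the difficulties and challenges in AIOps algorithm scenarios, with real industrial deployment examples.

Registration: scan the QR code below, add the assistant on WeChat, and include the note "324" to get the livestream link.

About the Meetup: The AIOps Developer Meetup is a series of online livestreams and offline talks launched by the AIOps community for developers. It brings together the community's expert group to provide quality technical content; whether you want technical deep dives, open-source governance, or industry solutions, you can always find it here.

The AIOps community, initiated by CloudWise, offers a complete service system of algorithms, compute, and datasets for operations scenarios, together with solutions for intelligent operations. It is dedicated to spreading AIOps technology, aiming to work with customers, users, researchers, and developers across industries to solve technical problems in intelligent operations, drive AIOps adoption in enterprises, and build a healthy, mutually beneficial AIOps developer ecosystem.

Previous session: Zhang Jintao, Apache APISIX PMC member, shared "The Architecture and Evolution of Cloud Native". Main content:
Evolution from monolithic to microservice architecture
Evolution from microservice to cloud-native architecture
Changes and challenges to infrastructure in cloud-native scenarios
How AIOps helps cloud-native scenarios land

Video replay and slides: add the assistant mentioned above with the note "干货" (goodies) to receive them.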

March 18, 2022 · 1 min · jiezi

About algorithms: Algorithm testing: exploration and practice

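This article shares the algorithm testing experience our algorithm testing team has explored and accumulated in day-to-day testing work, following the outline below.

Machine learning basics

Before learning algorithm testing, you first need a basic understanding of machine learning. We cover it along four dimensions: categories of machine learning, algorithm application scenarios at Hello (Hellobike), key terms, and algorithm development steps.

Categories of machine learning

Machine learning algorithms are generally divided into supervised learning, unsupervised learning, and deep learning.

Supervised learning

Supervised learning learns, from a given training set of inputs x and outputs y, a function that maps inputs to outputs (that is, how inputs relate to outputs); every sample in the training set has a label or target.

Supervised learning generally uses two types of target variables: nominal and numeric. A nominal target takes values only from a finite set, such as true/false or the animal classes {reptiles, fish, mammals, amphibians}; a numeric target can take values from an infinite set of numbers, such as 0.100, 42.001, or 1000.743. Numeric targets are mainly used in regression analysis.

Unsupervised learning

The biggest difference between unsupervised and supervised learning is that the training data in unsupervised learning has no labels.

In unsupervised learning, dividing a dataset into multiple classes of similar objects is called clustering; finding statistics that describe the data is called density estimation.

Deep learning

Deep learning is a class of algorithms that build high-level abstractions of data using multiple processing layers composed of multiple nonlinear transformations. Its features are not engineered by hand but extracted automatically by the model.

Algorithm application scenarios at Hello

At Hello, algorithms are used in essentially every business scenario, for example:
Intelligent dispatching (vehicle dispatching for bikes, e-bikes, and red-envelope bikes; battery-swap dispatching for e-bikes; algorithms generate and assign the dispatch tasks)
Marketing algorithms (L3 intelligent pricing for two-wheelers, personalized home-page banners, intelligent benefits, etc.)
Location services (positioning for the two-wheel and four-wheel businesses, Polaris, out-of-zone detection, Bluetooth sniffing and anti-sniffing)
Asset protection (predicting bikes and e-bikes about to lose contact, user fault reports, vehicles left unlocked past the time limit, etc.)
Driver-passenger ecosystem (matching drivers with passenger orders, bonus payouts, order-completion probability, pickup-time estimation, etc.)
Computer vision (face recognition, OCR, user fault reports, etc.)
Risk-control algorithms (trip risk, fake orders, identity verification, etc.)
Natural language processing (VOC ticket classification, chatbot intent recognition, "you may want to ask" suggestions, etc.)
Digital-finance algorithms (channel recommendation, risk scores, anti-fraud, etc.)

Key terms

Features: measured values such as weight, wingspan, webbed feet, and back color are called features, also known as attributes.

Target variable: what a machine learning algorithm predicts. In classification algorithms it is usually nominal; in regression algorithms it is usually continuous.

Training sample: features or attributes are usually the columns of the training set; each is measured independently, and several features together form one training sample.

Knowledge representation: suppose this bird classification program meets the accuracy requirement in testing; can we say the machine has learned to tell birds apart? This question is about knowledge representation. Some algorithms produce knowledge representations that are easy for people to understand, while others produce representations only a computer can interpret. A knowledge representation can be a rule set, a probability distribution, or even an instance from the training set.

Classification: assigning instance data to the appropriate class.

Regression: mainly used to predict numeric data. Most people have seen an example of regression: the best-fit curve through a given set of data points.

Clustering: if the only requirement is to divide the data into discrete groups, use a clustering algorithm.

Algorithm development steps

1. Collect data: samples can be collected in many ways.
2. Prepare the input data: once you have data, make sure it is in the required format.
3. Analyze the input data: mainly a manual review of the data obtained, whose main purpose is to make sure the dataset contains no garbage data.
4. Train the algorithm: this is where the machine learning algorithm actually starts learning. Depending on the algorithm, steps 4 and 5 are the core of machine learning. With an unsupervised learning algorithm there is no target variable, so no training is needed and everything algorithm-related is concentrated in step 5.
5. Test the algorithm: this step puts the knowledge learned in step 4 to use. To evaluate the algorithm, you must test how well it works. For supervised learning, the target values used for evaluation must be known; for unsupervised learning, other evaluation methods are needed to check success. Either way, if the output is unsatisfactory, go back to step 4, correct it, and test again. Problems are often related to data collection and preparation, in which case you must go back to step 1.
6. Use the algorithm: turn the machine learning algorithm into an application that performs real tasks, to verify that the steps above work in a real environment.

Four dimensions of algorithm testing capability

As the development steps show, quality assurance for algorithm testing mainly covers four dimensions: data and dependent-service quality assurance, model effectiveness quality assurance, system engineering quality assurance, and foundational capability support. We introduce them one by one below.

Data quality assurance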
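In data quality assurance, the company's existing capabilities still lacked business-data quality monitoring, known in the industry as a "business data reconciliation platform". Algorithm development, meanwhile, places very high demands on data quality; the industry sums up this dependence in one sentence: "Data and features determine the upper bound of machine learning; models and algorithms only approach that bound."

The algorithm testing team therefore started from data comparison and built a data quality monitoring service that compares multiple data sources, covering different storage media and different application services.

As of the end of June, several business lines had been onboarded, with 19 data quality monitoring tasks that had found more than 25 problems in total, over 70% of them in production. This established routine production data quality monitoring and effectively improved the capability and efficiency of data quality assurance.

Dependent-service quality assurance

Algorithm development depends on many upstream and downstream services for key data, features, or parameters. Take customer-service ASR: until June 2021 it relied on an external ASR speech-to-text service, delivered as an external software package invoked internally, and whether a returned result was correct was hard to judge from the response content. As a result, the availability of the external ASR service could be determined neither through instrumentation nor from responses, and the team relied on customer service staff reporting failures manually.

The algorithm testing team therefore borrowed the design of the online service dial-testing (synthetic probing) system and built a service availability probing platform. A test audio clip, for example a recording of "你好" ("hello"), is prepared in advance and sent to the ASR service on a schedule, and the responses are monitored. When the response is not the text "你好", the responsible engineers are alerted automatically. Since this approach went live, the average time to resolve ASR issues dropped from 2.5 days in 2020 to 30 minutes in 2021, a 99% year-over-year decrease.

The probing platform supports different service types (web services, RPC services, messaging services, and storage services) and monitoring scenarios such as availability, response time, and semantic correctness. Two business lines, the customer service platform and supply chain, have been onboarded, and in the first half of 2021 it found 11 new production issues. By detecting availability problems promptly, it helps developers locate and fix them quickly and prevents production incidents.

Model effectiveness quality assurance

Model effectiveness quality assurance has two dimensions: model performance evaluation and model effectiveness evaluation, introduced below.

Model performance evaluation

Model performance evaluation is a project built jointly with the one-stop AI platform: the algorithm testing team independently developed the performance evaluation capability, which the model platform of the one-stop AI platform offers as self-service performance testing for every model.

The flow is as follows. Algorithm developers upload a model to the model platform of the one-stop AI platform and submit a performance test request from the test page. The model platform forwards the request to the model performance testing service provided by the algorithm testing team. After queuing, configuration parsing, and load-test data preparation, that service submits the load-test request to the company's performance testing platform, which load-tests the specific model. When the load test finishes, the performance testing platform returns the results to the model performance testing service, which stores them along with a link to the detailed report for later viewing from the model platform.

Once launched, model performance evaluation is expected to provide load testing for nearly 300 models on the model platform. Evaluating performance up front can effectively prevent performance problems in production models, while allowing resources to be allocated in advance and their distribution optimized.

Model effectiveness evaluation

Model effectiveness evaluation is a crucial part of model effectiveness quality assurance: it helps business stakeholders and developers understand how well a model performs and iterate on improvements.

The algorithm testing team currently plans to build effectiveness evaluation capability for supervised learning models. The flow has three stages: data preprocessing, model data services, and model effectiveness evaluation.

Data preprocessing covers three types of data:
Manually constructed data, such as speech corpora for ASR and text for NLP recognition, which can be produced and labeled by hand and then used for model training and testing.
Offline production data, such as model instrumentation data stored in Hive and voice wake-up corpora from e-bike users. This data is huge and has many parameters, so offline ETL jobs are needed to extract, transform, and load it.
Real-time production data, such as real-time model instrumentation messages. With real-time computation jobs (Flink jobs), we can sample production instrumentation data and compute effectiveness metrics in real time, so that business stakeholders and developers see how a model performs once it is live and can iterate accordingly.

Model data services: after preprocessing, the data is converted into features and labeled data usable for model training and testing. For data that needs manual labeling, we are building a labeling platform that supports manual labeling, algorithmic pre-labeling, labeling quality evaluation, and labeled-data export; in the second half of 2021 it is planned to support ASR corpus labeling and NLP text labeling. For data that needs no manual intervention and whose reference results can be obtained automatically by calling external services, we provide a feature computation platform that automatically computes and packages feature data, reference results, and so on, and exports them.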
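Model effectiveness evaluation
Once the model data are ready, we call the model under test through the model invocation service and compute metrics with the model evaluation service.
- For ASR speech recognition, the metrics include character accuracy, word accuracy, sentence accuracy and model invocation performance.
- For the map traffic-restriction model, they include consistency with reference results.
From these results, the evaluation service outputs Bad Cases and an evaluation report. Bad Cases are stored in a Bad Case management platform for follow-up work: optimization testing, regression-set expansion and Bad Case status management.
System-engineering quality assurance
Apart from algorithm-specific model testing and dry runs of models before release, system-engineering quality assurance for algorithms is essentially the same as the business lines' quality assurance system. It is not covered further here.
Foundational capability support
Foundational capability support is likewise the same as in the business lines, and is not covered further here.
(Author: Chen Zhen) This article is produced by the Hello Tech Team. Commercial reproduction or use without permission is prohibited. For non-commercial reproduction or use, please credit "Reprinted from the Hello Tech Team".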
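The article lists character, word and sentence accuracy as ASR metrics in the model-effectiveness evaluation step but does not define them. The sketch below uses a common definition, which is an assumption here and not necessarily Hello's: accuracy = 1 - (edit-distance errors / reference tokens), and a sentence counts as correct only on an exact match. It also collects mismatching samples as candidate Bad Cases. All function names are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions + deletions + insertions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution (or match)
        prev = cur
    return prev[-1]


def token_accuracy(refs, hyps, tokenize):
    """1 - total errors / total reference tokens; can go negative with many insertions."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return 1 - errors / total if total else 0.0


def char_accuracy(refs, hyps):
    return token_accuracy(refs, hyps, list)


def word_accuracy(refs, hyps):
    # Whitespace tokens; Chinese text would need a word segmenter instead.
    return token_accuracy(refs, hyps, str.split)


def sentence_accuracy(refs, hyps):
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


def bad_cases(refs, hyps):
    """Indices where the recognized text differs from the reference."""
    return [k for k, (r, h) in enumerate(zip(refs, hyps)) if r != h]


if __name__ == "__main__":
    refs = ["turn on the light", "open the door"]
    hyps = ["turn off the light", "open the door"]
    print(char_accuracy(refs, hyps), word_accuracy(refs, hyps),
          sentence_accuracy(refs, hyps), bad_cases(refs, hyps))
```

Computing errors over the whole set, rather than averaging per-sentence accuracies, weights long utterances more heavily. That is the usual choice for WER/CER-style metrics.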

March 18, 2022 · 1 min · jiezi

On Algorithms: EBS2043 Software Design

Assignments
EBS2043 Introduction to Software in Econometrics and Operations Research
Operations Research Part
Academic Year: 2020/2021
Period: 3
School of Business and Economics
Bachelor
© Academic year 2019–2020 Maastricht University School of Business and Economics
Nothing in this publication may be reproduced and/or made public by means of printing, offset, photocopy or microfilm or in any digital, electronic, optical or any other form without the prior written permission of the owner of the copyright.
Contents
1 General Information
1.1 Assignments & Deliverables
1.2 Grading
2 Assignments
2.1 Part A: Linear Programming
2.2 Part B: Integer Linear Programming in Logistics
2.3 Part C: Combinatorial Optimization
2.4 Part D: Discrete Network Location Models
2.5 Part E: Logic Puzzles
1 General Information
1.1 Assignments & Deliverables
For this course you have to model, implement, and solve four optimization problems from different domains.
• Part A: non-integer Linear Programming and economic interpretations
• Part B: Integer Linear Programming in logistic context
• Part C: Combinatorial Optimization problems
• Part D: Discrete Network Location models
• Part E: Logic Puzzles
Parts A-D each contain four different problems. The problems you have to solve depend on your student ID.
Here is how you find out which problems you have to solve:
• Consider the last four digits of your student ID.
• Replace each digit by the remainder when it is divided by four.
• The obtained sequence of four numbers shows you which problem you have to solve from Parts A-D.
Here is an example: your student ID is i6214235. The last four digits are 4235. When each digit is divided by four, the remainders are 0231. Hence you’d have to solve problems A0, B2, C3, and D1. If you are not sure which problems you have to solve, please let me know.
If you like logic puzzles, you can replace your assigned problem from part A or B by a puzzle problem of your choice from Part E (be aware: they may be harder than they look).
For each problem you have to submit: ...
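The digit-to-problem rule is simple modular arithmetic. A minimal sketch, assuming the ID is a string that ends in four digits; the function name is illustrative only:

```python
def assigned_problems(student_id):
    """Map the last four digits of a student ID to one problem in each of Parts A-D."""
    last_four = student_id[-4:]
    if len(last_four) != 4 or not last_four.isdigit():
        raise ValueError("expected an ID ending in four digits")
    # Each digit's remainder mod 4 selects problem 0-3 in the corresponding part.
    return [part + str(int(digit) % 4) for part, digit in zip("ABCD", last_four)]


print(assigned_problems("i6214235"))  # ['A0', 'B2', 'C3', 'D1']
```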

March 18, 2022 · 27 min · jiezi

On Algorithms: Applications of XOR in Algorithms

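XOR: equal bits give 0, different bits give 1.
1. Properties of XOR
Commutative: A ^ B = B ^ A;
Associative: A ^ (B ^ C) = (A ^ B) ^ C;
Identity: X ^ 0 = X;
Self-inverse: X ^ X = 0;
Reflexive: A ^ B ^ B = A ^ 0 = A;
For any X: X ^ (-1) = ~X;
If A ^ B = C holds, then A ^ C = B and B ^ C = A;
2. Applications of XOR
By the reflexive property, two numbers can be swapped;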
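The properties above can be checked directly in Python, whose integers behave like infinite two's-complement values under `^` and `~`.

- `xor_swap` applies the reflexive property three times.
- `single_number` is an extra, commonly cited application that the post does not mention: pairs cancel because X ^ X = 0.

Both names are illustrative. In C or C++, XOR-swapping an object with itself (for example through two pointers to the same variable) zeroes it, so the trick needs an aliasing check there.

```python
from functools import reduce
from operator import xor


def xor_swap(a, b):
    """Swap two integers without a temporary, using A ^ B ^ B = A."""
    a ^= b  # a holds A ^ B
    b ^= a  # b = B ^ (A ^ B) = A
    a ^= b  # a = (A ^ B) ^ A = B
    return a, b


def single_number(nums):
    """Return the value that appears once when every other value appears twice."""
    return reduce(xor, nums, 0)  # X ^ X = 0 cancels pairs; X ^ 0 = X keeps the odd one


if __name__ == "__main__":
    print(xor_swap(3, 5))                            # (5, 3)
    print(single_number([4, 1, 2, 1, 2]))            # 4
    print(all(x ^ -1 == ~x for x in range(-8, 9)))   # True
```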

March 17, 2022 · 1 min · jiezi

On Algorithms: COM6509 Assignment

COM6509 Assignment 2 - Deadline: 11:00 AM, Friday 22nd Jan 2021
See the FAQ at the bottom of this document.
A. Assignment Brief
Please READ the whole assignment first, before starting to work on it.
A1. How and what to submit
1) A Jupyter Notebook with the code in all the cells executed, output displayed, and code documented.
2) Upload your notebook to Blackboard before the deadline above. Name your file as COM6509_Assignment_2_USERNAME.ipynb, where USERNAME should be replaced with your username such as abc18de.
3) NO DATA UPLOAD: Please do not upload the data files used. We have a copy already. Instead, please use a relative file path in your code (data files under folder ‘data’), as in the lab notebook, so that we can run your code smoothly when needed. So ‘./data/’, instead of ‘/User/username/myfiles/mlai/assignment1/’.
A2. Assessment Criteria (Scope: Sessions 6-8; Total marks: 30)
1) Being able to build complete, reproducible machine learning pipelines from loading data to evaluating prediction performance.
2) Being able to design different machine learning models to compare/optimise prediction performance.
3) Being able to perform exploratory data analysis to gain insights.
A3. Late submissions
We follow the Department's guidelines about late submissions, i.e., a deduction of 5% of the mark each working day the work is late after the deadline, but NO late submission will be marked one week after the deadline. Please see this link.
A4. Unfair means
Please carefully review the handbook on what constitutes Unfair Means if not sure.
B. Assignment on Fashion-MNIST [30 marks]
Fashion-MNIST is a dataset of Zalando's article images, with examples shown above. It consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes: 0=T-shirt/top; 1=Trouser; 2=Pullover; 3=Dress; 4=Coat; 5=Sandal; 6=Shirt; 7=Sneaker; 8=Bag; 9=Ankle boot.
In this assignment, we will work on this dataset.
● You will make several design choices (e.g. hyperparameters) in this assignment. There are no “standard answers”. You are encouraged to explore several design choices and settle on good/best ones, if time permits.
● The assignment questions specify the tasks and you make your design choices to complete the tasks. You are free to use scikit-learn or pytorch, batching or no batching, as long as you can complete the assignment questions.
B1. Reproducibility & readability [2 marks]
Whenever there is randomness in the computation, you MUST set a random seed for reproducibility. Use your UCard number XXXXXXXXX as the random seed throughout this assignment. [1 mark]
Answers for each question should be clearly indicated in your notebook, e.g., including question numbers below in bold such as B2.1a. All code should be clearly documented and explained. [1 mark]
B2. Supervised learning on Fashion-MNIST [16 marks]
We aim to train machine learning models to classify the 10 classes in Fashion-MNIST using the standard train/test split with decent performance, i.e. much better than the chance level at worst.
B2.1 Data loading and inspection [2 marks]
a) Use the PyTorch API for Fashion-MNIST to load both the training and test data of Fashion-MNIST. You may refer to similar procedures in Lab 7 for CIFAR-10. Preprocessing is NOT required but you are encouraged to explore and use preprocessing such as those in the torchvision.transforms API. [1 mark]
b) Display at least eight images for each of the 10 classes (8x10=80 images). [1 mark]
B2.2 Evaluation metrics [2 marks]
Keep a record of the four metrics M1-4 below for each of the six models in B2.3 and B2.4:
M1) Training accuracy: the prediction accuracy of the trained model on the training dataset.
M2) Testing accuracy: the prediction accuracy of the trained model on the test dataset.
M3) Training time: the time taken to train the model (i.e. to learn/estimate the learnable parameters) on the training dataset.
M4) The number of learnable parameters of the model.
B2.3 Logistic regression [4 marks]
If a hyperparameter needs to be set, you are free to choose one that can deliver satisfactory results for you.
a) Train a logistic regression model on the training set of Fashion-MNIST and test the trained model on the test set of Fashion-MNIST. Report the four metrics M1 to M4 and plot a confusion matrix for predictions on the test data. [2 marks]
b) Train and test a logistic regression model with L1 regularisation as in a). Report M1 to M4 and plot a confusion matrix for predictions on the test data. [1 mark]
c) Train and test a logistic regression model with L2 regularisation as in a). Report M1 to M4 and plot a confusion matrix for predictions on the test data. [1 mark]
B2.4 Convolutional Neural networks (6 marks)
This question asks you to design various convolutional neural networks (CNNs). Only the number of convolutional (Conv) layers and the number of fully connected (FC) layers will be specified below. The CNN in Lab 7 can be a reference but you are free to design other aspects of the network. For example, you can use other types of operation (e.g. padding), layers (e.g. pooling), or preprocessing (e.g. augmentation), and you can choose the number of units/neurons in each layer. Likewise, you may choose the number of epochs and many other settings according to your accessible computational power. Reminder: there are no standard answers.
a) Design a CNN with two Conv layers and two FC layers. Train and test it as in B2.3a. Report M1 to M4 and plot a confusion matrix for predictions on the test data. [2 marks]
b) Design a CNN with two Conv layers and five FC layers. Train and test it as in B2.3a. Report M1 to M4 and plot a confusion matrix for predictions on the test data. [2 marks]
c) Design a CNN with five Conv layers and two FC layers.
Train and test it as in B2.3a. Report M1 to M4 and plot a confusion matrix for predictions on the test data. [2 marks]
B2.4 Performance comparison (2 marks)
a) Summarise each of the four metrics from the six models in B2.3 and B2.4 using a bar graph. In total, four bar graphs need to be generated and displayed, one for each metric with six results from B2.3 and B2.4. [1 mark]
b) Describe at least two observations interesting to you. [1 mark]
B3. Unsupervised learning on Fashion-MNIST [12 marks]
Choose two out of the 10 classes according to your preference and use only the training data for these two chosen classes to complete all tasks in this section B3. It will be better to finish reading the remaining part of this section before choosing the two classes. Again, you may choose any two and there is no “correct” answer about which two to choose, but some choices may make your studies below more interesting than others.
B3.1 PCA and k-means [7 marks]
a) Apply PCA to all images of these two chosen classes. Visualise the top 24 eigenvectors as images and display them in descending order of their corresponding eigenvalues (the one corresponding to the largest eigenvalue first). [2 marks]
b) Use the top 24 PCs to reconstruct 30 images, with 15 from each class (any 15 images are fine from each class). Compute and report the mean squared error between the reconstructed and original images for these 30 images (a single value to be reported). Show these 30 pairs of reconstructed and original images. [2 marks]
c) Plot the PCA representations of all data points in a 2D plane using the top two PCs. Use different colours/markers for the two classes for better visualisation (Hint: You need to use the class labels here for visualisation). [2 marks]
d) Use k-means to cluster all data points as represented by the top two PCs (clustering of two-dimensional vectors, where each vector has two values, PC1 and PC2).
Visualise the two clusters with different colours/markers and indicate the cluster centers clearly with a marker in a figure similar to question c) above. [1 mark]
B3.2 AutoEncoder [4 marks]
a) Design a new autoencoder with five Conv2d layers and five ConvTranspose2d layers. You are free to choose the activation functions and settings such as stride and padding. Train this new autoencoder on all images of these two chosen classes for at least 20 epochs. Plot the mean squared error against the epoch. [2 marks]
b) Modify the autoencoder in B3.2a so that the code (bottleneck) has a dimension of 2 only. Plot the 2-dimensional representations in terms of this autoencoder code for all data points in a 2D plane as in B3.1c and cluster them as in B3.1d, showing similar colour/marker visualisation. [2 marks]
B3.3 Observation [1 mark]
Describe at least two observations interesting to you from B3.1 and B3.2 above.
The END of Assignment
C. FAQ or further clarification ...

March 17, 2022 · 7 min · jiezi

On Algorithms: The Shenyang PaddlePaddle Pilot Team Meetup Invites You to Explore How AI Empowers Smart Cities

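On March 19, the Shenyang PaddlePaddle Pilot Team will hold an online developer Meetup. Senior Baidu R&D engineers and PaddlePaddle developers will discuss PaddlePaddle applications in traffic management and in urban satellite remote-sensing imagery. Developers in Shenyang are welcome to sign up and see how AI shapes the smart city~
AI for Traffic and Remote Sensing, Empowering Smart City Construction
March 19, 2022, 14:00-16:30
Due to the pandemic, this event will be held as an online meeting.
Highlights
Vehicle tracking is one smart-city application that enables intelligent traffic control. Jianjian, a senior Baidu R&D engineer, will present an end-to-end multi-class vehicle tracking solution based on PaddleDetection. It covers model training, evaluation, prediction, optimization and deployment. The talk walks through model selection and how to optimize accuracy and speed, to help developers solve related problems more efficiently. It also provides a basic server-side deployment guide.
Kong Yuanhang, a PaddlePaddle developer and master's student at Tongji University, will present an image super-resolution reconstruction solution based on PaddleGAN.
- It addresses the low resolution of urban satellite remote-sensing images after they are enlarged by interpolation.
- This provides higher-quality data for downstream tasks such as urban road extraction, vehicle detection in remote-sensing images and building change detection.
- It uses transfer learning and training, model evaluation and invocation to quickly enable super-resolution reconstruction of single remote-sensing images.
- He will also show how to reproduce CNN-based super-resolution networks with PaddleGAN.
There is also a zero-barrier Workshop on an AI development platform, where everyone will work hands-on through an intelligent vehicle classification project. Anyone interested is welcome to try it (bring your own computer)~
Surprise gifts
Every participating developer will receive a souvenir from PaddlePaddle. The speakers will also hold a live lucky draw with surprise gifts for randomly selected developers!
About the PaddlePaddle Pilot Team
The PaddlePaddle Pilot Team is a PaddlePaddle developer interest community for all developers interested in AI and deep learning. It offers local tech salons, Meetups, and online exchange and practice opportunities. Team leaders and members in many cities and universities have built 200+ communities worldwide. These cover 30 provincial-level regions and 160+ universities in China, with more than 10,000 AI developers.
Join us
Whatever city or university you are in, you are welcome to join the PaddlePaddle Pilot Team if you:
- are passionate about technology and interested in open source
- share our community culture
- are willing to contribute time, energy, knowledge or ideas to the community
We hope you have one of the following skills:
Tech expert: familiar with deep learning technology and applications, keen on exchanging and sharing technology, experience and knowledge
Experience expert: good at photography or videography, familiar with graphic design or video editing software, with a good aesthetic sense
✍ Official-account editor: familiar with content operations for tech communities, with some copywriting and editing ability
Professional atmosphere team: helping with event logistics and on-site organization for Pilot Team events
What you will gain:
✔ An all-round boost in technical skills, teamwork, coordination and organization, and personal influence
✔ Helping organize Meetups and other events, meeting tech leaders face to face, and expanding your network
✔ Free participation in Pilot Team developer gatherings and team-building
✔ Baidu PaddlePaddle souvenirs
✔ A volunteer certificate officially awarded by Baidu PaddlePaddle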

March 17, 2022 · 1 min · jiezi

On Algorithms: Tianyi Cloud's Central-South Digital Industrial Park Lands in Changsha, and Tianxin Data Valley Takes Shape

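On the afternoon of April 27, China Telecom's Hunan branch and the Management Committee of the Changsha Tianxin Economic Development Zone (EDZ) jointly held a signing ceremony for the "China Telecom Tianyi Cloud Central-South Digital Industrial Park" project. The signing marks China Telecom's formal entry into the central-south region. It will build a national-level big data center and a Tianyi Cloud core node in the Tianxin EDZ.
Two officials attended the ceremony and gave speeches:
- Zheng Jianxin: Deputy Secretary of the Changsha Municipal Party Committee, Mayor of Changsha, and Secretary of the Party Working Committee of the Hunan Xiangjiang New Area.
- Liu Guiqing: Vice President of China Telecom Group.
Two representatives signed the project on behalf of the two parties:
- Jiang Chengdong: Deputy General Manager of China Telecom's Hunan branch.
- He Guoquan: Deputy Party Secretary of the Tianxin EDZ Working Committee and Director of its Management Committee.
The witnesses included:
- Tan Xiaoping: member of the Standing Committee of the Changsha Municipal Party Committee and head of its United Front Work Department.
- Wu Wei: Director of China Telecom Group's Cloud Network Development Department.
- Zhang Wenqiang: Deputy General Manager of Tianyi Cloud Technology Co., Ltd.
- Wan Peng and Zhou Xiaowan: Deputy General Managers of the Hunan branch.
- Liu Hui: Party Secretary of Tianxin District.
- Huang Tao: Deputy Party Secretary and District Head of Tianxin District.
- Wu Jiang: Party Secretary of the Tianxin EDZ Working Committee.
Heads of the relevant departments of the following bodies also attended:
- the Changsha Municipal Party Committee and Government
- the Tianxin District Party Committee and Government
- the Tianxin EDZ
- China Telecom's Hunan branch
At the ceremony, a China Telecom official said: "The Central-South Digital Industrial Park project is positioned as a national-level regional data center with 'green and secure' characteristics. Changsha will follow Beijing-Tianjin-Hebei, the Yangtze River Delta, Guangdong-Hong Kong-Macao and Sichuan-Shaanxi-Chongqing as China Telecom's big data center and Tianyi Cloud core node for the central-south region. It will also become an important part of the national big-data-center hub nodes and of the national independent and controllable cloud."
According to the introduction, the project is the largest new-infrastructure investment in Hunan to date. It will become China Telecom Group's fifth-largest regional data center nationwide.
- The planned site covers about 300 mu, with a total floor area of 280,000 square meters.
- Once complete, the park will provide cloud capacity for 400,000 servers.
- It will become an important future cloud-network hub for Hunan, providing computing infrastructure for IoT, cloud computing, AI, blockchain and other new technologies.
- It will develop together with 5G. It will provide content and data sources for 5G applications such as connected vehicles, smart manufacturing, smart healthcare and smart cities.
In this way the park will support the digital industries of Hunan and neighboring provinces and raise digital service capability across the central-south region.
The park's arrival in Changsha shows that Changsha's "empty the cage, change the birds" industrial-upgrading strategy has begun to pay off, and that "Tianxin Data Valley" is taking shape. Data centers are becoming an indispensable digital foundation for high-quality social development. Advanced digital technologies such as 5G, cloud computing, big data and AI will help turn data centers into computing infrastructure more quickly. They will also advance Changsha-Zhuzhou-Xiangtan integration and help build the digital economy.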

March 17, 2022 · 1 min · jiezi

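On Algorithms: "East Data, West Computing" Speeds Up Cloud-Network and Data Convergence, and Tianyi Cloud Builds an Inter-Cloud Expressway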

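Recently, the National Development and Reform Commission (NDRC) and other ministries issued a notice on the national computing network.
- It approves national computing hub nodes in eight regions: Beijing-Tianjin-Hebei, the Yangtze River Delta, the Guangdong-Hong Kong-Macao Greater Bay Area, Chengdu-Chongqing, Inner Mongolia, Guizhou, Gansu and Ningxia.
- It also plans ten national data center clusters.
This completes the overall design of the national integrated big-data-center system. It marks the full launch of the "East Data, West Computing" project. This is the fourth mega-project after "South-to-North Water Diversion", "West-to-East Gas Transmission" and "West-to-East Power Transmission".
At the level of national industrial planning, the project can better coordinate development between the eastern and western regions. National coordination will let computing capacity grow at scale and be used more efficiently. Closer coordination between the energy grid and the computing network will also lower operating costs to some extent.
Green practice of "East Data, West Computing"
In recent years, the idea has already been put into practice at the industry level. Take China Telecom's Cloud Computing Guizhou Information Park as an example.
- From the start of construction in 2013, China Telecom promoted green development and focused on reducing energy consumption. It aimed to maximize the park's energy efficiency and minimize its environmental impact.
- In 2020 the park's best PUE reached 1.18, well below the 1.5 required by national data center construction standards.
- The park is now a national data center and a national demonstration base for strategic emerging industries.
On the cloud side, China Telecom's Tianyi Cloud sees itself as a main force in building a cyber power and Digital China and in safeguarding cybersecurity. In 2020 it set out a "2+4+31+X" cloud resource layout. This layout closely matches the national hub nodes in site selection and business positioning, and in how core clusters are distinguished from urban data centers.
- "2" refers to the Inner Mongolia and Guizhou data center parks in those two hubs, positioned as national bases for data storage, backup and offline analysis.
- "4" refers to the four hubs of Beijing-Tianjin-Hebei, the Yangtze River Delta, the Greater Bay Area and Chengdu-Chongqing. They carry real-time-sensitive services, such as video playback and e-commerce, for densely populated areas with high-frequency access.
- "31+X" covers the 31 provinces (including the Gansu and Ningxia hubs) and X key cities. It focuses on ultra-low-latency, high-bandwidth, massive-connection services such as connected vehicles, autonomous driving, drones, industrial internet and AR/VR.
Reportedly, "X" is the part Tianyi Cloud now most needs to develop. It is the access layer: content and storage are placed as close to users as possible. As a result, the network follows the cloud, reaching the cloud is easy, and traffic flows smoothly between clouds. This meets users' needs for on-demand choice and low latency.
The full launch of the project will help Tianyi Cloud make full use of its cloud-network convergence. It will further optimize the layout of computing resources, further lower data center operating costs, and improve the information infrastructure layout. China Telecom will speed up land acquisition and construction at the eight hub nodes, whose share is expected to reach 85% by the end of the 14th Five-Year Plan. Meanwhile, the east-west ratio will be optimized from the current 7:3 to 6:4 by the same date.
How the East's data is computed in the West
"East Data, West Computing" needs an inter-cloud expressway between data and computing power: a cross-regional data bridge that meets the communication needs between data centers and regions. China Telecom therefore built a high-bandwidth data center interconnect private network based on SDN and launched the "Inter-Cloud Expressway".
- It links Tianyi Cloud's resource pools into one.
- It gives users secure, fast and convenient interconnection channels between resource pools across the whole network.
- Data from different regions can then move freely between clouds with high-speed mutual access.
Low-latency transmission is another of Tianyi Cloud's advantages in building this data-center layout. With end-to-end network automation, Tianyi Cloud data centers can deploy cross-regional networks in minutes. This raises the utilization of computing resources and gives users the best service experience, laying out a new blueprint for "East Data, West Computing".
Digital transformation in the big-data era has entered a deeper stage. Behind the upgrade of computing power lie new technology applications and new scenarios, and more industries will be empowered by cloud and data. As the "national team" of cloud services, Tianyi Cloud will keep supporting "East Data, West Computing". It aims to help more enterprises stay ahead in a digital transformation that is both efficient and low-carbon.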

March 17, 2022 · 1 min · jiezi

On Algorithms: Tianyi Cloud's Supply-Chain API Security Governance Practice Wins the Excellent Governance Practice Award

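Recently, the first member conference of the ICT Software Supply Chain Security Community was held. It was guided by the Cybersecurity Administration Bureau of the Ministry of Industry and Information Technology and hosted by the China Academy of Information and Communications Technology (CAICT). The theme was "Strengthen software supply-chain security governance, support the healthy development of the ICT industry". The conference aimed to explore governance models for software supply-chain security, prevent risks at the source, and make security governance truly effective. It announced results in four categories of outstanding achievements. Tianyi Cloud's practice of API security governance based on its edge-cloud WAF won the "Excellent Governance Practice" award.
With the rise of edge computing, more and more services are deployed at the edge, so enterprises' security needs are moving from the center to edge nodes. Enterprises also rely increasingly on hybrid clouds for collaboration. To break down data silos and manage and schedule resources better, many enterprises expose their digital assets as APIs. This unlocks the value of internal data and services. As a result, the physical boundary of API security is disappearing, and the steady growth of API services makes API protection urgent.
The award-winning practice is built on Tianyi Cloud's self-developed edge-cloud WAF platform, which implements API security governance. It works even where traffic crosses physical network boundaries. It integrates four functions to protect enterprises and their upstream and downstream supply chains:
- authenticating users
- managing user permissions
- protecting sensitive user information
- defending against API vulnerabilities
Reportedly, Tianyi Cloud's edge-cloud capabilities have been comprehensively upgraded. They can provide a wide range of cloud services at local nodes, edge nodes, customer nodes and business sites.
- They can be deployed in customer server rooms at industrial parks, factories and government units at all levels.
- They can also be deployed in China Telecom's regional server rooms.
- Either way, customers get low-latency, data-localized services.
The edge-cloud WAF platform provides integrated edge data authentication and protection. It secures enterprise APIs through three strengths: comprehensive edge API security governance, distributed edge security protection, and big-data intelligent risk analysis.
Comprehensive edge API security governance
Through the platform, users can centrally manage a range of API security functions, meeting API security governance needs across the whole network:
- API authentication and authorization
- anti-crawler protection
- rate limiting
- monitoring and alerting
- API parameter validation
- API rewriting
Unlike a traditional on-premises WAF, the platform inspects traffic at the edge access layer. This ensures at the source that website traffic is trusted and authenticated, and integrates web security with API protection.
Distributed edge security protection
- The platform helps users move security capabilities into edge nodes, closer to attackers, which makes protection smarter.
- Edge deployment also provides automatic failover when a single point fails, keeping websites highly available.
- The platform also includes DDoS protection, so it can handle large-scale API attacks while protecting enterprise services.
Big-data intelligent risk analysis
Tianyi Cloud's big-data platform draws on attack data from traffic across the whole network and all industries. Combining this with machine learning algorithms, the edge-cloud WAF platform builds an intelligent protection system. It analyzes potential API attack types and coordinates edge nodes to block the attacks an enterprise actually faces, stopping threats before they cause harm.
The digital era is driving transformation and upgrading across society. Securing the software supply chain and building a secure, trustworthy cyberspace have become urgent. APIs are the foundation of application-driven innovation today, so their security needs close attention. Tianyi Cloud's edge-cloud WAF platform will keep protecting enterprise cloud data through multi-dimensional detection and protection, supporting the digital transformation of every industry.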

March 17, 2022 · 1 min · jiezi

On Algorithms: BEEM061

BEEM061 Main Assignment Part B Brief
December 21, 2020
Abstract
Your main assignment (80%) must be handed in by Friday 15th January 2021. It consists of two equally weighted parts: part A) A 1,500 word essay based on Topic 2; and part B) A technical task-based assignment. This document outlines your tasks for Part B, which on its own contributes 40% to your overall module grade. Throughout the following tasks you MUST solve them using Jupyter Notebooks where appropriate, with each line of code stored. You will submit your assignment as a set of documents with your notebooks stored separately (with the .ipynb extension so that they can be easily verified). You are welcome to store your own code on your own github repository or elsewhere, but the .ipynb files must be submitted.
1 Explore the Bitcoin Blockchain and Basic Web Coding (25 marks)
1.1 Extract Information From Your Own Transaction (15 marks)
• Download a Bitcoin SV Wallet (we recommend Centbee) and share your address with your module lead, who will then send you a tiny amount (0.001 units of Bitcoin SV, roughly 10 pence).
• Use this to send an even tinier amount (0.0001 units of Bitcoin SV, roughly 1 pence) back (or to another address).
• Once you have done this, go to your transaction history and find a way to locate the transaction on the blockchain. Centbee has a feature for viewing the transaction on the blockchain. Take a note of which block your transaction is in by taking its block height.
• From a Jupyter notebook, extract the following information from the same block by fetching data from the whatsonchain API:
https://api.whatsonchain.com/... place blockheight here
Your notebook should fetch, then print your data in JSON format, and you should obtain the following for the block with your transaction in it:
– txcount
– time
– totalFees
– confirmations
– miner
Include some code that converts the unix timestamp into human readable format to the nearest second.
Explain in words what each of these parts of the block is.
1.2 Extract Information from Famous Blocks (5 marks)
For the famous transactions below, go through the same process to obtain the time they occurred, including some code that converts the unix timestamp into human readable format.
The first ever transaction from Satoshi to Hal Finney in 2009:
f4184fc596403b9d638783cf57adfe4c75c605f6356fbc91338530e9831e9e16
The Pizza purchase for 10,000 BTC in 2010:
a1075db55d416d3ca199f55b6084e2115b9345e16c5cf302fc80e9d5fbf5d48d
1.3 Basic Web Coding (5 marks)
Construct your own simple web page in a simple text editor and save it as a .html file that can be read by a web browser like Chrome. This page should include a javascript function that allows the viewer to change an image back and forth when they click on it.
2 Time Series Investigation of Bitcoin Price (50 marks)
You are working for a FinTech firm that provides customers with real time financial data and analysis. Part of the marketing strategy for this firm is providing a regular newsletter via a blog discussing current issues for personal portfolio management. Your boss has asked you to investigate the idea that Bitcoin is mostly viewed as a store of value. To provide the background to this report, you are required to carry out the following:
2.1 Obtain Time Series Data (5 marks)
Obtain the following data by calling the FRED api from a Jupyter notebook, and provide simple time series plots of the raw data: ...
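Tasks 1.1 and 1.2 ask for a Unix timestamp to be converted into human-readable form. Python's standard `datetime` module can do this. A minimal sketch, under these assumptions:

- The block data has already been fetched and parsed into a dict.
- Its top-level keys are the field names listed in the brief. The exact layout of the whatsonchain response is an assumption here.
- The fetch step itself is not shown.
- The sample values in the usage block are hypothetical.

```python
import json
from datetime import datetime, timezone

# Field names as listed in the brief; assumed to be top-level keys of the block JSON.
FIELDS = ("txcount", "time", "totalFees", "confirmations", "miner")


def to_utc_string(unix_ts):
    """Convert a Unix timestamp (seconds) to 'YYYY-MM-DD HH:MM:SS' in UTC."""
    return datetime.fromtimestamp(int(unix_ts), tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def summarise_block(block):
    """Pick the requested fields from a block dict and add a readable UTC time."""
    summary = {key: block.get(key) for key in FIELDS}
    if summary["time"] is not None:
        summary["time_utc"] = to_utc_string(summary["time"])
    return summary


if __name__ == "__main__":
    # Hypothetical values for illustration only; not fetched from any API.
    sample = {"txcount": 1, "time": 1231006505, "totalFees": 0, "confirmations": 1, "miner": None}
    print(json.dumps(summarise_block(sample), indent=2))
```

Passing `tz=timezone.utc` keeps the output independent of the machine's local time zone. Truncating to whole seconds with `int()` satisfies the brief's "to the nearest second".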

March 17, 2022 · 5 min · jiezi

On Algorithms: IB2070 Mathematical Analysis

University of Warwick - Warwick Business School
January 2021
IB2070 – MATHEMATICAL PROGRAMMING II
Open Book Assessment – 2 hours
Instructions:
A. You have 2 hours and 45 minutes in total to complete your assessment and upload it to the AEP. This includes provision for technical delays.
B. There are three questions. You should attempt every question.
C. The number of marks each question will carry is indicated.
D. Save your file with your Student ID number and Module Code before submitting via the AEP.
E. By starting this assessment you are declaring yourself fit to undertake it. You are expected to make a reasonable attempt at the assessment by answering the questions.
F. Use the AEP immediately to seek advice if you cannot access the assessment, or believe you are registered to the incorrect paper.
Other information (read the following very carefully):
G. During the online assessment:
i. If you have a question about the content of your assessment you should ask the Module Convenor/invigilator via the AEP query system. Do not send an email to them, or to undergraduate@wbs.ac.uk as this will not be answered.
ii. If you experience technical difficulties that prevent you from completing and submitting your answer file please submit a mitigating circumstances case via my.wbs (or the University’s portal if you are a non-WBS student). Please do not attach your answer file as it will not be marked.
H. Your document:
i. You must type your answers in Word (or an equivalent piece of software) and upload these to the AEP within a single PDF file.
ii. Ensure your answers to each question follow the order in which the questions appear in the exam paper. Start each new question or question part on a new page (or separate piece of paper, where handwritten answers are required) and write the question number at the top of each page of your answers.
iii. Include on each page of your file your Student ID Number, the module code and the page number (using the format ‘x of y pages’ to confirm how many pages in total you are submitting).
iv. Within your one, single file you may include images (where required or where you want to use as part of your answer) (i.e. screenshots, photos or online drawings) of hand-written calculations, formulae, charts or graphs to complement your answers and show your workings. These images should be embedded at the point in the document where they are relevant in your answers.
v. Ensure that you label any images you include to inform and assist the marker. Where you have handwritten items please write legibly, preferably in dark blue or black ink, and ensure that it is not too faint to be captured by a scan or photograph. Remember, it is your responsibility to ensure that your work can be read.
I. Academic integrity (plagiarism/collusion):
i. You are allowed to access module materials, notes, resources, other reference materials and the internet during the assessment.
ii. You should not communicate with any other candidate during the assessment period as this could be interpreted as collusion and may lead to your work being reviewed for cheating. This includes the sharing of the exam paper with other students. Collusion is taken very seriously by the University.
iii. To further maintain the academic integrity of online assessments:
You are asked to provide details of your working out to indicate your approach to addressing the questions posed.
Warwick Business School reserves the right to viva any students suspected of cheating.
J. Submitting your answer file:
i. You have an additional 45 minutes beyond the stated duration of the assessment. This is for finalising your answer file, converting to PDF (if relevant), uploading to the AEP (ensure you upload the correct document), and submitting. This includes provision for technical delays.
ii. This online assessment will close at 11:45am UK time and you will not be able to submit your answer file after that time, unless you have Reasonable Exam Adjustments.
iii. If you have an agreement that entitles you to additional time (reasonable adjustments), you should see the amount of additional time you have been granted on the AEP. If you have any queries regarding the amount of additional time you have been granted please email exams@wbs.ac.uk.
iv. Only documents submitted via the AEP will be accepted and marked.
v. Incorrect documents submitted via the AEP may be marked and that mark will be final. You should therefore use the 45 minutes of extra time granted to ensure you submit the correct document.
vi. Documents sent via email or through the mitigating circumstances portal will not be marked.
Your assessment starts below.
[Question 1]
(1) Consider the following linear programming problem, in which … and … are parameters:
(a) Show that the problem is feasible. [3 marks]
(b) Derive the dual of the problem. [8 marks]
(c) Find the condition of … and … so that the given problem has a finite optimal value. Provide all details of your working. (Hint: check the feasibility of the dual problem.) [12 marks]
(2) The branch-and-bound algorithm has been used to solve an integer linear programming problem (IP) of maximizing the investment return. Part of the corresponding branch-and-bound tree is presented below, where the circled numbers are the optimal values of the corresponding LP relaxations and, of all eight integer variables, … and … are binary.
(a) Unfortunately, there is one misprint among the circled numbers in the branch-and-bound tree. Identify this misprinted number and explain your answer. [5 marks]
(b) Use the information presented in the tree to provide the best (i.e., smallest) upper bound of the optimal objective value of the original IP problem.
Explain your answer. [7 marks]
[Question 2]
The following diagram indicates the costs of travelling along the arcs of the network, where a negative cost represents a profit.
(1) Use the Label Correcting Algorithm to find a cheapest directed path from node 1 to node 5. Provide all the working details, including every step and the final result: the shortest path you find and its total length. [12 marks]
(2) Formulate the shortest-path (i.e., cheapest-path) problem as an integer linear programming (IP) problem, which is a special case of the minimum-cost network flow problem. Provide all the details of your formulation: variables, objective function and constraints. [8 marks]
(3) Construct the dual of the linear program (LP) relaxation of the IP you constructed in part (2). [10 marks]
(4) Use the dual in part (3) to confirm the optimality of the shortest path you find in part (1). [5 marks]
(5) The problem in part (2) can be considered as finding a cheapest route for one truck. Now formulate an IP model for finding the cheapest routes for two trucks (i.e., two directed paths from node 1 to node 5 with the minimum total cost): Denote by … = {(1,2), (1,3), (2,3), (2,4), (2,5), (3,4), (4,5)} the set of all seven arcs in the above network. Let the travelling costs of one truck and two trucks along arc (…, …) be … and …, respectively, for each arc (…, …) ∈ … . Provide all details without omission. [15 marks]
(Hint: Introduce a set of binary variables to define a path for the first truck, a set of binary variables to define a path for the second truck, and a set of binary variables to identify the situation where two trucks use the same arc (…, …).)
[Question 3]
Consider the following network, where the number by each arc is the capacity of the arc:
(1) Use the Ford-Fulkerson algorithm to find a maximum flow from node 1 to node 5 in the network above. [10 marks]
(2) Identify a cut that can be used to verify the optimality of the maximum flow you have found in part (1). ...
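Question 2(1) names the Label Correcting Algorithm, which, unlike Dijkstra's algorithm, tolerates negative arc costs. Below is a minimal FIFO label-correcting sketch. It is a generic variant, not necessarily the exact version taught in IB2070. The seven arcs come from part (5), but the cost diagram is not part of this excerpt, so the costs in the usage example are HYPOTHETICAL placeholders.

```python
import math
from collections import deque


def label_correcting(nodes, arcs, source):
    """FIFO label-correcting shortest paths; arc costs may be negative.

    arcs maps (i, j) -> cost. Returns (distance labels, predecessor map).
    Raises ValueError when a negative cycle reachable from source is detected.
    """
    succ = {v: [] for v in nodes}
    for (i, j), cost in arcs.items():
        succ[i].append((j, cost))
    dist = {v: math.inf for v in nodes}
    pred = {v: None for v in nodes}
    dist[source] = 0
    queue, queued = deque([source]), {source}
    # Without a negative cycle, FIFO processing needs at most about n passes over m arcs.
    max_updates = len(nodes) * len(arcs)
    updates = 0
    while queue:
        i = queue.popleft()
        queued.discard(i)
        for j, cost in succ[i]:
            if dist[i] + cost < dist[j]:  # label of j can be corrected (improved)
                dist[j] = dist[i] + cost
                pred[j] = i
                updates += 1
                if updates > max_updates:
                    raise ValueError("negative cycle detected")
                if j not in queued:
                    queue.append(j)
                    queued.add(j)
    return dist, pred


def path_to(pred, target):
    """Follow predecessor labels back from target and return the path source -> target."""
    path = []
    while target is not None:
        path.append(target)
        target = pred[target]
    return path[::-1]


if __name__ == "__main__":
    nodes = [1, 2, 3, 4, 5]
    # The seven arcs listed in part (5); the costs are HYPOTHETICAL placeholders.
    arcs = {(1, 2): 4, (1, 3): 2, (2, 3): -3, (2, 4): 2, (2, 5): 6, (3, 4): 1, (4, 5): 2}
    dist, pred = label_correcting(nodes, arcs, source=1)
    print(dist[5], path_to(pred, 5))  # 4 [1, 2, 3, 4, 5]
```

The final labels also satisfy dist[j] ≤ dist[i] + c_ij on every arc. That condition is the one the dual in part (4) formalizes.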

March 16, 2022 · 6 min · jiezi

关于算法:37242-Optimisation

37242 Optimisation in Quantitative Management
Assignment
Students can do this Assignment either individually or in group. The number of students in a group cannot exceed four.
QUESTION 1
This question is based on the material in the pdf file Pineapple Canners that has been emailed to you.
You must
• formulate a linear programming problem that determines the optimal production plan for the Pineapple Canners;
• solve this linear program using LINGO;
• present the results of your work as a written report.
The written report must
• clearly describe each variable and each constraint;
• present the entire linear programming formulation;
• present the LINGO code which was used to solve the linear program;
• present the LINGO printouts with the results;
• present the optimal production plan.
The LINGO code (the linear program in LINGO) must
• use sections SETS and DATA;
• use the commands @FOR and @SUM.
QUESTION 2
This question is based on the material in the section 5.5.2 The Big-M Method in S.G. Nash and A. Sofer, Linear and Nonlinear Programming. McGraw-Hill, 1996. A copy of this section has been emailed to you as the pdf file Nash and Sofer Linear and Nonlinear Programming.
• Study the section 5.5.2 The Big-M Method in S.G. Nash and A. Sofer, Linear and Nonlinear Programming. McGraw-Hill, 1996.
• Using the Big-M Method, solve the linear programming problem
$$\begin{aligned}
\min\quad & -4x_1 - 5x_2 + 3x_3\\
\text{subject to}\quad & x_1 + 2x_2 + x_3 = 10\\
& x_1 - x_2 \ge 6\\
& x_1 + 3x_2 + x_3 \le 14\\
& x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0.
\end{aligned}$$
Show your working.
QUESTION 3
Let $A$ be an $m \times n$ matrix of rank $m$, $m < n$, and $b \in E^m$, and let $z$ and $w$ be the optimal values of the objective functions of the linear programs
$$\min\ x_n \quad \text{subject to} \quad Ax = b,\ x \ge 0$$
and
$$\max\ x_n \quad \text{subject to} \quad Ax = b,\ x \ge 0,$$
respectively. Prove that for any $a \in [z, w]$ there exists $x \in \{y : Ay = b,\ y \ge 0,\ y \in E^n\}$ such that $x_n = a$.
QUESTION 4
Consider the linear programming problem
After introducing slack variables $x_3$ and $x_4$, the simplex method produced the following final tableau
(a) Find $d, e, f, g, h, k$, and $w$.
Show your working.
(b) Find $c_1, c_2, b_1, b_2, a_{1,1}, a_{1,2}, a_{2,1}, a_{2,2}$. Show your working.
QUESTION 5
Consider the linear program
$$\min\ c^T x \quad \text{subject to} \quad Ax \le b,\ x \ge 0$$
where $c \in E^n$ is a nonzero vector, $b \in E^m$, $m < n$, and $A$ is an $m \times n$ matrix of rank $m$. Prove that if $Ax^0 < b$ and $x^0 > 0$, then $x^0$ cannot be an optimal solution.
QUESTION 6
Consider the linear programming problem
$$\min\ c^T x \quad \text{subject to} \quad Ax = b,\ x \ge 0$$
where $A$ is an $m \times n$ matrix of rank $m$, $m < n$, $b \in E^m$, $c \in E^n$. Suppose that in the optimal basic feasible solution, obtained according to Phase I of the two-phase method, all basic variables are non-artificial variables.
(a) What is the value of the objective function for this solution? Justify your answer.
(b) What is the reduced cost of each basic variable in this solution? Justify your answer.
(c) What is the reduced cost of each artificial variable for this solution? Justify your answer.
(d) What is the reduced cost of each non-artificial nonbasic variable in this solution? Justify your answer.
QUESTION 7
Consider the linear programming problem
(a) Prove that the feasible region of this linear program has no extreme points.
(b) Convert this linear program into an equivalent linear programming problem in standard form.
(c) Show that the feasible region of the linear program obtained in (b) has extreme points. ...

March 16, 2022 · 3 min · jiezi

On Algorithms: Python, Counting the Primes from 1 to N (Any N) with Parallel Computing, Multithreading and Other Optimizations

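1. Project introduction
1.1 Background
Deep learning is developing rapidly and big-data technology is spreading. Big-data processing is increasingly popular, yet there are still many cases where Python processes large amounts of data locally. In those cases single-threaded processing is inefficient. The next step will presumably be hardware-level support, and the GPU is undoubtedly the most important trend.
1.2 Topic
For the major assignment in Parallel Computing and GPU Programming I chose a prime-number project: a Python program that counts the primes up to N, where N can be set to any value. Counting primes is a classic problem. By optimizing the code step by step, I try to practice the principle "simplify the computation, improve efficiency" and to explore how versatile and convenient Python is for computation.
2. Optimization process
2.1 Source code
Following the definition of a prime, I first wrote a baseline version.
- It uses the time module to measure the running time, for comparison with the later optimizations.
- It uses for loops rather than while loops, since for loops are faster.
- It finds the primes below N with nested for loops.
N is set to 200000 throughout. The basic algorithm is clearly inefficient: it took 162.300478 s in total, which is painfully slow, so let's optimize it.
2.2 Optimization 1: a mathematical shortcut
I first optimize from a mathematical angle. Every prime greater than 10 ends in 1, 3, 7 or 9, so adding this exclusion condition simplifies the work. With the restriction in place the code did speed up, running 82 s faster than the original.
2.3 Optimization 2: skipping even numbers
Continuing at the mathematical level: no even number other than 2 is prime. Removing even numbers removes half of the work and should roughly halve the running time. The measurement confirms this: 80.789876 s in total, less than half the time of the original and nearly 2 s faster than Optimization 1.
2.4 Optimization 3: using a list
A composite number can always be factored into a product of primes, and a prime has no divisors other than 1 and itself. This version uses a list: it removes the composites from a table of numbers and keeps the primes. The speed improves dramatically, to only 13.643432 s. Using a Python list puts this computation in a different league from the naive algorithm.
2.5 Optimization 4: mathematics on top of the list
With the list in place, we apply the same two facts more directly: a composite is a product of primes, and a prime has no divisors other than 1 and itself. Combining this mathematical optimization with a Python list needs only 0.607469 s. Counting the primes below 200,000 now takes the blink of an eye.
2.6 Optimization 5: stopping at half
Refining the mathematics further: numbers such as 15, 33 and 39 are composites formed by multiplying two primes. A divisor is therefore always found within the first half of the candidate range, and the same reasoning applies to other numbers. This removes half of the computation and improves efficiency again, to 0.329552 s. That is already quite fine-grained, and I don't want to dig deeper into the mathematics. Next, let's see whether some of Python's compute-intensive techniques can "accelerate" the project further.
2.7 Optimization 6: multithreading
Next, I apply the multithreading basics I have learned to the two new primality-testing versions. Multithreading alone seems to be a breakthrough and another step up. The code still needs improvement and more rigor, but the multithreaded version is far ahead of the original source code and, in a sense, achieves the optimization goal. I still need to deepen my understanding and keep learning.
2.8 Optimization 7: JIT
Running the code with PyPy's JIT took 0.296242 s, better than all previous optimizations.
2.9 Optimization 8: caching
Python's cache is used by adding a decorator to the function whose results should be cached. On the first call the function runs normally and its result is cached. On a second call with the same arguments the function does not run; the cached result is returned directly ...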
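As a rough, self-contained illustration of these ideas (not the author's code), the sketch below counts primes up to n in three ways:

- `count_primes_naive`: the nested-loop baseline from 2.1.
- `count_primes_trial`: trial division that skips even numbers and divides only by primes already found, stopping at √m. This is a tighter bound than the "half" used in 2.6: if m = a·b with a ≤ b, then a ≤ √m.
- `count_primes_sieve`: a sieve of Eratosthenes. This is one standard way to realize the "remove composites from a table" idea of 2.4, though the article does not name it.

`functools.lru_cache` stands in as one possible caching decorator for 2.9. All function names are illustrative. In CPython, the global interpreter lock prevents threads from running pure-Python CPU-bound loops in parallel. A real speedup for 2.7 would therefore normally need processes (for example `multiprocessing`) rather than threads.

```python
import math
import time
from functools import lru_cache


def count_primes_naive(n):
    """Baseline: try every divisor 2..m-1 for every candidate m (very slow)."""
    count = 0
    for m in range(2, n + 1):
        for d in range(2, m):
            if m % d == 0:
                break
        else:  # the inner loop found no divisor, so m is prime
            count += 1
    return count


def count_primes_trial(n):
    """Skip even candidates and divide only by primes found so far, up to sqrt(m)."""
    if n < 2:
        return 0
    primes = [2]
    for m in range(3, n + 1, 2):  # no even number greater than 2 is prime
        limit = math.isqrt(m)
        for p in primes:
            if p > limit:  # no prime factor <= sqrt(m), so m is prime
                primes.append(m)
                break
            if m % p == 0:  # m is composite
                break
        else:
            primes.append(m)
    return len(primes)


def count_primes_sieve(n):
    """Sieve of Eratosthenes: cross out multiples of each prime p, starting at p*p."""
    if n < 2:
        return 0
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Zero out p*p, p*p+p, ... in a single slice assignment.
            is_prime[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return sum(is_prime)


@lru_cache(maxsize=None)
def count_primes_cached(n):
    """Same result as the sieve; a repeated call with the same n is served from the cache."""
    return count_primes_sieve(n)


if __name__ == "__main__":
    for fn in (count_primes_trial, count_primes_sieve):
        start = time.perf_counter()
        result = fn(200_000)
        print(f"{fn.__name__}: {result} primes in {time.perf_counter() - start:.3f} s")
```

Timings depend on the machine, so none are claimed here. The sieve avoids division entirely, which is why it usually wins for counting all primes below a bound.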

March 16, 2022 · 1 min · jiezi

About Algorithms: Artificial-Intelligence-Assignment-1

School of Computer Science
The University of Adelaide
Artificial Intelligence
Assignment 1
Semester 1 2022
Due 11:59pm Wednesday 23 March 2022

1 Pathfinding
Pathfinding is the problem of finding a path between two points on a plane. It is a fundamental task in robotics and AI. Perhaps the most obvious usage of pathfinding is in computer games, when an object is instructed to move from its current position to a goal position, while avoiding obstacles (e.g., walls, enemy fire) along the way. Pathfinding in commercial games is frequently accomplished using search algorithms[1].

We consider a simplified version in this assignment. The following shows a 2D map drawn using ASCII characters:

1 1 1 1 1 1 4 7 8 X
1 1 1 1 1 1 1 5 8 8
1 1 1 1 1 1 1 4 6 7
1 1 1 1 1 X 1 1 3 6
1 1 1 1 1 X 1 1 1 1
1 1 1 1 1 1 1 1 1 1
6 1 1 1 1 X 1 1 1 1
7 7 1 X X X 1 1 1 1
8 8 1 1 1 1 1 1 1 1
X 8 7 1 1 1 1 1 1 1

Given a start position and an end position on the map, our aim is to find a path from the start position to the end position. The character 'X' denotes an obstacle that cannot be traversed by a path, while the digits represent the elevation at the respective positions.

Any position is indicated by the coordinates (i, j), where i is the row number (ordered top to bottom) and j is the column number (ordered left to right). For example, the top left position is (1, 1), the bottom right is (10, 10), while the position with elevation '3' is (4, 9). Given start position (1, 1) and end position (10, 10), a possible path is ...

[1] http://theory.stanford.edu/~a...
Semester 1 2022 · Page 1 · by Tat-Jun Chin
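As a starting point, a breadth-first search over this map can be sketched in Python. This is an illustrative sketch, not the assignment's required solution. It assumes 4-connected moves, treats 'X' as impassable and ignores elevation, so it finds a path with the fewest moves rather than one under any cost model the assignment goes on to define. Coordinates are 1-based (i, j) as in the text. The names `MAP` and `bfs_path` are hypothetical.

```python
from collections import deque

MAP = """\
1 1 1 1 1 1 4 7 8 X
1 1 1 1 1 1 1 5 8 8
1 1 1 1 1 1 1 4 6 7
1 1 1 1 1 X 1 1 3 6
1 1 1 1 1 X 1 1 1 1
1 1 1 1 1 1 1 1 1 1
6 1 1 1 1 X 1 1 1 1
7 7 1 X X X 1 1 1 1
8 8 1 1 1 1 1 1 1 1
X 8 7 1 1 1 1 1 1 1"""


def bfs_path(grid, start, goal):
    """Return a fewest-moves path from start to goal as 1-based (i, j) tuples, or None."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # follow parent links back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        i, j = cell
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
            ni, nj = i + di, j + dj
            if (1 <= ni <= rows and 1 <= nj <= cols
                    and grid[ni - 1][nj - 1] != "X"
                    and (ni, nj) not in parent):
                parent[(ni, nj)] = cell
                queue.append((ni, nj))
    return None


grid = [line.split() for line in MAP.splitlines()]
path = bfs_path(grid, (1, 1), (10, 10))
print(len(path) - 1, "moves:", path)
```

On this map the sketch reports 18 moves. That equals the Manhattan distance from (1, 1) to (10, 10), because a route along row 1 and down column 9 avoids every obstacle.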

March 16, 2022 · 13 min · jiezi

About Algorithms: Backtracking for the Shortest Path Through a Maze (C++ Implementation)

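Backtracking: the shortest path through a maze, implemented in C++. There are many maze algorithms, but few explanations of how they work, so here I explain the principle based on my own experience.

A maze can be solved with a stack, i.e. depth-first search. You mark each position as you explore it, move on if the way is open, and back up to try the next position if it is not. This finds a path and is simple, but the path is not necessarily the shortest.

To find the shortest path, this program uses breadth-first search, which is implemented with a queue: dequeue an element, then visit all of its neighbours. The maze is a two-dimensional array in which 0 is a wall and 1 is a passage; here the 0s and 1s are generated randomly, and moves go in four directions. At each step, the element at the head of the queue is dequeued and its four neighbours are visited and enqueued in turn. For each neighbour, the program records the queue index of the cell it was reached from. This repeats until the dequeued cell is the exit. Once the exit is found, the program follows these stored predecessor indices back to the start, which retraces the path that was found, and can then print that path.

For example, number the cells of a 3×3 grid 1 to 9 row by row. If the head of the queue is 5, then after 5 is dequeued its neighbours 6, 8, 4, 2 are enqueued, going clockwise (right, down, left, up), and so on. Suppose all nine cells are open:
1. Initially 1 is enqueued.
2. 1 is dequeued, and its neighbours 2 and 4 are enqueued with predecessor 1.
3. 2 is dequeued, and 3 and 5 are enqueued with predecessor 2.
4. 4 is dequeued, and 7 is enqueued with predecessor 4.
5. 3 is dequeued, and 6 is enqueued with predecessor 3.
6. 5 is dequeued, and 8 is enqueued with predecessor 5.
7. 6 is dequeued, and 9 is enqueued with predecessor 6.

The exit 9 has been reached, so enqueuing stops. Tracing back from 9 gives 9 -> 6 -> 3 -> 2 -> 1, which is the shortest path.

#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;

struct Node {
    int data;  // 1 = passage, 0 = wall, 8 = start or a cell on the found path
    int flag;  // 1 once the cell has been enqueued
};

struct Path {
    int xpath;
    int ypath;
    int pox;   // queue index of the cell this one was reached from
};

class Maze {
private:
    int n, m;      // rows and columns of the maze
    Node *maze;    // maze cells, stored row by row
    Path *que;     // BFS queue; each cell is enqueued at most once, so n*m slots suffice
    int front = -1;
    int rear = -1;

public:
    void create() {
        cout << "Enter the number of rows and columns: ";
        cin >> n >> m;
        maze = new Node[n * m];
        srand(time(NULL));
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                // rand() % 4 == 1 makes a wall, so roughly 3 in 4 cells are passages
                maze[i * m + j].data = (rand() % 4 != 1) ? 1 : 0;
                maze[i * m + j].flag = 0;
            }
        }
        maze[0].data = 8;          // start (top-left)
        maze[n * m - 1].data = 1;  // exit (bottom-right) is always open
        show();
    }

    /* Breadth-first search for the shortest path */
    void seek_road() {
        que = new Path[n * m];
        que[0].xpath = 0;
        que[0].ypath = 0;
        que[0].pox = 0;
        maze[0].flag = 1;
        rear++;
        while (front != rear) {
            // Dequeue the head, then enqueue its open, unvisited neighbours.
            int x = que[++front].xpath;
            int y = que[front].ypath;
            if (judge_head()) return;
            if (y + 1 < m)  push_road(x, y + 1);  // right
            if (x + 1 < n)  push_road(x + 1, y);  // down
            if (y - 1 >= 0) push_road(x, y - 1);  // left
            if (x - 1 >= 0) push_road(x - 1, y);  // up
        }
        cout << "No path!" << endl;
    }

    void show() {
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                if (maze[i * m + j].data == 8) cout << "■ ";
                else cout << maze[i * m + j].data << " ";
            }
            cout << endl;
        }
    }

    // If the dequeued cell is the exit, mark the path back to the start and print it.
    int judge_head() {
        int k = 1;
        if (que[front].xpath == n - 1 && que[front].ypath == m - 1) {
            cout << "Found a path through the maze!" << endl;
            int x = que[front].xpath;
            int y = que[front].ypath;
            int t = que[front].pox;  // queue index of the previous cell
            while (x != 0 || y != 0) {
                maze[x * m + y].data = 8;
                x = que[t].xpath;
                y = que[t].ypath;
                t = que[t].pox;
                k++;
            }
            show();
            cout << "Path length (cells): " << k << endl;
            return 1;
        }
        return 0;
    }

    void push_road(int x, int y) {
        if (maze[x * m + y].data == 1 && maze[x * m + y].flag == 0) {
            que[++rear].xpath = x;
            que[rear].ypath = y;
            que[rear].pox = front;  // remember where this cell was reached from
            maze[x * m + y].flag = 1;
        }
    }
};

int main() {
    Maze *ma = new Maze();
    ma->create();
    ma->seek_road();
    delete ma;
    return 0;
}

PS: These are notes from my own learning. If you spot a mistake, please leave a comment and I will fix it promptly. If you found this helpful, feel free to follow; I will post explanations and code for other algorithms later, and you can also message me directly ...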
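The 3×3 walk-through above can be checked with a small Python sketch of the same idea. This is an illustration, not a port of the whole program. It uses a fixed all-open grid instead of a random maze. Like the C++ `Path.pox` field, it stores with each queue entry the queue index of that entry's predecessor. Neighbours are tried in the same order: right, down, left, up. The function name `bfs_with_predecessors` is hypothetical.

```python
def bfs_with_predecessors(grid, start, goal):
    """BFS on a 0/1 grid (1 = open); returns the path as (row, col) cells, or None."""
    rows, cols = len(grid), len(grid[0])
    queue = [(start[0], start[1], -1)]  # (row, col, queue index of predecessor)
    visited = {start}
    front = 0
    while front < len(queue):
        x, y, _ = queue[front]
        if (x, y) == goal:
            path = []
            t = front
            while t != -1:  # walk the stored indices back to the start
                px, py, prev = queue[t]
                path.append((px, py))
                t = prev
            return path[::-1]
        # Same neighbour order as the C++ code: right, down, left, up.
        for nx, ny in ((x, y + 1), (x + 1, y), (x, y - 1), (x - 1, y)):
            if (0 <= nx < rows and 0 <= ny < cols
                    and grid[nx][ny] == 1 and (nx, ny) not in visited):
                visited.add((nx, ny))
                queue.append((nx, ny, front))
        front += 1
    return None


grid = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
path = bfs_with_predecessors(grid, (0, 0), (2, 2))
print([3 * x + y + 1 for x, y in path])  # cells numbered 1-9 row by row; prints [1, 2, 3, 6, 9]
```

The printed order is the reverse of the trace 9 -> 6 -> 3 -> 2 -> 1, because the path is reversed before it is returned.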

March 16, 2022 · 2 min · jiezi

About Algorithms: Algorithms-Data-Structures

Algorithms & Data Structures 2020/21
Coursework
Konrad Dabrowski & Matthew Johnson
Hand in by 15 January 2021 at 2pm on DUO.

Attempt all questions. Partial credit for incomplete solutions may be given. In written answers, try to be as precise and concise as possible. Do however not just give us the what but also the how or why.

The following instructions on submission are important. You need to submit a number of files and we automate their downloading and some of the marking. If you do not make the submission correctly some of your work might not be looked at and you could miss out on marks.

You should create a folder called ADS that contains the following files (it is important to get each name correct):
• q1.ipynb containing the function hash
• q2.pdf containing the written answer to Question 2 and q2.ipynb containing the functions floodfill stack and floodfill queue
• q3.ipynb containing the functions make palindrome, balanced code and targetsum
• q456.pdf containing your written answers to Questions 4, 5 and 6.
• q6.ipynb containing the functions InsertionSort, Merge3Way and HybridSort

You should not add other files or organise in subfolders. You should create ADS.zip and submit this single file. Your written answers can be typed or handwritten, but in the latter case it is your responsibility to make sure your handwriting is clear and easily readable. We will use Python 3 to test your submissions. Please remember that you should not share your work or make it available where others can find it, as this can facilitate plagiarism and you can be penalised. This requirement applies until the assessment process is completed, which does not happen until the exam board meets in June 2021.

1 ...

March 16, 2022 · 6 min · jiezi

About Algorithms: MAT00027I

Mathematical Skills II (MAT00027I) 2020/21
Project 2 – An infectious disease

The model
The infectious disease MS2V-2020 is spreading across the world.∗ The disease is highly contagious: any person that catches MS2V-2020 will, after an incubation time of 5 days, become infectious and transmit the disease to anyone they come into close contact with. After 7 more days, the disease subsides and the person is no longer infectious, but they remain immune until 60 days after the original infection. After that, they can catch the illness again. While MS2V-2020 does not lead to death, it can cause considerable discomfort for infected persons. It is therefore imperative to understand how the disease spreads in the community.

The aim of this project is to produce a computer simulation for the spreading of MS2V- ...
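The timeline of a single infection can be captured in a small helper that a simulation could call once per person per day. This is a minimal sketch under stated assumptions, not part of the project specification:
• day 0 is the day of infection;
• the person is incubating on days 0 to 4 and infectious on days 5 to 11;
• the person is immune but not infectious on days 12 to 59;
• the person is susceptible again from day 60.

The constant, function and state names are hypothetical.

```python
INCUBATION_DAYS = 5   # assumed: infectious from day 5 after infection
INFECTIOUS_DAYS = 7   # assumed: no longer infectious from day 5 + 7 = 12
IMMUNE_UNTIL = 60     # assumed: susceptible again from day 60


def infection_state(days_since_infection):
    """Classify a person by the number of whole days since they were infected."""
    d = days_since_infection
    if d < 0:
        raise ValueError("days_since_infection must be non-negative")
    if d < INCUBATION_DAYS:
        return "incubating"   # infected, not yet infectious, cannot be reinfected
    if d < INCUBATION_DAYS + INFECTIOUS_DAYS:
        return "infectious"
    if d < IMMUNE_UNTIL:
        return "immune"
    return "susceptible"


print([infection_state(d) for d in (0, 4, 5, 11, 12, 59, 60)])
```

Keeping the three durations as named constants makes it easy to change them if the project's exact day boundaries turn out to differ from these assumptions.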

March 15, 2022 · 10 min · jiezi

关于算法:推荐算法基于隐语义模型的协同过滤推荐之商品相似度矩阵

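The project uses ALS as its collaborative filtering algorithm. From the user rating table in MongoDB, it computes offline recommendation lists for each user as well as an item similarity matrix.

The item similarity matrix computed from ALS is used to look up items similar to the current item and serves the real-time recommendation system.

The offline ALS computation produces a final feature matrix for users and one for items. U (m × k) is the user feature matrix, in which each user is described by k features. V (n × k) is the item feature matrix, in which each item is likewise described by k features.

In V (n × k), each row is a k-dimensional vector. We do not know what each dimension means, but together the k dimensions represent the features of the item in that row.

So each item is represented by its row of V. For any two items p and q with feature vectors $\mathbf{v}_p$ and $\mathbf{v}_q$, the similarity sim(p, q) can be expressed as the cosine of the angle between them:

$$\mathrm{sim}(p, q) = \frac{\mathbf{v}_p \cdot \mathbf{v}_q}{\lVert \mathbf{v}_p \rVert \, \lVert \mathbf{v}_q \rVert}$$

The similarity between any two items in the dataset can be computed with this formula, and item-to-item similarities stay essentially fixed over a period of time. Only pairs with similarity above 0.6 are kept, and the resulting data is saved to the ProductRecs collection in MongoDB.

The core code is as follows:

// Compute the item similarity matrix
// Item feature matrix; model.productFeatures has type RDD[(scala.Int, scala.Array[scala.Double])]
val productFeatures = model.productFeatures.map { case (productId, features) =>
  (productId, new DoubleMatrix(features))
}
// Take the Cartesian product, drop self-pairs, and compute similarities
val productRecs = productFeatures.cartesian(productFeatures)
  .filter { case (a, b) => a._1 != b._1 }
  .map { case (a, b) =>
    val simScore = this.consinSim(a._2, b._2) // cosine similarity
    (a._1, (b._1, simScore))
  }
  .filter(_._2._2 > 0.6) // keep only sufficiently similar pairs
  .groupByKey()
  .map { case (productId, items) =>
    ProductRecs(productId, items.toList.map(x => Recommendation(x._1, x._2)))
  }
  .toDF()

productRecs
  .write
  .option("uri", mongoConfig.uri)
  .option("collection", PRODUCT_RECS)
  .mode("overwrite")
  .format("com.mongodb.spark.sql")
  .save()

Here, consinSim is the function that computes the cosine similarity of two vectors. It is implemented as follows: ...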
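For readers without a Spark cluster, the same computation can be sketched in plain Python on a small in-memory example. This is an illustration only. The item IDs and feature vectors are made up, `item_similarities` is a hypothetical name, and the 0.6 threshold comes from the Scala code above. The sketch applies the cosine formula to every ordered pair of distinct items and groups the pairs above the threshold by item, mirroring the `cartesian` / `filter` / `groupByKey` chain.

```python
import math


def cosine_sim(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def item_similarities(features, threshold=0.6):
    """Map each item id to [(other_id, similarity), ...] for pairs above the threshold."""
    recs = {}
    for p, vp in features.items():
        for q, vq in features.items():
            if p == q:  # skip self-pairs, like filter(a._1 != b._1)
                continue
            sim = cosine_sim(vp, vq)
            if sim > threshold:
                recs.setdefault(p, []).append((q, sim))
    return recs


# Hypothetical 2-dimensional item feature vectors (rows of V).
features = {1: [1.0, 0.0], 2: [1.0, 1.0], 3: [0.0, 1.0]}
print(item_similarities(features))
```

Items 1 and 3 are orthogonal and never recommend each other. Item 2 lies at 45° to both, with a similarity of about 0.707, so it appears in both of their lists.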

March 15, 2022 · 2 min · jiezi

About Algorithms: Divide and Conquer for the Chessboard Covering (Tromino Tiling) Problem (C++ Implementation)

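Divide and conquer: the chessboard covering (tromino tiling) problem, implemented in C++.

Problem description
A chessboard contains one special square. Cover the board with L-shaped trominoes so that no two pieces overlap and every square except the special one is covered, as shown in the figure:

![Chessboard with one special square](https://img-blog.csdnimg.cn/3ba0eedd2e5944c1bbd6947513fed6f9.png)

Implementation
Using divide and conquer, the program splits the board into 4 sub-boards. One sub-board contains the special square and the other three do not. One L-shaped tromino is placed at the centre so that it covers one corner square of each of those three sub-boards. Each sub-board then has exactly one special (already covered) square. This repeats on every sub-board until the size is 1, at which point the algorithm stops.

// Split a board into 4 sub-boards; three of them contain no special square.
// Joining those three with one L-shaped tromino leaves 4 smaller sub-boards,
// each with exactly one special square.
// The special square is at (dr, dc), the board's top-left corner is at (tr, tc), and its size is s.
// Initially dr = 1, dc = 1, tr = 0, tc = 0, s = 8.
// The board is the 2D array board[size][size], with size = 2^k.
#include <iostream>
using namespace std;

const int n = 8;
int t = 1;              // tile label counter; the special square gets label 1
int board[n][n] = {0};

void chessBoard(int tr, int tc, int dr, int dc, int size) {
    if (size == 1) return;
    int t1 = ++t;       // label of the tromino placed at this level
    int s = size / 2;   // split the board into 4 quadrants
    // Recurse on each quadrant according to where the special square lies;
    // tr + s and tc + s bound the top-left quadrant.
    // 1. Top-left quadrant
    if (dr < tr + s && dc < tc + s) {
        chessBoard(tr, tc, dr, dc, s);                  // special square is here: recurse directly
    } else {
        board[tr + s - 1][tc + s - 1] = t1;             // cover its bottom-right corner
        chessBoard(tr, tc, tr + s - 1, tc + s - 1, s);
    }
    // 2. Top-right quadrant
    if (dr < tr + s && dc >= tc + s) {
        chessBoard(tr, tc + s, dr, dc, s);
    } else {
        board[tr + s - 1][tc + s] = t1;                 // cover its bottom-left corner
        chessBoard(tr, tc + s, tr + s - 1, tc + s, s);
    }
    // 3. Bottom-left quadrant
    if (dr >= tr + s && dc < tc + s) {
        chessBoard(tr + s, tc, dr, dc, s);
    } else {
        board[tr + s][tc + s - 1] = t1;                 // cover its top-right corner
        chessBoard(tr + s, tc, tr + s, tc + s - 1, s);
    }
    // 4. Bottom-right quadrant
    if (dr >= tr + s && dc >= tc + s) {
        chessBoard(tr + s, tc + s, dr, dc, s);
    } else {
        board[tr + s][tc + s] = t1;                     // cover its top-left corner
        chessBoard(tr + s, tc + s, tr + s, tc + s, s);
    }
}

int main() {
    board[1][1] = t;    // the special square is at (1, 1)
    chessBoard(0, 0, 1, 1, n);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            cout << board[i][j] << "\t";
        }
        cout << endl;
    }
    return 0;
}

As shown, the numbers indicate the order in which the squares were covered ...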
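A quick count confirms that the recursion places the right number of pieces. A $2^k \times 2^k$ board has $4^k$ squares. Removing the special square leaves $4^k - 1$, which is divisible by 3 because $4 \equiv 1 \pmod 3$. Each call with size greater than 1 places exactly one tromino, and there are $1, 4, 4^2, \dots$ such calls at successive levels. For the $8 \times 8$ board in the code ($k = 3$):

$$
\underbrace{1 + 4 + 4^2 + \cdots + 4^{k-1}}_{\text{calls with size} > 1} = \frac{4^k - 1}{3},
\qquad k = 3:\ \frac{64 - 1}{3} = 21 .
$$

Since the counter `t` starts at 1 for the special square, the 21 trominoes are labelled 2 through 22.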

March 14, 2022 · 2 min · jiezi

About Algorithms: EIE110 Data Structures

EIE110
Homework 3 -- String Functions
CS110/EIE110/LP101 2020 Fall

1 Introduction
There are many interesting functions on strings that we can implement. For example, the famous Caesar cipher can encrypt and decrypt strings; its idea is simply adding or subtracting a fixed value from each character, as depicted by the above graph found on the internet[1].

We will write a C program that performs different operations on strings. The program is organized using multiple functions and files. The knowledge emphasized in this homework includes the following:
• Function design and implementation.
• .c and .h files: a program with multiple files.
• Friendly interaction between the computer and the user:
  • a menu of choices for the user;
  • friendly and clear messages to the user;
  • clearing the input queue when needed, a technique to avoid input confusion;
  • when the user's input is wrong, showing an error message and asking the user to input again.
• Algorithms: we will implement some simple encryption and decryption algorithms.

2 Tasks of this homework
2.1 Prepare multiple source files
The program should have multiple files, whose names and brief descriptions are listed below.
• myStrLib.h : function declarations (prototypes) of Section 2.2.
• strShape.h : function declarations of Section 2.3.
• strCypher.h : function declarations of Section 2.4.
• myStrLib.c : function definitions of Section 2.2.
• strShape.c : function definitions of Section 2.3.
• strCypher.c : function definitions of Section 2.4.
• driver.c : function definitions of Section 2.5.

With these .h and .c files, a function defined in one .c file can be used by another .c file in which the corresponding .h file is included (#included).

2.2 Implement several string library functions
The C standard library provides many helpful functions for string computation. They can be used by including the header string.h, which contains the prototypes of these functions.
In this homework we will implement our own versions of some string library functions, with the same parameters, return value, and computation process as the standard library functions. At least three library functions should be implemented, described by their prototypes and comments as follows:

unsigned int my_strlen(const char str[]);
/* my version of the strlen() function. */

char *my_strcpy(char *dest, const char *src);
/* my version of the strcpy() function
https://en.cppreference.com/w... */

int my_strcmp(const char *str1, const char *str2);
/* my version of the strcmp() function.
https://en.cppreference.com/w... */

In your program, the three library functions strlen, strcpy, and strcmp should not be used; they should all be replaced by your versions.

In addition, you can implement other functions in string.h as you like. Detailed documentation of the string library functions can be found online at websites[2] like www.cppreference.com.

2.3 Design and implement several other tool functions
We will design several functions related to string input and output. Their prototypes and computation descriptions are as follows.

int input_long_str(char storage[], unsigned int sizeLimit, int endMark)
• The function records the user's input from the keyboard. The input can contain white spaces and multiple lines.
• The recorded input is saved in the array storage as a true C string.
• sizeLimit should be the size (number of elements) of storage. So at most sizeLimit - 1 characters of the user's input can be recorded in storage, leaving one more space for the null character (\0) of a C string.
• Recording the input ends in two cases:
  a) The endMark signal appears (returned by getchar()), which is chosen by the user of this function. It could be some special character, or the EOF signal.
  b) Or, the number of input characters recorded is more than sizeLimit - 1.

void clear_input_queue(void)
• This function tries to clear the input queue of the stream stdin, as discussed in class.

void print_str_at_center(const char str[]);
• Given a string, which can contain multiple lines, print these lines centered. For example, when the argument is a string of three lines:
123456789
abc
good
the function prints:
123456789
   abc
  good
The longest line does not need to be indented, but the shorter lines need to be indented by some spaces.

void print_str_in_rectangle(const char str[], unsigned int row_width);
• Given a string str, print its characters in a rectangle where each row contains row_width characters.
• Each newline character and Tab character in str is printed as a space. The reason for this requirement is that newlines make the rectangle less compact and less good-looking, and the width of a Tab character differs between systems, making the printed result less predictable.
For example, given the argument string
long long time ago
I can still remember
how the music used to make me smile
with row_width as 7, the printing will be: ...
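The row-splitting rule for print_str_in_rectangle can be prototyped quickly before writing the C version. The Python sketch below only illustrates the rule as stated: newlines and tabs become spaces, and the text is then cut into rows of row_width characters. The helper name `rectangle_rows` is hypothetical, and the homework itself must be written in C.

```python
def rectangle_rows(text, row_width):
    """Replace newlines and tabs with spaces, then cut into rows of row_width characters."""
    if row_width == 0:
        raise ValueError("row_width must be positive")
    flat = text.replace("\n", " ").replace("\t", " ")
    return [flat[i:i + row_width] for i in range(0, len(flat), row_width)]


lyrics = "long long time ago\nI can still remember\nhow the music used to make me smile"
for row in rectangle_rows(lyrics, 7):
    print(row)
```

In C, the same loop can print characters one at a time and emit a newline whenever a counter reaches row_width, so no second buffer is needed.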

March 14, 2022 · 8 min · jiezi