Software Estimation
Software Estimation
July 01
Software Planning Metrics Plan
SOURCE: http://sern.ucalgary.ca/courses/seng/621/W97/johnf/metplan.htm#INTRO_PUR
Software Metrics Plan
For The
Project Planning
Key Process Area Of The
Capability Maturity Model
for
The SENG 623 Company.
Produced by Dale Couprie, John Frankovich, and Bin Li.
Table of Contents.
1. Introduction
1.1. Purpose
1.2. Scope
1.3. Principles
1.4. Maintenance
1.5. Resources
1.6. Acronyms Used
1.7. Definitions
1.8. References.
2. Project Planning Measurement Plan
2.1. Goal 1.
2.1.1. Objectives.
2.2. Goal 2.
2.2.1. Objectives.
2.3. Goal 3.
2.3.1. Objectives.
3. Metric Identification.
3.1. Estimated Component Sizes
3.2. Total Estimated Size
3.3. Actual Total Size
3.4. Estimated Component Times
3.5. Total Estimated Time
3.6. Actual Total Time
3.7. Size Estimation Error
3.8. Time Estimation Error
3.9. Total Number of Planning Activities
3.10. Number of Completed Planning Activities
3.11. Earned Value
3.12. Availability of Human Resources.
3.13. Total Resources Per Day
3.14. Absentee Rate
3.15. Projected Project Lifespan
3.16. Expected Starting Date
3.17. Expected Finishing Date
4. Data Collection
4.1. Collection Procedures.
4.2. Responsibilities.
4.3. Data to be collected.
4.4. Validation of estimates and actual results.
5. Data Analysis.
5.1. Estimaton Accuracy.
5.2. Historical Regression Analysis.
5.3. Productivity Trends and consistency tests.
6. Reports.
6.1. Overall Master Report.
6.2. Planning Progress Report.
6.3. Project Manager's Evaluation Report.
6.4 Absenteeism By Cause.
6.5. Staff Availability Report.
7. Software Continuous Improvement.
7.1. Assess Current Situation
7.2. Analyse Causes
7.3. Develop and Verify Solutions
7.4. Analyse Results
7.5. Formalize Process
7.6. Assess Further Action
1. Introduction.
1.1 Purpose
The Project Planning Measurement Plan (PPMP) documents the project planning measurement activities for the software project. It identifies the software planning measurement activities, the organization and the responsibilities of the groups and individuals responsible, the resources required, and data collection, analysis and reporting procedures.
1.2 Scope
This Project Planning Measurement Plan applies to the SENG 623 project measurement program. Additionally, this document could be used as a template for organizations that are currently implementing a Software Process Improvement (SPI) effort by adopting the Software Engineering Institute (SEI) Capability Maturity Model (CMM).
This metric plan will allow the SENG 623 project and other organizations to :
document their software estimates for use in planning and tracking the software project;
document their software planning activities and commitments; and
provide measure that affected groups and individuals can use to agree to their commitments related to the software planning project.
1.3 Principles
A Software Project Plan defines what the work is, and how this work can be completed. This plan is developed at the beginning of the software project and is continually refined and improved as the work processes. It can be useful to management as a frame work for review and control the process of developing the software. Additionally, the Software Project Plan can define each of the major tasks and estimate the time and resources that are required to complete these tasks.
The Project Planning Measurement program uses the following Software Planning principles as its fundamental framework:
While requirements are initially vague and incomplete, a quality program can only be built from an accurate and precise understanding of the users' needs. The project plan thus starts by mapping the route from vague and incorrect requirements to accurate and precise ones.
A conceptual design is then developed as a basis for the software planning. This initial structure must be produced with care since it generally defines the breakdown of the product into units, and the relationship between them. Since this provides the organizational framework for planning and implementing the work, it is almost impossible to recover from a poor conceptual design.
At each subsequent requirements refinement, resource projections, size estimates, and schedules are also refined.
When the requirements are sufficiently clear, a detailed design and implementation strategy is developed and incorporated in the plan.
As various parts of the project become sufficiently well understood, implementation details are established and documented in further plan refinements.
Throughout this cycle, the plan provides the framework for negotiating the time and resources to do the job. [Humphrey 89]
Following these Project Planning principles, a deep understanding of the software's functional requirements, performance characteristics, system constraints, and reliability concerns can be formed. This allows the Software Planning group to apply techniques and tools to derive effort and time estimates.
Once the group understands what its current Software Planning process is capable of, it may then begin the process of improving the quality of its planning process. The Project Planning Measurement program improve the quality of the planning process by using the following activities:
Determining the projects goals;
Use the Goal Question Metric paradigm framework;
Defining and developing a set of metrics;
Collecting the metrics;
Validating the metrics collected;
Analyzing the metrics;
Reporting the findings;
Using the metrics to initiate Software Planning Process improvements if they are necessary.
Back to the Contents
1.4 Maintenance
The SENG 623 project has the responsibility to ensure that any requirements to this Project Planning Measurement Plan are incorporated in a timely manner.
The Project Planning Measurement Plan will be made available for distribution on the organizations file server. As the Project Planning group makes alterations to this plan, the affected individuals will be notified and they will then be able to download the new plan to replace the older plan. Additionally, an electronic mail address will be set up to allow the affected people to communicate any questions that they may have concerning the current plan. Feedback from the users of the plan is welcome and will provide useful information on the identification of possible oversights and these issuse will be discussed in the Project Planning group.
1.5 Resources
The SENG 623 project will use the following resources:
An Oracle data base (Personal Oracle 7) as the primary tool to store all of the data metrics that will be collected. This provides maximal flexibility on the analyses of the metric data.
A user friendly Graphical User Interface (Omnis 7) will be used as the front end to allow for the data to be entered into the data base. This tools built-in automatic triggers will increase the accuracy of the data that is entered.
A commercial report generating tool (Crystal Reports) will be used to generate visual graphs and all the necessary reports.
1.6 Acronyms
PPMP
Project Planning Measurement Plan
SPI
Software Process Improvement
CMM
Capability Maturity Model
GQM
Goal Question Metric paradigm
KLOC
Thousands Lines of Code
Back to the Contents
1.7 Definitions
The following is a list of definitions that are related to the field of Software Engineering and Software Project Planning. Other definitions that are not used in the Project Planning Measurement Plan are listed primarily to allow give the reader a working vocabulary of indirectly related concepts that may be required knowledge to read the references.
Customer
Any extern or selected internal end-user of the software.
Defect
Defects are errors found later than the review of the phase where it was introduced. A pre-release defect is a defect discovered before release of the software product. A post-release defect is a defect discovered after release of the software product.
Error
Errors are any mistake that results in incorrect software or documentation. Errors include omission or incorrect requirements in a software requirements specification, a developers misinterpretation of a requirement or an incorrect translation from design to code.
Fault
An error's manifestation in software that causes a functional unit of the software systems to fail in performing its required function. Sometimes called a "bug", a fault is part of the code that needs to be fixed. Both errors and defects are considered faults.
Measure
To ascertain or appraise by comparing to a standard; to apply metric.
Measurement
The act or process of measuring; a figure, extent, or amount obtained by measuring.
Phase error
An error found during the review of the phase where it was introduced.
Phase defect
A Phase error that was found by the customer in a released product.
Phase review
The activity of uncovering problems in the documents or source code during one phase of the software life cycle.
Programmer Day.
This is the standard unit that will be used for human resource measurement. One programmer day is equal to eight man hours a day. It can be expressed in decimal format.
Software metric
A quantitative determination of the extent to which a software process or product possesses a certain characteristic.
Software product
Any item created during the software development project. This includes both documentation and source code.
Back to the Contents
1.8 References
[Boehm 1981]
Boehm, B. (1981). Software Engineering Economics. Englewood Cliffs, NJ: Prentice-Hall, Inc.
[Humphrey 1989]
Humphrey W. S., Managing the Software Process, Addison-Wesley Publishing Company, 1989
[Humphrey 1995]
Humphrey, W.S (1997), A Discipline for Software Engineering. Addison Wesley, Reading, M.A
[Iyer 1996]
Iyer, S. (1996), PSP and The Capability Maturity Model. Technical Report, Software Engineering Institute, Carnage Melon University, Pittsburgh, PA.
[McConnell]
McConnell, S. (1996), Rapid Development: Taming wild Software Schedules. Microsoft Press, Redmond, Washington.
[Paulk 1993]
Paulk, M.C., Bill Curtis, M. B. Chrisis (1993), "Capability Maturity Model for Software Version 1.1," Technical Report, CMU/SEI-93-TR-24, Software Engineering Institute, Carnage Melon University, Pittsburgh, PA.
[Pressman 1992]
Pressman, R. (1992). Software Engineering: A Practitioner's Approach. New York: McGraw-Hill, Inc.
Back to the Contents
2. Project Planning Measurement Plan
The goals for this Project Planning Measurement program have been developed using the Goal Question Metric paradigm. The GQM is an approach to identify measurement needs into metrics. The goals that were used correspond to the Software Engineering Institute's Key Process Area of Project Planning of the Capability Maturity Model (Level 2: The Repeatable Level). The actual questions that were generated and their corresponding metrics will follow later in this document.
The following are the goals for the software measurement for the SENG 623 project:
Software estimates are documented for use in planning and tracking the software project.
Software project activities and commitments are planned and documented.
Affected groups and individuals agree to their commitments related to the software project.
Back to the Contents
2.1 Goal: Software estimates are documented for use in planning and tracking the software project.
2.1.1 Objectives
One of the fundamental activities in any software developmental project is the creation of the software plan. When a software project is planned, estimates of required human effort, chronological project duration, and cost can be estimated. Many different tools may be used in the creation of these estimates and their accuracy will have a direct impact on the finished project.
There exist two fundamental reasons why these estimates should be documented in a systematic process:
The estimates are documented so that they can be used through out the projects life as an aid to track the progress of the software development. Thus, the data is not only used at the start of the project to determine a finishing date, rather it can also be used to track the software.
One of the most powerful methods for estimating the effort required to develop a project is to base the estimate on historical data of previous projects.
The Project Planning Measurement Plan (PPMP) will collect the metrics that are necessary to adequately estimate software development effort from historical data while supplying the data needed to track the current software project.
Back to the Contents
2.2 Goal: Software project activities and commitments are planned and documented.
2.2.1 Objectives
The Key Process Areas of the SEI CMM lists certain activities that can be used by an organization to help guide them through the corresponding KPA. Two of the more important "lists of things" to do are:
Commitment to Perform
Commitment to Perform describes the actions the organization must take to ensure that the process is established and will endure. Commitment to Perform typically involves establishing organizational policies and senior management sponsorship.
Activities Performed
Activities Performed describe the roles and procedures necessary to implement a key process area. Activities Performed typically involve establishing plans and procedures, performing the work, tracking it, ad taking corrective actions as necessary [Paulk 1993].
The Project Planning Measurement Plan (PPMP) will provide metrics that can be used by management to asses the progress that the SENG 623 project is making. This allow the management to make corrective actions if the project planning group is not faithful to the SEI CMM key process areas.
Back to the Contents
2.3 Goal: Affected groups and individuals agree to their commitments related to the software project.
2.3.1 Objectives
A major task of Software Planning is the estimation of the resources that are required to accomplish the software development effort. Human resources are the most important and also the most difficult to estimate.
Once estimates are made and allocated to certain groups, the validity of the estimates must be confirmed. It is quite possible for the Project Planning group to over or under estimate the availability of the resources that have been assigned. Additionally, error can occur if the planning group does not fully understand the complexity of the software. The best way to over come these difficulties is to inform the affected groups and individuals to the nature of their commitment. In this early stage of development, corrective actions can easily be taken to adjust the plan to reflect more achievable goals.
Back to the Contents
3. Metric Identification.
The Following metrics have been identified in order to provide answers to these goals. A number of these metrics mentioned below are computed and not collected from the process. Because historical data provides a good base of knowledge for future project planning, we will require a number of these metrics to be kept in a database. Such metrics will be referred to in analyses for the purposes of generation and validation of a metric estimate. We will be required to keep a number of the metrics in the historical database.
For every project, a Work Breakdown Structure shall be developed such that the project tasks are as simple as possible. Each task will be estimated separately, both in terms of time and size.
3.1. Estimated Component Sizes.
For every component of the WBS, we make an estimate of the software size required to implement it. This metric is an array of sizes. Some components may not involve programming and will have a size of zero.
The measurement unit for software size will be KLOC.
3.2. Total Estimated Size.
The sum of the estimated components makes an estimate for the total size of the project.
We will retain this metric in the historical database.
3.3. Actual Total Size.
This is the total size of the final product to be delivered.
We will retain this metric in our historical database.
3.4. Estimated Component Times.
Each component of the WBS also requires a time to complete. Every component of the WBS requires time to complete, regardless of whether any software is produced in a task. This will be an array of times, one time estimate per task.
The unit of measurement is Programmer Day.
3.5. Total Estimated Time.
The Total estimated time for the project will be computed based upon our historical performance and productivity. We may wish to adjust this estimate based on the difficulty of the problem.
We will retain this metric in the historical database.
3.6. Actual Total Time.
This is the summation of the all estimated component times.
This metric will be retained in the historical database.
3.7. Size estimation Error.3.8.Time Estimation Error.
For the purposes of estimation analysis, this is a percentage error in our estimates. It can apply to both size and time. Error is defined as the difference between the estimated value and the actual value divided by the estimated value, then multiplied by 100 to get a percentage.
If the value of this metric is negative, it implies underestimation. Similarly, a positive value for this metric implies overestimation.
3.9. Total number of Planning activities.
This is the total number of CMM KPA activities involved in the software project planning. For the purpose of earned value analysis, each activity may count as one earned value point.
3.10. Completed Planning Activities.
An activity is considered complete when all work associated with that activity is complete.
3.11. Earned Value.
If the earned value for each activity is 1, then the earned value, or project progress, is the percentage of the total activity list that has been fully completed.
3.12. Availability of Human Resources.
This metric will be an array with one element per employee in the firm. The values will be a percent from 0 to 100, representing the percentage of the work day that a given employee can dedicate to this specific project.
Assuming the work day is eight hours, then 50% represents a worker can work four hours a day on the project.
Overtime work can be permitted by letting this value exceed 100 percent, however, normal business rules as to overtime must apply.
Any change in this metric could alter the time span of the project.
3.13. Total Resources Available Per Day.
The availability of human resources array will be summed to to provide the total time in programmer days that the existing staff can dedicate to this particular project.
3.14. Absentee Rate.
The objective of tracking absenteeism in the firm is to provide a historical background of days lost to a firm due to absenteeism. It should be noted that this metric will not be used to evaluate the performance of the individual personnel.
Absenteeism from work affects productivity, causes a loss of available work hours, and can delay a project if not accounted for.
Absenteeism can be caused by illness, family problems, bad weather, or other circumstances beyond control of the firm. All absences from work are assumed to be justified.
Scheduled holiday time does not constitute absenteeism, and is accounted for by the changeability in the available resource array.
The absenteeism rate will be defined as the proportion of total programmer days lost because of absenteeism.
In event of tardiness, a proportion of the programmer day missed is added to the lost time register in the historical database.
3.15. Projected Project Time.
This is the lifespan of the software development process for a given project. The Total estimated time is divided by the total resources per day available.
This measurement of the expected lifespan of the project is in calendar days. To account for absenteeism, the historical absentee rate represents a proportion of days that can be added on.
The Company's historical absentee rate will play a role in projecting the lifespan of the project.
3.16. Expected Starting Date.3.17. Expected Finishing Date.
These are the target calendar dates for the start and completion of the project.
The start date is decided upon by the Planning committee.
The computation of the completion date is based on the start date and accounts for company policy regarding weekends and holidays.
Back to the Contents
4. Data Collection.
4.1. Data Collection Procedures.
Data collection from the planning process may be time consuming, however, during the planning phase of the project, the data to be collected is inherent in the proceedings. The Planning Metrics Program will make full use of all data identified. Some metrics will be computed from existing metrics as the effectiveness of the Planning process is evaluated.
The Project Planning Group will supply a form that the Project Planning Manager will be able to use to interact and enter data into the Oracle database. This additionally will allow the Manager to view existing data and edit it. It is very important to note that all security features normally associated with database technology will be fully implemented to provide access to authorized personnel only.
The following is a sample screen shot of the data entry form as it is currently envisioned. The tool that was used to create this form will allow the Planning Group to easily introduce new fields or remove redundant fields.
4.2 Who is Responsible to collect data?
The Responsibility for Data Collection should be assigned as follows:
The Project Planning Manager will take on the responsibility for collecting all size estimates, time estimates, and scheduling details. He shall be the hub of activity for this Planning Metrics Program.
For software sizing purposes, Software Quality Assurance shall develop a coding standard and program counting standard. It is the programmers' responsibility to provide SQA a consensus on what should be included in the standard.
Configuration Management shall be responsible to measure and report the code size for the project.
It is the responsibility of each developer working on the project to produce a time recording log of his activities for each day and report to the project planning manager. The developer is also required to provide details of his availability for a new project when required.
The Accounting and Payroll department will keep track of times of employee abscence and a reason for the absence. For the purposes of this Plan, all absences will be assumed as excused.
Back to the Contents
4.3. Data To be Collected.
The following data is required to be collected. This however is not the entire list of metrics proposed. Many of these are computed.
3.1, 3.2. Size Estimation Data. The size of each component of the Work Breakdown Structure is estimated. These components are then summed to produce an estimate for the total size of the project.
3.3. Actual Total Size. Prior to the software's release, the source code is measured to find the actual size of the finished project. This is compared to the estimate in a mathematical analysis.
3.4, 3.5. Time Estimation Data. Time in each WBS component is estimated and then summed to produce the estimate of project time in man hours.
3.6. Actual Total Time. As per size, the total time spent on the project is summed from time logs of the employees working on the project. This result is compared to estimates.
3.9. Total Number of Planning Activities. The total number of activities in the CMM KPA are counted and recorded in a journal. Each can be assigned an earned value.
3.10. Number of Completed Planning Activities. Activities are marked completed as they are done. Their earned values are recorded and added to earned values of other fully complete activities to produce a cumulative earned value.
3.12. Availablity of Human Resources. Each developer is requested to specify the proportion of his working day on this new project.
(3.14). Time lost to absenteeism. If someone is absent from work, the portion of the programmer day missed must be reported for historical reasons. This must be shown in the time log.
Back to the Contents
4.4 Measurement Validation.
It is paramount that the Code counting program used for software size measurements works in accordance with the coding and counting standard designed by the SQA group. In verifying the measurement device's count small code files should be counted by hand and then counted by the device.
One of the strongest points in using Oracle database technology is the availability of triggers. A trigger is a stored section of code that is associated with a single field and is automatically executed each time data is entered into that field. This allows the Planning Group to implement strict validation of data before it is entered into the database.
Back to the Contents
5. Data Analysis.
The metrics that are collected can be used for the following Analyses.
5.1. Estimation Accuracy.
Estimation is a naturally error prone process. It is desired though that such errors are minimal. Our desire is to have our estimation under statistical control so that the risk of a delay or overrun is low. Let ERROR = (EST-ACT)/EST * 100. This is the percentage errors for our estimates. A negative error implies an underestimation while positive errors imply an overestimation.
Estimation errors apply to total time and total size. Separate control charts are required to analyze each of them.
Since it is desirable that our estimation process be under statistical control, we recommend that our estimation errors be plotted on a Control Chart. We are required to calculate the mean (X) and standard deviation (SD) of our errors from previous project estimates (Refer to statistical literature for these procedures). Our estimation errors are under statistical control if and only if all of our estimation errors are within two standard deviations of the mean. This is shown graphically as follows.
Back to the Contents
5.2. Historical Actual vs Estimated Size/Time.
The purpose of this Scatter Diagram and Regression Analysis is to locate trends in our estimations. Plot a graph with estimates on the X axis and the matching actual on the Y axis. Assuming a relatively accurate estimation process, the data points should be linearally correlated and have a best fit line with slope near 1 crossing near the origin. If a significant correlation exists but the regression line has slope nowhere near 1, it should tell us a pattern. If no correlation exists, we have very unstable estimation practices which must improve.
Suppose there exists a significant correlation between actual size and estimated size. A regression analysis can then be performed to produce a line of best fit. This line of best fit has the equation Y = MX+B. This regression analysis can then be used in our estimate for the next project. Suppose the estimate for the next project was quoted. We can substitute X in the above equation with this estimate, which yields a projected size of the next project.
Not only can we retrieve a projected size estimate, we can place a bound on our error of estimation. This statistical bound is computed through regression analysis and requires a level of confidence (likely from 70 to 90 percent). This means we can be confident to the given percentage that the actual size of the final project will fall between in the range set by the projected size plus or minus this bound. This is valuable to management for use in a worst and best case analysis of the project.
This graph can be used for both size and time estimation analysis.
Back to the Contents
5.3. Productivity Graph.
The productivity of a firm for a specific project is computed as the project total size divided by the total time spent developing it. Under the measurement units we have adopted, productivity is measured in KLOC per programmer day.
There are many ways to interpret productivity. We may wish to seek improvements when productivity seems low. However, if our productivity is at a satisfactory level, we may want to check for consistent performance.
Trends in productivity can be illustrated in a Run Chart. A Run Chart is a line chart which shows developing trends for a specific metric over time. This time span may be over real time, or in terms of projects or other items produced. Management can make use of this chart to make long term strategic decisions based on developing trends which may be highlighted on the run chart.
When our productivity is at an acceptable level, we can set a goal to perform consistently between projects. This may be shown in the run chart, but can also be shown as a correlation test. We can plot total time on a project against the project total size.
If the correlation between the data points is significant, it implies we are consistently performing at a relatively consistent productivity rate. If not, then we are very inconsistent and management may think of some strategies to become more consistent. The parameters of the best fit line can be used as follows. The slope of this line can represent the output in KLOC per programmer day. The Y intercept represents the overhead and fixed costs for a project.
Back to the Contents
6. Reports.
6.1. Overall Master Report.
The Master Report includes the control charts, run chart, and scatter diagrams mentioned in the Analysis section of this document. It will be organized in such a way as to be informative to non - technical personnel. Descriptions of how to analyze the charts, where the data came from, and how the data was collected will be included for each individual chart.
Size Estimation Error Control Chart.
Time Estimation Error Control Chart.
Historical Size Estimation graph.
Historical Time Estimation Graph.
Productivity Run Chart.
Productivity Rate graph.
6.2. Planning Progress Report.
Earned Value shall be defined as the total number of planning activities complete divided the total number of scheduled planning activities. This is not to be confused with overall tracking of the project. A convenient way to communicate this to Management is through the Project Planning Group's Planning Progress Report.
Activity Number>
Description
Completed?
1
Description of activity 1.
Yes / No
2
Description of activity 2.
Yes / No
3
Description of activity 3.
Yes / No
Yes / No
N
Description of activity N.
Yes / No
The earned value is computed as the proportion of "yes" answers.
Back to the Contents
6.3. Project Manager's Evaluation Report.
A seasoned project manager can quickly and intuitively analyze the current situation of the project planning group better than any mathematically computed form. For this reason, the project planning group has included a project manager's evaluation report that will allow this knowledge to be passed to the upper management.
We are keeping different historical projects because this can also provide insight into the current situation. The following is the structure that the project planning group has designed to transfer this knowledge.
Project Number
Earned Value.
Comments
1
12.4
Comment 1.
2
16.9
Comment 2.
3
32.8
Comment 3.
N
76.4
Comment N.
This is a convenient structure for management to quickly ascertain the progress of project planning and management can also use the comments to produce process improvement proposals.
Back to the Contents
6.4 Absenteeism By Cause.
Nobody wants to be absent from work, but sometimes it is unavoidable. Whenever someone is absent from work due to circumstances beyond the control of the firm, the cause of the absence should to be logged. Absenteeism data logged will not be used to evaluate one person.
The most common causes of absenteeism are illness, injury, compassionate reasons, family matters, and bad weather.
A bar chart analysis of the number of days lost because of external influences may not be of use right away but could possibly point to more serious problems within the firm. If a high rate of absenteeism is due to illness, it may be a flu bug, but it could also point to a problem with the office environment, likely the air quality. In our example we have a high loss of time to family matters. The source of family matters is variant. The firm should have an employee assistance program to handle serious problems.
However nothing can be done if day care goes on strike. Also, nothing can be done about the weather. In a very snowy winter, this number will be high. Assuming the workplace is safe, injuries happen outside the firm (car accident, sports) and are beyond our control.
Back to the Contents
6.5. Staff Availability Report.
When staff report their availability, their availability may vary by the day. The purpose of this Gantt chart is to provide a milestone report on who is available for what part of the day for a specified project. Scheduling will take this report into account when setting task times and dates.
Back to the Contents
7. Software Continuous Improvement
The Project Planning group recognizes that software improvement initiatives are in a continual state of improvement. This is common across almost every area of Software Engineering and the following steps will be taken:
7.1. Assess Current Situation
This step is concerned with gathering sufficient information to be able to categorize each defect and determine its cause. This data will facilitate understanding of the current process and the way it behaves. The data gathering activity is described in Section 4 of this document.
7.2. Analyse Causes
Once the data is gathered, the SENG 623 team will analyse the data to identify the causes of the most prevalent defects. The team will verify the causes using the data and once verified, will determine if action needs to be taken to improve or modify the process to eliminate the source of the defect.
7.3. Develop and Verify Solutions
As the team develops an understanding of the defects and have determined that there needs to be some action, they will move to the next step of the continuos improvement process of generating continuous improvement alternatives on the existing process.
The team may evaluate alternative solutions in order to identify the best solution for the current situation. They will put into place the solutions and begin to measure whether it provides the improvement that they are looking for.
7.4. Analyse Results
This step consists of monitoring the data to ensure that the continuous improvement process is satisfying the needs as identified in the earlier steps. As the data is analysed, adjustments to the new process may be made in order to ensure that the new process is the correct solution.
7.5. Formalize Process
As the process is improved by using the data to make adjustments, at some point it will be stable and performing as expected. This step is where the SENG 623 team accepts the improved process and integrates it into the standard SENG 623 process. Data is continuously gathered over time to ensure the long term viability of the new process to work consistently. Documentation and training may be required to ensure that all team members understand the new process.
7.6. Assess Further Action
This activities focuses on evaluating the new situation to determine if further action is required. From here, the SENG 623 team may begin a new continuous improvement loop as required.
Back to the Contents
Back to 623.
1:17 PM Add a comment Send a message Permalink View trackbacks (0) Blog it Software Estimation
Mathematical Limits to Software Estimation
Mathematical Limits to Software Estimation Supplementary Material J.P.Lewis zilla@computer.org
We occasionally hear of enormous software projects that run years late and are then canceled. On the other hand, there are a number of software engineering methodologies that claim or hint at objective estimation of development schedules, project complexity, and programmer productivity. With objective estimates of development schedules, projects should rarely run late. Are these failed software projects not using proper software engineering, or is there a deeper problem?
The recent paper Large Limits to Software Estimation provides a framework for answering this question. Specifically, it shows how software estimation can be interpreted in algorithmic (Kolmogorov) complexity terms. Algorithmic complexity results can then easily be interpreted as indicating that claims of purely objective estimation of project complexity, development time, and programmer productivity are necessarily incorrect.
More specifically, the paper shows that
Program size and complexity cannot be objectively estimated a priori.
Development time cannot be objectively predicted.
Absolute productivity cannot be objectively determined.
Even approximate estimators are suspect: there is no estimator that produces a correct fixed bound on the complexity of all programs.
The background for this discussion includes some fun topics -- incompleteness, the foundations of computation, logical paradox -- and similar ideas that have arisen in different fields (math, theoretical computer science, complexity).
The document you are reading provides pointers to some of this background material, and some additional anecdotes of failed software projects and exaggerated software estimation claims. Look through the paper, then return here for the background and extra bits.
Frequent misunderstandingsIn Nov '01 this paper was discussed on several web news/discussion sites. The majority of the posts seemed to agree with what they guessed the conclusions to be, but as one might expect many posters had not actually read beyond the first sentence or two. Others argued with statements they believed might be in the paper (but in fact are not).
The argument is in the general ballpark of "software estimation is hard or impossible", but it actually says something more specific than that.
The article does NOT say the following:
software estimation is impossible
objective software estimation is impossible therefore software estimation is impossible
The article DOES say
Various software engineering authorities claim that objective software estimation is possible [paragraph 3, quotes on first page].
objective software estimation is in fact not possible [body of article]
From this, it does NOT conclude either of the points 1,2 above. Instead, it concludes:
Software construction is inherently creative and subjective, having more in common with physics than manufacturing; software estimation is inherently subjective [conclusion, Bollinger quote].
Because software is used in the government, in vehicles, and other places where it can potentially have a negative on people's lives, we (software writers) have an ethical responsibility to not over-represent our ability to estimate (especially when it comes to software quality- r.e. correctness claim below).
Here are some of the posted responses, paraphrased:
"The article says that estimation must be objective rather than subjective" No, it does not say this.
"The article says that subjective software estimation is not useful" It also does not say this.
"The article says that we are looking for exact answers, not estimates" or "the article doesn't understand what `estimate' means" or No, the article distinguishes subjective and objective estimates, and specifically discusses the case of an objective estimate with bounds in detail.
"The article only is only talking about methods that are completely accurate." No, the paper also considers approximate objective estimators.
"The article says that `if complexity can be objectively estimated then you can make software estimates; complexity cannot be objectively estimated, therefore software scope cannot be estimated', this is denying the antecedent". The reader's claim is that the argument has the structure "If X then Y, but in fact not X, therefore not Y", which would be incorrect if there were ways of obtaining Y other than from X.
First (and as stated above), the article does NOT say "software scope cannot be estimated", it only addresses software estimates that claim objectivity and require a complexity estimate.
R.e. denying the antecedent: this claim is incorrect because the article addresses software estimation methods that require a complexity definition (by their own statement), and of course an objective estimate requires that all its inputs also be objective. As such the argument has an "..only if X.." that is missing from the characterization above. With the added "only if", the "not X" (no objective estimate of complexity) results in "not Y" (objective software estimates that require a complexity estimate cannot exist).
Of course there are some software estimates that are objective and do not require a complexity estimate, for example, LOC for an existing completed program, for whatever that is worth -- this is not in dispute.
"People/organizations can make accurate estimates, I made one last week" or "Estimation is hard, I double my estimates and still miss them". Ok, but slightly off topic: the article is specifically talking about those who claim objective estimates.
"You can do objective estimation, and I did it last week using METHOD X" And where did you get an objective estimate of the complexity of a new project? Read the article...
Consider the following two programs:
A program that computes the square of a very large, algorithmically random number.
A program that computes the Fourier transform of 16 small numbers. Most all programmers would say that writing a Fourier transform program is more complex than writing a program to square a number, but Kolmogorov complexity says the opposite. (Background -- an algorithmically random number is one that has no regularities that would allow it to be compressed, thus the program must represent it by literally storing its digits. So a program to square a million-byte algorithmically random number would need to be larger than a million bytes, if the number is included in the program).
This objection is confusing the complexity of the program with the complexity of a large random number that that the programmer does not write. The difficulty of writing a program is correlated with the complexity of the code that the programmer actually writes, not the complexity of inputs or arguments to the program. The numbers in the programs describe above are properly considered inputs or arguments.
The objection is correct insofar as Kolmogorov complexity conventionally deals with programs that produce output only, and thus would have their inputs or arguments incorporated in the program. The paper sketches how interactivity, arguments, and state might be accommodated in Kolmogorov complexity, but also notes that these are ultimately irrelevant -- complexity will remain non-computable. If you do not want to consider inputs, instead just consider "the complexity of the part that the programmer writes", i.e., measure the complexity of the given number, then that of the finished program, then subtract.
(This discussion suggests another ultimately irrelevant consideration: a programmer is assigned to add a bit of code to an existing large program. (S)he must understand the program before being able to add the code. Here the relevant complexity is the complexity of the added code plus some factor times the complexity of the existing code, where the factor indicates the relative difficulty of learning existing code versus producing new code. This does not affect our argument because there is no way of objectively estimating the complexity of either the existing or the new code...)
Complexity is not the only thing that should be considered, there are other requirements like modularity, maintainability, portability, readability which must be taken into account.
Certainly true, but this does not affect the argument one way or another. The argument says only that software estimates that depend (in part) on complexity cannot be objective. Some such estimates may also depend on other things (that may or may not be objectively estimable) but as long the estimate rests on one component that is subjective, the estimate itself is subjective.
"Software estimation needs common sense, not advanced mathematics." Certainly. The 'manufacturing' camp of software estimators (see Humphrey quote below) say or hint that software construction can be made into a repeatable, fairly boring process where projects are always on time and programmers are like factory workers. This may or may not be true (I don't think it is), but regardless: to make this view seem more science than philosophy some of these people have fallen into the trap of cloaking their estimating process with formal notation and claiming or hinting objectivity. This part is wrong.
On the contrary, [below and conclusion to the article]
Good estimation therefore requires experience and judgment. The software industry should value human experience, intuition, and wisdom rather than claiming false objectivity and promoting entirely impersonal "processes".
Incompleteness
Incompleteness in mathematicsGodel Incompleteness says that every sufficiently powerful system of mathematics will contain true statements that cannot be proven, and no system can prove its own consistency. Godel, Escher, Bach is a rather lengthy popular discussion of Godel incompleteness. Shorter popular introductions are found in several popular books by John Casti (Borders usually has one or more) and by Rudy Rucker (e.g. Mind Tools).
Perhaps the easiest "serious" introduction to Godel incompleteness is the last section of Rucker's Infinity and the Mind (Princeton U. Press, 1995). Mathematical incompleteness is difficult, however; it is much easier to show by establishing that theorem proving and computation are equivalent, and then showing incompleteness in computer science or complexity. This approach is outlined below.
Incompleteness in computer scienceTuring's Halting theorem is the well known example of incompleteness in computer science. Rice's theorem is a more general version: it says that there is no algorithm that can determine an extensional property of all programs. An `extensional property' is one that can be seen from the input-output behaviour of the program without examining its internal state. Examples are: whether a program ever stops running (halting problem), whether a program will crash (automatic debugging problem), whether a program will ever print the secret code '41', whether a (graphics) program will ever produce a pure black image, etc.
The flavor of a proof of halting by contradiction is easy: You say you have written a function which will tell if any program halts. I then write a program which calls your function with my program as input. If your function says my program will exit, I loop, otherwise I exit.
P:: if Halts(P) loop else halt;
A more realistic version considers programs P(x) that have variable input x. Form the program
P(B):: { if Halts(B(B)) loop else halt; } Now call P(B) with P itself passed for parameter B. The resulting program is
P(P):: { if Halts(P(P)) loop else halt; }
Turing's original proof was by diagonalization, similar to the Cantor diagonalization argument that the infinity of reals is bigger than the infinity of integers. The key to this proof is to recognize that programs can be enumerated. That is, fix a machine architecture, then every possible program is a bit pattern that can be interpreted as a large integer, and vice versa, every integer contains a bit pattern that you can try to execute as a program (though few will actually function). Turing's proof by diagonalization is:
Define F(N) as one plus the value of the Nth computable function applied to the number N, or zero if the Nth program does not halt. F cannot be computable because if it were it would be the Mth computable function, and then F(M) = F(M)+1. Because F(N) is straightforwardly defined, the ``flaw'' must be that halting cannot be determined.
These theorems do not say that it is impossible to determine the stated property for many particular problems, they just say that it is not possible to do so for every program. The next obvious question is, how many potential programs are there for which halting (or other extensional properties) cannot be determined? If the only program for which halting could not be determined was the one shown above that calls the halting test function in a paradoxical way, then incompleteness would be somewhat trivial. In fact this is not the case; there are (infinitely) many such exceptions.
In my opinion incompleteness is one of the consequences of allowing the concept of infinity -- infinity brings with it a number of problems and paradoxes. If an infinite number of programs are not allowed, the halting problem goes away... but not so fast: impossibility is merely replaced with intractability.
Intractable problems are those for which a solution is theoretically possible, but which in practice would require aeons of computer time, and a new processor won't begin to help. In fact in both computer science and Kolmogorov flavors of incompleteness, imposing resource limits results in intractable problems.
An example of an intractable problem is this one, which has a genetic programming feel:
Find the fastest x86 program that implements a particular video codec This problem has practical value -- one could code a "reference" version of the codec, and then do a GP style search to find its most optimized implementation. To make the problem easier, fix the input video and state, and find the shortest x86 program that produces the same output from this input and state that an existing codec does. So the search program is now easy: in enumeration order, generate a program, run it for as long as the regular codec takes to run (no longer), check its output if any, keep track of the fastest-running program so far found that produced the desired output.
The intractable nature of this will be obvious to many, but for the rest of you, let's look at the numbers. How big is the codec? Perhaps 100k bytes, but let's say it is only EIGHT bytes for now. There are 2^64 possible programs of 8 bytes in length. To test all these programs, buy a bunch of gigahertz machines -- four such machines can process roughly 2^32 (4 billion) instructions per second. If the test requires 10 seconds to run then four machines could crunch through 2^32 possibilities in 10 seconds. This would then take 2^32 * 10 seconds to search all the possibilities. Calculate this out, using a calculator such as Mathematica that isn't limited to 32-bit computation -- it's a long long time. Now consider that this is for the universe of toy 8-byte programs, whereas the desired codec is 100k bytes. For realistically sized problems, intractable may as well be impossible, and no conventional increase in compute power (including non-quantum parallel machines) will make a difference.
The paper uses "feasible" as a synonym for "tractable".
The book Neil Jones, Computability and Complexity from a Programming Perspective, MIT Press 1997 is a reasonable introduction to the computer science side of incompleteness.
Kolmogorov Complexity, and Incompleteness thereinKolmogorov complexity is also called algorithmic complexity and KCS (Kolmogorov, Chaitin, Solomonoff) complexity. The paper briefly describes this concept. The current "bible" of this field is Li and Vitanyi, Kolmogorov Complexity, Springer. It is dense but readable and suitable for home study.
Incompleteness in Kolmogorov complexity is simply that fact the complexity of a string is not computable; this is mentioned and used in the paper.
Incompleteness relationshipsMathematical incompleteness can be shown from computer science incompleteness, and vice versa, and likewise for complexity incompleteness. Some of the demonstrations are easy and require only accepting the notion that theorem proving can be formalized as computation (which is the essence of the Turing machine work).
Svozil (cited in the paper) lays out one correspondence between mathematics and computation,
axioms
<->
program input or initial state
rules of inference
<->
program interpreter
theorem(s)
<->
program output
derivation
<->
computationbut other correspondences are possible depending on the chosen formalization of "computer". Now to give examples of cross-field incompleteness arguments:
Turing implies GodelBecause there is no procedure for deciding which programs halt, there can be no proofs of which programs halt. In a mathematical system that is powerful enough to discuss programs (recursive functions), there are theorems ``program X will halt'' that are true but cannot be proven.
Complexity proof of Turing theoremAssume the function Halts(P). Write a program Q that looks at all programs up to N bits in size and finds the biggest number produced by any of these programs (that halt). Double this number and print it. The search program Q is approximately log_2 N in size but it printed a number larger than any program of up to size N. The way out of the contradiction is to conclude that Q cannot tell which programs halt.
Complexity proof of IncompletenessThe paper contains the Chaitin complexity proof that mathematical systems cannot prove most statements of the form ``the complexity of (some string) is greater than (some value)''.
Etc.There are many more similar results, including that the maximum run time of a program is not computable, Halting proves Church's theorem (the negative answer to Hilbert's problem, theorems are not decidable by computation), etc.
Claims of the software estimation industry
Technical bookstores have a couple feet of shelf space devoted to software engineering. Some of these books have overly optimistic claims or hints of "objective" estimation based on historical data somewhere in their introduction.
Here are a few samples:
(The original CMM Manifesto) The CMM for Software p.3,4:
"In an immature organization, there is no objective basis for judging product quality or for solving product or process problems" ... "[In a mature organization] There is an objective quantitative basis for judging product quality and analyzing problems with the product and process. Schedules and budgets are based on historical performance and are realistic"
``Software Quality Measurement: A Framework for Counting Problems and Defects'' CMU/SEI-TR-22, p.6:
``Consistent measurements provide data for doing the following:
Quantitatively expressing requirements, goals, and acceptance criteria.
Monitoring progress and anticipating problems.
Quantifying tradeoffs used in allocating resources.
Predicting the software attributes for schedules, cost, and quality.
G. G. Schulmeyer, ``Software Quality Lessons from the Quality Experts,'' in G. G. Schulmeyer and J.I. McManus, Eds., Handbook of Software Quality Assurance} (2nd ed.):
In the Certainty state [of quality management], the objective of software development and software quality management, producing quality software on time with a set cost everytime, is possible.
Course title: "Productivity Improvement through Defect-Free Development". Book title: Zero Defect Software, McGraw-Hill, New York, 1990.
etc...
Note that some of these statements do not quite say that they are going to do objective estimation. For example, both this quote
"[In a mature organization] There is an objective quantitative basis for judging product quality and analyzing problems with the product and process. Schedules and budgets are based on historical performance and are realistic" and this one
``Consistent measurements provide data for doing the following:
Quantitatively expressing requirements, goals, and acceptance criteria.
Monitoring progress and anticipating problems.
Quantifying tradeoffs used in allocating resources.
Predicting the software attributes for schedules, cost, and quality. stop just short of saying will do quantitative estimation based on historical data (though the reader might think otherwise if quickly reading these).
Software disastersGAO-93-13 on major software challenges: ``We have repeatedly reported on cost rising by millions of dollars, schedule delays of not months but years, and multi-billion-dollar systems that don't perform as envisioned.''
A few software "disasters" that I came across during the writing of this paper (1997-2000):
The IRS is making another stab at modernizing its aging computer system, after spending $3.3 billion on a failed upgrade over the past decade. Richard Stengel, ``An Overtaxed IRS,'' Time, April 7 1997 pp. 59-62.
The state of California was faced with scrapping an unfinished $300 million system designed to track child support payments. H. Jordan, ``Computer Computer Snafu May Cost State Millions'', San Jose Mercury News, April 1997.
Windows 97 slips one year, becomes Windows 97/98 (later renamed to Windows 98). And similarly with NT version 5, Copeland, Mac OS/X, etc.
An unpublished review of 17 major DOD software contracts found that the average 28-month schedule was missed by 20 months, and no project was on time.
Evidently the prize for software disasters goes to the FAA's Advanced Automation System (AAS), an attempt at an improved air traffic control system that ran into trouble in the early 90s after an estimated 6.5 billion was spent. The AAS has been described as as "one of the largest and most spectacular computing failures in the history of the field."
R.Britcher, one of the software engineers involved in the AAS project, describes it in the book The Limits of Software (Addison Wesley). Britcher himself characterizes the AAS as
"The greatest debacle in the history of organized work... we learned nothing from it" The book is thoughtful and is a good read.
Importantly, the AAS project did make use of software engineering best practices, such as code inspections, schedule estimates, etc. The software was estimated to be on schedule each and every month... until one month before it was to be delivered. Only at this point did it become evident that the project was hopelessly late.
Another relevant book is The Mythical Man Month. We've all heard of it; it is worth actually reading if you haven't done so yet.
Additional ClaimIn addition to the claims in the paper, there is one additional (somewhat weaker) statement that can be made:
Algorithmic complexity casts doubt on "proofs" of program correctness. This is contrary to the view that formal specifications and program proofs can guarantee program correctness.
The argument here is that the specification must be formal in order to participate in the proof. Since the specification fully specifies the behavior of the program, the complexity (C) of the specification must be at least approximately as large as the complexity of a minimal program for the task at hand. More specifically, given some input it must be possible to determine the desired behavior of the program from the specification. Using this, a small program can be written that, given some input, exhaustively queries the specification until the corresponding output bit pattern (if any) is determined; this bit pattern is then output, thereby simulating the desired program. Formally, C(specification) + C(query program) >= C(program). We are left with the obvious question: if the specification is approximately of the same order of complexity as the program, how can we know that the specification is correct?
This conclusion has been arrived at previously and is often raised when discussing program proofs. The complexity viewpoint strengthens and formalizes this conclusion, by making it clear that a previously intuitive quantity (complexity) can be formally defined and expressed in terms of a familiar measure (bits).
This argument, however, is not as strong as those in the paper. For one, the specification and the code are two different representations of the program, and it may be possible for a human to fully understand one without understanding the other; having both available may help in the understanding of both. Nevertheless, this argument suggests that if the program is far beyond our ability to comprehend it in its entirety, this will likely be true for the specification as well. In fact it appears that humans are only able to fully understand a "page or two" of code (whereas the arbitrary constants in KCS arguments mean that these arguments only apply to much larger programs).
(Note that this argument does not prohibit some particular and incomplete behaviors of a program from being verified. It also says nothing about the case where the "specifications" cannot be formally defined -- in this case one is also using a relaxed definition of "proof".)
Peter Naur's work is worth consulting if there is any doubt to the difficulty of knowing whether a specification is correct (he is the 'N' in BNF; the book Knowing and the Mystique of Logic and Rules, Kluwer, 1995 collects some of his publications). Naur has spent several decades studying how people deal with formal specifications and he has some disappointing comments, including
studies he conducted of problem solving using formal versus informal methods. People perform more poorly (catch fewer errors, etc.) when working with formal specifications.
he points to numerous instances of errors in published proofs of only a page or two in length. If peer-reviewed proofs this short still have errors, how can we know that a 100-page spec for an air traffic control system is correct?
Challenge ScenarioYou are a manager, and you believe that you can objectively estimate development schedules.
How long will it take your team to write a program to solve the Goldbach conjecture? (Idea from John Casti). By the equivalence of math and computation this is possible, and judging from other number theory proofs the program might be pretty short (but this is for you to estimate).
The Goldbach conjecture has been unsolved for a couple of centuries, despite a million-dollar prize that was recently offered for its solution.
So here's a perfectly well formed (no changing requirements) request to write what is probably a fairly simple program, but I don't think anyone can say how long it would take.
If you say, "well this sort of program is not realistic", I say, "it's only unrealistic BECAUSE it cannot be predicted". If things like this were realistic, programmers would be hired to solve important mathematics problems. The market certainly exists -- there are plenty of important problems waiting to be solved.
Conclusions: Disclaimer, Programming vs. Manufacturing, Ethics
Disclaimer, repeatedOnce again, I do not mean to say that Software Estimation, formal specifications, etc. are not useful! Regarding formal specifications, providing an alternate view of a program will help uncover errors that are not clear in it's original form (code). More generally, software estimation probably does help.
I am saying just this:
Claims of objective estimation are wrong. Wrong, and arguably irresponsible as well.
EthicsIrresponsible? A problematic scenario would be this: a software estimation consultant says that if you take his/her $10,000 course, you can transform your software group to produce bullet-proof software on time everytime. The consultant manages to convince the head of R&D at a car company that the software estimation claims are accurate. The car company then decides that it can save $50 per car by removing some manual override and relying entirely on computer control...
Unquestioning faith in software estimation is probably unlikely in the arena of car safety, but computers are everywhere and plenty of smaller scale incidents do happen. Peter Neumann has been tracking such incidents for years, see his risks archives.
Software development: like manufacturing, or like physics?
The CMM (Capability Maturity Model) and the similar ISO-900x got started in an understandable way: the goal was to apply the success and reliability found in manufacturing to software development. Indeed, the CMM was conceived by Watts Humphrey, who writes that industry process control concepts
``...are just as applicable to software as they are to automobiles, cameras, wristwatches, and steel.'' (``Characterizing the Software Process,'' IEEE Software 5(2), pp.~73-79)
Clearly this is well intended, but there is an obvious difference between manufacturing and software. In manufacturing, you make a chair, then another one, then 10,000 more. Then perhaps you switch to a different chair (but it's still a chair) and you make 10,000 of these.
By the nature of software you are never repeating something you did before. If what you need to do is the same as what you did, you can simply make a digital copy. As such, there is no opportunity for the repeatability and codification that are present in manufacturing. This point is better made by Bollinger (cited in the paper), who responds to Humphrey,
``The creation of genuinely new software has far more in common with developing a new theory of physics than it does with producing cars or watches on an assembly line.''
Software construction is an intrinsically creative activity and as such has inherent risks. As a broad consequence, software estimation is necessarily subjective. Good estimation therefore requires experience and judgment. The software industry should value human experience, intuition, and wisdom rather than claiming false objectivity and promoting entirely impersonal "processes". It should attend to the intrinsic uncertainties and risks of software development and where necessary promote the public discussion and honest assessment of these risks.
More Reader feedback
"Programming tools and libraries have to progress to the point where the coding we do is dull and mindless." Posted by slaytanic killer on kuro5hin.org Nov 5th, 2001.
"SDM adopting companies tend not to develop products that could not be trivially predicted." -Constantine Plotnikov, paraphrasing DeMarco & Lister Peopleware, second edition Chapter 29. pages 186-193 in paperback edition.
About the Author and this Paper
My career has been split between academic research, mostly in computer graphics, and commercial programming (again, mostly in the graphics industry). Software engineering is not my field -- how did I end up writing "Large Limits to Software Estimation"?
In 1996 I spent some time working on a notion of `visual complexity' at Interval Research in Palo Alto. As part of this effort, I studied Kolmogorov complexity. A year later I was employed at DreamWorks (the animation company), working on an early version of Shrek. The DreamWorks software staff exposed me to the Capability Maturity Model and similar software process management methods. Since these methods were unfamiliar, I read about them with interest... but something was bothering me. After a couple days I realized that the estimation claims could be seen from a Kolmogorov complexity viewpoint, and that there was a problem with a couple of the claims of these methods.
An early draft of the paper was released on the internet in 1997 and became a minor hit, judging from the email I receive. This version is currently cached at the citeseer repository. The version published in ACM is shortened from the internet draft but added the section on approximate estimation.
SOURCE: http://www.idiom.com/~zilla/Work/Softestim/softestim.html
12:44 PM Add a comment Send a message Permalink View trackbacks (0) Blog it Software Estimation
Kolmogorov complexity
SOURCE: http://en.wikipedia.org/wiki/Kolmogorov_complexity
In computer science, the Kolmogorov complexity (also known as descriptive complexity, Kolmogorov-Chaitin complexity, stochastic complexity, algorithmic entropy, or program-size complexity) of an object such as a piece of text is a measure of the computational resources needed to specify the object. For example consider the following two strings of length 640101010101010101010101010101010101010101010101010101010101010101
1100100001100001110111101110110011111010010000100101011110010110
The first string admits a short English language description, namely "32 repetitions of '01'", which consists of 20 characters. The second one has no obvious simple description other than writing down the string itself, which has 64 characters.
This image illustrates part of the Mandelbrot set fractal. The size of the JPEG file encoding the bitmap of this image is more than 17 kilobytes (approximately 140000 bits). The same file can be generated by a computer program much shorter than 140000 bits, however. Thus, the Kolmogorov complexity of the JPEG file encoding the bitmap is much less than 140000.
More formally, the complexity of a string is the length of the string's shortest description in some fixed description language. The sensitivity of complexity relative to the choice of description language is discussed below. It can be shown that the Kolmogorov complexity of any string cannot be too much larger than the length of the string itself. Strings whose Kolmogorov complexity is small relative to the string's size are not considered to be complex. The notion of Kolmogorov complexity is surprisingly deep and can be used to state and prove impossibility results akin to Gödel's incompleteness theorem and Turing's halting problem.
12:37 PM Add a comment Send a message Permalink View trackbacks (0) Blog it Software Estimation
Algorithmic Information Theory
Source: http://www.hutter1.net/index.htm
News Future events, ...
Introduction What is AIT?
AIT Mailing List
Introductory Material Tutorials, Information...
Bibliography Books, papers...
People Homepages
Communication Mailing Lists
Events Conferences, Workshops...
Miscellaneous Courses, Info on A.K.
Andrei Kolmogorov The theory of probability ... can and should be developed from axioms ...
Entities should not be multiplied unnecessarily William of Occam
News
Conference on Logic, Computability and Randomness , 2007, January 10-13, Buenos Aires, Argentina.
Introduction / What is AIT?Algorithmic Information Theory (AIT) is a subfield of information theory and computer science (and statistics and recursion theory) that concerns itself with the relationship between computation, information, and randomness. The key concepts or subfields are: Algorithmic "Kolmogorov" Complexity (AC). Kolmogorov defined the complexity of a string x as the length of its shortest description p on a universal Turing machine U (K(x)=min{l(p):U(p)=x}). A string is simple if it can be described by a short program, like "the string of one million ones", and is complex if there is no such short description, like for a random string whose shortest description is specifying it bit-by-bit. Kolmogorov complexity is a key concept in (algorithmic) information theory. An important property of K is that it is nearly independent of the choice of U. Furthermore it leads to shorter codes than any other effective code. K shares many properties with Shannon's entropy (information measure) S, but K is superior to S in many respects. To be brief, K is an excellent universal complexity measure, suitable e.g. for quantifying Occam's razor. The major drawback of K as complexity measure is its incomputability. So in practical applications it has always to be approximated, e.g. by MML/MDL or Lempel-Ziv compression or others. Algorithmic "Solomonoff" Probability (AP). Solomonoff defined the closely related universal a priori probability M(x) as the probability that the output of a universal Turing machine U starts with x when provided with fair coin flips on the input tape. M can be used as a universal sequence predictor that outperforms (in a certain sense) all other predictors. Universal "Levin" Search (US). The major drawbacks of AC and AP are their incomputability. Time-bounded ``Levin'' complexity penalizes a slow program by adding the logarithm of its running time to its length. This leads to computable variants of AC and AP, and Universal ``Levin'' Search (US) that solves all inversion problems in optimal (apart from some multiplicative constant) time. Algorithmic "Martin-Loef" Randomness (AR). AC and AP also allow a formal and rigorous definition of randomness of individual strings that does not depend on physical or philosophical intuitions about nondeterminism or likelihood. Roughly, a string is algorithmically ``Martin-Loef'' random if it is incompressible in the sense that it's algorithmic complexity is equal to its length. Applications of AIT AC, AP, US, and AR are the core subdisciplines of AIT, but AIT spans into many other areas. It serves as the foundation of the Minimum Description Length (MDL) principle, can simplify proofs in computational complexity theory, has been used to define a universal similarity metric between objects, solves the Maxwell daemon problem, and many others.
AIT Mailing List The Algorithmic Information Theory (AIT) mailing list is a moderated mailing list intended for people in Information Theory, Computer Sciences, Statistics, Recursion Theory, and other areas or disciplines with interests in AIT. Researchers in these fields are encouraged to join the list and participate. You're invited to discuss and exchange ideas regarding any AIT or related aspect. Software, conferences, courses, jobs, and related announcements are also welcome. You can post, read, and reply messages solely on the Web. You can choose to receive messages as individual emails, daily summaries, daily full-text digest, or read them on the Web only.
The simplest way to join the Kolmogorov Complexity Solomonoff Induction Mailing List is to
just send an Email to mailto:kolmogorov-request@idsia.ch?subject=subscribe with subject subscribe [password],
to unsubscribe send an Email to mailto:kolmogorov-request@idsia.ch?subject=unsubscribe with subject unsubscribe [password].
Here are the old Archive (1999-2001) and the current Mailing List Archive (2002-now).
There is also a web interface for subscription (management) and a
Email based interface: Send an Email to mailto:kolmogorov-request@idsia.ch?subject=help with subject help.
To post a message to all the list members, send email to kolmogorov@idsia.ch
Introductory Online Material
Tutorials
Scholarpedia on Algorithmic Information Theory
Wikipedia on Kolmogorov complexity
Kolmogorov Complexity: Sources, Theory and Applications, by A.Gammerman and V.Vovk, The Computer Journal 42:4 (1999) 252-255
Introduction to Algorithmic Information Theory, by Nick Szabo.
Courses
Topics in Kolmogorov Complexity, Seminar 236804, by Ran El-Yaniv
Complexity of Algorithms, by Laszlo Lovasz
Lecture notes on Kolmogorov complexity (in German) , by J. Schmidhuber, TUM, Munich, Germany, 1994.
Information
Entropy in Information and Coding Theory, by Chris Hillman.
Predictive Complexity at CLRC
Minimum Description Length on the Web
Minimum Message Length and Minimum Encoding Length Inference, list of links by David Dowe
Complexity measures for complex systems and complex objects, by Pablo Funes.
Grupo de Pesquisa de Fundamentos da Matemática e da Computação Universidade Federal de Pelotas (UFPel) (in portuguese)
Bibliography
Books
An Introduction to Kolmogorov Complexity and Its Applications, Ming Li and Paul Vitanyi (1997)
Information and Randomness: An Algorithmic Perspective, Cristian Calude (1994&2002)
Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability, Marcus Hutter (2004)
Algorithmic Randomness and Complexity, Rod Downey and Denis Hirschfeldt (2007)
Elements of Information Theory, Thomas Cover and Joy Thomas (1991)
The Limits of Mathematics, Gregory Chaitin (1998&2003)
Kolmogorov Complexity and Computational Complexity, Osamu Watanabe (1992)
Magazines
Special Issue on Kolmogorov Complexity, The Computer Journal, Volume 42, Issue 4, 1999.
Special Issue on Kolmogorov Complexity, Theoretical Computer Science, Volume 271, Issue 1-2, B. Durand, January 2002
Special Issue on Kolmogorov Complexity, Theoretical Computer Science, vol 207, no 2, A.Semenov (1998)
Some Online Papers
M. Mueller, Stationary Algorithmic Probability, http://arXiv.org/abs/cs/0608095, 2006.
M. Hutter, On the Foundations of Universal Sequence Prediction, Proc. 3rd Annual Conference on Theory and Applications of Models of Computation (TAMC), LNCS3959, 408-420, 2006.
R. Cilibrasi and P.M.B. Vitanyi, Clustering by Compression, IEEE Trans. Information Theory, 51(4):1523--1545, 2005.
J. Schmidhuber, The New AI: General & Sound & Relevant for Physics In Artificial General Intelligence and http://arxiv.org/abs/cs.AI/0302012, 2003.
J. Lutz, The dimensions of individual strings and sequences, ACM Computing Research Repository, 31 pages, 2002.
M. Hutter, The fastest and shortest algorithm for all well-defined problems, International Journal of Foundations of Computer Science, 13:3 (2002) 431-443
P. Gacs, J. Tromp, P. Vitanyi, Algorithmic statistics, IEEE Trans. Inform. Theory, 47:6(2001), 2443-2463.
H. Bannai, A Theory of Discovery and Its Applications in Discovery Science", Master Thesis, Department of Information Science, Graduate School of Science, University of Tokyo, January 2000.
J. Schmidhuber. Algorithmic theories of everything. quant-ph/0011122, TR IDSIA-20-00, Version 1.0, IDSIA, Manno (Lugano), Switzerland, November 2000. http://arxiv.org/abs/quant-ph/0011122. HTML version .
M. Hutter Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decisions Proceedings of the 12th European Conference on Machine Learning (ECML-2001) 226-238
Chris Hillman, Kolmogorov complexity of generalized computably enumerable sets, preprint, also available in PDF.
H. Bannai and S. Miyano, A Definition of Discovery In Terms of Generalized Descriptional Complexity, Discovery Science 1999: 316-318
S. Hochreiter and J. Schmidhuber. Feature extraction through LOCOCODE. Neural Computation 11(3): 679-714, 1999
Martin Schmidt, Time-Bounded Kolmogorov Complexity May Help in Search for Extra Terrestrial Intelligence (SETI)
C. Wallace and D. Dowe, Minimum Message Length and Kolmogorov complexity, Comp. J., Vol 42, No. 4 (1999), pp270-283
John Tromp, A CL-based definition of Kolmogorov Complexity
Vladik Kreinovich and Luc Longpré, Human Visual Perception and Kolmogorov Complexity : Revisited
Vladik Kreinovich, Luc Longpre, and Misha Koshelev, Kolmogorov Complexity, Statistical Regularization of Inverse Problems, and Birkhoff's Formalization of Beauty
Sanjeev Subbaramu, Ann Q. Gates and Vladik Kreinovich, Applications of Kolmogorov Complexity to Image Compression,
Arthur De Vany, How Much Information is there in an Economic Organization and Why Can't Large Ones be Optimal?
Andrei Muchnik, Andrei Romashchenko, Alexander Shen and Nikolai Vereshchagin, Upper semi-lattice of binary strings with the relation "x is simple conditional to y". Abstract
M. Conte, et al. Genetic Programming Estimates of Kolmogorov Complexity. Proc. 7th Int. Conf. on GA, 743-750, 1997.
S. Hochreiter and J. Schmidhuber. Flat Minima. Neural Computation, 9(1):1-42, 1997.
J. Schmidhuber. Low-complexity art. Leonardo, Journal of the International Society for the Arts, Sciences, and Technology, 30(2):97-103, 1997.
Yongge Wang, Randomness and Complexity, PhD thesis, 1996
People
Research field abbreviations:Algorithmic Information Theory (AIT),Kolmogorov Complexity (KC),Algorithmic Probability (AP),Algorithmic Randomness (AR),Universal Levin Search (US),Artificial Intelligence (AI),Machine Learning (ML),Minimum Description Length (MDL),Minimum Message Length (MML),Computational Complexity (CC).
Eric Allender
Rutgers University, New Jersey, USA (AR)
Klaus Ambos-Spies
Universität Heidelberg, Germany (AR)
Eugene Asarin
Université Joseph Fourier, Grenoble, France
Luis Antunes
University of Porto, Portugal (AIT)
Maxim Babenko
Moscow State University, Russia (KC)
Janis Barzdins
University of Latvia (AIT)
Verónica Becher
University of Buenos Aires, Argentina (AR,KC)
Harry Buhrman
CWI and University of Amsterdam, The Netherlands (AR,KC)
Cristian S. Calude
University of Auckland, New Zealand (AR)
Gregory J. Chaitin
IBM Research, Yorktown Heights, NY, USA (AR,AI)
Alexey Chernov
Istituto Dalle Molle di Studi Sull'Intelligenza Artificiale (IDSIA), Lugano, Switzerland (KC,CC)
Thomas Cover
Stanford University, CA, USA (AIT,other)
David Dowe
Monash University, Melbourne, Australia (MML)
Rodney G. Downey
Victoria University, New Zealand (AR)
Bruno Durand
Université de Provence, Marseille, France
Allan Erskine
University College London, UK (ML,KC)
Santiago Figueira
University of Buenos Aires, Argentina (AR,KC)
Lance Fortnow
University of Chicago, USA (CC)
Péter Gács
Boston University, USA (AIT)
Alex Gammerman
Royal Holloway University, London, UK (AIT,ML)
Murray Gell-Mann
Santa Fe Institute, USA (AIT,other)
Peter Grünwald
CWI and Eindhoven University of Technology, The Netherlands (MDL)
Denis R. Hirschfeldt
University of Chicago, USA (AR)
John Hitchcock
University of Wyoming, Laramie, WY, USA (AIT,CC)
Achim Hoffmann
University of New South Wales, Australia (KC,Occam)
Günther Hotz
Universität Saarbrücken, Germany (AIT,CC)
Marcus Hutter
Istituto Dalle Molle di Studi Sull'Intelligenza Artificiale (IDSIA), Lugano, Switzerland (AIT,AP,AI,ML)
Yuri Kalnishkan
Royal Holloway University of London, UK (AIT,ML)
Bjorn Kjos-Hanssen
University of Connecticut, Storrs, CT, USA (AR)
Andrei Kolmogorov
Moscow, Russia (AIT)
Kevin Korb
Monash University, Melbourne, Australia (MDL)
Sophie Laplante
Université Paris-Sud, France (CC,KC,AR)
Troy Lee
CWI Amsterdam, The Netherlands (CC,KC)
Leonid A. Levin
Boston University, USA (AIT)
Ming Li
University of Waterloo, Canada (AIT)
Luc Longpré
University of Texas at El Paso, USA (KC,CC)
Jack Lutz
Iowa State University, USA (AR)
Jean-Yves Marion
LORIA Université Nancy, France (CC,KC)
Per Martin-Löf
University of Stockholm, Sweden (AR)
Elvira Mayordomo
University of Zaragossa, Spain (AR)
Nenad Mihailovic
Universität Heidelberg, Germany (AR)
Wolfgang Merkle
Universität Heidelberg, Germany (AR)
Philippe Moser
Iowa State University, USA (AR)
Andrej Muchnik
Institute of New Technologies, Moscow, Russia (AIT)
Andre Nies
University of Auckland, New Zealand (AR)
Jan Poland
Istituto Dalle Molle di Studi Sull'Intelligenza Artificiale (IDSIA), Lugano (AIT,ML)
Jan Reimann
Universität Heidelberg, Germany (KC,AR)
Jorma Rissanen
Almaden Research Center, San Jose, CA, USA (MDL)
Andrei Romashchenko
Institute of Problems of Information Transmission, Moscow, Russia (KC)
Boris Ryabko
Siberian State University of Telecommunication and Computer Science, Siberia (AR,KC)
Jürgen Schmidhuber
Istituto Dalle Molle di Studi Sull'Intelligenza Artificiale (IDSIA), Lugano, Switzerland (US,AIT,ML)
Claus-Peter Schnorr
Goethe-Universität Frankfurt, Germany (AR)
Robert Solovay
University of California, Berkeley, USA (AIT)
Alexander Kh. Shen
Institute of Problems of Information Transmission, Moscow, Russia (CC,AIT)
Theodor A. Slaman
University of California, Berkeley, USA (AR,CC)
Ray J. Solomonoff
Oxbridge Research, Cambridge, MA, USA (AP,ML,AI)
Ludwig Staiger
Universität Halle, Germany (AR)
Frank Stephan
Universität Heidelberg, Germany (AR)
Sebastiaan A. Terwijn
Technical University of Vienna, Austria (AR)
Leen Torenvliet
University of Amsterdam, The Netherlands
Boris A. Trakhtenbrot
Tel Aviv University, Israel
John Tromp
CWI Amsterdam, The Netherlands (CC,KC)
Maxim Ushakov
Moscow State University, Russia
Vladimir Uspensky
Moscow State University, Russia
Vladimir V'yugin
Institute of Problems of Information Transmission, Moscow, Russia (KC)
Michael Vyugin
Royal Holloway University, London, UK (KC,ML)
Nikolai Vereshchagin
Moscow State University, Russia (KC,CC)
Paul M. B. Vitanyi
CWI, Amsterdam, The Netherlands (KC,ML)
Volodya Vovk
Royal Holloway University, London, UK (KC,ML)
Chris Wallace
Monash University, Australia (MML)
Yongge Wang
University Heidelberg, Germany (CC,AR)
Osamu Watanabe
Tokyo Institute of Technology, Tokyo, Japan (CC,KC)
Events
Conference on Logic, Computability and Randomness,2007, January 10-13, Buenos Aires, Argentina.
Workshop on Effective Randomness,2006, August 7-11, ARCC, Palo Alto, California, USA.
Seminar on Kolmogorov Complexity and Applications,2006, January 29 - February 3, Schloss Dagstuhl, D-66687 Wadern, Saarbrücken, Germany.
Conference on Logic, Computability and Randomness,2004, September 20-24, Córdoba, Argentina.
Centennial Seminar on Kolmogorov Complexity and Applications,2003, April 27 - May 2, Schloss Dagstuhl, D-66687 Wadern, Saarbrücken, Germany.
Workshop on Computability and Randomness,2003, April 25-26, D-69120 Heidelberg, Germany.
TAI'01 5th Workshop on Algorithmic Information Theory,2001, September 24-26, Porquerolles, Hyères, France.
TAI'00, 4th Workshop on Algorithmic Information Theory,2000, June 8-9, Lille, France
TAI '99, Third Workshop on Algorithmic Information Theory,1999, May 20-21, LORIA Nancy, France
Journées françaises sur la complexité de Kolmogorov,1997, June 23 - 24, Université Charles de Gaulle, Lille 3, France
Descriptional Complexity: A Multidisciplinary Perspective,1993, May 3-7, Schloss Dagstuhl, D-66687 Wadern, Saarbrücken, Germany.
Miscellaneous
Annual Kolmogorov Lecture and Medal at RHUL
A. N. Kolmogorov resources (biography, publications, pictures, interviews, ...)
Ray Solomonoff ... Algorithmic Probability can serve as a kind of 'Gold Standard' for induction systems
Only math nerds would call 2500 finite Leonid Levinbers in the programs describe above are properly considered inputs or arguments.
The objection is correct insofar as Kolmogorov complexity conventionally deals with programs that produce output only, and thus would have their inputs or arguments incorporated in the program. The paper sketches how interactivity, arguments, and state might be accommodated in Kolmogorov complexity, but also notes that these are ultimately irrelevant -- complexity will remain non-computable. If you do not want to consider inputs, instead just consider "the complexity of the part that the programmer writes", i.e., measure the complexity of the given number, then that of the finished program, then subtract.
(This discussion suggests another ultimately irrelevant consideration: a programmer is assigned to add a bit of code to an existing large program. (S)he must understand the program before being able to add the code. Here the relevant complexity is the complexity of the added code plus some factor times the complexity of the existing code, where the factor indicates the relative difficulty of learning existing code versus producing new code. This does not affect our argument because there is no way of objectively estimating the complexity of either the existing or the new code...)
Complexity is not the only thing that should be considered, there are other requirements like modularity, maintainability, portability, readability which must be taken into account.
Certainly true, but this does not affect the argument one way or another. The argument says only that software estimates that depend (in part) on complexity cannot be objective. Some such estimates may also depend on other things (that may or may not be objectively estimable) but as long the estimate rests on one component that is subjective, the estimate itself is subjective.
"Software estimation needs common sense, not advanced mathematics." Certainly. The 'manufacturing' camp of software estimators (see Humphrey quote below) say or hint that software construction can be made into a repeatable, fairly boring process where projects are always on time and programmers are like factory workers. This may or may not be true (I don't think it is), but regardless: to make this view seem more science than philosophy some of these people have fallen into the trap of cloaking their estimating process with formal notation and claiming or hinting objectivity. This part is wrong.
On the contrary, [below and conclusion to the article]
Good estimation therefore requires experience and judgment. The software industry should value human experience, intuition, and wisdom rather than claiming false objectivity and promoting entirely impersonal "processes".
Incompleteness
Incompleteness in mathematicsGodel Incompleteness says that every sufficiently powerful system of mathematics will contain true statements that cannot be proven, and no system can prove its own consistency. Godel, Escher, Bach is a rather lengthy popular discussion of Godel incompleteness. Shorter popular introductions are found in several popular books by John Casti (Borders usually has one or more) and by Rudy Rucker (e.g. Mind Tools).
Perhaps the easiest "serious" introduction to Godel incompleteness is the last section of Rucker's Infinity and the Mind (Princeton U. Press, 1995). Mathematical incompleteness is difficult, however; it is much easier to show by establishing that theorem proving and computation are equivalent, and then showing incompleteness in computer science or complexity. This approach is outlined below.
Incompleteness in computer scienceTuring's Halting theorem is the well known example of incompleteness in computer science. Rice's theorem is a more general version: it says that there is no algorithm that can determine an extensional property of all programs. An `extensional property' is one that can be seen from the input-output behaviour of the program without examining its internal state. Examples are: whether a program ever stops running (halting problem), whether a program will crash (automatic debugging problem), whether a program will ever print the secret code '41', whether a (graphics) program will ever produce a pure black image, etc.
The flavor of a proof of halting by contradiction is easy: You say you have written a function which will tell if any program halts. I then write a program which calls your function with my program as input. If your function says my program will exit, I loop, otherwise I exit.
P:: if Halts(P) loop else halt;
A more realistic version considers programs P(x) that have variable input x. Form the program
P(B):: { if Halts(B(B)) loop else halt; } Now call P(B) with P itself passed for parameter B. The resulting program is
P(P):: { if Halts(P(P)) loop else halt; }
Turing's original proof was by diagonalization, similar to the Cantor diagonalization argument that the infinity of reals is bigger than the infinity of integers. The key to this proof is to recognize that programs can be enumerated. That is, fix a machine architecture, then every possible program is a bit pattern that can be interpreted as a large integer, and vice versa, every integer contains a bit pattern that you can try to execute as a program (though few will actually function). Turing's proof by diagonalization is:
Define F(N) as one plus the value of the Nth computable function applied to the number N, or zero if the Nth program does not halt. F cannot be computable because if it were it would be the Mth computable function, and then F(M) = F(M)+1. Because F(N) is straightforwardly defined, the ``flaw'' must be that halting cannot be determined.
These theorems do not say that it is impossible to determine the stated property for many particular problems, they just say that it is not possible to do so for every program. The next obvious question is, how many potential programs are there for which halting (or other extensional properties) cannot be determined? If the only program for which halting could not be determined was the one shown above that calls the halting test function in a paradoxical way, then incompleteness would be somewhat trivial. In fact this is not the case; there are (infinitely) many such exceptions.
In my opinion incompleteness is one of the consequences of allowing the concept of infinity -- infinity brings with it a number of problems and paradoxes. If an infinite number of programs are not allowed, the halting problem goes away... but not so fast: impossibility is merely replaced with intractability.
Intractable problems are those for which a solution is theoretically possible, but which in practice would require aeons of computer time, and a new processor won't begin to help. In fact in both computer science and Kolmogorov flavors of incompleteness, imposing resource limits results in intractable problems.
An example of an intractable problem is this one, which has a genetic programming feel:
Find the fastest x86 program that implements a particular video codec This problem has practical value -- one could code a "reference" version of the codec, and then do a GP style search to find its most optimized implementation. To make the problem easier, fix the input video and state, and find the shortest x86 program that produces the same output from this input and state that an existing codec does. So the search program is now easy: in enumeration order, generate a program, run it for as long as the regular codec takes to run (no longer), check its output if any, keep track of the fastest-running program so far found that produced the desired output.
The intractable nature of this will be obvious to many, but for the rest of you, let's look at the numbers. How big is the codec? Perhaps 100k bytes, but let's say it is only EIGHT bytes for now. There are 2^64 possible programs of 8 bytes in length. To test all these programs, buy a bunch of gigahertz machines -- four such machines can process roughly 2^32 (4 billion) instructions per second. If the test requires 10 seconds to run then four machines could crunch through 2^32 possibilities in 10 seconds. This would then take 2^32 * 10 seconds to search all the possibilities. Calculate this out, using a calculator such as Mathematica that isn't limited to 32-bit computation -- it's a long long time. Now consider that this is for the universe of toy 8-byte programs, whereas the desired codec is 100k bytes. For realistically sized problems, intractable may as well be impossible, and no conventional increase in compute power (including non-quantum parallel machines) will make a difference.
The paper uses "feasible" as a synonym for "tractable".
The book Neil Jones, Computability and Complexity from a Programming Perspective, MIT Press 1997 is a reasonable introduction to the computer science side of incompleteness.
Kolmogorov Complexity, and Incompleteness thereinKolmogorov complexity is also called algorithmic complexity and KCS (Kolmogorov, Chaitin, Solomonoff) complexity. The paper briefly describes this concept. The current "bible" of this field is Li and Vitanyi, Kolmogorov Complexity, Springer. It is dense but readable and suitable for home study.
Incompleteness in Kolmogorov complexity is simply that fact the complexity of a string is not computable; this is mentioned and used in the paper.
Incompleteness relationshipsMathematical incompleteness can be shown from computer science incompleteness, and vice versa, and likewise for complexity incompleteness. Some of the demonstrations are easy and require only accepting the notion that theorem proving can be formalized as computation (which is the essence of the Turing machine work).
Svozil (cited in the paper) lays out one correspondence between mathematics and computation,
axioms
<->
program input or initial state
rules of inference
<->
program interpreter
theorem(s)
<->
program output
derivation
<->
computationbut other correspondences are possible depending on the chosen formalization of "computer". Now to give examples of cross-field incompleteness arguments:
Turing implies GodelBecause there is no procedure for deciding which programs halt, there can be no proofs of which programs halt. In a mathematical system that is powerful enough to discuss programs (recursive functions), there are theorems ``program X will halt'' that are true but cannot be proven.
Complexity proof of Turing theoremAssume the function Halts(P). Write a program Q that looks at all programs up to N bits in size and finds the biggest number produced by any of these programs (that halt). Double this number and print it. The search program Q is approximately log_2 N in size but it printed a number larger than any program of up to size N. The way out of the contradiction is to conclude that Q cannot tell which programs halt.
Complexity proof of IncompletenessThe paper contains the Chaitin complexity proof that mathematical systems cannot prove most statements of the form ``the complexity of (some string) is greater than (some value)''.
Etc.There are many more similar results, including that the maximum run time of a program is not computable, Halting proves Church's theorem (the negative answer to Hilbert's problem, theorems are not decidable by computation), etc.
Claims of the software estimation industry
Technical bookstores have a couple feet of shelf space devoted to software engineering. Some of these books have overly optimistic claims or hints of "objective" estimation based on historical data somewhere in their introduction.
Here are a few samples:
(The original CMM Manifesto) The CMM for Software p.3,4:
"In an immature organization, there is no objective basis for judging product quality or for solving product or process problems" ... "[In a mature organization] There is an objective quantitative basis for judging product quality and analyzing problems with the product and process. Schedules and budgets are based on historical performance and are realistic"
``Software Quality Measurement: A Framework for Counting Problems and Defects'' CMU/SEI-TR-22, p.6:
``Consistent measurements provide data for doing the following:
Quantitatively expressing requirements, goals, and acceptance criteria.
Monitoring progress and anticipating problems.
Quantifying tradeoffs used in allocating resources.
Predicting the software attributes for schedules, cost, and quality.
G. G. Schulmeyer, ``Software Quality Lessons from the Quality Experts,'' in G. G. Schulmeyer and J.I. McManus, Eds., Handbook of Software Quality Assurance} (2nd ed.):
In the Certainty state [of quality management], the objective of software development and software quality management, producing quality software on time with a set cost everytime, is possible.
Course title: "Productivity Improvement through Defect-Free Development". Book title: Zero Defect Software, McGraw-Hill, New York, 1990.
etc...
Note that some of these statements do not quite say that they are going to do objective estimation. For example, both this quote
"[In a mature organization] There is an objective quantitative basis for judging product quality and analyzing problems with the product and process. Schedules and budgets are based on historical performance and are realistic" and this one
``Consistent measurements provide data for doing the following:
Quantitatively expressing requirements, goals, and acceptance criteria.
Monitoring progress and anticipating problems.
Quantifying tradeoffs used in allocating resources.
Predicting the software attributes for schedules, cost, and quality. stop just short of saying will do quantitative estimation based on historical data (though the reader might think otherwise if quickly reading these).
Software disastersGAO-93-13 on major software challenges: ``We have repeatedly reported on cost rising by millions of dollars, schedule delays of not months but years, and multi-billion-dollar systems that don't perform as envisioned.''
A few software "disasters" that I came across during the writing of this paper (1997-2000):
The IRS is making another stab at modernizing its aging computer system, after spending $3.3 billion on a failed upgrade over the past decade. Richard Stengel, ``An Overtaxed IRS,'' Time, April 7 1997 pp. 59-62.
The state of California was faced with scrapping an unfinished $300 million system designed to track child support payments. H. Jordan, ``Computer Computer Snafu May Cost State Millions'', San Jose Mercury News, April 1997.
Windows 97 slips one year, becomes Windows 97/98 (later renamed to Windows 98). And similarly with NT version 5, Copeland, Mac OS/X, etc.
An unpublished review of 17 major DOD software contracts found that the average 28-month schedule was missed by 20 months, and no project was on time.
Evidently the prize for software disasters goes to the FAA's Advanced Automation System (AAS), an attempt at an improved air traffic control system that ran into trouble in the early 90s after an estimated 6.5 billion was spent. The AAS has been described as as "one of the largest and most spectacular computing failures in the history of the field."
R.Britcher, one of the software engineers involved in the AAS project, describes it in the book The Limits of Software (Addison Wesley). Britcher himself characterizes the AAS as
"The greatest debacle in the history of organized work... we learned nothing from it" The book is thoughtful and is a good read.
Importantly, the AAS project did make use of software engineering best practices, such as code inspections, schedule estimates, etc. The software was estimated to be on schedule each and every month... until one month before it was to be delivered. Only at this point did it become evident that the project was hopelessly late.
Another relevant book is The Mythical Man Month. We've all heard of it; it is worth actually reading if you haven't done so yet.
Additional ClaimIn addition to the claims in the paper, there is one additional (somewhat weaker) statement that can be made:
Algorithmic complexity casts doubt on "proofs" of program correctness. This is contrary to the view that formal specifications and program proofs can guarantee program correctness.
The argument here is that the specification must be formal in order to participate in the proof. Since the specification fully specifies the behavior of the program, the complexity (C) of the specification must be at least approximately as large as the complexity of a minimal program for the task at hand. More specifically, given some input it must be possible to determine the desired behavior of the program from the specification. Using this, a small program can be written that, given some input, exhaustively queries the specification until the corresponding output bit pattern (if any) is determined; this bit pattern is then output, thereby simulating the desired program. Formally, C(specification) + C(query program) >= C(program). We are left with the obvious question: if the specification is approximately of the same order of complexity as the program, how can we know that the specification is correct?
This conclusion has been arrived at previously and is often raised when discussing program proofs. The complexity viewpoint strengthens and formalizes this conclusion, by making it clear that a previously intuitive quantity (complexity) can be formally defined and expressed in terms of a familiar measure (bits).
This argument, however, is not as strong as those in the paper. For one, the specification and the code are two different representations of the program, and it may be possible for a human to fully understand one without understanding the other; having both available may help in the understanding of both. Nevertheless, this argument suggests that if the program is far beyond our ability to comprehend it in its entirety, this will likely be true for the specification as well. In fact it appears that humans are only able to fully understand a "page or two" of code (whereas the arbitrary constants in KCS arguments mean that these arguments only apply to much larger programs).
(Note that this argument does not prohibit some particular and incomplete behaviors of a program from being verified. It also says nothing about the case where the "specifications" cannot be formally defined -- in this case one is also using a relaxed definition of "proof".)
Peter Naur's work is worth consulting if there is any doubt to the difficulty of knowing whether a specification is correct (he is the 'N' in BNF; the book Knowing and the Mystique of Logic and Rules, Kluwer, 1995 collects some of his publications). Naur has spent several decades studying how people deal with formal specifications and he has some disappointing comments, including
studies he conducted of problem solving using formal versus informal methods. People perform more poorly (catch fewer errors, etc.) when working with formal specifications.
he points to numerous instances of errors in published proofs of only a page or two in length. If peer-reviewed proofs this short still have errors, how can we know that a 100-page spec for an air traffic control system is correct?
Challenge ScenarioYou are a manager, and you believe that you can objectively estimate development schedules.
How long will it take your team to write a program to solve the Goldbach conjecture? (Idea from John Casti). By the equivalence of math and computation this is possible, and judging from other number theory proofs the program might be pretty short (but this is for you to estimate).
The Goldbach conjecture has been unsolved for a couple of centuries, despite a million-dollar prize that was recently offered for its solution.
So here's a perfectly well formed (no changing requirements) request to write what is probably a fairly simple program, but I don't think anyone can say how long it would take.
If you say, "well this sort of program is not realistic", I say, "it's only unrealistic BECAUSE it cannot be predicted". If things like this were realistic, programmers would be hired to solve important mathematics problems. The market certainly exists -- there are plenty of important problems waiting to be solved.
Conclusions: Disclaimer, Programming vs. Manufacturing, Ethics
Disclaimer, repeatedOnce again, I do not mean to say that Software Estimation, formal specifications, etc. are not useful! Regarding formal specifications, providing an alternate view of a program will help uncover errors that are not clear in it's original form (code). More generally, software estimation probably does help.
I am saying just this:
Claims of objective estimation are wrong. Wrong, and arguably irresponsible as well.
EthicsIrresponsible? A problematic scenario would be this: a software estimation consultant says that if you take his/her $10,000 course, you can transform your software group to produce bullet-proof software on time everytime. The consultant manages to convince the head of R&D at a car company that the software estimation claims are accurate. The car company then decides that it can save $50 per car by removing some manual override and relying entirely on computer control...
Unquestioning faith in software estimation is probably unlikely in the arena of car safety, but computers are everywhere and plenty of smaller scale incidents do happen. Peter Neumann has been tracking such incidents for years, see his risks archives.
Software development: like manufacturing, or like physics?
The CMM (Capability Maturity Model) and the similar ISO-900x got started in an understandable way: the goal was to apply the success and reliability found in manufacturing to software development. Indeed, the CMM was conceived by Watts Humphrey, who writes that industry process control concepts
``...are just as applicable to software as they are to automobiles, cameras, wristwatches, and steel.'' (``Characterizing the Software Process,'' IEEE Software 5(2), pp.~73-79)
Clearly this is well intended, but there is an obvious difference between manufacturing and software. In manufacturing, you make a chair, then another one, then 10,000 more. Then perhaps you switch to a different chair (but it's still a chair) and you make 10,000 of these.
By the nature of software you are never repeating something you did before. If what you need to do is the same as what you did, you can simply make a digital copy. As such, there is no opportunity for the repeatability and codification that are present in manufacturing. This point is better made by Bollinger (cited in the paper), who responds to Humphrey,
``The creation of genuinely new software has far more in common with developing a new theory of physics than it does with producing cars or watches on an assembly line.''
Software construction is an intrinsically creative activity and as such has inherent risks. As a broad consequence, software estimation is necessarily subjective. Good estimation therefore requires experience and judgment. The software industry should value human experience, intuition, and wisdom rather than claiming false objectivity and promoting entirely impersonal "processes". It should attend to the intrinsic uncertainties and risks of software development and where necessary promote the public discussion and honest assessment of these risks.
More Reader feedback
"Programming tools and libraries have to progress to the point where the coding we do is dull and mindless." Posted by slaytanic killer on kuro5hin.org Nov 5th, 2001.
"SDM adopting companies tend not to develop products that could not be trivially predicted." -Constantine Plotnikov, paraphrasing DeMarco & Lister Peopleware, second edition Chapter 29. pages 186-193 in paperback edition.
About the Author and this Paper
My career has been split between academic research, mostly in computer graphics, and commercial programming (again, mostly in the graphics industry). Software engineering is not my field -- how did I end up writing "Large Limits to Software Estimation"?
In 1996 I spent some time working on a notion of `visual complexity' at Interval Research in Palo Alto. As part of this effort, I studied Kolmogorov complexity. A year later I was employed at DreamWorks (the animation company), working on an early version of Shrek. The DreamWorks software staff exposed me to the Capability Maturity Model and similar software process management methods. Since these methods were unfamiliar, I read about them with interest... but something was bothering me. After a couple days I realized that the estimation claims could be seen from a Kolmogorov complexity viewpoint, and that there was a problem with a couple of the claims of these methods.
An early draft of the paper was released on the internet in 1997 and became a minor hit, judging from the email I receive. This version is currently cached at the citeseer repository. The version published in ACM is shortened from the internet draft but added the section on approximate estimation.
SOURCE: http://www.idiom.com/~zilla/Work/Softestim/softestim.html
12:44 PM Add a comment Send a message Permalink View trackbacks (0) Blog it Software Estimation
Kolmogorov complexity
SOURCE: http://en.wikipedia.org/wiki/Kolmogorov_complexity
In computer science, the Kolmogorov complexity (also known as descriptive complexity, Kolmogorov-Chaitin complexity, stochastic complexity, algorithmic entropy, or program-size complexity) of an object such as a piece of text is a measure of the computational resources needed to specify the object. For example consider the following two strings of length 640101010101010101010101010101010101010101010101010101010101010101
1100100001100001110111101110110011111010010000100101011110010110
The first string admits a short English language description, namely "32 repetitions of '01'", which consists of 20 characters. The second one has no obvious simple description other than writing down the string itself, which has 64 characters.
This image illustrates part of the Mandelbrot set fractal. The size of the JPEG file encoding the bitmap of this image is more than 17 kilobytes (approximately 140000 bits). The same file can be generated by a computer program much shorter than 140000 bits, however. Thus, the Kolmogorov complexity of the JPEG file encoding the bitmap is much less than 140000.
More formally, the complexity of a string is the length of the string's shortest description in some fixed description language. The sensitivity of complexity relative to the choice of description language is discussed below. It can be shown that the Kolmogorov complexity of any string cannot be too much larger than the length of the string itself. Strings whose Kolmogorov complexity is small relative to the string's size are not considered to be complex. The notion of Kolmogorov complexity is surprisingly deep and can be used to state and prove impossibility results akin to Gödel's incompleteness theorem and Turing's halting problem.
12:37 PM Add a comment Send a message Permalink View trackbacks (0) Blog it Software Estimation
Algorithmic Information Theory
Source: http://www.hutter1.net/index.htm
News Future events, ...
Introduction What is AIT?
AIT Mailing List
Introductory Material Tutorials, Information...
Bibliography Books, papers...
People Homepages
Communication Mailing Lists
Events Conferences, Workshops...
Miscellaneous Courses, Info on A.K.
Andrei Kolmogorov The theory of probability ... can and should be developed from axioms ...
Entities should not be multiplied unnecessarily William of Occam
News
Conference on Logic, Computability and Randomness , 2007, January 10-13, Buenos Aires, Argentina.
Introduction / What is AIT?Algorithmic Information Theory (AIT) is a subfield of information theory and computer science (and statistics and recursion theory) that concerns itself with the relationship between computation, information, and randomness. The key concepts or subfields are: Algorithmic "Kolmogorov" Complexity (AC). Kolmogorov defined the complexity of a string x as the length of its shortest description p on a universal Turing machine U (K(x)=min{l(p):U(p)=x}). A string is simple if it can be described by a short program, like "the string of one million ones", and is complex if there is no such short description, like for a random string whose shortest description is specifying it bit-by-bit. Kolmogorov complexity is a key concept in (algorithmic) information theory. An important property of K is that it is nearly independent of the choice of U. Furthermore it leads to shorter codes than any other effective code. K shares many properties with Shannon's entropy (information measure) S, but K is superior to S in many respects. To be brief, K is an excellent universal complexity measure, suitable e.g. for quantifying Occam's razor. The major drawback of K as complexity measure is its incomputability. So in practical applications it has always to be approximated, e.g. by MML/MDL or Lempel-Ziv compression or others. Algorithmic "Solomonoff" Probability (AP). Solomonoff defined the closely related universal a priori probability M(x) as the probability that the output of a universal Turing machine U starts with x when provided with fair coin flips on the input tape. M can be used as a universal sequence predictor that outperforms (in a certain sense) all other predictors. Universal "Levin" Search (US). The major drawbacks of AC and AP are their incomputability. Time-bounded ``Levin'' complexity penalizes a slow program by adding the logarithm of its running time to its length. This leads to computable variants of AC and AP, and Universal ``Levin'' Search (US) that solves all inversion problems in optimal (apart from some multiplicative constant) time. Algorithmic "Martin-Loef" Randomness (AR). AC and AP also allow a formal and rigorous definition of randomness of individual strings that does not depend on physical or philosophical intuitions about nondeterminism or likelihood. Roughly, a string is algorithmically ``Martin-Loef'' random if it is incompressible in the sense that it's algorithmic complexity is equal to its length. Applications of AIT AC, AP, US, and AR are the core subdisciplines of AIT, but AIT spans into many other areas. It serves as the foundation of the Minimum Description Length (MDL) principle, can simplify proofs in computational complexity theory, has been used to define a universal similarity metric between objects, solves the Maxwell daemon problem, and many others.
AIT Mailing List The Algorithmic Information Theory (AIT) mailing list is a moderated mailing list intended for people in Information Theory, Computer Sciences, Statistics, Recursion Theory, and other areas or disciplines with interests in AIT. Researchers in these fields are encouraged to join the list and participate. You're invited to discuss and exchange ideas regarding any AIT or related aspect. Software, conferences, courses, jobs, and related announcements are also welcome. You can post, read, and reply messages solely on the Web. You can choose to receive messages as individual emails, daily summaries, daily full-text digest, or read them on the Web only.
The simplest way to join the Kolmogorov Complexity Solomonoff Induction Mailing List is to
just send an Email to mailto:kolmogorov-request@idsia.ch?subject=subscribe with subject subscribe [password],
to unsubscribe send an Email to mailto:kolmogorov-request@idsia.ch?subject=unsubscribe with subject unsubscribe [password].
Here are the old Archive (1999-2001) and the current Mailing List Archive (2002-now).
There is also a web interface for subscription (management) and a
Email based interface: Send an Email to mailto:kolmogorov-request@idsia.ch?subject=help with subject help.
To post a message to all the list members, send email to kolmogorov@idsia.ch
Introductory Online Material
Tutorials
Scholarpedia on Algorithmic Information Theory
Wikipedia on Kolmogorov complexity
Kolmogorov Complexity: Sources, Theory and Applications, by A.Gammerman and V.Vovk, The Computer Journal 42:4 (1999) 252-255
Introduction to Algorithmic Information Theory, by Nick Szabo.
Courses
Topics in Kolmogorov Complexity, Seminar 236804, by Ran El-Yaniv
Complexity of Algorithms, by Laszlo Lovasz
Lecture notes on Kolmogorov complexity (in German) , by J. Schmidhuber, TUM, Munich, Germany, 1994.
Information
Entropy in Information and Coding Theory, by Chris Hillman.
Predictive Complexity at CLRC
Minimum Description Length on the Web
Minimum Message Length and Minimum Encoding Length Inference, list of links by David Dowe
Complexity measures for complex systems and complex objects, by Pablo Funes.
Grupo de Pesquisa de Fundamentos da Matemática e da Computação Universidade Federal de Pelotas (UFPel) (in portuguese)
Bibliography
Books
An Introduction to Kolmogorov Complexity and Its Applications, Ming Li and Paul Vitanyi (1997)
Information and Randomness: An Algorithmic Perspective, Cristian Calude (1994&2002)
Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability, Marcus Hutter (2004)
Algorithmic Randomness and Complexity, Rod Downey and Denis Hirschfeldt (2007)
Elements of Information Theory, Thomas Cover and Joy Thomas (1991)
The Limits of Mathematics, Gregory Chaitin (1998&2003)
Kolmogorov Complexity and Computational Complexity, Osamu Watanabe (1992)
Magazines
Special Issue on Kolmogorov Complexity, The Computer Journal, Volume 42, Issue 4, 1999.
Special Issue on Kolmogorov Complexity, Theoretical Computer Science, Volume 271, Issue 1-2, B. Durand, January 2002
Special Issue on Kolmogorov Complexity, Theoretical Computer Science, vol 207, no 2, A.Semenov (1998)
Some Online Papers
M. Mueller, Stationary Algorithmic Probability, http://arXiv.org/abs/cs/0608095, 2006.
M. Hutter, On the Foundations of Universal Sequence Prediction, Proc. 3rd Annual Conference on Theory and Applications of Models of Computation (TAMC), LNCS3959, 408-420, 2006.
R. Cilibrasi and P.M.B. Vitanyi, Clustering by Compression, IEEE Trans. Information Theory, 51(4):1523--1545, 2005.
J. Schmidhuber, The New AI: General & Sound & Relevant for Physics In Artificial General Intelligence and http://arxiv.org/abs/cs.AI/0302012, 2003.
J. Lutz, The dimensions of individual strings and sequences, ACM Computing Research Repository, 31 pages, 2002.
M. Hutter, The fastest and shortest algorithm for all well-defined problems, International Journal of Foundations of Computer Science, 13:3 (2002) 431-443
P. Gacs, J. Tromp, P. Vitanyi, Algorithmic statistics, IEEE Trans. Inform. Theory, 47:6(2001), 2443-2463.
H. Bannai, A Theory of Discovery and Its Applications in Discovery Science", Master Thesis, Department of Information Science, Graduate School of Science, University of Tokyo, January 2000.
J. Schmidhuber. Algorithmic theories of everything. quant-ph/0011122, TR IDSIA-20-00, Version 1.0, IDSIA, Manno (Lugano), Switzerland, November 2000. http://arxiv.org/abs/quant-ph/0011122. HTML version .
M. Hutter Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decisions Proceedings of the 12th European Conference on Machine Learning (ECML-2001) 226-238
Chris Hillman, Kolmogorov complexity of generalized computably enumerable sets, preprint, also available in PDF.
H. Bannai and S. Miyano, A Definition of Discovery In Terms of Generalized Descriptional Complexity, Discovery Science 1999: 316-318
S. Hochreiter and J. Schmidhuber. Feature extraction through LOCOCODE. Neural Computation 11(3): 679-714, 1999
Martin Schmidt, Time-Bounded Kolmogorov Complexity May Help in Search for Extra Terrestrial Intelligence (SETI)
C. Wallace and D. Dowe, Minimum Message Length and Kolmogorov complexity, Comp. J., Vol 42, No. 4 (1999), pp270-283
John Tromp, A CL-based definition of Kolmogorov Complexity
Vladik Kreinovich and Luc Longpré, Human Visual Perception and Kolmogorov Complexity : Revisited
Vladik Kreinovich, Luc Longpre, and Misha Koshelev, Kolmogorov Complexity, Statistical Regularization of Inverse Problems, and Birkhoff's Formalization of Beauty
Sanjeev Subbaramu, Ann Q. Gates and Vladik Kreinovich, Applications of Kolmogorov Complexity to Image Compression,
Arthur De Vany, How Much Information is there in an Economic Organization and Why Can't Large Ones be Optimal?
Andrei Muchnik, Andrei Romashchenko, Alexander Shen and Nikolai Vereshchagin, Upper semi-lattice of binary strings with the relation "x is simple conditional to y". Abstract
M. Conte, et al. Genetic Programming Estimates of Kolmogorov Complexity. Proc. 7th Int. Conf. on GA, 743-750, 1997.
S. Hochreiter and J. Schmidhuber. Flat Minima. Neural Computation, 9(1):1-42, 1997.
J. Schmidhuber. Low-complexity art. Leonardo, Journal of the International Society for the Arts, Sciences, and Technology, 30(2):97-103, 1997.
Yongge Wang, Randomness and Complexity, PhD thesis, 1996
People
Research field abbreviations:Algorithmic Information Theory (AIT),Kolmogorov Complexity (KC),Algorithmic Probability (AP),Algorithmic Randomness (AR),Universal Levin Search (US),Artificial Intelligence (AI),Machine Learning (ML),Minimum Description Length (MDL),Minimum Message Length (MML),Computational Complexity (CC).
Eric Allender
Rutgers University, New Jersey, USA (AR)
Klaus Ambos-Spies
Universität Heidelberg, Germany (AR)
Eugene Asarin
Université Joseph Fourier, Grenoble, France
Luis Antunes
University of Porto, Portugal (AIT)
Maxim Babenko
Moscow State University, Russia (KC)
Janis Barzdins
University of Latvia (AIT)
Verónica Becher
University of Buenos Aires, Argentina (AR,KC)
Harry Buhrman
CWI and University of Amsterdam, The Netherlands (AR,KC)
Cristian S. Calude
University of Auckland, New Zealand (AR)
Gregory J. Chaitin
IBM Research, Yorktown Heights, NY, USA (AR,AI)
Alexey Chernov
Istituto Dalle Molle di Studi Sull'Intelligenza Artificiale (IDSIA), Lugano, Switzerland (KC,CC)
Thomas Cover
Stanford University, CA, USA (AIT,other)
David Dowe
Monash University, Melbourne, Australia (MML)
Rodney G. Downey
Victoria University, New Zealand (AR)
Bruno Durand
Université de Provence, Marseille, France
Allan Erskine
University College London, UK (ML,KC)
Santiago Figueira
University of Buenos Aires, Argentina (AR,KC)
Lance Fortnow
University of Chicago, USA (CC)
Péter Gács
Boston University, USA (AIT)
Alex Gammerman
Royal Holloway University, London, UK (AIT,ML)
Murray Gell-Mann
Santa Fe Institute, USA (AIT,other)
Peter Grünwald
CWI and Eindhoven University of Technology, The Netherlands (MDL)
Denis R. Hirschfeldt
University of Chicago, USA (AR)
John Hitchcock
University of Wyoming, Laramie, WY, USA (AIT,CC)
Achim Hoffmann
University of New South Wales, Australia (KC,Occam)
Günther Hotz
Universität Saarbrücken, Germany (AIT,CC)
Marcus Hutter
Istituto Dalle Molle di Studi Sull'Intelligenza Artificiale (IDSIA), Lugano, Switzerland (AIT,AP,AI,ML)
Yuri Kalnishkan
Royal Holloway University of London, UK (AIT,ML)
Bjorn Kjos-Hanssen
University of Connecticut, Storrs, CT, USA (AR)
Andrei Kolmogorov
Moscow, Russia (AIT)
Kevin Korb
Monash University, Melbourne, Australia (MDL)
Sophie Laplante
Université Paris-Sud, France (CC,KC,AR)
Troy Lee
CWI Amsterdam, The Netherlands (CC,KC)
Leonid A. Levin
Boston University, USA (AIT)
Ming Li
University of Waterloo, Canada (AIT)
Luc Longpré
University of Texas at El Paso, USA (KC,CC)
Jack Lutz
Iowa State University, USA (AR)
Jean-Yves Marion
LORIA Université Nancy, France (CC,KC)
Per Martin-Löf
University of Stockholm, Sweden (AR)
Elvira Mayordomo
University of Zaragossa, Spain (AR)
Nenad Mihailovic
Universität Heidelberg, Germany (AR)
Wolfgang Merkle
Universität Heidelberg, Germany (AR)
Philippe Moser
Iowa State University, USA (AR)
Andrej Muchnik
Institute of New Technologies, Moscow, Russia (AIT)
Andre Nies
University of Auckland, New Zealand (AR)
Jan Poland
Istituto Dalle Molle di Studi Sull'Intelligenza Artificiale (IDSIA), Lugano (AIT,ML)
Jan Reimann
Universität Heidelberg, Germany (KC,AR)
Jorma Rissanen
Almaden Research Center, San Jose, CA, USA (MDL)
Andrei Romashchenko
Institute of Problems of Information Transmission, Moscow, Russia (KC)
Boris Ryabko
Siberian State University of Telecommunication and Computer Science, Siberia (AR,KC)
Jürgen Schmidhuber
Istituto Dalle Molle di Studi Sull'Intelligenza Artificiale (IDSIA), Lugano, Switzerland (US,AIT,ML)
Claus-Peter Schnorr
Goethe-Universität Frankfurt, Germany (AR)
Robert Solovay
University of California, Berkeley, USA (AIT)
Alexander Kh. Shen
Institute of Problems of Information Transmission, Moscow, Russia (CC,AIT)
Theodor A. Slaman
University of California, Berkeley, USA (AR,CC)
Ray J. Solomonoff
Oxbridge Research, Cambridge, MA, USA (AP,ML,AI)
Ludwig Staiger
Universität Halle, Germany (AR)
Frank Stephan
Universität Heidelberg, Germany (AR)
Sebastiaan A. Terwijn
Technical University of Vienna, Austria (AR)
Leen Torenvliet
University of Amsterdam, The Netherlands
Boris A. Trakhtenbrot
Tel Aviv University, Israel
John Tromp
CWI Amsterdam, The Netherlands (CC,KC)
Maxim Ushakov
Moscow State University, Russia
Vladimir Uspensky
Moscow State University, Russia
Vladimir V'yugin
Institute of Problems of Information Transmission, Moscow, Russia (KC)
Michael Vyugin
Royal Holloway University, London, UK (KC,ML)
Nikolai Vereshchagin
Moscow State University, Russia (KC,CC)
Paul M. B. Vitanyi
CWI, Amsterdam, The Netherlands (KC,ML)
Volodya Vovk
Royal Holloway University, London, UK (KC,ML)
Chris Wallace
Monash University, Australia (MML)
Yongge Wang
University Heidelberg, Germany (CC,AR)
Osamu Watanabe
Tokyo Institute of Technology, Tokyo, Japan (CC,KC)
Events
Conference on Logic, Computability and Randomness,2007, January 10-13, Buenos Aires, Argentina.
Workshop on Effective Randomness,2006, August 7-11, ARCC, Palo Alto, California, USA.
Seminar on Kolmogorov Complexity and Applications,2006, January 29 - February 3, Schloss Dagstuhl, D-66687 Wadern, Saarbrücken, Germany.
Conference on Logic, Computability and Randomness,2004, September 20-24, Córdoba, Argentina.
Centennial Seminar on Kolmogorov Complexity and Applications,2003, April 27 - May 2, Schloss Dagstuhl, D-66687 Wadern, Saarbrücken, Germany.
Workshop on Computability and Randomness,2003, April 25-26, D-69120 Heidelberg, Germany.
TAI'01 5th Workshop on Algorithmic Information Theory,2001, September 24-26, Porquerolles, Hyères, France.
TAI'00, 4th Workshop on Algorithmic Information Theory,2000, June 8-9, Lille, France
TAI '99, Third Workshop on Algorithmic Information Theory,1999, May 20-21, LORIA Nancy, France
Journées françaises sur la complexité de Kolmogorov,1997, June 23 - 24, Université Charles de Gaulle, Lille 3, France
Descriptional Complexity: A Multidisciplinary Perspective,1993, May 3-7, Schloss Dagstuhl, D-66687 Wadern, Saarbrücken, Germany.
Miscellaneous
Annual Kolmogorov Lecture and Medal at RHUL
A. N. Kolmogorov resources (biography, publications, pictures, interviews, ...)
Ray Solomonoff ... Algorithmic Probability can serve as a kind of 'Gold Standard' for induction systems
Only math nerds would call 2500 finite Leonid Levin