In my last blog post, I provided some background on our AI initiatives, which shifted into high gear when I joined OutSystems toward the end of last year, coinciding with the launch of outsystems.ai. We also caught up on the Early Access Program (EAP) for customer-developers and the great insights we gleaned from that work on artificial intelligence and machine learning (AI/ML).
In this article, I want to talk through the unique research challenges presented by AI/ML, particularly when you’re dealing with code. Additionally, I’ll cover why our investment in internal research teams and collaborations with institutions like Carnegie Mellon University are so important to developing truly capable machine-learning-powered AI development assistants.
The State of Artificial Intelligence and Machine Learning in App Development Today
Creating AI/ML systems that understand code and, further, can assist in building systems with code is still very much an emerging research area. Arguably, it’s a much bigger challenge than computer vision or natural language processing (NLP). It is certainly trickier than classical structured-data problems, such as credit default prediction or product recommendations. Its potential impact is also profound. If you can teach a machine to understand and produce code, it could effectively evolve itself, which is why this is one of the proposed paths toward Artificial General Intelligence and the Singularity.
But that is still several research breakthroughs away, which is partially why we framed our outsystems.ai master plan with an ambitious space exploration metaphor. We have a long-term plan to systematically and incrementally transform software development through the advancement of state-of-the-art AI/ML.
Let’s put this graphic in perspective. From a “utopian state of app dev transcendence thanks to AI” standpoint, as of right now, the industry as a whole is still orbiting the earth.
Simply put, “We have a long way to go.”
However, investment in AI/ML research has grown 10x in the past decade and 3x in the past 5 years. Computer vision, natural language processing, robotics, and information retrieval have taken the largest slices of this research area’s resources (both in time and money), which is why these areas of research are where you’re seeing the most visible advancements and resulting product improvements.
And even though we often see the resulting algorithms performing at a higher-than-human level on specific tasks in products, the truth is that we’re still far from a state where they are significantly better in general, real-world use.
In fact, AI/ML has had little impact on software development so far, but it’s an emerging research area that we’re betting will soon transform how people code.
Let’s put this in the perspective of other areas where machine learning and AI are applied. Daniel Tarlow, a researcher at Google Brain, presented the different maturity levels of AI/ML research areas. You can see in the diagram below, inspired by Tarlow’s version from mid-2018 and updated to today, that machine learning research and application depend not only on a lot of available data but also on the appearance and adoption of more ML-powered features in real-world use. The more data we have, the more research we can do, which in turn yields more real-world use cases and better products. This creates a feedback loop that (in theory, at least) keeps pushing the development of AI and machine learning toward bigger and better things. This is the idea behind the Virtuous Circle of AI, which Andrew Ng has presented and explained more fully.
Machine learning on code is at a pivotal point: we are starting to have enough data to train large models, and research is just now producing its first real-world results (like OutSystems’ AI Assistant, launched at last year’s NextStep). With growing investment in the field and the development of new techniques, the next 18 months should bring a great deal of innovation.
As in other areas, this will allow developers to do more with better-quality results, freeing them to focus on higher-level tasks like delivering business value rather than performing repetitive grunt work and yak shaving. We believe that by embedding the expertise gleaned from millions of patterns, we will open up the field of app dev to more people, truly fulfilling the potential of the citizen developer. This aligns exactly with OutSystems’ mission of extreme agility and of opening up development to skilled personnel outside of traditional development, such as business users and IT-minded technical professionals.
Why AI Research for Coding Is so Difficult
But why is research into AI and ML for code a challenge? What makes AI research for code different from research for natural language processing? I have identified five main differences:
- Reference sparsity: In code, elements and components can be referenced and used in places completely different from, and distant from, where they were created and defined; functions and variables can and will be used far away from their definitions. Natural language text, in contrast, typically relies on close relationships and local context: the implied context is usually made explicit near the sentence that depends on it. For example, when you say, “She went to the store,” there is probably a nearby sentence that identifies who “she” is.
- Code is multi-relational: Code is not simply a sequence of words in a programming language. Code embodies complex relationships between concepts and data. Within code, you have dependencies between functions, data flows, class hierarchies, and the syntactic structure of the code itself (which is a tree or a graph, rather than a sequence). Natural language is much more straightforward: reading text sequentially gives you a good understanding of what is being said. The same applies to other ML use cases, which is why most successful AI research so far has dealt with fairly sequential data. But the world, like code, is filled with non-linear relationships. To date, most AI/ML research has been devoted to linear relationships, or to modeling non-linear relationships in a quasi-linear way, and very little has focused on AI that can connect seemingly unrelated pieces of data and put them to use. But this is starting to shift in top research groups around the world, and we’re part of that wave.
- Extreme diversity: Code is inherently diverse. We use code to create fit-to-purpose solutions (apps) that are specific to a company or a department. However, each team has its own conventions; it names things differently, and even a single idea can be expressed in an infinite number of ways. Not to mention, variable and function names are extremely diverse. All of this variety poses an added challenge for AI and ML models, which have to deal with many neologisms and out-of-vocabulary terms, compared with natural language scenarios that deal with a more-or-less fixed vocabulary.
- It has to be right: Machine learning is inherently probabilistic. These models give you results with associated probabilities, not full certainty. Code, on the other hand, is 100 percent deterministic and governed by strict syntactic and semantic rules. In natural language, we can still understand text that isn’t 100 percent grammatically correct, but machines need code to be 100 percent correct in order to process it. Developers also expect that level of correctness from an automated system, which raises the bar for any AI assistance.
- Professional developers expect a lot from their tools: People who code every day have very strong habits and are very efficient. The EAP mentioned earlier reinforced that for us. Introducing any kind of assistance into developers’ work habits and flows requires that the “help” be relevant, accurate, and, perhaps most importantly, efficient. It can’t be buried in a menu tree or in any way require developers to alter their normal work habits to access and use it. In short, it has to add a lot of value before professional developers will even think about using it.
We are essentially demanding that new AI and ML tools and processes be better than the people already doing the same work.
All of these challenges apply equally to traditional code development and to low-code.
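To make the first four differences a bit more concrete, here is a minimal Python sketch that uses only the standard library’s `ast` and `re` modules. It is illustrative only and is not how OutSystems models code: the toy snippet and every function name (`use_distances`, `typed_edges`, `split_identifier`, `syntactically_valid`) are hypothetical. It measures how far uses sit from their definitions (reference sparsity), extracts two kinds of typed edges, syntax and calls (multi-relational structure), splits identifiers into shared subtokens, a common way to reduce out-of-vocabulary names (extreme diversity), and discards candidate snippets that don’t even parse (it has to be right).

```python
import ast
import re
from collections import defaultdict

# Hypothetical toy snippet, used only to illustrate the ideas above.
SOURCE = '''
def load_config(path):
    with open(path) as handle:
        return handle.read()

MAX_RETRIES = 3

class Client:
    def fetch(self, url):
        for attempt in range(MAX_RETRIES):
            data = load_config(url)
        return data
'''


def definition_lines(tree):
    """Map each name bound in the tree to the first line where it is bound."""
    defs = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            name, line = node.name, node.lineno
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name, line = node.id, node.lineno
        elif isinstance(node, ast.arg):
            name, line = node.arg, node.lineno
        else:
            continue
        defs[name] = min(line, defs.get(name, line))
    return defs


def use_distances(source):
    """Reference sparsity: line distance from each definition to each of its uses."""
    tree = ast.parse(source)
    defs = definition_lines(tree)
    distances = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load) and node.id in defs:
            distances[node.id].append(node.lineno - defs[node.id])
    return dict(distances)


def typed_edges(source):
    """Multi-relational view: AST syntax edges plus function-to-callee edges."""
    tree = ast.parse(source)
    edges = set()
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            edges.add(("syntax", type(parent).__name__, type(child).__name__))
    for func in ast.walk(tree):
        if isinstance(func, ast.FunctionDef):
            for node in ast.walk(func):
                # Only direct calls by name; method calls like handle.read() are skipped.
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    edges.add(("calls", func.name, node.func.id))
    return edges


def split_identifier(name):
    """Extreme diversity: split camelCase / snake_case names into lowercase subtokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def syntactically_valid(candidates):
    """'It has to be right': keep only candidate snippets that parse."""
    valid = []
    for code in candidates:
        try:
            ast.parse(code)
        except SyntaxError:
            continue
        valid.append(code)
    return valid


if __name__ == "__main__":
    print(use_distances(SOURCE))
    print(sorted(e for e in typed_edges(SOURCE) if e[0] == "calls"))
    print(split_identifier("parseHTTPResponse"), split_identifier("MAX_RETRIES"))
    print(syntactically_valid(["x = 1 +", "x = 1 + 2", "def f(:"]))
```

Even in this tiny snippet, `load_config` is used nine lines away from its definition, and the call edges (`fetch` calls `load_config`) cut across the class and function boundaries of the syntax tree. Note that `ast.parse` only checks syntax: a snippet that parses can still be semantically wrong, so passing this filter is a floor, not a guarantee of correctness.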
Applying AI Research to Real-World Challenges
There is little value in devising AI to replace simple tasks, the things we can do quickly and easily on our own. The cognitive and physical effort required to manually push a button to start your car is negligible. If you wanted to automate that, you could do it with sensors and switches. No, the real value in AI and ML is in augmenting the work that novices and pros alike perform in real-world, high-complexity, and high-diversity scenarios.
Thankfully, we have an amazing team working to create algorithms to manage these very challenges. Not only are we pushing algorithms to do what we need, but we’re also pushing the limits of our tools.
Recently, we contributed code to one of DeepMind’s neural network libraries, improving its performance five-fold, which is critical for us since we deal with massive datasets and complex network architectures. Since much of DeepMind’s work is publicly available, our research is helping further their efforts toward solving some of the world’s most complicated challenges.
The outsystems.ai mission is to accelerate true AI-assisted development. To do that, we’re not only investing in and performing research ourselves, we’re also forming extensive partnerships with universities around the world, such as Carnegie Mellon University and Instituto Superior Técnico.
We’re currently expanding these partnerships with large projects in the area of AI-assisted development and AI-powered code and app generation. These are deep research projects that will drive us forward and closer to our goals, and you’ll see their results as new functionality embedded in our platform in the months and years to come. Taken together, these efforts are part of why industry analysts such as Gartner, in its research report “Top 10 Strategic Technology Trends for 2019: AI-Driven Development,” highlighted us as one of the few companies adopting AI/ML for augmented development.
This is a long-term effort, and you’ll see us embedding true AI-powered capabilities that make you increasingly productive over time. Meanwhile, our research team has been sharing technical progress at conferences, and we’ll soon start publishing more of this technical content on our outsystems.ai page.
What’s Next for Low-Code and AI?
As with any research, our work in AI and ML for application development has a set of milestones and goals. But the reality is that no one knows what’s ultimately possible in the years and decades to come. OutSystems CEO Paulo Rosado has offered a glimpse into his thoughts on the ultimate (or maybe the penultimate) goal for AI and ML: software that writes itself. It’s a lofty but attainable goal.
In the meantime, we continue to invest in and perform our own research and development into leveraging machine learning and AI to help throughout the software development lifecycle. We’re one of the few dedicated teams in the world doing this and we’re breaking new ground every day.
Stay tuned for Part 3, in which we’ll cover how our vision and strategy for AI have evolved and how we see them impacting the software development lifecycle. Soon, we will be able to help you integrate sophisticated predictive capabilities into your own apps with zero data science knowledge. Some very exciting things are coming down the pipeline.