There are many reasons why developers dive into the world of Open Source contributions.
Contributing can be a way to give back to the community and use your skills for the greater good. It's a fantastic environment that allows you to network with talented developers, build relationships, and potentially find mentors or collaborators. For those seeking career advancement, contributions become a public portfolio showcasing your skills and experience. Sometimes, it's about a personal itch! You might encounter a bug or limitation in a project you rely on, and contributing a fix not only solves your own frustration but benefits everyone who uses that software.
The recipe for a successful Open Source contribution goes beyond just code. A strong desire to learn fuels the journey, as you navigate unfamiliar codebases and tackle new challenges. But learning flourishes best within a supportive environment.
A responsive community acts as a safety net, offering guidance and feedback to help you refine your contributions. Ideally, you can also "dogfood" the tool you contribute to, using it in your own work or personal projects. This firsthand experience provides valuable context for your contributions and ensures they address real-world needs. With these elements in place, you're well on your way to making a lasting impact on the Open Source community.
While I've been building my software development skills for a while now, Rust still feels like a whole new world to explore. This "newbie" feeling might make some shy away from contribution, fearing their code won't be good enough. However, I use silly mistakes as stepping stones to improve my skills, not a reason to feel discouraged.
Over a year of contributing to GreptimeDB has been a rewarding journey filled with learning experiences. Today, I'd like to walk you through one of those. Let's get our hands dirty (or should I say, claws? 🦀)
Motivation
This time we will be enhancing the Interval syntax by allowing a shortened version:
select interval '1h';
SQL standard defines syntax like:
select interval '1 hour';
Now, we would like both query syntax work, i.e., select interval '1 hour'
and select interval '1h'
should give you the same result.
Diving right into the code, I discovered that the core functionality for handling transformations already exists. The scope of a change boils down to adding a new rule specifically for the Interval data type: intervals with abbreviated time formats will be automatically expanded to their full versions. Let's take a closer look at a specific section of the code that does the logic:
fn visit_expr(&self, expr: &mut Expr) -> ControlFlow<()> {
match expr {
Expr::Interval(interval) => match *interval.value.clone() {
Expr::Value(Value::SingleQuotedString(value)) => {
if let Some(data) = expand_interval_name(&value) {
*expr = create_interval_with_expanded_name(
interval,
single_quoted_string_expr(data),
);
}
}
Expr::BinaryOp { left, op, right } => match *left {
Expr::Value(Value::SingleQuotedString(value))=> {
if let Some(data) = expand_interval_name(&value) {
let new_value = Box::new(Expr::BinaryOp {
left: single_quoted_string_expr(data),
op,
right,
});
*expr = create_interval_with_expanded_name(interval, new_value);
}
// ...
Code Review
An experienced Rust developer, Dennis, quickly identified an area for improvement. This is what exactly happened:
Code review shines as a learning tool.
Beyond simply accepting the suggested change (though the rationale for "efficiency" was clear!), I decided to take a deep dive. By analyzing the proposed improvement and attempting to explain it myself, I solidified my understanding of the code and best practices in Rust.
Avoiding unnecessary cloning and ownership transfers.
Originally, I used .clone()
on interval.value
:
match *interval.value.clone() { ... }
Cloning here creates a new instance of the data each time, which can be inefficient if the data structure is large or cloning is expensive. The suggested version avoids this by using references:
match &*interval.value { ... }
Matching on a reference (&*interval.value
) eliminates the cost of cloning. The same improvement is applied to the match on left in the binary operation case:
match &**left { ... }
This one is slightly more involved: It uses a double dereference to get a reference to the value inside a Box
, which is more efficient than cloning.
Clearer Pattern Matching
Using references in pattern matching makes emphasise the intention of only reading data, not transferring ownership:
match &*interval.value { ... }
This shows explicitly that the matched value is not being moved. It help with reasoning about the code, especially in a context with complex ownership and borrowing rules.
Reduced Cloning inside the Binary Operation
In the original code, the op
and right
fields of the Expr::BinaryOp
variant are cloned unconditionally:
let new_value = Box::new(Expr::BinaryOp {
left: single_quoted_string_expr(data),
op,
right,
});
However, they only need to be cloned if the left field is an Expr::Value
variant with a string value. The suggested enhancement moves the cloning inside the if let
block, so it only happens when necessary:
let new_value = Box::new(Expr::BinaryOp {
left: single_quoted_string_expr(data),
op: op.clone(),
right: right.clone(),
});
Using references instead of cloning
In the original code, expand_interval_name(&value)
is used, which borrows value.
However, value is of type String
, which implements the Deref<Target=str>
trait (more information could be found here). This means that it can be automatically dereferenced to a &str
when necessary. The improved version uses expand_interval_name(value)
, which avoids the need for explicit reference.
Summary
So in the context of this change, "efficiency" stands for:
- Avoiding unnecessary cloning, reducing overhead.
- Making the borrowing and ownership patterns clearer and safer.
- Enhancing overall readability and maintainability.
This is how the visit_expr
function looks like after all suggestions have been applied:
fn visit_expr(&self, expr: &mut Expr) -> ControlFlow<()> {
match expr {
Expr::Interval(interval) => match &*interval.value {
Expr::Value(Value::SingleQuotedString(value))
| Expr::Value(Value::DoubleQuotedString(value)) => {
if let Some(normalized_name) = normalize_interval_name(value) {
*expr = update_existing_interval_with_value(
interval,
single_quoted_string_expr(normalized_name),
);
}
}
Expr::BinaryOp { left, op, right } => match &**left {
Expr::Value(Value::SingleQuotedString(value))
| Expr::Value(Value::DoubleQuotedString(value)) => {
if let Some(normalized_name) = normalize_interval_name(value) {
let new_expr_value = Box::new(Expr::BinaryOp {
left: single_quoted_string_expr(normalized_name),
op: op.clone(),
right: right.clone(),
});
*expr = update_existing_interval_with_value(interval, new_expr_value);
}
}
_ => {}
},
_ => {}
},
// ...
Open Source contribution has been an incredible way to accelerate my Rust learning curve. This contribution to GreptimeDB is just one example that illustrates how I gain knowledge through open source contributions. And depending on your feedback, I'm excited to share more of these learning experiences in future posts!
A huge thanks to the entire Greptime team, especially Dennis, for their support and guidance throughout my contributions. Let's keep the contributing and learning going!